Source Control in MSAI-DataPipeline AzureDataFactory
Background
By default, the Azure Data Factory user interface experience (UX) authors directly against the data factory service. This experience has the following limitations:
- The Data Factory service doesn't include a repository for storing the JSON entities for your changes. The only way to save changes is via the Publish All button and all changes are published directly to the data factory service.
- The Data Factory service isn't optimized for collaboration and version control.
- The Azure Resource Manager template required to deploy Data Factory itself is not included.
MSAI-DataPipeline is the critical AzureDataFactory resource for our MSAI Data Platform. It hosts the platform's critical pipelines, schedulers, and data management, which is why changes to it go through source control as described below.
Prerequisites
Please contact the MSAI Data Platform Fundamental DRI if you have any questions about the prerequisites.
- A Torus account.
- Membership in the "msaidataplat" eligibility on the OSP portal.
- Permission to access the msai-datapipeline-adf repository.
How to Update the AzureDataFactory
If you need to update the resources (pipelines, triggers, datasets, linked services, etc.) in the MSAI-DataPipeline AzureDataFactory, follow these steps.
1. Request a Contributor role for the MSAI Data Platform subscription. (You can update the ADF resources in a personal branch without a Contributor role or a SAW machine, but to run a debug session or publish the update, both a Contributor role and a SAW machine are required.) Run the following in DMS (Datacenter Management Services):
Request-AzureResourceRoleElevation -Role Contributor -SubscriptionId 8b8b2cf1-f7e3-4d86-8581-a15cd35463a4 -ResourceGroup MSAI-DataPipeline -Reason "ADF pipeline" -Duration 4
2. Log in to the MSAI-DataPipeline AzureDataFactory Studio on your SAW machine.
2.1 Open the MSAI-DataPipeline AzureDataFactory Studio in your browser with your Torus account **(e.g. zhxin_jit@prdtrs01.prod.outlook.com)**.
2.2 Authorize access to the AzureDevOps repository with your **AAD account (e.g. zhxin@microsoft.com)**.
3. Check the repository branch.
3.1 The Publish button is enabled only on the main branch.
3.2 Create a new branch for your updates to the AzureDataFactory.
4. Update the AzureDataFactory resources and save the changes. These changes will be committed to your branch.
4.1 Debug your changes.
5. Create a Pull Request in the msai-datapipeline-adf repository after your changes are done.
5.1 Follow the CI build and the Pull Request check-in policy to complete your PR.
5.2 We suggest creating your Pull Request only when your main changes are complete. Otherwise, every new change adds a new iteration to the Pull Request, which makes it more complex and triggers more CI pipeline runs.
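Because every extra commit on an open Pull Request triggers another CI run, a quick local check of the JSON entities in your branch before opening the PR may save some iterations. The following is only a minimal sketch, not part of the repository's CI. It assumes the common ADF Git layout, in which each resource is stored as `<folder>/<name>.json` with top-level `name` and `properties` keys. The folder names and the function name `check_adf_entities` are assumptions; adjust them to what the msai-datapipeline-adf repository actually contains.

```python
import json
from pathlib import Path

# Folder names that ADF Git integration commonly uses (assumption; adjust to the repo).
RESOURCE_FOLDERS = ("pipeline", "trigger", "dataset", "linkedService")


def check_adf_entities(repo_root):
    """Return a list of human-readable problems found in the ADF JSON files."""
    problems = []
    for folder in RESOURCE_FOLDERS:
        folder_path = Path(repo_root) / folder
        if not folder_path.is_dir():
            continue  # a factory may not use every resource type
        for json_file in sorted(folder_path.glob("*.json")):
            label = f"{folder}/{json_file.name}"
            try:
                entity = json.loads(json_file.read_text(encoding="utf-8"))
            except json.JSONDecodeError as err:
                problems.append(f"{label}: invalid JSON ({err.msg})")
                continue
            if not isinstance(entity, dict):
                problems.append(f"{label}: top-level value is not a JSON object")
                continue
            # The entity name is expected to match the file name (assumed convention).
            if entity.get("name") != json_file.stem:
                problems.append(
                    f"{label}: name is {entity.get('name')!r}, expected {json_file.stem!r}"
                )
            if "properties" not in entity:
                problems.append(f"{label}: missing 'properties' key")
    return problems
```

Calling `check_adf_entities(".")` from the root of a local clone of your branch returns an empty list when none of these checks fail. It does not replace the debug run in step 4.1 or the CI build.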
6. Publish your changes in the AzureDataFactory Studio portal.
6.1 Complete your Pull Request.
6.2 Switch to the main branch in the AzureDataFactory Studio portal, then check that your changes have been synced to the main branch.
6.3 Click the Publish button and check the portal notification to confirm that the publish succeeded.
6.4 To verify the result, you can switch to the 'adf_publish' branch in the msai-datapipeline-adf repository. This branch is updated after each successful AzureDataFactory publish.
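One way to confirm that a specific resource reached the 'adf_publish' branch is to inspect the ARM template that ADF generates there. The sketch below assumes the branch contains a generated template (typically named `ARMTemplateForFactory.json`) whose resources have names of the form `[concat(parameters('factoryName'), '/<ResourceName>')]`. The helper names `published_resources` and `is_published` are hypothetical, and the template layout should be checked against the actual branch contents.

```python
import json
import re

# Matches generated names such as "[concat(parameters('factoryName'), '/MyPipeline')]".
_NAME_PATTERN = re.compile(r"'/([^']+)'\)\]$")


def published_resources(arm_template):
    """Map a resource kind (e.g. 'pipelines') to the sorted names in the template."""
    result = {}
    for resource in arm_template.get("resources", []):
        # "Microsoft.DataFactory/factories/pipelines" -> "pipelines"
        kind = resource.get("type", "").rsplit("/", 1)[-1]
        match = _NAME_PATTERN.search(resource.get("name", ""))
        if kind and match:
            result.setdefault(kind, []).append(match.group(1))
    return {kind: sorted(names) for kind, names in result.items()}


def is_published(arm_template, kind, name):
    """Return True if a resource of the given kind and name is in the template."""
    return name in published_resources(arm_template).get(kind, [])
```

With a local checkout of 'adf_publish', the template can be loaded with `json.load` and passed to `is_published(template, "pipelines", "<YourPipelineName>")`. The exact file path inside the branch is not stated in this document, so locate it in the repository first.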