Guidance for DM.Auriga.Impression DiffDoctor Validation
For any questions about the specific use cases:
- Contact MSAI Data Platform Fundamental Crew.
- Create a bug if you have any requirements or questions about DiffDoctor: BugTemplate
Use cases
1. Trigger the DiffDoctor validation pipeline from the release pipeline based on your changes
Preconditions:
- You must have triggered one successful DM.Auriga.Impression - Artifacts Publish and Validation run from your Pull Request before you trigger the related validation pipeline.
- ADF is only triggered against the main branch. If your change also needs to update the ADF pipeline, the release pipeline does not currently support it. In that case, refer to the part below, Trigger the DiffDoctor Validation Pipeline in Azure Data Factory.
Detail steps:
Note: There are two scenarios when you trigger a specific validation pipeline:
- Customized parameter values: create a release manually and deploy the validation stage.
- Fixed parameter values: deploy directly from the automatically created release.
Create a release with customized parameters, or use an existing release with fixed parameter values
Navigate to msai-datapipeline-test (Azure DevOps->Pipelines->Releases->msai-datapipeline-testenv)
Option a: Trigger a Release with fixed parameters
Find the successful build number among the release candidates, for example Release-92 for BuildNumber 20220302.3.
Option b: Trigger a Release with customized parameters
In the msai-datapipeline-testenv release pipeline, click Create release. In the pop-up, select the Artifact Version first, then fill in the parameters as in the screenshot below.
Note: The release pipeline currently has two parameters, Date and HourSpan. Date applies to all validation pipelines; HourSpan applies only to CompleteFastSession. Together, the two parameters must stay within a single day.
For Date parameter
- For hourly Job, you can set it in the format yyyy-MM-ddTHH:mm:ssZ
- For daily Job, you can set it as yyyy-MM-dd or yyyy-MM-ddTHH:mm:ssZ
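If you want to check a Date value before creating the release, the two formats above can be tested locally. The following is only a sketch: parse_date_parameter and the job_kind argument are hypothetical names that are not part of the release pipeline, and it assumes the trailing Z is a literal suffix.

```python
from datetime import datetime

# Python equivalents of the formats named in this guide (assumption:
# the trailing "Z" is a literal character).
HOURLY_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # yyyy-MM-ddTHH:mm:ssZ
DAILY_FORMAT = "%Y-%m-%d"             # yyyy-MM-dd


def parse_date_parameter(value: str, job_kind: str) -> datetime:
    """Parse a Date value for an 'hourly' or 'daily' job (hypothetical helper)."""
    if job_kind == "hourly":
        accepted = [HOURLY_FORMAT]
    elif job_kind == "daily":
        # Daily jobs accept either format.
        accepted = [DAILY_FORMAT, HOURLY_FORMAT]
    else:
        raise ValueError(f"unknown job kind: {job_kind!r}")

    for fmt in accepted:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} is not a valid Date for a {job_kind} job")
```

For example, parse_date_parameter("2022-06-28", "daily") succeeds, while the same value with "hourly" raises ValueError.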
For HourSpan parameter (HourSpan parameter is only used for CompleteFastSession Validation Pipeline!)
The range is from 1 to 24, limited by the Date's value. E.g. if Date is 2022-06-28T18:00:00Z, the HourSpan value should be no bigger than 6.
- If we set the value to 6, the hourly job timestamps will be 2022-06-28T18:00:00Z, 2022-06-28T19:00:00Z, 2022-06-28T20:00:00Z, 2022-06-28T21:00:00Z, 2022-06-28T22:00:00Z and 2022-06-28T23:00:00Z.
- If we set the value to 7, it is invalid because 2022-06-28T18:00:00Z + 7 hours runs past the end of the current date. That is an invalid parameter for our daily pipeline.
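The HourSpan rule above can be sketched as a small check that also lists the hourly timestamps a given value would cover. This is an illustration only: hourly_timestamps is a hypothetical helper, not pipeline code, and it assumes Date falls exactly on the hour, as in the example.

```python
from datetime import datetime, timedelta
from typing import List

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # yyyy-MM-ddTHH:mm:ssZ


def hourly_timestamps(date: str, hour_span: int) -> List[str]:
    """Return the hourly timestamps covered by Date and HourSpan.

    Raises ValueError when HourSpan would run past the end of Date's day.
    Assumption: Date is on the hour (minutes and seconds are zero).
    """
    start = datetime.strptime(date, DATE_FORMAT)
    # Hours left in the day, counting the starting hour itself.
    max_span = 24 - start.hour
    if not 1 <= hour_span <= max_span:
        raise ValueError(
            f"HourSpan {hour_span} is invalid for {date}; allowed range is 1 to {max_span}"
        )
    return [
        (start + timedelta(hours=i)).strftime(DATE_FORMAT)
        for i in range(hour_span)
    ]
```

With Date 2022-06-28T18:00:00Z, a value of 6 yields the six timestamps from 18:00 through 23:00, and a value of 7 raises ValueError.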
Deploy the stages and check the stage logs for detailed status.
Note:
- Logical Hourly Validation & Logical Daily Validation : Using the raw data as input and testline/baseline as pipeline to cook LogicalImpression data
- SearchImpression Hourly Validation & SearchImpression Daily Validation : Using the raw data as input and testline/baseline as pipeline to cook SearchImpression data
- FastSession Validation & SlowSession Validation : Using the same production impression data as input and testline/baseline as pipeline to cook session data
- CompleteFastSession Validation : Using the raw data as input and testline/baseline as pipeline to cook impression data and cook session data
After the stage is deployed, the test jobs in ADF are triggered automatically. You can check the pipeline list in ADF to confirm that the run was triggered with your parameters, for example that the related Logical hourly validation pipeline has been triggered.
Now you only need to wait for the jobs to finish, at which point you receive an email notification like the email body below:
Then you can check the result at the DiffDoctor Result Path provided in the email (e.g. https://aad.cosmos14.osdinfra.net/cosmos/exchange.storage.prod/local/Aggregated/Datasets/Public/Usage/DiffResult/20220418.1/LogicalDaily/Output_ReportSummary_2023-01-07.ss ).
Note:
How to find your build number: Navigate to Azure DevOps->Pipelines->DM.Auriga.Impression - Buddy and find your successful buddy build. For example, for #20220303.1 olo migration, 20220303.1 is your build number. Please refer to the snapshot.
How to find your test output and diff result: look in the corresponding folder for your build number. For build number 20220303.1, for example:
- Test output for Recommendation: https://aad.cosmos14.osdinfra.net/cosmos/exchange.storage.prod/local/Aggregated/Datasets/Public/Usage/TestOutput/20220303.1/Testline/Aggregated/Datasets/Public/Usage/Impression/LogicalSearchImpressionHourly/Recommendations/2022/06/28/RecommendationsLogicalImpressions_2022-06-28-21.ss
- Diff result for logical hourly: https://aad.cosmos14.osdinfra.net/cosmos/exchange.storage.prod/local/Aggregated/Datasets/Public/Usage/DiffResult/20220303.1/LogicalHourly/Output_ReportSummary_2022-06-10-00.ss
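The diff result URLs in this guide follow a visible pattern, so the path can be composed from the build number rather than typed by hand. The sketch below infers that pattern from the two example URLs only. diff_result_url is a hypothetical helper, and only the LogicalHourly and LogicalDaily folder names are confirmed by the examples; folder names for the other validations are not shown here and would need to be checked.

```python
from datetime import date
from typing import Optional

# Root taken from the example URLs in this guide.
USAGE_ROOT = (
    "https://aad.cosmos14.osdinfra.net/cosmos/exchange.storage.prod/local/"
    "Aggregated/Datasets/Public/Usage"
)


def diff_result_url(
    build_number: str,
    validation: str,
    day: date,
    hour: Optional[int] = None,
) -> str:
    """Compose a DiffDoctor report URL (pattern inferred from the examples).

    Pass hour for hourly validations; omit it for daily validations.
    """
    stamp = day.strftime("%Y-%m-%d")
    if hour is not None:
        # Hourly reports append a two-digit hour, e.g. 2022-06-10-00.
        stamp += f"-{hour:02d}"
    return (
        f"{USAGE_ROOT}/DiffResult/{build_number}/{validation}/"
        f"Output_ReportSummary_{stamp}.ss"
    )
```

Calling diff_result_url("20220303.1", "LogicalHourly", date(2022, 6, 10), hour=0) reproduces the logical hourly example above. The DiffDoctor Result Path in the notification email remains the authoritative location.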
2. Trigger the DiffDoctor Validation Pipeline in Azure Data Factory
If we need some changes in ADF, we can trigger the ADF validation pipeline directly.
Preconditions:
- Triggered one successful DM.Auriga.Impression - Artifacts Publish and Validation from your Pull Request before you trigger related validation pipeline.
- You need to operate Azure Data Factory from a SAW machine.
- You need to request Contributor JIT access to Azure Data Factory (run the following torus command on DMS-DatacenterManagement):
Request-AzureResourceRoleElevation -Role Contributor -SubscriptionId 8b8b2cf1-f7e3-4d86-8581-a15cd35463a4 -ResourceGroup MSAI-DataPipeline -Reason "ADF pipeline" -Duration 4
Detail steps:
Sign in to Azure Data Factory, find the DiffDoctor folder, and select one DiffDoctor validation pipeline.
Trigger the validation pipeline in ADF directly. You only need to fill in the parameter values like this:
After the DiffDoctor Validation pipeline finishes running, you'll receive an email notification as well.
3. Trigger the test job from Azure Data Factory instead of Dragonfly
Preconditions:
- You must have triggered one successful DM.Auriga.Impression - Artifacts Publish and Validation run from your Pull Request before you trigger the related validation pipeline.
- If your pipeline change needs corresponding changes in ADF, you can trigger the common product pipeline directly. Currently we only support these pipelines:
  - CookingLogicalSearchImpression
  - CookingLogicalSearchiImpressionHourly
  - CookingLogicalSearchiImpressionHourly_PreProd
  - CookingSearchImpression
  - CookingSearchImpressionHourly
  - SearchSessionComputaion_Fast
  - SearchSessionComputaion
  - PreProdSearchSessionComputaion_Fast
Detail steps:
Here we take CookingLogicalSearchImpressionHourly as an example:
- Find the CookingLogicalSearchImpressionHourly pipeline in Azure Data Factory and fill in the parameter values, as in the screenshot below.
Note: The Env should be testline.
4. Test View changes with Customized path
We have added a new parameter, CustomDependency (PR), to the search views to set the root path of binaries. It can be used in a local view test with a changed binary (e.g. DataMining.Office.Impression.DataTypes.dll). Example usage in Visual Studio when submitting a test view job:
Example usage in Dragonfly when submitting a test session job:
5. The path of the DiffDoctor testline output can be used in the scorecard job
We can take the prefix of the path as the Custom_Input and configure it in the scorecard.
6. Manually submit the DiffDoctor script to run DiffDoctor
In some cases you want to diff the testline against another PR (or something else) instead of "baseline". You then have to submit the DiffDoctor job manually.
- Find the DiffDoctor script (e.g. DiffDoctorValidation_FastSessionComputation.script) in the DM.Auriga.Impression repo.
- Change the input parameters and output location, then submit the script using Visual Studio in other VCs (e.g. office.engineering, office, adhoc).
Known DiffDoctor gap
- Session Fast Diff Doctor
  - If you find that Union (officeshared.officemobilefiles) has an obvious diff, mainly in the SessionDuration column, it's a known issue. Refer to this bug link: Bug 4398052
- Logical Impression Hourly Diff Doctor
  - If you find that some entrypoints (incl. Recommendations & MSB) have a primary key diff due to NonUniqueKey_Test; / NonUniqueKey_Baseline; NonUniqueKey_Test;, refer to this bug link: Bug 5005358