FVL Payload Registration
Payload Registration tells Data Catalog about your latest curated dataset, for example where it is stored. There are two ways to register a payload: manually through the DMS console, which requires approval from the SI team, or through the Payload Registration ADF pipeline.
Register Payload manually
First, prepare a payload.json file with the following content:
{
  "PayloadIdentifiers": {
    "DataDate": "2022-01-03T00:00:00Z"
  },
  "DataPaths": [
    "abfss://fvl@o365ccoivicprodnamdpmsit.dfs.core.windows.net/output/DatasetCategory=MSIT/JobDate=2022-01-03/"
  ],
  "PayloadInvisibleUntil": "2022-01-03T00:00:00Z"
}
PayloadIdentifiers are defined in the Data Catalog configuration. The identifier keys must match those defined in Data Catalog.
DataPaths is the FVL data path. Double-check the following parts of the path:
- Storage account name (PPE/PROD, and dp/dpmsit/dp28, etc.)
- DatasetCategory
- JobDate
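The three parts of the path listed above are easy to mistype, so one option is to generate payload.json from them. The following is a minimal Python sketch, not part of the official tooling. The function name build_payload is hypothetical, and the sketch assumes that the container (fvl), the /output/ prefix, and setting PayloadInvisibleUntil equal to DataDate all follow the example above.

```python
import json
from datetime import date


def build_payload(job_date: date, storage_account: str, dataset_category: str) -> dict:
    """Build a payload body shaped like the payload.json example (hypothetical helper)."""
    day = job_date.isoformat()  # e.g. "2022-01-03"
    midnight_utc = f"{day}T00:00:00Z"
    # Assumption: container name and path layout are copied from the example.
    data_path = (
        f"abfss://fvl@{storage_account}.dfs.core.windows.net/output/"
        f"DatasetCategory={dataset_category}/JobDate={day}/"
    )
    return {
        "PayloadIdentifiers": {"DataDate": midnight_utc},
        "DataPaths": [data_path],
        # Assumption: the payload becomes visible at the data date, as in the example.
        "PayloadInvisibleUntil": midnight_utc,
    }


payload = build_payload(date(2022, 1, 3), "o365ccoivicprodnamdpmsit", "MSIT")
print(json.dumps(payload, indent=2))
```

Redirecting the printed output to payload.json gives a file equivalent to the example. Adjust the identifier keys if your dataset defines different ones in Data Catalog.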
Then run the following command in DMS:
.\DCTool.ps1 -Ring 'PROD' -RelativeUriWithParameters '/Datasets/4ebe73da-6163-426b-947a-7f260776381a/DatasetPayload' -HttpMethod 'PUT' -RequestBodyFilePath payload.json
DCTool.ps1 can be found here.
4ebe73da-6163-426b-947a-7f260776381a is the dataset ID for FVL NAM MSIT. It is defined in Data Catalog as well.
- The FVL NAM Commercial, FVL NAM Consumer, and FVL EUR datasets have different IDs. Make sure to use the ID that matches your dataset.
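Because each dataset has its own ID, it can help to keep the IDs in one place and derive the -RelativeUriWithParameters value from them. The sketch below is illustrative only: DATASET_IDS and relative_uri are hypothetical names, and the table contains only the ID given in this document. The IDs for the other datasets must be looked up in Data Catalog.

```python
import uuid

# Only the FVL NAM MSIT ID appears in this document. IDs for the other
# datasets live in Data Catalog and must be added here before use.
DATASET_IDS = {
    "FVL NAM MSIT": "4ebe73da-6163-426b-947a-7f260776381a",
}


def relative_uri(dataset_name: str) -> str:
    """Return the relative URI for a dataset whose ID has been recorded."""
    if dataset_name not in DATASET_IDS:
        raise KeyError(f"No dataset ID recorded for {dataset_name!r}; look it up in Data Catalog")
    dataset_id = DATASET_IDS[dataset_name]
    uuid.UUID(dataset_id)  # raises ValueError if the ID is not a well-formed GUID
    return f"/Datasets/{dataset_id}/DatasetPayload"


print(relative_uri("FVL NAM MSIT"))
```

Failing on an unknown dataset name, rather than falling back to a default ID, avoids registering a payload against the wrong dataset.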
Register Payload with ADF Payload Registration pipeline
To register payloads automatically, use an ADF pipeline.
Set up Payload Registration pipeline
Refer to the pipeline template in the MARS repo to create a payload registration pipeline.
The Payload Registration service has a different endpoint for each environment:
- dev: substrateintelligencetest.microsoft.com/DataCatalogService/Datasets/@{DC_DatasetId}/DatasetPayload
- ppe: substrateintelligenceppe.microsoft.com/DataCatalogService/Datasets/@{DC_DatasetId}/DatasetPayload
- prod: nam.substrateintelligence.microsoft.com/DataCatalogService/Datasets/@{DC_DatasetId}/DatasetPayload
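In the pipeline, the @{DC_DatasetId} placeholder is replaced with the dataset ID at run time. The sketch below shows the same substitution in Python for checking which endpoint a given environment and dataset resolve to. ENDPOINT_HOSTS and payload_endpoint are hypothetical names. The hosts are copied from the list above, and no URL scheme is added because the list does not state one.

```python
# Hosts copied from the endpoint list; the keys are the environment names used there.
ENDPOINT_HOSTS = {
    "dev": "substrateintelligencetest.microsoft.com",
    "ppe": "substrateintelligenceppe.microsoft.com",
    "prod": "nam.substrateintelligence.microsoft.com",
}


def payload_endpoint(environment: str, dataset_id: str) -> str:
    """Resolve the payload registration endpoint for an environment and dataset ID."""
    host = ENDPOINT_HOSTS[environment.lower()]  # KeyError for an unknown environment
    return f"{host}/DataCatalogService/Datasets/{dataset_id}/DatasetPayload"


print(payload_endpoint("ppe", "4ebe73da-6163-426b-947a-7f260776381a"))
```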
Grant API permissions
To call the payload registration services, API permissions must be granted. Refer to these two files as examples: OIVIC-DC-PPE.json and OIVIC-DC-PROD.json.
{
  "name": "o365ccapp-oivic-workspace-ppe",
  "clientServicePrincipalObjectId": "508c7fb0-316b-4740-8805-56412ebd3564",
  "resourceName": "DataCatalogService-PPE",
  "resourceServicePrincipalObjectId": "20135fd3-ee97-4112-a90c-e1c3e154f9b1",
  "delegatedPermissions": "",
  "applicationPermissions": "Payload.Delete Payload.Register Dataset.Read"
}
name: the name of the SPN.
clientServicePrincipalObjectId: this is NOT the app ID, nor any ID you can find on the Azure portal. To get this ID for your SPN, run the following command in the DMS console. The AzureObjectId value in the output is the ID to put here.
PS C:\> Get-TorusAzureADServicePrincipal -ServicePrincipalName o365ccapp-oivic-workspace-ppe
AzureObjectId : 508c7fb0-316b-4740-8805-56412ebd3564
OwnerSamAccountName :
AppId : caf7cb3e-af10-4f3a-b68a-de64b61919ca
TenantId : cdc5aeea-15c5-4db6-b079-fcadd2505dc2
ServiceTreeServiceId : c981d712-df59-4b58-b916-4a425df0d2b2
SignInAudience : AzureADMyOrg
CreatedBy : zhengu
CreatedDateTime : 08/03/2021 20:55:03 +00:00
IsDeleted : False
ServicePrincipalType : Application
OrganizationName : M365 Core
ServiceGroupName : Microsoft Search Assistants & Intelligence (MSAI)
AccountEnabled : True
DisplayName : o365ccapp-oivic-workspace-ppe
Team : EuclidWSTeam
ExtraClearanceFlatNames :
OrganizationType : Global
ExtraRequiredClearances : {}
resourceName: the resource we need to access. Here it is DataCatalogService-PPE; for the PROD environment, it is DataCatalogService-PROD.
resourceServicePrincipalObjectId: the ID of the resource we need to access. For DataCatalogService-PPE, it is 20135fd3-ee97-4112-a90c-e1c3e154f9b1; for DataCatalogService-PROD, it is a60d292d-b4b0-4d9a-a5b7-b2ab6ee5f2ee.
applicationPermissions: the permissions we need. Here we use Payload.Delete Payload.Register Dataset.Read.
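Before raising the PR, a quick local check of the permission entry can catch missing fields or malformed IDs. The following is a minimal sketch under stated assumptions: check_permission_entry is a hypothetical helper, the required keys are taken from the example entry above, and the expected permissions are the three listed in this document.

```python
import uuid

# Keys taken from the example permission entry.
REQUIRED_KEYS = {
    "name",
    "clientServicePrincipalObjectId",
    "resourceName",
    "resourceServicePrincipalObjectId",
    "delegatedPermissions",
    "applicationPermissions",
}
# Permissions listed in this document for payload registration.
EXPECTED_PERMISSIONS = {"Payload.Delete", "Payload.Register", "Dataset.Read"}


def check_permission_entry(entry: dict) -> list[str]:
    """Return a list of problems found in a permission entry (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    for key in ("clientServicePrincipalObjectId", "resourceServicePrincipalObjectId"):
        try:
            # Lenient format check: uuid.UUID also accepts GUIDs without hyphens.
            uuid.UUID(entry.get(key, ""))
        except (ValueError, TypeError, AttributeError):
            problems.append(f"{key} is not a well-formed GUID")
    granted = set(entry.get("applicationPermissions", "").split())
    if not EXPECTED_PERMISSIONS <= granted:
        problems.append(f"missing permissions: {sorted(EXPECTED_PERMISSIONS - granted)}")
    return problems


entry = {
    "name": "o365ccapp-oivic-workspace-ppe",
    "clientServicePrincipalObjectId": "508c7fb0-316b-4740-8805-56412ebd3564",
    "resourceName": "DataCatalogService-PPE",
    "resourceServicePrincipalObjectId": "20135fd3-ee97-4112-a90c-e1c3e154f9b1",
    "delegatedPermissions": "",
    "applicationPermissions": "Payload.Delete Payload.Register Dataset.Read",
}
print(check_permission_entry(entry))
```

This only checks the shape of the entry. It cannot tell whether clientServicePrincipalObjectId is the AzureObjectId rather than the app ID, so that value still needs to be confirmed against the DMS console output.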
Raise a PR in the ControlPlane repo and ask the M365 ProdSec team to review it.
Generate payload in curation pipeline
Then we need to generate the payload data after curation is done. One way to do this is to use the PayloadRegistrationHelper. Copy the code to your code repo and see the sample usage for FVL. Make sure to set up the correct payload identifiers as shown in the section above.