How to validate the FVL pipeline with testing data
Unit test
Follow "Build and run unit tests" to set up the local Spark environment.
Prepare testing data in src/test/resources/curate/FVL/SearchFeatureValueLog.txt
Run the unit test in src/test/scala/com/microsoft/odinml/curate/FVLCurationTest.scala
E2E test with testing data from Cosmos
In the Fundamental Repo we have a script that converts Nuowo FVL logs on Cosmos to the SIGS JSON format. Logs from other rankers on Cosmos might have different schemas, so the script may need tuning.
In the Heron AML workspace, register your data on Cosmos as a dataset at Datasets
Please note that the path of your file has to be under /local, because /user is just a mount point and its actual storage is different from the VC itself.
Transfer your data into Compliant ADLS with this module. Sample pipeline
Create a pipeline to read the data and run HdiFvlAdhocCuration to generate an FVL dataset from the testing data. Sample pipeline
After running the cooking pipeline, register the output data as a dataset.
Create a pipeline to validate the output FVL data with HdiFvlMetrics. Sample pipeline
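The /local path requirement from the dataset registration step is easy to miss, so it may help to check paths before registering them. The following is a minimal standalone sketch in Python, not part of the repo or of any AML or Cosmos API. The function name is_under_local and the example paths are hypothetical, and it assumes Cosmos paths are written as absolute POSIX-style paths.

```python
from pathlib import PurePosixPath


def is_under_local(cosmos_path: str) -> bool:
    """Return True if the absolute path's first component is 'local'."""
    parts = PurePosixPath(cosmos_path).parts
    # An absolute POSIX path splits into ('/', 'local', ...); anything else is rejected.
    return len(parts) >= 2 and parts[0] == "/" and parts[1] == "local"


if __name__ == "__main__":
    # Hypothetical example paths, not real locations in the VC.
    for path in ["/local/fvl/sample.json", "/user/alias/fvl/sample.json"]:
        status = "ok" if is_under_local(path) else "must be under /local"
        print(path, "->", status)
```

Comparing path components rather than using a plain string prefix avoids accepting look-alike directories such as /localdata.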