Ramp-Up Doc for New OLM Members
[[TOC]]
[Step 1] Getting Started
[Task 1] Set up account
- Ensure your alias is ready (if you can see this document, it already is).
- [optional] Install common applications, e.g. Outlook, Teams, and O365.
- [optional] If you want to install Teams on your phone and authentication fails, start by installing and registering the 'Intune Company Portal' ('公司门户') app.
- Prepare a self-introduction in English. Your manager will give you a template.
[Task 2] 1 on 1 with onboarding buddy
Talk with your onboarding buddy to get familiar with the office environment and to get some ramp-up documents.
[Task 3] 1 on 1 with manager
Talk with your manager to learn more about the team.
[Step 2] Apply for some permissions and devices
[Task 1] Request CSE Exchange Service Migration
Your Exchange account should be in the NAM region so that your search data can be collected and ingested into SIGS.
Approval may take about one month, so please sign up as soon as possible.
At first you only need to finish step 1. Once the signup status becomes "complete", you can finish steps 2-6.
[Task 2] Finish Required Training
You can do this later, but you must finish [2022 Security Foundations – Module 1 – Protect your identity] and [E+D privacy fundamentals] before requesting eligibilities in Task 5.
Complete the training assigned to you on the Required Training page.
[Task 3] Join Distribution Groups
Join the following DGs on idweb.
Alias | Display Name | Purpose |
---|---|---|
udpfundamentalolm | MSAI Data Platform Fundamental & OLM | Announcements inside OLM team |
msaidc | MSAI Data Core | Announcements inside MSAI Data Platform team |
msaidataolmcrew | MSAI Data Offline Measurement Crew | Announcements inside MSAI Data Offline Measurement team |
udpfundamentalolm | MSAI Data Platform Fundamental & OLM | Announcements inside MSAI Data Platform Fundamental & Offline Measurement |
subdocs_contributors | Substrate Doc Contributors | Contributors to the Substrate Dev Center documentation |
substrate_pageowners | Substrate Docs Page Owners | DL for the Substrate Docs Page Owners |
If udpfundamentalolm or msaidc is not found, just skip it.
[Task 4] Join Security Groups
Join the following SGs on idweb.
Alias | Display Name | Purpose |
---|---|---|
3sdatadri | 3S Data DRI | For Kusto Database Access and MSAI Data Platform/3s Announcement |
EuclidOIVICAdmin | EuclidOIVICAdmin | For Reviews of EuclidOIVIC Repo and access of EuclidOIVIC EWS environment |
msaidataplatformstca | MSAI Data Platform STCA Team (Auriga) | SG for MSAI Data Platform STCA team |
msaistcadevs | MSAI STCA Engineers | SG for MSAI STCA team |
If 3sdatadri is not found, just skip it.
[Task 5] Request Eligibilities
Before requesting eligibilities, check that you have been granted all the required clearances; see Request Clearances.
Eligibility | Workload | Notes |
---|---|---|
CVC_OffConsPUID | Exchange | For VC Office.Consumer.PUID |
CVC_Off_Eng | Exchange | For VC Office.Engineering |
ExoStrProd_RWX | Exchange | For ADLS/ADLA exchange-storage-prod-c14 |
CVC_Off_Adhoc | Exchange | For VC Office.Adhoc |
Off_Adhoc_RWX | Exchange | For ADLS/ADLA off-adhoc-c14 |
Off_Eng_RWX | Exchange | For ADLS/ADLA off-eng-c14 |
CVC_ExoStrProd | Exchange | For VC Exchange.Storage.Prod |
SSSDSM | Exchange | For AurigaSentry monitors, AurigaSentryPortal and dsmkusto |
msaidataplat | Exchange | For Search ADF msai-datapipeline |
Griffin Partner Access | Exchange | For Cosmos VC access |
Heron | Exchange | For exploring compliant AI/ML experiments, run new experiments without Eyes-On access to data and debug Eyes-Off Heron HDI |
oivic | Exchange | For reading and triggering ADF pipelines in PPE and Prod |
Mars Basic Access | Exchange | Mars Basic Access, this can enable us to list files on clusters |
[Task 6] Change Lockbox Approver Team
Change your team to [msaidataplat - Approver Team] on the Approver Team page.
[Task 7] Get notifications on TorusBot
Follow this guide to subscribe to notifications.
[Task 8] Get Started with Torus
Torus paves the way for its users to safely use corporate services, customer data, and online corporate assets. Read Get Started with Torus to learn more.
Tips:
- Torus Debug Account
  - After your requests in Tasks 3-5 are approved, you will get a Torus debug account automatically. Watch for messages from "TorusBot".
  - Click here to learn more.
- YubiKey
  - The YubiKey is a hardware authentication device that protects access to computers, networks, and online services.
  - You can request a YubiKey on M365 YubiKey Request.
  - To learn more about YubiKey, click here.
When you finish Task 8, the Identity-OSP Overview will look as shown below:
[Task 9] Acquire SAW Machine
A Secure Admin Workstation (SAW) is a securely controlled and provisioned workstation designed for managing valuable production systems. Optionally, it can also be used for daily activities such as email, document editing, and development work. Click here to learn more.
Follow How to acquire a SAW to get your SAW.
[Task 10] Get Access to Edit Doc Center
Follow How to get access to edit doc center to get access to doc center.
[Step 3] Get familiar with the wiki and project
[Task 1] Submit your first wiki PR
- After onboarding, you will receive an email from [ STCA Dev Agility Team stcadev@microsoft.com ] recommending that you submit your first Pull Request within 2 weeks. You can ask your onboarding buddy for help.
- Click here to learn how to create pull requests.
- Click here to learn about the [DM.Auriga.Impression Pull Request/Code Reviewer Policy].
[Task 2] OLM Knowledge and Architecture
Here is a list of common terms and their full names.
Noun | Full Name | FYI |
---|---|---|
OLM | Offline Measurement | MSAI Data Platform Offline Dataset Release Notes - Substrate Dev Center (microsoft.net) |
FVL | Feature Value Logs | FVL - Feature Value Logs - Substrate Dev Center (microsoft.net) |
OIVIC | ODIN Impression View in Compliance | OIVIC - ODIN Impression View in Compliance - Substrate Dev Center (microsoft.net) |
MDP | Microsoft Data Program | FVL MDP - Substrate Dev Center (microsoft.net) |
D&D | Disconnected and De-Linked | FVL D&D - Substrate Dev Center (microsoft.net) |
SLA | Service Level Agreement | It is a contract between a service provider and a customer that defines the level of service expected from the provider. Mainly refers to the delivery time here. |
EWS | Euclid Workspace | |
OCL | Office Client Loader | |
ODL | Office Data Loader | |
ADF | Azure Data Factory | |
AML | Azure Machine Learning | |
LIV | Logical Impression View | LogicalImpression overview - Substrate Dev Center (microsoft.net) |
SSV | SearchSession View | SearchSession overview - Substrate Dev Center (microsoft.net) |
For other terms, you can search in TPrompt Chat (tprompt-chatbot.azurewebsites.net) or directly in Substrate Dev Center (microsoft.net).
- Substrate AI
- Substrate Search Service (3S) overview
- MSAI Data Platform
- MSAI Unified Data Platform.pptx
- MSAI Data Platform Map
[Task 3] 1 on 1 with mentor
Talk with your mentor, who will walk you through the project.
By now you probably have a lot of questions; your mentor can help answer them.
[Step 4] Ramp up by Practice
[Task 1] Data Collection
In the Data Collection part, we use Control Tower experiments to sample SIGS data for different scenarios.
SearchSuggestionsDetails SIGS Write
Each iteration controls one sampling process for a scenario. For example, this iteration is for this scenario:
Route (Query, Suggestions, Recommendations) | AppName | AppScenario | Scope (MSIT, Commercial, Consumer) |
---|---|---|---|
Query | sharepointshared | teamsitesearch | MyData + MSIT + Commercial |
Tips:
- User Type
  - If Scope is "MSIT", the traffic is only for Microsoft enterprise users. Select User Type "Business" and deploy to the SDFV2 + MSIT rings.
  - If Scope is "Commercial", the traffic is for all enterprise users. Select User Type "Business" and deploy to the SDFV2 + MSIT + SIP + WW rings.
  - If Scope is "Consumer", the traffic is for all individual users. Select User Type "Consumer" and deploy to the SDFV2 + MSIT + SIP + WW rings.
- 24h Timed Gate
  - The experiment has a 24h Timed Gate, which means that you can only advance to the next deploy step after the previous stage has run for 24 hours. Follow this rule unless there is an emergency.
  - If there is an emergency that impacts production traffic, you can click the red clock button beside the sample rate to override the timed gate for a quick deploy.
  - If you confirm a bug in the Control Tower experimentation experience, raise it in the Cerberus Support Teams channel.
- Config Setting
  - Note that some signal-writing traffic is currently controlled by this config file: SearchServiceCommon.settings.ini
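The User Type and ring rules in the tips above, together with the 24h Timed Gate, can be written down as a small lookup for planning a rollout. This is a minimal sketch only: `DEPLOY_PLAN`, `plan_for_scope`, and `earliest_advance` are hypothetical names used for illustration and are not part of Control Tower.

```python
from datetime import datetime, timedelta

# Scope -> (User Type, rings), transcribed from the tips above.
# These names are illustrative only; they are not a Control Tower API.
DEPLOY_PLAN = {
    "MSIT": ("Business", ["SDFV2", "MSIT"]),
    "Commercial": ("Business", ["SDFV2", "MSIT", "SIP", "WW"]),
    "Consumer": ("Consumer", ["SDFV2", "MSIT", "SIP", "WW"]),
}

# Minimum time a stage must run before the next deploy step.
TIMED_GATE = timedelta(hours=24)


def plan_for_scope(scope):
    """Return (user_type, rings) for a scope, or raise on an unknown scope."""
    try:
        user_type, rings = DEPLOY_PLAN[scope]
    except KeyError:
        raise ValueError(f"Unknown scope: {scope!r}")
    return user_type, list(rings)


def earliest_advance(previous_stage_start):
    """Earliest time the next deploy step may start under the 24h Timed Gate."""
    return previous_stage_start + TIMED_GATE


if __name__ == "__main__":
    user_type, rings = plan_for_scope("Commercial")
    print(user_type, " + ".join(rings))
    print(earliest_advance(datetime(2023, 4, 17, 9, 0)))
```

For example, `plan_for_scope("MSIT")` returns `("Business", ["SDFV2", "MSIT"])`. The emergency override described above is deliberately not modeled here.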
How to Setup a New Iteration for New Scenario Onboarding
Click "Add New Progression" on the page, fill in the progression info as below, and configure the filters in "Traffic Filters".
How to Validate
- Sampled Traffic Dashboard: captures the sampled SIGS traffic of different scenarios. You can extend the time range to 30d to check whether the traffic volume is normal.
- Kusto Query
[Task 2] Data Cooking
Code Repo
- ResponseAction data is consumed from ImpressionV2 View and LogicalImpressionView on Cosmos. We join, cook, and write the data to a specific path on Cosmos in ResponseActionsToAdls.script; the SI team then uses a data movement job to write it to ADLS. See ResponseActionsToAdls.
- OIVIC data is cooked in OIVIC Curation Pipeline.
- FVL data is cooked in FVL Curation Pipeline.
The production job params for the OIVIC Curation pipeline are defined in OIVICCurationProdTemplate.json.
Build and Test
Set up your local environment.
- IntelliJ IDEA Community Edition
- Scala 2.12
- Java JDK 1.8
- Maven 3.6.3
- Hadoop 3.2.2
- Spark 3.3
- Python 3.7
- OpenJDK 8.0 (optional)
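To confirm that a locally installed tool matches the list above, you can compare the version string it reports (for example from `mvn -v` or `python --version`) with the expected version. The following is a minimal sketch under that assumption; `EXPECTED`, `matches`, and `report` are hypothetical helper names, and you supply the reported version strings yourself.

```python
# Expected versions, transcribed from the list above.
# The table and helper names are illustrative only.
EXPECTED = {
    "scala": "2.12",
    "java": "1.8",
    "maven": "3.6.3",
    "hadoop": "3.2.2",
    "spark": "3.3",
    "python": "3.7",
}


def matches(tool, reported):
    """True if the reported version equals the expected one or extends it
    with further components (e.g. "3.3.1" matches "3.3", "3.30" does not)."""
    expected = EXPECTED[tool].split(".")
    actual = reported.strip().split(".")
    return actual[: len(expected)] == expected


def report(reported_versions):
    """Return the sorted list of tools whose reported version does not match."""
    return sorted(t for t, v in reported_versions.items() if not matches(t, v))
```

For example, `report({"spark": "3.3.1", "python": "3.9.7"})` returns `["python"]`, because only the Python version falls outside the expected 3.7 line.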
Follow HowToRunSparkUnitTestLocally to run Spark unit tests in local mode.
Enable GitHub Copilot
Follow GitHub Copilot to get access to Copilot feature.
If the Copilot X and GitHub Copilot Nightly extensions are not found, try installing GitHub Copilot Chat and GitHub Copilot Labs in VSCode for ChatGPT support.
Use Copilot to assist coding.
Big Data Community Sharing---Best practice for Github Copilot-20230417
Use Copilot to explain code (currently only available in VSCode).
Select the code, then right click to start chat, code explanation, etc.
Or start chat directly from the sidebar.
To learn more about GitHub Copilot, click GitHub Copilot Docs.
Deploy
To be updated...
[Task 3] Data Analytics & Model Building
If you have onboarded a new scenario, you need to validate that the data flows through the whole process as expected and that it is ready to be consumed by the data science team. Follow the validation steps below.
1. Extract Dataset on Heron Portal
a. Log in to Heron Portal with your Microsoft account.
b. Open the "MSAI Data Platform" AI project.
c. Select the "Consumer, Commercial, MSIT" Environment Type.
d. Click the "+ Get Data" button at the upper left of the page, and select "Commercial/Consumer/MSIT SIGSOIVIC v2" depending on your user type and deploy ring.
e. Fill in the new extraction form and get the data. Data extraction takes 30-60 minutes.
2. Create a pipeline run with AML module
a. Log in to Azure Machine Learning Workspace with your Torus account.
b. Click "Designer" on the left side bar -> New pipeline -> Easy-to-use prebuilt modules.
c. Drag the dataset you just extracted on Heron Portal to the right side. We usually select the dataset whose name ends with "_file".
d. Drag to the right side an AML module whose output can help validate your dataset. For example, for OIVIC scenario onboarding, you can use "HdiOivicResponseActionStats", which prints the null rate of the response action column for your dataset.
e. Connect the dataset to the AML module on the right side.
f. AML Module "Output Setting" -> Override default output settings -> set DataStore to "heron_sandbox_storage".
g. Click "Submit".
3. Validate AML pipeline output
a. Click "Experiments" on the left side bar and find your last pipeline run. If the run failed, refer to the "Output+logs" of the AML module to debug.
b. Find your Spark application details in stdoutlogs.txt.
For example, the User Name is "livy", and the application id is application_1635++++++++++++_0001 for this run.
c. Check the driver log on the YARN page. Find your application id on the YARN UI page (reachable by clicking the Azure HDInsight block on the workspace page) and check that the FinalStatus is "SUCCEEDED".
d. Click into an application ID to check the stdout in logs. The output should be the print result of your AML module. For example, here we use HdiOivicResponseActionStats module to check the null rate of ResponseAction column.
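Two of the checks in step 3 can be sketched in a few lines: locating the Spark application id in the text of stdoutlogs.txt, and computing a column null rate of the kind the module prints. This is an illustration under stated assumptions, not the team's tooling: both functions are hypothetical, the regular expression assumes the usual YARN id shape `application_<digits>_<digits>`, and `null_rate` is not the implementation of HdiOivicResponseActionStats.

```python
import re

# YARN application ids normally look like
# application_<cluster timestamp>_<sequence number>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_application_ids(log_text):
    """Return the distinct application ids in the log text, in first-seen order."""
    seen = []
    for app_id in APP_ID_PATTERN.findall(log_text):
        if app_id not in seen:
            seen.append(app_id)
    return seen


def null_rate(rows, column):
    """Fraction of rows whose value for `column` is missing or None.

    `rows` is a list of dicts; an empty list yields 0.0.
    """
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)
```

You would pass the contents of stdoutlogs.txt to `find_application_ids`, then look the id up on the YARN UI page as described in step c. A null rate close to 1.0 for the ResponseAction column would suggest the new scenario's data is not flowing through as expected.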
[Step 5] Get Ready For DRI
Refer to MSAI Data DRI Handbook - Get Ready For DRI
To be updated..