IMPORTANT: Request a review from the incident manager and leads before sending the announcement.
When should the DRI send an announcement?
The DRI must send an announcement to partners for every significant incident that has customer impact. Typical cases by sub-area:
- Impact on MSAI DataSet delivery SLA
  - For a DataSet delivery delay, send the announcement when the DRI receives the CosmosJobSLA alert for the Session pipeline.
  - If data loss is detected after the DataSet is delivered and a backfill is needed, send the announcement as soon as the issue is detected.
- Impact on 3S scorecard SLA / Availability
  - Send the announcement as soon as the DRI receives a 3S scorecard SLA / Availability alert.
- Impact on DataSet and Scorecard metric trustworthiness (i.e. data quality)
  - Send the announcement as soon as the DRI receives a metric anomaly alert and cannot determine whether it is a false alarm, then update it as the investigation progresses.
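The trigger rules above can be sketched as a small decision helper. This is only an illustration: the `Alert` labels and the `should_announce` function are hypothetical names, not part of any existing tooling, and the only rule with a condition is the metric anomaly case.

```python
from enum import Enum


class Alert(Enum):
    # Hypothetical labels for the alert kinds named in the list above.
    COSMOS_JOB_SLA_SESSION = "CosmosJobSLA alert for the Session pipeline"
    DATA_LOSS_NEEDS_BACKFILL = "data loss detected after delivery, backfill needed"
    SCORECARD_SLA_AVAILABILITY = "3S scorecard SLA / Availability alert"
    METRIC_ANOMALY = "metric anomaly alert"


def should_announce(alert: Alert, confirmed_false_alarm: bool = False) -> bool:
    """Return True when the guidance above calls for an announcement."""
    if alert is Alert.METRIC_ANOMALY:
        # Announce unless the DRI has already determined it is a false alarm;
        # an undetermined anomaly is still announced and updated later.
        return not confirmed_false_alarm
    # The delivery-delay, data-loss, and scorecard alerts are announced
    # as soon as they are received or detected.
    return True
```

The helper only captures timing; the review by the incident manager and leads still has to happen before anything is sent.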
How to send the announcement?
The DRI should send the email using the templates listed below. Add the announcement audiences to the To line, and CC the 3S Exp DRI, the incident manager, and the Bot that syncs the announcement to the Teams channel.
Request a review from the incident manager and leads before sending the announcement.
Announcement Template
Generic Template
To : 3sinstannounce@microsoft.com; 3Sannounce@microsoft.com; msaidataplatann@microsoft.com; 3SInstrumentationAnnouncements@service.microsoft.com; <optional aliases>
CC : 3sexpdri@microsoft.com; msaidataplatformim@microsoft.com; Announcements - 3S Data + Metrics Support 7e57b8a6.microsoft.com@amer.teams.ms
Subject : [ANNOUNCEMENT | UPDATE | RESOLUTION] <Issue Summary>
Body:
<Issue Description. Please refer to detail template below for different types of incidents>
Impact Start Time: <Impact start time. Typically the datetime for data.>
Impact Mitigated Time: <N/A if it is not mitigated yet.>
Impacted Scope: <Impacted Scope description. Please refer to detail template below for different types of incidents.>
Next Update ETA: <ETA for resolution or next update. Typically next business day for Sev-2 live-site.>
Manual workaround: <N/A if there's no manual work around.>
Updates: <Corresponding update.>
Incident link: <like https://portal.microsofticm.com/imp/v3/incidents/details/{**ID**}/home>
-- 3S Experimentation and Metrics DRI --
-- visit MSAI Data Docs --
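As an illustration of how the generic template fits together, the sketch below renders the subject and body from a set of fields. The `Announcement` class and its field names are hypothetical, chosen to mirror the template labels; it does not send mail or fill in recipients.

```python
from dataclasses import dataclass

# The three subject prefixes allowed by the generic template.
VALID_STAGES = ("ANNOUNCEMENT", "UPDATE", "RESOLUTION")


@dataclass
class Announcement:
    # Hypothetical container; each field maps to one label in the template.
    stage: str
    issue_summary: str
    issue_description: str
    impact_start_time: str
    impacted_scope: str
    next_update_eta: str
    updates: str
    incident_link: str
    impact_mitigated_time: str = "N/A"  # N/A until the incident is mitigated
    manual_workaround: str = "N/A"      # N/A when no workaround exists

    def subject(self) -> str:
        if self.stage not in VALID_STAGES:
            raise ValueError(f"stage must be one of {VALID_STAGES}")
        return f"[{self.stage}] {self.issue_summary}"

    def body(self) -> str:
        fields = [
            ("Impact Start Time", self.impact_start_time),
            ("Impact Mitigated Time", self.impact_mitigated_time),
            ("Impacted Scope", self.impacted_scope),
            ("Next Update ETA", self.next_update_eta),
            ("Manual workaround", self.manual_workaround),
            ("Updates", self.updates),
            ("Incident link", self.incident_link),
        ]
        lines = [self.issue_description, ""]
        lines += [f"{label}: {value}" for label, value in fields]
        lines += ["", "-- 3S Experimentation and Metrics DRI --", "-- visit MSAI Data Docs --"]
        return "\n".join(lines)
```

Defaulting the mitigated time and workaround to N/A matches the template guidance for an initial announcement; later UPDATE and RESOLUTION mails would set them explicitly.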
Subject and Content Template for MSAI Data Set Delivery SLA delay
Issue Summary
[MSAI DataSet Delay] MSAI DataSet Not available / SLA Delay for XX Hours on/since <Data Date>
Issue Description
We observed MSAI DataSet delay on/since <Data Date> for all dataset pipelines / <specific scopes>.
- The DataSet is still not available.
- The DataSet is available now, but with an SLA delay of XXX hours.
Per our investigation so far, it's caused by <upstream delay> OR <cosmos/blueshift execution duration regression> OR <cosmos/blueshift job failure>. The tracking and mitigation for the root cause can be found in <root cause IcM link>.
The MSAI Online Measurement DRI is working on mitigating the root cause and recovering the MSAI DataSet delivery.
Impacted Scope
- Client/Entrypoint: <all or specific clients/entrypoints>
- Data Pipeline: <all or Fast Pipeline / Slow Pipeline>
- Data View: <all or V1 Impression / Logical Impression / Session view>
- Region Scope: <all or EU Only / Rest of the world (ROW, Non-EU)>
Subject and Content Template for 3S scorecard Delivery SLA / Availability regression
Issue Summary
[3S Scorecard Availability / SLA regression] 3S DataSet SLA Delay to <xx Hours> / Availability drop to <regressed availability> on/since <Data Date>
Issue Description
We observed a 3S Scorecard Availability drop / SLA delay on/since <Data Date>. The issue affects all 3S scorecards / <specific scopes>. The overall Availability / SLA regressed to <xxx>, and the <specific scopes> Availability / SLA regressed to <xxx>.
<Please paste the screenshot of corresponding diagram for regression from dashboard: https://msit.powerbi.com/groups/me/apps/5aa6434b-5b43-4f18-ae4a-5c26249a2227/reports/fea985f0-cacc-45c0-91a3-f7203b8ebedf/ReportSectiona680f904325424b26dd3 >
Details can be found in the IcM incident ticket pasted below.
The MSAI Online Measurement DRI is working on the mitigation.
Impacted Scope
- Client/Entrypoint: <all or specific clients/entrypoints>
- MetricSet: <all or specific metricsets>
- Experiment Group: <all or specific Experiment groups (e.g. 3S server experiments / Outlook Mobile Client side experiments)>
Subject and Content Template for Scorecard metrics baseline untrustworthiness regression
Issue Summary
[Metric Baseline Untrustworthy | Metric Anomaly Indicating POTENTIAL data untrustworthiness][Entrypoints/Clients] <Metrics> drop/increase/move since <Data Date>
Issue Description
We observed <Metrics> in <Entrypoints/Clients> drop/increase/move significantly/slightly since <Data Date>. Details can be found in the IcM incident ticket pasted below.
If you can identify the root cause type:
Per our investigation so far, the metric anomaly indicates untrustworthiness for <Entrypoints/Clients> baseline.
- It's caused by a cooking pipeline issue. We are working on the fix; ETA is <xxx>.
- It's caused by a client instrumentation issue. Evidence can be found below (paste the screenshot and link for the client instrumentation quality regression). @<client_owner> to investigate the root cause on the client side and provide an ETA for the fix.
If you cannot tell whether it's a real data trustworthiness issue, or cannot identify the root cause quickly:
Per our investigation so far, we cannot yet confirm whether it indicates baseline untrustworthiness.
@<client_owner>, can you please help us investigate whether there is any new feature release or known search experience regression on the client side?
Impacted Scope
If you can confirm it's a real data issue:
Baseline untrustworthiness for:
- Client/Entrypoint: <all or specific clients/entrypoints>
- Ring: <all or specific Griffin Rings>
- Vertical: <all or specific verticals>
- Metrics: <all or specific metrics>
The baseline untrustworthiness will impact the new feature experiment flight review as well.
If you cannot tell whether it's a real data issue:
<Change "Impacted Scope" into "Metric Anomaly Scope">
Metric Anomaly detected for:
- Client/Entrypoint: <all or specific clients/entrypoints>
- Ring: <all or specific Griffin Rings>
- Vertical: <all or specific verticals>
- Metrics: <all or specific metrics>
Reference info
Find Optional aliases you can add for explicit announcing for each client
Kusto Queries to get the mapping among MetricSet, Vertical and supported entrypoints
Mapping between Scenario and Entrypoint
Client | Entrypoint |
---|---|
MSB | bingcom.msb.ux.workserp |
MSB | bingcom.msb.ux.webserp |
MSO | sharepointshared.onedriveweb |
MSO | officecom.officehome |
MSO | sharepointshared.commsitesearch |
MSO | sharepointshared.hubsitesearch |
MSO | sharepointshared.msw |
MSO | sharepointshared.spclassic_basicsearchcenter |
MSO | sharepointshared.spclassic_enterprisesearchcenter |
MSO | sharepointshared.spclassic_sitesearch |
MSO | sharepointshared.sphomeweb |
MSO | sharepointshared.teamsitesearch |
MSO | sharepointshared.spclassic_ui |
MSO | sharepointshared.sphomeweb.answers |
MSO | officecom.officehome.answers |
MSO | sharepointshared.commsitesearch.answers |
MSO | sharepointshared.hubsitesearch.answers |
MSO | sharepointshared.msw.answers |
MSO | sharepointshared.teamsitesearch.answers |
MSO | sharepointshared.sphomeweb.vertical |
MSO | sharepointshared.commsitesearch.vertical |
MSO | sharepointshared.msw.vertical |
MSO | sharepointshared.teamsitesearch.vertical |
Outlook Mobile | exchangeshared.outlookmobile.android.email |
Outlook Mobile | exchangeshared.outlookmobile.ios.email |
OWA React | exchangeshared.owa.react |
Outlook Desktop | officeshared.outlookdesktop |
Outlook Desktop | officeshared.outlookdesktop.compose |
Outlook Desktop | officeshared.outlookdesktop.people |
Outlook Desktop | officeshared.outlookdesktop.atmentions |
Outlook Mobile | exchangeshared.outlookmobile.android.people.compose |
Outlook Mobile | exchangeshared.outlookmobile.ios.people.compose |
Outlook Mac | exchangeshared.outlookmac.email |
OWA Mini | exchangeshared.owa.react.mini |
OWA | exchangeshared.outlookuniversal.email |
Teams Desktop | teams.powerbar |
Teams Mobile | teams.mobileios |
Teams Mobile | teams.mobileandroid |
Union | officeshared.officemobilefiles |
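
When filling in the Client/Entrypoint line of an Impacted Scope, it can help to group the impacted entrypoints by client. The sketch below does that with a plain dictionary; only a few rows from the table above are copied in, and `ENTRYPOINT_TO_CLIENT` and `clients_for` are hypothetical names used for illustration.

```python
from collections import defaultdict

# A few rows copied from the table above; extend with the remaining rows as needed.
ENTRYPOINT_TO_CLIENT = {
    "bingcom.msb.ux.workserp": "MSB",
    "officecom.officehome": "MSO",
    "exchangeshared.owa.react": "OWA React",
    "exchangeshared.owa.react.mini": "OWA Mini",
    "teams.powerbar": "Teams Desktop",
    "teams.mobileios": "Teams Mobile",
    "teams.mobileandroid": "Teams Mobile",
}


def clients_for(entrypoints):
    """Group impacted entrypoints by client; an unknown entrypoint raises KeyError."""
    grouped = defaultdict(list)
    for entrypoint in entrypoints:
        grouped[ENTRYPOINT_TO_CLIENT[entrypoint]].append(entrypoint)
    return dict(grouped)
```

Raising on an unknown entrypoint is a deliberate choice here, so that a typo or a missing table row is noticed rather than silently dropped from the announcement scope.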