Unified modules and their roles
The Checkor module checks workflows that are in completed status in ReqMgr2.
-
completed to closed-out transition: Calculate the expected and observed statistics for the outputs of the workflow in terms of lumisections. If the observed statistics are greater than or equal to the threshold (fractionpass), the workflow is moved to closed-out status in ReqMgr2. This means that the workflow produced satisfactory results for all of its outputs. If the observed statistics are not satisfactory, the workflow is labeled with an assistance tag, meaning that it requires manual intervention to address its issues. The fractionpass is 100% by default, but it can be overridden at the campaign level. For instance, most MC workflows have a fractionpass of 95%. The module also contains some extra logic that may lower the fractionpass if certain criteria are met.
-
Assistance labeling
As mentioned above, if the workflow did not reach satisfactory results, it stays in completed status and is labeled with several assistance tags. These tags show what kind of issue the workflow has and which level of resubmission (ACDC) it is at.
-
Output lumisection size check:
Lumisections that are too small or too big are both problematic, so this module checks for both. The lower limit is set in the Unified configuration file. If the events/lumi of an output is lower than this value, the workflow is tagged with the assistance-smalllumi label and a human checks the workflow.
The upper limit (lumi_size) is set at the campaign level. If it is -1, there is no limit. If the events/lumi is greater than the upper limit, the workflow is tagged with assistance-biglumi and a human checks the workflow.
-
Filemismatch check:
For each output dataset, the module checks whether the number of files in DBS matches the number in Rucio. If they do not match, the workflow is tagged with the assistance-filemismatch label and a human checks the workflow.
Note that there is a delay between file injection to DBS and to Rucio in WMAgent, which causes a temporary file mismatch. In this scenario, the workflow is tagged with the assistance-agentfilemismatch label, and if the mismatch is not resolved within 2 days, the workflow is moved to assistance-filemismatch.
-
[CURRENTLY DISABLED] Duplicate check:
For each output, the module queries DBS and checks for duplicate events. If duplicate events are found, it invalidates the file(s) causing the duplication.
Since this is a very expensive and heavy operation, this feature is currently disabled.
-
Invalid file(s) check:
If the number of invalid files in the output is above a threshold, the workflow is tagged with assistance-invalidfiles and a human checks the workflow.
-
Create/Update JIRA ticket
Based on the checks done within the module, a JIRA ticket is created/updated automatically.
-
Create a lumisection summary webpage:
A webpage is created that shows the lumisections, e.g. https://cms-unified.web.cern.ch/cms-unified/datalumi/lumi.ReReco-Run2017C-JetHT-UL2017_MiniAODv1_NanoAODv2_pilot4-00001.html
-
Create notifications for the requestors:
Create a notification for the requestors about the issues that the workflow is having.
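
The checks described above can be summarized in a short sketch. This is not Unified's actual code: the OutputStats class, the assistance_tags function, and all parameter names other than fractionpass and lumi_size are hypothetical, and the plain "assistance" tag is a placeholder for whatever tag the module assigns when the statistics are below the threshold. It also assumes that every DBS/Rucio mismatch starts as an agent-side mismatch and is escalated after the 2-day grace period.

```python
from dataclasses import dataclass
from datetime import timedelta

# From the text: an agent-side mismatch is escalated after 2 days.
AGENT_MISMATCH_GRACE = timedelta(days=2)


@dataclass
class OutputStats:
    """Numbers for one output dataset (all field names are hypothetical)."""
    expected_lumis: int
    observed_lumis: int
    events: int
    dbs_files: int
    rucio_files: int


def assistance_tags(out, fractionpass=1.0, min_events_per_lumi=0.0,
                    lumi_size=-1, mismatch_age=timedelta(0)):
    """Return the assistance tags for one output; an empty list means it passed."""
    tags = []

    # Completion check: observed vs expected lumisections against fractionpass.
    completion = out.observed_lumis / out.expected_lumis if out.expected_lumis else 0.0
    if completion < fractionpass:
        tags.append("assistance")  # generic placeholder tag name

    # Lumisection size check, skipped when no lumisections were observed.
    if out.observed_lumis:
        events_per_lumi = out.events / out.observed_lumis
        if events_per_lumi < min_events_per_lumi:
            tags.append("assistance-smalllumi")
        # lumi_size == -1 means there is no upper limit.
        if lumi_size != -1 and events_per_lumi > lumi_size:
            tags.append("assistance-biglumi")

    # File count check: assumed to start as an agent mismatch and escalate later.
    if out.dbs_files != out.rucio_files:
        if mismatch_age < AGENT_MISMATCH_GRACE:
            tags.append("assistance-agentfilemismatch")
        else:
            tags.append("assistance-filemismatch")

    return tags


if __name__ == "__main__":
    out = OutputStats(expected_lumis=100, observed_lumis=96, events=9600,
                      dbs_files=10, rucio_files=10)
    print(assistance_tags(out, fractionpass=0.95))  # passes: []
    print(assistance_tags(out))                     # default 100%: ['assistance']
```

In this sketch a workflow would move to closed-out only if every output returns an empty list. The invalid-file check and the extra logic that lowers fractionpass are left out because the text does not give their thresholds or criteria.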