Dirac CWL Executor + testing with LHCb Workflows (simulation and analysis productions) #57
Conversation
…sider not using explicit name in CWL workflow which is unreadable
…with artifacts that go to LogSE
```yaml
- 0
events: -1
priority: 2
multicore: true
```
FYI, with the latest changes made to dirac-production-request-launch, multicore is going to be false for analysis productions.
It's going to be merged within the week if everything works as expected 🤞
```python
for step_index, step in enumerate(steps):
    step_name = _sanitizeStepName(step.get("name", f"step_{step_index}"))
    step_names.append(step_name)
    cwl_step = _buildCWLStep(production, step, step_index, workflow_inputs, step_names)
```
I guess you need the changes I am making to lb-mc to integrate the submission-info section here to build subworkflows (that would become transformations).
aldbr left a comment
Thanks! That looks very promising 🙂
Oops: .yam -> .yaml
```yaml
- class: dirac:inputDataset
  event_type: '90000000'
  conditions_dict:
    inTCKs: ALL
    inProPass: Real Data/Reco17/Stripping29r2p2
    configName: LHCb
    inFileType: CHARM.MDST
    configVersion: Collision17
    inProductionID: ALL
    inDataQualityFlag: OK
  launch_parameters:
    end_run:
    start_run:
    run_numbers:
  conditions_description: Beam6500GeV-VeloClosed-MagUp
```
A bit of context about the dirac:execution-hooks hint because I feel like the concepts are very similar here, but I might be wrong.
The current dirac:execution-hooks hint is used in two different places:
- transformation system: use specific parameters to query a FileCatalog/Bookkeeping to get LFNs and attach them to a job to submit
- job: use the hooks to pre/post process the jobs and the specific parameters to register the outputs in a specific part of the FileCatalog/Bookkeeping
Now you are right, the current dirac:execution-hooks hint is not global to a CWL production, but actually defined in every transformation.
I think the reason is that I initially wrote CWL workflows where the input parameters were used to build the query (based on an example provided by CTAO).
Example with mandelbrot:
- transformation1: no parameter is defined because the input parameters of the workflow are used to define the query: https://github.com/aldbr/dirac-cwl-proto/blob/52cb47ad1406e108732aa4b3f03332f32ce95f6b/test/workflows/mandelbrot/type_dependencies/transformation/metadata-mandelbrot_imageprod.yaml
- transformation2: no information about the query parameters because the inputs are different, so we have to provide the query parameters (corresponding to the inputs of transformation1): https://github.com/aldbr/dirac-cwl-proto/blob/52cb47ad1406e108732aa4b3f03332f32ce95f6b/test/workflows/mandelbrot/type_dependencies/transformation/metadata-mandelbrot_imagemerge.yaml
https://github.com/aldbr/dirac-cwl-proto/blob/52cb47ad1406e108732aa4b3f03332f32ce95f6b/test/workflows/mandelbrot/image-prod.cwl
Then CTAO wrote an example where they were using parameters not defined as inputs of the workflow: https://github.com/aldbr/dirac-cwl-proto/blob/dc2cee46c1b804a6ec5add6da5dd7123c990630e/src/dirac_cwl_proto/execution_hooks/plugins/core.py#L32-L35
Is that something you actually need @arrabito?
In this PR, you also define query parameters not defined as inputs of the workflow @ryuwd .
If that's the case for everyone, then maybe we should have a discussion about:
- are the query parameters always different from the input parameters of the workflows? Can we use both as query parameters?
- are the query parameters the same for the whole workflow? (e.g. production with 3 transformations that would have the same query parameters)
Then we should think whether and how we should separate:
- execution hooks (pre/post processing of the jobs),
- query parameters (used by the transformation system to get LFNs, as well as the jobs to store LFNs at the end),
- other input/output parameters (e.g. output data and sandbox).
In the meantime, just for your context, you could have created an execution hooks plugin such as the following (but you would have to define it for every transformation, which, I agree, is not convenient in this context):

```python
class LHCbBasedPlugin(ExecutionHooksBasePlugin):
    # LFN parameters
    in_tcks: str = Field(...)
    in_pro_pass: Optional[str] = Field(...)
    config_name: Optional[str] = Field(...)
    in_file_type: Optional[str] = Field(...)
    ...
```
```yaml
# Generated by ProductionRequestToCWL converter
inputs:
  - id: production-id
```
I am just curious, why do you need the production-id here?
Same question for the prod-job-id.
I would have been tempted, if possible, to remove anything DIRAC-related from the CWL workflow itself, and move DIRAC-specific attributes to a hint (maybe this is not possible).
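If that were possible, it could look something like this (purely illustrative; the `dirac:production` hint name and its fields are invented here, not an existing schema):

```yaml
hints:
  - class: dirac:production
    production-id: 12345
    prod-job-id: 6789
```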
```yaml
  doc: Production Job ID
  default: 6789
  type: int
- id: input-data
```
Out of curiosity, what do you expect here? LFNs you got from your bk query? Is this going to work if you have a very large number of files?
I had issues with some workflows with my converter. You can see how I dealt with it if you check the archive I gave you a few weeks/months ago, but basically, I did:
```yaml
input-data:
  class: File
  contents: '["/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023715_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023714_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023753_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023754_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023685_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023697_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023710_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023711_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023693_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023686_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023708_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023682_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023674_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023688_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023673_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023680_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023675_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023706_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023681_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023691_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023699_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023712_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023695_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023713_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023704_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023696_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023676_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023732_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023719_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023747_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023716_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023724_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023728_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023723_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023740_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023749_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023743_1.charm_d02hh_dvntuple.root",
    "/lhcb/LHCb/Collision25/CHARM_D02HH_DVNTUPLE.ROOT/00296922/0002/00296929_00023748_1.charm_d02hh_dvntuple.root"]'
```

```yaml
configuration:
  cpu: '1000000'
  priority: 2
  multicore: true
```
- `priority` would likely go in the scheduling hint (though I think it's going to disappear in diracx with the work on the new matcher)
- `cpu` would likely be defined in the CWL itself as a requirement, no?
- `multicore` would also be part of the CWL requirements
- I think `output_se`, `remove_inputs_flags`, `output_file_mask`, `output_mode`, `input_data_policy` would be defined outside of `configuration` (because `configuration` contains your query parameters for the FC/Bookkeeping)
- do you need to use `ancestore_depth` and `events` as query parameters? Do you know why they're here?
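For the cores part, CWL's standard `ResourceRequirement` could express this directly; a sketch (the values here are placeholders, and note that `ResourceRequirement` covers cores and RAM but has no standard field for a CPU-time budget like `cpu: '1000000'`):

```yaml
requirements:
  ResourceRequirement:
    coresMin: 1
    coresMax: 8   # multicore job; coresMin: 1 with coresMax: 1 would pin it to a single core
    ramMin: 2048  # MiB
```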
| ) | ||
|
|
||
|
|
||
```python
def analyse_xml_summary(xml_path: Path) -> bool:
```
Is it connected to #68?
```python
is_remote = not pfn.startswith("file://")
# Remove file:// prefix for local files
if pfn.startswith("file://"):
    return pfn[7:], is_remote
```
That looks a bit fragile 😅
What about using urllib.parse (urlparse and potentially unquote): https://docs.python.org/3/library/urllib.parse.html?
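A sketch of what that could look like (the function name `split_pfn` is invented here for illustration, not taken from the PR):

```python
from urllib.parse import unquote, urlparse


def split_pfn(pfn: str) -> tuple[str, bool]:
    """Return (path, is_remote), decoding file:// URLs via urllib.parse."""
    parsed = urlparse(pfn)
    is_remote = parsed.scheme != "file"
    if not is_remote:
        # urlparse strips the scheme and netloc; unquote handles %-escapes
        return unquote(parsed.path), is_remote
    return pfn, is_remote
```

This also copes with percent-encoded characters (e.g. `%20` in a local path), which the raw `pfn[7:]` slice would leave encoded.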
You call the same method `startswith` twice; you could just reuse `is_remote`, right?
```python
logger = logging.getLogger("dirac-cwl-run")


class DiracPathMapper(PathMapper):
```
I fail to understand why we need this DiracPathMapper, can you explain please?
Because, I would have naively thought that adding the replica catalog to the active CLT directory would be enough: the application needing the replica catalog would read it and use whatever PFNs it needs.
```python
# Apply the monkey-patch
CommandLineTool.make_path_mapper = staticmethod(_custom_make_path_mapper)
```
I guess you would need to override the CommandLineTool to avoid the monkey patch.
Overall, I think it would be a good idea to ask for guidance in the CWL forum (just to make sure we are going in the right direction here). What do you think?
```python
):
    """Update catalog from job outputs.

    After a job completes, the lbprodrun wrapper may have added new LFNs
```
```diff
- After a job completes, the lbprodrun wrapper may have added new LFNs
+ After a job completes, the application may have added new LFNs
```
for #52
Adds a new `dirac-cwl-run` utility for running CWL workflows with input data resolution according to the concepts introduced in #69, with Dirac LFNs, the "replica catalog", and a proof-of-concept for using input data queries (in LHCb this is the "Bookkeeping") to pre-fill empty `input-data` workflow parameters. Also adds converters from LHCb production YAMLs to CWL workflow files.
Tested with LHCb simulation and analysis productions workflows.