Modify quickrun to allow resuming#1848

Open
IAlibay wants to merge 32 commits into main from quickrun_resume

Conversation

@IAlibay
Member

@IAlibay IAlibay commented Feb 16, 2026

Minimal changes to quickrun in order to allow resuming by serializing ProtocolDAGs.
Related to & requires OpenFreeEnergy/gufe#738

Checklist

  • All new code is appropriately documented (user-facing code must have complete docstrings).
  • Added a news entry, or the changes are not user-facing.
  • Ran pre-commit: you can run pre-commit locally or comment on this PR with pre-commit.ci autofix.

Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).

Developers certificate of origin

@IAlibay IAlibay closed this Feb 16, 2026
@IAlibay IAlibay reopened this Feb 16, 2026
@IAlibay
Member Author

IAlibay commented Feb 16, 2026

pre-commit.ci autofix

@IAlibay
Member Author

IAlibay commented Feb 16, 2026

I don't know what's up with the tyk2 CLI test, but locally the restart is working!

@atravitz atravitz added this to the 1.10.0 milestone Feb 16, 2026
@IAlibay IAlibay assigned atravitz and unassigned atravitz Feb 16, 2026
@atravitz atravitz changed the title [WIP] Modify quickrun to allow resuming Modify quickrun to allow resuming Mar 11, 2026
@atravitz atravitz marked this pull request as draft March 11, 2026 22:30
@codecov

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.90%. Comparing base (cd17b54) to head (bd8efe6).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1848      +/-   ##
==========================================
- Coverage   94.96%   91.90%   -3.06%     
==========================================
  Files         206      206              
  Lines       18181    18256      +75     
==========================================
- Hits        17265    16778     -487     
- Misses        916     1478     +562     
Flag Coverage Δ
fast-tests 91.90% <100.00%> (?)
slow-tests ?

@atravitz
Contributor

question: do we want this to be the default behavior, or require openfe quickrun --resume?

@atravitz atravitz marked this pull request as ready for review March 13, 2026 00:13
@atravitz atravitz requested a review from mikemhenry March 19, 2026 05:20
Member Author

@IAlibay IAlibay left a comment

Just the one question but otherwise lgtm. I'll retest on a real FEP job today too.

@atravitz atravitz self-requested a review March 20, 2026 21:21
@atravitz atravitz enabled auto-merge (squash) March 20, 2026 21:22
@atravitz atravitz disabled auto-merge March 20, 2026 21:27
dag = trans.create()
# Attempt to either deserialize or freshly create DAG
cache_basedir = work_dir / "quickrun_cache"
trans_DAG_json = cache_basedir / f"{trans.key}-ProtocolDAG.json"
Contributor

@atravitz atravitz Mar 20, 2026

right now the directory structure (with https://github.com/OpenFreeEnergy/gufe/pull/764/files) looks like:

quickrun_cache/
├── ProtocolDAG-8090c89c950e976b33829a15e662ed89-results_cache/
└── Transformation-dbea03c534737749bb413e01a382f3af-ProtocolDAG.json

is there a better way to indicate that Transformation-dbea03c534737749bb413e01a382f3af-ProtocolDAG.json is the ProtocolDAG corresponding to Transformation-dbea03c534737749bb413e01a382f3af?

Member Author

You could move the dag creation up a little bit, and just use the dag key as part of the name?

I.e. if you define trans_DAG_json after you define dag, it could be f"ProtocolDAG-{dag.key}.json" instead.

Contributor

@atravitz atravitz Mar 23, 2026

In an offline conversation, @IAlibay and I agreed that moving DAG creation up will not work, because each call to .create() produces a new, unique ProtocolDAG.
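
A minimal sketch of why the cache file has to be named after the transformation rather than the DAG. `FakeTransformation`, `FakeDAG`, and `load_or_create_dag` are hypothetical stand-ins for illustration, not gufe or openfe API; the only behavior taken from this thread is that every `.create()` call yields a DAG with a new key.

```python
import json
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeDAG:
    """Hypothetical stand-in for a ProtocolDAG."""
    key: str


@dataclass
class FakeTransformation:
    """Hypothetical stand-in for a Transformation with a stable key."""
    key: str

    def create(self) -> FakeDAG:
        # Assumption from the thread: every call yields a new, unique DAG.
        return FakeDAG(key=f"ProtocolDAG-{uuid.uuid4().hex}")


def load_or_create_dag(trans: FakeTransformation, cache_basedir: Path) -> FakeDAG:
    """Reuse the serialized DAG if one exists, otherwise create and store it."""
    # The file name uses the transformation key because it is stable across runs.
    dag_json = cache_basedir / f"{trans.key}-ProtocolDAG.json"
    if dag_json.is_file():
        return FakeDAG(**json.loads(dag_json.read_text()))
    dag = trans.create()
    cache_basedir.mkdir(parents=True, exist_ok=True)
    dag_json.write_text(json.dumps({"key": dag.key}))
    return dag


trans = FakeTransformation(key="Transformation-dbea03c534737749bb413e01a382f3af")
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "quickrun_cache"
    first = load_or_create_dag(trans, cache)    # creates and serializes
    resumed = load_or_create_dag(trans, cache)  # deserializes the same DAG
    fresh = trans.create()                      # a different DAG every time
```

Here `resumed` carries the same key as `first`, while `fresh` does not, so a file named after `fresh.key` could never be found on a later run.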

We also realized that, as of right now, the cache is keyed only on the transformation key and the -d argument. This means that if a user runs the following two commands in succession:

openfe quickrun transformation.json -o result1.json -d results/
openfe quickrun transformation.json -o result2.json -d results/

openfe will treat the second command as a re-execution of the first and raise an error telling the user to either pass --resume or delete the file and restart.
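
The clash can be reproduced with a small sketch. `cache_path`, `check_cache`, and `fake_quickrun` are hypothetical helpers, not the openfe implementation, and the sketch assumes the cache file from the first command is still on disk when the second one starts. What happens when --resume is passed without an existing cache is not covered in this thread; the sketch simply starts fresh in that case.

```python
import tempfile
from pathlib import Path


def cache_path(work_dir: Path, transformation_key: str) -> Path:
    # The -o value plays no part in this path, which is the source of the clash.
    return work_dir / "quickrun_cache" / f"{transformation_key}-ProtocolDAG.json"


def check_cache(dag_json: Path, resume: bool) -> str:
    """Return "resume" or "fresh"; raise if a cache exists and resume is off."""
    if dag_json.is_file():
        if not resume:
            raise RuntimeError(
                f"Found existing cache {dag_json}; pass --resume to continue "
                "or delete the file and restart."
            )
        return "resume"
    return "fresh"  # assumption: no cache means a fresh start, even with resume


def fake_quickrun(transformation_key: str, output: str, work_dir: Path,
                  resume: bool = False) -> str:
    # `output` (-o) is deliberately unused when locating the cache.
    dag_json = cache_path(work_dir, transformation_key)
    mode = check_cache(dag_json, resume)
    if mode == "fresh":
        dag_json.parent.mkdir(parents=True, exist_ok=True)
        dag_json.write_text("{}")
    return mode


with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    first = fake_quickrun("Transformation-abc", "result1.json", results)
    try:
        fake_quickrun("Transformation-abc", "result2.json", results)
        second = "no error"
    except RuntimeError:
        second = "error"
    third = fake_quickrun("Transformation-abc", "result2.json", results, resume=True)
```

The second call fails even though its -o differs, because only the transformation key and -d feed into the cache location.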

There are two possible solutions to this:

  1. Require users to pass a separate -d value for each repeat (essentially what is implemented now).
  2. Build the hash from all three inputs: (1) the transformation key, (2) -o, and (3) -d. Since -o can be an arbitrary filepath, we may want to hash -o and store the cache as:
    [-d arg]/quickrun_cache/dagcache-[hash(transformation.key, -o arg)].
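
A minimal sketch of what option 2 could look like. `dag_cache_dir` is a hypothetical helper, and the choice of SHA-256, the null-byte separator, and the 32-character truncation are assumptions made for illustration; the actual implementation may differ.

```python
import hashlib
from pathlib import Path


def dag_cache_dir(work_dir: Path, transformation_key: str, output: Path) -> Path:
    """Build [-d arg]/quickrun_cache/dagcache-[hash(transformation key, -o arg)]."""
    digest = hashlib.sha256()
    digest.update(transformation_key.encode())
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(str(output).encode())
    # Hashing -o keeps arbitrary file paths out of the directory name.
    return work_dir / "quickrun_cache" / f"dagcache-{digest.hexdigest()[:32]}"


key = "Transformation-dbea03c534737749bb413e01a382f3af"
run1 = dag_cache_dir(Path("results"), key, Path("result1.json"))
run2 = dag_cache_dir(Path("results"), key, Path("result2.json"))
run1_again = dag_cache_dir(Path("results"), key, Path("result1.json"))
```

With this layout the two commands above get separate cache directories, while rerunning the first command maps back to the same one. The sketch hashes the -o string exactly as given, so `result1.json` and `./result1.json` would hash differently; whether to normalize the path first is a separate design choice.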

Contributor

option 2 has been implemented in #1890

@IAlibay
Member Author

IAlibay commented Mar 23, 2026

Testing locally with multiple interrupts / resumes is working as expected.

@github-actions

No API break detected ✅

4 participants