-
Notifications
You must be signed in to change notification settings - Fork 41
Modify quickrun to allow resuming #1848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
11710a7
322bc23
a61598e
6579c04
0f43f6f
5e1a21c
182562f
3180b8c
1c0fdf7
156320b
360882a
04d47bb
1055c1b
2b1000e
c8a03d8
d735c7f
6518499
5cd437e
61a97b6
48ab9c8
31a6589
eaebaac
fe746c8
fd9253b
4ed9b5f
494eb06
d9a2d5e
58b5642
e7de6b8
ec44cce
e1f742b
bd8efe6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| **Added:** | ||
|
|
||
| * Added ``--resume`` flag to ``openfe quickrun``. | ||
| Quickrun now temporarily caches ``protocolDAG`` information and when used with the ``--resume`` flag, quickrun will attempt resume execution of an incomplete transformation. | ||
|
|
||
| **Changed:** | ||
|
|
||
| * <news item> | ||
|
|
||
| **Deprecated:** | ||
|
|
||
| * <news item> | ||
|
|
||
| **Removed:** | ||
|
|
||
| * <news item> | ||
|
|
||
| **Fixed:** | ||
|
|
||
| * <news item> | ||
|
|
||
| **Security:** | ||
|
|
||
| * <news item> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,6 +3,7 @@ | |
|
|
||
| import json | ||
| import pathlib | ||
| import warnings | ||
|
|
||
| import click | ||
|
|
||
|
|
@@ -30,8 +31,14 @@ def _format_exception(exception) -> str: | |
| type=click.Path(dir_okay=False, file_okay=False, path_type=pathlib.Path), | ||
| help="Filepath at which to create and write the JSON-formatted results.", | ||
| ) # fmt: skip | ||
| @click.option( | ||
| "--resume", | ||
| is_flag=True, | ||
| default=False, | ||
| help=("Attempt to resume this transformation's execution using the cache."), | ||
| ) | ||
| @print_duration | ||
| def quickrun(transformation, work_dir, output): | ||
| def quickrun(transformation, work_dir, output, resume): | ||
| """Run the transformation (edge) in the given JSON file. | ||
|
|
||
| Simulation JSON files can be created with the | ||
|
|
@@ -51,7 +58,9 @@ def quickrun(transformation, work_dir, output): | |
| import logging | ||
| import os | ||
| import sys | ||
| from json import JSONDecodeError | ||
|
|
||
| from gufe import ProtocolDAG | ||
| from gufe.protocols.protocoldag import execute_DAG | ||
| from gufe.tokenization import JSON_HANDLER | ||
| from gufe.transformations.transformation import Transformation | ||
|
|
@@ -94,13 +103,40 @@ def quickrun(transformation, work_dir, output): | |
| else: | ||
| output.parent.mkdir(exist_ok=True, parents=True) | ||
|
|
||
| write("Planning simulations for this edge...") | ||
| dag = trans.create() | ||
| # Attempt to either deserialize or freshly create DAG | ||
| cache_basedir = work_dir / "quickrun_cache" | ||
| trans_DAG_json = cache_basedir / f"{trans.key}-ProtocolDAG.json" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right now the directory structure (with https://github.com/OpenFreeEnergy/gufe/pull/764/files) looks like: is there a better way to indicate that
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could move the dag creation up a little bit, and just use the dag key as part of the name? I.e. of you define
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in offline conversation, @IAlibay and I agree that moving dag creation up will not work, because each time .create() is called, a new unique ProtocolDAG is created. We also realize that, as of right now, the cache is only unique based on the transformation key and the
openfe will treat these as a re-execution and raise an error telling the user that they should pass in There are two possible solutions to this:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. option 2 has been implemented in #1890 |
||
|
|
||
| if trans_DAG_json.is_file(): | ||
| if resume: | ||
| write(f"Attempting to resume execution using existing edges from '{trans_DAG_json}'") | ||
| try: | ||
| dag = ProtocolDAG.from_json(trans_DAG_json) | ||
| except JSONDecodeError: | ||
| errmsg = f"Recovery failed, please remove {trans_DAG_json} and any results from your working directory before continuing to create a new protocol." | ||
| raise click.ClickException(errmsg) | ||
| else: | ||
| errmsg = f"Transformation has been started but is incomplete. Please remove {trans_DAG_json} and rerun, or resume execution using the ``--resume`` flag." | ||
| raise click.ClickException(errmsg) | ||
|
|
||
| else: | ||
| if resume: | ||
| click.echo( | ||
| f"openfe quickrun was run with --resume, but no checkpoint found at {trans_DAG_json}. Starting new execution." | ||
| ) | ||
|
|
||
| # Create the DAG instead and then serialize for later resuming | ||
| write("Planning simulations for this edge...") | ||
| dag = trans.create() | ||
| cache_basedir.mkdir(exist_ok=True) | ||
| dag.to_json(trans_DAG_json) | ||
|
|
||
| write("Starting the simulations for this edge...") | ||
| dagresult = execute_DAG( | ||
| dag, | ||
| shared_basedir=work_dir, | ||
| scratch_basedir=work_dir, | ||
| cache_basedir=cache_basedir, | ||
| keep_shared=True, | ||
| raise_error=False, | ||
| n_retries=2, | ||
|
|
@@ -126,6 +162,9 @@ def quickrun(transformation, work_dir, output): | |
| with open(output, mode="w") as outf: | ||
| json.dump(out_dict, outf, cls=JSON_HANDLER.encoder) | ||
|
|
||
| # remove the checkpoint since the job has completed | ||
| os.remove(trans_DAG_json) | ||
|
|
||
| write(f"Here is the result:\n\tdG = {estimate} ± {uncertainty}\n") | ||
| write("") | ||
|
|
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.