32 commits
11710a7 Modify quickrun to allow resuming (IAlibay, Feb 16, 2026)
322bc23 fix the gather tests (IAlibay, Feb 16, 2026)
a61598e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Feb 16, 2026)
6579c04 Merge branch 'main' of github.com:OpenFreeEnergy/openfe into quickrun… (atravitz, Mar 11, 2026)
0f43f6f add check for protocol_dag.json (atravitz, Mar 11, 2026)
5e1a21c add basic test (atravitz, Mar 11, 2026)
182562f clearer language, hopefully (atravitz, Mar 11, 2026)
3180b8c store protocol dag using transformation key (atravitz, Mar 11, 2026)
1c0fdf7 another tmpdir -> tmp_path fix (atravitz, Mar 12, 2026)
156320b add error handling check (atravitz, Mar 12, 2026)
360882a fix naming in test (atravitz, Mar 13, 2026)
04d47bb add news item (atravitz, Mar 13, 2026)
1055c1b use assert_click_success (atravitz, Mar 13, 2026)
2b1000e Merge branch 'main' into quickrun_resume (atravitz, Mar 17, 2026)
c8a03d8 add test for interrupted job (atravitz, Mar 18, 2026)
d735c7f remove checkpoint when a job has completed successfully (atravitz, Mar 18, 2026)
6518499 add handling for checkpoint error handling without --resume (atravitz, Mar 18, 2026)
5cd437e clean up logic (atravitz, Mar 18, 2026)
61a97b6 check for warning (atravitz, Mar 18, 2026)
48ab9c8 add docs (atravitz, Mar 18, 2026)
31a6589 make a cache dir (atravitz, Mar 18, 2026)
f8ecc73 Update the CLI quickrun help info (IAlibay, Mar 19, 2026)
10b42df Add userguide documentation on how to use quickrun (IAlibay, Mar 19, 2026)
17aad5c this reference instead? (IAlibay, Mar 19, 2026)
7d870dc fix things a little bit (IAlibay, Mar 19, 2026)
99ae399 update the title (IAlibay, Mar 19, 2026)
cbbbd7c Merge branch 'main' into resume_cli_ug (IAlibay, Mar 23, 2026)
0501033 Update environment.yml (IAlibay, Mar 23, 2026)
9cf25a5 Update docs/guide/cli/quickrun.rst (IAlibay, Mar 23, 2026)
3cd7121 Update docs/guide/cli/quickrun.rst (IAlibay, Mar 23, 2026)
7690ac3 update for new name for now (IAlibay, Mar 23, 2026)
af51f23 Merge branch 'resume_cli_ug' of github.com:OpenFreeEnergy/openfe into… (IAlibay, Mar 23, 2026)
2 changes: 2 additions & 0 deletions docs/guide/cli/cli_basics.rst
@@ -69,6 +69,8 @@ the subcommand name, e.g., ``openfe quickrun --help``, which returns
exist, it will be created at runtime.
-o PATH Filepath at which to create and write the JSON-
formatted results.
--resume Attempt to resume this transformation's execution
using the cache.
-h, --help Show this message and exit.

For more details on various commands, see the :ref:`cli-reference`.
1 change: 1 addition & 0 deletions docs/guide/cli/index.rst
@@ -13,3 +13,4 @@ into non-Python workflows.
.. toctree::
cli_basics
cli_yaml
quickrun
85 changes: 85 additions & 0 deletions docs/guide/cli/quickrun.rst
@@ -0,0 +1,85 @@
.. _userguide_cli_quickrun:

Using Quickrun to execute Transformations
=========================================

The ``openfe quickrun`` command executes a single alchemical Transformation.
This is currently the primary way to execute Transformations after they
have been created during network planning.


Basic Usage
-----------

To run a Transformation (``transformation.json``) and save results to ``results.json``:

.. code:: none

   openfe quickrun transformation.json -d workdir/ -o workdir/results.json

The ``-d`` / ``--work-dir`` flag controls where working files (checkpoints,
trajectory data, etc.) are written. If it is omitted, the current directory
is used.

The ``-o`` flag controls where the results file is written. If it is omitted,
results are written to a file named ``<transformation_key>_results.json`` in the
working directory, where ``<transformation_key>`` is a unique identifier for the
Transformation.
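The defaulting rules above can be summarised as a short Python sketch. ``resolve_paths`` is a hypothetical helper written only for illustration and is not part of ``openfe``. The string ``"example-key"`` stands in for a real transformation key.

```python
import pathlib
from typing import Optional, Tuple


def resolve_paths(
    transformation_key: str,
    work_dir: Optional[pathlib.Path] = None,
    output: Optional[pathlib.Path] = None,
) -> Tuple[pathlib.Path, pathlib.Path]:
    """Hypothetical helper mirroring the documented -d / -o defaults."""
    # -d / --work-dir falls back to the current directory when omitted.
    if work_dir is None:
        work_dir = pathlib.Path.cwd()
    # -o falls back to <transformation_key>_results.json inside the work dir.
    if output is None:
        output = work_dir / f"{transformation_key}_results.json"
    return work_dir, output


# Example: a run with "-d workdir/" but no "-o".
work_dir, output = resolve_paths("example-key", pathlib.Path("workdir"))
print(output)
```

Passing an explicit ``output`` path bypasses the default entirely, just as ``-o`` does on the command line.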


Resuming a halted Job
---------------------

When ``openfe quickrun`` starts, it saves a plan of the simulation to a
cache file before execution begins:

.. code:: none

   <work-dir>/quickrun_cache/<transformation_key>-protocolDAG.json

This cache is automatically removed once the job completes successfully.
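A stdlib-only sketch of this write-before, delete-after lifecycle follows. It illustrates the idea under stated assumptions and is not ``quickrun``'s implementation. The plan is a plain ``dict`` serialised with ``json`` rather than a ``ProtocolDAG``. The names ``cache_path``, ``run_with_checkpoint`` and ``execute`` are hypothetical. Only the cache location follows the documented path.

```python
import json
import pathlib
from typing import Any, Callable


def cache_path(work_dir: pathlib.Path, transformation_key: str) -> pathlib.Path:
    # Documented location: <work-dir>/quickrun_cache/<key>-protocolDAG.json
    return work_dir / "quickrun_cache" / f"{transformation_key}-protocolDAG.json"


def run_with_checkpoint(
    work_dir: pathlib.Path,
    transformation_key: str,
    plan: dict,
    execute: Callable[[dict], Any],
) -> Any:
    """Hypothetical sketch: persist the plan, run it, then drop the cache."""
    checkpoint = cache_path(work_dir, transformation_key)
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    # Save the plan *before* execution so an interrupted run leaves it behind.
    checkpoint.write_text(json.dumps(plan))
    result = execute(plan)  # if this raises, the checkpoint stays on disk
    # Only reached on success: the cache is no longer needed.
    checkpoint.unlink()
    return result
```

The file is written before ``execute`` runs and deleted only after it returns. An exception (standing in for an interruption) therefore leaves the cache on disk for a later ``--resume``.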

If a job is interrupted (e.g. due to a wall-time limit, node failure, or
manual cancellation), you can resume the interrupted job by passing the ``--resume`` flag:

.. code:: none

   openfe quickrun transformation.json -d workdir/ -o workdir/results.json --resume

``quickrun`` uses the cached plan to determine how far the simulation had
progressed and, if the Transformation's Protocol supports it, how to resume.

.. note::

   The same ``-d`` / ``--work-dir`` used in the original run
   must be specified so that ``quickrun`` can locate the cache file.

If you pass ``--resume`` but no cache file is found (e.g. the job never
started), the following warning is printed and a fresh execution begins:

.. code:: none

   No checkpoint found at <work-dir>/quickrun_cache/<transformation_key>-protocolDAG.json!
   Starting new execution.

If the cache file is corrupted (e.g. due to an incomplete write at the moment
of interruption), ``quickrun --resume`` will raise an error explaining how to
start over:

.. code:: none

   Recovery failed, please remove <work-dir>/quickrun_cache/<transformation_key>-protocolDAG.json
   and any results from your working directory before continuing to create a new protocol, or run without `--resume`.
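Corruption can be detected because a write cut short usually leaves truncated JSON. Python's ``json`` parser rejects that with ``json.JSONDecodeError``, and ``quickrun`` catches ``JSONDecodeError`` when loading the cached ``ProtocolDAG``. The sketch below shows the idea with the standard library only. It uses a hypothetical ``load_cached_plan`` helper, a plain ``RuntimeError`` in place of ``click.ClickException``, and a paraphrased message.

```python
import json
import pathlib


def load_cached_plan(path: pathlib.Path) -> dict:
    """Hypothetical loader: parse a cached plan or fail with guidance."""
    try:
        return json.loads(path.read_text())
    except json.JSONDecodeError as err:
        # Truncated or otherwise malformed JSON: the cache cannot be trusted.
        raise RuntimeError(
            f"Cached plan at {path} is unreadable; remove it (and any partial "
            "results) and start a fresh run."
        ) from err
```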

Comment on lines +67 to +70
Contributor

Suggested change
-Recovery failed, please remove <work-dir>/quickrun_cache/<transformation_key>-protocolDAG.json
-and any results from your working directory before continuing to create a new protocol, or run without `--resume`.
+Recovery failed, please remove <work-dir>/quickrun_cache/<transformation_key>-protocolDAG.json
+and any results from your working directory before continuing to create a new protocol.

updating to reflect the changed behavior.

If a cache file exists but you do not pass the ``--resume`` flag, ``quickrun``
detects the partially complete Transformation and prevents you from
accidentally starting a duplicate run. The following error is raised:

.. code:: none

   RuntimeError: Transformation has been started but is incomplete. Please
   remove <work-dir>/quickrun_cache/<transformation_key>-protocolDAG.json and rerun, or resume
   execution using the ``--resume`` flag.
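The cases in this section form a small decision table. One input is whether the cache file is present or absent, and the other is whether ``--resume`` was passed. The sketch below encodes that table with hypothetical names (``Action``, ``choose_action``) that do not exist in ``openfe``. It uses paraphrased messages and leaves out the corrupted-cache case to keep the table to four rows.

```python
import enum
import warnings


class Action(enum.Enum):
    RESUME = "resume from the cached plan"
    FRESH = "plan and run from scratch"


def choose_action(cache_exists: bool, resume: bool) -> Action:
    """Hypothetical mirror of quickrun's documented start-up checks."""
    if cache_exists:
        if resume:
            return Action.RESUME
        # Refuse to silently start a duplicate run over a partial one.
        raise RuntimeError(
            "Transformation has been started but is incomplete; "
            "remove the cache file or pass --resume."
        )
    if resume:
        # Nothing to resume from: warn, then fall through to a fresh run.
        warnings.warn("No checkpoint found! Starting new execution.")
    return Action.FRESH
```

Raising in the cache-present, no-``--resume`` case is what stops an accidental duplicate run. A missing cache with ``--resume`` only warns, because starting fresh is always safe.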

See Also
--------

- :ref:`cli-reference` - full CLI reference for ``openfe quickrun``
- :ref:`rbfe_cli_tutorial` - a tutorial on how to use the CLI to run hybrid topology relative binding free energy calculations.
24 changes: 24 additions & 0 deletions news/quickrun_resume.rst
@@ -0,0 +1,24 @@
**Added:**

* Added ``--resume`` flag to ``openfe quickrun``.
  Quickrun now temporarily caches ``protocolDAG`` information and, when used with the ``--resume`` flag, will attempt to resume execution of an incomplete transformation.

**Changed:**

* <news item>

**Deprecated:**

* <news item>

**Removed:**

* <news item>

**Fixed:**

* <news item>

**Security:**

* <news item>
@@ -83,12 +83,12 @@ def patcher():
         yield


-def test_gather(benzene_complex_dag, patcher, tmpdir):
+def test_gather(benzene_complex_dag, patcher, tmp_path):
     # check that .gather behaves as expected
     dagres = gufe.protocols.execute_DAG(
         benzene_complex_dag,
-        shared_basedir=tmpdir,
-        scratch_basedir=tmpdir,
+        shared_basedir=tmp_path,
+        scratch_basedir=tmp_path,
         keep_shared=True,
     )

@@ -103,12 +103,12 @@ def patcher():
         yield


-def test_gather(benzene_solvation_dag, patcher, tmpdir):
+def test_gather(benzene_solvation_dag, patcher, tmp_path):
     # check that .gather behaves as expected
     dagres = gufe.protocols.execute_DAG(
         benzene_solvation_dag,
-        shared_basedir=tmpdir,
-        scratch_basedir=tmpdir,
+        shared_basedir=tmp_path,
+        scratch_basedir=tmp_path,
         keep_shared=True,
     )

@@ -508,7 +508,7 @@ def test_unit_tagging(solvent_protocol_dag, tmpdir):
     assert len(repeats) == 3


-def test_gather(solvent_protocol_dag, tmpdir):
+def test_gather(solvent_protocol_dag, tmp_path):
     # check .gather behaves as expected
     with mock.patch(
         "openfe.protocols.openmm_md.plain_md_methods.PlainMDProtocolUnit.run",
@@ -519,8 +519,8 @@ def test_gather(solvent_protocol_dag, tmpdir):
     ):
         dagres = gufe.protocols.execute_DAG(
             solvent_protocol_dag,
-            shared_basedir=tmpdir,
-            scratch_basedir=tmpdir,
+            shared_basedir=tmp_path,
+            scratch_basedir=tmp_path,
             keep_shared=True,
         )

@@ -1257,12 +1257,12 @@ def test_unit_tagging(solvent_protocol_dag, unit_mock_patcher, tmpdir):
     assert len(setup_results) == len(sim_results) == len(analysis_results) == 3


-def test_gather(solvent_protocol_dag, unit_mock_patcher, tmpdir):
+def test_gather(solvent_protocol_dag, unit_mock_patcher, tmp_path):
     # check .gather behaves as expected
     dagres = gufe.protocols.execute_DAG(
         solvent_protocol_dag,
-        shared_basedir=tmpdir,
-        scratch_basedir=tmpdir,
+        shared_basedir=tmp_path,
+        scratch_basedir=tmp_path,
         keep_shared=True,
     )

@@ -1295,7 +1295,7 @@ def test_unit_tagging(benzene_toluene_dag, tmpdir):
     assert len(complex_repeats) == len(solv_repeats) == 2


-def test_gather(benzene_toluene_dag, tmpdir):
+def test_gather(benzene_toluene_dag, tmp_path):
     # check that .gather behaves as expected
     with (
         mock.patch(
@@ -1339,8 +1339,8 @@ def test_gather(benzene_toluene_dag, tmpdir):
     ):
         dagres = gufe.protocols.execute_DAG(
             benzene_toluene_dag,
-            shared_basedir=tmpdir,
-            scratch_basedir=tmpdir,
+            shared_basedir=tmp_path,
+            scratch_basedir=tmp_path,
             keep_shared=True,
         )

42 changes: 39 additions & 3 deletions src/openfecli/commands/quickrun.py
@@ -3,6 +3,7 @@

 import json
 import pathlib
+import warnings

 import click

@@ -30,8 +31,14 @@ def _format_exception(exception) -> str:
     type=click.Path(dir_okay=False, file_okay=False, path_type=pathlib.Path),
     help="Filepath at which to create and write the JSON-formatted results.",
 ) # fmt: skip
+@click.option(
+    "--resume",
+    is_flag=True,
+    default=False,
+    help=("Attempt to resume this transformation's execution using the cache."),
+)
 @print_duration
-def quickrun(transformation, work_dir, output):
+def quickrun(transformation, work_dir, output, resume):
     """Run the transformation (edge) in the given JSON file.

     Simulation JSON files can be created with the
@@ -51,7 +58,9 @@ def quickrun(transformation, work_dir, output):
     import logging
     import os
     import sys
+    from json import JSONDecodeError

+    from gufe import ProtocolDAG
     from gufe.protocols.protocoldag import execute_DAG
     from gufe.tokenization import JSON_HANDLER
     from gufe.transformations.transformation import Transformation
@@ -94,13 +103,37 @@ def quickrun(transformation, work_dir, output):
     else:
         output.parent.mkdir(exist_ok=True, parents=True)

-    write("Planning simulations for this edge...")
-    dag = trans.create()
+    # Attempt to either deserialize or freshly create DAG
+    trans_DAG_json = work_dir / "quickrun_cache" / f"{trans.key}-protocolDAG.json"
+
+    if trans_DAG_json.is_file():
+        if resume:
+            write(f"Attempting to resume execution using existing edges from '{trans_DAG_json}'")
+            try:
+                dag = ProtocolDAG.from_json(trans_DAG_json)
+            except JSONDecodeError:
+                errmsg = f"Recovery failed, please remove {trans_DAG_json} and any results from your working directory before continuing to create a new protocol, or run without `--resume`."
+                raise click.ClickException(errmsg)
+        else:
+            errmsg = f"Transformation has been started but is incomplete. Please remove {trans_DAG_json} and rerun, or resume execution using the ``--resume`` flag."
+            raise RuntimeError(errmsg)
+
+    else:
+        if resume:
+            warnings.warn(f"No checkpoint found at {trans_DAG_json}! Starting new execution.")
+
+        # Create the DAG instead and then serialize for later resuming
+        write("Planning simulations for this edge...")
+        dag = trans.create()
+        pathlib.Path(work_dir, "quickrun_cache").mkdir(exist_ok=True)
+        dag.to_json(trans_DAG_json)
+
     write("Starting the simulations for this edge...")
     dagresult = execute_DAG(
         dag,
         shared_basedir=work_dir,
         scratch_basedir=work_dir,
+        unitresults_basedir=work_dir,
         keep_shared=True,
         raise_error=False,
         n_retries=2,
@@ -126,6 +159,9 @@ def quickrun(transformation, work_dir, output):
     with open(output, mode="w") as outf:
         json.dump(out_dict, outf, cls=JSON_HANDLER.encoder)

+    # remove the checkpoint since the job has completed
+    os.remove(trans_DAG_json)
+
     write(f"Here is the result:\n\tdG = {estimate} ± {uncertainty}\n")
     write("")