feat: rgpot-compatible RPC serve mode with gateway #316

Merged
HaoZeke merged 17 commits into TheochemUI:main from HaoZeke:feat/serve-mode
Feb 24, 2026

HaoZeke (Collaborator) commented Feb 22, 2026

Summary

  • Add eonclient --serve mode that wraps any eOn potential as an rgpot PotentialBase over Cap'n Proto RPC
  • Four serving modes: single-potential, multi-model, replicated (N sequential ports), and gateway (single port, round-robin pool)
  • Config-driven serving via [Serve] INI section (host, port, replicas, gateway_port, endpoints)
  • Two-TU architecture (ServeMode.cpp / ServeRpcServer.cpp) to avoid eOn Potential vs capnp Potential name collision
  • Meson build integration (-Dwith_serve=true), rgpot subproject wrap, Catch2 unit tests
  • Python schema and config.yaml updates, pixi serve environment, CI workflow
  • User documentation with compilation, usage, and configuration reference

Functionality

| Mode | CLI | Description |
|------|-----|-------------|
| Single | `-p lj --serve-port 12345` | One potential, one port |
| Multi-model | `--serve "lj:12345,eam_al:12346"` | Different potentials on different ports |
| Replicated | `-p lj --serve-port 12345 --replicas 4` | Same potential on ports 12345-12348 |
| Gateway | `-p lj --serve-port 12345 --replicas 6 --gateway` | Single port, round-robin pool |
| Config-driven | `--config serve.ini` | All options via [Serve] INI section |
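
The multi-model spec string (`"lj:12345,eam_al:12346"`) is a comma-separated list of `name:port` pairs. The following is a minimal sketch of how such a string could be parsed, not the PR's actual `parseServeSpec`. The names `Endpoint`, `trim`, and `parse_serve_spec_sketch` are hypothetical, and the potential is kept as a plain string here, whereas the commit notes suggest eOn maps it to an enum via `enum_cast`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing ASCII whitespace.
inline std::string trim(const std::string &s) {
  const char *ws = " \t\r\n";
  const auto first = s.find_first_not_of(ws);
  if (first == std::string::npos)
    return "";
  const auto last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Hypothetical result type: one potential name and the port it is served on.
struct Endpoint {
  std::string potential; // e.g. "lj"; kept as a string in this sketch
  std::uint16_t port;
};

// Parse "name:port,name:port,..." into endpoints, trimming whitespace
// around both the name and the port so that "lj : 12345" is accepted.
inline std::vector<Endpoint> parse_serve_spec_sketch(const std::string &spec) {
  std::vector<Endpoint> out;
  std::size_t start = 0;
  while (start <= spec.size()) {
    std::size_t comma = spec.find(',', start);
    if (comma == std::string::npos)
      comma = spec.size();
    const std::string item = trim(spec.substr(start, comma - start));
    start = comma + 1;
    if (item.empty())
      continue; // tolerate empty or trailing entries
    const auto colon = item.find(':');
    if (colon == std::string::npos)
      throw std::invalid_argument("expected name:port, got '" + item + "'");
    const std::string name = trim(item.substr(0, colon));
    const std::string port_str = trim(item.substr(colon + 1));
    std::size_t used = 0;
    const int port = std::stoi(port_str, &used); // throws if not numeric
    if (name.empty() || used != port_str.size() || port < 1 || port > 65535)
      throw std::invalid_argument("bad endpoint '" + item + "'");
    out.push_back({name, static_cast<std::uint16_t>(port)});
  }
  return out;
}
```

Trimming both halves after the colon split is what lets a spec written with spaces around the colon resolve to the same endpoint as the compact form.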

github-actions bot commented Feb 22, 2026

eOn Documentation Preview

Download: documentation.zip

Unzip and open index.html to view.

github-actions bot commented Feb 22, 2026

Benchmark Results

Note

All benchmarks unchanged

| Status | Count |
|--------|-------|
| ⚪ Unchanged | 8 |

8 unchanged benchmark(s)

| Benchmark | Before | After | Ratio |
|-----------|--------|-------|-------|
| bench_eonclient.TimeMinimizationLJCluster.peakmem_minimization_lbfgs | 27.1M | 27.3M | ~1.01x |
| bench_eonclient.TimeMinimizationLJCluster.time_minimization_lbfgs | 36.8±0ms | 37.3±0ms | ~1.01x |
| bench_eonclient.TimeNEBMorsePt.peakmem_neb | 27.1M | 27.3M | ~1.01x |
| bench_eonclient.TimeNEBMorsePt.time_neb | 507±0ms | 516±0ms | ~1.02x |
| bench_eonclient.TimePointMorsePt.peakmem_point_evaluation | 27.1M | 27.3M | ~1.01x |
| bench_eonclient.TimePointMorsePt.time_point_evaluation | 6.77±0ms | 7.04±0ms | ~1.04x |
| bench_eonclient.TimeSaddleSearchMorseDimer.peakmem_saddle_search_dimer | 27.1M | 27.3M | ~1.01x |
| bench_eonclient.TimeSaddleSearchMorseDimer.time_saddle_search_dimer | 60.1±0ms | 60.7±0ms | ~1.01x |
Details
  • Base: cd337927
  • Head: c6187a1b
  • Runner: ubuntu-22.04
Raw asv-spyglass output
All benchmarks:

| Change   | Before   | After    |   Ratio | Benchmark (Parameter)                                                             |
|----------|----------|----------|---------|-----------------------------------------------------------------------------------|
|          | 27.1M    | 27.3M    |    1.01 | bench_eonclient.TimeMinimizationLJCluster.peakmem_minimization_lbfgs              |
|          | 36.8±0ms | 37.3±0ms |    1.01 | bench_eonclient.TimeMinimizationLJCluster.time_minimization_lbfgs                 |
|          | 27.1M    | 27.3M    |    1.01 | bench_eonclient.TimeNEBMorsePt.peakmem_neb                                        |
|          | 507±0ms  | 516±0ms  |    1.02 | bench_eonclient.TimeNEBMorsePt.time_neb                                           |
|          | 27.1M    | 27.3M    |    1.01 | bench_eonclient.TimePointMorsePt.peakmem_point_evaluation                         |
|          | 6.77±0ms | 7.04±0ms |    1.04 | bench_eonclient.TimePointMorsePt.time_point_evaluation                            |
|          | 27.1M    | 27.3M    |    1.01 | bench_eonclient.TimeSaddleSearchMorseDimer.peakmem_saddle_search_dimer            |
|          | 60.1±0ms | 60.7±0ms |    1.01 | bench_eonclient.TimeSaddleSearchMorseDimer.time_saddle_search_dimer               |
|          | 27.1M    | 27.3M    |    1.01 | benchmarks.bench_eonclient.TimeMinimizationLJCluster.peakmem_minimization_lbfgs   |
|          | 35.8±0ms | 36.7±0ms |    1.03 | benchmarks.bench_eonclient.TimeMinimizationLJCluster.time_minimization_lbfgs      |
|          | 27.1M    | 27.3M    |    1.01 | benchmarks.bench_eonclient.TimeNEBMorsePt.peakmem_neb                             |
|          | 508±0ms  | 509±0ms  |    1    | benchmarks.bench_eonclient.TimeNEBMorsePt.time_neb                                |
|          | 27.1M    | 27.3M    |    1.01 | benchmarks.bench_eonclient.TimePointMorsePt.peakmem_point_evaluation              |
|          | 6.73±0ms | 7.14±0ms |    1.06 | benchmarks.bench_eonclient.TimePointMorsePt.time_point_evaluation                 |
|          | 27.1M    | 27.3M    |    1.01 | benchmarks.bench_eonclient.TimeSaddleSearchMorseDimer.peakmem_saddle_search_dimer |
|          | 60.2±0ms | 61.8±0ms |    1.03 | benchmarks.bench_eonclient.TimeSaddleSearchMorseDimer.time_saddle_search_dimer    |

Add serve mode to eonclient that wraps any eOn potential as an rgpot
PotentialBase served over Cap'n Proto RPC. Supports four operating modes:

- Single-potential: `eonclient -p lj --serve-port 12345`
- Multi-model: `eonclient --serve "lj:12345,eam_al:12346"`
- Replicated: `eonclient -p lj --serve-port 12345 --replicas 4`
- Gateway: `eonclient -p lj --serve-port 12345 --replicas 6 --gateway`
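
In replicated mode the PR description states that `--replicas 4` starting at port 12345 occupies ports 12345 to 12348, i.e. N sequential ports from the base. A minimal sketch of that mapping, assuming a hypothetical helper named `replica_ports` (not taken from the PR):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: the ports used by `replicas` servers that start at
// `base` and occupy consecutive port numbers.
inline std::vector<std::uint16_t> replica_ports(std::uint16_t base,
                                                unsigned replicas) {
  // The last port must still fit in the 16-bit port range.
  assert(replicas >= 1 && base + (replicas - 1) <= 65535u);
  std::vector<std::uint16_t> ports;
  ports.reserve(replicas);
  for (unsigned i = 0; i < replicas; ++i)
    ports.push_back(static_cast<std::uint16_t>(base + i));
  return ports;
}
```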

Gateway mode exposes a single port backed by a round-robin pool of
potential instances, so clients need only one address.
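
The round-robin selection behind the gateway can be sketched as follows. This is an illustration of the general technique only: `Worker` and `RoundRobinPool` are hypothetical names, each worker stands in for one potential instance, and a plain counter is used on the assumption that requests are dispatched from a single thread (an atomic counter would be needed otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one potential instance: takes a request id
// and returns a result.
using Worker = std::function<int(int)>;

class RoundRobinPool {
public:
  explicit RoundRobinPool(std::vector<Worker> workers)
      : workers_(std::move(workers)) {
    assert(!workers_.empty());
  }

  // Return the index of the next worker in cyclic order.
  std::size_t next_index() {
    const std::size_t idx = next_ % workers_.size();
    ++next_;
    return idx;
  }

  // Forward one request to the next worker in the cycle.
  int dispatch(int request) { return workers_[next_index()](request); }

private:
  std::vector<Worker> workers_;
  std::size_t next_ = 0; // assumption: single-threaded dispatch
};
```

With a pool of size N, the request numbered k (counting from zero) goes to instance k mod N, so load is spread evenly while clients see only the gateway's address.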

All modes are also configurable via INI config file with a [Serve]
section (host, port, replicas, gateway_port, endpoints).
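
A [Serve] section using the keys listed above might look like the sketch below. Only the key names come from the commit message. The values, the `;` comment style, and the format of `endpoints` (assumed here to mirror the `--serve` spec string) are assumptions, so check the generated configuration reference before relying on them.

```ini
[Serve]
; Assumed values for illustration only.
host = localhost
port = 12345
replicas = 4
; Assumed: set gateway_port to expose the replicas behind one port.
; gateway_port = 12345
; Assumed: multi-model endpoints in the same form as the --serve string.
; endpoints = lj:12345,eam_al:12346
```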

Implementation uses a two-TU architecture to avoid naming collision
between eOn's Potential class and capnp-generated Potential interface.

Includes meson build integration (with_serve option, rgpot subproject
wrap), Catch2 unit tests for parseServeSpec, Python schema (schema.py),
config.yaml section, and user documentation.

- Set pure_lib: false for rgpot subproject (serve mode needs full lib)
- Move serve sources to eonclib_sources (fixes test linking)
- Trim whitespace after colon-split in parseServeSpec
  (fixes "lj : 12345" parsing where trailing space broke enum_cast)

Replace rgpot::PotentialBase virtual interface with a flat-array
ForceCallback (std::function) to avoid the name collision between
eOn's Eigen-based AtomMatrix and rgpot's custom AtomMatrix type.
Both were defined at global scope, causing segfaults when capnp
dispatched through the wrong vtable layout.

The serve code now only links ptlrpc_dep (capnp schema) from rgpot
with with_rpc_client_only:true -- no rgpot types cross the TU
boundary. ServeMode.cpp wraps eOn's Potential::force() in a lambda,
and ServeRpcServer.cpp converts capnp data to flat arrays directly.
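
The flat-array callback idea can be sketched as below. This is not the PR's code: the `ForceCallback` signature shown here is a guess (the real one presumably also carries atomic numbers and the simulation box), and `ToyPotential`, `make_callback`, `Result`, and `serve_one` are hypothetical names. The point is that only built-in types and `std::function` appear in the interface, so neither side needs the other's `AtomMatrix` or `Potential` definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical flat-array callback: positions in (3*n doubles),
// forces out (3*n doubles), energy out (one double).
using ForceCallback = std::function<void(
    std::size_t n_atoms, const double *positions, double *forces,
    double *energy)>;

// Toy stand-in for an eOn potential (NOT the real eOn Potential class):
// independent harmonic wells with E = 0.5 * k * sum(x^2) and F = -k * x.
struct ToyPotential {
  double k = 1.0;
  void force(std::size_t n, const double *pos, double *f, double *e) const {
    *e = 0.0;
    for (std::size_t i = 0; i < 3 * n; ++i) {
      f[i] = -k * pos[i];
      *e += 0.5 * k * pos[i] * pos[i];
    }
  }
};

// "Serve mode" side: wrap the potential's force call in a lambda.
inline ForceCallback make_callback(const ToyPotential &pot) {
  return [pot](std::size_t n, const double *pos, double *f, double *e) {
    pot.force(n, pos, f, e);
  };
}

// "RPC server" side: sees only the callback and flat arrays.
struct Result {
  double energy;
  std::vector<double> forces;
};

inline Result serve_one(const ForceCallback &cb,
                        const std::vector<double> &positions) {
  Result r{0.0, std::vector<double>(positions.size(), 0.0)};
  cb(positions.size() / 3, positions.data(), r.forces.data(), &r.energy);
  return r;
}
```

In a two-TU layout, `make_callback` would live in the translation unit that includes the eOn headers and `serve_one` in the one that includes the capnp-generated headers, with only the `ForceCallback` alias shared between them.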

Torch 2.9 bundles fmt 11.2 internally; pixi's spdlog was compiled
against pixi's fmt. With fmt>=12 the ABI mismatch caused linker
errors when building metatomic+serve together. Pin fmt to v11 for
the metatomic feature so both torch and spdlog use the same fmt ABI.

Also adds towncrier fragment for the AtomMatrix collision fix and
updates serve_mode.md with architecture notes and corrected Julia API.

capnproto is available on win-64 and the serve code uses no POSIX APIs,
so enable the serve feature on Windows. Also remove unused <csignal>
and <atomic> includes from ServeMode.cpp.

HaoZeke and others added 10 commits February 24, 2026 06:05
Add run_*.py scripts to each major example directory (akmc-al, akmc-pt,
akmc-cu-vacancy, basin-hopping, parallel-replica, neb-al) showing the
equivalent config.ini expressed as Python dicts via write_eon_config.
Also adds an advanced NEB example with climbing image, energy-weighted
springs, and MMF options.

New tutorial page (dict_config.md) documents the approach with
side-by-side INI/Python comparisons and a parameter sweep example.
References eon.schema as the single source of truth for option names.
…ples

docs: add dictionary-style config examples using `rgpycrumbs`
Replace manual config table with autopydantic directive, add pixi serve
environment to compilation instructions, fix ASCII punctuation, remove
internal function name references, and link rgpot integration guide.
HaoZeke merged commit 164d2f8 into TheochemUI:main on Feb 24, 2026
12 checks passed
HaoZeke deleted the feat/serve-mode branch February 24, 2026 06:10