Skip to content

ECO: Add C++ Graph Edit Distance implementation with MCS optimization#3813

Open
alirezazd wants to merge 26 commits intogoogle:mainfrom
alirezazd:mcs_ged
Open

ECO: Add C++ Graph Edit Distance implementation with MCS optimization#3813
alirezazd wants to merge 26 commits intogoogle:mainfrom
alirezazd:mcs_ged

Conversation

@alirezazd
Copy link
Copy Markdown

@alirezazd alirezazd commented Feb 9, 2026

This PR adds a C++ implementation of Graph Edit Distance (GED) for the ECO system, replacing the existing Python/NetworkX approach.

What's included

  • Core GED algorithm with LAP solver integration
  • MCS (Maximum Common Subgraph) preprocessing for performance (~10-100x speedup)
  • Graph data structures for representing XLS IR
  • CLI tool (ged_main) and IR patch generation
  • Bazel build rules and integration tests
  • Third-party dependencies organized in libs/ (tinyxml2, scipy_lap)

Regarding the new C++ GED

The Python version works but is slow on large graphs. This C++ implementation is faster, integrates directly with XLS IR APIs, and supports advanced features like pinned nodes and MCS optimization.

Migration

Python tools (ir_diff_main.py, ir2gxl.py) are marked for deprecation but still functional. They'll be removed in a future PR once the C++ version is proven stable.

Testing

Builds cleanly and includes tests with real DSLX examples (crc32, apfloat_fmac, riscv_simple, etc).

@grebe Please consider reviewing this PR. I tried to organize it into separate commits for convenience.

@google-cla
Copy link
Copy Markdown

google-cla bot commented Feb 9, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@alirezazd alirezazd force-pushed the mcs_ged branch 2 times, most recently from 167e123 to 24d50c2 Compare February 9, 2026 18:44
@erinzmoore erinzmoore requested a review from grebe February 11, 2026 15:04
Copy link
Copy Markdown
Contributor

@grebe grebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all: thank you! This is exciting to see. Looks like a lot of good work and a big improvement.

Here's some initial feedback, I'm still working my way through. I think I have enough here to get the ball rolling, though, so I'm sending it out. Feel free to reach out directly or go back and forth here if anything I said was unclear or if you disagree.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deps shouldn't be under xls/, they should be managed from dependency_support/ or (ideally) BCR. tinyxml2 looks like it's in BCR. scipy_lsap.cpp is a bit trickier b/c our python deps aren't super well integrated into bazel (it's essentially pip installed, which afaik largely hides the internals). We typically try to avoid holding on to third_party/ source in here. I believe OR-tools has some implementations of linear assignment stuff and we already pull it in as a dep - is a pain to move to using it?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially used OR-tools LASP solver but the performance took a hit. I'll do more experiments and will try to replace the external LSAP solver with something more convenient.

Regarding TinyXML, I think it would be nice to get rid custom parses (ir2nx, ir2gxl) and solely use XLS IR and generate graph directly from it before we merge this PR and after the current code is stable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: using XLS IR, I think it's fine to land this w/ tinyxml as long as you use the BCR version, but agree it would be better not to have custom parsing.

Comment thread xls/eco/test/BUILD Outdated
Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/ged.h Outdated
Comment thread xls/eco/ged_main.cc Outdated
Comment thread xls/eco/ged_main.cc Outdated
Comment thread xls/eco/ged_main.cc Outdated
Comment thread xls/eco/ged_main.cc Outdated
Comment thread xls/eco/graph.cc Outdated
@alirezazd
Copy link
Copy Markdown
Author

@grebe Ready for next round of review. I addressed most of the comments but the cost functions are still in a separate header, if you prefer them to live inside ged header, I can move them in a follow-up commit.

@proppy proppy requested a review from grebe March 3, 2026 07:07
Copy link
Copy Markdown
Contributor

@grebe grebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment thread xls/dev_tools/check_ir_equivalence_main.cc Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: using XLS IR, I think it's fine to land this w/ tinyxml as long as you use the BCR version, but agree it would be better not to have custom parsing.

Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/ged.h Outdated
Comment thread xls/eco/ged_cost_functions.h Outdated
Comment thread xls/contrib/eco/ged_main.cc
Comment thread xls/eco/graph.h Outdated
@alirezazd
Copy link
Copy Markdown
Author

alirezazd commented Mar 6, 2026

@grebe comments applied, ready for more :)
@proppy previous CI issues (unavailble build targets) should be fixed now.

Copy link
Copy Markdown
Contributor

@grebe grebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long wait.

Hopefully this isn't too annoying, but... I think it would be good to move this into contrib (xls/contrib/eco).

We're getting close!

Comment thread xls/eco/eco_build_defs.bzl Outdated
Comment thread xls/eco/graph.cc Outdated
Comment thread xls/eco/ir2nx.py Outdated
Comment thread xls/eco/ir_patch_gen.cc Outdated
Comment thread xls/contrib/eco/mcs.h
- Move graph data structures: graph.h/cpp -> graph.h/cc
- Move GXL parser: gxl_parser.h/cpp -> gxl_parser.h/cc
- Move LAP solver: lap.h/cc -> lap_solver.h/cc
- Move GED algorithm: ged.h/cc
- Move MCS algorithm: mcs.h/cpp -> mcs.h/cc
- Move cost functions: cost_functions.h -> ged_cost_functions.h
- Move main binary: main.cpp -> ged_main.cc
- Move IR patch gen: ir_patch_gen.h/cc
- Update all header guards to XLS_ECO_* format
- Update all includes to use xls/eco/ prefix
- Rename .cpp -> .cc for XLS convention

This flattens the directory structure and makes the codebase more
consistent with XLS conventions. The mcs_ged/ directory now only
contains BUILD file and third-party libs/.
- Move tinyxml2 files into libs/tinyxml2/ subdirectory
- Add scipy_lap in libs/scipy_lap/ subdirectory
- Update BUILD paths for new lib structure
- Each third-party lib now has its own directory
- graph.h/cc: XLSGraph, XLSNode, XLSEdge for representing XLS IR as graphs
- gxl_parser.h/cc: Parse GXL (Graph eXchange Language) format
- Foundation for cpp graph edit distance algorithm
- lap_solver.cc: Wrapper for scipy LAP solver
- Used by GED algorithm for optimal node matching
- ged.h/cc: Core GED algorithm with cost matrix and LAP integration
- Supports node/edge substitution, insertion, deletion costs
- Handles pinned nodes (nodes that must match)
- Add cc_library targets: graph, gxl_parser, lap_solver, ged, mcs
- Add cc_binary: ged_main
- Add ir_patch_gen_lib target
- All GED infrastructure now buildable
- mcs.h/cc: Find maximum common subgraph between two graphs
- Preprocessing step that accelerates GED by identifying node correspondences
- Reduces GED search space significantly
- ged_main.cc: CLI tool to compute GED between two GXL graphs
- Supports MCS preprocessing option
- Outputs edit operations and cost
- ir_patch_gen.h/cc: Convert GED results to IR patch protobuf
- ir_patch.proto: Updated proto definitions
- patch_ir.h/cc/main.cc: Apply patches to XLS IR
- ir_patch_gen.py: Python utilities for patch generation
- Replace requirement() with @xls_pip_deps// for old Python tools
- Update Receive node construction to match new upstream API signature
- Add payload_type parameter required by new Receive constructor
This commit contains two types of changes:
1. Upstream updates to ir2nx.py merged during rebase (large diff)
2. Our deprecation TODOs added to Python GED toolchain:
   - ir_diff.py: NetworkX GED → ged.h/cc (10-100x faster)
   - ir_diff_main.py: Python tool → ged_main.cc CLI
   - ir_patch_gen.py: Python patcher → patch_ir.h/cc
   - ir2nx.py: NetworkX converter → xls_ir_to_cytoscape.cc
   - ir2gxl.py: GXL exporter → gxl_parser.h/cc (from upstream)
Replace the SciPy LSAP path with a native C++ LSAP solver.
Switch GED cost construction from sparse-style handling to dense matrix paths.

Add C++ ECO components for GED, MCS, graph modeling, GXL parsing, and patch generation.
Introduce eco-specific Starlark build defs and wire patching plus equivalence validation targets.

Standardize main-flag handling and add optional execution statistics reporting.

Add ECO regression fixtures and test targets for crc32, riscv, apfloat, fir, histogram, and vector_core flows.
Use path-based equivalence report flag and write reports for mismatch results
Replace run_shell wrapper with direct Bazel run
Remove short flag aliases from patch_ir_main
Normalize ECO build rule args/report handling and clean stale target references
 Fixed  init parsing bug in IrParser
 Fixed a critical  bug  in MakePruneFunction
@alirezazd
Copy link
Copy Markdown
Author

alirezazd commented Apr 13, 2026

@grebe Sorry for late response I was busy with some academic tasks. I applied most of the recent comments, currently working on moving the xls/eco to xls/contrib/eco. Please consider reviewing again.

P.S. The test //xls/contrib/eco/test:vector_core_patched_opt_ir (~1k nodes) now completes in under 4 seconds with <220 MB RAM when MCS is enabled. :)

@alirezazd alirezazd requested a review from grebe April 13, 2026 04:28
Relocate the ECO package from xls/eco to xls/contrib/eco and update references.

This includes Bazel labels, Starlark loads, C++ include paths, Python imports, runfiles paths, and tooling path filters.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use BCR rather than checking this in

"Path to the IR file to patch."); // NOLINT
namespace {

using namespace xls;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: more typically, we'll do

namespace xls {
namespace {
...
}
}

@alirezazd
Copy link
Copy Markdown
Author

@grebe Do you have a preference here:

  1. Keep TinyXML and the existing Python tooling (parser, GED, patch generation, NetworkX), or
  2. Deprecate that stack in favor of xls/contrib/eco/xls_ir_to_cytoscape.cc?

Happy to go either way,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants