BE-428: HashQL: Simplify traversal tracking to path recording#8498
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
|
This pull request is too large for Augment to review. The PR exceeds the maximum size limit of 100000 tokens (approximately 400000 characters) for automated code review. Please consider breaking this PR into smaller, more focused changes. |
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
PR SummaryMedium Risk Overview The placement solver is rewired to operate on a new Written by Cursor Bugbot for commit 7014842. This will update automatically on new commits. Configure here. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## bm/be-434-hashql-entity-path-resolution-and-storage-mapping #8498 +/- ##
===============================================================================================
+ Coverage 63.03% 63.15% +0.11%
===============================================================================================
Files 1303 1313 +10
Lines 132791 133550 +759
Branches 5507 5498 -9
===============================================================================================
+ Hits 83709 84342 +633
- Misses 48170 48297 +127
+ Partials 912 911 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Stale comment and unnecessary reverse after removing traversal costs
- Removed the unnecessary reverse-order iteration and stale traversal-cost comment so target statement placement now reflects its order-independent behavior.
Or push these changes by commenting:
@cursor push 58ca5bc4a4
Preview (58ca5bc4a4)
diff --git a/libs/@local/hashql/mir/src/pass/execution/mod.rs b/libs/@local/hashql/mir/src/pass/execution/mod.rs
--- a/libs/@local/hashql/mir/src/pass/execution/mod.rs
+++ b/libs/@local/hashql/mir/src/pass/execution/mod.rs
@@ -72,10 +72,7 @@
let mut statement_costs: TargetArray<_> = TargetArray::from_fn(|_| None);
- let mut targets = TargetId::all();
- targets.reverse(); // We reverse the order, so that earlier targets (aka the interpreter) can have access to traversal costs
-
- for target in targets {
+ for target in TargetId::all() {
let mut statement = TargetPlacementStatement::new_in(target, &self.scratch);
let statement_cost =
statement.statement_placement_in(context, body, vertex, &self.scratch);There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Benchmark results
|
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1
|
Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba
|
Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |


🌟 What is the purpose of this PR?
Replaces the traversal extraction transform pass (which rewrote MIR bodies by splicing load statements and creating new locals) with a pure analysis pass that records which entity paths each block accesses as bitsets. Introduces per-block path transfer cost charging so the placement solver can account for data locality when assigning blocks to backends.
🔍 What does this change?
traversal/analysis/): walks the body, resolves vertex projections viaEntityPath::resolve(), records accessed paths asTraversalPathBitSet. No MIR mutation.cost/analysis.rs): newBasicBlockCostAnalysiscombines per-statement costs with a path transfer premium. For each block, charges a transfer premium on targets that are not the natural origin for the accessed paths (e.g., Interpreter pays a premium forVectorsbecause Embedding is the origin). Premiums are charged once per block, not per statement.BasicBlockCostVec: the placement solver now operates on precomputed per-block costs.PlacementSolverContexttakesblocks: &BasicBlockCostVecinstead of separate statement costs and assignment arrays.BlockPartitionedVec<T>: generic offset-table-backed per-block storage, extracted fromStatementCostVecinternals.TraversalLattice: lattice structure carryingVertexTypefor dataflow analysis over path bitsets.EntityPath::origin(),EntityPath::estimate_size(): per-path origin backend and transfer size estimation.TransferCostConfig: variable-size parameters (properties, embeddings, provenance sizes, per-target cost multiplier).FiniteBitSet::negate()in hashql-core.pass/transform/traversal_extraction/(the MIR transform and its tests).Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
Path transfer premiums use pure over-approximation: every non-origin backend pays the full premium with no discounting for co-location. This can over-penalize non-origin assignment when multiple blocks access the same path on the same backend (cross-block dedup is not implemented). This is intentional: separable costs are a prerequisite for the PBQP solver planned in BE-436.
The ideal cost model is non-separable: the first block on a backend pays the full data fetch cost, subsequent co-labeled blocks pay zero ("shared-fetch" / pay-once-per-group). This maps to submodular clique costs in the optimization literature. Two frameworks formalize this:
The multi-label extension via alpha-expansion (Boykov, Veksler, Zabih 2001) reduces k-label to a sequence of binary cooperative cuts (each solvable exactly), but only guarantees a local minimum. Our
TransMatrixis not metric, so no global approximation factor holds. For k=3 backends (growing to 6-7), the overhead of iterative cooperative cut calls on ~100-node graphs is not justified when PBQP R1/R2 reductions on the separable over-approximation give exact solutions in microseconds.Pure over-approximation does not distort the relative ranking between non-origin backends (the premium cancels in pairwise comparisons) but inflates the absolute cost of non-origin assignment, biasing toward origin backends. Intra-block dedup (implemented here) is the first mitigation; cross-block dedup remains an open problem for k > 2 on cyclic graphs.
:
🛡 What tests cover this?
BasicBlockCostAnalysiscovering: no vertex access, single origin/non-origin paths, multiple path accumulation, composite path expansion, restricted target domains, cross-block independenceTraversalAnalysisVisitorcovering path resolution from MIR projectionsEntityPathBitSetcovering composite swallowing, lattice operations, normalizationBasicBlockCostVec; newpath_premiums_influence_placementandprovenance_variants_produce_different_premiumsintegration testsmixed_postgres_embedding_interpretertest verifying three-backend splitting❓ How to test this?
cargo nextest run --package hashql-mircargo test --package hashql-mir --doc