Skip to content

Fail closed prompt cache signatures for opaque, unstable, and recursive inputs#12936

Open
xmarre wants to merge 52 commits intoComfy-Org:masterfrom
xmarre:master
Open

Fail closed prompt cache signatures for opaque, unstable, and recursive inputs#12936
xmarre wants to merge 52 commits intoComfy-Org:masterfrom
xmarre:master

Conversation

@xmarre
Copy link
Copy Markdown

@xmarre xmarre commented Mar 14, 2026

Summary

This hardens prompt cache signature generation so ComfyUI fails closed when node inputs contain opaque runtime objects, mutated live containers, recursive or pathologically deep built-in structures, or otherwise non-canonical data during prompt setup.

The failure being addressed here happens during prompt setup, inside cache signature generation, before sampling starts.

The key change in the current version is that cache signatures for prompt inputs are now built in a single fail-closed canonicalization pass, instead of first sanitizing into an intermediate nested structure and then traversing that structure again in a second pass.

In addition, execution-time is_changed values are no longer deep-canonicalized through the generic path. They are now reduced through a shallow primitive-only signature helper, and non-canonical or nested runtime payloads fail closed to Unhashable().

Problem

get_immediate_node_signature() appends raw non-link input values into the cache signature, and it also incorporates execution-time is_changed state into the immediate-node signature path.

Historically, the cache-signature path then walked nested values broadly enough that foreign runtime objects could be treated like prompt-safe structures. In complex custom-node workflows, that could lead to unsafe recursion or inspection during cache signature building.

Even plain built-in containers are not automatically safe. They can still be:

  • cyclic
  • pathologically deep
  • shared across multiple branches
  • mutated while being traversed
  • ambiguously ordered when canonicalization depends on unstable live container state

That is especially undesirable in core prompt setup code. Even if a custom node or execution wrapper feeds an unexpected object into inputs or is_changed state, ComfyUI should not destabilize while building cache keys.

Observed behavior

In the affected workflow, ComfyUI could fail during prompt setup with a fatal crash in the cache/signature path, before sampling started.

The relevant stack ended in:

  • comfy_execution.caching.get_immediate_node_signature
  • comfy_execution.caching.get_node_signature
  • comfy_execution.caching.add_keys
  • comfy_execution.caching.set_prompt

Follow-up reproductions also showed that repeated queued generations could still crash when later prompts re-entered cache signature generation and touched problematic nested state.

Root cause

The cache signature path assumed node inputs and execution-time is_changed values were composed of plain prompt-native values and stable built-in containers.

In practice, custom nodes and execution wrappers can inject non-prompt-native objects into inputs or execution-time state during things like:

  • on-prompt graph rewriting
  • dynamic subgraph construction
  • execution wrapping

Those objects should not be recursively walked by the cache key builder.

In addition, even built-in containers can still be unsafe to traverse as live data structures if they are cyclic, excessively deep, shared in DAG-like ways, mutated during traversal, or ambiguously ordered.

I did not narrow this down to one specific extension or node as the definitive injector. The important issue at the core level is that prompt-signature generation should handle such inputs safely regardless of where they came from.

Changes

This patch makes five targeted changes.

1. Strict link detection

graph_utils.is_link() now only accepts plain Python lists in the expected [node_id, output_index] form, and it requires the output index to be a plain integer.

This avoids treating foreign list-like objects as graph links and rejects malformed link shapes earlier.

2. Build prompt-input cache signatures in a single fail-closed canonicalization pass

For CacheKeySetInputSignature, the assembled signature is now canonicalized directly into its final hashable representation in one pass.

Instead of:

  • sanitizing into an intermediate nested built-in structure, then
  • traversing that rebuilt structure again in a second hash-conversion pass

the current path now:

  • snapshots live built-in containers before descending into them
  • only recurses through plain built-in container types
  • preserves built-in container type in the final representation
  • memoizes shared built-in substructures
  • fails closed immediately when canonicalization becomes unsafe or ambiguous

This removes the second deep traversal from the affected prompt-input cache-signature path.

3. Fail closed for unsafe or unstable signature input

The canonicalization path now returns Unhashable() for the full signature when it encounters conditions such as:

  • opaque runtime objects
  • cycles
  • excessive depth
  • traversal budget exhaustion
  • ambiguous dict or set/frozenset ordering
  • RuntimeError raised while iterating live containers
  • live container mutation observed during snapshot/traversal-sensitive paths

This avoids continuing deeper into suspicious nested state and prevents stale cache reuse based on unstable runtime data.

4. Stop deep canonicalization of execution-time is_changed values

get_immediate_node_signature() no longer feeds raw is_changed payloads into the generic deep canonicalization path.

Instead, is_changed values are reduced through a shallow signature helper:

  • primitive values remain hashable directly
  • primitive-only list / tuple values are preserved under explicit is_changed_list / is_changed_tuple tags
  • nested containers, recursive payloads, and traversal-time failures degrade to Unhashable() immediately

This prevents execution-time is_changed state from reintroducing deep traversal over non-canonical runtime payloads.

5. Harden legacy hash conversion for plain built-in containers

to_hashable() remains hardened for plain built-in containers and still:

  • restricts recursion to plain built-in container types
  • preserves built-in container type in the resulting representation
  • handles shared built-in substructures more safely
  • canonicalizes dicts independently of insertion order when safe
  • fails closed for ambiguous unordered container ordering
  • degrades to Unhashable() on traversal-time RuntimeError

In addition, ordered and unordered containers now fail closed at the container level when child canonicalization fails, instead of preserving partially failed child sentinels inside the resulting structure.

This keeps the helper itself safer where it is still used directly, while the affected prompt-input cache-signature path no longer relies on it as a second-pass traversal step.

Why this approach

This keeps normal cache behavior intact for standard prompt data while hardening the signature builder against objects that should never have been traversed in the first place.

The goal here is not to declare those foreign or unstable objects valid prompt inputs. The goal is to make core caching logic robust when custom-node ecosystems do unusual things.

Failing closed to Unhashable() is the safe fallback:

  • no unsafe recursion into foreign container semantics
  • no reliance on live container state during canonicalization
  • no stale cache reuse based on unstable runtime data
  • no second deep traversal of suspicious intermediate structures
  • no deep canonicalization of non-canonical is_changed payloads
  • no crash in prompt setup

At the same time, preserving built-in container type avoids introducing new cache-key collisions for normal inputs, and keeping is_changed_list / is_changed_tuple distinct avoids collapsing execution-time is_changed shapes into ordinary input-value signatures.

Compatibility impact

This does not change normal behavior for prompt-native values, stable built-in containers, or shallow primitive-only is_changed values.

The change only affects cases where signature generation encounters opaque runtime objects, unstable live containers, recursive or pathologically deep built-ins, ambiguous container ordering, nested non-canonical is_changed payloads, or other inputs that cannot be canonicalized safely. In those cases, the cache builder now fails safe instead of continuing unsafe traversal.

It also keeps cache keys type-stable for normal built-in containers, so type changes in user inputs do not get flattened into the same signature shape.

Testing

Tested against a complex custom-node workflow that includes dynamic graph behavior and execution-affecting extensions.

Validation performed:

  • confirmed normal prompt-safe inputs still produce signatures
  • confirmed built-in container types remain distinguishable in cache signatures
  • confirmed foreign or opaque runtime objects fail closed instead of being recursively walked
  • confirmed malformed float output indices are rejected at link validation time
  • confirmed cyclic built-in containers fail closed safely
  • confirmed deeply nested built-in containers fail closed at the configured depth limit
  • confirmed shared built-in substructures are handled without repeated exponential traversal
  • confirmed ambiguous dict ordering fails closed
  • confirmed ambiguous set/frozenset ordering fails closed
  • confirmed live container snapshotting is used before recursive descent
  • confirmed traversal-time RuntimeError degrades to Unhashable()
  • confirmed primitive-only is_changed lists remain hashable
  • confirmed nested is_changed containers fail closed
  • confirmed recursive is_changed payloads fail closed before deep canonicalization
  • confirmed ordered containers with opaque children now degrade to top-level Unhashable()
  • added regression coverage for the hardening cases above
  • confirmed touched files still compile

Scope

This patch intentionally hardens the crash site in core rather than trying to special-case any one extension or workflow.

The core issue here is that cache signature generation should handle opaque, unstable, cyclic, ambiguous, pathologically deep, or otherwise non-canonical inputs safely instead of destabilizing prompt setup.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c1bfad0df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread comfy_execution/caching.py Outdated
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 83acba8b-a3a5-4e4a-8c25-c8f71fc63704

📥 Commits

Reviewing files that changed from the base of the PR and between ece9063 and 1548aee.

📒 Files selected for processing (1)
  • comfy/sd.py

📝 Walkthrough

Walkthrough

Reimplements deterministic signature canonicalization via a new iterative, memoized to_hashable(obj, max_nodes=...) with depth and visit limits and fail-closed Unhashable sentinel. Replaces Unhashable with a marker class. Adds signature/type constants and helpers (_shallow_is_changed_signature, _primitive_signature_sort_key, _sanitized_sort_key, _signature_to_hashable_impl). Changes CacheKeySetInputSignature.get_node_signature and get_immediate_node_signature to return tuples and propagate Unhashable. Tightens graph_utils.is_link to accept only real lists with a string first element and an int second element. Adds extensive unit tests for canonicalization, fail-closed behavior, and node-signature integration.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main focus of the PR: hardening cache signature generation to fail safely for problematic inputs (opaque, unstable, recursive data).
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the problem, root cause, specific changes, and testing performed for the cache signature hardening work.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_execution/caching.py`:
- Around line 62-67: The current dict handling in _sanitize_signature_input
calls repr() on raw dict keys before sanitization which can execute unsafe
custom __repr__; change the logic to first sanitize each key and value (call
_sanitize_signature_input on keys), then sort by the sanitized key
representation (or use the sanitized key directly) before building the tuple of
pairs, so no raw key repr() is invoked; likewise in to_hashable(), remove the
explicit sort of original keys and instead build the frozenset from sanitized
key/value pairs (or rely on the frozenset of sanitized items) so sorting of raw
objects is not performed.

In `@comfy_execution/graph_utils.py`:
- Around line 13-14: The validator currently allows floats for the output index
(checks "type(obj[1]) not in (int, float)"), so change the check to only accept
integers: replace the condition with a strict int check (e.g., require
type(obj[1]) is int or use isinstance(obj[1], int) but avoid allowing float) so
that malformed links like ["node", 0.0] are rejected early; update the
validation around obj[1] in the function that contains this snippet to only
accept integer indices.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9146377b-3877-470f-a635-2c8f64664631

📥 Commits

Reviewing files that changed from the base of the PR and between 16cd8d8 and 0c1bfad.

📒 Files selected for processing (2)
  • comfy_execution/caching.py
  • comfy_execution/graph_utils.py

Comment thread comfy_execution/caching.py Outdated
Comment thread comfy_execution/graph_utils.py Outdated
@xmarre xmarre changed the title Harden prompt cache signature building against foreign runtime objects in node inputs Harden prompt cache signature building against opaque and recursive inputs Mar 14, 2026
@xmarre xmarre changed the title Harden prompt cache signature building against opaque and recursive inputs Harden prompt cache signature building against foreign runtime objects in node inputs Mar 14, 2026
@xmarre xmarre changed the title Harden prompt cache signature building against foreign runtime objects in node inputs Harden prompt cache signature building against opaque and recursive inputs Mar 14, 2026
@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 14, 2026

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 14, 2026

✅ Actions performed

Reviews resumed.

@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 14, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_execution/caching.py`:
- Around line 88-127: The sanitizer currently only limits recursion depth and
can blow up on branched cyclic containers; modify _sanitize_signature_input to
accept or create a visited set (e.g., visited=None -> visited=set()), track
container identities via id(obj) before walking, and if id(obj) is already in
visited return Unhashable() immediately to break cycles; add id(obj) to visited
before recursing into dict/list/tuple/set/frozenset and remove it after
processing (or use a copy of visited for immutability) so shared but acyclic
structures still get processed; keep using Unhashable() for cycles and preserve
all existing behavior and return types for functions/methods named in the diff
(_sanitize_signature_input, _sanitized_sort_key, Unhashable).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: da468cb1-ef6e-4c6d-80c4-8b4a9c8c01f7

📥 Commits

Reviewing files that changed from the base of the PR and between 4d9516b and 1451001.

📒 Files selected for processing (2)
  • comfy_execution/caching.py
  • comfy_execution/graph_utils.py

Comment thread comfy_execution/caching.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
comfy_execution/caching.py (1)

58-62: Consider removing unused self.value field.

The Unhashable class correctly relies on object identity for uniqueness (each instance is distinct, preventing false cache hits). However, self.value = float("NaN") is set but never used anywhere in the module — the class functions purely on Python's default identity-based __hash__ and __eq__.

If the NaN value was intended for debugging or future use, a comment clarifying its purpose would help; otherwise, removing it simplifies the class.

♻️ Suggested simplification
 class Unhashable:
     """Hashable sentinel for values that cannot be represented safely in cache keys."""
     def __init__(self):
         """Initialize a sentinel that stays hashable while never comparing equal."""
-        self.value = float("NaN")
+        pass  # Relies on object identity for uniqueness
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_execution/caching.py` around lines 58 - 62, The Unhashable class sets
an unused instance attribute self.value = float("NaN") which is never
referenced; remove that assignment from the Unhashable.__init__ method (or
alternatively add a clarifying comment if the NaN was intentionally kept for
debugging/future use) so the class relies solely on default identity-based
equality and hashing (refer to the Unhashable class and its __init__ method to
make the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@comfy_execution/caching.py`:
- Around line 58-62: The Unhashable class sets an unused instance attribute
self.value = float("NaN") which is never referenced; remove that assignment from
the Unhashable.__init__ method (or alternatively add a clarifying comment if the
NaN was intentionally kept for debugging/future use) so the class relies solely
on default identity-based equality and hashing (refer to the Unhashable class
and its __init__ method to make the change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1941f665-0c0d-44dd-af17-7f1f807f12b1

📥 Commits

Reviewing files that changed from the base of the PR and between 31ba844 and 17863f6.

📒 Files selected for processing (1)
  • comfy_execution/caching.py

@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 14, 2026

Follow-up testing found a remaining crash path that this PR does not fully fix yet.

Current status:

  • the patch does improve stability for the original prompt-signature crash path
  • normal queueing behavior appears improved
  • however, interrupting a generation mid-run and then immediately queueing another prompt can still produce a fatal crash in the cache-signature path, with the stack ending in to_hashable()

My current suspicion is that this is not just a plain input-shape problem anymore, but may involve stale or partially mutated state being left behind after interruption and then reused during the next prompt's cache-signature generation.

Because of that, I do not consider this fully resolved yet, and I do not want this PR merged in its current state.

I'm leaving the PR open because the hardening here still appears valid and useful, but I want to keep testing and narrow down the remaining interrupt/requeue failure mode before this is ready.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_execution/caching.py`:
- Around line 70-83: _shallow_is_changed_signature currently rejects nested
structured fingerprints (e.g. lists of tuples or dicts returned by
execution.IsChangedCache.get) by only allowing primitive types, causing valid
IS_CHANGED fingerprints to become Unhashable; update
_shallow_is_changed_signature to instead canonicalize/validate nested built-in
structures with a shallow, fail-closed canonicalizer (reuse the existing
canonicalizer function used elsewhere or call the project's
canonicalize/fail_closed routine) with a small depth and node budget so tuples,
lists, and dicts composed of stable built-ins (e.g., ("seed", 42) or {"cfg": 8})
are returned as canonical, hashable signatures while still returning
Unhashable() for opaque or deeply nested/unstable payloads; keep the same
container tagging logic (container_tag "is_changed_list"/"is_changed_tuple") and
ensure compatibility with execution.IsChangedCache.get’s list output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0f9fe9d7-6b0f-41a1-9e63-31afe3912c97

📥 Commits

Reviewing files that changed from the base of the PR and between f1d91a4 and fce22da.

📒 Files selected for processing (3)
  • comfy_execution/caching.py
  • tests-unit/execution_test/caching_test.py
  • tests/execution/test_caching.py

Comment thread comfy_execution/caching.py Outdated
@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 16, 2026

Follow-up: I was able to reproduce another fatal prompt-setup crash after the previous revision.

The new stack still ended in:

  • comfy_execution.caching._signature_to_hashable_impl
  • comfy_execution.caching._signature_to_hashable
  • comfy_execution.caching.get_node_signature

So the remaining bug was not raw live input traversal anymore, but the fact that CacheKeySetInputSignature.get_node_signature() still ran _signature_to_hashable(...) over the already-canonicalized fragments returned by get_immediate_node_signature().

This update removes that remaining outer recursive pass:

  • get_immediate_node_signature() now returns tuple fragments
  • get_node_signature() now returns a tuple of those fragments directly

That keeps per-input fail-closed canonicalization intact, while removing the second recursive traversal from the crashing prompt-signature path.

Added regression tests to verify that this path no longer invokes the outer signature canonicalizer and that deep / large pre-canonicalized fragments are preserved without being revisited.

@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 16, 2026

Follow-up: the previous revision still had one more bug in the new iterative to_hashable() path.

I hit this traceback during prompt setup:

  • comfy_execution.caching.get_immediate_node_signature
  • comfy_execution.caching.to_hashable
  • AttributeError: 'dict' object has no attribute 'append'

The issue was not another native crash. It was a plain Python bug in the iterative walker: the traversal stack variable could be rebound away from the pending-work list while walking dict-valued inputs, and later code still treated it like the work stack.

This update fixes that by keeping traversal state in a dedicated work_stack variable so it stays separate from container snapshots / result state throughout the walk.

I also added a regression test that exercises nested dict/list input canonicalization so this exact failure mode is covered.

So at this point the prompt-signature path is doing all three of the things I was aiming for here:

  • fail-closed canonicalization of hostile input values
  • no outer re-walk of already-canonicalized immediate signature fragments for CacheKeySetInputSignature
  • no accidental rebinding of the iterative traversal worklist inside to_hashable()

@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 16, 2026

Follow-up: I hit one more prompt-setup failure after the previous revision, and this one narrowed the remaining issue further into to_hashable()’s dict handling.

The stack ended in:

  • comfy_execution.caching.to_hashable
  • comfy_execution.caching.get_immediate_node_signature
  • comfy_execution.caching.get_node_signature

The remaining bad assumption was that plain built-in dicts were still treated as safe to recurse through for both keys and values. That is broader than the prompt-signature path should allow.

For normal prompt-native data, dict keys should already be simple stable primitives. Recursing into arbitrary runtime objects stored as dict keys just reintroduces another unsafe traversal surface.

This update tightens the dict path in to_hashable() so that:

  • dict keys must already be primitive signature types
  • non-primitive dict keys fail closed to Unhashable()
  • only dict values are traversed recursively
  • the canonicalized dict representation keeps the primitive keys directly instead of trying to resolve/traverse them

I also added a regression test covering opaque dict keys so this exact failure mode is exercised.

So at this point the cache-signature hardening on this path is now also failing closed for hostile dict-key shapes instead of descending into them.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_execution/caching.py`:
- Around line 466-474: The get_node_signature function must fail-closed when any
immediate fragment is tainted by the Unhashable sentinel: after computing each
await self.get_immediate_node_signature(dynprompt, ..., order_mapping) (both for
node_id and for each ancestor_id), check whether the returned value is the
Unhashable sentinel (referencing Unhashable) and if so immediately return/raise
the top-level Unhashable instead of appending it and forming a tuple; this
ensures get_node_signature (and callers relying on it) never embeds
identity-based Unhashable leaves into cache keys—modify get_node_signature to
detect Unhashable results from get_immediate_node_signature and propagate a
deterministic fail-closed Unhashable outcome.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f0e9287-0866-4c32-94ea-fb9b37042d3a

📥 Commits

Reviewing files that changed from the base of the PR and between 6158cd5 and 1a00f77.

📒 Files selected for processing (3)
  • comfy_execution/caching.py
  • tests-unit/execution_test/caching_test.py
  • tests/execution/test_caching.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/execution/test_caching.py

Comment thread comfy_execution/caching.py
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_execution/caching.py`:
- Around line 181-197: _signature_to_hashable_impl currently recurses into dict
keys which allows non-primitive keys; change it to reject non-primitive dict
keys the same way to_hashable does: before recursing on a dict key inside
_signature_to_hashable_impl, check if the key is a primitive (use the same
primitive-type check used by to_hashable (e.g., str/int/float/bool/bytes/None)
or a shared helper if available) and if it is not, return _FAILED_SIGNATURE
immediately instead of calling _signature_to_hashable_impl on the key so
dict-key acceptance is consistent with to_hashable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b9274397-e843-4a6f-b66c-d2dc1dd27aea

📥 Commits

Reviewing files that changed from the base of the PR and between 1a00f77 and ce05e37.

📒 Files selected for processing (3)
  • comfy_execution/caching.py
  • tests-unit/execution_test/caching_test.py
  • tests/execution/test_caching.py

Comment thread comfy_execution/caching.py
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests-unit/execution_test/caching_test.py (1)

294-313: Test doesn't exercise the collision detection path it claims to test.

The test uses object() instances as dict keys, but _signature_to_hashable_impl rejects non-primitive keys at line 182-183 before any recursive canonicalization. The monkeypatch for colliding_key_canonicalize is never invoked because the primitive-type check fails first. The test passes (returns Unhashable), but due to non-primitive key rejection, not collision detection.

To actually test key-sort collision detection, use primitive keys that naturally collide. float('nan') works because NaN values are not equal to each other (so they can be distinct dict keys) but have identical repr():

♻️ Suggested fix using NaN keys
-def test_signature_to_hashable_fails_closed_on_dict_key_sort_collisions_even_with_distinct_values(caching_module, monkeypatch):
+def test_signature_to_hashable_fails_closed_on_dict_key_sort_collisions_even_with_distinct_values(caching_module):
     """Different values must not mask dict key-sort collisions during canonicalization."""
     caching, _ = caching_module
-    original = caching._signature_to_hashable_impl
-    key_a = object()
-    key_b = object()
-
-    def colliding_key_canonicalize(obj, *args, **kwargs):
-        """Force two distinct raw keys to share the same canonical sort key."""
-        if obj is key_a:
-            return ("key-a", ("COLLIDE",))
-        if obj is key_b:
-            return ("key-b", ("COLLIDE",))
-        return original(obj, *args, **kwargs)
-
-    monkeypatch.setattr(caching, "_signature_to_hashable_impl", colliding_key_canonicalize)
-
-    sanitized = caching._signature_to_hashable({key_a: 1, key_b: 2})
+    # NaN values are not equal to each other (can be distinct dict keys)
+    # but have identical repr() (will collide on sort key)
+    nan_a = float("nan")
+    nan_b = float("nan")
+
+    sanitized = caching._signature_to_hashable({nan_a: 1, nan_b: 2})

     assert isinstance(sanitized, caching.Unhashable)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests-unit/execution_test/caching_test.py` around lines 294 - 313, The test
currently uses object() keys so the primitive-type fast-path rejects them before
your monkeypatched colliding canonicalizer runs; change the test to use
primitive NaN floats (e.g. key_a = float("nan"); key_b = float("nan")) so
_signature_to_hashable_impl will be invoked for each key, and modify
colliding_key_canonicalize to detect NaNs (via math.isnan(obj)) and return
different canonical labels for the first vs second NaN (e.g. track seen NaNs
with a small local state or counter and return ("key-a", ("COLLIDE",)) for the
first seen NaN and ("key-b", ("COLLIDE",)) for the second), leaving other paths
to call original(obj, *args, **kwargs); this will exercise the dict key-sort
collision path in caching._signature_to_hashable_impl and ensure the test
asserts Unhashable as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests-unit/execution_test/caching_test.py`:
- Around line 294-313: The test currently uses object() keys so the
primitive-type fast-path rejects them before your monkeypatched colliding
canonicalizer runs; change the test to use primitive NaN floats (e.g. key_a =
float("nan"); key_b = float("nan")) so _signature_to_hashable_impl will be
invoked for each key, and modify colliding_key_canonicalize to detect NaNs (via
math.isnan(obj)) and return different canonical labels for the first vs second
NaN (e.g. track seen NaNs with a small local state or counter and return
("key-a", ("COLLIDE",)) for the first seen NaN and ("key-b", ("COLLIDE",)) for
the second), leaving other paths to call original(obj, *args, **kwargs); this
will exercise the dict key-sort collision path in
caching._signature_to_hashable_impl and ensure the test asserts Unhashable as
intended.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb6acd37-d78a-4f3f-8a28-c3e878ffb5cd

📥 Commits

Reviewing files that changed from the base of the PR and between ce05e37 and 6e3bd33.

📒 Files selected for processing (2)
  • comfy_execution/caching.py
  • tests-unit/execution_test/caching_test.py

@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 17, 2026

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Reviews resumed.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_execution/caching.py`:
- Around line 485-487: The missing-node branch currently returns (float("NaN"),)
which introduces a self-unequal value into cache keys; instead, return the
fail-closed sentinel Unhashable() to indicate "cannot cache" consistently.
Modify the branch that checks dynprompt.has_node(node_id) in caching.py so that
when the node is absent it returns Unhashable() (the existing fail-closed
sentinel) rather than a NaN tuple; this keeps cache signatures stable and aligns
with other cache-miss paths.

In `@tests-unit/execution_test/caching_test.py`:
- Around line 279-302: The tests use _OpaqueValue() so they short-circuit on
non-primitive keys/values and don't exercise the new ambiguous-sort logic;
update test_to_hashable_fails_closed_for_ambiguous_dict_ordering and
test_signature_to_hashable_fails_closed_for_ambiguous_dict_ordering to use
primitive inputs (e.g., simple ints/strings) instead of _OpaqueValue(), and
force equal sort keys so the dict-key/dict-value ordering ambiguity path is
taken by monkeypatching or stubbing caching._primitive_signature_sort_key (for
to_hashable) and caching._sanitized_sort_key or the internal
_signature_to_hashable path (for _signature_to_hashable) to return an identical
sort key for both entries; ensure assertions still expect caching.Unhashable
from caching.to_hashable and caching._signature_to_hashable respectively.
- Around line 166-192: The dict variant in container_factory doesn't trigger the
RuntimeError path because `{marker: "value"}` puts marker in the key position;
change that variant to place the marker in the value (e.g. `lambda marker:
{"key": marker}`) so the raising_canonicalize wrapper (monkeypatched on
caching._signature_to_hashable_impl) is invoked when
caching._signature_to_hashable is called and the test still asserts an instance
of caching.Unhashable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 826959d0-10fc-4d3e-b2a8-872dbd1fb12b

📥 Commits

Reviewing files that changed from the base of the PR and between ce05e37 and c1ce002.

📒 Files selected for processing (2)
  • comfy_execution/caching.py
  • tests-unit/execution_test/caching_test.py

Comment thread comfy_execution/caching.py Outdated
Comment thread tests-unit/execution_test/caching_test.py
Comment thread tests-unit/execution_test/caching_test.py Outdated
@xmarre
Copy link
Copy Markdown
Author

xmarre commented Mar 18, 2026

Updated the is_changed handling since my last PR comment.

The previous revision over-tightened _shallow_is_changed_signature() to a primitive-only list / tuple check. That avoided routing execution-time is_changed payloads back through the generic to_hashable() path, but it also regressed valid structured builtin fingerprints (for example lists containing tuples or dicts).

I have now adjusted that path so it still does not use the generic to_hashable() helper, but it also no longer rejects stable structured builtins outright:

  • _shallow_is_changed_signature() now returns Unhashable() immediately for non-container opaque runtime objects
  • plain builtin containers are reduced through the fail-closed builtin canonicalizer (_signature_to_hashable(..., max_nodes=64)) instead of the generic to_hashable() path
  • top-level list / tuple is_changed payloads still keep the explicit is_changed_list / is_changed_tuple tags so they stay distinct from ordinary input-value signatures
  • opaque, recursive, ambiguous, excessively deep, or otherwise unsafe builtin payloads still fail closed to Unhashable()

I also updated the tests to cover the corrected contract:

  • primitive-only is_changed lists remain hashable
  • primitive-only is_changed tuples remain hashable
  • structured builtin fingerprints such as [("seed", 42), {"cfg": 8}] remain cacheable
  • _shallow_is_changed_signature() no longer calls to_hashable() on this path

So the current behavior is narrower than the old generic path, but no longer “primitive-only”:
is_changed / fingerprint_inputs values can still use stable builtin structured payloads, while execution-time state no longer re-enters the generic cache-signature conversion path that this PR is trying to harden.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant