Fail closed prompt cache signatures for opaque, unstable, and recursive inputs#12936
Fail closed prompt cache signatures for opaque, unstable, and recursive inputs#12936xmarre wants to merge 52 commits intoComfy-Org:masterfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c1bfad0df
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughReimplements deterministic signature canonicalization via a new iterative, memoized 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 62-67: The current dict handling in _sanitize_signature_input
calls repr() on raw dict keys before sanitization which can execute unsafe
custom __repr__; change the logic to first sanitize each key and value (call
_sanitize_signature_input on keys), then sort by the sanitized key
representation (or use the sanitized key directly) before building the tuple of
pairs, so no raw key repr() is invoked; likewise in to_hashable(), remove the
explicit sort of original keys and instead build the frozenset from sanitized
key/value pairs (or rely on the frozenset of sanitized items) so sorting of raw
objects is not performed.
In `@comfy_execution/graph_utils.py`:
- Around line 13-14: The validator currently allows floats for the output index
(checks "type(obj[1]) not in (int, float)"), so change the check to only accept
integers: replace the condition with a strict int check (e.g., require
type(obj[1]) is int or use isinstance(obj[1], int) but avoid allowing float) so
that malformed links like ["node", 0.0] are rejected early; update the
validation around obj[1] in the function that contains this snippet to only
accept integer indices.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9146377b-3877-470f-a635-2c8f64664631
📒 Files selected for processing (2)
comfy_execution/caching.pycomfy_execution/graph_utils.py
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 88-127: The sanitizer currently only limits recursion depth and
can blow up on branched cyclic containers; modify _sanitize_signature_input to
accept or create a visited set (e.g., visited=None -> visited=set()), track
container identities via id(obj) before walking, and if id(obj) is already in
visited return Unhashable() immediately to break cycles; add id(obj) to visited
before recursing into dict/list/tuple/set/frozenset and remove it after
processing (or use a copy of visited for immutability) so shared but acyclic
structures still get processed; keep using Unhashable() for cycles and preserve
all existing behavior and return types for functions/methods named in the diff
(_sanitize_signature_input, _sanitized_sort_key, Unhashable).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: da468cb1-ef6e-4c6d-80c4-8b4a9c8c01f7
📒 Files selected for processing (2)
comfy_execution/caching.pycomfy_execution/graph_utils.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
comfy_execution/caching.py (1)
58-62: Consider removing unusedself.valuefield.The
Unhashableclass correctly relies on object identity for uniqueness (each instance is distinct, preventing false cache hits). However,self.value = float("NaN")is set but never used anywhere in the module — the class functions purely on Python's default identity-based__hash__and__eq__.If the NaN value was intended for debugging or future use, a comment clarifying its purpose would help; otherwise, removing it simplifies the class.
♻️ Suggested simplification
class Unhashable: """Hashable sentinel for values that cannot be represented safely in cache keys.""" def __init__(self): """Initialize a sentinel that stays hashable while never comparing equal.""" - self.value = float("NaN") + pass # Relies on object identity for uniqueness🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@comfy_execution/caching.py` around lines 58 - 62, The Unhashable class sets an unused instance attribute self.value = float("NaN") which is never referenced; remove that assignment from the Unhashable.__init__ method (or alternatively add a clarifying comment if the NaN was intentionally kept for debugging/future use) so the class relies solely on default identity-based equality and hashing (refer to the Unhashable class and its __init__ method to make the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@comfy_execution/caching.py`:
- Around line 58-62: The Unhashable class sets an unused instance attribute
self.value = float("NaN") which is never referenced; remove that assignment from
the Unhashable.__init__ method (or alternatively add a clarifying comment if the
NaN was intentionally kept for debugging/future use) so the class relies solely
on default identity-based equality and hashing (refer to the Unhashable class
and its __init__ method to make the change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1941f665-0c0d-44dd-af17-7f1f807f12b1
📒 Files selected for processing (1)
comfy_execution/caching.py
|
Follow-up testing found a remaining crash path that this PR does not fully fix yet. Current status:
My current suspicion is that this is not just a plain input-shape problem anymore, but may involve stale or partially mutated state being left behind after interruption and then reused during the next prompt's cache-signature generation. Because of that, I do not consider this fully resolved yet, and I do not want this PR merged in its current state. I'm leaving the PR open because the hardening here still appears valid and useful, but I want to keep testing and narrow down the remaining interrupt/requeue failure mode before this is ready. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 70-83: _shallow_is_changed_signature currently rejects nested
structured fingerprints (e.g. lists of tuples or dicts returned by
execution.IsChangedCache.get) by only allowing primitive types, causing valid
IS_CHANGED fingerprints to become Unhashable; update
_shallow_is_changed_signature to instead canonicalize/validate nested built-in
structures with a shallow, fail-closed canonicalizer (reuse the existing
canonicalizer function used elsewhere or call the project's
canonicalize/fail_closed routine) with a small depth and node budget so tuples,
lists, and dicts composed of stable built-ins (e.g., ("seed", 42) or {"cfg": 8})
are returned as canonical, hashable signatures while still returning
Unhashable() for opaque or deeply nested/unstable payloads; keep the same
container tagging logic (container_tag "is_changed_list"/"is_changed_tuple") and
ensure compatibility with execution.IsChangedCache.get’s list output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0f9fe9d7-6b0f-41a1-9e63-31afe3912c97
📒 Files selected for processing (3)
comfy_execution/caching.pytests-unit/execution_test/caching_test.pytests/execution/test_caching.py
|
Follow-up: I was able to reproduce another fatal prompt-setup crash after the previous revision. The new stack still ended in:
So the remaining bug was not raw live input traversal anymore, but the fact that This update removes that remaining outer recursive pass:
That keeps per-input fail-closed canonicalization intact, while removing the second recursive traversal from the crashing prompt-signature path. Added regression tests to verify that this path no longer invokes the outer signature canonicalizer and that deep / large pre-canonicalized fragments are preserved without being revisited. |
|
Follow-up: the previous revision still had one more bug in the new iterative I hit this traceback during prompt setup:
The issue was not another native crash. It was a plain Python bug in the iterative walker: the traversal stack variable could be rebound away from the pending-work list while walking dict-valued inputs, and later code still treated it like the work stack. This update fixes that by keeping traversal state in a dedicated I also added a regression test that exercises nested dict/list input canonicalization so this exact failure mode is covered. So at this point the prompt-signature path is doing all three of the things I was aiming for here:
|
|
Follow-up: I hit one more prompt-setup failure after the previous revision, and this one narrowed the remaining issue further into The stack ended in:
The remaining bad assumption was that plain built-in dicts were still treated as safe to recurse through for both keys and values. That is broader than the prompt-signature path should allow. For normal prompt-native data, dict keys should already be simple stable primitives. Recursing into arbitrary runtime objects stored as dict keys just reintroduces another unsafe traversal surface. This update tightens the dict path in
I also added a regression test covering opaque dict keys so this exact failure mode is exercised. So at this point the cache-signature hardening on this path is now also failing closed for hostile dict-key shapes instead of descending into them. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 466-474: The get_node_signature function must fail-closed when any
immediate fragment is tainted by the Unhashable sentinel: after computing each
await self.get_immediate_node_signature(dynprompt, ..., order_mapping) (both for
node_id and for each ancestor_id), check whether the returned value is the
Unhashable sentinel (referencing Unhashable) and if so immediately return/raise
the top-level Unhashable instead of appending it and forming a tuple; this
ensures get_node_signature (and callers relying on it) never embeds
identity-based Unhashable leaves into cache keys—modify get_node_signature to
detect Unhashable results from get_immediate_node_signature and propagate a
deterministic fail-closed Unhashable outcome.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6f0e9287-0866-4c32-94ea-fb9b37042d3a
📒 Files selected for processing (3)
comfy_execution/caching.pytests-unit/execution_test/caching_test.pytests/execution/test_caching.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/execution/test_caching.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 181-197: _signature_to_hashable_impl currently recurses into dict
keys which allows non-primitive keys; change it to reject non-primitive dict
keys the same way to_hashable does: before recursing on a dict key inside
_signature_to_hashable_impl, check if the key is a primitive (use the same
primitive-type check used by to_hashable (e.g., str/int/float/bool/bytes/None)
or a shared helper if available) and if it is not, return _FAILED_SIGNATURE
immediately instead of calling _signature_to_hashable_impl on the key so
dict-key acceptance is consistent with to_hashable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b9274397-e843-4a6f-b66c-d2dc1dd27aea
📒 Files selected for processing (3)
comfy_execution/caching.pytests-unit/execution_test/caching_test.pytests/execution/test_caching.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests-unit/execution_test/caching_test.py (1)
294-313: Test doesn't exercise the collision detection path it claims to test.The test uses
object()instances as dict keys, but_signature_to_hashable_implrejects non-primitive keys at line 182-183 before any recursive canonicalization. The monkeypatch forcolliding_key_canonicalizeis never invoked because the primitive-type check fails first. The test passes (returnsUnhashable), but due to non-primitive key rejection, not collision detection.To actually test key-sort collision detection, use primitive keys that naturally collide.
float('nan')works because NaN values are not equal to each other (so they can be distinct dict keys) but have identicalrepr():♻️ Suggested fix using NaN keys
-def test_signature_to_hashable_fails_closed_on_dict_key_sort_collisions_even_with_distinct_values(caching_module, monkeypatch): +def test_signature_to_hashable_fails_closed_on_dict_key_sort_collisions_even_with_distinct_values(caching_module): """Different values must not mask dict key-sort collisions during canonicalization.""" caching, _ = caching_module - original = caching._signature_to_hashable_impl - key_a = object() - key_b = object() - - def colliding_key_canonicalize(obj, *args, **kwargs): - """Force two distinct raw keys to share the same canonical sort key.""" - if obj is key_a: - return ("key-a", ("COLLIDE",)) - if obj is key_b: - return ("key-b", ("COLLIDE",)) - return original(obj, *args, **kwargs) - - monkeypatch.setattr(caching, "_signature_to_hashable_impl", colliding_key_canonicalize) - - sanitized = caching._signature_to_hashable({key_a: 1, key_b: 2}) + # NaN values are not equal to each other (can be distinct dict keys) + # but have identical repr() (will collide on sort key) + nan_a = float("nan") + nan_b = float("nan") + + sanitized = caching._signature_to_hashable({nan_a: 1, nan_b: 2}) assert isinstance(sanitized, caching.Unhashable)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests-unit/execution_test/caching_test.py` around lines 294 - 313, The test currently uses object() keys so the primitive-type fast-path rejects them before your monkeypatched colliding canonicalizer runs; change the test to use primitive NaN floats (e.g. key_a = float("nan"); key_b = float("nan")) so _signature_to_hashable_impl will be invoked for each key, and modify colliding_key_canonicalize to detect NaNs (via math.isnan(obj)) and return different canonical labels for the first vs second NaN (e.g. track seen NaNs with a small local state or counter and return ("key-a", ("COLLIDE",)) for the first seen NaN and ("key-b", ("COLLIDE",)) for the second), leaving other paths to call original(obj, *args, **kwargs); this will exercise the dict key-sort collision path in caching._signature_to_hashable_impl and ensure the test asserts Unhashable as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests-unit/execution_test/caching_test.py`:
- Around line 294-313: The test currently uses object() keys so the
primitive-type fast-path rejects them before your monkeypatched colliding
canonicalizer runs; change the test to use primitive NaN floats (e.g. key_a =
float("nan"); key_b = float("nan")) so _signature_to_hashable_impl will be
invoked for each key, and modify colliding_key_canonicalize to detect NaNs (via
math.isnan(obj)) and return different canonical labels for the first vs second
NaN (e.g. track seen NaNs with a small local state or counter and return
("key-a", ("COLLIDE",)) for the first seen NaN and ("key-b", ("COLLIDE",)) for
the second), leaving other paths to call original(obj, *args, **kwargs); this
will exercise the dict key-sort collision path in
caching._signature_to_hashable_impl and ensure the test asserts Unhashable as
intended.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cb6acd37-d78a-4f3f-8a28-c3e878ffb5cd
📒 Files selected for processing (2)
comfy_execution/caching.pytests-unit/execution_test/caching_test.py
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy_execution/caching.py`:
- Around line 485-487: The missing-node branch currently returns (float("NaN"),)
which introduces a self-unequal value into cache keys; instead, return the
fail-closed sentinel Unhashable() to indicate "cannot cache" consistently.
Modify the branch that checks dynprompt.has_node(node_id) in caching.py so that
when the node is absent it returns Unhashable() (the existing fail-closed
sentinel) rather than a NaN tuple; this keeps cache signatures stable and aligns
with other cache-miss paths.
In `@tests-unit/execution_test/caching_test.py`:
- Around line 279-302: The tests use _OpaqueValue() so they short-circuit on
non-primitive keys/values and don't exercise the new ambiguous-sort logic;
update test_to_hashable_fails_closed_for_ambiguous_dict_ordering and
test_signature_to_hashable_fails_closed_for_ambiguous_dict_ordering to use
primitive inputs (e.g., simple ints/strings) instead of _OpaqueValue(), and
force equal sort keys so the dict-key/dict-value ordering ambiguity path is
taken by monkeypatching or stubbing caching._primitive_signature_sort_key (for
to_hashable) and caching._sanitized_sort_key or the internal
_signature_to_hashable path (for _signature_to_hashable) to return an identical
sort key for both entries; ensure assertions still expect caching.Unhashable
from caching.to_hashable and caching._signature_to_hashable respectively.
- Around line 166-192: The dict variant in container_factory doesn't trigger the
RuntimeError path because `{marker: "value"}` puts marker in the key position;
change that variant to place the marker in the value (e.g. `lambda marker:
{"key": marker}`) so the raising_canonicalize wrapper (monkeypatched on
caching._signature_to_hashable_impl) is invoked when
caching._signature_to_hashable is called and the test still asserts an instance
of caching.Unhashable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 826959d0-10fc-4d3e-b2a8-872dbd1fb12b
📒 Files selected for processing (2)
comfy_execution/caching.pytests-unit/execution_test/caching_test.py
…heck Enforce shallow is_changed signature handling
|
Updated the The previous revision over-tightened I have now adjusted that path so it still does not use the generic
I also updated the tests to cover the corrected contract:
So the current behavior is narrower than the old generic path, but no longer “primitive-only”: |
Fix 2D tiled VAE encode memory admission estimation
Summary
This hardens prompt cache signature generation so ComfyUI fails closed when node inputs contain opaque runtime objects, mutated live containers, recursive or pathologically deep built-in structures, or otherwise non-canonical data during prompt setup.
The failure being addressed here happens during prompt setup, inside cache signature generation, before sampling starts.
The key change in the current version is that cache signatures for prompt inputs are now built in a single fail-closed canonicalization pass, instead of first sanitizing into an intermediate nested structure and then traversing that structure again in a second pass.
In addition, execution-time
is_changedvalues are no longer deep-canonicalized through the generic path. They are now reduced through a shallow primitive-only signature helper, and non-canonical or nested runtime payloads fail closed toUnhashable().Problem
get_immediate_node_signature()appends raw non-link input values into the cache signature, and it also incorporates execution-timeis_changedstate into the immediate-node signature path.Historically, the cache-signature path then walked nested values broadly enough that foreign runtime objects could be treated like prompt-safe structures. In complex custom-node workflows, that could lead to unsafe recursion or inspection during cache signature building.
Even plain built-in containers are not automatically safe. They can still be:
That is especially undesirable in core prompt setup code. Even if a custom node or execution wrapper feeds an unexpected object into inputs or
is_changedstate, ComfyUI should not destabilize while building cache keys.Observed behavior
In the affected workflow, ComfyUI could fail during prompt setup with a fatal crash in the cache/signature path, before sampling started.
The relevant stack ended in:
comfy_execution.caching.get_immediate_node_signaturecomfy_execution.caching.get_node_signaturecomfy_execution.caching.add_keyscomfy_execution.caching.set_promptFollow-up reproductions also showed that repeated queued generations could still crash when later prompts re-entered cache signature generation and touched problematic nested state.
Root cause
The cache signature path assumed node inputs and execution-time
is_changedvalues were composed of plain prompt-native values and stable built-in containers.In practice, custom nodes and execution wrappers can inject non-prompt-native objects into inputs or execution-time state during things like:
Those objects should not be recursively walked by the cache key builder.
In addition, even built-in containers can still be unsafe to traverse as live data structures if they are cyclic, excessively deep, shared in DAG-like ways, mutated during traversal, or ambiguously ordered.
I did not narrow this down to one specific extension or node as the definitive injector. The important issue at the core level is that prompt-signature generation should handle such inputs safely regardless of where they came from.
Changes
This patch makes five targeted changes.
1. Strict link detection
graph_utils.is_link()now only accepts plain Python lists in the expected[node_id, output_index]form, and it requires the output index to be a plain integer.This avoids treating foreign list-like objects as graph links and rejects malformed link shapes earlier.
2. Build prompt-input cache signatures in a single fail-closed canonicalization pass
For
CacheKeySetInputSignature, the assembled signature is now canonicalized directly into its final hashable representation in one pass.Instead of:
the current path now:
This removes the second deep traversal from the affected prompt-input cache-signature path.
3. Fail closed for unsafe or unstable signature input
The canonicalization path now returns
Unhashable()for the full signature when it encounters conditions such as:RuntimeErrorraised while iterating live containersThis avoids continuing deeper into suspicious nested state and prevents stale cache reuse based on unstable runtime data.
4. Stop deep canonicalization of execution-time
is_changedvaluesget_immediate_node_signature()no longer feeds rawis_changedpayloads into the generic deep canonicalization path.Instead,
is_changedvalues are reduced through a shallow signature helper:list/tuplevalues are preserved under explicitis_changed_list/is_changed_tupletagsUnhashable()immediatelyThis prevents execution-time
is_changedstate from reintroducing deep traversal over non-canonical runtime payloads.5. Harden legacy hash conversion for plain built-in containers
to_hashable()remains hardened for plain built-in containers and still:Unhashable()on traversal-timeRuntimeErrorIn addition, ordered and unordered containers now fail closed at the container level when child canonicalization fails, instead of preserving partially failed child sentinels inside the resulting structure.
This keeps the helper itself safer where it is still used directly, while the affected prompt-input cache-signature path no longer relies on it as a second-pass traversal step.
Why this approach
This keeps normal cache behavior intact for standard prompt data while hardening the signature builder against objects that should never have been traversed in the first place.
The goal here is not to declare those foreign or unstable objects valid prompt inputs. The goal is to make core caching logic robust when custom-node ecosystems do unusual things.
Failing closed to
Unhashable()is the safe fallback:is_changedpayloadsAt the same time, preserving built-in container type avoids introducing new cache-key collisions for normal inputs, and keeping
is_changed_list/is_changed_tupledistinct avoids collapsing execution-timeis_changedshapes into ordinary input-value signatures.Compatibility impact
This does not change normal behavior for prompt-native values, stable built-in containers, or shallow primitive-only
is_changedvalues.The change only affects cases where signature generation encounters opaque runtime objects, unstable live containers, recursive or pathologically deep built-ins, ambiguous container ordering, nested non-canonical
is_changedpayloads, or other inputs that cannot be canonicalized safely. In those cases, the cache builder now fails safe instead of continuing unsafe traversal.It also keeps cache keys type-stable for normal built-in containers, so type changes in user inputs do not get flattened into the same signature shape.
Testing
Tested against a complex custom-node workflow that includes dynamic graph behavior and execution-affecting extensions.
Validation performed:
RuntimeErrordegrades toUnhashable()is_changedlists remain hashableis_changedcontainers fail closedis_changedpayloads fail closed before deep canonicalizationUnhashable()Scope
This patch intentionally hardens the crash site in core rather than trying to special-case any one extension or workflow.
The core issue here is that cache signature generation should handle opaque, unstable, cyclic, ambiguous, pathologically deep, or otherwise non-canonical inputs safely instead of destabilizing prompt setup.