Skip to content

Fix #2950: tolerate null equality state in ConfigurationCacheHackList input fingerprinting#2951

Open
diukarev wants to merge 1 commit into
diffplug:mainfrom
diukarev:fix-2950-configcachehacklist-null-equality-state
Open

Fix #2950: tolerate null equality state in ConfigurationCacheHackList input fingerprinting#2951
diukarev wants to merge 1 commit into
diffplug:mainfrom
diukarev:fix-2950-configcachehacklist-null-equality-state

Conversation

@diukarev
Copy link
Copy Markdown

Fixes #2950 (also relates to #2927).

Problem

In a clean environment (e.g. CI), spotlessGroovyGradle / any greclipse() step — and more generally any step wrapped in toggleOffOn() whose equality state is null — fails during Gradle's input fingerprinting:

java.lang.IllegalStateException: If the initializer was null, then one of roundtripStateInternal or equalityStateInternal should be non-null, and neither was
	at com.diffplug.spotless.FormatterStepSerializationRoundtrip.writeObject(FormatterStepSerializationRoundtrip.java:74)
	at com.diffplug.spotless.ConfigurationCacheHackList.hashCode(ConfigurationCacheHackList.java:151)
	at org.gradle.api.internal.tasks.execution.TaskExecution.visitMutableInputs(TaskExecution.java:336)

It reproduces with the configuration cache disabled and independent of the build cache, because the failure is in ConfigurationCacheHackList.hashCode() during task input snapshotting.

Root cause

ConfigurationCacheHackList.hashCode() serializes each step's HackClone. For an equality-optimized clone, HackClone.writeObject sets cleaned.equalityStateInternal = original.stateSupplier(). When that equality state is null (e.g. a FenceStep/toggleOffOn sub-step, or a greclipse step whose promised JAR/settings state isn't provisioned yet in a clean environment), the cleaned step has both roundtripStateInternal and equalityStateInternal null, and FormatterStepSerializationRoundtrip.writeObject throws.

An equality-optimized clone is only ever serialized for its cache-key bytes — it is never rehydrated for use (rehydration uses the roundtrip-optimized clones / original). A null state serializes deterministically, so it is a legitimate, safe value here.

Fix

Drop the guard for the initializer == null (clone) case in FormatterStepSerializationRoundtrip.writeObject. The normal-instance branch is unchanged.

Test

Adds ConfigurationCacheHackListTest, which fingerprints (ConfigurationCacheHackList.forEquality().hashCode()) a toggleOffOn() fence wrapping a step whose equality state is null. It fails on main with the exact exception above and passes with this change. Existing FenceStepTest and ToggleOffOnTest (with and without configuration cache) remain green.

…HackList

During Gradle's input fingerprinting, ConfigurationCacheHackList.hashCode()
serializes each step's HackClone. For an equality-optimized clone whose step
has a null equality state (e.g. greclipse / a toggleOffOn()-wrapped step that
is not yet provisioned in a clean CI environment), the cleaned
FormatterStepSerializationRoundtrip had both roundtripStateInternal and
equalityStateInternal null, so writeObject threw IllegalStateException and
broke the build during task input snapshotting.

An equality-optimized clone is only ever serialized for its cache-key bytes
(never rehydrated for use), and a null state serializes deterministically, so
this is safe. Drop the guard for that case.

Adds a regression test reproducing the failure via toggleOffOn() wrapping a
step whose equality state is null.
@diukarev
Copy link
Copy Markdown
Author

Root cause: regression introduced by #2944

This is a regression between 8.5.1 and 8.6.0 (lib 4.6.1 → 4.6.2). Notably, the serialization / cache-hack code (FormatterStepSerializationRoundtrip, ConfigurationCacheHackList, FenceStep) is unchanged across that range — so the guard isn't new. The only relevant change is #2944 ("Allow setting the local P2 cache dir"), the single greclipse-provisioning change in that release.

#2944 added a default P2 cache directory ($GRADLE_USER_HOME/caches/p2-data) and threads an effectiveCacheDirectory into the P2 provisioning request/key in GradleProvisioner. greclipse's equality state is resolved lazily from a promised JarState through that provisioner (whose cachedOnly path returns nothing / throws when the P2 bundles aren't provisioned yet). During Gradle's input fingerprinting — which runs before the task executes, and in a clean CI environment nothing is provisioned yet — that resolution yields a null/empty equality state for the greclipse step. Wrapped in toggleOffOn()'s ConfigurationCacheHackList, the null state then hits the pre-existing both-null guard in FormatterStepSerializationRoundtrip.writeObject.

That also explains why it only reproduces in clean CI: a warm local cache means the P2 bundles are already provisioned, so the equality state is non-null and nothing throws.

This PR fixes the symptom in the shared serialization layer, since a null equality state is legitimate and serializes deterministically. It may additionally be worth ensuring greclipse does not produce a null state during input fingerprinting (the #2944 path) — happy to follow up on that separately if you'd prefer the fix there as well.

@diukarev
Copy link
Copy Markdown
Author

Note: this fixes the symptom, not the root cause

To be transparent about scope: this PR removes the crash (the both-null guard in FormatterStepSerializationRoundtrip.writeObject), so a clone that ends up with neither internal state set serializes deterministically instead of throwing during Gradle's input fingerprinting. I believe that's correct and at the right layer — a degenerate/empty state should serialize rather than blow up the build, and the equality-optimized clone is only ever serialized for its cache-key bytes (never rehydrated for use). But it is a tolerate-the-bad-state fix, not a fix for why the clone becomes both-null.

What I have and haven't established:

  • Regression source: this started in 8.6.0 (lib 4.6.2). The serialization/cache-hack code (FormatterStepSerializationRoundtrip, ConfigurationCacheHackList, FenceStep) is unchanged across 4.6.1 → 4.6.2; the only relevant change is Allow setting the local P2 cache dir in the Spotless Gradle plugin #2944 (greclipse P2 cache-dir / GradleProvisioner). So Allow setting the local P2 cache dir in the Spotless Gradle plugin #2944 is what made this surface, but I did not change it.
  • Exact mechanism — not fully pinned (correcting my earlier comment): I said the greclipse step has a "null equality state," but that's imprecise — EclipseBasedStepBuilder's State object is never literally null (EclipseStep.state() always constructs a State, even if its inner jarState is unresolved). So the precise path by which the cleaned clone ends up with both internal states null is subtler than I stated, and I haven't proven it.
  • Why I stopped at the symptom: the failure only reproduces in a clean CI environment (warm local caches mask it), which I can't reproduce locally, so any greclipse-/#2944-side change would be unverified and speculative. The unit test in this PR reproduces the crash deterministically via a toggleOffOn() fence around a step whose equality state is null, which is a legitimate representative case, but it may not be byte-identical to greclipse's path.

So: happy for a maintainer to treat this as the defensive fix and separately decide whether greclipse should avoid producing a both-null clone during input fingerprinting (the #2944 path). I can help dig into the exact greclipse mechanism if you'd like a reproduction in the Spotless test harness.

@diukarev
Copy link
Copy Markdown
Author

Validated end-to-end in CI (where it actually reproduces)

Since this only reproduces in a clean CI environment, I validated the fix there rather than only via the unit test:

  • Built this patched branch as a local 8.6.x snapshot (8.6.x base + this fix) and used it in a real multi-module project whose groovyGradle { toggleOffOn(); greclipse() } build fails repo-wide on stock 8.6.0 with this exact exception (only in CI — warm local caches mask it).
  • On a clean CI runner (JDK 26, fresh GRADLE_USER_HOME), spotlessCheck — which runs :spotlessGroovyGradle across all modules — went green with the patched build, where stock 8.6.0 reliably fails.

So regardless of the exact internal path that makes the clone both-null, dropping the guard resolves the real-world failure end-to-end, not just the synthetic unit test.

@nedtwigg
Copy link
Copy Markdown
Member

so when out.defaultWriteObject() gets called at the bottom of writeObject, what gets written? If all the fields related to state are null, what is the state of the object?

The point of the state is equality for up-to-dateness checks. I don't see how they could work after this was merged, and I don't understand how #2944 could have caused this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in 8.6.0: groovyGradle/greclipse fails with "roundtripStateInternal or equalityStateInternal should be non-null" during input fingerprinting

2 participants