Support graceful script isolation fallback when filesystem does not support locking#1838
Support graceful script isolation fallback when filesystem does not support locking#1838rhysparry wants to merge 29 commits intorhys/eft-251/script-isolationfrom
Conversation
…quire delegate CachedDriveInfo is extracted from its nested position inside LockDirectory into its own file. DetectLockSupport gains an overload that accepts a Func<LockOptions, ILockHandle> delegate in place of the direct FileLock.Acquire call, making the four lock-probe tests (exclusive, shared, exclusive-blocks-shared, shared-blocks-exclusive) unit-testable without real filesystem operations. The existing public overload forwards to FileLock.Acquire.
MountedDrives is extracted from its nested position inside LockDirectory into its own file with no logic changes.
The public GetLockDirectory(string) now delegates to an internal overload that accepts a MountedDrives instance and an optional acquire delegate. This allows tests to supply controlled drive lists and lock-acquisition behaviour without real filesystem probing. The refactor also closes a design gap: GetAssociatedDrive calls that throw DirectoryNotFoundException (e.g. when MountedDrives is empty or a path does not sit under any known drive root) are now caught, and the affected path is treated as having no known capability rather than propagating the exception.
The fallback decision logic (routing between IsFullySupported, IsSupported, PromoteToExclusiveLock, and returning null) is separated into an internal static ResolveLockOptions(LockOptions, bool) method. PrepareLockOptions becomes a thin wrapper that builds the options, logs, then delegates. This makes all fallback branches unit-testable by constructing a LockOptions with a controlled LockCapability, without any filesystem access.
…metic tests GetLockDirectory tests against non-existent paths (e.g. /home/octopus/tentacle) were failing because DetectLockSupport called Directory.CreateDirectory unconditionally before probing, causing the outer catch to return Unsupported instead of running the fake lock service. Thread a createDirectory Action through GetLockDirectory and DetectLockSupport (defaulting to the real implementation), and pass a no-op in the two affected Group D fixture tests.
The two delegates (acquireDelegate and createDirectory) passed through GetLockDirectory and DetectLockSupport were always used together and represented a single concern: interacting with the filesystem during lock-support probing. Introducing IFileLockService makes that coupling explicit and gives tests a single seam to inject. - Add IFileLockService with AcquireLock and CreateDirectory - Add FileLockService (singleton) as the real implementation - Collapse the multi-overload chain in CachedDriveInfo.DetectLockSupport and LockDirectory.GetLockDirectory to a single IFileLockService parameter - FakeLockService in LockDirectoryFixture now implements IFileLockService (CreateDirectory is a no-op; Acquire renamed to AcquireLock)
…oryFallback Move GetTemporaryCandidates logic into a new ITemporaryDirectoryFallback interface with real implementation TemporaryDirectoryFallback (singleton). This allows the internal GetLockDirectory overload to accept an injectable fallback, making tests hermetic and independent of environment variables ($TMPDIR) and filesystem existence checks (/tmp, /dev/shm). Also fix bug: the /dev/shm candidate was incorrectly joined under /tmp instead of /dev/shm. Add FakeTemporaryDirectoryFallback to tests returning a fixed list of candidates. All Group D tests now inject this fake, removing any dependency on $TMPDIR, /tmp, and /dev/shm. Add 2 new Group D tests: - Verify temp candidates with no matching drive are silently skipped - Verify multiple temp candidates: the first Supported one is chosen All 89 tests pass.
Consolidate ITemporaryDirectoryFallback into IFileLockService by adding GetFallbackTemporaryDirectories. This removes the separate interface and its injectable seam, simplifying the design: callers only need to inject one service. TemporaryDirectoryFallback becomes a static helper used exclusively by FileLockService. Tests updated to use FakeLockService with temp directory params instead of the now-deleted FakeTemporaryDirectoryFallback.
…d platform-aware path comparison Introduce IPathResolutionService (+ DefaultPathResolutionService) to address three gaps in the original prefix-matching logic: 1. Symlink resolution: input paths are now resolved (via BCL FileInfo.ResolveLinkTarget) before prefix-matching against drive roots. This fixes the macOS /tmp → /private/tmp case where DriveInfo returns /private/tmp as the mount root but callers may pass /tmp/foo. 2. Platform-aware case comparison: StringComparison is now sourced from IPathResolutionService.PathComparison (OrdinalIgnoreCase on Windows/macOS, Ordinal on Linux) rather than hardcoded OrdinalIgnoreCase. 3. Path normalisation: Path.GetFullPath is applied before any symlink resolution, eliminating relative paths and .. components. The IPathResolutionService seam keeps the production change entirely in DefaultPathResolutionService (a thin BCL wrapper), while a FakePathResolutionService in the test fixture allows all new Group E tests to run hermeticly without real symlinks or filesystem state.
All Group D tests in LockDirectoryFixture used fake paths (e.g. /home/octopus/tentacle, /tmp/tentacle) that don't exist on macOS. DefaultPathResolutionService would walk up ancestors and resolve /tmp to /private/tmp, causing IsAncestor to fail against the /tmp drive root. Added pathResolver: FakePathResolutionService.PassThrough to all Group D test calls so they bypass real filesystem resolution, matching the intent of hermetic unit tests.
…gic to extension method IPathResolutionService now exposes only two thin BCL delegates — GetFullPath (Path.GetFullPath) and ResolveLinkTarget (FileInfo.ResolveLinkTarget) — plus PathComparison. The ancestor-walk resolution logic is extracted into PathResolutionServiceExtensions.ResolvePath so it can be unit-tested in isolation via FakePathResolutionService. DefaultPathResolutionService becomes a trivial wrapper with no logic of its own. FakePathResolutionService is updated to implement the new interface with separate fullPathMap (normalisation) and symlinkMap (symlink simulation) dictionaries, and the PassThrough preset throws FileNotFoundException from ResolveLinkTarget so the extension method falls back to returning the GetFullPath result unchanged. Added Group G tests (6 new) that directly exercise ResolvePath extension method logic: non-existent path fallback, existing non-symlink, direct symlink, ancestor-symlink walk, GetFullPath pre-normalisation, and multi-segment tail re-attachment.
…lock others Path.GetFullPath(string) documents five exception types that can be thrown for invalid, inaccessible, or oversized paths: ArgumentException, ArgumentNullException, SecurityException, NotSupportedException, and PathTooLongException. Previously these propagated out of ResolvePath, which would have aborted the entire drive-selection loop in LockDirectory for all remaining candidate and temp paths. ResolvePath now catches all five and returns the original raw path as a safe fallback, preserving the invariant that one malformed path cannot prevent other options from being evaluated. Changes: - IPathResolutionService.GetFullPath: added <exception> docs for all five exception types - PathResolutionServiceExtensions.ResolvePath: added try/catch around the GetFullPath call; updated XML docs to describe the fallback behaviour - FakePathResolutionService: added getFullPathException constructor parameter to inject any of the five exception types for testing - Added 5 parameterised tests (one per exception type) in LockDirectoryFixture
dea3c2a to
9ce1b18
Compare
On Linux, FileInfo.ResolveLinkTarget throws UnauthorizedAccessException (wrapping IOException: Permission denied) for paths whose parent directories exist but are not accessible to the current user. The ancestor-walk loop in ResolvePath only caught FileNotFoundException, DirectoryNotFoundException, and IOException; UnauthorizedAccessException is not a subclass of IOException so it escaped the handler and propagated to callers. Add UnauthorizedAccessException to the catch filter so that inaccessible path components are treated the same as non-existent ones — the walk peels off the last segment and retries the parent.
The previous hardcoded paths (C:\Octopus\Tentacle / /home/octopus/tentacle) do not exist on developer or CI machines and cause UnauthorizedAccessException on Linux when ResolveLinkTarget tries to inspect an inaccessible parent directory. Switching to Path.GetTempPath() guarantees the path exists and is accessible on all platforms, making the fixture hermetic without requiring any mock infrastructure.
The LockDirectory used to probe filesystem lock support was constructed with LockCapability.Unknown, causing LockFile.Open to throw NotSupportedException before any flock() attempt was made. This meant every filesystem requiring runtime detection was classified as Unsupported rather than being probed. Fix: construct the probe LockDirectory with LockCapability.Supported so the probe calls can actually reach the kernel.
gb-8
left a comment
There was a problem hiding this comment.
I've had a scan of the PR and while the approach seems good, and the test coverage seems good, I did find the code hard to follow.
I had to really work to follow the logic through, and see exactly where the determination of locking support was performed.
The logic is all hidden inside things that don't make it clear what they are doing.
Once you are happy that the functionality is all correct, then I'd like to pair on seeing if we can refactor to make it more readable and understandable.
|
|
||
| public enum LockCapability | ||
| { | ||
| Unknown, |
There was a problem hiding this comment.
Claude reckons we are using these values wrongly, since when we are unable to determine whether locking is supported, we fallback on Unsupported when it thinks it technically should be Unknown.
I think we are actually using Unknown to mean Untested, so could rename it as such.
|
|
||
| public enum LockCapability | ||
| { | ||
| Unknown, |
There was a problem hiding this comment.
I think it is good practice to set the default value explicitly to 0.
| yield return ApplyNamespace(tmp); | ||
| } | ||
|
|
||
| const string devShm = "/dev/shm"; |
There was a problem hiding this comment.
I wonder if we should actually preference /dev/shm over other locations, as it has the nice behaviour that locks will automatically get cleaned up on reboot.
Script isolation lock directory fallback for filesystems without full locking support
Motivation
Calamari's script isolation mechanism uses filesystem file locks to serialise or permit concurrent script execution. The lock file has historically been written into the Tentacle working directory (
TentacleHome).On some deployment targets the working directory resides on a filesystem that does not fully support file locking semantics — NFS mounts, certain FUSE-based filesystems, and some SMB shares being common examples. On these filesystems, lock acquisition either silently succeeds when it should block, or throws errors that are not meaningful to the caller. The result is that script isolation is silently broken: scripts that should be mutually exclusive may run concurrently, or scripts that should run in parallel are incorrectly serialised (or errored).
This PR introduces runtime filesystem lock capability detection and an automatic fallback to a temporary directory that does support locking, so that script isolation remains as strong as the host OS allows — without requiring any manual configuration from operators.
What changed
New abstractions
ScriptIsolation/LockCapability.csUnknown,Unsupported,ExclusiveOnly,SupportedScriptIsolation/LockFile.csLockDirectorywith a specific file pathScriptIsolation/LockDirectory.csLockCapability; hosts theGetLockDirectoryfallback algorithmScriptIsolation/CachedDriveInfo.csScriptIsolation/MountedDrives.csScriptIsolation/TemporaryDirectoryFallback.csScriptIsolation/IFileLockService.cs+FileLockService.csCreateDirectory,AcquireLock, andGetFallbackTemporaryDirectories— enables hermetic unit testsScriptIsolation/IPathResolutionService.cs+DefaultPathResolutionService.cs+PathResolutionServiceExtensions.csPath.GetFullPathand symlink resolution — necessary to correctly match paths on macOS where/tmp → /private/tmpModified files
ScriptIsolation/Isolation.csPrepareLockOptionsandResolveLockOptions— the degradation policy when a directory with full lock support cannot be foundScriptIsolation/LockOptions.csIsFullySupported/IsSupported/Supports(LockType)toLockFile; addsFromScriptIsolationOptionsOrNullpath throughLockDirectory.GetLockDirectoryScriptIsolation/FileLock.csLockFilerecord shapePlumbing/Commands/CommonOptions.csscriptIsolationSharedFallbackCLI option andPromoteToExclusiveLockWhenSharedLockUnavailablepropertyDecision trees
Phase 1 — Lock directory selection (
LockDirectory.GetLockDirectory)flowchart TD Start([Candidate path]) --> MapDrive[Map candidate to its drive] MapDrive --> CandidateKnown{Candidate LockSupport?} CandidateKnown -- Supported --> ReturnCandidate([Return candidate\nLockCapability.Supported]) CandidateKnown -- Unknown or Unsupported --> ProbeCandidateDrive ProbeCandidateDrive[DetectLockSupport on candidate drive] ProbeCandidateDrive --> CandidateProbeResult{Candidate\ndetected capability?} CandidateProbeResult -- Supported --> ReturnCandidate2([Return candidate\nLockCapability.Supported]) CandidateProbeResult -- ExclusiveOnly or Unsupported or Unknown --> IterateFallbacks IterateFallbacks[Iterate temp fallback directories] IterateFallbacks --> HasTemp{More temp\ncandidates?} HasTemp -- No --> AnyExclusiveTemp HasTemp -- Yes --> MapTempDrive[Map temp dir to its drive] MapTempDrive --> ProbeTemp[DetectLockSupport on temp drive] ProbeTemp --> TempResult{Temp capability?} TempResult -- Supported --> ReturnTemp([Return temp dir\nLockCapability.Supported]) TempResult -- ExclusiveOnly --> RecordFirstExclusive[Record as best-so-far\nif none recorded] TempResult -- Unsupported or Unknown --> HasTemp AnyExclusiveTemp{Any ExclusiveOnly\ntemp found?} AnyExclusiveTemp -- Yes, and candidate is ExclusiveOnly --> ReturnCandidate3([Return candidate\nLockCapability.ExclusiveOnly]) AnyExclusiveTemp -- Yes, and candidate is Unsupported/Unknown --> ReturnBestTemp([Return best temp dir\nLockCapability.ExclusiveOnly]) AnyExclusiveTemp -- No --> ReturnUnsupported([Return candidate\nLockCapability.Unsupported])Phase 2 — Lock options resolution (
Isolation.ResolveLockOptions)flowchart TD Start([LockOptions\nType + LockDirectory capability]) --> FullySupported{IsFullySupported?\nShared + Exclusive work\ncorrectly} FullySupported -- Yes --> ReturnOriginal([Use requested lock\nNo warning]) FullySupported -- No --> IsSupported{IsSupported?\nRequested LockType\nis supported} IsSupported -- Yes, Exclusive requested --> PromoteFlag1{promoteToExclusiveLock?} PromoteFlag1 -- false --> ReturnExclusiveWarn([Use Exclusive lock\nWarn: shared scripts\nmay run concurrently]) PromoteFlag1 -- true --> ReturnExclusiveNoWarn([Use Exclusive lock\nNo warning]) IsSupported -- No, Shared requested\non ExclusiveOnly dir --> ExclusiveAvail{Supports\nExclusive?} ExclusiveAvail -- Yes --> PromoteFlag2{promoteToExclusiveLock?} PromoteFlag2 -- true --> PromoteToExclusive([Promote to Exclusive lock\nWarn: shared unavailable]) PromoteFlag2 -- false --> NoLockShared([No lock acquired\nWarn: shared unavailable]) ExclusiveAvail -- No --> NoLockUnsupported([No lock acquired\nWarn: no isolation available])Areas especially relevant for review
CachedDriveInfo.DetectLockSupport— live filesystem probingFour sequential probe operations are run against a temporary
.tmpfile in the lock directory to empirically verify lock semantics. The probes useTimeSpan.Zerotimeout so they fail fast if the filesystem rejects the operation. The test file is always cleaned up in afinallyblock. Any exception during probing is treated conservatively asUnsupported. Reviewers should check whether the probe-file cleanup is sufficient and whether theTimeSpan.Zerostrategy is reliable across all target filesystems.PathResolutionServiceExtensions.ResolvePath— ancestor-walk symlink resolutionThe ancestor-walk is necessary because a lock directory might not exist yet at the time of drive matching (e.g. first ever run). The algorithm peels off non-existent path segments from the end, resolves the nearest existing ancestor's symlink, then re-attaches the original tail. This is especially important on macOS where
Path.GetFullPath("/tmp/x")returns/tmp/xbut the drive is mounted at/private/tmp. All fiveGetFullPathexception types are caught and cause the input to be returned unchanged.MountedDrives.GetAssociatedDrive— longest-prefix mount matchingThe match is done on the symlink-resolved path, with platform-appropriate case sensitivity (
OrdinalIgnoreCaseon Windows/macOS,Ordinalon Linux). Reviewers should verify theStartsWithprefix check correctly handles the case where the drive root IS the path (no trailing separator needed sinceRootDirectory.FullNameends with a separator on all .NET platforms).TemporaryDirectoryFallback.GetCandidates— platform-specific candidate orderOn Linux,
/dev/shmis included as a candidate after/tmp.tmpfs-backed/dev/shmis one of the most reliably lock-capable paths on systems where the working directory is NFS-mounted. The candidates are namespace-scoped using the last segment of the candidate path to prevent cross-deployment lock file collisions.CommonOptions.ScriptIsolationOptions— new CLI parameterThe
scriptIsolationSharedFallbackoption is additive and defaults to the safe value (Exclusive). Callers that do not pass the option getPromoteToExclusiveLockWhenSharedLockUnavailable = true.Test coverage
LockDirectoryFixture— 1,063 lines, all groups taggedRunOnceOnWindowsAndLinuxAll tests in this fixture use injected fakes (
FakeLockService,FakePathResolutionService) and run entirely in-memory — no real filesystem access.LockDirectory.SupportsLockCapability×LockTypecombinations (6 cases)CachedDriveInfo.LockSupportstaticapfs,ntfs,ext4, etc.) →Supported; network drives →Unknown;DetectedLockSupportoverrides (13 cases)CachedDriveInfo.DetectLockSupportUnsupported, three variants ofExclusiveOnly(broken shared, broken exclusive-blocks-shared, broken shared-blocks-exclusive), andSupported(4 cases)LockDirectory.GetLockDirectorySupported; candidateUnknown→ detectedSupportedbefore temps inspected (candidate preferred over aSupportedtemp); candidateUnsupported→Supportedtemp used; candidateUnknown→ detectedSupportedbeforeExclusiveOnlytemps inspected; both candidate and temp areExclusiveOnly→ stay on candidate; tempExclusiveOnly+ candidateUnsupported→ use temp; nothing works →Unsupported; empty drive table; temp with no associated drive skipped; firstSupportedtemp wins when multiple candidatesMountedDrives.GetAssociatedDriveDirectoryNotFoundExceptionwhen no match; ancestor symlink resolved when child doesn't existDefaultPathResolutionServicePathResolutionServiceExtensions.ResolvePathGetFullPathexception types caught (5 parameterised), non-existent path, existing non-symlink, symlink followed, ancestor symlink + tail re-attachedIsolationFixture— extended withResolveLockOptionsunit testsIsolation.ResolveLockOptions:Supported→ original returned;ExclusiveOnly+ Exclusive + promote=false → warning;ExclusiveOnly+ Exclusive + promote=true → no warning;ExclusiveOnly+ Shared + promote=true → promoted to Exclusive + warning;ExclusiveOnly+ Shared + promote=false → null + warning;Unsupported+ any request → null + warningLockOptionsFixtureFromScriptIsolationOptionsOrNull_BuildsCorrectLockFileNameupdated to assert onLockFile.File.Namerather than the full path (since the lock directory is now resolved dynamically rather than being a fixed subdirectory ofTentacleHome).What is not covered by unit tests
LockDirectory.GetLockDirectoryagainst a real NFS or FUSE mount. The live probing logic is verified against a real (local) filesystem viaIsolationFixtureand Group F inLockDirectoryFixture, but the specific scenario of a network filesystem detecting asUnsupportedand falling back to/tmpis exercised only through the fake in Group D.%LOCALAPPDATA%\Calamari,%TEMP%) are not covered by integration tests.