Implement Script Isolation by rhysparry · Pull Request #1803 · OctopusDeploy/Calamari

rhysparry · 2026-02-25T03:54:22Z

Overview

This change provides an implementation to enforce script isolation within Calamari.

Currently, Tentacle provides script isolation functionality, by maintaining a series of in-process locks.

For SSH, this is achieved by performing the locking on the Octopus Instance running the script. This introduces issues for multi-node clusters as the lock is limited to a single node. Our execution scaling strategies rely on being able to create additional nodes to process service bus messages, so this needs to be addressed to support SSH targets and workers.

This implementation uses a file locking approach to share the lock across multiple processes.

Result

When the appropriate common arguments are supplied, Calamari will acquire an appropriate lock before running the relevant command.

The relevant common arguments are:

scriptIsolationLevel which defines the script isolation level Calamari should enforce. This must be one of two values (case insensitive): FullIsolation to run the command independent of all other commands and NoIsolation to allow the command to run concurrently with other NoIsolation commands. This argument is required for isolation to be enforced.
scriptIsolationMutexName defines a mutex name to use. Only other commands running with the same mutex name will be isolated. This name should be valid to use in a file name. This argument is required for isolation to be enforced.
scriptIsolationTimeout (optional) defines a timeout to use when waiting for the mutex. A timeout value should be greater than 10ms and less than 1 day. Providing a value outside of this range will remove the timeout and either reduce the number of retries or retry indefinitely. If not provided, the script will wait indefinitely for the mutex.

Additionally, TentacleHome is used as a location to store the files used for the lock. This directory resolves to something like /home/username/.octopus/machine-slug. This means that if the same machine is registered with a different slug it will use independent isolation locks.

Testing

This PR includes unit and integration tests validating lock behaviour
CI tests are configured to cover Windows and Linux platforms
MacOS has been tested locally

Current Caveats

This approach only isolates the Calamari component of the execution, not additional tasks, such as extracting and installing Calamari in the first place.
The underlying filesystem in use may not properly support locking.

Miscellaneous

Resolves EFT-251
Mini-RFC: Script Isolation in multi-node configurations

source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs

rhysparry · 2026-02-25T04:00:55Z

source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs

+            LockType.Exclusive => FileShare.None,
+            LockType.Shared => FileShare.ReadWrite,


I opted to use a custom enum to map to the appropriate FileShare value used in the underlying lock. This keeps the intended usage clearer.

rhysparry · 2026-02-25T04:04:36Z

source/Calamari.Common/Features/Processes/ScriptIsolation/Isolation.cs

+    static readonly TimeSpan RetryInitialDelay = TimeSpan.FromMilliseconds(10);
+    static readonly TimeSpan RetryMaxDelay = TimeSpan.FromMilliseconds(500);


We don't need to use a polling strategy in Tentacle's ScriptIsolationMutex.

Do you mean we should change Tentacle?

No, this comment was supposed to show that this lock mechanism has to poll, whereas the Tentacle based one does not. Therefore this is new and these values are not drawn from anywhere else beyond being what I would consider to be reasonabel starting points.

source/Calamari.Common/Features/Processes/ScriptIsolation/Isolation.cs

source/Calamari.Common/CalamariFlavourProgram.cs

Pass isolation options as parameters Add unit tests for script isolation Update test to pass options

gb-8

Looks good.
I think we need to tests to confirm that a lock can be acquired when another one is released, and that we will wait for a period for this to happen.

source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs

gb-8 · 2026-03-06T05:53:43Z

source/Calamari.Common/Features/Processes/ScriptIsolation/Isolation.cs

+    static readonly TimeSpan RetryInitialDelay = TimeSpan.FromMilliseconds(10);
+    static readonly TimeSpan RetryMaxDelay = TimeSpan.FromMilliseconds(500);


Do you mean we should change Tentacle?

source/Calamari.Common/Features/Processes/ScriptIsolation/LockOptions.cs

source/Calamari.Common/Plumbing/Commands/CommonOptions.cs

source/Calamari.Tests/Fixtures/ScriptIsolation/FileLockFixture.cs

gb-8

I've left a few comments that we need to discuss.

gb-8 · 2026-03-10T06:49:57Z

source/Calamari.Common/Features/Processes/ScriptIsolation/LockOptions.cs

+        }
+
+        var lockFileInfo = GetLockFileInfo(options.TentacleHome, options.MutexName);
+        _ = lockFileInfo.Exists;


What's going on here?
If this is required to force caching or some other subtle behaviour, then we need a comment to explain that.
Otherwise we can remove it.

Good catch. This is a remnant to try and force FileInfo to actually throw exceptions due to a bad path.

source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs

gb-8 · 2026-03-10T07:07:04Z

source/Calamari.Common/Features/Processes/ScriptIsolation/Isolation.cs

+    static ResiliencePipeline<ILockHandle> BuildLockAcquisitionPipeline(LockOptions lockOptions)
+    {
+        var builder = new ResiliencePipelineBuilder<ILockHandle>();
+        // Timeout must be between 10ms and 1 day. (Polly)


What does (Polly) mean here? Are you saying those timeout limits are imposed by Polly? If so, can you link to the docs?

There is a big difference between timing out after 1 day and never timing out. If we really cannot support longer timeouts, then probably we should cap the timeout rather than removing it.

Let's discuss.

Yes, this is a limit imposed by Polly: https://www.pollydocs.org/api/Polly.Timeout.TimeoutStrategyOptions.html#Polly_Timeout_TimeoutStrategyOptions_Timeout

Happy to switch this to cap the timeout. Will discuss.

Since timeout can be user-defined, I think we need to find a solution that will support timeouts >24h.

Apparently the constraint on the timeout only applies when setting the timeout directly. By using the timeout provider, you can specify higher values, although it is still limited by CancellationTokenSource which is ~25 days. The same limitation applies to existing script isolation implementations (Tentacle and Server-based SSH implementation)

Cool. I reckon that is plenty high enough.

gb-8 · 2026-03-10T07:13:32Z

source/Calamari.Tests/Fixtures/ScriptIsolation/ScriptIsolationBinaryFixture.cs

+            var methodBody = runMethod!.GetMethodBody();
+            methodBody.Should().NotBeNull();
+
+            var localsContainILockHandle = methodBody!.LocalVariables


Can we not test this by executing two Calamari processes and verifying isolation behaviour?
That would be a harder test to write and slower, but would actually verify behaviour.

I've added integration tests now

gb-8 · 2026-03-18T00:06:11Z

source/Calamari.Scripting.Tests/ScriptIsolationAsyncIntegrationFixture.cs

+            results[1].ExitCode.Should().Be(0,
+                "Process2 should succeed. Output: " + results[1].CapturedOutput);
+
+            // With NoIsolation (shared locks), both processes should have run.


I can see how asserting on overlap could be flaky.
Have you manually verified that they do overlap most of the time?

A quick way to eyeball this is to compare the execution times. Each script has a 2 second sleep.

The timings indicate concurrency as sequentially would take 4 seconds plus overhead and this was less than that.

gb-8

I think all issues have been addressed, but the PR needs updating with an account of what testing has been done, in particular with respect to exercising platform specific Hresult values for checking on lock failures.

rhysparry · 2026-03-18T00:29:40Z

I think all issues have been addressed, but the PR needs updating with an account of what testing has been done, in particular with respect to exercising platform specific Hresult values for checking on lock failures.

I've added the following to the PR description:

Testing

This PR includes unit and integration tests validating lock behaviour

CI tests are configured to cover Windows and Linux platforms

MacOS has been tested locally

These unit tests cover the HResult values.

gb-8

✅ Pre-approved (but we're still waiting for confirmation of CI builds covering Windows).

rhysparry force-pushed the rhys/eft-251/script-isolation branch from ca2cf06 to e49c2f6 Compare February 25, 2026 03:58

rhysparry commented Feb 25, 2026

View reviewed changes

rhysparry force-pushed the rhys/eft-251/script-isolation branch 4 times, most recently from a181073 to ccf545f Compare March 5, 2026 01:29

rhysparry marked this pull request as ready for review March 5, 2026 01:29

rhysparry self-assigned this Mar 5, 2026

rhysparry force-pushed the rhys/eft-251/script-isolation branch from ccf545f to d50b0b5 Compare March 5, 2026 04:35

Add script isolation enforcement

a5c54e5

Pass isolation options as parameters Add unit tests for script isolation Update test to pass options

rhysparry force-pushed the rhys/eft-251/script-isolation branch from d50b0b5 to a5c54e5 Compare March 6, 2026 01:00

gb-8 reviewed Mar 6, 2026

View reviewed changes

rhysparry added 3 commits March 9, 2026 09:41

Test exclusive blocks other exclusive + shared

5761616

Add tests to verify isolation will wait for lock

3abb2cb

Improve LockOptions validation and logging

2b77070

rhysparry requested a review from gb-8 March 9, 2026 03:26

rhysparry added 2 commits March 9, 2026 14:58

Don't log error if timeout not specified

1105a97

Enforce filename validation in mutex name

8e90250

gb-8 reviewed Mar 10, 2026

View reviewed changes

rhysparry added 7 commits March 11, 2026 11:48

Remove extraneous Exists check

04d1e1a

Add integration tests for script isolation

d01a670

More specific IOException handling

ebabd6c

Add tests that validate timeout

68bd278

Support timeout > 24 hours

0de995b

Assign test categories

8647e7d

Add missing test category

a9eb219

gb-8 reviewed Mar 18, 2026

View reviewed changes

gb-8 approved these changes Mar 18, 2026

View reviewed changes

Remove TestCategory across tests to run linux/win

3a7f692

		LockType.Exclusive => FileShare.None,
		LockType.Shared => FileShare.ReadWrite,

		static readonly TimeSpan RetryInitialDelay = TimeSpan.FromMilliseconds(10);
		static readonly TimeSpan RetryMaxDelay = TimeSpan.FromMilliseconds(500);

Conversation

rhysparry commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Result

Testing

Current Caveats

Miscellaneous

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

gb-8 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gb-8 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gb-8 left a comment

Choose a reason for hiding this comment

Uh oh!

rhysparry commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Testing

Uh oh!

gb-8 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rhysparry commented Feb 25, 2026 •

edited

Loading

rhysparry commented Mar 18, 2026 •

edited

Loading