Conversation
ca2cf06 to
e49c2f6
Compare
source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs
Outdated
Show resolved
Hide resolved
| LockType.Exclusive => FileShare.None, | ||
| LockType.Shared => FileShare.ReadWrite, |
There was a problem hiding this comment.
I opted to use a custom enum to map to the appropriate FileShare value used in the underlying lock. This keeps the intended usage clearer.
| static readonly TimeSpan RetryInitialDelay = TimeSpan.FromMilliseconds(10); | ||
| static readonly TimeSpan RetryMaxDelay = TimeSpan.FromMilliseconds(500); |
There was a problem hiding this comment.
We don't need to use a polling strategy in Tentacle's ScriptIsolationMutex.
There was a problem hiding this comment.
Do you mean we should change Tentacle?
There was a problem hiding this comment.
No, this comment was supposed to show that this lock mechanism has to poll, whereas the Tentacle based one does not. Therefore this is new and these values are not drawn from anywhere else beyond being what I would consider to be reasonabel starting points.
source/Calamari.Common/Features/Processes/ScriptIsolation/Isolation.cs
Outdated
Show resolved
Hide resolved
a181073 to
ccf545f
Compare
ccf545f to
d50b0b5
Compare
Pass isolation options as parameters Add unit tests for script isolation Update test to pass options
d50b0b5 to
a5c54e5
Compare
gb-8
left a comment
There was a problem hiding this comment.
Looks good.
I think we need to tests to confirm that a lock can be acquired when another one is released, and that we will wait for a period for this to happen.
source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs
Outdated
Show resolved
Hide resolved
| static readonly TimeSpan RetryInitialDelay = TimeSpan.FromMilliseconds(10); | ||
| static readonly TimeSpan RetryMaxDelay = TimeSpan.FromMilliseconds(500); |
There was a problem hiding this comment.
Do you mean we should change Tentacle?
source/Calamari.Common/Features/Processes/ScriptIsolation/LockOptions.cs
Show resolved
Hide resolved
source/Calamari.Common/Features/Processes/ScriptIsolation/LockOptions.cs
Outdated
Show resolved
Hide resolved
gb-8
left a comment
There was a problem hiding this comment.
I've left a few comments that we need to discuss.
| } | ||
|
|
||
| var lockFileInfo = GetLockFileInfo(options.TentacleHome, options.MutexName); | ||
| _ = lockFileInfo.Exists; |
There was a problem hiding this comment.
What's going on here?
If this is required to force caching or some other subtle behaviour, then we need a comment to explain that.
Otherwise we can remove it.
There was a problem hiding this comment.
Good catch. This is a remnant to try and force FileInfo to actually throw exceptions due to a bad path.
source/Calamari.Common/Features/Processes/ScriptIsolation/FileLock.cs
Outdated
Show resolved
Hide resolved
| static ResiliencePipeline<ILockHandle> BuildLockAcquisitionPipeline(LockOptions lockOptions) | ||
| { | ||
| var builder = new ResiliencePipelineBuilder<ILockHandle>(); | ||
| // Timeout must be between 10ms and 1 day. (Polly) |
There was a problem hiding this comment.
What does (Polly) mean here? Are you saying those timeout limits are imposed by Polly? If so, can you link to the docs?
There is a big difference between timing out after 1 day and never timing out. If we really cannot support longer timeouts, then probably we should cap the timeout rather than removing it.
Let's discuss.
There was a problem hiding this comment.
Yes, this is a limit imposed by Polly: https://www.pollydocs.org/api/Polly.Timeout.TimeoutStrategyOptions.html#Polly_Timeout_TimeoutStrategyOptions_Timeout
Happy to switch this to cap the timeout. Will discuss.
There was a problem hiding this comment.
Since timeout can be user-defined, I think we need to find a solution that will support timeouts >24h.
There was a problem hiding this comment.
Apparently the constraint on the timeout only applies when setting the timeout directly. By using the timeout provider, you can specify higher values, although it is still limited by CancellationTokenSource which is ~25 days. The same limitation applies to existing script isolation implementations (Tentacle and Server-based SSH implementation)
There was a problem hiding this comment.
Cool. I reckon that is plenty high enough.
| var methodBody = runMethod!.GetMethodBody(); | ||
| methodBody.Should().NotBeNull(); | ||
|
|
||
| var localsContainILockHandle = methodBody!.LocalVariables |
There was a problem hiding this comment.
Can we not test this by executing two Calamari processes and verifying isolation behaviour?
That would be a harder test to write and slower, but would actually verify behaviour.
There was a problem hiding this comment.
I've added integration tests now
| results[1].ExitCode.Should().Be(0, | ||
| "Process2 should succeed. Output: " + results[1].CapturedOutput); | ||
|
|
||
| // With NoIsolation (shared locks), both processes should have run. |
There was a problem hiding this comment.
I can see how asserting on overlap could be flaky.
Have you manually verified that they do overlap most of the time?
gb-8
left a comment
There was a problem hiding this comment.
I think all issues have been addressed, but the PR needs updating with an account of what testing has been done, in particular with respect to exercising platform specific Hresult values for checking on lock failures.
I've added the following to the PR description:
These unit tests cover the HResult values. |
gb-8
left a comment
There was a problem hiding this comment.
✅ Pre-approved (but we're still waiting for confirmation of CI builds covering Windows).

Overview
This change provides an implementation to enforce script isolation within Calamari.
Currently, Tentacle provides script isolation functionality, by maintaining a series of in-process locks.
For SSH, this is achieved by performing the locking on the Octopus Instance running the script. This introduces issues for multi-node clusters as the lock is limited to a single node. Our execution scaling strategies rely on being able to create additional nodes to process service bus messages, so this needs to be addressed to support SSH targets and workers.
This implementation uses a file locking approach to share the lock across multiple processes.
Result
When the appropriate common arguments are supplied, Calamari will acquire an appropriate lock before running the relevant command.
The relevant common arguments are:
scriptIsolationLevelwhich defines the script isolation level Calamari should enforce. This must be one of two values (case insensitive):FullIsolationto run the command independent of all other commands andNoIsolationto allow the command to run concurrently with otherNoIsolationcommands. This argument is required for isolation to be enforced.scriptIsolationMutexNamedefines a mutex name to use. Only other commands running with the same mutex name will be isolated. This name should be valid to use in a file name. This argument is required for isolation to be enforced.scriptIsolationTimeout(optional) defines a timeout to use when waiting for the mutex. A timeout value should be greater than 10ms and less than 1 day. Providing a value outside of this range will remove the timeout and either reduce the number of retries or retry indefinitely. If not provided, the script will wait indefinitely for the mutex.Additionally,
TentacleHomeis used as a location to store the files used for the lock. This directory resolves to something like/home/username/.octopus/machine-slug. This means that if the same machine is registered with a different slug it will use independent isolation locks.Testing
Current Caveats
Miscellaneous