[Comgr/Caching/Support] Don't use mmap in localCache, on NFS#3049
Open
jmmartinez wants to merge 1 commit into
Open
[Comgr/Caching/Support] Don't use mmap in localCache, on NFS#3049jmmartinez wants to merge 1 commit into
mmap in localCache, on NFS#3049jmmartinez wants to merge 1 commit into
Conversation
When the cache directory is on an NFS, we may hit a SIGBUS signal when there is contention on the cache. This happens typically with MPI jobs running on an NFS mounted $HOME directory. With NFS, the attributes are stored in a cache. The mmap call may succeed, but when we try to access memory returned from this call we may hit a SIGBUS because the file was modified by another process on another machine. If we execute through the regular `read()` path, we also fail with a `Stale file descriptor` error. But at least this path is properly handled and we can continue without using the cache. To achieve this, we make: 1) `getOpenFileImpl()` not use `mmap` if `IsVolatile` is `true`, 2) We make the `localCache` pass `IsVolatile=true` if the cache is mounted on an NFS directory. Notice that (1) may affect other paths appart from the `localCache`. But from what I checked, `IsVolatile` is only `true` for tools like `clangd` on user files. The rest either use `IsVolatile` equal to `false`. This fix is worth upstraming, but I haven't done so yet because I want to explore other solutions before commiting to it.
Collaborator
|
Given that LCOMPILER-2307 is a P2, can we attempt an upstream fix first? I think it's ok if it requires some iteration |
|
Hmm, tend to agree with @lamb-j here since the downstream customer has a temporary fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When the cache directory is on an NFS, we may hit a
SIGBUSsignal when there is contention on the cache.This happens typically with MPI jobs running on an NFS mounted
$HOMEdirectory.With NFS, the attributes are stored in a cache. The mmap call may succeed, but when we try to access memory returned from this call we may hit a
SIGBUSbecause the file was modified by another process on another machine.If we execute through the regular
read()path, we also fail with aStale file descriptorerror. But at least this path is properly handled and we can continue without using the cache.To achieve this, we make:
getOpenFileImpl()not usemmapifIsVolatileistrue,localCachepassesIsVolatile=trueif the cache ismounted on an NFS directory.
Notice that (1) may affect other paths appart from the
localCache. But from what I checked,IsVolatileis onlytruefor tools likeclangdon user files. The rest either useIsVolatileequal tofalse.This fix is worth upstreaming with some extra work, but I haven't done so yet because I want to explore other solutions before committing to it.
Related to LCOMPILER-2307.