Fsync data files before commit for crash-consistency & delete orphan files after refresh on merge#21898
Fsync data files before commit for crash-consistency & delete orphan files after refresh on merge#21898Bukhtawar wants to merge 2 commits into
Conversation
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
PR Reviewer Guide 🔍(Review updated until commit 4bec9c0)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 4bec9c0 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 4436566
Suggestions up to commit e6e44a8
Suggestions up to commit 6706ddd
Suggestions up to commit 950424d
Suggestions up to commit 3d132ad
|
67ac46c to
37dc0f4
Compare
|
Persistent review updated to latest commit 37dc0f4 |
37dc0f4 to
a6db8b1
Compare
|
Persistent review updated to latest commit a6db8b1 |
a6db8b1 to
d9c2923
Compare
|
Persistent review updated to latest commit d9c2923 |
d9c2923 to
6bb3bad
Compare
|
Persistent review updated to latest commit 6bb3bad |
6bb3bad to
8ce8382
Compare
|
Persistent review updated to latest commit 8ce8382 |
8ce8382 to
6925a18
Compare
|
Persistent review updated to latest commit 6925a18 |
6925a18 to
81c4c9e
Compare
|
Persistent review updated to latest commit 81c4c9e |
81c4c9e to
b7ab71d
Compare
|
Persistent review updated to latest commit 8de53eb |
|
❌ Gradle check result for 8de53eb: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
8de53eb to
3d132ad
Compare
|
Persistent review updated to latest commit 3d132ad |
3d132ad to
950424d
Compare
|
Persistent review updated to latest commit 950424d |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #21898 +/- ##
============================================
+ Coverage 73.36% 73.38% +0.01%
- Complexity 75430 75492 +62
============================================
Files 6034 6034
Lines 342604 342629 +25
Branches 49279 49287 +8
============================================
+ Hits 251357 251440 +83
- Misses 71220 71233 +13
+ Partials 20027 19956 -71 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
950424d to
6706ddd
Compare
|
Persistent review updated to latest commit 6706ddd |
87f5e43 to
e6e44a8
Compare
|
Persistent review updated to latest commit 87f5e43 |
|
Persistent review updated to latest commit e6e44a8 |
e6e44a8 to
4436566
Compare
|
Persistent review updated to latest commit 4436566 |
|
❌ Gradle check result for 4436566: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 4436566: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
| List<String> indexFiles = new ArrayList<>(); | ||
| Set<Path> subdirectoriesToSync = new HashSet<>(); | ||
| Path indexDir = shardPath.resolveIndex(); | ||
| for (String name : names) { | ||
| Path resolved = fsDataPath.resolve(parseFilePath(name)); | ||
| if (resolved.startsWith(indexDir)) { | ||
| indexFiles.add(resolved.toString()); | ||
| } else { | ||
| Path normalized = resolved.normalize(); | ||
| if (normalized.startsWith(fsDataPath) == false) { | ||
| throw new IOException("Path traversal detected: " + name + " resolves outside shard data path"); | ||
| } | ||
| IOUtils.fsync(normalized, false); | ||
| subdirectoriesToSync.add(normalized.getParent()); | ||
| } | ||
| } | ||
| if (indexFiles.isEmpty() == false) { | ||
| super.sync(indexFiles); | ||
| } | ||
| for (Path dir : subdirectoriesToSync) { | ||
| IOUtils.fsync(dir, true); | ||
| } |
There was a problem hiding this comment.
Do we need this separately? won't sync calling parseFilePath handle it to exactly sync the red files?
There was a problem hiding this comment.
Mostly to handle mock filesystems. For subdirectory files, parseFilePath resolves to an absolute path outside the FSDirectory root. FSDirectory.fsync() then tries to resolve that against its own root. This fails on Lucene's mock test filesystem because the path crosses filesystem provider boundaries, hence IOUtils.fsync() on the resolved path
| for (Path dir : subdirectoriesToSync) { | ||
| IOUtils.fsync(dir, true); | ||
| } |
There was a problem hiding this comment.
This i think we should keep to fsync sub dirs.
There was a problem hiding this comment.
Agreed, I have moved it into syncMetaData() so the responsibility is clear: sync() handles file data, syncMetaData() handles all directory entries
|
❌ Gradle check result for 4436566: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 4436566: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
The sync call in LuceneCommitter was passing getFiles(true) which includes segments_N. This file is already fsync'd by IndexWriter.commit() and is not accessible through the store directory at the format-prefixed path, causing NoSuchFileException on any flush after the initial commit. Changed to getFiles(false) so only format-specific data files (parquet, lucene segment data) are explicitly fsync'd — segments_N remains handled by Lucene's own commit path. Added CompositeRemoteStoreFsyncRecoveryIT with 4 test cases: - Primary local recovery with fsync (committed files survive restart) - Replica recovery from remote store + incremental translog replay - Full cluster restart with primary local + replica remote recovery - Merge-on-refresh merged files survive restart via fsync Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
4436566 to
4bec9c0
Compare
|
Persistent review updated to latest commit 4bec9c0 |
Description
Before the fsync change:
The problem:
The commit point (segments_N) contains the serialized CatalogSnapshot in its user data. That catalog references Parquet files by name (e.g., parquet/_parquet_file_generation_1.parquet). After indexWriter.commit() returns, the commit
point is durable on disk, but the Parquet file it references may still only be in page cache.
If the node crashes in that window:
Additionally merge on refresh resulted in original files(created by child index writers) prior to the merge to be orphaned and kept lingering on the disk, not freeing up storage, the change now fixes the orphan files
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.