Skip to content

Fsync data files before commit for crash-consistency & delete orphan files after refresh on merge#21898

Open
Bukhtawar wants to merge 2 commits into
opensearch-project:mainfrom
Bukhtawar:refresh-consolidation-v2
Open

Fsync data files before commit for crash-consistency & delete orphan files after refresh on merge#21898
Bukhtawar wants to merge 2 commits into
opensearch-project:mainfrom
Bukhtawar:refresh-consolidation-v2

Conversation

@Bukhtawar
Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar commented May 29, 2026

Description

Before the fsync change:

  1. LuceneCommitter.commit() calls indexWriter.commit()
  2. IndexWriter.commit() fsyncs only files it manages — files in its own Directory (/index/): segments_N, .si, .cfs, .doc, etc.
  3. Parquet files live in /parquet/ — a completely separate directory that Lucene has zero awareness of
  4. These Parquet files were written by RustBridge.finalizeWriter() during refresh → landed in OS page cache → never explicitly fsync'd

The problem:

The commit point (segments_N) contains the serialized CatalogSnapshot in its user data. That catalog references Parquet files by name (e.g., parquet/_parquet_file_generation_1.parquet). After indexWriter.commit() returns, the commit
point is durable on disk, but the Parquet file it references may still only be in page cache.

If the node crashes in that window:

  • Commit point exists on disk: "this shard's data includes _parquet_file_generation_1.parquet"
  • But that Parquet file is truncated/missing/corrupt (never made it from page cache to disk)
  • On recovery → engine opens → reads commit → tries to use Parquet file → data loss or corruption

Additionally merge on refresh resulted in original files(created by child index writers) prior to the merge to be orphaned and kept lingering on the disk, not freeing up storage, the change now fixes the orphan files

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
@Bukhtawar Bukhtawar requested a review from a team as a code owner May 29, 2026 23:34
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

PR Reviewer Guide 🔍

(Review updated until commit 4bec9c0)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns

Path traversal:
The SubdirectoryAwareDirectory.sync() method attempts to prevent path traversal by checking if the resolved path starts with fsDataPath. However, this check relies on parseFilePath() to sanitize inputs. If parseFilePath() does not fully normalize or validate file names (e.g., allowing .. sequences), an attacker could craft a malicious file name that resolves outside the shard data path, enabling unauthorized file access or modification. The check at line 108 may not be sufficient if parseFilePath() is weak.

✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The pendingDeletes map is reassigned on every tryDeletePendingFiles() call. If deleteFiles() returns a non-empty map (files that failed to delete), the old pendingDeletes is discarded, losing track of previously failed deletions. This can cause orphaned files to accumulate indefinitely if deletion repeatedly fails for certain files.

    pendingDeletes = deleteFiles(pendingDeletes);
}
Possible Issue

The pendingDeletes map is a ConcurrentHashMap, but computeIfAbsent is called with a lambda that returns a mutable ArrayList. If multiple threads call refresh() concurrently and both compute the same key, they will share the same ArrayList instance without synchronization, leading to potential data corruption or ConcurrentModificationException.

pendingDeletes.computeIfAbsent(pendingDeletionPerFormat.getKey(), k -> new ArrayList<>())
    .addAll(pendingDeletionPerFormat.getValue());
Path Traversal

The path traversal check at line 108 uses startsWith() on normalized paths, but parseFilePath() may not fully sanitize malicious inputs like ../../etc/passwd. If parseFilePath() allows directory traversal sequences, an attacker could fsync arbitrary files outside the shard data path, potentially causing data corruption or denial of service.

Path normalized = resolved.normalize();
if (normalized.startsWith(fsDataPath) == false) {
    throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
}
Resource Leak

If indexWriter.close() throws an exception, the finally block still executes store.decRef(), which is correct. However, if store.decRef() then throws (e.g., due to a bug or resource exhaustion), the original exception from indexWriter.close() is suppressed and lost. This makes debugging difficult and may hide the root cause of the failure.

try {
    indexWriter.close();
} finally {
    this.store.decRef();
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

PR Code Suggestions ✨

Latest suggestions up to 4bec9c0

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Strengthen path traversal validation

Path traversal validation using normalize() and startsWith() is insufficient for
security. Symbolic links can bypass this check. Use toRealPath() to resolve symbolic
links before validation, or verify that the canonical path is within the allowed
directory to prevent directory traversal attacks.

server/src/main/java/org/opensearch/index/store/SubdirectoryAwareDirectory.java [107-110]

-Path normalized = resolved.normalize();
-if (normalized.startsWith(fsDataPath) == false) {
+Path normalized = resolved.toRealPath();
+if (normalized.startsWith(fsDataPath.toRealPath()) == false) {
     throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical security issue. The current path traversal validation using normalize() and startsWith() can be bypassed via symbolic links. Using toRealPath() to resolve symbolic links before validation is essential to prevent directory traversal attacks that could allow access to files outside the intended directory.

High
Possible issue
Use thread-safe collection for concurrent access

The pendingDeletes map stores Collection values but uses ArrayList in
computeIfAbsent. Since ConcurrentHashMap allows concurrent modifications, the
ArrayList instances are not thread-safe and could lead to data corruption if
multiple threads modify the same collection. Use a thread-safe collection like
ConcurrentLinkedQueue or synchronize access.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [308-312]

 for (Map.Entry<String, Collection<String>> pendingDeletionPerFormat : deleteFiles(getFilesToDelete(onlyNew))
     .entrySet()) {
-    pendingDeletes.computeIfAbsent(pendingDeletionPerFormat.getKey(), k -> new ArrayList<>())
+    pendingDeletes.computeIfAbsent(pendingDeletionPerFormat.getKey(), k -> new ConcurrentLinkedQueue<>())
         .addAll(pendingDeletionPerFormat.getValue());
 }
Suggestion importance[1-10]: 8

__

Why: This is a valid concurrency issue. The ArrayList instances stored in pendingDeletes are not thread-safe, and concurrent modifications could lead to data corruption. Using ConcurrentLinkedQueue or another thread-safe collection is necessary for correctness in a concurrent environment.

Medium
General
Remove redundant volatile modifier

Using volatile with ConcurrentHashMap is redundant and may cause confusion. The
ConcurrentHashMap already provides thread-safe visibility guarantees for its
operations. Remove the volatile modifier to avoid unnecessary memory barriers and
clarify the concurrency model.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [82]

-private volatile Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
+private final Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that volatile is redundant with ConcurrentHashMap, but the issue is minor. The volatile modifier doesn't harm correctness, though removing it would clarify the concurrency model. However, changing to final may not be appropriate since the field is reassigned in tryDeletePendingFiles().

Low

Previous suggestions

Suggestions up to commit 4436566
CategorySuggestion                                                                                                                                    Impact
Security
Normalize path before security check

The path traversal check occurs after resolved.startsWith(indexDir) but before
normalization. An attacker could craft a path that passes the indexDir check but
still escapes fsDataPath after normalization. Normalize resolved before any security
checks to prevent path traversal vulnerabilities.

server/src/main/java/org/opensearch/index/store/SubdirectoryAwareDirectory.java [104-117]

 for (String name : names) {
-    Path resolved = fsDataPath.resolve(parseFilePath(name));
+    Path resolved = fsDataPath.resolve(parseFilePath(name)).normalize();
+    if (resolved.startsWith(fsDataPath) == false) {
+        throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
+    }
     if (resolved.startsWith(indexDir)) {
         indexFiles.add(resolved.toString());
     } else {
-        Path normalized = resolved.normalize();
-        if (normalized.startsWith(fsDataPath) == false) {
-            throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
-        }
-        IOUtils.fsync(normalized, false);
-        subdirectoriesToSync.add(normalized.getParent());
+        IOUtils.fsync(resolved, false);
+        subdirectoriesToSync.add(resolved.getParent());
     }
 }
Suggestion importance[1-10]: 9

__

Why: Critical security issue: path traversal check should occur before any other logic. The current code checks startsWith(indexDir) before normalization, allowing potential bypass. Normalizing first and checking against fsDataPath prevents path traversal attacks.

High
Possible issue
Ensure mutable collection for addAll

The computeIfAbsent returns a Collection but attempts to call addAll on it. Since
ConcurrentHashMap values are Collection, not necessarily List, this may fail at
runtime if the existing value is an immutable collection. Ensure the value is always
a mutable List.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [308-312]

 for (Map.Entry<String, Collection<String>> pendingDeletionPerFormat : deleteFiles(getFilesToDelete(onlyNew))
     .entrySet()) {
-    pendingDeletes.computeIfAbsent(pendingDeletionPerFormat.getKey(), k -> new ArrayList<>())
-        .addAll(pendingDeletionPerFormat.getValue());
+    pendingDeletes.compute(pendingDeletionPerFormat.getKey(), (k, v) -> {
+        List<String> list = v instanceof List ? (List<String>) v : new ArrayList<>(v != null ? v : List.of());
+        list.addAll(pendingDeletionPerFormat.getValue());
+        return list;
+    });
 }
Suggestion importance[1-10]: 7

__

Why: Valid concern about computeIfAbsent returning Collection<String> when addAll expects a mutable collection. The suggestion correctly identifies a potential runtime failure if the collection becomes immutable, though the current code initializes with ArrayList which is mutable.

Medium
General
Remove redundant volatile modifier

Using volatile with ConcurrentHashMap is redundant and may cause confusion. The
ConcurrentHashMap itself provides thread-safe operations, and the volatile keyword
only ensures visibility of the reference, not the map's contents. Remove volatile to
avoid misleading semantics.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [82]

-private volatile Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
+private final Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that volatile is redundant with ConcurrentHashMap for reference visibility. However, the field is reassigned in tryDeletePendingFiles() (line 352), so volatile ensures visibility of the reference change across threads. Changing to final would break this functionality.

Low
Suggestions up to commit e6e44a8
CategorySuggestion                                                                                                                                    Impact
Security
Path traversal check may be bypassed

Path traversal validation using startsWith can be bypassed with symbolic links or
relative paths. Use toRealPath() to resolve symbolic links before validation, or
verify that normalize() alone is sufficient for your security requirements in this
context.

server/src/main/java/org/opensearch/index/store/SubdirectoryAwareDirectory.java [105-108]

-Path resolved = fsDataPath.resolve(name).normalize();
-if (resolved.startsWith(fsDataPath) == false) {
+Path resolved = fsDataPath.resolve(name).normalize().toRealPath();
+if (resolved.startsWith(fsDataPath.toRealPath()) == false) {
     throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
 }
Suggestion importance[1-10]: 7

__

Why: Valid security concern about symbolic link handling in path traversal validation. Using toRealPath() would strengthen the check, though normalize() may be sufficient depending on the deployment environment. This is a moderate security improvement rather than a critical vulnerability fix.

Medium
General
Redundant volatile with ConcurrentHashMap

Using volatile with ConcurrentHashMap is redundant since ConcurrentHashMap already
provides thread-safe visibility guarantees. The volatile keyword adds no additional
safety here and may confuse readers about the synchronization strategy. Remove
volatile and rely on ConcurrentHashMap's built-in concurrency control.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [82]

-private volatile Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
+private final Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
Suggestion importance[1-10]: 5

__

Why: Correct observation that volatile is redundant with ConcurrentHashMap. However, the field is reassigned (pendingDeletes = deleteFiles(pendingDeletes)), so volatile ensures visibility of the reference change across threads. The suggestion to make it final would prevent reassignment and break the code. The current pattern is acceptable though not ideal.

Low
Suggestions up to commit 6706ddd
CategorySuggestion                                                                                                                                    Impact
Security
Strengthen path traversal validation

The path traversal check may fail for symbolic links or when fsDataPath itself
contains symbolic links. Use toRealPath() on both paths before comparison to ensure
accurate validation and prevent potential security vulnerabilities.

server/src/main/java/org/opensearch/index/store/SubdirectoryAwareDirectory.java [94-100]

 Path resolved = fsDataPath.resolve(name).normalize();
-if (resolved.startsWith(fsDataPath) == false) {
+Path realResolved = resolved.toRealPath();
+Path realFsDataPath = fsDataPath.toRealPath();
+if (realResolved.startsWith(realFsDataPath) == false) {
     throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
 }
-IOUtils.fsync(resolved, false);
+IOUtils.fsync(realResolved, false);
Suggestion importance[1-10]: 7

__

Why: Valid security improvement. Using toRealPath() provides stronger protection against symbolic link-based path traversal attacks. However, it may throw IOException if the path doesn't exist, which needs handling.

Medium
Possible issue
Prevent unsafe engine closure

After timeout or interruption, active merges may still be running when the engine
closes, potentially corrupting data. Consider forcibly canceling remaining merges or
throwing an exception to prevent unsafe closure.

server/src/main/java/org/opensearch/index/engine/DataFormatAwareEngine.java [1823-1831]

 mergeScheduler.shutdown();
 try {
     boolean drained = mergeScheduler.awaitMerges(30, java.util.concurrent.TimeUnit.SECONDS);
     if (drained == false) {
-        logger.warn("closeNoLock: timed out waiting for active merges to drain");
+        logger.error("closeNoLock: timed out waiting for active merges to drain - forcing shutdown");
+        throw new IllegalStateException("Cannot safely close engine with active merges");
     }
 } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
+    throw new IllegalStateException("Interrupted while waiting for merges to complete", e);
 }
Suggestion importance[1-10]: 6

__

Why: Valid concern about active merges during shutdown. However, throwing an exception during close may cause resource leaks. A better approach might be to forcibly cancel merges or wait longer, rather than throwing exceptions that prevent cleanup of other resources.

Low
General
Remove redundant volatile modifier

Using volatile with ConcurrentHashMap is redundant since ConcurrentHashMap already
provides thread-safe visibility guarantees. The volatile keyword adds unnecessary
overhead and may cause confusion about the synchronization model.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [82]

-private volatile Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
+private final Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
Suggestion importance[1-10]: 5

__

Why: Correct observation that volatile is redundant with ConcurrentHashMap. However, the volatile keyword here ensures visibility of the reference itself being reassigned (line 352: pendingDeletes = deleteFiles(...)), not just the map's internal operations. The suggestion to use final would prevent this reassignment pattern.

Low
Suggestions up to commit 950424d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix thread-safety for pendingDeletes updates

The pendingDeletes map uses ConcurrentHashMap but the ArrayList values are not
thread-safe. If tryDeletePendingFiles() runs concurrently during this operation, it
could cause ConcurrentModificationException or data corruption. Use
ConcurrentHashMap.compute() or synchronize access to ensure thread-safe
modifications.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [308-312]

 for (Map.Entry<String, Collection<String>> pendingDeletionPerFormat : deleteFiles(getFilesToDelete(onlyNew))
     .entrySet()) {
-    pendingDeletes.computeIfAbsent(pendingDeletionPerFormat.getKey(), k -> new ArrayList<>())
-        .addAll(pendingDeletionPerFormat.getValue());
+    pendingDeletes.compute(pendingDeletionPerFormat.getKey(), (k, existing) -> {
+        Collection<String> result = existing != null ? new ArrayList<>(existing) : new ArrayList<>();
+        result.addAll(pendingDeletionPerFormat.getValue());
+        return result;
+    });
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a thread-safety issue where ArrayList values in pendingDeletes are not thread-safe, and concurrent access from tryDeletePendingFiles() could cause issues. The proposed solution using compute() with a new ArrayList copy ensures atomic updates and prevents concurrent modification exceptions.

Medium
Security
Strengthen path traversal validation

The path traversal check may be bypassed on Windows systems due to
case-insensitivity and path separator differences. Use resolved.toRealPath() with
NOFOLLOW_LINKS option and compare canonical paths to ensure robust security
validation across all platforms.

server/src/main/java/org/opensearch/index/store/SubdirectoryAwareDirectory.java [95-98]

-Path resolved = fsDataPath.resolve(name).normalize();
-if (resolved.startsWith(fsDataPath) == false) {
+Path resolved = fsDataPath.resolve(name).normalize().toRealPath(java.nio.file.LinkOption.NOFOLLOW_LINKS);
+Path canonicalBase = fsDataPath.toRealPath(java.nio.file.LinkOption.NOFOLLOW_LINKS);
+if (resolved.startsWith(canonicalBase) == false) {
     throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion improves security by using toRealPath() to resolve symbolic links and canonicalize paths, which strengthens path traversal protection. However, toRealPath() throws NoSuchFileException if the file doesn't exist, which may not be appropriate for all use cases in this context (e.g., when syncing files that are about to be created).

Medium
General
Remove redundant volatile with ConcurrentHashMap

Using a volatile reference to a ConcurrentHashMap is redundant and potentially
misleading. The volatile keyword ensures visibility of the reference itself, but
ConcurrentHashMap already provides thread-safe operations. Consider removing
volatile or using AtomicReference if you need atomic reference updates beyond what
ConcurrentHashMap provides.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [82]

-private volatile Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
+private final Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that volatile is redundant with ConcurrentHashMap for visibility guarantees. However, the code also reassigns pendingDeletes in tryDeletePendingFiles() (line 352), which requires volatile for safe publication of the new reference. The suggestion overlooks this reassignment pattern, making it partially incorrect.

Low
Suggestions up to commit 3d132ad
CategorySuggestion                                                                                                                                    Impact
Security
Prevent symlink-based path traversal

The path traversal check occurs after resolve() and normalize(), but symbolic links
could bypass this validation. An attacker could create a symlink inside fsDataPath
pointing outside, which would pass the startsWith() check but still access
unauthorized locations. Verify the real path or reject symlinks.

server/src/main/java/org/opensearch/index/store/SubdirectoryAwareDirectory.java [94-104]

 for (String name : names) {
     if (name.contains("/")) {
         Path resolved = fsDataPath.resolve(name).normalize();
-        if (resolved.startsWith(fsDataPath) == false) {
+        Path realPath = resolved.toRealPath();
+        if (realPath.startsWith(fsDataPath) == false) {
             throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
         }
-        IOUtils.fsync(resolved, false);
-        subdirectoriesToSync.add(resolved.getParent());
+        IOUtils.fsync(realPath, false);
+        subdirectoriesToSync.add(realPath.getParent());
     } else {
         indexFiles.add(parseFilePath(name));
     }
 }
Suggestion importance[1-10]: 9

__

Why: Critical security issue: symbolic links could bypass the path traversal check. Using toRealPath() resolves symlinks before validation, preventing unauthorized file access outside fsDataPath.

High
Possible issue
Fix race condition in pending deletes

Reassigning the pendingDeletes map from deleteFiles() can lose entries if concurrent
threads modify it between read and write. This creates a race condition where
pending deletes added during execution are lost. Use atomic operations or
synchronization to ensure thread-safe updates.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [351-353]

-private void tryDeletePendingFiles() throws IOException {
-    pendingDeletes = deleteFiles(pendingDeletes);
+private synchronized void tryDeletePendingFiles() throws IOException {
+    Map<String, Collection<String>> remaining = deleteFiles(pendingDeletes);
+    pendingDeletes.clear();
+    pendingDeletes.putAll(remaining);
 }
Suggestion importance[1-10]: 8

__

Why: Valid concern about race condition when reassigning pendingDeletes. The suggested synchronization approach prevents lost updates during concurrent modifications, addressing a potential data loss issue.

Medium
General
Remove redundant volatile modifier

Using volatile with ConcurrentHashMap is redundant and may cause confusion. The
ConcurrentHashMap already provides thread-safe operations. Remove volatile to avoid
unnecessary memory barriers and clarify thread-safety guarantees.

sandbox/plugins/composite-engine/src/main/java/org/opensearch/composite/CompositeIndexingExecutionEngine.java [82]

-private volatile Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
+private final Map<String, Collection<String>> pendingDeletes = new ConcurrentHashMap<>();
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that volatile is redundant with ConcurrentHashMap. However, the field is reassigned in tryDeletePendingFiles(), so final would be incorrect. The volatile should be removed, but the field should remain non-final.

Medium

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 67ac46c to 37dc0f4 Compare May 30, 2026 00:04
@Bukhtawar Bukhtawar changed the title Refresh consolidation v2 Fsync data files before commit for crash-consistent & delete orphan files after refresh on merge May 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 37dc0f4

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 37dc0f4 to a6db8b1 Compare May 30, 2026 00:09
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a6db8b1

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from a6db8b1 to d9c2923 Compare May 30, 2026 00:26
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d9c2923

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from d9c2923 to 6bb3bad Compare May 30, 2026 00:32
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 6bb3bad

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 6bb3bad to 8ce8382 Compare May 30, 2026 00:37
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8ce8382

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 8ce8382 to 6925a18 Compare May 30, 2026 00:43
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 6925a18

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 6925a18 to 81c4c9e Compare May 30, 2026 00:48
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 81c4c9e

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8de53eb

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 8de53eb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 8de53eb to 3d132ad Compare May 30, 2026 06:56
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 3d132ad

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 3d132ad to 950424d Compare May 30, 2026 07:34
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 950424d

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 950424d: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 30, 2026

Codecov Report

❌ Patch coverage is 80.76923% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.38%. Comparing base (e0404f4) to head (4bec9c0).
⚠️ Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
...search/index/store/SubdirectoryAwareDirectory.java 80.76% 2 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21898      +/-   ##
============================================
+ Coverage     73.36%   73.38%   +0.01%     
- Complexity    75430    75492      +62     
============================================
  Files          6034     6034              
  Lines        342604   342629      +25     
  Branches      49279    49287       +8     
============================================
+ Hits         251357   251440      +83     
- Misses        71220    71233      +13     
+ Partials      20027    19956      -71     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 950424d to 6706ddd Compare May 30, 2026 09:04
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 6706ddd

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch 3 times, most recently from 87f5e43 to e6e44a8 Compare May 30, 2026 09:52
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 87f5e43

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e6e44a8

@Bukhtawar Bukhtawar changed the title Fsync data files before commit for crash-consistent & delete orphan files after refresh on merge Fsync data files before commit for crash-consistency & delete orphan files after refresh on merge May 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for e6e44a8: SUCCESS

@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from e6e44a8 to 4436566 Compare May 30, 2026 14:47
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 4436566

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4436566: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4436566: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines +101 to +122
List<String> indexFiles = new ArrayList<>();
Set<Path> subdirectoriesToSync = new HashSet<>();
Path indexDir = shardPath.resolveIndex();
for (String name : names) {
Path resolved = fsDataPath.resolve(parseFilePath(name));
if (resolved.startsWith(indexDir)) {
indexFiles.add(resolved.toString());
} else {
Path normalized = resolved.normalize();
if (normalized.startsWith(fsDataPath) == false) {
throw new IOException("Path traversal detected: " + name + " resolves outside shard data path");
}
IOUtils.fsync(normalized, false);
subdirectoriesToSync.add(normalized.getParent());
}
}
if (indexFiles.isEmpty() == false) {
super.sync(indexFiles);
}
for (Path dir : subdirectoriesToSync) {
IOUtils.fsync(dir, true);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this separately? won't sync calling parseFilePath handle it to exactly sync the red files?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly to handle mock filesystems. For subdirectory files, parseFilePath resolves to an absolute path outside the FSDirectory root. FSDirectory.fsync() then tries to resolve that against its own root. This fails on Lucene's mock test filesystem because the path crosses filesystem provider boundaries, hence IOUtils.fsync() on the resolved path

Comment on lines +120 to +122
for (Path dir : subdirectoriesToSync) {
IOUtils.fsync(dir, true);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This i think we should keep to fsync sub dirs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I have moved it into syncMetaData() so the responsibility is clear: sync() handles file data, syncMetaData() handles all directory entries

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4436566: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 4436566: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

The sync call in LuceneCommitter was passing getFiles(true) which
includes segments_N. This file is already fsync'd by IndexWriter.commit()
and is not accessible through the store directory at the format-prefixed
path, causing NoSuchFileException on any flush after the initial commit.

Changed to getFiles(false) so only format-specific data files (parquet,
lucene segment data) are explicitly fsync'd — segments_N remains handled
by Lucene's own commit path.

Added CompositeRemoteStoreFsyncRecoveryIT with 4 test cases:
- Primary local recovery with fsync (committed files survive restart)
- Replica recovery from remote store + incremental translog replay
- Full cluster restart with primary local + replica remote recovery
- Merge-on-refresh merged files survive restart via fsync

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@Bukhtawar Bukhtawar force-pushed the refresh-consolidation-v2 branch from 4436566 to 4bec9c0 Compare May 31, 2026 08:30
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 4bec9c0

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 4bec9c0: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants