CASSANDRA-21147 - Direct IO support for Cursor Compaction #4606
samueldlightfoot wants to merge 2 commits into apache:trunk
Conversation
Force-pushed from fe9e9e6 to 245ac52
Force-pushed from b6f0cb2 to ef6f3f0
This change wires DiskAccessMode through cursor-based compaction, enabling direct I/O reads for cursor compaction to match the existing support in the iterator-based compaction path.

Key changes:
- Thread DiskAccessMode from DatabaseDescriptor.getCompactionReadDiskAccessMode() through CursorCompactor, StatefulCursor, and SSTableCursorReader
- Consolidate SSTableReader's openDataReader/openDataReaderForScan variants into a unified openDataReaderInternal with a canReuseDfile guard
- Fix DirectThreadLocalReadAheadBuffer.cleanBuffer() to clean the backing buffer rather than the aligned slice
- Ensure SSTable scanners are closed before opening cursor readers (resolving read-ahead buffer interference and excessive fd usage)
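As a minimal sketch of the thread-through described above (only DatabaseDescriptor.getCompactionReadDiskAccessMode() is named in this description; the constructor shape below is an assumption for illustration):

import org.apache.cassandra.config.Config.DiskAccessMode;
import org.apache.cassandra.config.DatabaseDescriptor;

public class CursorCompactor
{
    private final DiskAccessMode diskAccessMode;

    public CursorCompactor()
    {
        // Resolve the configured compaction read mode once, then hand it down
        // through StatefulCursor into SSTableCursorReader so data readers can
        // be opened with direct I/O when configured (constructor shape assumed).
        this.diskAccessMode = DatabaseDescriptor.getCompactionReadDiskAccessMode();
    }
}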
Force-pushed from ef6f3f0 to 0d16207
aweisberg left a comment:
TY, looks pretty good. Do we have testing of what happens when the configured mode is direct I/O but direct I/O isn't supported? What I am wondering is whether there are any crossed wires where the access mode on the file doesn't match the way it was actually opened.
I don't think this is the case just because it's the dfile that is used.
The other important thing is to look at the tests that were added in the cursor-based compaction patch and parameterize those with direct I/O so it is exercised.
Really, the tests you parameterized should maybe also be parameterized on cursor-based compaction as well? I am assuming those tests execute relatively quickly, so it should be fine?
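A sketch of the kind of parameterization being suggested, using a stand-in enum so it runs without Cassandra internals (the test class name and body are hypothetical):

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class CursorCompactionDirectIOTest
{
    enum DiskAccessMode { standard, direct } // stand-in for the real config enum

    @Parameterized.Parameters(name = "diskAccessMode={0}")
    public static Object[] modes()
    {
        return new Object[]{ DiskAccessMode.standard, DiskAccessMode.direct };
    }

    private final DiskAccessMode mode;

    public CursorCompactionDirectIOTest(DiskAccessMode mode)
    {
        this.mode = mode;
    }

    @Test
    public void compactsUnderConfiguredMode()
    {
        // Hypothetical body: configure 'mode', run the existing cursor
        // compaction tests, and assert results are identical across modes.
    }
}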
}

@Override
protected void cleanBuffer(ByteBuffer buffer)
Could this check be done in MemoryUtil.clean or should MemoryUtil.clean have some kind of assertion added for cases where the buffer is a slice? I imagine if we try to clean a slice we get some error on the native side, but maybe it would be cleaner to error out in Java with a nice stack trace.
Strongly agree it should throw - it would have detected the potential leak. Changes pushed.
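A minimal sketch of such a guard, assuming access to sun.nio.ch.DirectBuffer (which Cassandra already relies on); the placement inside MemoryUtil.clean and the messages are assumptions:

import java.nio.ByteBuffer;
import sun.nio.ch.DirectBuffer;

public final class MemoryUtil
{
    public static void clean(ByteBuffer buffer)
    {
        if (!buffer.isDirect())
            throw new IllegalArgumentException("Expected a direct buffer");
        // A slice or duplicate of a direct buffer keeps its source buffer as an
        // attachment and owns no native memory itself, so cleaning it is a bug;
        // failing fast here gives a clear Java stack trace instead of a native error.
        if (((DirectBuffer) buffer).attachment() != null)
            throw new IllegalArgumentException("Refusing to clean a sliced/duplicated buffer; clean the backing buffer instead");
        ((DirectBuffer) buffer).cleaner().clean();
    }
}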
 * descriptors and prevents conflicts when scan and non-scan readers for the same file share thread-local
 * buffer state on the same thread.
 */
private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners, ImmutableSet<SSTableReader> sstables,
This has some smell to it. Should the key in ThreadLocalReadAheadBuffer include the disk access mode? Should these scanners have been opened with the correct disk access mode in the first place, or is it still an issue because we then need to re-open with the cursor-based approach?
Agree there's a smell - the cursor compaction code has been tailored to fit the existing compaction interface, accepting scanners and pulling just metadata from them, whereas raw SSTables would have suited it better. The ideal refactor I see is to let each separate compaction pipeline open the readers it requires, but this may be better done in a follow-up.
The disk access mode is not actually the issue here; it is the fact that two readers (one scan and one non-scan) exist for an SSTable at the same time on the same thread. On trunk this is a bug causing cursor compaction to use scan reads (with a read-ahead buffer) rather than the intended random access reads, because ScanCompressedReader's allocated() check is based on shared static state, which looks like this:
@Override
public boolean allocated()
{
    // this checks the static block map, which is inherently shared between the two
    // readers (scan + non-scan)
    return readAheadBuffer.hasBuffer();
}
Closing the scanner before opening the cursor reader deallocates the block map and thus allocated() returns false when opening the cursor readers, leading to the correct random access reader 'reader' being chosen:
// How readChunk picks the reader (scan vs random)
CompressedReader readFrom = (scanReader != null && scanReader.allocated()) ? scanReader : reader;
This is perhaps slightly brittle from the perspective of CompressedChunkReader's allocated(), but the precondition that a given file is not opened by two readers on the same thread concurrently does not seem unreasonable (as is currently the case in cursor compaction). I did look into guarding against instantiating two readers for the same file, but it caused a large number of test failures (I cannot remember the full details).
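A rough sketch of the close-before-open ordering described above (parameters trimmed, and the StatefulCursor constructor is an assumption for illustration):

private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners,
                                                         ImmutableSet<SSTableReader> sstables)
{
    // Close the scan readers first: this deallocates the shared thread-local
    // read-ahead block map, so allocated() returns false and readChunk picks
    // the random-access reader when the cursor readers are opened below.
    for (ISSTableScanner scanner : scanners)
        scanner.close();

    StatefulCursor[] cursors = new StatefulCursor[sstables.size()];
    int i = 0;
    for (SSTableReader sstable : sstables)
        cursors[i++] = new StatefulCursor(sstable); // opens the non-scan reader
    return cursors;
}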
{
    return diskAccessMode == null
           || diskAccessMode == dfile.diskAccessMode()
           || (diskAccessMode == DiskAccessMode.direct && !dfile.supportsDirectIO());
Do we at any point log that we weren't able to open the file with direct I/O as requested? Might be a use case for a rate-limited logger.
We fail on start-up if any of the data file directories do not support direct I/O.
The only case where FileHandle::supportsDirectIO would return false is when an SSTable is uncompressed, as we have only implemented support for the compressed case; logging for that might be deemed noise. Curious on your thoughts.
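If logging were wanted despite that, a rate-limited message via Cassandra's NoSpamLogger could look roughly like this (the call site, message, and wrapper class are hypothetical; supportsDirectIO() and diskAccessMode() are the accessors from this diff):

import java.util.concurrent.TimeUnit;
import org.apache.cassandra.config.Config.DiskAccessMode;
import org.apache.cassandra.io.util.FileHandle;
import org.apache.cassandra.utils.NoSpamLogger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DirectIOFallback
{
    private static final Logger logger = LoggerFactory.getLogger(DirectIOFallback.class);
    // At most one such warning every ten minutes, to avoid spamming hot paths.
    private static final NoSpamLogger noSpamLogger = NoSpamLogger.getLogger(logger, 10, TimeUnit.MINUTES);

    static void warnIfFallingBack(FileHandle dfile, DiskAccessMode requested)
    {
        if (requested == DiskAccessMode.direct && !dfile.supportsDirectIO())
            noSpamLogger.warn("Direct I/O requested for {} but unsupported (uncompressed SSTable); using {}",
                              dfile.path(), dfile.diskAccessMode());
    }
}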
    return sstableMetadata;
}

public RandomAccessReader openDataReader()
Should there really be no-access-mode-arg versions here? Shouldn't everyone know what access mode they want, or explicitly state that they don't care?
Not sure if I am right here or whether the verbosity is worth it. The concern is that people call the no-arg version when they should have passed configuration, but at some point it's not worth it.
This one may not be worth it. We'd have to make sure the original dFile is available to callers so they can pass dFile.diskAccessMode() in (i.e. the default).
There are currently >10 usages of openDataReader/openDataReaderForScan that we'd have to pass it through, and I can see a couple at a glance that do not have access to the FileHandle for the data file (dFile).
We could improve the docs to indicate that the file's default disk access mode will be used for these overloads.
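For example, a doc comment along these lines on the no-arg overload (the delegating parameterized overload's signature is an assumption):

/**
 * Opens a data reader using the data file's default disk access mode,
 * equivalent to passing dfile.diskAccessMode() explicitly.
 */
public RandomAccessReader openDataReader()
{
    return openDataReader(dfile.diskAccessMode());
}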
if (isSameDiskAccessMode || isDirectIONotSupported)
    return dfile.createReaderForScan(OnReaderClose.RETAIN_FILE_OPEN);

private RandomAccessReader openDataReaderInternal(@Nullable DiskAccessMode diskAccessMode,
The logic in this method (and the similar logic from the introduction of direct I/O for scans) needs test coverage to make sure it re-uses the dfile when it is supposed to, falls back to creating a new dfile when it isn't, and then closes that dfile correctly.
I haven't checked whether these tests exist yet; just pointing it out.
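The reuse guard itself can be exercised cheaply by lifting it into a pure predicate; a self-contained sketch of that part of the coverage (class and enum are stand-ins, and the dfile open/close lifecycle would still need separate integration coverage):

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class DataReaderReuseTest
{
    enum DiskAccessMode { standard, direct } // stand-in for the real config enum

    // Mirrors the canReuseDfile guard quoted in the diff above.
    static boolean canReuseDfile(DiskAccessMode requested, DiskAccessMode current, boolean supportsDirectIO)
    {
        return requested == null
               || requested == current
               || (requested == DiskAccessMode.direct && !supportsDirectIO);
    }

    @Test
    public void reusesWhenNoModeRequested()
    {
        assertTrue(canReuseDfile(null, DiskAccessMode.standard, true));
    }

    @Test
    public void reusesWhenModesMatch()
    {
        assertTrue(canReuseDfile(DiskAccessMode.direct, DiskAccessMode.direct, true));
    }

    @Test
    public void reusesWhenDirectRequestedButUnsupported()
    {
        // Direct requested but the file can't do it: keep the existing dfile.
        assertTrue(canReuseDfile(DiskAccessMode.direct, DiskAccessMode.standard, false));
    }

    @Test
    public void reopensWhenDirectRequestedAndSupported()
    {
        // The only case that forces a new dfile with the requested mode.
        assertFalse(canReuseDfile(DiskAccessMode.direct, DiskAccessMode.standard, true));
    }
}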
Force-pushed from 7ce6eda to 4ffc653
Summary:
Wire compaction_read_disk_access_mode through cursor compaction to enable direct IO reads, matching the existing support in the iterator-based compaction path.
Changes:
patch by Sam Lightfoot reviewed by for CASSANDRA-21147