CASSANDRA-21147 - Direct IO support for Cursor Compaction #4606
samueldlightfoot wants to merge 2 commits into apache:trunk
Conversation
Force-pushed from fe9e9e6 to 245ac52
Force-pushed from b6f0cb2 to ef6f3f0
This change wires DiskAccessMode through cursor-based compaction, enabling direct I/O reads for cursor compaction to match the existing support in the iterator-based compaction path.

Key changes:
- Thread DiskAccessMode from DatabaseDescriptor.getCompactionReadDiskAccessMode() through CursorCompactor, StatefulCursor, and SSTableCursorReader
- Consolidate SSTableReader's openDataReader/openDataReaderForScan variants into a unified openDataReaderInternal with a canReuseDfile guard
- Fix DirectThreadLocalReadAheadBuffer.cleanBuffer() to clean the backing buffer rather than the aligned slice
- Ensure SSTable scanners are closed before opening cursor readers (resolving read-ahead buffer interference and excessive fd usage)
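As a minimal sketch of the thread-through described above (only DatabaseDescriptor.getCompactionReadDiskAccessMode() is named in this description; the constructor shape below is an assumption for illustration):

import org.apache.cassandra.config.Config.DiskAccessMode;
import org.apache.cassandra.config.DatabaseDescriptor;

public class CursorCompactor
{
    private final DiskAccessMode diskAccessMode;

    public CursorCompactor()
    {
        // Resolve the configured compaction read mode once, then hand it down
        // through StatefulCursor into SSTableCursorReader so data readers can
        // be opened with direct I/O when configured (constructor shape assumed).
        this.diskAccessMode = DatabaseDescriptor.getCompactionReadDiskAccessMode();
    }
}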
Force-pushed from ef6f3f0 to 0d16207
aweisberg left a comment:
TY, looks pretty good. Do we have testing of what happens when the configured mode is direct I/O but direct I/O isn't supported? What I am wondering is whether there are any crossed wires where the access mode on the file doesn't match the way it was actually opened.
I don't think this is the case just because it's the dfile that is used.
The other important thing is to look at the tests that were added in the cursor-based compaction patch and parameterize those with direct I/O so it is exercised.
Really, the tests you parameterized should maybe also be parameterized on cursor-based compaction as well? I am assuming those tests execute relatively quickly, so it should be fine?
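A sketch of the kind of parameterization being suggested, using a stand-in enum so it runs without Cassandra internals (the test class name and body are hypothetical):

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class CursorCompactionDirectIOTest
{
    enum DiskAccessMode { standard, direct } // stand-in for the real config enum

    @Parameterized.Parameters(name = "diskAccessMode={0}")
    public static Object[] modes()
    {
        return new Object[]{ DiskAccessMode.standard, DiskAccessMode.direct };
    }

    private final DiskAccessMode mode;

    public CursorCompactionDirectIOTest(DiskAccessMode mode)
    {
        this.mode = mode;
    }

    @Test
    public void compactsUnderConfiguredMode()
    {
        // Hypothetical body: configure 'mode', run the existing cursor
        // compaction tests, and assert results are identical across modes.
    }
}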
}

@Override
protected void cleanBuffer(ByteBuffer buffer)
Could this check be done in MemoryUtil.clean or should MemoryUtil.clean have some kind of assertion added for cases where the buffer is a slice? I imagine if we try to clean a slice we get some error on the native side, but maybe it would be cleaner to error out in Java with a nice stack trace.
Strongly agree it should throw - it would have detected the potential leak. Changes pushed.
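A minimal sketch of such a guard, assuming access to sun.nio.ch.DirectBuffer (which Cassandra already relies on); the placement inside MemoryUtil.clean and the messages are assumptions:

import java.nio.ByteBuffer;
import sun.nio.ch.DirectBuffer;

public final class MemoryUtil
{
    public static void clean(ByteBuffer buffer)
    {
        if (!buffer.isDirect())
            throw new IllegalArgumentException("Expected a direct buffer");
        // A slice or duplicate of a direct buffer keeps its source buffer as an
        // attachment and owns no native memory itself, so cleaning it is a bug;
        // failing fast here gives a clear Java stack trace instead of a native error.
        if (((DirectBuffer) buffer).attachment() != null)
            throw new IllegalArgumentException("Refusing to clean a sliced/duplicated buffer; clean the backing buffer instead");
        ((DirectBuffer) buffer).cleaner().clean();
    }
}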
 * descriptors and prevents conflicts when scan and non-scan readers for the same file share thread-local
 * buffer state on the same thread.
 */
private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners, ImmutableSet<SSTableReader> sstables,
This has some smell to it. Should the key in ThreadLocalReadAheadBuffer include the disk access mode? Should these scanners have been opened with the correct disk access mode in the first place, or is it still an issue because we then need to re-open with the cursor-based approach?
Agree there's a smell - the cursor compaction code has been tailored to fit the existing compaction interface, accepting scanners and pulling just metadata from them, whereas raw SSTables would have suited it better. The ideal refactor I see is to let each separate compaction pipeline open the readers it requires, but this may be better done in a follow-up.
The disk access mode is not actually the issue here; it is the fact that two readers (one scan and one non-scan) exist for an SSTable at the same time on the same thread. On trunk this is a bug causing cursor compaction to use scan reads (with a read-ahead buffer) rather than the intended random access reads, because ScanCompressedReader's allocated() check is based on shared static state, which looks like this:
@Override
public boolean allocated()
{
    // this checks the static block map, which is inherently shared between the two
    // readers (scan + non-scan)
    return readAheadBuffer.hasBuffer();
}
Closing the scanner before opening the cursor reader deallocates the block map and thus allocated() returns false when opening the cursor readers, leading to the correct random access reader 'reader' being chosen:
// How readChunk picks the reader (scan vs random)
CompressedReader readFrom = (scanReader != null && scanReader.allocated()) ? scanReader : reader;
This is perhaps slightly brittle from the perspective of CompressedChunkReader's allocated(), but the precondition that a given file is not opened by two readers on the same thread concurrently does not seem unreasonable (as is currently the case in cursor compaction). I did look into guarding against instantiating two readers for the same file, but it caused a large number of test failures (I cannot remember the full details).
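A rough sketch of the close-before-open ordering described above (parameters trimmed, and the StatefulCursor constructor is an assumption for illustration):

private static StatefulCursor[] convertScannersToCursors(List<ISSTableScanner> scanners,
                                                         ImmutableSet<SSTableReader> sstables)
{
    // Close the scan readers first: this deallocates the shared thread-local
    // read-ahead block map, so allocated() returns false and readChunk picks
    // the random-access reader when the cursor readers are opened below.
    for (ISSTableScanner scanner : scanners)
        scanner.close();

    StatefulCursor[] cursors = new StatefulCursor[sstables.size()];
    int i = 0;
    for (SSTableReader sstable : sstables)
        cursors[i++] = new StatefulCursor(sstable); // opens the non-scan reader
    return cursors;
}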
{
    return diskAccessMode == null
           || diskAccessMode == dfile.diskAccessMode()
           || (diskAccessMode == DiskAccessMode.direct && !dfile.supportsDirectIO());
Do we at any point log that we weren't able to open the file with direct I/O as requested? Might be a use case for a rate-limited logger.
We fail on start-up if any of the data file directories do not support direct I/O.
The only case where FileHandle::supportsDirectIO would return false is when an SSTable is uncompressed, as we have only implemented support for the compressed case; logging for that might be deemed noise. Curious on your thoughts.
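If logging were wanted despite that, a rate-limited message via Cassandra's NoSpamLogger could look roughly like this (the call site, message, and wrapper class are hypothetical; supportsDirectIO() and diskAccessMode() are the accessors from this diff):

import java.util.concurrent.TimeUnit;
import org.apache.cassandra.config.Config.DiskAccessMode;
import org.apache.cassandra.io.util.FileHandle;
import org.apache.cassandra.utils.NoSpamLogger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DirectIOFallback
{
    private static final Logger logger = LoggerFactory.getLogger(DirectIOFallback.class);
    // At most one such warning every ten minutes, to avoid spamming hot paths.
    private static final NoSpamLogger noSpamLogger = NoSpamLogger.getLogger(logger, 10, TimeUnit.MINUTES);

    static void warnIfFallingBack(FileHandle dfile, DiskAccessMode requested)
    {
        if (requested == DiskAccessMode.direct && !dfile.supportsDirectIO())
            noSpamLogger.warn("Direct I/O requested for {} but unsupported (uncompressed SSTable); using {}",
                              dfile.path(), dfile.diskAccessMode());
    }
}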
    return sstableMetadata;
}

public RandomAccessReader openDataReader()
Should there really be no-access-mode-arg versions here? Shouldn't everyone know what access mode they want, or explicitly state that they don't care?
Not sure if I am right here or whether the verbosity is worth it. The concern is that people call the no-arg version when they should have passed configuration, but at some point it's not worth it.
This one may not be worth it. We'd have to make sure the original dFile is available to callers so they can pass dFile.diskAccessMode() in (i.e. the default).
There are currently >10 usages of openDataReader/openDataReaderForScan that we'd have to pass it through, and I can see a couple at a glance that do not have access to the FileHandle for the data file (dFile).
We could improve the docs to indicate that the file's default disk access mode will be used for these overloads.
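For example, a doc comment along these lines on the no-arg overload (the delegating parameterized overload's signature is an assumption):

/**
 * Opens a data reader using the data file's default disk access mode,
 * equivalent to passing dfile.diskAccessMode() explicitly.
 */
public RandomAccessReader openDataReader()
{
    return openDataReader(dfile.diskAccessMode());
}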
if (isSameDiskAccessMode || isDirectIONotSupported)
    return dfile.createReaderForScan(OnReaderClose.RETAIN_FILE_OPEN);

private RandomAccessReader openDataReaderInternal(@Nullable DiskAccessMode diskAccessMode,
The logic in this method (and the similar logic from the introduction of direct I/O for scans) needs test coverage to make sure it re-uses the dfile when it is supposed to, falls back to creating a new dfile when it isn't, and then closes that dfile correctly.
I haven't checked whether these tests exist yet; just pointing it out.
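The reuse guard itself can be exercised cheaply by lifting it into a pure predicate; a self-contained sketch of that part of the coverage (class and enum are stand-ins, and the dfile open/close lifecycle would still need separate integration coverage):

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class DataReaderReuseTest
{
    enum DiskAccessMode { standard, direct } // stand-in for the real config enum

    // Mirrors the canReuseDfile guard quoted in the diff above.
    static boolean canReuseDfile(DiskAccessMode requested, DiskAccessMode current, boolean supportsDirectIO)
    {
        return requested == null
               || requested == current
               || (requested == DiskAccessMode.direct && !supportsDirectIO);
    }

    @Test
    public void reusesWhenNoModeRequested()
    {
        assertTrue(canReuseDfile(null, DiskAccessMode.standard, true));
    }

    @Test
    public void reusesWhenModesMatch()
    {
        assertTrue(canReuseDfile(DiskAccessMode.direct, DiskAccessMode.direct, true));
    }

    @Test
    public void reusesWhenDirectRequestedButUnsupported()
    {
        // Direct requested but the file can't do it: keep the existing dfile.
        assertTrue(canReuseDfile(DiskAccessMode.direct, DiskAccessMode.standard, false));
    }

    @Test
    public void reopensWhenDirectRequestedAndSupported()
    {
        // The only case that forces a new dfile with the requested mode.
        assertFalse(canReuseDfile(DiskAccessMode.direct, DiskAccessMode.standard, true));
    }
}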
Force-pushed from 7ce6eda to 4ffc653
Summary:
Wire compaction_read_disk_access_mode through cursor compaction to enable direct IO reads, matching the existing support in the iterator-based compaction path.
Changes:
patch by Sam Lightfoot reviewed by for CASSANDRA-21147