Share SubReadStream between System.IO.Compression and System.Formats.Tar#128750
Share SubReadStream between System.IO.Compression and System.Formats.Tar#128750rzikm wants to merge 1 commit into
Conversation
Both ZipArchive and Tar used their own internal SubReadStream classes that exposed a bounded view over a 'super' stream. Tar additionally had a SeekableSubReadStream subclass to support seeking. The two implementations have converged in functionality, so consolidate them into a single sealed class at src/libraries/Common/src/System/IO/SubReadStream.cs that supports both seekable and unseekable super streams. The unified class also keeps Tar's AdvanceToEnd/AdvanceToEndAsync APIs (no-ops for Zip since Zip never calls them) so TarReader can continue to discard unread data segments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @dotnet/area-system-formats-tar |
🤖 Copilot Code Review — PR #128750Note This review was generated by GitHub Copilot. Holistic AssessmentMotivation: Well-justified. Two near-duplicate Approach: Sound. The merged class correctly handles both seekable and unseekable cases inline (matching Zip’s existing pattern), keeps Tar-specific Summary: Detailed Findings❌ Formatting — Incorrect indentation in TarHeader.Read.cs comment (line ~233)The replacement comment block has excessive indentation (20 spaces) compared to the surrounding method-level comments (8 spaces): - // - All other typeflag entries with a data section will generate a stream wrapping...
+ // - All other typeflag entries with a data section will generate a SubReadStream...
+ // When the archive stream is seekable, the SubReadStream is seekable too.
internal void ProcessDataBlock(Stream archiveStream, bool copyData)The two new comment lines should be at 8-space indentation ( ✅ Correctness — SubReadStream consolidation is faithfulVerified that:
✅ Correctness — TarHeader type-check simplification is correctThe old code: if (_dataStream is SeekableSubReadStream) { AdvanceStream... }
else if (_dataStream is SubReadStream) { skipBlockAlignmentPadding = false; }New code correctly collapses to checking 💡 Suggestion — Consider removing
|
There was a problem hiding this comment.
Pull request overview
This PR consolidates the duplicated bounded read-only “substream” implementations used by System.IO.Compression (Zip) and System.Formats.Tar (Tar) into a single shared SubReadStream under src/libraries/Common, and updates both libraries to consume that shared implementation.
Changes:
- Introduces a shared
System.IO.SubReadStreamimplementation undersrc/libraries/Common/and wires both Tar and Zip projects to compile it. - Removes the old per-library
SubReadStreamimplementations (including Tar’sSeekableSubReadStream) and updates Tar’s call sites to handle seekability viaarchiveStream.CanSeek. - Aligns resource strings needed by the shared implementation by adding
IO_NotSupported_Un{readable,seekable,writable}Streamentries toSystem.IO.Compression.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/Common/src/System/IO/SubReadStream.cs | Adds the shared bounded read-only stream implementation (used by both Tar and Zip). |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs | Removes Zip’s in-file SubReadStream implementation. |
| src/libraries/System.IO.Compression/src/System.IO.Compression.csproj | Compiles in the shared SubReadStream.cs from Common. |
| src/libraries/System.IO.Compression/src/Resources/Strings.resx | Adds IO_NotSupported_Un* SR strings used by the shared implementation. |
| src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarReader.cs | Updates type checks/comments to reference only SubReadStream. |
| src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs | Switches Tar data-stream creation to the shared SubReadStream and collapses seekable/unseekable handling. |
| src/libraries/System.Formats.Tar/src/System.Formats.Tar.csproj | Removes Tar-local stream files and compiles in the shared SubReadStream.cs. |
| src/libraries/System.Formats.Tar/src/System/Formats/Tar/SubReadStream.cs | Deleted (replaced by shared common implementation). |
| src/libraries/System.Formats.Tar/src/System/Formats/Tar/SeekableSubReadStream.cs | Deleted (replaced by seek-capable behavior within shared implementation). |
| byte[] buffer = ArrayPool<byte>.Shared.Rent((int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard)); | ||
| while (bytesToDiscard > 0) | ||
| { | ||
| int currentLengthToRead = (int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard); | ||
| _superStream.ReadExactly(buffer.AsSpan(0, currentLengthToRead)); | ||
| bytesToDiscard -= currentLengthToRead; | ||
| } | ||
| ArrayPool<byte>.Shared.Return(buffer); | ||
| } |
| byte[] buffer = ArrayPool<byte>.Shared.Rent((int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard)); | ||
| while (bytesToDiscard > 0) | ||
| { | ||
| int currentLengthToRead = (int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard); | ||
| await _superStream.ReadExactlyAsync(buffer, 0, currentLengthToRead, cancellationToken).ConfigureAwait(false); | ||
| bytesToDiscard -= currentLengthToRead; | ||
| } | ||
| ArrayPool<byte>.Shared.Return(buffer); | ||
| } |
| // When working with seekable streams, every time we return an entry, we avoid advancing the pointer beyond the data section | ||
| // This is so the user can read the data if desired. But if the data was not read by the user, we need to advance the pointer | ||
| // here until it's located at the beginning of the next entry header. |
| // When working with seekable streams, every time we return an entry, we avoid advancing the pointer beyond the data section | ||
| // This is so the user can read the data if desired. But if the data was not read by the user, we need to advance the pointer | ||
| // here until it's located at the beginning of the next entry header. |
| } | ||
| ArgumentOutOfRangeException.ThrowIfNegative(startPosition); | ||
| ArgumentOutOfRangeException.ThrowIfNegative(maxLength); | ||
| ArgumentOutOfRangeException.ThrowIfGreaterThan(startPosition, long.MaxValue - maxLength, nameof(startPosition)); |
There was a problem hiding this comment.
| ArgumentOutOfRangeException.ThrowIfGreaterThan(startPosition, long.MaxValue - maxLength, nameof(startPosition)); | |
| ArgumentOutOfRangeException.ThrowIfGreaterThan(startPosition, long.MaxValue - maxLength); |
Nit: It'll pick up the first argument's expression by default already
| public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) | ||
| { | ||
| if (cancellationToken.IsCancellationRequested) | ||
| { | ||
| return Task.FromCanceled<int>(cancellationToken); | ||
| } | ||
| ValidateBufferArguments(buffer, offset, count); | ||
| return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask(); | ||
| } | ||
|
|
||
| public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) | ||
| { | ||
| if (cancellationToken.IsCancellationRequested) | ||
| { | ||
| return ValueTask.FromCanceled<int>(cancellationToken); | ||
| } | ||
| ThrowIfDisposed(); | ||
| ThrowIfCantRead(); | ||
| ThrowIfBeyondEndOfStream(); | ||
| return ReadAsyncCore(buffer, cancellationToken); | ||
| } |
There was a problem hiding this comment.
| public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) | |
| { | |
| if (cancellationToken.IsCancellationRequested) | |
| { | |
| return Task.FromCanceled<int>(cancellationToken); | |
| } | |
| ValidateBufferArguments(buffer, offset, count); | |
| return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask(); | |
| } | |
| public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) | |
| { | |
| if (cancellationToken.IsCancellationRequested) | |
| { | |
| return ValueTask.FromCanceled<int>(cancellationToken); | |
| } | |
| ThrowIfDisposed(); | |
| ThrowIfCantRead(); | |
| ThrowIfBeyondEndOfStream(); | |
| return ReadAsyncCore(buffer, cancellationToken); | |
| } | |
| public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) | |
| { | |
| ValidateBufferArguments(buffer, offset, count); | |
| return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask(); | |
| } | |
| public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) | |
| { | |
| ThrowIfDisposed(); | |
| ThrowIfCantRead(); | |
| ThrowIfBeyondEndOfStream(); | |
| return ReadAsyncCore(buffer, cancellationToken); | |
| } |
| _ => throw new ArgumentOutOfRangeException(nameof(origin)), | ||
| }; | ||
|
|
||
| if (newPositionInSuperStream < _startInSuperStream) |
There was a problem hiding this comment.
Should we also check if we went past the end?
| // - All other typeflag entries with a data section will generate a SubReadStream wrapping the data section. | ||
| // When the archive stream is seekable, the SubReadStream is seekable too. |
There was a problem hiding this comment.
| // - All other typeflag entries with a data section will generate a SubReadStream wrapping the data section. | |
| // When the archive stream is seekable, the SubReadStream is seekable too. | |
| // - All other typeflag entries with a data section will generate a SubReadStream wrapping the data section. | |
| // When the archive stream is seekable, the SubReadStream is seekable too. |
| <Compile Include="$(SharedOpenSourcePath)System\IO\Compression\ZipBlocks.FieldLocations.cs" /> | ||
| <Compile Include="$(SharedOpenSourcePath)System\IO\Compression\ZipCustomStreams.cs" /> | ||
| <Compile Include="$(CommonPath)System\IO\SubReadStream.cs" Link="Common\System\IO\SubReadStream.cs" /> | ||
| <Compile Include="$(SharedOpenSourcePath)System\IO\Compression\ZipHelper.cs" /> |
There was a problem hiding this comment.
Unrelated, but do you know where this path is coming from?
Search isn't being helpful
Or is it just empty and we end up with the correct path anyway? 😄
Note
This PR was authored with assistance from GitHub Copilot.
Both
System.IO.Compression(ZipArchive) andSystem.Formats.Tarcarried near-duplicateSubReadStreamimplementations that expose a bounded read-only window over a "super" stream. Tar additionally had aSeekableSubReadStreamsubclass to handle the seekable case. The two implementations had largely converged in functionality, so this consolidates them into a single shared class.Approach
src/libraries/Common/src/System/IO/SubReadStream.csas a single sealed class that supports both seekable and unseekable super streams (the seekable/unseekable split that Tar previously did via subclassing is now handled inline based onsuperStream.CanSeek, matching the pattern Zip already used).AdvanceToEnd/AdvanceToEndAsync(plus the associated_hasReachedEndguard) on the shared class. Zip never calls them, so they are inert for the Zip consumer.SubReadStream.csandSeekableSubReadStream.csfromSystem.Formats.Tar, and remove theSubReadStreamclass fromZipCustomStreams.cs. Both csprojs now<Compile Include="$(CommonPath)System\IO\SubReadStream.cs" ... />.is SubReadStream/is SeekableSubReadStreamtype checks inTarHeader.Read.csandTarReader.csinto a singleis SubReadStreamcheck plus an innerarchiveStream.CanSeekbranch.IO_NotSupported_Un{readable,seekable,writable}StreamSR entries toSystem.IO.Compression's resx (Tar already had equivalents).Net diff: 9 files, +323 / -545.
Behavior notes
A few small, deliberate behavior changes fall out of the merge:
ArgumentOutOfRangeExceptionfor negative arguments / overflow instead of the domain-specificInvalidDataException(SR.TarInvalidNumber/SR.LocalFileHeaderCorrupt). Both call sites construct from validated header fields so this is not user-visible in normal use.ObjectDisposedExceptionthrown from disposed-stream checks now uses the standardObjectDisposedException.ThrowIf(_isDisposed, this)form instead of a custom message.NotSupportedExceptionmessages on seek/write/etc. now use the standardSR.IO_NotSupported_*strings instead of the old Zip-specificSR.SeekingNotSupported/SR.WritingNotSupported.Flush()on the new class is a no-op (matches the typical convention for read-only streams) instead of throwing.Validation
Both library test suites pass on Windows:
System.Formats.Tar.Tests: 6569 / 6569System.IO.Compression.Tests: 1962 / 1962