Skip to content

Share SubReadStream between System.IO.Compression and System.Formats.Tar#128750

Open
rzikm wants to merge 1 commit into
mainfrom
rzikm/share-subreadstream
Open

Share SubReadStream between System.IO.Compression and System.Formats.Tar#128750
rzikm wants to merge 1 commit into
mainfrom
rzikm/share-subreadstream

Conversation

@rzikm
Copy link
Copy Markdown
Member

@rzikm rzikm commented May 29, 2026

Note

This PR was authored with assistance from GitHub Copilot.

Both System.IO.Compression (ZipArchive) and System.Formats.Tar carried near-duplicate SubReadStream implementations that expose a bounded read-only window over a "super" stream. Tar additionally had a SeekableSubReadStream subclass to handle the seekable case. The two implementations had largely converged in functionality, so this consolidates them into a single shared class.

Approach

  • Add src/libraries/Common/src/System/IO/SubReadStream.cs as a single sealed class that supports both seekable and unseekable super streams (the seekable/unseekable split that Tar previously did via subclassing is now handled inline based on superStream.CanSeek, matching the pattern Zip already used).
  • Keep Tar's AdvanceToEnd / AdvanceToEndAsync (plus the associated _hasReachedEnd guard) on the shared class. Zip never calls them, so they are inert for the Zip consumer.
  • Delete SubReadStream.cs and SeekableSubReadStream.cs from System.Formats.Tar, and remove the SubReadStream class from ZipCustomStreams.cs. Both csprojs now <Compile Include="$(CommonPath)System\IO\SubReadStream.cs" ... />.
  • Collapse the dual is SubReadStream / is SeekableSubReadStream type checks in TarHeader.Read.cs and TarReader.cs into a single is SubReadStream check plus an inner archiveStream.CanSeek branch.
  • Add IO_NotSupported_Un{readable,seekable,writable}Stream SR entries to System.IO.Compression's resx (Tar already had equivalents).

Net diff: 9 files, +323 / -545.

Behavior notes

A few small, deliberate behavior changes fall out of the merge:

  • Constructor now throws ArgumentOutOfRangeException for negative arguments / overflow instead of the domain-specific InvalidDataException (SR.TarInvalidNumber / SR.LocalFileHeaderCorrupt). Both call sites construct from validated header fields so this is not user-visible in normal use.
  • ObjectDisposedException thrown from disposed-stream checks now uses the standard ObjectDisposedException.ThrowIf(_isDisposed, this) form instead of a custom message.
  • Zip's NotSupportedException messages on seek/write/etc. now use the standard SR.IO_NotSupported_* strings instead of the old Zip-specific SR.SeekingNotSupported / SR.WritingNotSupported.
  • Flush() on the new class is a no-op (matches the typical convention for read-only streams) instead of throwing.

Validation

Both library test suites pass on Windows:

  • System.Formats.Tar.Tests: 6569 / 6569
  • System.IO.Compression.Tests: 1962 / 1962

Both ZipArchive and Tar used their own internal SubReadStream classes that exposed a bounded view over a 'super' stream. Tar additionally had a SeekableSubReadStream subclass to support seeking. The two implementations have converged in functionality, so consolidate them into a single sealed class at src/libraries/Common/src/System/IO/SubReadStream.cs that supports both seekable and unseekable super streams.

The unified class also keeps Tar's AdvanceToEnd/AdvanceToEndAsync APIs (no-ops for Zip since Zip never calls them) so TarReader can continue to discard unread data segments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 29, 2026 10:28
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

@github-actions
Copy link
Copy Markdown
Contributor

🤖 Copilot Code Review — PR #128750

Note

This review was generated by GitHub Copilot.

Holistic Assessment

Motivation: Well-justified. Two near-duplicate SubReadStream implementations existed across System.IO.Compression and System.Formats.Tar. Consolidating them reduces maintenance cost with no new public API surface.

Approach: Sound. The merged class correctly handles both seekable and unseekable cases inline (matching Zip’s existing pattern), keeps Tar-specific AdvanceToEnd methods (inert for Zip), and the type-check simplifications in TarHeader/TarReader are correct.

Summary: ⚠️ Needs Changes. There is one formatting error that should be fixed before merge, and one minor behavioral question worth confirming.


Detailed Findings

❌ Formatting — Incorrect indentation in TarHeader.Read.cs comment (line ~233)

The replacement comment block has excessive indentation (20 spaces) compared to the surrounding method-level comments (8 spaces):

-        // - All other typeflag entries with a data section will generate a stream wrapping...
+                    // - All other typeflag entries with a data section will generate a SubReadStream...
+                    //   When the archive stream is seekable, the SubReadStream is seekable too.
         internal void ProcessDataBlock(Stream archiveStream, bool copyData)

The two new comment lines should be at 8-space indentation ( //) to match the three sibling comment lines directly above them.

✅ Correctness — SubReadStream consolidation is faithful

Verified that:

  • LimitByRemaining correctly uses Math.Max(0, ...) to guard against negative Remaining when seeked past end.
  • Seekable path repositions the super stream before reads (matching both old implementations).
  • AdvanceSuperStream handles seekable (position increment) and unseekable (ReadExactly loop) correctly.
  • Dispose does not close the super stream (matching both originals).
  • CanRead/CanSeek return false after dispose (matching originals).

✅ Correctness — TarHeader type-check simplification is correct

The old code:

if (_dataStream is SeekableSubReadStream) { AdvanceStream... }
else if (_dataStream is SubReadStream) { skipBlockAlignmentPadding = false; }

New code correctly collapses to checking archiveStream.CanSeek inside a single is SubReadStream branch, since the new SubReadStream covers both cases.

💡 Suggestion — Consider removing ThrowIfBeyondEndOfStream check from seekable Read path

The new shared Read calls ThrowIfBeyondEndOfStream() which throws EndOfStreamException if _hasReachedEnd is set. The old SeekableSubReadStream.Read did not check this flag. After AdvanceToEnd() is called on a seekable stream, _hasReachedEnd = true, which would make subsequent reads throw. In practice this is likely fine (no caller reads after AdvanceToEnd), but it is a subtle behavioral difference from the old Tar seekable path. If you’ve confirmed no test relies on reading after AdvanceToEnd on seekable streams, this is fine as-is.

✅ Behavioral changes are documented and acceptable

The PR description clearly documents the small behavioral differences (exception types, message strings, Flush no-op). These are all in internal-only code paths with validated inputs, so they’re non-breaking.

✅ Resource strings added correctly

The three IO_NotSupported_* entries added to System.IO.Compression’s Strings.resx match the existing entries in System.Formats.Tar’s resx.

Generated by Code Review for issue #128750 · ● 3.1M ·

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates the duplicated bounded read-only “substream” implementations used by System.IO.Compression (Zip) and System.Formats.Tar (Tar) into a single shared SubReadStream under src/libraries/Common, and updates both libraries to consume that shared implementation.

Changes:

  • Introduces a shared System.IO.SubReadStream implementation under src/libraries/Common/ and wires both Tar and Zip projects to compile it.
  • Removes the old per-library SubReadStream implementations (including Tar’s SeekableSubReadStream) and updates Tar’s call sites to handle seekability via archiveStream.CanSeek.
  • Aligns resource strings needed by the shared implementation by adding IO_NotSupported_Un{readable,seekable,writable}Stream entries to System.IO.Compression.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/libraries/Common/src/System/IO/SubReadStream.cs Adds the shared bounded read-only stream implementation (used by both Tar and Zip).
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs Removes Zip’s in-file SubReadStream implementation.
src/libraries/System.IO.Compression/src/System.IO.Compression.csproj Compiles in the shared SubReadStream.cs from Common.
src/libraries/System.IO.Compression/src/Resources/Strings.resx Adds IO_NotSupported_Un* SR strings used by the shared implementation.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarReader.cs Updates type checks/comments to reference only SubReadStream.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs Switches Tar data-stream creation to the shared SubReadStream and collapses seekable/unseekable handling.
src/libraries/System.Formats.Tar/src/System.Formats.Tar.csproj Removes Tar-local stream files and compiles in the shared SubReadStream.cs.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/SubReadStream.cs Deleted (replaced by shared common implementation).
src/libraries/System.Formats.Tar/src/System/Formats/Tar/SeekableSubReadStream.cs Deleted (replaced by seek-capable behavior within shared implementation).

Comment on lines +113 to +121
byte[] buffer = ArrayPool<byte>.Shared.Rent((int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard));
while (bytesToDiscard > 0)
{
int currentLengthToRead = (int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard);
_superStream.ReadExactly(buffer.AsSpan(0, currentLengthToRead));
bytesToDiscard -= currentLengthToRead;
}
ArrayPool<byte>.Shared.Return(buffer);
}
Comment on lines +134 to +142
byte[] buffer = ArrayPool<byte>.Shared.Rent((int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard));
while (bytesToDiscard > 0)
{
int currentLengthToRead = (int)Math.Min(MaxAdvanceBufferLength, bytesToDiscard);
await _superStream.ReadExactlyAsync(buffer, 0, currentLengthToRead, cancellationToken).ConfigureAwait(false);
bytesToDiscard -= currentLengthToRead;
}
ArrayPool<byte>.Shared.Return(buffer);
}
Comment on lines 214 to 216
// When working with seekable streams, every time we return an entry, we avoid advancing the pointer beyond the data section
// This is so the user can read the data if desired. But if the data was not read by the user, we need to advance the pointer
// here until it's located at the beginning of the next entry header.
Comment on lines 247 to 249
// When working with seekable streams, every time we return an entry, we avoid advancing the pointer beyond the data section
// This is so the user can read the data if desired. But if the data was not read by the user, we need to advance the pointer
// here until it's located at the beginning of the next entry header.
}
ArgumentOutOfRangeException.ThrowIfNegative(startPosition);
ArgumentOutOfRangeException.ThrowIfNegative(maxLength);
ArgumentOutOfRangeException.ThrowIfGreaterThan(startPosition, long.MaxValue - maxLength, nameof(startPosition));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ArgumentOutOfRangeException.ThrowIfGreaterThan(startPosition, long.MaxValue - maxLength, nameof(startPosition));
ArgumentOutOfRangeException.ThrowIfGreaterThan(startPosition, long.MaxValue - maxLength);

Nit: It'll pick up the first argument's expression by default already

Comment on lines +197 to +217
public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested)
{
return Task.FromCanceled<int>(cancellationToken);
}
ValidateBufferArguments(buffer, offset, count);
return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask();
}

public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
{
if (cancellationToken.IsCancellationRequested)
{
return ValueTask.FromCanceled<int>(cancellationToken);
}
ThrowIfDisposed();
ThrowIfCantRead();
ThrowIfBeyondEndOfStream();
return ReadAsyncCore(buffer, cancellationToken);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
if (cancellationToken.IsCancellationRequested)
{
return Task.FromCanceled<int>(cancellationToken);
}
ValidateBufferArguments(buffer, offset, count);
return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask();
}
public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
{
if (cancellationToken.IsCancellationRequested)
{
return ValueTask.FromCanceled<int>(cancellationToken);
}
ThrowIfDisposed();
ThrowIfCantRead();
ThrowIfBeyondEndOfStream();
return ReadAsyncCore(buffer, cancellationToken);
}
public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
ValidateBufferArguments(buffer, offset, count);
return ReadAsync(new Memory<byte>(buffer, offset, count), cancellationToken).AsTask();
}
public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default)
{
ThrowIfDisposed();
ThrowIfCantRead();
ThrowIfBeyondEndOfStream();
return ReadAsyncCore(buffer, cancellationToken);
}

_ => throw new ArgumentOutOfRangeException(nameof(origin)),
};

if (newPositionInSuperStream < _startInSuperStream)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check if we went past the end?

Comment on lines +233 to +234
// - All other typeflag entries with a data section will generate a SubReadStream wrapping the data section.
// When the archive stream is seekable, the SubReadStream is seekable too.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// - All other typeflag entries with a data section will generate a SubReadStream wrapping the data section.
// When the archive stream is seekable, the SubReadStream is seekable too.
// - All other typeflag entries with a data section will generate a SubReadStream wrapping the data section.
// When the archive stream is seekable, the SubReadStream is seekable too.

<Compile Include="$(SharedOpenSourcePath)System\IO\Compression\ZipBlocks.FieldLocations.cs" />
<Compile Include="$(SharedOpenSourcePath)System\IO\Compression\ZipCustomStreams.cs" />
<Compile Include="$(CommonPath)System\IO\SubReadStream.cs" Link="Common\System\IO\SubReadStream.cs" />
<Compile Include="$(SharedOpenSourcePath)System\IO\Compression\ZipHelper.cs" />
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but do you know where this path is coming from?
Search isn't being helpful

Or is it just empty and we end up with the correct path anyway? 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants