Do not read a JAR completely into a byte-array before computing the SHA. #469

@jvdvegt

Description

New feature, improvement proposal

In our CI/CD pipeline (Linux, Java 21), we encountered this error:

Caused by: java.lang.OutOfMemoryError: Cannot reserve 1490778611 bytes of direct buffer memory (allocated: 1768309554, limit: 3221225472)
    at java.nio.Bits.reserveMemory (Bits.java:178)
    at java.nio.DirectByteBuffer.<init> (DirectByteBuffer.java:127)
    at java.nio.ByteBuffer.allocateDirect (ByteBuffer.java:360)
    at sun.nio.ch.Util.getTemporaryDirectBuffer (Util.java:242)
    at sun.nio.ch.IOUtil.read (IOUtil.java:303)
    at sun.nio.ch.IOUtil.read (IOUtil.java:283)
    at sun.nio.ch.FileChannelImpl.read (FileChannelImpl.java:234)
    at sun.nio.ch.ChannelInputStream.read (ChannelInputStream.java:74)
    at sun.nio.ch.ChannelInputStream.read (ChannelInputStream.java:103)
    at java.nio.file.Files.read (Files.java:3224)
    at java.nio.file.Files.readAllBytes (Files.java:3275)
    at org.apache.maven.buildcache.hash.SHA$Algorithm.hash (SHA.java:70)
    at org.apache.maven.buildcache.hash.HashAlgorithm.hash (HashAlgorithm.java:36)
    at org.apache.maven.buildcache.CacheControllerImpl.artifactDto (CacheControllerImpl.java:640)
    at org.apache.maven.buildcache.CacheControllerImpl.artifactDtos (CacheControllerImpl.java:628)
    at org.apache.maven.buildcache.CacheControllerImpl.save (CacheControllerImpl.java:508)
    at org.apache.maven.buildcache.BuildCacheMojosExecutionStrategy.execute (BuildCacheMojosExecutionStrategy.java:160)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
...

We're using the build cache for a 1.7GB artifact. The build cache code currently reads all bytes of an artifact into a byte array (Files.readAllBytes, per the stack trace above) and computes the SHA hash in one go.

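To illustrate the difference, here is a minimal sketch (not the extension's actual code) of a streaming alternative: the file is fed through MessageDigest in fixed-size chunks, so heap usage stays constant no matter how large the artifact is. The class and method names are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class StreamingSha {
    // Hypothetical sketch: hash a file in fixed-size chunks instead of
    // calling Files.readAllBytes, so only a small buffer is ever allocated.
    static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[64 * 1024]; // 64 KiB chunk, not the whole 1.7 GB
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("artifact", ".jar");
        Files.write(tmp, "hello".getBytes());
        System.out.println(sha256(tmp));
        Files.delete(tmp);
    }
}
```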
For some hashes, there is a 'zero allocation hash' implementation:

    return HexUtils.toByteArray(hash.hashBytes(buffer.getBuffer()));

Unfortunately, there is no such version for SHA (all available hashes are defined here).

Changing the SHA implementation to use memory-mapping seems pretty straightforward, but changing the current behaviour might not be desirable. If I were to add a memory-mapped SHA implementation, would it be okay to add 4 SHA entries to the HashFactory with _MM appended to their names?
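For reference, a minimal sketch of what such a memory-mapped variant could look like (names are hypothetical, not a proposed patch): the file is mapped read-only and the MappedByteBuffer is passed straight to MessageDigest.update, so no heap byte[] of the file's size is ever allocated. Since FileChannel.map is limited to 2 GB per mapping, larger files are digested in windows.

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.util.HexFormat;

public class MappedSha {
    // Hypothetical sketch of a memory-mapped SHA: digest directly from the
    // mapped region instead of copying the file onto the heap first.
    static String sha256Mapped(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long window = 1L << 30; // 1 GiB mapping windows; an arbitrary choice
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += window) {
                long len = Math.min(window, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                md.update(buf); // reads through the mapping, no heap copy
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("artifact", ".jar");
        Files.write(tmp, "hello".getBytes());
        System.out.println(sha256Mapped(tmp));
        Files.delete(tmp);
    }
}
```

One caveat with this approach: mappings on the JVM are only released by GC, so hashing many large files in one build could hold address space for a while, which may be part of why changing the default behaviour is debatable.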
