
CASSANDRA-21178 add created_at column to system_distributed.compression_dictionaries #4622

Open
smiklosovic wants to merge 1 commit into apache:trunk from smiklosovic:CASSANDRA-21178

Conversation

@smiklosovic
Contributor

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>
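
For example, a commit for this change could look like the following (the author and reviewer placeholders are illustrative, not the actual names on the patch):

Add created_at column to system_distributed.compression_dictionaries

patch by <Author>; reviewed by <Reviewer> for CASSANDRA-21178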

The Cassandra Jira

    throw new IllegalArgumentException("Provided dictionary id must be positive but it is '" + dictId + "'.");
if (dict == null || dict.length == 0)
    throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
if (dict.length > FileUtils.ONE_MIB)
Contributor Author

I removed this here because when I was testing import / export with created_at, I realized that we cannot import a dictionary bigger than 1 MiB, yet we can train one that large.

So we can train a > 1 MiB dictionary but cannot import it again after exporting it.

It is also possible to override the configuration via nodetool or CQL, and there we do not check the maximum size at all; we only check it on import ...

I can revert this change and treat it more robustly in a completely different ticket, hardening the size checks on all levels (CQL, nodetool ...), which could go in even after 6.0-alpha1. If I remove the check here, we will at least not see the discrepancy described above.
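
As an illustration of that hardening (a hypothetical sketch, not code from this patch; the class name and limit constant are made up), a single validator shared by all entry points would keep training, import and the nodetool/CQL overrides consistent:

public final class DictionarySizeValidator
{
    // Assumed limit, mirroring the 1 MiB import check discussed above.
    private static final int MAX_DICTIONARY_SIZE_BYTES = 1024 * 1024;

    private DictionarySizeValidator() {}

    // Called from every path that accepts a dictionary (train, import, CQL, nodetool),
    // so a dictionary that can be trained can also be re-imported after export.
    public static void validate(byte[] dict)
    {
        if (dict == null || dict.length == 0)
            throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
        if (dict.length > MAX_DICTIONARY_SIZE_BYTES)
            throw new IllegalArgumentException("Provided dictionary is " + dict.length +
                                               " bytes, exceeding the maximum of " + MAX_DICTIONARY_SIZE_BYTES + " bytes.");
    }
}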

Contributor

Thanks for the background! It helps to understand the context.

Dictionaries are attached to every SSTable, which is the context in which the size limit was added. Dictionary sizes are typically 64~100 KiB. That said, the underlying zstd trainer does allow training larger dictionaries. The question is: do we want to train dictionaries larger than 1 MiB? The added dictionary size might outweigh the compression gains (64 KiB vs. 1 MiB dictionaries).
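
For context on the trainer side (a minimal hypothetical sketch assuming the zstd-jni ZstdDictTrainer API, not code from this patch): the maximum dictionary size is an argument passed to the trainer, so whether a > 1 MiB dictionary is ever produced is decided at training time by the cap passed in, e.g. 64 KiB here:

import java.util.List;
import com.github.luben.zstd.ZstdDictTrainer;

public final class DictionaryTrainingSketch
{
    // Trains a dictionary capped at 64 KiB from the given samples; raising the
    // second constructor argument is what would allow > 1 MiB dictionaries.
    public static byte[] train(List<byte[]> samples)
    {
        ZstdDictTrainer trainer = new ZstdDictTrainer(16 * 1024 * 1024, 64 * 1024); // sample buffer size, max dict size
        for (byte[] sample : samples)
            trainer.addSample(sample);
        return trainer.trainSamples();
    }
}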

@smiklosovic smiklosovic requested a review from yifan-c February 17, 2026 10:43
