
CASSANDRA-21178 add created_at column to system_distributed.compression_dictionaries #4622

Open
smiklosovic wants to merge 1 commit into apache:trunk from smiklosovic:CASSANDRA-21178

Conversation

@smiklosovic
Contributor

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>
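
For example, a commit for this change could look like the following (the author and reviewer placeholders are illustrative, not the actual names on the patch):

Add created_at column to system_distributed.compression_dictionaries

patch by <Author>; reviewed by <Reviewer> for CASSANDRA-21178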

The Cassandra Jira

    throw new IllegalArgumentException("Provided dictionary id must be positive but it is '" + dictId + "'.");
if (dict == null || dict.length == 0)
    throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
if (dict.length > FileUtils.ONE_MIB)
Contributor Author

I removed this here because when I was testing import / export with created_at, I realized that we cannot import a dictionary bigger than 1 MiB, yet we can train one that large.

So we can train a > 1 MiB dictionary but cannot import it again after exporting it.

It is also possible to override the configuration via nodetool or CQL, and there we do not check the maximum size at all; we only check it on import ...

I can revert this change and treat it more robustly in a completely different ticket, hardening the size checks on all levels (CQL, nodetool ...), which could go in even after 6.0-alpha1. If I remove the check here, we will at least not see the discrepancy described above.
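
As an illustration of that hardening (a hypothetical sketch, not code from this patch; the class name and limit constant are made up), a single validator shared by all entry points would keep training, import and the nodetool/CQL overrides consistent:

public final class DictionarySizeValidator
{
    // Assumed limit, mirroring the 1 MiB import check discussed above.
    private static final int MAX_DICTIONARY_SIZE_BYTES = 1024 * 1024;

    private DictionarySizeValidator() {}

    // Called from every path that accepts a dictionary (train, import, CQL, nodetool),
    // so a dictionary that can be trained can also be re-imported after export.
    public static void validate(byte[] dict)
    {
        if (dict == null || dict.length == 0)
            throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
        if (dict.length > MAX_DICTIONARY_SIZE_BYTES)
            throw new IllegalArgumentException("Provided dictionary is " + dict.length +
                                               " bytes, exceeding the maximum of " + MAX_DICTIONARY_SIZE_BYTES + " bytes.");
    }
}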

Contributor

Thanks for the background! It helps to understand the context.

Dictionaries are attached to every SSTable, which is the context in which the size limit was added. Dictionary sizes are typically 64~100 KiB. That said, the underlying zstd trainer does allow training larger dictionaries. The question is: do we want to train dictionaries larger than 1 MiB? The added dictionary size might outweigh the compression gains (64 KiB vs. 1 MiB dictionaries).
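
For context on the trainer side (a minimal hypothetical sketch assuming the zstd-jni ZstdDictTrainer API, not code from this patch): the maximum dictionary size is an argument passed to the trainer, so whether a > 1 MiB dictionary is ever produced is decided at training time by the cap passed in, e.g. 64 KiB here:

import java.util.List;
import com.github.luben.zstd.ZstdDictTrainer;

public final class DictionaryTrainingSketch
{
    // Trains a dictionary capped at 64 KiB from the given samples; raising the
    // second constructor argument is what would allow > 1 MiB dictionaries.
    public static byte[] train(List<byte[]> samples)
    {
        ZstdDictTrainer trainer = new ZstdDictTrainer(16 * 1024 * 1024, 64 * 1024); // sample buffer size, max dict size
        for (byte[] sample : samples)
            trainer.addSample(sample);
        return trainer.trainSamples();
    }
}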

@smiklosovic smiklosovic requested a review from yifan-c February 17, 2026 10:43
