CASSANDRA-21178 add created_at column to system_distributed.compression_dictionaries #4622
smiklosovic wants to merge 1 commit into apache:trunk
Conversation
throw new IllegalArgumentException("Provided dictionary id must be positive but it is '" + dictId + "'.");
if (dict == null || dict.length == 0)
    throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
if (dict.length > FileUtils.ONE_MIB)
I removed this here because, when I was testing import / export with created_at, I realized that we cannot import a dictionary bigger than 1 MiB, BUT WE CAN TRAIN ONE.
So we can train a dictionary larger than 1 MiB, but then we cannot import it after export.
It is possible to override the configuration via nodetool or CQL; there we do not check the max size, we check that only on import ...
I can revert this change and treat it more robustly in a completely different ticket, hardening the size checks on all levels (CQL, nodetool ...), which can go in even after 6.0-alpha1. If I remove it here, we will at least not see the discrepancy I described above.
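One way to avoid the train-vs-import discrepancy would be to hoist the checks into a single helper that both the trainer and the importer call, so the accepted range cannot drift between the two paths. A minimal sketch; the class and method names here are hypothetical, not Cassandra's actual API:

```java
// Hypothetical shared validator: if both the training path and the import path
// call this same method, a dictionary that trains successfully can never be
// rejected on import for exceeding the size limit.
public class DictionaryLimits
{
    public static final long ONE_MIB = 1024 * 1024;

    public static void validate(long dictId, byte[] dict, long maxSizeBytes)
    {
        if (dictId <= 0)
            throw new IllegalArgumentException("Provided dictionary id must be positive but it is '" + dictId + "'.");
        if (dict == null || dict.length == 0)
            throw new IllegalArgumentException("Provided dictionary byte array is null or empty.");
        if (dict.length > maxSizeBytes)
            throw new IllegalArgumentException("Dictionary of " + dict.length
                                               + " bytes exceeds the configured maximum of " + maxSizeBytes + " bytes.");
    }
}
```

The `maxSizeBytes` parameter is deliberately passed in rather than hard-coded, so a later ticket could wire it to the same configuration value everywhere (CQL, nodetool, import).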
Thanks for the background! It helps to understand.
Dictionaries are attached to every SSTable; the size limit was added with that context in mind. Dictionaries are typically 64~100 KiB. That said, the underlying zstd trainer does allow training larger dictionaries. The question is: do we want to train dictionaries larger than 1 MiB? The added dictionary size might outweigh the compression gains (64 KiB vs. 1 MiB dictionaries).
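To make the overhead concern concrete, a back-of-envelope calculation; the SSTable count of 1000 below is an illustrative assumption, not a Cassandra default:

```java
// Rough per-cluster overhead of attaching one dictionary to every SSTable.
// With 1000 SSTables (assumed for illustration), a 64 KiB dictionary costs
// ~64 MiB in total, while a 1 MiB dictionary costs ~1 GiB.
public class DictOverhead
{
    static final long KIB = 1024, MIB = 1024 * KIB;

    public static long totalOverheadBytes(long dictBytes, long sstableCount)
    {
        return dictBytes * sstableCount;
    }

    public static void main(String[] args)
    {
        System.out.println(totalOverheadBytes(64 * KIB, 1000)); // 65536000 (~64 MiB)
        System.out.println(totalOverheadBytes(1 * MIB, 1000));  // 1048576000 (~1 GiB)
    }
}
```

The compression savings from the larger dictionary would need to exceed that extra per-SSTable footprint to be worthwhile.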
Force-pushed da616ba to c256ef2 (compare)
Thanks for sending a pull request! Here are some tips if you're new here:
Commit messages should follow this format:
The Cassandra Jira