Expose C_SEGMENT_SIZE_BYTES and C_INDEX_SIZE_BYTES gauges#218
Expose C_SEGMENT_SIZE_BYTES and C_INDEX_SIZE_BYTES gauges#218lukas8219 wants to merge 5 commits into
C_SEGMENT_SIZE_BYTES and C_INDEX_SIZE_BYTES gauges#218Conversation
| %% local segments | ||
| [begin ok = delete_segment_from_index(I) end | ||
| || I <- IdxFiles], | ||
| [delete_segment_from_index(I) || I <- IdxFiles], |
There was a problem hiding this comment.
Not sure we pattern match from above is really necessary?
There was a problem hiding this comment.
Yeah looks like it isn't since delete_segment_from_index/1 already hard matches on ok
the-mikedavis
left a comment
There was a problem hiding this comment.
I took a look and I have a few nits but I think this is the right approach. I'm not an authority here so I'd wait for feedback from Arnaud / David / Karl before acting on my opinions (especially the index file part).
| {segments, ?C_SEGMENTS, counter, "Number of segments"} | ||
| {segments, ?C_SEGMENTS, counter, "Number of segments"}, | ||
| {segment_size_bytes, ?C_SEGMENT_SIZE_BYTES, counter, "Total size of all segment files in bytes"}, | ||
| {index_size_bytes, ?C_INDEX_SIZE_BYTES, counter, "Total size of all index files in bytes"} |
There was a problem hiding this comment.
I'm not sure it's worthwhile to track index data. It would mean an extra fstat for index files during recovery and retention to gather the size data. Since chunk headers are at minimum 48 bytes per chunk, the segment will always be larger (48 > 29 for each index record). If these metrics aim to answer a question of "which stream is taking up all of my data" I think segment size metrics are enough.
I think there's a separate question of "are my chunks so small they're hurting performance" and that could be answered by a bytes-per-chunk metric instead. Tracking index data here would indirectly hint towards that rather than answer it clearly.
There was a problem hiding this comment.
I am not strong on any claim here as I also believe that for big clusters, with loads of streams, that's not a cheap price to pay.
There was a problem hiding this comment.
@the-mikedavis
To track or not to track it: I thought maybe, if the overhead is not big, to keep going with both. Easier to catch bugs if users have this level of visibility.
Now that was myself assuming there's a chance the index file could have holes/get bloated somehow - which could have been naive as Streams are running for lots of years and there's not a single complaint about it.
I'll good with removing the index tracking from this PR
| %% local segments | ||
| [begin ok = delete_segment_from_index(I) end | ||
| || I <- IdxFiles], | ||
| [delete_segment_from_index(I) || I <- IdxFiles], |
There was a problem hiding this comment.
Yeah looks like it isn't since delete_segment_from_index/1 already hard matches on ok
| counters:put(Cnt, ?C_SEGMENT_SIZE_BYTES, InitSegBytes), | ||
| counters:put(Cnt, ?C_INDEX_SIZE_BYTES, InitIdxBytes), |
There was a problem hiding this comment.
I haven't looked very deeply but I'm not sure this accounts for the seeking to Size shorter than the file length after a hard crash + data loss. There could be garbled garbage sitting at the end of a segment. This would account for those extra bytes permanently
There was a problem hiding this comment.
You are referring to the maybe_fix_corrupted_files @the-mikedavis ? If so, the InitXBytes is calculated after that call so the truncates should be applied already.
There was a problem hiding this comment.
Actually I'm thinking of first_and_last_seginfos/1, I believe the size it calculates for the last segment file is based on the last valid position in the index file. Since the segment file is written before the index, Size might be smaller than the actual segment file size if the writer terminates between writing the segment and writing the index, or if there is data loss from a power loss. If we set the counter here to the file size (rather than the size according to Size) then we'll always be overestimating: below we set file:position(SegFd, Size) and file:truncate(SegFd). The segment data counter will always contain the truncated bytes.
| {segments, ?C_SEGMENTS, counter, "Number of segments"} | ||
| {segments, ?C_SEGMENTS, counter, "Number of segments"}, | ||
| {segment_size_bytes, ?C_SEGMENT_SIZE_BYTES, counter, "Total size of all segment files in bytes"}, | ||
| {index_size_bytes, ?C_INDEX_SIZE_BYTES, counter, "Total size of all index files in bytes"} |
There was a problem hiding this comment.
It looks like all of these are counter currently but I think most of them (maybe all of them?) should be gauge. Maybe it's outside the scope of this change though since the existing ones all use counter
There was a problem hiding this comment.
Pull request overview
This PR adds two new size metrics to Osiris logs—total segment bytes and total index bytes—so they can be surfaced in the Management UI and exported to Prometheus. The metrics are initialized on startup via filesystem stat, incremented on writes, and decremented after retention deletes segments.
Changes:
- Extend
osiris_logcounters withsegment_size_bytesandindex_size_bytes, including initialization, write-time increments, and retention-time decrements. - Extend retention evaluation to return deleted segment/index byte counts so stream processes can update the new metrics.
- Add Common Test coverage for fresh logs, writes, restart, and retention effects on the new size counters.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
src/osiris_log.erl |
Adds the two new counters, initializes them at init, updates them on write, and decrements them based on retention deletions. |
src/osiris_retention.erl |
Adjusts scheduling pattern match to handle the extended retention evaluation return value. |
src/osiris.hrl |
Increases the base log counter field count to accommodate the two new counters. |
test/osiris_log_SUITE.erl |
Adds tests validating the new counters across init/write/restart/retention flows and adapts retention tests to new return shape. |
test/osiris_SUITE.erl |
Adds an integration test ensuring retention decreases segment_size_bytes at the writer counter level. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
the-mikedavis
left a comment
There was a problem hiding this comment.
This is looking good to me 👍. I only have nitpicks / style comments left.
Before merging though I think we should discuss/decide whether index file bytes are worthy of tracking. I would lean against it but I don't have a strong opinion either way.
| -spec evaluate_retention(file:filename_all(), [retention_spec()]) -> | ||
| {range(), FirstTimestamp :: osiris:timestamp(), | ||
| NumRemainingFiles :: non_neg_integer()}. | ||
| NumRemainingFiles :: non_neg_integer(), | ||
| {DeletedSegBytes :: non_neg_integer(), DeletedIndexBytes :: non_neg_integer()}}. |
There was a problem hiding this comment.
I think this tuple is getting a little bit large. It could probably be a record or map instead? (I like maps :)
| % convert to binary for faster operations later | ||
| % mostly in segment_from_index_file/1 |
There was a problem hiding this comment.
I think we can keep this comment. Unrelated to this PR I think we should explore replacing all list strings in Osiris with binaries. They're much much more compact for all text like stream names and directories, especially ASCII. I will throw something on my todo list to look at it.
There was a problem hiding this comment.
Yes. My bad. AI-slopped hard on this one
- Restored comments about future performance improvements - Removed Index Tracking - Moved to Maps instead of Tuple
What
Add both metrics for C_SEGMENT_SIZE_BYTES and C_INDEX_SIZE_BYTES so we can enable on Management UI and Prometheus.
How
statat each file atinitBelow is AI-generated
Benchmark with Mac Book M3 Max PRO (32GB RAM 10-4 CPU)
sum_log_sizesis 101 × 2 stat calls, negligibleThe 406 ms on the first retention run is the actual file deletion of 51 segments. Runs 2–5 (index scan only, nothing to delete) are 2–3 ms regardless of how many segments remain — that confirms the O(remaining) scan is cheap at this scale.
At 100 GB you'd expect ~800 ms for a full 50% delete run, and the same 2–4 ms for idempotent evaluations. Nothing alarming.
Closes #161