Monitor and plot RSS memory and CPU usage during `qlever index` #277
tanmay-9 wants to merge 33 commits into
…/podman different memUsage parsing
…e to the plot. Also make gb use consistent
…u cores used and add downsampling (max_points=500) for plot
…l explicit in index.py
    return f"{bytes_val / GB:.2f} GB"

def parse_memory_to_bytes(memory_string: str) -> int:
I think any function that is, in principle, general-purpose and needs little or no context should be in utils.py.
And now that we are talking about it, it probably makes sense to have a util directory with different .py files for the groups of utils. That should be a separate PR, which comes before or after this one.
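To make the "general-purpose" point concrete, here is a minimal sketch of what a context-free version of this helper could look like. Only the signature comes from the diff; the body, the unit table, and the treatment of decimal versus binary suffixes are assumptions for illustration and are not taken from the PR.

```python
import re

# Assumed unit table (not from the PR): decimal suffixes use powers of 1000,
# binary suffixes use powers of 1024. Matching is case-insensitive.
_UNIT_FACTORS = {
    "b": 1,
    "kb": 1000, "mb": 1000**2, "gb": 1000**3, "tb": 1000**4,
    "kib": 1024, "mib": 1024**2, "gib": 1024**3, "tib": 1024**4,
}

# A number (integer or decimal), optional whitespace, optional unit suffix.
_MEMORY_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)?\s*$")


def parse_memory_to_bytes(memory_string: str) -> int:
    """Convert strings such as '1.5GiB' or '512MB' to a number of bytes."""
    match = _MEMORY_RE.match(memory_string)
    if match is None:
        raise ValueError(f"Cannot parse memory string: {memory_string!r}")
    value = float(match.group(1))
    # A missing suffix is interpreted as plain bytes (an assumption).
    unit = (match.group(2) or "B").lower()
    if unit not in _UNIT_FACTORS:
        raise ValueError(f"Unknown memory unit in: {memory_string!r}")
    return int(round(value * _UNIT_FACTORS[unit]))
```

Because the function depends on nothing but its argument, it could live in utils.py and be unit-tested in isolation.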
)

def compute_phase_boundaries(
There should be one function that analyzes a given log file (potentially partial) and the output of which can then be used both here and for the index-stats command. It's fine if the output of the function contains more information than is used by the respective caller (as long as it's not outrageously more or outrageously costly to compute, which I don't think will be the case here).
    return match.group(1) if match else None

def parse_qleverfile(qleverfile_path: Path) -> dict[str, str]:
Don't we have functionality for that already? What if the respective variable in the Qleverfile is overridden by a command-line argument?
    return phases

def parse_git_hash(log_path: Path) -> str | None:
Looks like a util function
    return " | ".join(parts) if parts else None

def draw_usage_plot(
What's the difference between "draw" and "render"? Maybe it's just the naming ...
So far, the `qlever index` command gave no insight into how much memory an index build actually needs or which index phase is responsible for the peak.

With this change, every index build records RSS memory and CPU usage over time, writes `<name>.resource-usage-log.tsv`, and renders `<name>.resource-usage-plot.png` once the index build finishes. The plot shades each index build phase (parsing, vocabulary merge, conversion, each permutation, and the text index) as a separate band and annotates the memory peak, so resource usage can be attributed to a specific phase. For comparison across runs and settings, the plot is captioned with the git hash of the index binary, the `STXXL_MEMORY` setting, and the batch size. This works whether the index is built natively or in a container (docker/podman).

The sampling rate can be set with `--resource-usage-interval` and the plot density on long builds with `--resource-usage-plot-max-points` (the sampling itself is unaffected, only how many points are drawn). There is also a `--resource-usage-plot-only` option that renders the plot from an existing `<name>.resource-usage-log.tsv` without re-running the index build, which is useful for tweaking `--resource-usage-plot-max-points`.

`numpy` and `matplotlib` are used only to render the resource-usage plot and not to log the resource-usage during an index build. Because that is a narrow use case and `matplotlib` is heavy, pulling in several transitive dependencies, they are an optional `plot` extra (`pip install "qlever[plot]"`) rather than core dependencies. This keeps the base install small. Without these libraries the index build still succeeds and writes the resource-usage log; only the plot is skipped, with a hint to install `qlever[plot]`.
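
To illustrate the idea behind `--resource-usage-plot-max-points`, here is a minimal sketch of downsampling a list of logged samples for plotting while keeping the memory peak visible. It is not the PR's implementation: the function name `downsample_for_plot`, the tuple layout `(elapsed_seconds, rss_bytes, cpu_percent)`, and the even-stride strategy are all assumptions, and the sketch uses plain Python rather than `numpy`.

```python
from typing import Sequence

# Hypothetical sample layout: (elapsed_seconds, rss_bytes, cpu_percent).
Sample = tuple[float, int, float]


def downsample_for_plot(samples: Sequence[Sample], max_points: int = 500) -> list[Sample]:
    """Pick evenly spaced samples for drawing; the input is left untouched.

    The first, the last, and the peak-RSS sample are always kept, so the
    result has at most max_points + 1 entries.
    """
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    n = len(samples)
    if n <= max_points:
        return list(samples)
    # Evenly spaced indices from 0 to n - 1 (both endpoints included).
    indices = {round(i * (n - 1) / (max_points - 1)) for i in range(max_points)}
    # Make sure the memory peak survives, so the annotation stays accurate.
    peak_index = max(range(n), key=lambda i: samples[i][1])
    indices.add(peak_index)
    return [samples[i] for i in sorted(indices)]
```

Selecting indices only at draw time matches the description above: the TSV log keeps every sample, and only the number of plotted points changes.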