Monitor and plot RSS memory and CPU usage during `qlever index` by tanmay-9 · Pull Request #277 · qlever-dev/qlever-control

tanmay-9 · 2026-04-06T17:18:42Z

So far, the `qlever index` command gave no insight into how much memory an index build actually needs or which index phase is responsible for the peak.

With this change, every index build records RSS memory and CPU usage over time, writes the samples to `<name>.resource-usage-log.tsv`, and renders `<name>.resource-usage-plot.png` once the index build finishes. The plot shades each index build phase (parsing, vocabulary merge, conversion, each permutation, and the text index) as a separate band and annotates the memory peak, so resource usage can be attributed to a specific phase. To allow comparisons across runs and settings, the plot caption shows the git hash of the index binary, the STXXL_MEMORY setting, and the batch size. This works whether the index is built natively or in a container (Docker or Podman).
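
To make the CPU column concrete: one common way to report CPU usage as "cores used" (the wording a later commit in this PR switches to) is to divide the CPU time consumed between two samples by the wall-clock time between them. Below is a minimal sketch under that assumption; the `Sample` type, the column names, and `write_usage_log` are hypothetical and not the PR's actual TSV layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical measurement of the index process tree."""
    elapsed_s: float   # wall-clock seconds since the index build started
    rss_bytes: int     # summed RSS of all processes in the tree
    cpu_time_s: float  # cumulative user+system CPU seconds of the tree


def cores_used(prev: Sample, curr: Sample) -> float:
    """CPU cores in use between two samples: CPU seconds per wall second."""
    wall = curr.elapsed_s - prev.elapsed_s
    if wall <= 0:
        return 0.0
    # Clamped at 0 because summed CPU time can drop when a child exits.
    return max(0.0, (curr.cpu_time_s - prev.cpu_time_s) / wall)


def write_usage_log(samples: list[Sample], out) -> None:
    """Write one TSV row per sample; the first row has no CPU delta yet."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["elapsed_s", "rss_bytes", "cpu_cores"])
    for i, s in enumerate(samples):
        cores = cores_used(samples[i - 1], s) if i > 0 else 0.0
        writer.writerow([f"{s.elapsed_s:.1f}", s.rss_bytes, f"{cores:.2f}"])


if __name__ == "__main__":
    buf = io.StringIO()
    write_usage_log([Sample(0.0, 1000, 0.0), Sample(2.0, 2000, 6.0)], buf)
    print(buf.getvalue())
```

Reporting cores rather than a per-core percentage keeps the number independent of the machine's core count, which helps when comparing runs on different hosts.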

The sampling interval can be set with `--resource-usage-interval`, and the plot density for long builds with `--resource-usage-plot-max-points` (this only limits how many points are drawn; the sampling itself is unaffected). There is also a `--resource-usage-plot-only` option that renders the plot from an existing `<name>.resource-usage-log.tsv` without re-running the index build, which is useful for tuning `--resource-usage-plot-max-points`.
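
Since the max-points setting only thins out what is drawn, the downsampling step must not hide the memory peak that the plot annotates. A minimal sketch, assuming bucket-wise maxima (the PR may use a different scheme, e.g. plain striding); `downsample_keep_peaks` is a hypothetical name:

```python
def downsample_keep_peaks(
    times: list[float], values: list[float], max_points: int
) -> tuple[list[float], list[float]]:
    """Reduce a series to at most max_points points, keeping each bucket's max.

    Plain striding (every k-th sample) can skip the global memory peak;
    taking the maximum of each bucket guarantees that the peak survives.
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(values)
    if n <= max_points:
        return list(times), list(values)
    out_t: list[float] = []
    out_v: list[float] = []
    for b in range(max_points):
        # Contiguous, non-empty buckets that together cover all n samples.
        start = b * n // max_points
        end = (b + 1) * n // max_points
        # Index of the largest value inside this bucket.
        i = max(range(start, end), key=values.__getitem__)
        out_t.append(times[i])
        out_v.append(values[i])
    return out_t, out_v
```

Taking the bucket maximum biases the thinned curve slightly upward, which is usually acceptable for a memory plot whose main purpose is to show peaks.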

numpy and matplotlib are used only to render the resource-usage plot, not to log resource usage during an index build. Because that is a narrow use case and matplotlib is heavy (it pulls in several transitive dependencies), they are an optional `plot` extra (`pip install "qlever[plot]"`) rather than core dependencies, which keeps the base install small. Without these libraries, the index build still succeeds and writes the resource-usage log; only the plot is skipped, with a hint to install `qlever[plot]`.
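
The "skip the plot, keep the log" behavior can be implemented by checking for the optional libraries before rendering. A minimal sketch, assuming a render callback and a standard logger; `missing_plot_dependencies`, `maybe_render_plot`, and the logger name are hypothetical, not the PR's code:

```python
import importlib.util
import logging

log = logging.getLogger("qlever")  # hypothetical logger name

PLOT_DEPENDENCIES = ("numpy", "matplotlib")


def missing_plot_dependencies() -> list[str]:
    """Return the plot libraries that cannot be imported (empty if all present)."""
    return [m for m in PLOT_DEPENDENCIES if importlib.util.find_spec(m) is None]


def maybe_render_plot(render) -> bool:
    """Call render() only if the optional plot extra is installed.

    The resource-usage log is written independently, so a missing extra
    only skips the plot and logs an install hint instead of failing.
    """
    missing = missing_plot_dependencies()
    if missing:
        log.info(
            "Skipping resource-usage plot (missing: %s); "
            'install with: pip install "qlever[plot]"',
            ", ".join(missing),
        )
        return False
    render()
    return True
```

Using `find_spec` avoids importing matplotlib at all when it is not needed, so the check stays cheap even when the extra is installed.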

hannahbast

1-1 with Tanmay

hannahbast · 2026-06-03T15:22:43Z

+    return f"{bytes_val / GB:.2f} GB"
+
+
+def parse_memory_to_bytes(memory_string: str) -> int:


I think any function that is, in principle, general-purpose and depends on little or no context should be in utils.py.

And now that we are talking about it, it probably makes sense to have a util directory with separate .py files for the different groups of utils. That should be a separate PR, which comes before or after this one.
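
As an example of such a context-free helper: the commit history mentions different memUsage parsing for Docker and Podman. The sketch below keeps the signature shown in the diff context, but its body is hypothetical; it assumes a usage/limit string such as `1.5GiB / 15.5GiB` and accepts both binary (KiB, MiB, GiB) and decimal (kB, MB, GB) suffixes, since which suffixes each runtime prints is an assumption here.

```python
import re

# Multipliers for unit suffixes a container runtime might print (assumed set).
_UNITS = {
    "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}

_MEMORY_RE = re.compile(r"^\s*([\d.]+)\s*([a-zA-Z]+)\s*$")


def parse_memory_to_bytes(memory_string: str) -> int:
    """Convert '1.5GiB' or '1.5GB / 16GB' (usage / limit) to bytes.

    Only the part before '/' (the current usage) is parsed.
    """
    usage = memory_string.split("/")[0]
    match = _MEMORY_RE.match(usage)
    if not match:
        raise ValueError(f"Cannot parse memory string: {memory_string!r}")
    number, unit = match.groups()
    factor = _UNITS.get(unit.lower())
    if factor is None:
        raise ValueError(f"Unknown memory unit: {unit!r}")
    return int(float(number) * factor)
```

Nothing in it depends on the index command, which is what makes it a candidate for the util module.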

hannahbast · 2026-06-03T15:26:09Z

+    )
+
+
+def compute_phase_boundaries(


There should be one function that analyzes a given (possibly partial) log file and whose output can be used both here and for the index-stats command. It's fine if the output contains more information than the respective caller uses (as long as it's not outrageously more or outrageously costly to compute, which I don't think will be the case here).
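
A minimal sketch of such a shared analysis function, under stated assumptions: log lines start with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, phase starts are recognized by caller-supplied substrings, and a phase without a recorded end is still running (which is what makes partial logs usable). `analyze_index_log`, `Phase`, and the marker strings are hypothetical; the real QLever log messages are not reproduced here.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Assumed timestamp prefix, e.g. "2026-04-06 17:18:42.123 - INFO: ..."
_TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3}(?:\d{3})?)?)")


@dataclass
class Phase:
    name: str
    start: datetime
    end: Optional[datetime]  # None if the log ends while the phase is running


def analyze_index_log(
    lines: Iterable[str], markers: dict[str, str], done_marker: str
) -> list[Phase]:
    """Split a (possibly partial) index log into consecutive phases.

    markers maps a substring that starts a phase to that phase's name.
    Each phase ends where the next one starts, or at done_marker.
    """
    phases: list[Phase] = []
    for line in lines:
        m = _TS_RE.match(line)
        if not m:
            continue  # skip lines without a timestamp
        ts = datetime.fromisoformat(m.group(1))
        if done_marker in line and phases:
            phases[-1].end = ts
            break
        for needle, name in markers.items():
            if needle in line:
                if phases and phases[-1].end is None:
                    phases[-1].end = ts  # previous phase ends here
                phases.append(Phase(name, ts, None))
                break
    return phases
```

Both callers could consume the same list: the plot would shade one band per phase (treating `end=None` as "up to the last sample"), and index-stats could report per-phase durations.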

hannahbast · 2026-06-03T15:28:19Z

+    return match.group(1) if match else None
+
+
+def parse_qleverfile(qleverfile_path: Path) -> dict[str, str]:


Don't we have functionality for that already? And what if the respective variable in the Qleverfile is overridden by a command-line argument?
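
The second question is about precedence: whatever parser is used, a command-line value should win over the Qleverfile value. A minimal sketch of that rule, assuming an INI-style Qleverfile with an `[index]` section; the function name and the key `NUM_TRIPLES_PER_BATCH` are hypothetical, and reusing qlever-control's existing argument handling, as the comment suggests, may make a separate parser unnecessary.

```python
import configparser
from typing import Optional


def effective_index_settings(
    qleverfile_text: str, cli_overrides: dict[str, Optional[str]]
) -> dict[str, str]:
    """Read [index] settings from a Qleverfile, letting CLI values win.

    cli_overrides maps a Qleverfile key to the value given on the command
    line, or None if the option was not passed.
    """
    # interpolation=None keeps values such as "${data:NAME}" verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep keys case-sensitive (e.g. STXXL_MEMORY)
    config.read_string(qleverfile_text)
    settings = dict(config["index"]) if config.has_section("index") else {}
    for key, value in cli_overrides.items():
        if value is not None:
            settings[key] = value  # command line takes precedence
    return settings
```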

hannahbast · 2026-06-03T15:28:32Z

+    return phases
+
+
+def parse_git_hash(log_path: Path) -> str | None:


Looks like a util function
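
If it moves to the util module, splitting the pure line scan from the file I/O makes it easy to test and to reuse. A sketch under the assumption that the index log contains a line with `git hash` followed by a hex hash; the regular expression and `parse_git_hash_from_lines` are hypothetical, and only `parse_git_hash(log_path)` mirrors the name in the diff context.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed pattern: a log line mentioning "git hash" followed by a hex hash.
_GIT_HASH_RE = re.compile(r"git hash\s*:?\s*([0-9a-f]{7,40})", re.IGNORECASE)


def parse_git_hash_from_lines(lines: Iterable[str]) -> Optional[str]:
    """Return the first git hash found in the log lines, or None."""
    for line in lines:
        match = _GIT_HASH_RE.search(line)
        if match:
            return match.group(1)
    return None


def parse_git_hash(log_path: Path) -> Optional[str]:
    """File-based wrapper; iterating the file keeps memory flat on large logs."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return parse_git_hash_from_lines(f)
```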

hannahbast · 2026-06-03T15:29:56Z

+    return "   |   ".join(parts) if parts else None
+
+
+def draw_usage_plot(


What's the difference between "draw" and "render"? Maybe it's just the naming ...
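
One way to settle such naming questions is to reserve a single verb for the matplotlib step and give pure helpers data-oriented names. For example, the caption logic whose last line appears in the diff context above could be a pure function like this hypothetical `build_plot_caption`; the labels are assumptions.

```python
from typing import Optional


def build_plot_caption(
    git_hash: Optional[str],
    stxxl_memory: Optional[str],
    batch_size: Optional[str],
) -> Optional[str]:
    """Join whichever run settings are known into one caption line.

    Missing values are skipped, so partial information still yields a
    caption; None means there is nothing to show.
    """
    parts = []
    if git_hash:
        parts.append(f"git hash: {git_hash}")
    if stxxl_memory:
        parts.append(f"STXXL_MEMORY: {stxxl_memory}")
    if batch_size:
        parts.append(f"batch size: {batch_size}")
    return "   |   ".join(parts) if parts else None
```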

tanmay-9 and others added 3 commits April 2, 2026 11:50

Add first version of index memory monitoring

b764dcc

Fix memory monitor to correctly select native/container system

51ff865

Merge branch 'qlever-dev:main' into compute-index-mem-usage

556a54e

tanmay-9 changed the title ~~Compute the physical memory usage used by the qlever index command~~ Compute the physical memory used by the qlever index command Apr 7, 2026

tanmay-9 and others added 2 commits April 8, 2026 17:58

Use engine_name from qlever __init__ in memory monitor and fix docker/podman different memUsage parsing

85273cd

Merge branch 'qlever-dev:main' into compute-index-mem-usage

c7e02d6

tanmay-9 changed the title ~~Compute the physical memory used by the qlever index command~~ Track memory usage during qlever index Apr 8, 2026

tanmay-9 and others added 19 commits April 9, 2026 10:06

Merge branch 'qlever-dev:main' into compute-index-mem-usage

0f8544a

Add a way to specify different parent pid for memory monitoring

1cba7a6

Add pss and uss memory monitoring

802652b

Fix process finding logic for memory monitoring

f6d82ab

Add swap monitoring as well

d2a8b1d

Merge branch 'qlever-dev:main' into compute-index-mem-usage

37cef0d

Merge branch 'qlever-dev:main' into compute-index-mem-usage

91ed89e

Add tsv logging and matplotlib plotting with index phases to memory_monitor.py

a378e5f

Parse Qleverfile and logs to add git hash, stxxl memory and batch size to the plot. Also make gb use consistent

0de53d4

Change memory_monitor to resource_monitor, cpu percent per core to cpu cores used and add downsampling (max_points=500) for plot

126358f

Add plot_only option to index and make max_plot_points configurable

64083ca

Merge remote-tracking branch 'origin/main' into compute-index-mem-usage

318b5a8

Merge remote-tracking branch 'origin/main' into compute-index-mem-usage

29543e2

Merge remote-tracking branch 'origin/main' into compute-index-mem-usage

27640c3

Fix minor code issues for resource_monitor

bfa3898

Extract pure functions to module-level and make render_usage_plot call explicit in index.py

3416812

Separate plot rendering logic in its own file

e5cfc7d

Fix failing index tests as a result of ResourceMonitor usage

7469737

Add pure-function tests for resource_monitor and usage_plot

019d379

tanmay-9 changed the title ~~Track memory usage during qlever index~~ Monitor and plot RSS memory and CPU usage during qlever index May 28, 2026

tanmay-9 and others added 3 commits May 28, 2026 15:12

Improve docstrings and comments, and fix minor issues

1631db9

Fix formatting

da3b0fc

Merge remote-tracking branch 'origin/main' into compute-index-mem-usage

10f2492

tanmay-9 added 3 commits June 3, 2026 15:49

Consistently use resource-usage wording everywhere

f3fb728

Make numpy and matplotlib optional dependencies installed via `pip install qlever[plot]`

2c100eb

Skip the test file when numpy and matplotlib are not installed

79309d1

hannahbast reviewed Jun 3, 2026

tanmay-9 added 3 commits June 5, 2026 16:33

Add a function to parse index phase markers which can be used in both index-stats and usage_plot

360c5e6

Move resource_usage general purpose functions to util

e124353

Fix parse_qleverfile to take the correct stxxl memory and num-triples-per-batch

49ebf5a

tanmay-9 wants to merge 33 commits into qlever-dev:main from tanmay-9:compute-index-mem-usage
