
tools/mllm-llm-benchmark: add CSV output and configurable runs#622

Closed
huangzhenhua111 wants to merge 1 commit into UbiquitousLearning:main from huangzhenhua111:fix/llm-benchmark-csv-clean

Conversation


@huangzhenhua111 huangzhenhua111 commented Jan 31, 2026

What has been changed

  • Add CLI flags: --runs, --cooldown_s, --output_csv
  • Validate that --runs is > 0 (avoids a huge reserve() and a meaningless 0-run benchmark)
  • Error out if --output_csv cannot be opened (no silent skip)
  • Print per-run TTFT/prefill/decode + derived latency
  • Write averaged metrics per (pp,tg) into CSV with header
  • Guard divide-by-zero when speed is 0
  • KV cache estimate fields currently filled with 0 (follow-up to compute from config)

How to test

  • Tested on x86_64 (WSL/Ubuntu), CPU backend
./build/bin/mllm-llm-benchmark \
  -n tiny_llama \
  -m /home/huangzhenhua/models/mllm_tinyllama/tinyllama-fp32.mllm \
  -c /home/huangzhenhua/mllm-runok/examples/llama/config_tiny_llama.json \
  -pp 8 -tg 4 -t 4 -cl 2048 \
  -r 3 -cs 2 \
  -oc /tmp/mllm_bench.csv

cat /tmp/mllm_bench.csv
  • output
    schema_version,git_commit,arch,model_name,pp,tg,ttft_ms,prefill_speed,decode_speed,prefill_ms,decode_ms_per_tok,kv_est_bytes_pp,kv_est_bytes_final
    1,f5c0006ce4f93d378e960648936c852418e69c88,x86_64,tiny_llama,8,4,439.039,19.8621,5.04427,402.778,198.245,0,0

Summary by CodeRabbit

  • New Features
    • Configurable benchmark parameters: run count and cooldown between runs
    • Optional CSV export with header and summarized single-line results
    • Enhanced reporting: per-run and averaged throughput and latency, clearer progress messages
  • Bug Fixes / Reliability
    • Input validation for run count and safer sequencing (state cleared between runs)
    • Rough KV cache memory estimates included in outputs



coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

Walkthrough

Adds CLI flags for runs, cooldown_s, output_csv, schema_version, and kv_dtype_bytes; validates inputs; runs configurable benchmark iterations with cache clearing and optional cooldowns; computes per-run and averaged throughput/latency metrics; records optional CSV output with header and aggregated row including KV estimates and git/schema metadata.

Changes

Cohort / File(s) Summary
Benchmark Tool
tools/mllm-llm-benchmark/main.cpp
Added headers and new CLI flags (--runs, --cooldown_s, --output_csv, --schema_version, --kv_dtype_bytes); added git commit macros; set CPU/thread config after parsing; validate runs; replaced fixed 3-run loop with configurable R; reserve results, clear cache each run, compute per-run throughput→latency, optional cooldown between runs, compute averages, emit CSV header and aggregated CSV line (includes placeholders for KV byte estimates), and close CSV file.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐇 I hopped through runs and cleared the slate,
Counting latencies, neat and great,
CSV carrots in a tidy row,
Averaged numbers all aglow,
A benchmarking rabbit — swift and late!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title clearly and concisely summarizes the main changes: adding CSV output functionality and making the number of runs configurable.
  • Description check — ✅ Passed: The description provides comprehensive details including changes made, testing information, and example output, though it deviates from the template structure.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/main.cpp`:
- Line 45: The CLI option kv_dtype_bytes (added via
mllm::Argparse::add<int32_t>("-kv|--kv_dtype_bytes")) is never used; update the
KV cache size estimation (replace the hardcoded 0.0 values where KV cache is
computed) to use kv_dtype_bytes to compute bytes-per-element and derive the KV
cache memory estimate, or if you prefer to remove the feature, delete the
kv_dtype_bytes argument and any references to KV cache estimation; specifically,
use the parsed kv_dtype_bytes value to multiply by number of KV elements (or
element count computation already present) to produce the correct KV cache size
instead of 0.0.
- Around line 187-216: The averaged metrics (avg_ttft, avg_prefill_speed,
avg_decode_speed, avg_prefill_ms, avg_decode_ms_per_tok) are only written to
csv_file via the stringstream ss but never printed to the console; update the
end of the aggregation block in main.cpp so the same aggregated line is printed
to stdout (similar to the per-run print at lines ~159-166). After building ss
(which includes schema_version, MLLM_GIT_COMMIT_HASH,
mllm::cpu::CURRENT_ARCH_STRING, model_name, pp, tg, avg_* and kv estimates),
write ss.str() to std::cout (and still to csv_file if csv_file.is_open()), so
users without --output_csv still see the average results.
- Around line 49-50: num_threads currently can be 0 when unset; update the flag
handling to ensure a sensible default or validate before use: either add a
.def(...) when defining num_threads to set a default (e.g., 1 or
std::thread::hardware_concurrency()) or check num_threads.get() before calling
mllm::Context::instance().setCpuOpThreads and mllm::setMaximumNumThreads and
replace 0 with a fallback value; reference the num_threads variable and the
calls mllm::Context::instance().setCpuOpThreads(...) and
mllm::setMaximumNumThreads(...) when making the change.
🧹 Nitpick comments (3)
tools/mllm-llm-benchmark/main.cpp (3)

10-10: Unused include: <algorithm> is not used.

The comment states this is "For std::transform", but std::transform is not called anywhere in this file. Consider removing the unused include.

Suggested fix
-#include <algorithm>  // For std::transform

26-27: Unused macros: STR_HELPER and STR are never used.

These macros duplicate the functionality of STRINGIFY_INTERNAL and STRINGIFY (defined at lines 17-18), which are already used throughout the file. Consider removing these unused macros.

Suggested fix
-#define STR_HELPER(x) #x
-#define STR(x) STR_HELPER(x)
-

168-172: Consider hoisting cooldown_s.get() outside the loop.

The cooldown_s.get() call is repeated on every iteration, but the value never changes. Move it outside the loop for cleaner code.

Based on learnings: "Avoid unnecessary object creation in loops or hot paths."

Suggested fix
+    int cool = cooldown_s.get();
     for (int i = 0; i < R; ++i) {
       mllm::print("  Run", i + 1, "of", R, "...");
       // ... existing code ...
 
-      int cool = cooldown_s.get();
       if (i + 1 < R && cool > 0) {

auto& cooldown_s = mllm::Argparse::add<int32_t>("-cs|--cooldown_s").help("Cooldown time between runs in seconds").def(5);
auto& output_csv = mllm::Argparse::add<std::string>("-oc|--output_csv").help("Output results to a CSV file").def("");
auto& schema_version = mllm::Argparse::add<int32_t>("-sv|--schema_version").help("Schema version for output format").def(1);
auto& kv_dtype_bytes = mllm::Argparse::add<int32_t>("-kv|--kv_dtype_bytes").help("KV cache data type bytes (1: int8, 2: fp16, 4: fp32)").def(4);

🛠️ Refactor suggestion | 🟠 Major

kv_dtype_bytes is defined but never used.

This CLI argument is collected but never referenced in the code. The KV cache estimates at lines 195-196 are hardcoded to 0.0 instead of being computed using this value.

Either implement the KV cache estimation using kv_dtype_bytes, or remove the unused argument to avoid confusing users.


Comment on lines +49 to +50
mllm::Context::instance().setCpuOpThreads(num_threads.get());
mllm::setMaximumNumThreads((uint32_t)num_threads.get());

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the Argparse implementation
find . -name "Argparse.hpp" -o -name "Argparse.h" | head -5

Repository: UbiquitousLearning/mllm

Length of output: 92


🏁 Script executed:

# Look at the main.cpp file around line 35 and 49-50
sed -n '30,55p' tools/mllm-llm-benchmark/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1757


🏁 Script executed:

# Search for how num_threads is used and defined
rg -n "num_threads" tools/mllm-llm-benchmark/ -A2 -B2

Repository: UbiquitousLearning/mllm

Length of output: 1148


🏁 Script executed:

# Check if there's any validation or default initialization
rg -n "\.def\(" tools/mllm-llm-benchmark/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 697


🏁 Script executed:

# Now examine the Argparse.hpp implementation to understand get() behavior
cat -n mllm/utils/Argparse.hpp | head -150

Repository: UbiquitousLearning/mllm

Length of output: 4941


🏁 Script executed:

# Look for the get() method specifically
rg -n "get\(\)" mllm/utils/Argparse.hpp -A5 -B5

Repository: UbiquitousLearning/mllm

Length of output: 1536


🏁 Script executed:

# Check if there's any constructor or initialization for int32_t type
rg -n "int32_t|get\(" mllm/utils/Argparse.hpp | head -40

Repository: UbiquitousLearning/mllm

Length of output: 324


Add a default value or validation for num_threads.

When --threads is not provided, num_threads.get() returns 0 (the default-initialized value for int32_t). Setting thread counts to 0 is likely unintended behavior. Unlike the newer arguments (runs, cooldown_s, etc.), num_threads lacks a .def() call. Either add .def() with a reasonable default or validate the value before use at lines 49-50.


Comment on lines +187 to +216
avg_ttft /= denom;
avg_prefill_speed /= denom;
avg_decode_speed /= denom;

float avg_prefill_ms = (avg_prefill_speed > 0.0f) ? (pp / avg_prefill_speed) * 1000.0f : 0.0f;
float avg_decode_ms_per_tok = (avg_decode_speed > 0.0f) ? (1.0f / avg_decode_speed) * 1000.0f : 0.0f;

// Rough KV cache estimate (bytes)
double kv_est_bytes_pp = 0.0;
double kv_est_bytes_final = 0.0;

// Prepare one line output (avg)
std::stringstream ss;
ss << schema_version.get() << ","
<< STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
<< mllm::cpu::CURRENT_ARCH_STRING << ","
<< model_name.get() << ","
<< pp << ","
<< tg << ","
<< avg_ttft << ","
<< avg_prefill_speed << ","
<< avg_decode_speed << ","
<< avg_prefill_ms << ","
<< avg_decode_ms_per_tok << ","
<< kv_est_bytes_pp << ","
<< kv_est_bytes_final;

if (csv_file.is_open()) {
csv_file << ss.str() << std::endl;
}

⚠️ Potential issue | 🟡 Minor

Average results are calculated but never printed to console.

The code computes averages (avg_ttft, avg_prefill_speed, avg_decode_speed, avg_prefill_ms, avg_decode_ms_per_tok) but only writes them to the CSV file. Users who don't specify --output_csv will never see the aggregated benchmark results.

Consider printing the averages to console, similar to how per-run results are printed at lines 159-166.

Suggested fix
     avg_ttft /= denom;
     avg_prefill_speed /= denom;
     avg_decode_speed /= denom;

     float avg_prefill_ms = (avg_prefill_speed > 0.0f) ? (pp / avg_prefill_speed) * 1000.0f : 0.0f;
     float avg_decode_ms_per_tok = (avg_decode_speed > 0.0f) ? (1.0f / avg_decode_speed) * 1000.0f : 0.0f;

+    mllm::print("  ----------------------------------------");
+    mllm::print("  Average Results (", R, "runs ):");
+    mllm::print("    Avg TTFT         :", avg_ttft, "ms");
+    mllm::print("    Avg Prefill Speed:", avg_prefill_speed, "tokens/s");
+    mllm::print("    Avg Decode Speed :", avg_decode_speed, "tokens/s");
+    mllm::print("    Avg Prefill Latency   :", avg_prefill_ms, "ms");
+    mllm::print("    Avg Decode Latency    :", avg_decode_ms_per_tok, "ms/token");
+
     // Rough KV cache estimate (bytes)

…comments

* Add CSV output and configurable runs/cooldown
* Validate --runs (>0) and error out on CSV open failure
* Guard divide-by-zero and compute average metrics
* Restore benchmark loop comments for better readability
@huangzhenhua111 huangzhenhua111 force-pushed the fix/llm-benchmark-csv-clean branch from c9b6689 to 97453b1 Compare January 31, 2026 05:12

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/main.cpp`:
- Around line 205-219: The CSV writer currently concatenates raw string fields
(STRINGIFY(MLLM_GIT_COMMIT_HASH), mllm::cpu::CURRENT_ARCH_STRING,
model_name.get()) into the stream which breaks parsing when those values contain
commas or quotes; update the code that builds the stringstream (around
schema_version.get(), STRINGIFY(MLLM_GIT_COMMIT_HASH),
mllm::cpu::CURRENT_ARCH_STRING, model_name.get()) to escape and quote these
string fields before inserting them (e.g., wrap in double quotes and double any
internal double-quote characters), leaving numeric fields as-is so the produced
CSV is safe to parse.

Comment on lines +205 to +219
// Prepare one line output (avg)
std::stringstream ss;
ss << schema_version.get() << ","
<< STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
<< mllm::cpu::CURRENT_ARCH_STRING << ","
<< model_name.get() << ","
<< pp << ","
<< tg << ","
<< avg_ttft << ","
<< avg_prefill_speed << ","
<< avg_decode_speed << ","
<< avg_prefill_ms << ","
<< avg_decode_ms_per_tok << ","
<< kv_est_bytes_pp << ","
<< kv_est_bytes_final;

⚠️ Potential issue | 🟡 Minor

Escape string fields in CSV output.

model_name, arch, and commit can contain commas or quotes, which will break CSV parsing. Quote/escape string fields before writing.

Proposed fix
-    std::stringstream ss;
-    ss << schema_version.get() << "," 
-       << STRINGIFY(MLLM_GIT_COMMIT_HASH) << "," 
-       << mllm::cpu::CURRENT_ARCH_STRING << ","
-       << model_name.get() << ","
+    auto csv_escape = [](const std::string& s) {
+      std::string out;
+      out.reserve(s.size() + 2);
+      out.push_back('"');
+      for (char c : s) {
+        if (c == '"') out += "\"\"";
+        else out.push_back(c);
+      }
+      out.push_back('"');
+      return out;
+    };
+
+    std::stringstream ss;
+    ss << schema_version.get() << ","
+       << csv_escape(STRINGIFY(MLLM_GIT_COMMIT_HASH)) << ","
+       << csv_escape(mllm::cpu::CURRENT_ARCH_STRING) << ","
+       << csv_escape(model_name.get()) << ","
        << pp << ","
        << tg << ","
        << avg_ttft << ","

@huangzhenhua111
Author

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Jan 31, 2026

@huangzhenhua111, I will perform a full review of all the changes in this PR.

✅ Actions performed

Full review triggered.

@chenghuaWang
Collaborator

Please keep all changes in the latest PR: #639
