
tools/mllm-llm-benchmark: add CSV output and configurable runs#622

Closed
huangzhenhua111 wants to merge 1 commit into UbiquitousLearning:main from huangzhenhua111:fix/llm-benchmark-csv-clean

Conversation


@huangzhenhua111 huangzhenhua111 commented Jan 31, 2026

What has been changed

  • Add CLI flags: --runs, --cooldown_s, --output_csv
  • Validate that --runs is > 0 (avoids a huge reserve() and a meaningless 0-run benchmark)
  • Error out if --output_csv cannot be opened (no silent skip)
  • Print per-run TTFT/prefill/decode + derived latency
  • Write averaged metrics per (pp,tg) into CSV with header
  • Guard divide-by-zero when speed is 0
  • KV cache estimate fields currently filled with 0 (follow-up to compute from config)

How to test

  • Tested on x86_64 (WSL/Ubuntu), CPU backend
./build/bin/mllm-llm-benchmark \
  -n tiny_llama \
  -m /home/huangzhenhua/models/mllm_tinyllama/tinyllama-fp32.mllm \
  -c /home/huangzhenhua/mllm-runok/examples/llama/config_tiny_llama.json \
  -pp 8 -tg 4 -t 4 -cl 2048 \
  -r 3 -cs 2 \
  -oc /tmp/mllm_bench.csv

cat /tmp/mllm_bench.csv
  • output
    schema_version,git_commit,arch,model_name,pp,tg,ttft_ms,prefill_speed,decode_speed,prefill_ms,decode_ms_per_tok,kv_est_bytes_pp,kv_est_bytes_final
    1,f5c0006ce4f93d378e960648936c852418e69c88,x86_64,tiny_llama,8,4,439.039,19.8621,5.04427,402.778,198.245,0,0

Summary by CodeRabbit

  • New Features
    • Configurable benchmark parameters: run count and cooldown between runs
    • Optional CSV export with header and summarized single-line results
    • Enhanced reporting: per-run and averaged throughput and latency, clearer progress messages
  • Bug Fixes / Reliability
    • Input validation for run count and safer sequencing (state cleared between runs)
    • Rough KV cache memory estimates included in outputs



coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

Walkthrough

Adds CLI flags for runs, cooldown_s, output_csv, schema_version, and kv_dtype_bytes; validates inputs; runs configurable benchmark iterations with cache clearing and optional cooldowns; computes per-run and averaged throughput/latency metrics; records optional CSV output with header and aggregated row including KV estimates and git/schema metadata.

Changes

Cohort / File(s) Summary
Benchmark Tool
tools/mllm-llm-benchmark/main.cpp
Added headers and new CLI flags (--runs, --cooldown_s, --output_csv, --schema_version, --kv_dtype_bytes); added git commit macros; set CPU/thread config after parsing; validate runs; replaced fixed 3-run loop with configurable R; reserve results, clear cache each run, compute per-run throughput→latency, optional cooldown between runs, compute averages, emit CSV header and aggregated CSV line (includes placeholders for KV byte estimates), and close CSV file.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐇 I hopped through runs and cleared the slate,
Counting latencies, neat and great,
CSV carrots in a tidy row,
Averaged numbers all aglow,
A benchmarking rabbit — swift and late!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title clearly and concisely summarizes the main changes: adding CSV output functionality and making the number of runs configurable.
  • Description check — ✅ Passed: The description provides comprehensive details including changes made, testing information, and example output, though it deviates from the template structure.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/main.cpp`:
- Line 45: The CLI option kv_dtype_bytes (added via
mllm::Argparse::add<int32_t>("-kv|--kv_dtype_bytes")) is never used; update the
KV cache size estimation (replace the hardcoded 0.0 values where KV cache is
computed) to use kv_dtype_bytes to compute bytes-per-element and derive the KV
cache memory estimate, or if you prefer to remove the feature, delete the
kv_dtype_bytes argument and any references to KV cache estimation; specifically,
use the parsed kv_dtype_bytes value to multiply by number of KV elements (or
element count computation already present) to produce the correct KV cache size
instead of 0.0.
- Around line 187-216: The averaged metrics (avg_ttft, avg_prefill_speed,
avg_decode_speed, avg_prefill_ms, avg_decode_ms_per_tok) are only written to
csv_file via the stringstream ss but never printed to the console; update the
end of the aggregation block in main.cpp so the same aggregated line is printed
to stdout (similar to the per-run print at lines ~159-166). After building ss
(which includes schema_version, MLLM_GIT_COMMIT_HASH,
mllm::cpu::CURRENT_ARCH_STRING, model_name, pp, tg, avg_* and kv estimates),
write ss.str() to std::cout (and still to csv_file if csv_file.is_open()), so
users without --output_csv still see the average results.
- Around line 49-50: num_threads currently can be 0 when unset; update the flag
handling to ensure a sensible default or validate before use: either add a
.def(...) when defining num_threads to set a default (e.g., 1 or
std::thread::hardware_concurrency()) or check num_threads.get() before calling
mllm::Context::instance().setCpuOpThreads and mllm::setMaximumNumThreads and
replace 0 with a fallback value; reference the num_threads variable and the
calls mllm::Context::instance().setCpuOpThreads(...) and
mllm::setMaximumNumThreads(...) when making the change.
🧹 Nitpick comments (3)
tools/mllm-llm-benchmark/main.cpp (3)

10-10: Unused include: <algorithm> is not used.

The comment states this is "For std::transform", but std::transform is not called anywhere in this file. Consider removing the unused include.

Suggested fix
-#include <algorithm>  // For std::transform

26-27: Unused macros: STR_HELPER and STR are never used.

These macros duplicate the functionality of STRINGIFY_INTERNAL and STRINGIFY (defined at lines 17-18), which are already used throughout the file. Consider removing these unused macros.

Suggested fix
-#define STR_HELPER(x) #x
-#define STR(x) STR_HELPER(x)
-

168-172: Consider hoisting cooldown_s.get() outside the loop.

The cooldown_s.get() call is repeated on every iteration, but the value never changes. Move it outside the loop for cleaner code.

Based on learnings: "Avoid unnecessary object creation in loops or hot paths."

Suggested fix
+    int cool = cooldown_s.get();
     for (int i = 0; i < R; ++i) {
       mllm::print("  Run", i + 1, "of", R, "...");
       // ... existing code ...
 
-      int cool = cooldown_s.get();
       if (i + 1 < R && cool > 0) {

auto& cooldown_s = mllm::Argparse::add<int32_t>("-cs|--cooldown_s").help("Cooldown time between runs in seconds").def(5);
auto& output_csv = mllm::Argparse::add<std::string>("-oc|--output_csv").help("Output results to a CSV file").def("");
auto& schema_version = mllm::Argparse::add<int32_t>("-sv|--schema_version").help("Schema version for output format").def(1);
auto& kv_dtype_bytes = mllm::Argparse::add<int32_t>("-kv|--kv_dtype_bytes").help("KV cache data type bytes (1: int8, 2: fp16, 4: fp32)").def(4);

🛠️ Refactor suggestion | 🟠 Major

kv_dtype_bytes is defined but never used.

This CLI argument is collected but never referenced in the code. The KV cache estimates at lines 195-196 are hardcoded to 0.0 instead of being computed using this value.

Either implement the KV cache estimation using kv_dtype_bytes, or remove the unused argument to avoid confusing users.


Comment on lines +49 to +50
mllm::Context::instance().setCpuOpThreads(num_threads.get());
mllm::setMaximumNumThreads((uint32_t)num_threads.get());

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the Argparse implementation
find . -name "Argparse.hpp" -o -name "Argparse.h" | head -5

Repository: UbiquitousLearning/mllm

Length of output: 92


🏁 Script executed:

# Look at the main.cpp file around line 35 and 49-50
sed -n '30,55p' tools/mllm-llm-benchmark/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 1757


🏁 Script executed:

# Search for how num_threads is used and defined
rg -n "num_threads" tools/mllm-llm-benchmark/ -A2 -B2

Repository: UbiquitousLearning/mllm

Length of output: 1148


🏁 Script executed:

# Check if there's any validation or default initialization
rg -n "\.def\(" tools/mllm-llm-benchmark/main.cpp

Repository: UbiquitousLearning/mllm

Length of output: 697


🏁 Script executed:

# Now examine the Argparse.hpp implementation to understand get() behavior
cat -n mllm/utils/Argparse.hpp | head -150

Repository: UbiquitousLearning/mllm

Length of output: 4941


🏁 Script executed:

# Look for the get() method specifically
rg -n "get\(\)" mllm/utils/Argparse.hpp -A5 -B5

Repository: UbiquitousLearning/mllm

Length of output: 1536


🏁 Script executed:

# Check if there's any constructor or initialization for int32_t type
rg -n "int32_t|get\(" mllm/utils/Argparse.hpp | head -40

Repository: UbiquitousLearning/mllm

Length of output: 324


Add a default value or validation for num_threads.

When --threads is not provided, num_threads.get() returns 0 (the default-initialized value for int32_t). Setting thread counts to 0 is likely unintended behavior. Unlike the newer arguments (runs, cooldown_s, etc.), num_threads lacks a .def() call. Either add .def() with a reasonable default or validate the value before use at lines 49-50.


Comment on lines +187 to +216
avg_ttft /= denom;
avg_prefill_speed /= denom;
avg_decode_speed /= denom;

float avg_prefill_ms = (avg_prefill_speed > 0.0f) ? (pp / avg_prefill_speed) * 1000.0f : 0.0f;
float avg_decode_ms_per_tok = (avg_decode_speed > 0.0f) ? (1.0f / avg_decode_speed) * 1000.0f : 0.0f;

// Rough KV cache estimate (bytes)
double kv_est_bytes_pp = 0.0;
double kv_est_bytes_final = 0.0;

// Prepare one line output (avg)
std::stringstream ss;
ss << schema_version.get() << ","
<< STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
<< mllm::cpu::CURRENT_ARCH_STRING << ","
<< model_name.get() << ","
<< pp << ","
<< tg << ","
<< avg_ttft << ","
<< avg_prefill_speed << ","
<< avg_decode_speed << ","
<< avg_prefill_ms << ","
<< avg_decode_ms_per_tok << ","
<< kv_est_bytes_pp << ","
<< kv_est_bytes_final;

if (csv_file.is_open()) {
csv_file << ss.str() << std::endl;
}

⚠️ Potential issue | 🟡 Minor

Average results are calculated but never printed to console.

The code computes averages (avg_ttft, avg_prefill_speed, avg_decode_speed, avg_prefill_ms, avg_decode_ms_per_tok) but only writes them to the CSV file. Users who don't specify --output_csv will never see the aggregated benchmark results.

Consider printing the averages to console, similar to how per-run results are printed at lines 159-166.

Suggested fix
     avg_ttft /= denom;
     avg_prefill_speed /= denom;
     avg_decode_speed /= denom;

     float avg_prefill_ms = (avg_prefill_speed > 0.0f) ? (pp / avg_prefill_speed) * 1000.0f : 0.0f;
     float avg_decode_ms_per_tok = (avg_decode_speed > 0.0f) ? (1.0f / avg_decode_speed) * 1000.0f : 0.0f;

+    mllm::print("  ----------------------------------------");
+    mllm::print("  Average Results (", R, "runs ):");
+    mllm::print("    Avg TTFT         :", avg_ttft, "ms");
+    mllm::print("    Avg Prefill Speed:", avg_prefill_speed, "tokens/s");
+    mllm::print("    Avg Decode Speed :", avg_decode_speed, "tokens/s");
+    mllm::print("    Avg Prefill Latency   :", avg_prefill_ms, "ms");
+    mllm::print("    Avg Decode Latency    :", avg_decode_ms_per_tok, "ms/token");
+
     // Rough KV cache estimate (bytes)

…comments

* Add CSV output and configurable runs/cooldown
* Validate --runs (>0) and error out on CSV open failure
* Guard divide-by-zero and compute average metrics
* Restore benchmark loop comments for better readability
@huangzhenhua111 huangzhenhua111 force-pushed the fix/llm-benchmark-csv-clean branch from c9b6689 to 97453b1 Compare January 31, 2026 05:12

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/main.cpp`:
- Around line 205-219: The CSV writer currently concatenates raw string fields
(STRINGIFY(MLLM_GIT_COMMIT_HASH), mllm::cpu::CURRENT_ARCH_STRING,
model_name.get()) into the stream which breaks parsing when those values contain
commas or quotes; update the code that builds the stringstream (around
schema_version.get(), STRINGIFY(MLLM_GIT_COMMIT_HASH),
mllm::cpu::CURRENT_ARCH_STRING, model_name.get()) to escape and quote these
string fields before inserting them (e.g., wrap in double quotes and double any
internal double-quote characters), leaving numeric fields as-is so the produced
CSV is safe to parse.

Comment on lines +205 to +219
// Prepare one line output (avg)
std::stringstream ss;
ss << schema_version.get() << ","
<< STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
<< mllm::cpu::CURRENT_ARCH_STRING << ","
<< model_name.get() << ","
<< pp << ","
<< tg << ","
<< avg_ttft << ","
<< avg_prefill_speed << ","
<< avg_decode_speed << ","
<< avg_prefill_ms << ","
<< avg_decode_ms_per_tok << ","
<< kv_est_bytes_pp << ","
<< kv_est_bytes_final;

⚠️ Potential issue | 🟡 Minor

Escape string fields in CSV output.

model_name, arch, and commit can contain commas or quotes, which will break CSV parsing. Quote/escape string fields before writing.

Proposed fix
-    std::stringstream ss;
-    ss << schema_version.get() << "," 
-       << STRINGIFY(MLLM_GIT_COMMIT_HASH) << "," 
-       << mllm::cpu::CURRENT_ARCH_STRING << ","
-       << model_name.get() << ","
+    auto csv_escape = [](const std::string& s) {
+      std::string out;
+      out.reserve(s.size() + 2);
+      out.push_back('"');
+      for (char c : s) {
+        if (c == '"') out += "\"\"";
+        else out.push_back(c);
+      }
+      out.push_back('"');
+      return out;
+    };
+
+    std::stringstream ss;
+    ss << schema_version.get() << ","
+       << csv_escape(STRINGIFY(MLLM_GIT_COMMIT_HASH)) << ","
+       << csv_escape(mllm::cpu::CURRENT_ARCH_STRING) << ","
+       << csv_escape(model_name.get()) << ","
        << pp << ","
        << tg << ","
        << avg_ttft << ","

@huangzhenhua111
Author

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Jan 31, 2026

@huangzhenhua111, I will perform a full review of all the changes in this PR.

✅ Actions performed

Full review triggered.

@chenghuaWang
Collaborator

Please keep all changes in the latest PR: #639
