tools/mllm-llm-benchmark: add CSV output and configurable runs #622
huangzhenhua111 wants to merge 1 commit into UbiquitousLearning:main
Conversation
📝 Walkthrough

Adds CLI flags for runs, cooldown_s, output_csv, schema_version, and kv_dtype_bytes; validates inputs; runs configurable benchmark iterations with cache clearing and optional cooldowns; computes per-run and averaged throughput/latency metrics; writes optional CSV output with a header and an aggregated row including KV estimates and git/schema metadata.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/main.cpp`:
- Line 45: The CLI option kv_dtype_bytes (added via
mllm::Argparse::add<int32_t>("-kv|--kv_dtype_bytes")) is never used; update the
KV cache size estimation (replace the hardcoded 0.0 values where KV cache is
computed) to use kv_dtype_bytes to compute bytes-per-element and derive the KV
cache memory estimate, or if you prefer to remove the feature, delete the
kv_dtype_bytes argument and any references to KV cache estimation; specifically,
use the parsed kv_dtype_bytes value to multiply by number of KV elements (or
element count computation already present) to produce the correct KV cache size
instead of 0.0.
- Around line 187-216: The averaged metrics (avg_ttft, avg_prefill_speed,
avg_decode_speed, avg_prefill_ms, avg_decode_ms_per_tok) are only written to
csv_file via the stringstream ss but never printed to the console; update the
end of the aggregation block in main.cpp so the same aggregated line is printed
to stdout (similar to the per-run print at lines ~159-166). After building ss
(which includes schema_version, MLLM_GIT_COMMIT_HASH,
mllm::cpu::CURRENT_ARCH_STRING, model_name, pp, tg, avg_* and kv estimates),
write ss.str() to std::cout (and still to csv_file if csv_file.is_open()), so
users without --output_csv still see the average results.
- Around line 49-50: num_threads currently can be 0 when unset; update the flag
handling to ensure a sensible default or validate before use: either add a
.def(...) when defining num_threads to set a default (e.g., 1 or
std::thread::hardware_concurrency()) or check num_threads.get() before calling
mllm::Context::instance().setCpuOpThreads and mllm::setMaximumNumThreads and
replace 0 with a fallback value; reference the num_threads variable and the
calls mllm::Context::instance().setCpuOpThreads(...) and
mllm::setMaximumNumThreads(...) when making the change.
🧹 Nitpick comments (3)
tools/mllm-llm-benchmark/main.cpp (3)
10-10: Unused include: `<algorithm>` is not used. The comment states it is "For std::transform", but `std::transform` is not called anywhere in this file. Consider removing the unused include.

Suggested fix:

-#include <algorithm> // For std::transform
26-27: Unused macros: `STR_HELPER` and `STR` are never used. These macros duplicate the functionality of `STRINGIFY_INTERNAL` and `STRINGIFY` (defined at lines 17-18), which are already used throughout the file. Consider removing these unused macros.

Suggested fix:

-#define STR_HELPER(x) #x
-#define STR(x) STR_HELPER(x)
-
168-172: Consider hoisting `cooldown_s.get()` outside the loop. The `cooldown_s.get()` call is repeated on every iteration, but the value never changes. Move it outside the loop for cleaner code. Based on learnings: "Avoid unnecessary object creation in loops or hot paths."

Suggested fix:

+  int cool = cooldown_s.get();
   for (int i = 0; i < R; ++i) {
     mllm::print(" Run", i + 1, "of", R, "...");
     // ... existing code ...
-    int cool = cooldown_s.get();
     if (i + 1 < R && cool > 0) {
auto& cooldown_s = mllm::Argparse::add<int32_t>("-cs|--cooldown_s").help("Cooldown time between runs in seconds").def(5);
auto& output_csv = mllm::Argparse::add<std::string>("-oc|--output_csv").help("Output results to a CSV file").def("");
auto& schema_version = mllm::Argparse::add<int32_t>("-sv|--schema_version").help("Schema version for output format").def(1);
auto& kv_dtype_bytes = mllm::Argparse::add<int32_t>("-kv|--kv_dtype_bytes").help("KV cache data type bytes (1: int8, 2: fp16, 4: fp32)").def(4);
🛠️ Refactor suggestion | 🟠 Major
kv_dtype_bytes is defined but never used.
This CLI argument is collected but never referenced in the code. The KV cache estimates at lines 195-196 are hardcoded to 0.0 instead of being computed using this value.
Either implement the KV cache estimation using kv_dtype_bytes, or remove the unused argument to avoid confusing users.
mllm::Context::instance().setCpuOpThreads(num_threads.get());
mllm::setMaximumNumThreads((uint32_t)num_threads.get());
🧩 Analysis chain

🏁 Scripts executed:

# First, find and examine the Argparse implementation
find . -name "Argparse.hpp" -o -name "Argparse.h" | head -5

# Look at the main.cpp file around lines 35 and 49-50
sed -n '30,55p' tools/mllm-llm-benchmark/main.cpp

# Search for how num_threads is used and defined
rg -n "num_threads" tools/mllm-llm-benchmark/ -A2 -B2

# Check if there's any validation or default initialization
rg -n "\.def\(" tools/mllm-llm-benchmark/main.cpp

# Examine the Argparse.hpp implementation to understand get() behavior
cat -n mllm/utils/Argparse.hpp | head -150

# Look for the get() method specifically
rg -n "get\(\)" mllm/utils/Argparse.hpp -A5 -B5

# Check if there's any constructor or initialization for int32_t type
rg -n "int32_t|get\(" mllm/utils/Argparse.hpp | head -40
Add a default value or validation for num_threads.
When --threads is not provided, num_threads.get() returns 0 (the default-initialized value for int32_t). Setting thread counts to 0 is likely unintended behavior. Unlike the newer arguments (runs, cooldown_s, etc.), num_threads lacks a .def() call. Either add .def() with a reasonable default or validate the value before use at lines 49-50.
avg_ttft /= denom;
avg_prefill_speed /= denom;
avg_decode_speed /= denom;

float avg_prefill_ms = (avg_prefill_speed > 0.0f) ? (pp / avg_prefill_speed) * 1000.0f : 0.0f;
float avg_decode_ms_per_tok = (avg_decode_speed > 0.0f) ? (1.0f / avg_decode_speed) * 1000.0f : 0.0f;

// Rough KV cache estimate (bytes)
double kv_est_bytes_pp = 0.0;
double kv_est_bytes_final = 0.0;

// Prepare one line output (avg)
std::stringstream ss;
ss << schema_version.get() << ","
   << STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
   << mllm::cpu::CURRENT_ARCH_STRING << ","
   << model_name.get() << ","
   << pp << ","
   << tg << ","
   << avg_ttft << ","
   << avg_prefill_speed << ","
   << avg_decode_speed << ","
   << avg_prefill_ms << ","
   << avg_decode_ms_per_tok << ","
   << kv_est_bytes_pp << ","
   << kv_est_bytes_final;

if (csv_file.is_open()) {
  csv_file << ss.str() << std::endl;
}
Average results are calculated but never printed to console.
The code computes averages (avg_ttft, avg_prefill_speed, avg_decode_speed, avg_prefill_ms, avg_decode_ms_per_tok) but only writes them to the CSV file. Users who don't specify --output_csv will never see the aggregated benchmark results.
Consider printing the averages to console, similar to how per-run results are printed at lines 159-166.
Suggested fix
avg_ttft /= denom;
avg_prefill_speed /= denom;
avg_decode_speed /= denom;
float avg_prefill_ms = (avg_prefill_speed > 0.0f) ? (pp / avg_prefill_speed) * 1000.0f : 0.0f;
float avg_decode_ms_per_tok = (avg_decode_speed > 0.0f) ? (1.0f / avg_decode_speed) * 1000.0f : 0.0f;
+ mllm::print(" ----------------------------------------");
+ mllm::print(" Average Results (", R, "runs ):");
+ mllm::print(" Avg TTFT :", avg_ttft, "ms");
+ mllm::print(" Avg Prefill Speed:", avg_prefill_speed, "tokens/s");
+ mllm::print(" Avg Decode Speed :", avg_decode_speed, "tokens/s");
+ mllm::print(" Avg Prefill Latency :", avg_prefill_ms, "ms");
+ mllm::print(" Avg Decode Latency :", avg_decode_ms_per_tok, "ms/token");
+
  // Rough KV cache estimate (bytes)
…comments

* Add CSV output and configurable runs/cooldown
* Validate --runs (>0) and error out on CSV open failure
* Guard divide-by-zero and compute average metrics
* Restore benchmark loop comments for better readability
Force-pushed from c9b6689 to 97453b1
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/main.cpp`:
- Around line 205-219: The CSV writer currently concatenates raw string fields
(STRINGIFY(MLLM_GIT_COMMIT_HASH), mllm::cpu::CURRENT_ARCH_STRING,
model_name.get()) into the stream which breaks parsing when those values contain
commas or quotes; update the code that builds the stringstream (around
schema_version.get(), STRINGIFY(MLLM_GIT_COMMIT_HASH),
mllm::cpu::CURRENT_ARCH_STRING, model_name.get()) to escape and quote these
string fields before inserting them (e.g., wrap in double quotes and double any
internal double-quote characters), leaving numeric fields as-is so the produced
CSV is safe to parse.
// Prepare one line output (avg)
std::stringstream ss;
ss << schema_version.get() << ","
   << STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
   << mllm::cpu::CURRENT_ARCH_STRING << ","
   << model_name.get() << ","
   << pp << ","
   << tg << ","
   << avg_ttft << ","
   << avg_prefill_speed << ","
   << avg_decode_speed << ","
   << avg_prefill_ms << ","
   << avg_decode_ms_per_tok << ","
   << kv_est_bytes_pp << ","
   << kv_est_bytes_final;
Escape string fields in CSV output.
model_name, arch, and commit can contain commas or quotes, which will break CSV parsing. Quote/escape string fields before writing.
Proposed fix
- std::stringstream ss;
- ss << schema_version.get() << ","
- << STRINGIFY(MLLM_GIT_COMMIT_HASH) << ","
- << mllm::cpu::CURRENT_ARCH_STRING << ","
- << model_name.get() << ","
+ auto csv_escape = [](const std::string& s) {
+ std::string out;
+ out.reserve(s.size() + 2);
+ out.push_back('"');
+ for (char c : s) {
+ if (c == '"') out += "\"\"";
+ else out.push_back(c);
+ }
+ out.push_back('"');
+ return out;
+ };
+
+ std::stringstream ss;
+ ss << schema_version.get() << ","
+ << csv_escape(STRINGIFY(MLLM_GIT_COMMIT_HASH)) << ","
+ << csv_escape(mllm::cpu::CURRENT_ARCH_STRING) << ","
+ << csv_escape(model_name.get()) << ","
<< pp << ","
<< tg << ","
   << avg_ttft << ","

📝 Committable suggestion
// Prepare one line output (avg)
auto csv_escape = [](const std::string& s) {
  std::string out;
  out.reserve(s.size() + 2);
  out.push_back('"');
  for (char c : s) {
    if (c == '"') out += "\"\"";
    else out.push_back(c);
  }
  out.push_back('"');
  return out;
};

std::stringstream ss;
ss << schema_version.get() << ","
   << csv_escape(STRINGIFY(MLLM_GIT_COMMIT_HASH)) << ","
   << csv_escape(mllm::cpu::CURRENT_ARCH_STRING) << ","
   << csv_escape(model_name.get()) << ","
   << pp << ","
   << tg << ","
   << avg_ttft << ","
   << avg_prefill_speed << ","
   << avg_decode_speed << ","
   << avg_prefill_ms << ","
   << avg_decode_ms_per_tok << ","
   << kv_est_bytes_pp << ","
   << kv_est_bytes_final;
@coderabbitai full review

✅ Actions performed: Full review triggered.
Please keep all changes in the latest PR: #639
What has been changed

* Added `--runs`, `--cooldown_s`, `--output_csv`
* `--runs` must be > 0 (avoids a huge `reserve()` / meaningless 0 runs)
* Error out when `--output_csv` cannot be opened (no silent skip)

How to test
schema_version,git_commit,arch,model_name,pp,tg,ttft_ms,prefill_speed,decode_speed,prefill_ms,decode_ms_per_tok,kv_est_bytes_pp,kv_est_bytes_final
1,f5c0006ce4f93d378e960648936c852418e69c88,x86_64,tiny_llama,8,4,439.039,19.8621,5.04427,402.778,198.245,0,0