This document is the source of truth for the architecture of the C++
sibling of merlion-node-exporter-rs. It captures every decision
that Codex (or any implementer) should not have to re-litigate.
If you find yourself making a non-trivial choice that this document doesn't cover, stop and update this document first so the choice shows up in the design history rather than in code archaeology.
- Wire compatibility with upstream `node_exporter`. A Prometheus scrape config / dashboard / alerting rule that works against upstream `node_exporter` must work against this binary, byte-for-byte, for the collectors we ship. Metric names, label sets, TYPE lines, and HELP text all follow upstream conventions.
- Wire compatibility with `merlion-node-exporter-rs`. The two Merlion implementations are interchangeable. A scrape diff between them should reduce to numeric value differences and label-ordering noise, never metric-name or shape differences.
- Linux MVP scope. The 15 collectors listed in §7 Implementation Plan, the same set as `-rs`.
- Modern C++ idioms. C++23, `std::expected`, `std::format`, `std::string_view` at boundaries, RAII everywhere, `constexpr` where it pays for itself.
- Header-only-where-practical dependencies. Keep `FetchContent` pulls minimal so the project builds cleanly on Linux and macOS without package-manager assistance beyond `brew install llvm cmake`.
- BSD / Darwin / Solaris / AIX collectors. macOS is supported as a build target so contributors can compile on their laptops, but the only collector that actually returns data on macOS is `uname`. Everything else returns an error and degrades to `node_scrape_collector_success{collector="..."} 0`, matching `-rs`'s behaviour.
- Histograms, summaries, or OpenMetrics protobuf negotiation in the MVP. The exposition encoder targets the text 0.0.4 subset that `node_exporter` actually emits today.
- A Prometheus client-library dependency. The Metric model is hand-rolled; see §3.2 for the rationale.
| Concern | Choice | Notes |
|---|---|---|
| Language | C++23 | rust-version parallel: `cmake_minimum_required(VERSION 3.28)` |
| Compiler | Clang ≥ 18 (Homebrew LLVM on macOS) | Apple Clang is not supported: `std::expected` / `std::format` lag in Apple's libc++ |
| Build | CMake 3.28+ | `FetchContent` for deps, `ctest` for tests |
| HTTP | cpp-httplib v0.18+ | Blocking, header-only, perfect for the per-scrape model |
| CLI | CLI11 v2.4+ | Header-only |
| Logging | spdlog v1.14+ | Mirrors the `tracing` setup in `-rs` |
| Tests | Catch2 v3 | `FetchContent`'d |
| Format | clang-format | Style pinned in `.clang-format` (LLVM base + 4-space indent + 100-col line) |
| CI | GitHub Actions | Matrix: ubuntu-24.04 (clang-18, gcc-14) + macos-14 (brew llvm) |
- `cpp-httplib` over Boost.Beast / Crow / Drogon. Per-scrape blocking handlers are the right model for a node exporter: there is nothing to await. `cpp-httplib` is the smallest dependency that gives us a single-header blocking HTTP server with thread-pool handling. Boost is enormous; Crow / Drogon are async frameworks we don't need.
- Hand-rolled Metric model. Identical reasoning to `-rs`'s decision to skip `prometheus-client`: the typed pre-registration pattern that popular client libraries optimise for is more verbose than helpful when every scrape re-reads `/proc` fresh.
- CLI11 over Boost.Program_options / `argparse`. Header-only, no Boost, and a declarative builder API that maps cleanly onto `clap` in `-rs`, so the two CLIs stay in sync.
- Homebrew LLVM clang, not Apple Clang. `std::expected` and `std::format` are still partial in Apple's libc++. This matches the project-wide toolchain decision in `merlion-tsdb-cpp`.
The C++ tree mirrors -rs's module layout one-to-one. A reviewer who
knows one project should be able to find the equivalent file in the
other in five seconds.
```
merlion-node-exporter-cpp/
├── include/merlion_node_exporter/
│   ├── metric.hpp        # Metric / Sample / MetricType
│   ├── encoding.hpp      # Prometheus text-format encoder
│   ├── registry.hpp      # Collector interface + Registry
│   ├── config.hpp        # path.{procfs,sysfs,rootfs}
│   ├── server.hpp        # cpp-httplib /metrics server
│   ├── cli.hpp           # CLI11 argument struct
│   └── version.hpp       # generated by CMake from project version
├── src/
│   ├── encoding.cpp
│   ├── registry.cpp
│   ├── config.cpp
│   ├── server.cpp
│   ├── cli.cpp
│   ├── main.cpp
│   └── collectors/
│       ├── loadavg.cpp
│       ├── meminfo.cpp
│       └── uname.cpp
├── tests/
│   ├── encoding_test.cpp
│   ├── loadavg_test.cpp
│   ├── meminfo_test.cpp
│   └── uname_test.cpp
├── cmake/
│   └── ToolchainLLVM.cmake   # optional: forces brew LLVM on macOS
├── docs/
│   └── DESIGN.md             # ← you are here
├── .clang-format
├── .github/workflows/ci.yml
├── CMakeLists.txt
├── README.md
├── LICENSE
└── NOTICE
```
The `-rs` equivalents are:

| C++ file | Rust file |
|---|---|
| `metric.hpp` | `src/metric.rs` |
| `encoding.{hpp,cpp}` | `src/encoding.rs` |
| `registry.{hpp,cpp}` | `src/registry.rs` |
| `config.{hpp,cpp}` | `src/config.rs` |
| `server.{hpp,cpp}` | `src/server.rs` |
| `cli.{hpp,cpp}` | `src/cli.rs` |
| `collectors/<X>.cpp` | `src/collectors/<X>.rs` |
| `main.cpp` | `src/main.rs` |
Mirrors `-rs` exactly. Public surface:

```cpp
#include <string>
#include <utility>
#include <vector>

namespace merlion::node_exporter {

enum class MetricType { Counter, Gauge, Untyped };

struct Sample {
    // Ordered key/value pairs; order is preserved as supplied by the
    // collector so output is deterministic.
    std::vector<std::pair<std::string, std::string>> labels;
    double value = 0.0;
};

struct Metric {
    std::string name;
    std::string help;
    MetricType mtype = MetricType::Untyped;
    std::vector<Sample> samples;
};

} // namespace merlion::node_exporter
```

Construction is plain aggregate / brace-init, with no builder pattern. The hand-rolled fluent style in `-rs` is unnecessary in C++, where designated initialisers exist.
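To illustrate the designated-initialiser style, the sketch below builds one unlabelled gauge and one labelled counter. It repeats the three types from `metric.hpp` so it compiles on its own. The helper names `make_load1` / `make_cpu_example` and the HELP strings are placeholders for illustration, not copied from `-rs`.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace merlion::node_exporter {

// Same shapes as metric.hpp, repeated so the sketch is self-contained.
enum class MetricType { Counter, Gauge, Untyped };
struct Sample {
    std::vector<std::pair<std::string, std::string>> labels;
    double value = 0.0;
};
struct Metric {
    std::string name;
    std::string help;
    MetricType mtype = MetricType::Untyped;
    std::vector<Sample> samples;
};

// Hypothetical helper: a single-sample, unlabelled gauge family.
inline Metric make_load1(double value) {
    return Metric{
        .name = "node_load1",
        .help = "1m load average.",  // placeholder HELP text
        .mtype = MetricType::Gauge,
        .samples = {Sample{.labels = {}, .value = value}},
    };
}

// Hypothetical helper: a labelled family; labels are emitted in the order given.
inline Metric make_cpu_example() {
    return Metric{
        .name = "node_cpu_seconds_total",
        .help = "Seconds the CPUs spent in each mode.",  // placeholder HELP text
        .mtype = MetricType::Counter,
        .samples = {
            Sample{.labels = {{"cpu", "0"}, {"mode", "idle"}}, .value = 1234.5},
            Sample{.labels = {{"cpu", "0"}, {"mode", "user"}}, .value = 56.0},
        },
    };
}

} // namespace merlion::node_exporter
```

Designated initialisers must follow declaration order (`name`, `help`, `mtype`, `samples`), so the compiler also catches accidental field reordering.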
The encoder writes Prometheus text format 0.0.4, byte-for-byte compatible with `-rs`'s `src/encoding.rs`. Public surface:

```cpp
#include <span>
#include <string>
#include <string_view>

#include "merlion_node_exporter/metric.hpp"

namespace merlion::node_exporter::encoding {

inline constexpr std::string_view content_type =
    "text/plain; version=0.0.4; charset=utf-8";

std::string encode(std::span<const Metric> metrics);

} // namespace merlion::node_exporter::encoding
```

Rules (must match `-rs`):

- Skip metric families with no samples.
- Emit `# HELP <name> <escaped help>\n` then `# TYPE <name> <type>\n` per family.
- One sample per line: `<name>{<labels>} <value>\n`.
- Label-value escaping: `\` → `\\`, `"` → `\"`, newline → `\n` (literal backslash-n).
- Integer-valued doubles with `|v| < 1e15` print without a decimal point; everything else uses `std::format("{}", v)`. `NaN`, `+Inf`, and `-Inf` are printed literally.
A round-trip test fixture in tests/encoding_test.cpp asserts that the
canonical output exactly equals the byte string produced by -rs's
encoder for the same input. This fixture is the contract that keeps
the two implementations interchangeable.
```cpp
#include <expected>
#include <filesystem>
#include <string>
#include <string_view>
#include <vector>

#include "merlion_node_exporter/metric.hpp"

namespace merlion::node_exporter {

struct Config {
    std::filesystem::path procfs = "/proc";
    std::filesystem::path sysfs = "/sys";
    std::filesystem::path rootfs = "/";

    // /proc/foo and foo both resolve to <procfs>/foo.
    std::filesystem::path proc_path(std::string_view rel) const;
    std::filesystem::path sys_path(std::string_view rel) const;
};

class Collector {
public:
    virtual ~Collector() = default;
    virtual std::string_view name() const noexcept = 0;
    virtual std::expected<std::vector<Metric>, std::string>
    collect(const Config&) const = 0;
};

} // namespace merlion::node_exporter
```

- `std::expected` (C++23) is the canonical mechanism: no exceptions cross the collector boundary. Exceptions inside a collector implementation are caught by the registry and converted to `std::unexpected("…")`.
- `name()` returns a string literal (`"loadavg"`, `"meminfo"`, …). It is the stable identifier used for `--no-collector.<name>` flags and the `collector` label on scrape-status metrics.
- Collectors are stateless and reused across scrapes.
```cpp
class Registry {
public:
    void register_collector(std::unique_ptr<Collector>);
    std::vector<std::string_view> enabled_names() const;
    // Runs every collector, appends per-collector
    // node_scrape_collector_success and
    // node_scrape_collector_duration_seconds, returns the flat metric list.
    std::vector<Metric> gather(const Config&) const;
};
```

The two synthesised metric families (`node_scrape_collector_success`, `node_scrape_collector_duration_seconds`) match `-rs` byte-for-byte: same names, same labels, same HELP text. See `src/registry.rs::Registry::gather` for the reference behaviour.
`cpp-httplib` blocking server bound to `--web.listen-address`. Single `GET <telemetry_path>` route. Steps per request:

- Run `registry.gather(config)`.
- Serialise with `encoding::encode(metrics)` into a `std::string`.
- Respond with `Content-Type: text/plain; version=0.0.4; charset=utf-8`.
Errors are logged via spdlog and produce a 500 with body
"internal error\n" — same wording as -rs.
Graceful shutdown: SIGINT / SIGTERM triggers server.stop().
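One way to trigger `server.stop()` on SIGINT / SIGTERM without reasoning about what is safe inside a signal handler is to block both signals and let a dedicated thread `sigwait` for them. This is a POSIX-only sketch. `start_signal_watcher` is a hypothetical name, and the `stop` callback stands in for the real `server.stop()` call.

```cpp
#include <functional>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <utility>

namespace mne_sketch {

// Blocks SIGINT/SIGTERM in the calling thread (threads created afterwards
// inherit the mask), then returns a thread that waits for either signal and
// invokes `stop` once. Call it before the server spawns its worker threads.
inline std::thread start_signal_watcher(std::function<void()> stop) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return std::thread([set, stop = std::move(stop)] {
        int sig = 0;
        sigwait(&set, &sig);  // runs in ordinary thread context, not a handler
        stop();
    });
}

} // namespace mne_sketch
```

In `main.cpp` this would be called before the server starts listening, with a callback that calls `server.stop()`, and the returned thread joined once `listen` returns. The sketch assumes shutdown only ever happens through a signal. Otherwise the watcher would need its own wake-up path before `join()`.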
Flag-for-flag parity with `-rs` (and therefore upstream `node_exporter`). Use CLI11:

```
--web.listen-address <ADDR>   default :9100     (env MNE_LISTEN_ADDRESS)
--web.telemetry-path <PATH>   default /metrics  (env MNE_TELEMETRY_PATH)
--path.procfs <DIR>           default /proc     (env MNE_PROCFS)
--path.sysfs <DIR>            default /sys      (env MNE_SYSFS)
--path.rootfs <DIR>           default /         (env MNE_ROOTFS)
--no-collector <NAME>         repeatable
--collector.only <NAME>       repeatable
```

`:9100` resolves to `0.0.0.0:9100` in the server-bind step, matching `-rs`'s `Cli::resolved_listen_address`.
spdlog default logger; level controlled by MNE_LOG_LEVEL env
(values: trace|debug|info|warn|error, default info). The Rust side
honours RUST_LOG; the C++ side uses MNE_LOG_LEVEL to avoid
ambiguity. Document this divergence in the binary's --help output.
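Reading `MNE_LOG_LEVEL` reduces to a small lookup. In this sketch, `LogLevel` is a dependency-free stand-in for `spdlog::level::level_enum`. Matching is case-sensitive, and unrecognised values fall back to `info` like an unset variable does. Both choices are assumptions, because the text above only lists the accepted values and the default.

```cpp
#include <cstdlib>
#include <optional>
#include <string_view>

namespace mne_sketch {

// Stand-in for spdlog::level::level_enum so the sketch has no dependencies.
enum class LogLevel { Trace, Debug, Info, Warn, Error };

inline std::optional<LogLevel> parse_log_level(std::string_view s) {
    if (s == "trace") return LogLevel::Trace;
    if (s == "debug") return LogLevel::Debug;
    if (s == "info") return LogLevel::Info;
    if (s == "warn") return LogLevel::Warn;
    if (s == "error") return LogLevel::Error;
    return std::nullopt;
}

// Unset -> info (per the text above). Unrecognised values also fall back to
// info here; rejecting them at startup would be an equally valid choice.
inline LogLevel log_level_from_env() {
    const char* raw = std::getenv("MNE_LOG_LEVEL");
    if (raw == nullptr) return LogLevel::Info;
    return parse_log_level(raw).value_or(LogLevel::Info);
}

} // namespace mne_sketch
```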
`CMakeLists.txt` at the project root drives everything. Highlights:

- `cmake_minimum_required(VERSION 3.28)` and `set(CMAKE_CXX_STANDARD 23)`.
- Detects the toolchain. On macOS, if `CMAKE_CXX_COMPILER` is not set, emit a `FATAL_ERROR` pointing the user at the README's Homebrew LLVM instructions. Silently falling back to Apple Clang is a footgun.
- `FetchContent_Declare` blocks for cpp-httplib, CLI11, spdlog, and Catch2, pinned to specific versions (no `GIT_TAG main`).
- Builds two targets:
  - `merlion_node_exporter_lib` (static): everything in `src/` except `main.cpp`.
  - `merlion-node-exporter` (executable): links the lib + `main.cpp`.
- `MNE_BUILD_TESTS` option (default `ON` when the project is the top level, `OFF` when included as a subdir). When on, builds `merlion_node_exporter_tests` against Catch2 and registers it with `ctest`.
- Generates `include/merlion_node_exporter/version.hpp` from `PROJECT_VERSION` so `--version` output stays in sync with `CMakeLists.txt`.
Performance-relevant flags for the release config:

```
-O3 -fno-plt -ffunction-sections -fdata-sections
-Wl,--gc-sections   # Linux
-Wl,-dead_strip     # macOS
```

LTO is enabled when supported (`check_ipo_supported`).
- Inside a collector: prefer `std::expected` and short-circuit helpers. If a call that touches the kernel surface throws (for example `std::filesystem`), catch it at the collector boundary and return `std::unexpected`.
- Registry: catches exceptions from `Collector::collect`, logs them via `spdlog::error`, records `node_scrape_collector_success` = 0, and still records the duration sample.
- Server: any uncaught exception in a handler becomes a 500 + log line. The process never exits because a single scrape failed.
- Pure parsers (`encoding`, `loadavg::parse`, `meminfo::parse`) are tested with hard-coded fixtures stored inline in the test source. Tests must include the same fixtures used by `-rs` so we know we agree byte-for-byte on synthetic input.
- Filesystem-bound collectors are tested by pointing `Config::procfs` at a temp directory the test populates. No `/proc` access in unit tests.
- Encoder round-trip: `tests/encoding_test.cpp` includes a snapshot text file (`tests/data/expected_scrape.txt`) and asserts the encoder output equals it. The same snapshot is checked in to `-rs`, and a CI job in both repos runs the snapshot through both implementations (TODO: track in [issue #2]).
- Smoke test: `ctest` starts the binary on a random port, scrapes `/metrics`, and asserts response code 200 and the correct `Content-Type`.
Ordered so the project is useful as soon as possible. One PR per checkbox. Land scaffold + first three collectors first to validate the architecture; everything after that is mechanical extension.
- [ ] PR #1: `CMakeLists.txt`, `.clang-format`, `cmake/ToolchainLLVM.cmake`, stub `main.cpp`, CI workflow. The binary builds, runs `--version`, and exits 0, even though no collectors exist yet. No HTTP server is wired up.
- [ ] PR #2: `include/.../metric.hpp` + tests
- [ ] PR #3: `encoding.{hpp,cpp}` + tests (must pass the cross-language snapshot fixture)
- [ ] PR #4: `config.{hpp,cpp}` + tests
- [ ] PR #5: `registry.{hpp,cpp}` + tests (includes synthesised scrape metrics)
- [ ] PR #6: `cli.{hpp,cpp}` + `server.{hpp,cpp}`, wires up `/metrics`. `main.cpp` becomes the production entry point.

- [ ] PR #7: `loadavg`
- [ ] PR #8: `meminfo`
- [ ] PR #9: `uname`
- [ ] PR #10: `cpu` (`/proc/stat` per-CPU jiffies)
- [ ] PR #11: `diskstats` (`/proc/diskstats`)
- [ ] PR #12: `netdev` (`/proc/net/dev`)
- [ ] PR #13: `filesystem` (`getmntinfo` + `statvfs`)
- [ ] PR #14: `stat` (`/proc/stat`: boot time, intr, ctxt, processes)
- [ ] PR #15: `vmstat` (`/proc/vmstat`)
- [ ] PR #16: `netstat` (`/proc/net/{netstat,snmp,snmp6}`)
- [ ] PR #17: `sockstat` (`/proc/net/sockstat{,6}`)
- [ ] PR #18: `pressure` (`/proc/pressure/{cpu,memory,io}`)
- [ ] PR #19: `hwmon` (`/sys/class/hwmon/`)
- [ ] PR #20: `thermal_zone` (`/sys/class/thermal/thermal_zone*`)
- [ ] PR #21: `time` (system clock + NTP sync state)
- [ ] PR #22: `textfile` (`*.prom` files from a configured directory)
- Container image (`Dockerfile`)
- Homebrew formula in the `MerlionOS/homebrew-merlion` tap
- eBPF-backed collectors behind a CMake option (Linux only)
Whenever this document or -rs's public behaviour changes in a way
that affects scrape output (new metric, new label, renamed metric,
changed HELP text, …), update both repos in lock-step:
- PR against this repo with the design update.
- PR against `-rs` implementing the change.
- Cross-link the PRs in both descriptions.
Disagreement between the two implementations is a bug in whichever one diverged from this document.