"Where's Waldo?" for Cybersecurity β Fleet-wide anomaly detection powered by unsupervised machine learning.
Created with Claude.ai but supervised by a human (me apparently).
IronSift is a Rust-based security analyzer that finds anomalous machines in a fleet by comparing their process (and optionally file access) behavior. It does not rely on attack signatures or threat feeds: it learns what is βnormalβ from your own data and flags machines that stand out.
- Fleet mode (default): You feed process logs from many machines (CSV, JSON, or JSONL). IronSift builds a behavioral profile per machine, turns them into vectors (TF-IDF), and runs DBSCAN clustering. Machines that end up alone (noise) or in a small minority cluster are reported as anomalies, with severity and risk factors (entropy, suspicious paths, unexpected root, etc.).
- Temporal mode: For a single machine, you can compare two or more snapshots over time. IronSift reports new processes, new or modified files, and new IP connections between snapshots β no clustering involved.
- File mode (
--files): File access logs per host are turned into file profiles (counts perFileSignature, plus per-path mtime and metadata). Fleet analysis combines TFβIDF + DBSCAN (same pattern as process mode: noise / minority cluster) with explicit cross-host rules: mtime vs fleet median, owner/group/size baselines on comparable paths, fleet-relative access outliers (e.g. root UID on a path only where most peers use non-rootβno blanket βroot readβ alerts), rare signatures seen on a single host, and configurable mtime-vs-access heuristics. Rows matchingfile_excluded_*regexes are never merged into profiles.
Input can come from CSV, JSON, or JSONL (one JSON object per line; each file can be one machine). Output is a console report and an optional JSON forensic report for integration with other tools.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β INPUTS β
β Process logs (CSV / JSON / JSONL) or File access logs or Temporal snapshots β
βββββββββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββ
β β β
βΌ βΌ βΌ
βββββββββββββββββββββ βββββββββββββββββββββ βββββββββββββββββββββ
β FLEET ANALYSIS β β FILE ANALYSIS β β TEMPORAL β
β (process logs) β β (--files) β β (same machine β
β β β β β over time) β
βββββββββββ¬ββββββββββ βββββββββββ¬ββββββββββ βββββββββββ¬ββββββββββ
β β β
β Group by machine_id β Group by machine_id β Build snapshot
β Resolve parents, β Per-path mtime + metadata β per time point
β compute entropy & paths β β
βΌ βΌ βΌ
βββββββββββββββββββββ βββββββββββββββββββββ βββββββββββββββββββββ
β One profile per β β One file profile β β Diff snapshots: β
β machine β β per machine β β new processes, β
β (process counts) β β (file + mtime + β β new/modified β
β β β owner/group/size)β β files, new IPs β
βββββββββββ¬ββββββββββ βββββββββββ¬ββββββββββ βββββββββββββββββββββ
β β
β TF-IDF matrix β TF-IDF + mtime +
β (machines Γ features) β metadata fleet checks
βΌ βΌ
βββββββββββββββββββββ βββββββββββββββββββββ
β DBSCAN β β DBSCAN + fleet β
β Noise = outlier β β rules: mtime, β
β Small cluster = β β metadata, FLEET β
β minority β β OUTLIER, rare β
βββββββββββ¬ββββββββββ βββββββββββ¬ββββββββββ
β β
ββββββββββββββββ¬βββββββββββββββ
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β OUTPUTS β
β Console report (anomalies, severity, process/file risk factors) + optional JSON exportβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
In short: Fleet and file modes turn many machines into profiles, then use TF-IDF + DBSCAN so hosts in noise or a small cluster diverge from the dense majority. File mode also flags hosts using fleet baselines (mtime, owner/group/size) and path-level minority patterns (root vs non-root, permissions, recent-mtime signal)βnot per-row βevery root access is suspicious.β Temporal mode skips clustering and diffs snapshots of one machine.
use ironsift::{build_profiles_simple, analyze_fleet, DetectionConfig};
fn main() {
let config = DetectionConfig::default();
// Just provide (machine_id, process_name, parent_name) - PIDs handled automatically!
let processes = vec![
("server1".to_string(), "nginx".to_string(), "systemd".to_string()),
("server1".to_string(), "worker".to_string(), "nginx".to_string()),
("server2".to_string(), "miner".to_string(), "systemd".to_string()), // β οΈ Anomaly
];
let profiles = build_profiles_simple(processes, &config);
let report = analyze_fleet(&profiles, &config).unwrap();
report.print();
}use ironsift::{ProcessBuilder, ProcessEntry, build_profiles, analyze_fleet, DetectionConfig};
fn main() {
let config = DetectionConfig::default();
let mut builder = ProcessBuilder::new();
// Simple method
builder.add_process("server1", "nginx", "systemd");
// Or fluent API with full control
builder.add(
ProcessEntry::new("server1".to_string(), "worker".to_string())
.parent("nginx")
.uid(33)
.path("/usr/sbin/nginx")
.args("worker process")
);
// NEW: Automatic command line parsing!
builder.add_command("server2", "/usr/bin/postgres -D /var/lib/postgresql/data", Some("systemd"));
// NEW: Bare commands (no full path) work too!
builder.add_command("server3", "ls /etc/", Some("bash"));
// NEW: JSON log parsing!
builder.add_json(r#"{"host": "server4", "cmd": "nginx", "uid": 33}"#);
let profiles = build_profiles(builder.build(), &config);
let report = analyze_fleet(&profiles, &config).unwrap();
report.print();
}use ironsift::{RawLogEntry, build_profiles, analyze_fleet, DetectionConfig};
fn main() {
let config = DetectionConfig::default();
let entries = vec![
RawLogEntry {
machine_id: "server1".to_string(),
pid: 1, ppid: 0,
name: "systemd".to_string(),
uid: 0,
path: "/usr/lib/systemd/systemd".to_string(),
args: "--system".to_string(),
timestamp: None,
},
// ... more entries
];
let profiles = build_profiles(entries, &config);
let report = analyze_fleet(&profiles, &config).unwrap();
report.print();
}See EXAMPLES.md for complete usage examples.
Compare multiple snapshots of the same machine over time to spot new processes, new or modified files, and new IP connections β without fleet-wide clustering.
| Concept | Description |
|---|---|
| MachineSnapshot | One point-in-time view: processes + file accesses + connections for a single machine |
| TemporalDiff | Diff between two snapshots: new_processes, new_files, modified_files (mtime), new_connections |
| RawConnectionEntry | Connection log: machine_id, remote_ip, optional local_ip, remote_port, process_name, timestamp |
Example: build a baseline snapshot (e.g. Monday 10:00), then a current snapshot (Monday 14:00); compare_temporal(&baseline, ¤t) yields new processes, files, and IPs.
use ironsift::{build_machine_snapshot, compare_temporal, compare_temporal_series,
DetectionConfig, RawLogEntry, RawFileEntry, RawConnectionEntry};
let config = DetectionConfig::default();
let baseline = build_machine_snapshot("server1", "2024-01-01T10:00Z",
process_entries_t1, file_entries_t1, connection_entries_t1, &config);
let current = build_machine_snapshot("server1", "2024-01-01T14:00Z",
process_entries_t2, file_entries_t2, connection_entries_t2, &config);
let diff = compare_temporal(&baseline, ¤t);
// diff.new_processes, diff.new_files, diff.modified_files, diff.new_connections
// Or compare a series of snapshots (T1 vs T2, T2 vs T3, ...)
let diffs = compare_temporal_series(&[snap1, snap2, snap3]);Run the demo: cargo run --example temporal
- File fleet: Fleet-relative
FLEET OUTLIERsignals (root/permissions/recent-mtime per path vs majority), ingestfile_excluded_*, configurablefile_recent_mtime, stricter recent-mtime heuristic to cut false positives. - Process/file profiles: Hot strings deduplicated via interning (
Arc<str>on signatures and file maps). - π§ͺ Expanded tests for file fleet and exclusions.
- β¨ Enhanced Detailed Console Output - Rich reporting with attack categorization
- β¨ Automatic Command Line Parsing - Handles bare commands (
ls /etc/) and full paths - β¨ Native JSON Log Parsing - Docker, Kubernetes, CloudWatch, Elasticsearch support
- π Comprehensive documentation (15+ guides)
- π§ͺ Broad test coverage
- π― Three flexible APIs (Simple, Builder, Direct)
- π Automatic PID/PPID resolution
- π Reorganized project structure (CLI separated)
- π Extensive documentation
- π Core DBSCAN clustering
- π TF-IDF feature engineering
- π¨ Anomaly detection
- π Basic reporting
IronSift accepts data in various formats - choose what works for your logs:
builder.add_command("server1", "/usr/bin/nginx -c /etc/nginx.conf", Some("systemd"));
// β Automatically extracts: name="nginx", path="/usr/bin/nginx", args="-c /etc/nginx.conf"// Common in ps output, shell commands
builder.add_command("server1", "ls /etc/", Some("bash"));
builder.add_command("server1", "grep error app.log", Some("bash"));
// β Works perfectly! name="ls", path="ls", args="/etc/"// Single JSON entry
builder.add_json(r#"{"host": "server1", "cmd": "/usr/bin/nginx", "uid": 33}"#);
// Batch (JSON array or NDJSON)
builder.add_json_batch(r#"[
{"container": "web-1", "command": "nginx", "userid": 33},
{"node": "worker-1", "cmd": "python3 app.py", "uid": 1000}
]"#);Supported JSON key names:
- Machine:
machine_id,hostname,host,server,node,container,pod - Command:
command,cmd,cmdline,commandline - User:
uid,user_id,userid
See JSON_PARSING.md and COMMAND_PARSING.md for complete documentation.
| Feature | Description |
|---|---|
| Multivariate Analysis | Analyzes 6 dimensions: Process Name, Parent (auto-resolved), UID, Path, Entropy, Path Risk |
| PID Awareness | Automatically resolves parent processes from PID/PPID relationships |
| Unsupervised Learning | Zero-config detection β no signature database required |
| Scale Invariant | Works on 10 logs or 10 million logs |
| Minority Cluster Detection | Identifies coordinated attacks (botnets, APTs) |
| High Entropy Detection | Flags obfuscated commands and encoded payloads |
| Suspicious Path Analysis | Detects execution from /tmp, /dev/shm, hidden directories |
Logs can include path, mtime, permissions, owner, group, and size (CSV columns or JSON/JSONL fields). Profiles aggregate per FileSignature (path + uid + flags + optional metadata). IronSift compares hosts using several independent signals:
| Signal | What it does |
|---|---|
| DBSCAN (clustering-only) | TFβIDF over unique file signatures Γ normalized counts per host β DBSCAN. Noise (cluster_id = none) or membership in a non-largest cluster can mark a host as anomalous even if no text feature lines are attachedβpurely geometric distance from the main blob. |
| MTIME anomaly | Same path on β₯3 hosts: flags machines whose mtime is >24 hours from the fleet median for that path (MTIME ANOMALY). |
| Metadata anomaly | On comparable paths (/etc/β¦, */bin/*, */sbin/*, /usr/bin/β¦, /usr/sbin/β¦, /var/log/β¦), with β₯3 hosts and a majority value appearing β₯2 times, hosts that disagree on owner, group, or size are flagged (METADATA ANOMALY). |
| FLEET OUTLIER (path minorities) | For paths seen on β₯3 hosts with a strict majority (majority count β₯2): flags hosts in the minority class for root vs non-root access to that path, world-writable vs not, group-writable (only under paths containing /etc or /tmp), and recent mtime vs access (see file_recent_mtime in config). Avoids fleet-wide false positives when everyone behaves the same. |
| Rare file access | A full signature appears on exactly one machine in the fleet (Rare file access: β¦). |
| Ingest exclusions | file_excluded_path_regexes / file_excluded_filename_regexes: matching rows are not merged into profiles (enforced in the merge path). |
Per-signature helpers (e.g. in FileSignature::risk_factors) still describe suspicious path, system dir, root, etc. for local explanations; the fleet report does not treat βroot readβ or βsystem directoryβ as automatic anomalies without a fleet-relative or rare-signature signal above.
Config: file_recent_mtime tunes clock skew, time windows, and volatile path prefixes for the recent-mtime heuristic. Metadata comparison stays scoped so /home/β¦ variation does not dominate.
IronSift can identify:
- Cryptominers: Unusual processes with high CPU, suspicious paths
- Web Shells: PHP/Python processes with high-entropy eval() payloads
- Privilege Escalation: Normal processes suddenly running as root (UID 0)
- Lateral Movement: Unusual SSH/SCP activity with anomalous targets
- Rootkits: Processes masquerading as system services
- APT Campaigns: Small clusters of compromised machines with identical malware
- Rust 1.70+ (
rustuprecommended) - 4GB+ RAM for large datasets
cd ironsift
cargo build --releaseCreate a realistic dataset with 100 machines and embedded attack scenarios:
cargo run --release --bin generatorOutput: large_dataset.csv (100,000 logs with 10 compromised machines)
The generated data includes:
- Realistic PID/PPID relationships
- systemd as PID 1 on each machine
- Normal processes as children of systemd
- Attack processes with proper parent relationships
For file datasets, run cargo run --release --bin generator -- --files. The sample CSV includes mtime and metadata scenarios (fleet baseline owner/group/size on common paths, with a few hosts intentionally diverging) so ironsift --files exercises MTIME ANOMALY and METADATA ANOMALY lines in the report.
Analyze the fleet and display results:
cargo run --release --bin ironsiftSample Output:
================================================================================
IRONSIFT ANALYSIS REPORT
================================================================================
Fleet Size: 100 machines
Detection Sensitivity: High
--- Configuration ---
DBSCAN Tolerance: 0.35
Entropy Threshold: 4.5
Minority Cluster Ratio: 10%
--- Cluster Distribution ---
Cluster 0: 90 machines (90.0%)
Noise (Outliers): 10 machines (10.0%)
================================================================================
Status: π¨ ANOMALIES DETECTED
================================================================================
Suspicious Machines: 10
π CRITICAL (3):
These machines are isolated outliers - likely compromised
π machine_013 (Distance: 1.500)
ββ Cluster: Noise (isolated outlier)
ββ Total processes: 150
ββ Suspicious processes: 50 β οΈ
ββ Rare processes (< 5% of fleet):
β β’ kworker (path: /tmp/.X11-unix/kworker)
β β’ systemd (path: /var/tmp/.cache/systemd)
ββ Suspicious processes detected:
β
β π kworker (count: 30)
β Parent: systemd
β Path: /tmp/.X11-unix/kworker
β UID: 0 (root) β οΈ
β Risk factors:
β π¨ High entropy arguments (possible obfuscation)
β π¨ Suspicious execution path: /tmp/.X11-unix/kworker
β π¨ Running as root (UID 0)
β π¨ Executing from temporary directory
ββ Activity period: 2024-01-01 10:00:00 to 2024-01-07 15:30:00
π΄ HIGH (4):
Strong deviation from baseline - investigate immediately
π΄ machine_042 (Distance: 0.823)
ββ Suspicious processes: 15 β οΈ
ββ Unusual: php-fpm (high entropy eval payloads)
...
--- Detected Attack Patterns ---
βοΈ Cryptomining (3 machines): machine_013, machine_027, machine_065
πΈοΈ Web Shells (2 machines): machine_042, machine_088
β¬οΈ Privilege Escalation (4 machines): machine_019, machine_051, ...
π Suspicious Execution Paths (5 machines): machine_013, machine_027, ...
================================================================================
Recommended Actions:
1. Review flagged machines and investigate anomalous processes
2. Check process execution paths and command arguments
3. Verify parent-child process relationships
4. Cross-reference with network logs and file access logs
5. Export detailed report: cargo run --bin ironsift -- --export-json
================================================================================
See OUTPUT_EXAMPLES.md for complete output examples.
Generate a detailed JSON report for incident response:
cargo run --release --bin ironsift -- --export-jsonOutput: forensic_report.json
For use by other tools or in scripts:
| Option | Effect |
|---|---|
-q, --quiet |
Minimal output: one-line summary only (e.g. CLEAN or ANOMALIES: 5 (Critical: 2, High: 1, β¦)). Progress and config are suppressed. |
--export-json - |
Write the JSON report to stdout (nothing else on stdout). Use 2>/dev/null to hide progress on stderr. |
| Progress messages | Loading/config/progress lines are sent to stderr so stdout can be piped or parsed. |
Examples:
# One-line result for scripting
ironsift -q --input data.csv
# JSON only on stdout (e.g. pipe to jq or another tool)
ironsift --export-json - --input data.csv 2>/dev/null | jq '.anomalies_detected'
# Quiet + export to file
ironsift -q --export-json report.json --input data.csvironsift [OPTIONS]
Options:
--config <file> Load configuration from JSON file
--export-json Export detailed forensic report
--tolerance <value> Override DBSCAN tolerance (default: from config, 0.35)
--help Show help messageOn first run, IronSift creates ironsift_config.json. Important keys:
{
"entropy_threshold": 4.5,
"minority_cluster_ratio": 0.10,
"dbscan_tolerance": 0.35,
"dbscan_min_samples": 2,
"normalize_features": true,
"suspicious_path_patterns": [
"/tmp/",
"/dev/shm/",
"/var/tmp/",
"/home/[^/]+/\\.[^/]+",
"^\\./",
"/(?:bin|sbin|usr/bin|usr/sbin)/\\.[^/]+"
],
"file_excluded_path_regexes": [],
"file_excluded_filename_regexes": [],
"file_recent_mtime": {
"clock_skew_minutes": 5,
"max_hours_critical_paths": 12,
"max_hours_system_elevated": 6,
"max_hours_suspicious_only": 3,
"volatile_path_prefixes": [
"/var/log/",
"/var/cache/",
"/var/lib/dpkg/",
"/var/lib/apt/",
"/var/tmp/",
"/tmp/",
"/run/",
"/proc/",
"/sys/",
"/dev/"
]
}
}file_excluded_path_regexes/file_excluded_filename_regexes: Rust regexes; matching file-log rows are dropped before profiling (e.g.^/proc/,^/var/cache/).file_recent_mtime: Controls the mtime vs access-time signal used in profiles and in FLEET OUTLIER comparisons (volatile prefixes, tiered hour limits).
| Parameter | Effect | Recommended Range |
|---|---|---|
dbscan_tolerance |
Detection sensitivity (process & file TFβIDF clustering) | Default 0.35; lower (e.g. 0.03β0.10) = stricter, higher = looser |
minority_cluster_ratio |
Botnet detection threshold | 0.05 - 0.15 |
entropy_threshold |
Obfuscation detection | 3.5 (sensitive) - 5.5 (strict) |
file_recent_mtime.* |
Strictness of βrecent mtime near accessβ (file mode) | Adjust max_hours_* / volatile_path_prefixes if too noisy or too quiet |
Example: Increase sensitivity for high-security environments:
cargo run --bin ironsift -- --tolerance 0.03| Level | Score | Meaning | Action |
|---|---|---|---|
| π Critical | > 1.0 | Isolated outlier, likely compromised | Immediate isolation |
| π΄ High | 0.6-1.0 | Strong deviation, investigate ASAP | Priority investigation |
| π Medium | 0.3-0.6 | Moderate anomaly, worth reviewing | Schedule review |
| π‘ Low | 0.0-0.3 | Minor deviation, may be benign | Monitor |
The JSON export includes:
{
"report_timestamp": "2024-12-10T15:30:00Z",
"fleet_size": 100,
"anomalies_detected": 10,
"config": { ... },
"investigation_targets": [
{
"machine_id": "machine_013",
"severity": "Critical",
"distance_score": 1.5,
"suspicious_processes": [
{
"name": "kworker",
"path": "/tmp/.X11-unix/kworker",
"parent": "systemd",
"risk_factors": [
"High entropy arguments (possible obfuscation)",
"Suspicious execution path: /tmp/.X11-unix/kworker",
"Running as root (UID 0)"
]
}
]
}
]
}Run the comprehensive test suite:
cargo testTo check that the generator output is correctly analyzed by the CLI (catches regressions in ingestion or reporting):
./scripts/test_generator_ironsift.shThis script builds release, generates process and file datasets, runs ironsift (and ironsift --files) on them, and verifies that anomalies are reportedβincluding at least one MTIME ANOMALY and one METADATA ANOMALY line in the file report. Run from the repo root.
- Shannon entropy calculation
- Suspicious path detection
- Clean fleet (no false positives)
- Single outlier detection
- Minority cluster detection (botnet scenario)
- Process risk factor analysis
- PID/PPID parent resolution
- Unknown parent handling
- File fleet: DBSCAN + mtime/metadata baselines +
FLEET OUTLIERpath minorities + rare signatures + regex exclusions +file_recent_mtime; JSONL/CSV streaming loaders
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β IRONSIFT PIPELINE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Raw Input Profile Building Analysis
βββββββββ ββββββββββββββββ βββββββ
ββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β CSV / JSON β β Group by β β TF-IDF β
β Process Logs ββββββββββββββΊβ machine_id ββββββββββββΊβ Vectorization β
β or File β parse β β build β (rare = signal) β
β Access Logs β β Resolve PPID β β profiles β β
ββββββββββββββββ β parent names β ββββββββββ¬βββββββββ
β β β β
β β Whitelist / β βΌ
β β filter paths β βββββββββββββββββββ
ββββββββββββββββββββββΊβ β β L2 Normalize β
βββββββββββββββββββ β DBSCAN Cluster β
ββββββββββ¬βββββββββ
β
βΌ
Output βββββββββββββββββββ βββββββββββββββββββ
ββββββ β Anomaly Scoring βββββββββββββββ Noise = outlier β
β & Severity β cluster β Small cluster β
ββββββββββββββββ β (CriticalβLow) β ids β = minority β
β Console βββββββββββββ€ β β Large cluster β
β Report β print ββββββββββ¬βββββββββ β = baseline β
ββββββββββββββββ β βββββββββββββββββββ
β² β
β βΌ
ββββββββββββββββ βββββββββββββββββββ
β forensic_ βββββββββββββββ Feature reasons β
β report.json β export β Process: entropyβ
ββββββββββββββββ β path, rootβ¦ β
β File: mtime, β
β metadata, FLEET β
β OUTLIER, rare β
βββββββββββββββββββ
PROCESS MODE (default) FILE MODE (--files)
βββββββββββββββββββββ βββββββββββββββββββ
RawLogEntry RawFileEntry
β’ machine_id, pid, ppid β’ machine_id, path, uid
β’ name, path, args, uid β’ timestamp, mtime
β’ timestamp β’ permissions, owner, group, size (optional)
β β
βΌ βΌ
ProcessSignature FileSignature
β’ name + parent + uid + path β’ path + uid + metadata
β’ is_suspicious_path, entropy β’ is_suspicious_path, permissions
β β’ owner, group, size; mtime flags
βΌ β
MachineProfile MachineFileProfile
(counts per process) (counts per file signature +
latest mtime + owner/group/size per path)
β β
ββββββββββββββββ¬ββββββββββββββββββββββ
βΌ
analyze_fleet / analyze_files_fleet
β
βΌ
AnalysisReport (anomalies, severity)
- PID Resolution: Automatically maps PPID to parent process names
- TF-IDF Weighting: Boosts rare processes, reduces noise from common ones
- L2 Normalization: Ensures distance metrics work correctly across varied fleet sizes
- DBSCAN: Density-based clustering that naturally identifies outliers
- Shannon Entropy: Measures randomness in command arguments (detects obfuscation)
- File fleet baselines: Median mtime and majority owner/group/size per path (on comparable paths); path-level binary minorities (root, writable flags, recent-mtime pattern) when β₯3 hosts and a clear majority; rare signatures (single-host
FileSignature)
IronSift treats each machine as a vector in N-dimensional feature space:
- Normal machines cluster tightly (distance β 0)
- Compromised machines drift away due to:
- Rare processes not seen elsewhere
- Unusual execution paths
- High-entropy obfuscated commands
- Privilege escalation patterns
- Abnormal parent-child relationships
- File mode: DBSCAN distance, rare file signatures, mtime far from fleet median, metadata disagreements, or minority access pattern on a path vs most peers (not βevery root read is badβ)
Feature space (simplified 2D view)
βββββββββββββββββββββββββββββββββ
β’ β’ β’ β’ β’
β’ β’ β’ β’ β’ β Normal machines (tight cluster)
β’ β’ β’ β’ β’
β’ β’ β’ β’
β
β Isolated outlier (NOISE)
β π CRITICAL: likely compromised
β β β β β βΊ
small cluster
(minority) β π΄ HIGH: botnet / APT pattern
β³ β³
β³
DBSCAN: density-based clustering
β’ Points in dense regions β same cluster (baseline).
β’ Points in sparse regions β "noise" = anomaly.
β’ Small clusters β minority = coordinated deviance.
Fleet: 100 web servers running nginx, postgres, node
Anomaly: Machine #42 suddenly has:
php-fpm (PID 5432, PPID 108 [apache2]) β eval(base64_decode('aGVsbG8gd29ybGQ='))
IronSift Analysis:
Raw log Resolution TF-IDF DBSCAN
βββββββ ββββββββββ ββββββ ββββββ
machine_42 PPID 108 rare Machine #42 Main cluster
pid 5432, ppid 108 ββββΊ β apache2 process βββΊ vector differs βββΊ β’ β’ β’ β’ β’
name php-fpm parent (1/100) from baseline β’
args eval(base64β¦) resolved βΌ βΌ β
β #42
β IDF boost distance β 1.2 (outlier)
β 100Γ βΌ
β π΄ HIGH severity
ββββββββββββββββββββ anomaly
- Resolves parent: PPID 108 β apache2
- Computes TF-IDF: This exact process appears on 1/100 machines
- IDF boost: 100Γ signal amplification for this rare event
- DBSCAN: Machine #42 is 1.2 units away from main cluster
- Result: π΄ HIGH severity anomaly detected
Benchmarks on a 4-core CPU:
| Fleet Size | Logs | Processing Time | Memory |
|---|---|---|---|
| 100 machines | 100K | 0.8s | 45 MB |
| 1,000 machines | 1M | 6.2s | 320 MB |
| 10,000 machines | 10M | 58s | 2.8 GB |
With parallel processing enabled (Rayon)
# Daily cron job
0 2 * * * cd /opt/ironsift && \
./ingest_logs.sh && \
cargo run --release --bin ironsift -- --export-json && \
./alert_soc.sh forensic_report.json# Quick triage after breach detection
cargo run --release --bin ironsift -- --tolerance 0.03 --export-json# Test detection against custom malware
./inject_attack.sh && cargo run --bin ironsiftStay secure. Sift the iron from the ore. π
