
feat(groq): CLI tool that computes Shannon entropy of a given string, helping assess password strength#3872

Open
polsala wants to merge 1 commit into main from ai/groq-20260321-0840


Owner

polsala commented Mar 21, 2026

Implementation Summary

  • Utility: nightly-entropy-analyzer
  • Provider: groq
  • Location: rust-utils/nightly-nightly-entropy-analyzer-3
  • Files Created: 5
  • Description: CLI tool that computes Shannon entropy of a given string, helping assess password strength

Rationale

  • Automated proposal from the Groq generator delivering a fresh community utility.
  • This utility was generated using the groq AI provider.

Why safe to merge

  • Utility is isolated to rust-utils/nightly-nightly-entropy-analyzer-3.
  • README + tests ship together (see folder contents).
  • No secrets or credentials touched.
  • All changes are additive and self-contained.

Test Plan

  • Follow the instructions in the generated README at rust-utils/nightly-nightly-entropy-analyzer-3/README.md
  • Run tests located in rust-utils/nightly-nightly-entropy-analyzer-3/tests/

Links

  • Generated docs and examples committed alongside this change.

Mock Justification

  • Not applicable; generator did not introduce new mocks.

Owner Author

polsala commented Mar 21, 2026

🤖 Review by GROQ Agent

✅ What's solid

  • Clear separation of concerns – the core logic lives in src/lib.rs (compute_entropy) while the CLI wrapper is in src/main.rs. This makes the library reusable and testable.
  • Minimal dependency footprint – only the Rust standard library is used, keeping the crate lightweight.
  • Comprehensive README – includes build, run, and test instructions, plus a concrete usage example.
  • Basic test coverage – three unit tests exercise the main edge‑cases (empty string, single‑character string, and a known entropy value).

🧪 Tests

  • Floating‑point tolerance – the test_entropy_known test asserts an absolute difference < 1e‑12. Small variations in log2 implementations across platforms can cause flaky failures.
    const EPS: f64 = 1e-9; // or a relative tolerance
    assert!((entropy - expected).abs() < EPS);
  • Unicode handling – the current implementation uses s.len() (byte length) to compute probabilities, which is incorrect for multi‑byte Unicode characters. Add a test with a non‑ASCII string (e.g., "ááßç"), compute the expected entropy manually, and ensure the function matches.
  • Boundary conditions – consider adding a test for a very long input (e.g., 10,000 characters) to verify that the function remains performant and does not overflow.
  • Property‑based testing – a quick proptest that generates random strings could catch subtle bugs. It should check that entropy is always non‑negative and never exceeds log2 of the number of distinct characters in the string.
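The sketch below shows how the tolerance and property checks suggested above might look without extra dependencies. Several parts are assumptions:

- compute_entropy mirrors the character-count quick fix proposed later in this review.
- approx_eq, entropy_within_bounds and pseudo_random_strings are hypothetical helper names.
- proptest is swapped out for a tiny deterministic generator (an LCG), purely to keep the example std-only.

```rust
use std::collections::HashMap;

/// Character-based Shannon entropy (bits per character), following the quick fix in this review.
pub fn compute_entropy(s: &str) -> f64 {
    let len = s.chars().count() as f64;
    if len == 0.0 {
        return 0.0;
    }
    let mut freq: HashMap<char, usize> = HashMap::new();
    for ch in s.chars() {
        *freq.entry(ch).or_insert(0) += 1;
    }
    freq.values()
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical helper: true if a and b agree within an absolute OR a relative tolerance.
pub fn approx_eq(a: f64, b: f64, abs_eps: f64, rel_eps: f64) -> bool {
    let diff = (a - b).abs();
    diff <= abs_eps || diff <= rel_eps * a.abs().max(b.abs())
}

/// Hypothetical property: 0 <= H(s) <= log2(number of distinct characters in s).
pub fn entropy_within_bounds(s: &str) -> bool {
    let h = compute_entropy(s);
    let distinct = s.chars().collect::<std::collections::HashSet<char>>().len();
    let upper = if distinct == 0 { 0.0 } else { (distinct as f64).log2() };
    // Small slack absorbs rounding when every character is equally frequent.
    h >= 0.0 && h <= upper + 1e-9
}

/// Deterministic stand-in for a proptest strategy: a linear congruential generator
/// producing strings over a small mixed ASCII / multi-byte alphabet.
pub fn pseudo_random_strings(count: usize, max_len: usize) -> Vec<String> {
    const ALPHABET: [char; 8] = ['a', 'b', 'Z', '1', '!', '\u{e1}', '\u{df}', '\u{1F680}'];
    let mut state: u64 = 0x2545_F491_4F6C_DD1D;
    let mut next = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 33) as usize
    };
    (0..count)
        .map(|_| {
            let len = next() % (max_len + 1);
            (0..len)
                .map(|_| ALPHABET[next() % ALPHABET.len()])
                .collect::<String>()
        })
        .collect()
}
```

With a real proptest setup, the generator would be replaced by a string strategy, and entropy_within_bounds would become the property body.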

🔒 Security

  • Input size – the CLI reads the entire argument list into memory (args.join(" ")). While typical command‑line limits are generous, a malicious user could invoke the binary with an extremely long string, leading to high memory consumption. Mitigate by:
    • Adding a configurable maximum length (e.g., 1 MiB) and exiting with an error if exceeded.
    • Using streaming input (e.g., read from stdin) for large data sets.
  • Unicode length mismatch – because s.len() counts bytes, multi-byte UTF-8 input makes the denominator larger than the number of characters actually counted. The probabilities then no longer sum to 1, and the reported entropy is wrong. Division by zero is not a risk here: a zero byte length implies an empty string and an empty frequency map. Switch to character count as shown below.
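The size-limit and stdin mitigations above could be sketched as follows. This is an assumption-laden sketch, not the PR's code:

- MAX_INPUT_BYTES, check_input_len and read_bounded are hypothetical names.
- The 1 MiB cap is the example figure from the review.
- Stripping one trailing newline from piped input is a design choice.

The reader is capped with std::io::Read::take, so an oversized stream is rejected after at most max_bytes + 1 bytes are buffered.

```rust
use std::io::Read;

/// Hypothetical cap on input size (1 MiB); not part of the current CLI.
pub const MAX_INPUT_BYTES: usize = 1024 * 1024;

/// Rejects inputs longer than `max_bytes`, measured in UTF-8 bytes (e.g. the joined CLI args).
pub fn check_input_len(input: &str, max_bytes: usize) -> Result<(), String> {
    if input.len() > max_bytes {
        Err(format!("input is {} bytes; the maximum is {}", input.len(), max_bytes))
    } else {
        Ok(())
    }
}

/// Reads at most `max_bytes + 1` bytes from `reader` (for example stdin), so an
/// oversized stream is detected without buffering all of it, then decodes UTF-8.
pub fn read_bounded<R: Read>(reader: R, max_bytes: usize) -> Result<String, String> {
    let mut bytes = Vec::new();
    reader
        .take(max_bytes as u64 + 1)
        .read_to_end(&mut bytes)
        .map_err(|e| format!("failed to read input: {e}"))?;
    if bytes.len() > max_bytes {
        return Err(format!("input exceeds the maximum of {max_bytes} bytes"));
    }
    let mut text = String::from_utf8(bytes).map_err(|_| "input is not valid UTF-8".to_string())?;
    // Design choice (assumption): drop one trailing newline, which `echo` appends,
    // so piping a password does not add an extra '\n' character to the entropy.
    if text.ends_with('\n') {
        text.pop();
        if text.ends_with('\r') {
            text.pop();
        }
    }
    Ok(text)
}
```

In main, this might be called as read_bounded(std::io::stdin().lock(), MAX_INPUT_BYTES) when no arguments are given.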

🧩 Docs / Developer Experience

  • Algorithm explanation – augment the README with a short paragraph describing Shannon entropy, the formula used, and how the result relates to password strength (e.g., “> 4 bits/char is generally considered strong”).
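  • CLI help flag – currently the tool prints a usage line only when no arguments are supplied. Adding a --help flag (checked via std::env::args) would improve discoverability. Example snippet, where args holds the arguments after the program name:
    if args.iter().any(|a| a == "--help" || a == "-h") {
        // argv[0] is normally the program path; fall back to the package name if it is absent.
        let prog = std::env::args().next().unwrap_or_else(|| "nightly-entropy-analyzer".to_string());
        println!("Usage: {} <string>", prog);
        return;
    }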
  • Versioning – expose the crate version in the CLI output (env!("CARGO_PKG_VERSION")) so users can verify which build they are running.
  • Error messages – the current eprintln! is fine, but consider returning a non‑zero exit code only for usage errors and avoiding internal panics. f64::log2(0.0) does not panic in Rust; it returns negative infinity. Also, p is never 0 for a character that was counted. The early return when len == 0.0 (already present) keeps the empty-string case explicit, so keep that guard after fixing the length calculation.
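The help, version and exit-code suggestions above could be combined into one small, testable decision function. The assumptions in this sketch:

- Command, parse_command, exit_code and version are hypothetical names.
- -V as a short version flag is an assumption.
- option_env! is used instead of env!, because env!("CARGO_PKG_VERSION") fails to compile when the file is built outside Cargo.

Joining the remaining arguments with spaces matches the current CLI behavior described in these reviews.

```rust
/// What the CLI should do, decided from the arguments after the program name.
#[derive(Debug, PartialEq)]
pub enum Command {
    Help,
    Version,
    Analyze(String),
    UsageError,
}

/// Crate version when built by Cargo; "unknown" otherwise (option_env! never fails to compile).
pub fn version() -> &'static str {
    option_env!("CARGO_PKG_VERSION").unwrap_or("unknown")
}

/// Hypothetical argument handling: help/version flags take precedence; otherwise
/// the remaining arguments are joined with spaces, like the current CLI.
pub fn parse_command(args: &[String]) -> Command {
    if args.iter().any(|a| a == "--help" || a == "-h") {
        Command::Help
    } else if args.iter().any(|a| a == "--version" || a == "-V") {
        Command::Version
    } else if args.is_empty() {
        Command::UsageError
    } else {
        Command::Analyze(args.join(" "))
    }
}

/// Only usage errors exit non-zero, as the review suggests.
pub fn exit_code(cmd: &Command) -> i32 {
    if matches!(cmd, Command::UsageError) { 1 } else { 0 }
}
```

In main, this would be fed std::env::args().skip(1).collect::<Vec<String>>(). One trade-off of this design is that an input that is literally "--help" can no longer be analyzed.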

🧱 Mocks / Fakes

  • No external services or I/O abstractions are used, so mocks are unnecessary. The pure function design already enables straightforward unit testing without any test doubles.

Quick fix for the Unicode length bug

Replace the byte‑length calculation with a character count:

pub fn compute_entropy(s: &str) -> f64 {
    let len = s.chars().count() as f64;
    if len == 0.0 {
        return 0.0;
    }
    let mut freq = std::collections::HashMap::new();
    for ch in s.chars() {
        *freq.entry(ch).or_insert(0usize) += 1;
    }
    let mut entropy = 0.0;
    for count in freq.values() {
        let p = (*count as f64) / len;
        entropy -= p * p.log2();
    }
    entropy
}

This change aligns the probability denominator with the actual number of characters processed, ensuring correct entropy values for all Unicode inputs.
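To make the effect of the fix concrete, the sketch below computes both variants side by side. The PR's actual source is not shown here, so entropy_byte_len is a hypothetical reconstruction of the byte-length behavior described in this review. entropy_char_count follows the quick fix above. All helper names are illustrative.

```rust
use std::collections::HashMap;

/// Counts how often each character occurs.
fn char_counts(s: &str) -> HashMap<char, usize> {
    let mut freq = HashMap::new();
    for ch in s.chars() {
        *freq.entry(ch).or_insert(0usize) += 1;
    }
    freq
}

/// Sum of -p * log2(p) with p = count / denominator.
fn entropy_with_denominator(freq: &HashMap<char, usize>, denominator: f64) -> f64 {
    if denominator == 0.0 {
        return 0.0;
    }
    freq.values()
        .map(|&c| {
            let p = c as f64 / denominator;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical reconstruction of the buggy variant: byte length as the denominator.
pub fn entropy_byte_len(s: &str) -> f64 {
    entropy_with_denominator(&char_counts(s), s.len() as f64)
}

/// The quick fix: character count as the denominator.
pub fn entropy_char_count(s: &str) -> f64 {
    entropy_with_denominator(&char_counts(s), s.chars().count() as f64)
}
```

Consider "ááßç", which is 4 characters and 8 UTF-8 bytes. The character-based version returns 1.5 bits/char. The byte-length version returns 1.25, because its probabilities sum to only 0.5. For pure ASCII input the two agree.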

Owner Author

polsala commented Mar 21, 2026

🤖 Review by GEMINI Agent

✅ What's solid

  • Clear module separation: The core entropy calculation logic is well-encapsulated in src/lib.rs, making it reusable and testable independently of the CLI interface.
  • Minimal dependencies: The project correctly leverages only the Rust standard library, ensuring a lightweight and efficient utility.
  • Robust entropy calculation: The compute_entropy function correctly handles edge cases like empty strings and strings with uniform characters, returning 0.0 as expected. The use of HashMap for character frequency counting is appropriate and efficient.
  • Comprehensive README: The README.md provides clear, concise instructions for usage, building, and testing, including practical examples.

🧪 Tests

  • Floating-point comparison: The test_entropy_known function uses a tolerance-based comparison (abs() < 1e-12) rather than exact equality. This is the right approach for floating-point results, which carry rounding error.
  • Edge case coverage: Tests for empty strings and strings with a single unique character (e.g., "aaaaa") are present, demonstrating good coverage for common edge cases.
  • Additional test cases: Consider adding tests for:
    • Strings with special characters: E.g., "!@#$%^&*()" to ensure correct handling of non-alphanumeric characters.
    • Strings with mixed case: E.g., "AaBbCc", to verify that character case is treated distinctly (which it should be for Shannon entropy).
    • Longer, more complex strings: E.g., a sentence or a base64 encoded string, to ensure performance and accuracy with larger inputs.
    • Unicode characters: While char in Rust handles Unicode, it's good to explicitly test strings containing multi-byte UTF-8 characters to confirm correct frequency counting.
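The additional cases above can be written as a small table with exact expected values. In each input, every character occurs equally often, so the entropy is exactly log2 of the number of distinct characters.

Two assumptions apply. compute_entropy is the character-count version proposed in the GROQ review's quick fix, and suggested_cases is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Character-based Shannon entropy in bits per character (assumed to match the
/// character-count quick fix; the PR's own implementation is not shown here).
pub fn compute_entropy(s: &str) -> f64 {
    let len = s.chars().count() as f64;
    if len == 0.0 {
        return 0.0;
    }
    let mut freq: HashMap<char, usize> = HashMap::new();
    for ch in s.chars() {
        *freq.entry(ch).or_insert(0) += 1;
    }
    freq.values()
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical table of suggested cases: (label, input, expected bits per character).
pub fn suggested_cases() -> Vec<(&'static str, String, f64)> {
    vec![
        ("special characters", "!@#$%^&*()".to_string(), 10f64.log2()),
        ("case is distinct", "AaBbCc".to_string(), 6f64.log2()),
        ("long input", "ab".repeat(5_000), 1.0),
        // "áßç" plus a rocket emoji: three 2-byte characters and one 4-byte character.
        ("multi-byte UTF-8", "\u{e1}\u{df}\u{e7}\u{1F680}".to_string(), 2.0),
    ]
}
```

If case were folded, "AaBbCc" would drop to log2(3) ≈ 1.58 bits/char instead of log2(6) ≈ 2.58, so the mixed-case row catches that mistake.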

🔒 Security

  • No external dependencies: The absence of third-party crates significantly reduces the supply chain attack surface.
  • Self-contained utility: The tool is isolated and does not interact with the file system, network, or other system resources beyond standard input/output, minimizing potential vectors for malicious activity.
  • Input handling: The CLI argument parsing uses std::env::args(). The args.join(" ") approach means that if a user provides multiple arguments without quotes (e.g., cargo run -- hello world), they will be concatenated into a single string "hello world". This is generally safe for an entropy calculation, but it's worth noting that it might not be what a user expects if they intend to pass multiple distinct strings. For this specific utility, it's acceptable.

🧩 Docs/DX

  • Clear CLI usage: The main.rs provides a helpful error message when no arguments are provided, guiding the user on correct usage.
  • Output clarity: The output format Entropy: {:.6} bits per character is clear and precise.
  • Consistency in naming: The package name nightly-entropy-analyzer is consistent across Cargo.toml and the README.md. However, the directory path rust-utils/nightly-nightly-entropy-analyzer-3 has a redundant "nightly". Standardizing the directory name to match the package name (e.g., rust-utils/nightly-entropy-analyzer-3) would improve consistency and discoverability.
  • Argument parsing clarity: The main.rs currently joins all arguments with a space. While functional, for a tool that expects a single string input, using a dedicated argument parsing library like clap could provide a more robust and user-friendly CLI experience, including better help messages, argument validation, and explicit handling of single vs. multiple string inputs. For example, cargo run -- "my password" vs cargo run -- my password. The current implementation treats both as "my password".
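The review suggests clap for stricter argument handling. To keep the sketch std-only, it swaps clap for a slice pattern that accepts exactly one argument, so `cargo run -- my password` is rejected with a hint to quote the input. This is an assumed alternative, not the current behavior (which joins arguments with spaces). parse_single_input is a hypothetical name.

```rust
/// Hypothetical strict parsing: exactly one argument after the program name.
pub fn parse_single_input(args: &[String]) -> Result<&str, String> {
    match args {
        [only] => Ok(only.as_str()),
        [] => Err("missing input; usage: nightly-entropy-analyzer <string>".to_string()),
        // Several unquoted words arrive as separate arguments; ask for quoting
        // instead of silently joining them.
        _ => Err(format!(
            "expected 1 argument but got {}; quote the input, e.g. \"my password\"",
            args.len()
        )),
    }
}
```

With clap, the same rule would be expressed declaratively, and clap would also generate the help text.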

🧱 Mocks/Fakes

  • Not applicable: As stated in the PR body, the generator did not introduce new mocks. Given the pure function nature of compute_entropy and its minimal dependencies, mocks are not necessary for testing this utility. The existing unit tests directly call the compute_entropy function with various inputs, which is the most straightforward and effective testing strategy here.

Owner Author

polsala commented Mar 23, 2026

🤖 Review by OPENROUTER Agent

ApocalypsAI Review: nightly-entropy-analyzer

✅ What's solid

  • Clean architecture: Core entropy computation is properly separated into lib.rs with a thin CLI wrapper in main.rs, enabling reusability beyond the command-line interface.
  • Mathematical correctness: The Shannon entropy implementation follows the standard formula: H(X) = -Σ p(x) * log2(p(x)). The probability calculations and logarithmic operations appear accurate.
  • Good error handling for CLI: The main function correctly exits with status code 1 and provides a clear usage message when no arguments are provided.
  • Precision in output: Entropy values are displayed with six decimal places, offering sufficient precision for practical use cases like password strength estimation.
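As a worked check of the formula, here it is applied to the known-value case "aaab" from the existing tests, where p(a) = 3/4 and p(b) = 1/4:

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x)

H(\texttt{aaab}) = -\left(\tfrac{3}{4}\log_2\tfrac{3}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right)
                 = \tfrac{3}{4}\log_2\tfrac{4}{3} + \tfrac{1}{4}\cdot 2
                 \approx 0.311278 + 0.5 = 0.811278 \ \text{bits/char}
```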

🧪 Tests

  • Basic coverage present: Includes tests for empty strings, single-character inputs, and a known-value case ("aaab"). These cover fundamental edge cases and validate computational accuracy.
  • Floating-point comparison handled: Uses epsilon-based assertion (assert!((entropy - expected).abs() < 1e-12)) for floating-point equality checks, which is the correct approach.

Actionable Feedback:

  • Expand test cases: Add more diverse test scenarios including:

    • High-entropy strings (e.g., random alphanumeric)
    • Unicode characters to verify multibyte support
    • Strings with all unique characters
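    // Assumes the library crate is named nightly_entropy_analyzer (from the package name).
    use nightly_entropy_analyzer::compute_entropy;

    /// 26 distinct characters, each occurring once: H = log2(26) ≈ 4.70 bits/char.
    #[test]
    fn test_high_entropy_string() {
        let s = "abcdefghijklmnopqrstuvwxyz";
        let entropy = compute_entropy(s);
        assert!((entropy - 26f64.log2()).abs() < 1e-9);
    }

    /// Four distinct multi-byte emoji, each occurring once: character-based counting
    /// gives log2(4) = 2.0; a byte-length denominator would give a lower value.
    #[test]
    fn test_unicode_characters() {
        let s = "\u{1F680}\u{1F525}\u{1F4A7}\u{2728}"; // 🚀🔥💧✨
        let entropy = compute_entropy(s);
        assert!((entropy - 2.0).abs() < 1e-9);
    }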
  • Improve test documentation: Add doc comments explaining what each test verifies.

🔒 Security

  • No unsafe code or external deps: The crate uses only safe Rust standard library features, minimizing potential vulnerabilities.
  • Input handling: Basic CLI input parsing via std::env is acceptable for this utility scope but could be enhanced.

Actionable Feedback:

  • Document input assumptions: While not critical here, state the expected input encoding (UTF-8) and the behavior with special characters in the README or inline comments.
  • Clarify threat model: Since this evaluates password strength, consider adding guidance on interpreting entropy scores (e.g., what constitutes weak vs strong).
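One concrete UTF-8 assumption is worth documenting. std::env::args() panics if any process argument is not valid Unicode, whereas std::env::args_os() yields OsString values that may not be UTF-8. The sketch below reports such an argument as an error instead of panicking. args_to_utf8 is a hypothetical helper name.

```rust
use std::ffi::OsString;

/// Converts raw OS arguments to UTF-8 strings, returning an error (instead of
/// panicking, as std::env::args() would) for the first argument that is not valid Unicode.
pub fn args_to_utf8(args: Vec<OsString>) -> Result<Vec<String>, String> {
    args.into_iter()
        .enumerate()
        .map(|(i, arg)| {
            arg.into_string()
                .map_err(|raw| format!("argument {} is not valid UTF-8: {:?}", i + 1, raw))
        })
        .collect()
}
```

In main, this would be called as args_to_utf8(std::env::args_os().skip(1).collect()).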

🧩 Docs/DX

  • Concise README: Provides essential usage, build, and test instructions clearly.
  • Example output shown: Demonstrates expected behavior with "password123" which helps users quickly understand the tool.

Actionable Feedback:

  • Add interpretation guidelines: Include a brief explanation of how to interpret entropy values (bits per character) in terms of password strength.

    ## Interpreting Results
    
    - **< 2.0 bits/char**: Very weak
    - **2.0–3.5 bits/char**: Weak to moderate
    - **3.5–4.5 bits/char**: Strong
    - **> 4.5 bits/char**: Very strong
  • Enhance CLI help: Consider adding a -h/--help flag for better user experience instead of just printing usage to stderr.

  • Document limitations: Mention that this measures Shannon entropy specifically and doesn't account for other factors like dictionary attacks or pattern recognition.
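The interpretation bands in the README snippet above could be mapped in code as follows. Both helper names are hypothetical, and treating the band edges as half-open intervals is an assumption, since the table does not say which side 2.0, 3.5 and 4.5 belong to. The second helper shows a limitation worth documenting. A string of n characters can reach at most log2(n) bits/char, which happens only when every character is distinct.

```rust
/// Hypothetical mapping from bits/char to the proposed README bands.
pub fn strength_band(bits_per_char: f64) -> &'static str {
    if bits_per_char < 2.0 {
        "Very weak"
    } else if bits_per_char < 3.5 {
        "Weak to moderate"
    } else if bits_per_char <= 4.5 {
        "Strong"
    } else {
        "Very strong"
    }
}

/// Upper bound on per-character Shannon entropy for a string of n characters.
pub fn max_bits_per_char(n_chars: usize) -> f64 {
    if n_chars == 0 {
        0.0
    } else {
        (n_chars as f64).log2()
    }
}
```

With these bands, any input of 11 or fewer characters tops out below 3.5 bits/char (log2(11) ≈ 3.46), so it can never be rated "Strong". "Very strong" needs at least 23 characters.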

🧱 Mocks/Fakes

  • Not applicable: As noted in the PR description, there are no mocks or fakes introduced since this is a self-contained mathematical utility with no external service dependencies.

