
feat(gemini): A type-safe TypeScript utility to slice and dice structured data (JSON/arrays of objects) into smaller, filtered, or randomized subsets, perfect for testing and analysis.#3867

Open
polsala wants to merge 1 commit into main from ai/gemini-20260321-0506


@polsala
Owner

@polsala polsala commented Mar 21, 2026

Implementation Summary

  • Utility: nightly-data-dicer
  • Provider: gemini
  • Location: typescript-utils/nightly-nightly-data-dicer
  • Files Created: 7
  • Description: A type-safe TypeScript utility to slice and dice structured data (JSON/arrays of objects) into smaller, filtered, or randomized subsets, perfect for testing and analysis.

Rationale

  • Automated proposal from the Gemini generator delivering a fresh community utility.
  • This utility was generated using the gemini AI provider.

Why safe to merge

  • Utility is isolated to typescript-utils/nightly-nightly-data-dicer.
  • README + tests ship together (see folder contents).
  • No secrets or credentials touched.
  • All changes are additive and self-contained.

Test Plan

  • Follow the instructions in the generated README at typescript-utils/nightly-nightly-data-dicer/README.md
  • Run tests located in typescript-utils/nightly-nightly-data-dicer/tests/

Links

  • Generated docs and examples committed alongside this change.

Mock Justification

  • Not applicable; generator did not introduce new mocks.

@polsala
Owner Author

polsala commented Mar 21, 2026

🤖 Review by GROQ Agent

✅ What's solid

  • Feature set & API – The utility offers a clean, chainable API (filter, sample, pick, omit, sort, execute) that matches the description in the README.
  • Type safety – The project is set up with "strict": true in tsconfig.json and the public API is exported from src/index.ts, which should give consumers compile‑time guarantees.
  • CLI entry point – The bin field in package.json correctly points to dist/cli.js, enabling global installation.
  • Test coverage – The test suite exercises most public methods, including edge‑cases (empty data, over‑sampling, non‑existent keys, chained operations).
  • Deterministic sampling – Tests rely on a seed argument, which is a good design for reproducible data generation.

🧪 Tests

  • Test placement – src/cli.ts contains a large block of test code (the it(...) blocks). Tests belong only in tests/: move the entire block to tests/index.test.ts and keep cli.ts focused on argument parsing and execution.
  • Seeded PRNG – Verify that DataDicer.sample(count, seed?) uses a deterministic PRNG (e.g., a simple xorshift or seedrandom). If it falls back to Math.random when a seed is supplied, the “deterministic” test will be flaky. Add a unit test that explicitly checks that two instances created with the same seed produce identical sequences.
  • Boundary conditions – Add tests for sample(0) and sample(-1) to ensure the method validates its arguments and either returns an empty array or throws a clear error.
  • Error handling – There are currently no tests for invalid CLI arguments (e.g., --filter without =). Add a test that runs the CLI with malformed flags and asserts that the process exits with a non‑zero code and prints a helpful message.
  • Performance – For very large JSON payloads, the current implementation may load the entire file into memory. Consider a test that feeds a large dataset (e.g., 10k items) and asserts that the utility completes within a reasonable time (e.g., < 2 s).
  • Type assertions – The “pick non‑existent keys” test casts ['name', 'nonExistentKey'] as (keyof DataItem)[], which bypasses TypeScript’s type checking. Instead, expose a generic signature such as pick<K extends keyof T>(keys: K[]) and let the compiler infer the return type (Pick<T, K>). Unknown keys then become a compile-time error, which a test can assert with a // @ts-expect-error comment instead of a cast.
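A minimal sketch of the generic signature suggested for the “pick non‑existent keys” test. It is written as a hypothetical free function, pickKeys, rather than the actual DataDicer method, and assumes the data is an array of plain objects. K is inferred from the literal keys, so the result is typed Pick<T, K>[], and the unknown-key case is checked with // @ts-expect-error instead of a cast.

```typescript
// Hypothetical free-function version of DataDicer.pick, for illustration only.
// K is inferred from the literal keys passed in, so the return type is Pick<T, K>[].
export function pickKeys<T extends object, K extends keyof T>(
  items: readonly T[],
  keys: readonly K[],
): Pick<T, K>[] {
  return items.map(
    (item) =>
      // Object.fromEntries defines own properties, and only whitelisted keys are copied.
      Object.fromEntries(
        keys.filter((key) => key in item).map((key) => [key, item[key]]),
      ) as Pick<T, K>,
  );
}

interface DataItem {
  id: number;
  name: string;
  age: number;
}

const people: DataItem[] = [
  { id: 1, name: "Ada", age: 36 },
  { id: 2, name: "Linus", age: 28 },
];

// Inferred as Pick<DataItem, "name">[] without any cast.
const names = pickKeys(people, ["name"]);

// @ts-expect-error "nonExistentKey" is not a key of DataItem, so the compiler rejects it.
pickKeys(people, ["name", "nonExistentKey"]);
```

Because the unknown key is rejected at compile time, the test no longer needs to check runtime behavior for keys the type system already forbids.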

Example: deterministic PRNG stub

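// src/utils/seededRandom.ts
// Mulberry32: a small 32-bit PRNG. The same seed always yields the same
// sequence of floats in [0, 1), for every seed including 0.
export function createRNG(seed: number): () => number {
  let state = seed >>> 0; // coerce to an unsigned 32-bit integer
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // scale to [0, 1)
  };
}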

Use this in DataDicer.sample when a seed is supplied.
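One way DataDicer.sample could consume such a generator, sketched as a hypothetical helper, sampleItems, that takes the RNG as a parameter and runs a partial Fisher–Yates shuffle over a copy of the input. It also settles the boundary cases from the test feedback in one possible way (sample(0) returns an empty array, a negative or non-integer count throws a RangeError, over-sampling returns every item once); the real class may choose differently.

```typescript
// Hypothetical helper showing how DataDicer.sample could use a seeded RNG.
// `random` must return floats in [0, 1); pass createRNG(seed) for reproducible output.
export function sampleItems<T>(
  items: readonly T[],
  count: number,
  random: () => number,
): T[] {
  if (!Number.isInteger(count) || count < 0) {
    throw new RangeError(`sample() expects a non-negative integer, got ${count}`);
  }
  const pool = items.slice(); // copy so the caller's array is never mutated
  const n = Math.min(count, pool.length); // over-sampling returns every item once
  // Partial Fisher–Yates: only the first n positions need to be shuffled.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```

Passing createRNG(123) as random makes two calls return the same sample, while a constant stub such as () => 0 (which keeps the first n items) makes unit tests fully predictable.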

🔒 Security

  • JSON parsing – JSON.parse on untrusted input can throw on malformed data. Wrap parsing in a try/catch and surface a user‑friendly error (Invalid JSON input).
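  • Prototype pollution – When picking/omitting keys, avoid copying properties with bracket assignment (result[key] = item[key]): if key is "__proto__", that assignment can replace the result object’s prototype instead of creating a property. Build the result with Object.fromEntries over an explicit whitelist of allowed keys, which always defines own properties.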
  • File handling – The CLI reads a file path via --file. Validate that the path is not a directory and that the file size is within a sane limit (e.g., < 100 MB) before loading it into memory.
  • Shebang line – The shebang currently reads #!/usr/bin/e. It should be #!/usr/bin/env node; as written, the installed binary will either fail to launch or, if something named /usr/bin/e exists, run an unexpected interpreter.
  • Dependency hygiene – All dev dependencies are pinned to recent versions, but double‑check that no transitive dependencies have known CVEs (npm audit).
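A sketch of the JSON-parsing and input-size points, assuming the CLI has already read its input (file or stdin) into a string. parseJsonInput and MAX_INPUT_CHARS are hypothetical names; the limit counts characters, so checking the file size before reading is still preferable for --file.

```typescript
// Hypothetical CLI helper: parse untrusted JSON and turn failures into readable errors.
// MAX_INPUT_CHARS is an assumed limit; string length counts UTF-16 code units, not bytes.
const MAX_INPUT_CHARS = 100 * 1024 * 1024;

export function parseJsonInput(raw: string): Record<string, unknown>[] {
  if (raw.length > MAX_INPUT_CHARS) {
    throw new Error(`Input too large: ${raw.length} characters (limit ${MAX_INPUT_CHARS})`);
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid JSON input: ${detail}`);
  }
  // The dicer works on arrays of objects, so reject any other top-level shape early.
  if (
    !Array.isArray(parsed) ||
    !parsed.every((item) => typeof item === "object" && item !== null && !Array.isArray(item))
  ) {
    throw new Error("Invalid JSON input: expected an array of objects");
  }
  return parsed as Record<string, unknown>[];
}
```

The CLI can catch these errors at the top level, print the message and exit with a non-zero code, which also gives the malformed-input tests a stable message to assert on.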

🧩 Docs / Developer Experience

  • README improvements
    • Add a Version Compatibility badge (Node ≥ 18, TypeScript ≥ 5).
    • Clarify the shape of the generic DataItem type and show a minimal example: type DataItem = Record<string, unknown>;.
    • Document error messages for each CLI flag (e.g., “--filter expects key=value”).
    • Provide a quick “Run the tests” snippet (npm test --silent).
  • CLI usage – The README shows nightly-data-dicer --filter 'age=30'. Explain that the filter currently only supports strict equality and that more complex predicates require the programmatic API.
  • Exported types – Export DataItem and the DataDicer class from the package’s entry point (src/index.ts) and list them in the README’s “Programmatic Usage” section.
  • Contribution guide – Since the utility is generated by Gemini, a short note on how to regenerate or extend it would help future contributors.
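A sketch of how --filter 'age=30' could be parsed and validated with the error message suggested above. parseFilterFlag and matchesFilter are hypothetical; comparing String(item[key]) with the flag value is an assumption about how strict equality should treat numbers that arrive from the shell as strings.

```typescript
// Hypothetical parser for the documented `--filter key=value` flag.
export function parseFilterFlag(arg: string): { key: string; value: string } {
  const eq = arg.indexOf("=");
  if (eq <= 0) {
    // Covers a missing "=" (eq === -1) and an empty key such as "=30" (eq === 0).
    throw new Error(`--filter expects key=value, got "${arg}"`);
  }
  // Split on the first "=" only, so values may themselves contain "=".
  return { key: arg.slice(0, eq), value: arg.slice(eq + 1) };
}

// Strict equality on the string form: CLI values are always strings, so the
// number 30 in the data and "30" from the shell only match after String().
export function matchesFilter(
  item: Record<string, unknown>,
  filter: { key: string; value: string },
): boolean {
  return (
    Object.prototype.hasOwnProperty.call(item, filter.key) &&
    String(item[filter.key]) === filter.value
  );
}
```

Anything richer than this equality check (ranges, regexes, nested keys) is a natural boundary for pointing users to the programmatic filter(predicate) API instead.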

🧱 Mocks / Fakes

  • Not applicable – The current test suite does not rely on external services, so no mocks are needed.
  • Future considerations – If you later add a feature that fetches remote JSON (e.g., --url flag), introduce a mock HTTP server (e.g., nock) in the test suite and document the mock usage.

Quick win checklist

  • Fix the shebang line in src/cli.ts (#!/usr/bin/env node).
  • Remove the test block from cli.ts and ensure only CLI logic lives there.
  • Confirm DataDicer.sample uses a seeded PRNG; add a unit test if missing.
  • Add validation/error handling for CLI arguments and JSON parsing.
  • Update the README with version badges, error‑message docs, and a contribution note.

These adjustments will tighten the utility’s reliability, security posture, and developer ergonomics while preserving the solid foundation you’ve already built. 🚀

@polsala
Owner Author

polsala commented Mar 25, 2026

🤖 Review by OPENROUTER Agent

Code Review: nightly-data-dicer

✅ What's solid

  • Well-structured implementation – The core DataDicer class provides a clean, chainable API with methods like filter, sample, pick, omit, and sort. The design follows functional principles while maintaining state immutability through intermediate copies.
  • Strong type safety – Leveraging TypeScript generics (DataItem), the utility enforces compile-time checks on data transformations. Operations like pick and omit preserve type information, enhancing developer experience.
  • Deterministic sampling – The custom seeded PRNG implementation ensures reproducible results when using .sample(count, seed), which is critical for testing scenarios.
  • Comprehensive CLI support – The command-line interface supports common operations via flags (--filter, --sample, etc.) and handles both file input and stdin, making it versatile for automation pipelines.
  • Clear documentation – The README includes installation steps, usage examples (both programmatic and CLI), and detailed option descriptions, lowering the barrier to adoption.

🧪 Tests

  • Good coverage of core functionality – Tests validate each method individually and in combination, including edge cases like empty inputs, non-existent keys, and oversized samples.
  • Deterministic sampling verification – The test suite properly validates that seeded randomization produces consistent outputs across runs:
const dicer1 = new DataDicer(sampleData);
const result1 = dicer1.sample(2, 123).execute();

const dicer2 = new DataDicer(sampleData);
const result2 = dicer2.sample(2, 123).execute();

expect(result1).toEqual(result2); // Passes due to shared seed

Actionable Feedback:

  • Add explicit tests for CLI argument parsing logic (e.g., validating behavior when conflicting options like --sort and --sort-desc are used together).
  • Consider adding performance benchmarks or stress tests for large dataset processing to ensure scalability.
  • Expand test coverage for error conditions such as invalid JSON input via CLI or unsupported sort keys.
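A sketch for the conflicting-options case, assuming the CLI parser collects --sort and --sort-desc into optional string fields. SortFlags and resolveSort are hypothetical names; rejecting the combination, rather than silently letting one flag win, is a design choice that makes the behavior easy to assert in a test.

```typescript
// Hypothetical shape of the parsed CLI flags; the real parser in src/cli.ts may differ.
interface SortFlags {
  sort?: string;
  sortDesc?: string;
}

// Resolve --sort / --sort-desc into one instruction, rejecting the ambiguous combination.
export function resolveSort(
  flags: SortFlags,
): { key: string; direction: "asc" | "desc" } | undefined {
  if (flags.sort !== undefined && flags.sortDesc !== undefined) {
    throw new Error("--sort and --sort-desc cannot be used together");
  }
  if (flags.sort !== undefined) return { key: flags.sort, direction: "asc" };
  if (flags.sortDesc !== undefined) return { key: flags.sortDesc, direction: "desc" };
  return undefined; // no sorting requested
}
```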

🔒 Security

  • No hardcoded secrets – The package does not include any credentials or sensitive configuration values.
  • Safe handling of external input – CLI input parsing avoids unsafe evaluations (e.g., no use of eval). File reading and JSON parsing follow standard practices.

Actionable Feedback:

  • Validate and sanitize CLI-provided file paths to prevent directory traversal vulnerabilities (e.g., reject ../ patterns unless explicitly allowed):
# Potential risk if not sanitized:
nightly-data-dicer --file ../../etc/passwd
  • Implement size limits for input data to mitigate resource exhaustion attacks during CLI usage.
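A minimal guard matching the suggestion to reject ../ segments unless explicitly allowed; assertNoTraversal is a hypothetical name. It does not stop absolute paths such as /etc/passwd, so resolving the path with Node's path.resolve and checking that the result stays under an allowed base directory is the more complete version.

```typescript
// Hypothetical guard: reject any path containing a ".." segment unless the caller
// opts in with allowParent. Splits on both "/" and "\" so Windows paths are covered.
export function assertNoTraversal(filePath: string, allowParent = false): void {
  const segments = filePath.split(/[\\/]/);
  if (!allowParent && segments.includes("..")) {
    throw new Error(
      `Refusing to read "${filePath}": parent-directory segments are not allowed`,
    );
  }
}
```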

🧩 Docs/DX

  • Excellent inline examples – Both README and source code include illustrative snippets showing real-world use cases, aiding quick adoption.
  • Clear CLI help output – Running nightly-data-dicer --help provides concise guidance on available commands and syntax.

Actionable Feedback:

  • Enhance JSDoc comments in src/index.ts to describe parameter expectations and return types more precisely. For example:
/**
 * Filters data based on a predicate function.
 * @param predicate Function returning true for items to retain.
 * @returns New DataDicer instance with filtered data.
 */
filter(predicate: (item: T) => boolean): DataDicer<T>
  • Include troubleshooting tips in README (e.g., how to handle malformed JSON errors).
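A stripped-down, hypothetical MiniDicer class illustrating the immutable chaining the JSDoc above describes: filter returns a new instance and leaves the original untouched. It shows only the pattern, not the package's DataDicer.

```typescript
// Hypothetical stand-in for DataDicer: each operation returns a new instance
// and never mutates the data held by the previous one.
export class MiniDicer<T> {
  private readonly data: readonly T[];

  constructor(data: readonly T[]) {
    this.data = data.slice(); // defensive copy so later caller mutations don't leak in
  }

  /**
   * Filters data based on a predicate function.
   * @param predicate Function returning true for items to retain.
   * @returns New MiniDicer instance with filtered data; `this` is left unchanged.
   */
  filter(predicate: (item: T) => boolean): MiniDicer<T> {
    return new MiniDicer(this.data.filter(predicate));
  }

  /** @returns A fresh array copy of the current data. */
  execute(): T[] {
    return this.data.slice();
  }
}
```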

🧱 Mocks/Fakes

  • Self-contained randomness – As noted in test comments, the internal PRNG eliminates reliance on global Math.random(), avoiding side effects and simplifying test setup.
  • No external dependencies requiring mocking – The module operates purely on input data without network calls or file system interactions beyond the CLI entry point.

Actionable Feedback:

  • If extending functionality later (e.g., async file loading or remote data fetching), consider designing interfaces that allow dependency injection for easier unit testing.
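A sketch of that dependency-injection seam, assuming a future async loader. DataSource, InMemorySource and loadAndFilter are hypothetical names; production code would implement DataSource over the file system or fetch, while tests pass the in-memory fake.

```typescript
// Hypothetical seam: dicing logic depends on this interface, not on fs or fetch directly.
export interface DataSource<T> {
  load(): Promise<T[]>;
}

// In-memory fake for unit tests; no network or file-system access.
export class InMemorySource<T> implements DataSource<T> {
  private readonly items: T[];

  constructor(items: T[]) {
    this.items = items;
  }

  async load(): Promise<T[]> {
    return this.items.slice();
  }
}

// Example consumer: loads from whatever source it is given, then applies a filter.
export async function loadAndFilter<T>(
  source: DataSource<T>,
  predicate: (item: T) => boolean,
): Promise<T[]> {
  const items = await source.load();
  return items.filter(predicate);
}
```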
