feat(gemini): A type-safe TypeScript utility to slice and dice structured data (JSON/arrays of objects) into smaller, filtered, or randomized subsets, perfect for testing and analysis. by polsala · Pull Request #3867 · polsala/ApocalypsAI

polsala · 2026-03-21T05:06:26Z

Implementation Summary

Utility: nightly-data-dicer
Provider: gemini
Location: typescript-utils/nightly-nightly-data-dicer
Files Created: 7
Description: A type-safe TypeScript utility to slice and dice structured data (JSON/arrays of objects) into smaller, filtered, or randomized subsets, perfect for testing and analysis.

Rationale

Automated proposal from the Gemini generator delivering a fresh community utility.
This utility was generated using the gemini AI provider.

Why safe to merge

Utility is isolated to typescript-utils/nightly-nightly-data-dicer.
README + tests ship together (see folder contents).
No secrets or credentials touched.
All changes are additive and self-contained.

Test Plan

Follow the instructions in the generated README at typescript-utils/nightly-nightly-data-dicer/README.md
Run tests located in typescript-utils/nightly-nightly-data-dicer/tests/

Links

Generated docs and examples committed alongside this change.

Mock Justification

Not applicable; generator did not introduce new mocks.

polsala · 2026-03-21T05:06:51Z

🤖 Review by GROQ Agent

✅ What's solid

Feature set & API – The utility offers a clean, chainable API (filter, sample, pick, omit, sort, execute) that matches the description in the README.
Type safety – The project is set up with "strict": true in tsconfig.json and the public API is exported from src/index.ts, which should give consumers compile‑time guarantees.
CLI entry point – The bin field in package.json correctly points to dist/cli.js, enabling global installation.
Test coverage – The test suite exercises most public methods, including edge‑cases (empty data, over‑sampling, non‑existent keys, chained operations).
Deterministic sampling – Tests rely on a seed argument, which is a good design for reproducible data generation.

🧪 Tests

Area	Feedback
Test placement	The `src/cli.ts` file contains a large block of test code (the `it(...)` blocks). Tests belong in `tests/` only. Move the entire block to `tests/index.test.ts` and keep `cli.ts` focused on argument parsing and execution.
Seeded PRNG	Verify that `DataDicer.sample(count, seed?)` uses a deterministic PRNG (e.g., a simple xorshift or `seedrandom`). If it falls back to `Math.random` when a seed is supplied, the “deterministic” test will be flaky. Add a unit test that explicitly checks two instances with the same seed produce identical sequences.
Boundary conditions	Add a test for `sample(0)` and `sample(-1)` to ensure the method validates its arguments and returns an empty array or throws a clear error.
Error handling	Currently there are no tests for invalid CLI arguments (e.g., `--filter` without `=`). Include a test that runs the CLI with malformed flags and asserts that the process exits with a non‑zero code and prints a helpful message.
Performance	For very large JSON payloads, the current implementation may load the entire file into memory. Consider adding a test that feeds a large dataset (e.g., 10 k items) and asserts that the utility completes within a reasonable time (e.g., < 2 s).
Type assertions	In the “pick non‑existent keys” test you cast `['name', 'nonExistentKey'] as (keyof DataItem)[]`. This bypasses TypeScript’s safety. Instead, expose a generic overload like `pick<K extends keyof T>(keys: K[])` and let the compiler infer the correct return type. Add a test that ensures the returned type omits the unknown key without needing a cast.
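The typed `pick` suggested in the last row can be sketched as follows. This is a minimal standalone illustration, not the utility's actual code: `pickKeys`, `Person`, and the sample rows are hypothetical names introduced here, and the real `DataDicer.pick` signature may differ.

```typescript
// Hypothetical standalone helper; the real DataDicer.pick may look different.
function pickKeys<T extends object, K extends keyof T>(
  items: readonly T[],
  keys: readonly K[],
): Pick<T, K>[] {
  return items.map((item) => {
    const picked = {} as Pick<T, K>;
    for (const key of keys) {
      // Copy own properties only, so inherited values never leak into the result.
      if (Object.prototype.hasOwnProperty.call(item, key)) {
        picked[key] = item[key];
      }
    }
    return picked;
  });
}

// Illustrative row type and data (assumptions, not taken from the PR).
interface Person {
  name: string;
  age: number;
}

const people: Person[] = [
  { name: "Ada", age: 36 },
  { name: "Linus", age: 54 },
];

// Inferred as Pick<Person, "name">[]; no cast is required.
const namesOnly = pickKeys(people, ["name"]);

// A key that does not exist on Person is rejected at compile time:
// pickKeys(people, ["name", "nonExistentKey"]);
```

With `K` constrained to `keyof T`, the "non-existent key" case becomes a compile error instead of a runtime scenario, so the existing test would need either a deliberately loose input type or a type-level check.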

Example: deterministic PRNG stub

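// src/utils/seededRandom.ts
// Sine-based seeded generator: deterministic for a given seed, which is all
// the tests need. It is not statistically strong and not cryptographically secure.
export function createRNG(seed: number): () => number {
  let state = seed;
  return () => {
    // Advance the state on every call so each draw uses a fresh input.
    const x = Math.sin(state++) * 10000;
    return x - Math.floor(x); // fractional part of x
  };
}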

Use this in DataDicer.sample when a seed is supplied.
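One way the generator could be wired into sampling, together with the argument validation requested under "Boundary conditions", is sketched below. It uses a variant of the sine-based stub and a partial Fisher-Yates shuffle; `makeSeededRng` and `sampleItems` are hypothetical names, and throwing on invalid counts (rather than returning an empty array) is an assumption, not the PR's confirmed behavior.

```typescript
// Variant of the sine-based stub: deterministic per seed, not statistically strong.
function makeSeededRng(seed: number): () => number {
  let state = seed;
  return () => {
    const x = Math.sin(state++) * 10000;
    return x - Math.floor(x);
  };
}

// Hypothetical helper, not the actual DataDicer.sample implementation.
function sampleItems<T>(items: readonly T[], count: number, seed?: number): T[] {
  // Reject negative, fractional, and NaN counts with a clear error.
  if (count < 0 || Math.floor(count) !== count) {
    throw new RangeError("sample count must be a non-negative integer");
  }
  // Fall back to Math.random only when no seed is given.
  const random = seed === undefined ? Math.random : makeSeededRng(seed);
  const pool = items.slice(); // copy so the caller's array is not mutated
  const take = Math.min(count, pool.length); // over-sampling returns everything
  // Partial Fisher-Yates shuffle: settle one output position per iteration.
  for (let i = 0; i < take; i++) {
    const offset = Math.floor(random() * (pool.length - i));
    const j = Math.min(pool.length - 1, i + offset);
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, take);
}
```

Two calls with the same input and seed return the same subset in the same order, which is the property the requested unit test would assert.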

🔒 Security

JSON parsing – JSON.parse on untrusted input can throw on malformed data. Wrap parsing in a try/catch and surface a user‑friendly error (Invalid JSON input).
Prototype pollution – When picking/omitting keys, build each result from an explicit allowlist of the item's own keys (for example with Object.fromEntries) rather than assigning arbitrary input keys such as __proto__ onto an existing object, which is where prototype injection can occur.
File handling – The CLI reads a file path via --file. Validate that the path is not a directory and that the file size is within a sane limit (e.g., < 100 MB) before loading it into memory.
Broken shebang – The shebang line currently reads #!/usr/bin/e. It should be #!/usr/bin/env node; with the truncated path the installed CLI will fail to launch. This is a correctness bug rather than a command-injection risk.
Dependency hygiene – All dev dependencies are pinned to recent versions, but double‑check that no transitive dependencies have known CVEs (npm audit).
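The JSON parsing point could look roughly like this. It is a sketch under assumptions: `parseJsonRows` and `ParseResult` are hypothetical names, and the requirement that input be an array of plain objects is inferred from the PR description rather than from the actual code.

```typescript
// Result type so callers can branch on success without catching exceptions.
type ParseResult =
  | { ok: true; data: Record<string, unknown>[] }
  | { ok: false; error: string };

// Hypothetical helper; not part of the generated utility.
function parseJsonRows(text: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    // JSON.parse throws SyntaxError on malformed input; surface a short message instead.
    return { ok: false, error: "Invalid JSON input" };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, error: "Expected a JSON array of objects" };
  }
  for (const row of parsed) {
    // Reject primitives, null, and nested arrays at the top level.
    if (typeof row !== "object" || row === null || Array.isArray(row)) {
      return { ok: false, error: "Expected a JSON array of objects" };
    }
  }
  return { ok: true, data: parsed as Record<string, unknown>[] };
}
```

The CLI could then print `error` and exit with a non-zero code when `ok` is false, which also gives the error-handling tests something concrete to assert.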

🧩 Docs / Developer Experience

README improvements
- Add a Version Compatibility badge (Node ≥ 18, TypeScript ≥ 5).
- Clarify the shape of the generic DataItem type and show a minimal example: type DataItem = Record<string, unknown>;.
- Document error messages for each CLI flag (e.g., “--filter expects key=value”).
- Provide a quick “Run the tests” snippet (npm test --silent).
CLI usage – The README shows nightly-data-dicer --filter 'age=30'. Explain that the filter currently only supports strict equality and that more complex predicates require the programmatic API.
Exported types – Export DataItem and the DataDicer class from the package’s entry point (src/index.ts) and list them in the README’s “Programmatic Usage” section.
Contribution guide – Since the utility is generated by Gemini, a short note on how to regenerate or extend it would help future contributors.
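To make the `key=value` contract and its error message concrete, here is a minimal sketch of how a `--filter` value might be parsed and applied. `parseFilterFlag` and `applyFilterSpec` are hypothetical names, and comparing against the string form of each field is an assumption about how "strict equality" works for CLI input, since flag values always arrive as text.

```typescript
interface FilterSpec {
  key: string;
  value: string;
}

// Hypothetical parser for the value passed to --filter.
function parseFilterFlag(raw: string): FilterSpec {
  const eq = raw.indexOf("=");
  // eq <= 0 covers both a missing "=" and an empty key such as "=30".
  if (eq <= 0) {
    throw new Error(`--filter expects key=value, got "${raw}"`);
  }
  return { key: raw.slice(0, eq), value: raw.slice(eq + 1) };
}

// Assumption: a row matches when the string form of its own field equals the value.
function applyFilterSpec(
  rows: readonly Record<string, unknown>[],
  spec: FilterSpec,
): Record<string, unknown>[] {
  return rows.filter(
    (row) =>
      Object.prototype.hasOwnProperty.call(row, spec.key) &&
      String(row[spec.key]) === spec.value,
  );
}
```

Splitting on the first `=` only means values that themselves contain `=` are preserved intact.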

🧱 Mocks / Fakes

Not applicable – The current test suite does not rely on external services, so no mocks are needed.
Future considerations – If you later add a feature that fetches remote JSON (e.g., --url flag), introduce a mock HTTP server (e.g., nock) in the test suite and document the mock usage.

Quick win checklist

Fix the shebang line in src/cli.ts (#!/usr/bin/env node).
Remove the test block from cli.ts and ensure only CLI logic lives there.
Confirm DataDicer.sample uses a seeded PRNG; add a unit test if missing.
Add validation/error handling for CLI arguments and JSON parsing.
Update the README with version badges, error‑message docs, and a contribution note.

These adjustments will tighten the utility’s reliability, security posture, and developer ergonomics while preserving the solid foundation you’ve already built. 🚀

polsala · 2026-03-25T05:27:23Z

🤖 Review by OPENROUTER Agent

Code Review: nightly-data-dicer

✅ What's solid

Well-structured implementation – The core DataDicer class provides a clean, chainable API with methods like filter, sample, pick, omit, and sort. The design follows functional principles while maintaining state immutability through intermediate copies.
Strong type safety – Leveraging TypeScript generics (DataItem), the utility enforces compile-time checks on data transformations. Operations like pick and omit preserve type information, enhancing developer experience.
Deterministic sampling – The custom seeded PRNG implementation ensures reproducible results when using .sample(count, seed), which is critical for testing scenarios.
Comprehensive CLI support – The command-line interface supports common operations via flags (--filter, --sample, etc.) and handles both file input and stdin, making it versatile for automation pipelines.
Clear documentation – The README includes installation steps, usage examples (both programmatic and CLI), and detailed option descriptions, lowering the barrier to adoption.

🧪 Tests

Good coverage of core functionality – Tests validate each method individually and in combination, including edge cases like empty inputs, non-existent keys, and oversized samples.
Deterministic sampling verification – The test suite properly validates that seeded randomization produces consistent outputs across runs:

const dicer1 = new DataDicer(sampleData);
const result1 = dicer1.sample(2, 123).execute();

const dicer2 = new DataDicer(sampleData);
const result2 = dicer2.sample(2, 123).execute();

expect(result1).toEqual(result2); // Passes due to shared seed

Actionable Feedback:

Add explicit tests for CLI argument parsing logic (e.g., validating behavior when conflicting options like --sort and --sort-desc are used together).
Consider adding performance benchmarks or stress tests for large dataset processing to ensure scalability.
Expand test coverage for error conditions such as invalid JSON input via CLI or unsupported sort keys.
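For the conflicting-flags case, one testable shape is a small pure function that resolves the sort options before any data is touched. This is a sketch only: `SortFlags`, `SortPlan`, and `resolveSortFlags` are hypothetical, and it assumes both `--sort` and `--sort-desc` take a key name, which the review does not confirm.

```typescript
// Parsed flag values; undefined means the flag was not passed (assumed shape).
interface SortFlags {
  sort?: string;
  sortDesc?: string;
}

interface SortPlan {
  key: string;
  descending: boolean;
}

// Hypothetical resolver: rejects the ambiguous combination up front.
function resolveSortFlags(flags: SortFlags): SortPlan | undefined {
  if (flags.sort !== undefined && flags.sortDesc !== undefined) {
    throw new Error("--sort and --sort-desc cannot be combined");
  }
  if (flags.sortDesc !== undefined) {
    return { key: flags.sortDesc, descending: true };
  }
  if (flags.sort !== undefined) {
    return { key: flags.sort, descending: false };
  }
  return undefined; // no sorting requested
}
```

Because the function has no I/O, a unit test can cover every combination without spawning the CLI process.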

🔒 Security

No hardcoded secrets – The package does not include any credentials or sensitive configuration values.
Safe handling of external input – CLI input parsing avoids unsafe evaluations (e.g., no use of eval). File reading and JSON parsing follow standard practices.

Actionable Feedback:

Validate and sanitize CLI-provided file paths to prevent directory traversal vulnerabilities (e.g., reject ../ patterns unless explicitly allowed):

# Potential risk if not sanitized:
nightly-data-dicer --file ../../etc/passwd

Implement size limits for input data to mitigate resource exhaustion attacks during CLI usage.
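A size limit can be checked before the file is read into memory. The sketch below keeps the check pure so it can be unit-tested with a fake; `InputFileInfo`, `MAX_INPUT_BYTES`, and `checkInputFile` are hypothetical names, and the 100 MB ceiling is simply the example figure from the earlier review, not a value taken from the code.

```typescript
// Structural subset of Node's fs.Stats (isFile() and size), so the result of
// fs.statSync(path) can be passed in directly while tests use a plain fake.
interface InputFileInfo {
  isFile(): boolean;
  size: number;
}

// 100 MB; an assumed default, adjust to whatever the project settles on.
const MAX_INPUT_BYTES = 100 * 1024 * 1024;

// Returns an error message, or null when the file is acceptable.
function checkInputFile(
  info: InputFileInfo,
  maxBytes: number = MAX_INPUT_BYTES,
): string | null {
  if (!info.isFile()) {
    return "Input path is not a regular file";
  }
  if (info.size > maxBytes) {
    return `Input is ${info.size} bytes; the limit is ${maxBytes} bytes`;
  }
  return null;
}
```

Input arriving on stdin has no size known up front, so a streaming byte counter would be needed there; that part is not covered by this sketch.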

🧩 Docs/DX

Excellent inline examples – Both README and source code include illustrative snippets showing real-world use cases, aiding quick adoption.
Clear CLI help output – Running nightly-data-dicer --help provides concise guidance on available commands and syntax.

Actionable Feedback:

Enhance JSDoc comments in src/index.ts to describe parameter expectations and return types more precisely. For example:

/**
 * Filters data based on a predicate function.
 * @param predicate Function returning true for items to retain.
 * @returns New DataDicer instance with filtered data.
 */
filter(predicate: (item: T) => boolean): DataDicer<T>

Include troubleshooting tips in README (e.g., how to handle malformed JSON errors).

🧱 Mocks/Fakes

Self-contained randomness – As noted in test comments, the internal PRNG eliminates reliance on global Math.random(), avoiding side effects and simplifying test setup.
No external dependencies requiring mocking – The module operates purely on input data without network calls or file system interactions beyond the CLI entry point.

Actionable Feedback:

If extending functionality later (e.g., async file loading or remote data fetching), consider designing interfaces that allow dependency injection for easier unit testing.

Commit 30ea9f0: feat(gemini): A type-safe TypeScript utility to slice and dice structured data (JSON/arrays of objects) into smaller, filtered, or randomized subsets, perfect for testing and analysis.

polsala wants to merge 1 commit into main from ai/gemini-20260321-0506