feat: data-parity skill — algorithm guardrails and output style by suryaiyer95 · Pull Request #493 · AltimateAI/altimate-code

suryaiyer95 · 2026-03-27T00:29:44Z

Summary

Two improvements to the data-parity LLM skill based on real-world testing, plus a default-model change:

Algorithm guardrail — joindiff physically cannot see a second table when source_warehouse ≠ target_warehouse. It runs a single FULL OUTER JOIN on one connection, so it always reports 0 differences cross-database. Added a CRITICAL warning to the skill so the LLM always chooses hashdiff or auto for cross-DB comparisons.

Output style — Added explicit instruction to report facts only: counts, changed values, missing rows. No editorializing, no pitching the tool, no "this is exactly why row-level diffing matters" commentary.

Default model — Set anthropic/claude-sonnet-4-6 as the default in opencode.jsonc.
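The algorithm guardrail described above can be sketched as a small selection rule. This is only an illustration of the reasoning, not the skill's actual implementation: `chooseAlgorithm` and its parameter names are hypothetical, and the real skill expresses this rule as prompt instructions rather than code.

```typescript
type Algorithm = "joindiff" | "hashdiff";

// Hypothetical helper (not part of the PR): picks a diff algorithm from the
// two warehouse names passed to the data-diff tool.
function chooseAlgorithm(sourceWarehouse: string, targetWarehouse: string): Algorithm {
  // joindiff runs a single FULL OUTER JOIN on one connection, so it is only
  // meaningful when both tables are reachable from the same warehouse.
  if (sourceWarehouse === targetWarehouse) {
    return "joindiff";
  }
  // Cross-database: joindiff would report 0 differences, so use hashdiff.
  return "hashdiff";
}

console.log(chooseAlgorithm("pg_source", "pg_target")); // cross-DB
console.log(chooseAlgorithm("pg_source", "pg_source")); // same-DB
```

Returning `"auto"` for the cross-database case would also satisfy the guardrail as described; `"hashdiff"` is used here only to keep the sketch explicit.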

Test plan

Ran cross-DB comparison (pg_source vs pg_target) — agent now uses hashdiff automatically
Ran TPC-H migration validation — output is clean fact-reporting, no promotional commentary
Ran SQL query comparison (same-warehouse) — joindiff still used correctly for same-DB

coderabbitai · 2026-03-27T00:29:52Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

suryaiyer95 · 2026-03-27T00:32:58Z

Closing — .opencode/ skill config and model defaults should not live in the open source repo.

- Add DataParity engine integration via native Rust bindings
- Add data-diff tool for LLM agent (profile, joindiff, hashdiff, cascade, auto)
- Add ClickHouse driver support
- Add data-parity skill: profile-first workflow, algorithm selection guide, CRITICAL warning that joindiff cannot run cross-database (always returns 0 diffs), output style rules (facts only, no editorializing)
- Gitignore .altimate-code/ (credentials) and *.node (platform binaries)

Split large tables by a date or numeric column before diffing. Each partition is diffed independently, then results are aggregated.

New params:
- partition_column: column to split on (date or numeric)
- partition_granularity: day | week | month | year (for dates)
- partition_bucket_size: bucket width for numeric columns

New output field:
- partition_results: per-partition breakdown (identical / differ / error)

Dialect-aware SQL: Postgres, Snowflake, BigQuery, ClickHouse, MySQL. Skill updated with partition guidance and examples.
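To make the numeric-bucket case concrete, here is a minimal sketch of how a bucket predicate and a per-partition tally might look. All function names are hypothetical, and the half-open range is an assumption; the commit does not show the SQL it generates, and dialect-specific identifier quoting is deliberately left out.

```typescript
type PartitionStatus = "identical" | "differ" | "error";

// Hypothetical: start of the bucket that `value` falls into.
function bucketStartFor(value: number, bucketSize: number): number {
  return Math.floor(value / bucketSize) * bucketSize;
}

// Hypothetical: WHERE predicate selecting one numeric bucket.
// Uses a half-open range [start, start + size) so buckets never overlap.
function numericBucketPredicate(column: string, bucketStart: number, bucketSize: number): string {
  if (!(bucketSize > 0)) {
    throw new Error("bucketSize must be positive");
  }
  const bucketEnd = bucketStart + bucketSize;
  return `${column} >= ${bucketStart} AND ${column} < ${bucketEnd}`;
}

// Hypothetical: tally per-partition statuses, mirroring the
// identical / differ / error breakdown named in partition_results.
function summarizePartitions(statuses: PartitionStatus[]): Record<PartitionStatus, number> {
  const counts: Record<PartitionStatus, number> = { identical: 0, differ: 0, error: 0 };
  for (const status of statuses) {
    counts[status] += 1;
  }
  return counts;
}

const start = bucketStartFor(1234, 1000);
console.log(numericBucketPredicate("order_id", start, 1000));
console.log(summarizePartitions(["identical", "differ", "identical", "error"]));
```

Each predicate would be appended to the per-partition query on both sides of the comparison, so the two sides are always filtered identically.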

When partition_column is set without partition_granularity or partition_bucket_size, groups by raw DISTINCT values. Works for any non-date, non-numeric column: status, region, country, etc. WHERE clause uses equality: col = 'value' with proper escaping.
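The equality predicate with escaping can be sketched as follows. This assumes standard SQL quote doubling only; the function names are hypothetical, and the commit's actual escaping may differ per dialect (some dialects also treat backslash as an escape character, which this sketch does not handle).

```typescript
// Hypothetical: quote a string as a SQL literal by doubling single quotes.
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical: WHERE predicate selecting one categorical partition.
function categoricalPredicate(column: string, value: string): string {
  return `${column} = ${quoteLiteral(value)}`;
}

console.log(categoricalPredicate("status", "shipped"));
console.log(categoricalPredicate("customer", "O'Brien"));
```

A NULL partition value cannot be matched with `=`, so a real implementation would need an `IS NULL` branch; whether the commit handles that case is not stated.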

Rust serializes ReladiffOutcome with serde tag 'mode', producing:

{mode: 'diff', diff_rows: [...], stats: {rows_table1, rows_table2, exclusive_table1, exclusive_table2, updated, unchanged}}

Previous code checked for {Match: {...}} / {Diff: {...}} shapes that never matched, causing partitioned diff to report all partitions as 'identical' with 0 rows.

- extractStats(): check outcome.mode === 'diff', read from stats fields
- mergeOutcomes(): aggregate mode-based outcomes correctly
- summarize()/formatOutcome(): display mode-based shape with correct labels
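The shape mismatch described in this commit can be illustrated with a small type guard. This is a sketch under assumptions: the field names come from the commit message, but the numeric field types, the `extractStatsSketch` name, and the rule used in `hasDifferences` are hypothetical and not taken from the actual code.

```typescript
// Field names from the commit message; numeric types are an assumption.
interface DiffStats {
  rows_table1: number;
  rows_table2: number;
  exclusive_table1: number;
  exclusive_table2: number;
  updated: number;
  unchanged: number;
}

interface DiffOutcome {
  mode: "diff";
  diff_rows: unknown[];
  stats: DiffStats;
}

// Runtime check for the internally tagged shape: {mode: 'diff', stats: {...}}.
function isDiffOutcome(outcome: unknown): outcome is DiffOutcome {
  if (typeof outcome !== "object" || outcome === null) return false;
  const candidate = outcome as { mode?: unknown; stats?: unknown };
  return candidate.mode === "diff" && typeof candidate.stats === "object" && candidate.stats !== null;
}

// Hypothetical stand-in for extractStats(): returns stats for a 'diff'
// outcome, or null for anything else (including the old {Diff: {...}} shape).
function extractStatsSketch(outcome: unknown): DiffStats | null {
  return isDiffOutcome(outcome) ? outcome.stats : null;
}

// Assumed rule: a partition differs if any row is exclusive to one side or updated.
function hasDifferences(stats: DiffStats): boolean {
  return stats.exclusive_table1 + stats.exclusive_table2 + stats.updated > 0;
}

const tagged = {
  mode: "diff",
  diff_rows: [],
  stats: { rows_table1: 10, rows_table2: 11, exclusive_table1: 0, exclusive_table2: 1, updated: 2, unchanged: 8 },
};
console.log(extractStatsSketch(tagged));
console.log(extractStatsSketch({ Diff: { stats: {} } })); // old shape is not recognized
```

Code that only looks for a `Diff` or `Match` key never finds stats in the tagged shape, which is consistent with the symptom described in the commit: every partition reported as 'identical' with 0 rows.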

github-actions bot added the contributor label Mar 27, 2026

suryaiyer95 closed this Mar 27, 2026

suryaiyer95 reopened this Mar 27, 2026

suryaiyer95 force-pushed the feat/data-parity-skill-improvements branch from 2bc4608 to 0f8c7ac Compare March 27, 2026 00:39

suryaiyer95 force-pushed the feat/data-parity-skill-improvements branch from 0f8c7ac to 7909e55 Compare March 27, 2026 00:41

suryaiyer95 added 3 commits March 26, 2026 18:21

suryaiyer95 wants to merge 4 commits into main from feat/data-parity-skill-improvements
