Add SQL DB ingestion to elt-common #339

Open
WHTaylor wants to merge 2 commits into main from 321-sql-db-source
@WHTaylor WHTaylor commented Jun 9, 2026

ref #321

Adds the SqlDatabaseExtract class, which ingest scripts can extend to ingest data from SQL databases. This is again based on the work from elt-command-without-dlt, but diverges from it significantly more than the other PRs for this issue. The main differences are:

  • Pulling the fields defining 'write properties' on ResourceProperties into a separate ResourceWriteProperties class to make it easier for users to specify them (see the usage example in sqldatabase)
  • The main resource_properties function on the extract class is a Generator[tuple[str, ResourceProperties]] rather than returning a dict[str, ResourceProperties] as was originally planned. This is so that the SQL extractor can hold a single active connection for the entire ingest process.

These will require some minor changes to runner.py, once that gets implemented.
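The generator-based design can be illustrated with a minimal sketch. It is not the code from this PR: the fields on ResourceWriteProperties and ResourceProperties are hypothetical, the free function resource_properties stands in for the method on the extract class, and the standard-library sqlite3 module is swapped in for the SQLAlchemy engine so the example stays self-contained.

```python
import sqlite3
from collections.abc import Callable, Iterator
from dataclasses import dataclass, field
from typing import Literal

WriteMode = Literal["append", "merge", "replace"]


@dataclass(frozen=True)
class ResourceWriteProperties:
    # Hypothetical fields; the real class in extract.py may differ.
    write_mode: WriteMode = "replace"
    merge_keys: tuple[str, ...] = ()


@dataclass(frozen=True)
class ResourceProperties:
    # Callable that produces the rows for one resource when invoked.
    extract: Callable[[], list[tuple]]
    write: ResourceWriteProperties = field(default_factory=ResourceWriteProperties)


def resource_properties(
    db_path: str, tables: dict[str, ResourceWriteProperties]
) -> Iterator[tuple[str, ResourceProperties]]:
    """Yield (name, properties) pairs while a single connection stays open."""
    conn = sqlite3.connect(db_path)
    try:
        for name, write in tables.items():
            # Bind `name` as a default argument so each callable reads its own table.
            def extract(table: str = name) -> list[tuple]:
                return conn.execute(f'SELECT * FROM "{table}"').fetchall()

            yield name, ResourceProperties(extract=extract, write=write)
    finally:
        # Runs once the consumer exhausts (or closes) the generator.
        conn.close()
```

In this sketch the consumer has to call each extract callable before the generator finishes, because the connection is closed as soon as the generator is exhausted. A function returning a dict would have to either close the connection before any callable runs or leave closing it to the caller.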

Some open questions/follow up work:

  • Should it be valid to append without watermarking? Seems like it'd be a bad idea
  • Is there a better name than resource_properties for the main method of Extract classes?
  • Some of SqlDatabaseSourceConfig may need to change for real usage, and E2E tests need to be added once the whole pipeline is in place
  • WriteMode is currently duplicated between extract on this branch and typing in Add non-dlt iceberg writer #337. Once both PRs are merged I'll combine the definitions into one

Summary by CodeRabbit

Release Notes

New Features

  • Added SQL database extraction source with support for multiple table ingestion, configurable write modes (append, merge, replace), watermark-based incremental extraction, and custom merge and sort configurations.

Chores

  • Updated core package dependencies (Pydantic and Pydantic Settings).

Tests

  • Added comprehensive unit tests for SQL database extraction functionality.

@WHTaylor WHTaylor requested a review from a team as a code owner June 9, 2026 15:16
coderabbitai Bot commented Jun 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 376fabb7-455f-49d2-aa1d-863153a96fd1

📥 Commits

Reviewing files that changed from the base of the PR and between b93399d and 3be141b.

⛔ Files ignored due to path filters (1)
  • elt-common/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • elt-common/pyproject.toml
  • elt-common/src/elt_common/extract.py
  • elt-common/src/elt_common/sources/__init__.py
  • elt-common/src/elt_common/sources/sqldatabase/__init__.py
  • elt-common/tests/unit_tests/sources/__init__.py
  • elt-common/tests/unit_tests/sources/test_sqldatabase.py

📝 Walkthrough

This PR introduces a SQL database extraction framework to the ELT common library. It relocates Pydantic dependencies to the main dependency set, defines shared extraction type contracts (write modes, watermarks, resource properties), implements a Pydantic-backed SQL config, and provides an abstract base class that handles SQLAlchemy engine creation, chunked table reading with watermark filtering, and PyArrow table conversion.

Changes

SQL Database Extraction Framework

| Layer / File(s) | Summary |
|---|---|
| Pydantic dependencies moved to main (elt-common/pyproject.toml) | Promotes pydantic>=2.12.5 and pydantic-settings>=2.14.1 from optional/dev dependency groups to main runtime dependencies to support the new extraction module. |
| Extraction configuration contracts (elt-common/src/elt_common/extract.py) | Defines WriteMode literal union, Watermark dataclass for optional column-based watermarking, ResourceWriteProperties with merge/partition/sort configuration and validation, and ResourceProperties pairing an extractor callable with write metadata. |
| SQL database extraction source (elt-common/src/elt_common/sources/sqldatabase/__init__.py) | SqlDatabaseSourceConfig provides Pydantic settings for connection and chunking; TableInfo captures per-table overrides; SqlDatabaseExtract abstract base class initializes SQLAlchemy, exposes chunk size, requires table_info() implementation, and provides resource_properties() generator that reflects tables, applies watermark filters, executes with yield_per chunking, and yields PyArrow tables. |
| SQL extraction validation (elt-common/tests/unit_tests/sources/test_sqldatabase.py) | Tests verify empty resources, chunked reading across multiple chunk sizes, multi-table extraction with value equality, write properties pass-through, and watermark filtering on integer and string columns. |
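The watermark filtering and chunked reading summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the Watermark fields and the read_chunks helper are hypothetical, a strict greater-than comparison is assumed, sqlite3 with fetchmany stands in for SQLAlchemy's yield_per, and chunks are plain lists of row tuples rather than PyArrow tables.

```python
import sqlite3
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Watermark:
    # Hypothetical shape: the column to filter on and the last value already ingested.
    column: str
    last_value: Any = None


def read_chunks(
    conn: sqlite3.Connection,
    table: str,
    chunk_size: int,
    watermark: Optional[Watermark] = None,
) -> Iterator[list[tuple]]:
    """Yield rows from `table` in chunks, skipping rows at or below the watermark."""
    # Identifiers cannot be bound as parameters, so they are quoted inline here;
    # a real implementation would take them from reflected table metadata.
    query = f'SELECT * FROM "{table}"'
    params: tuple = ()
    if watermark is not None and watermark.last_value is not None:
        # Assumption: only rows strictly newer than the stored value are wanted.
        query += f' WHERE "{watermark.column}" > ?'
        params = (watermark.last_value,)

    cursor = conn.execute(query, params)
    # fetchmany returns an empty list once the result set is exhausted.
    while chunk := cursor.fetchmany(chunk_size):
        yield chunk
```

Because the watermark value is bound as a query parameter, the same code path would work for integer and string watermark columns, which are the two cases the unit tests are described as covering.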

Poem

🐇 A Pydantic hop skips through the SQL,
Watermarks guide the arrow's swift swell,
Chunks tumble down like carrots in rows,
From databases deep to where the data flows!
—by CodeRabbit Inc., hopping with joy.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add SQL DB ingestion to elt-common' directly and accurately reflects the primary change: introducing SqlDatabaseExtract and related infrastructure for SQL database ingestion. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

