Add SQL DB ingestion to elt-common #339
Co-authored-by: Martyn Gigg <martyn.gigg@gmail.com>
Walkthrough

This PR introduces a SQL database extraction framework to the ELT common library. It relocates Pydantic dependencies to the main dependency set, defines shared extraction type contracts (write modes, watermarks, resource properties), implements a Pydantic-backed SQL config, and provides an abstract base class that handles SQLAlchemy engine creation, chunked table reading with watermark filtering, and PyArrow table conversion.
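To make "chunked table reading with watermark filtering" concrete, here is a minimal sketch of the general technique. It is not the code from this PR: it uses the standard-library `sqlite3` module in place of SQLAlchemy and PyArrow, and the `read_chunks` function, the `events` table and the `updated_at` column are all hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Any, Iterator, Optional


def read_chunks(
    conn: sqlite3.Connection,
    table: str,
    watermark_column: str,
    last_value: Optional[Any],
    chunk_size: int,
) -> Iterator[list]:
    """Yield lists of rows, at most chunk_size rows per list.

    Assumption: table and column names come from trusted config, since
    identifiers cannot be bound as query parameters.
    """
    query = f'SELECT * FROM "{table}"'
    params: tuple = ()
    if last_value is not None:
        # Watermark filter: only rows newer than the last ingested value.
        query += f' WHERE "{watermark_column}" > ?'
        params = (last_value,)
    query += f' ORDER BY "{watermark_column}"'

    cursor = conn.execute(query, params)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


# Demo data: seven rows with updated_at = 10, 20, ..., 70.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(i, i * 10) for i in range(1, 8)]
)

# Incremental read: skip everything at or below the stored watermark of 30.
incremental = list(read_chunks(conn, "events", "updated_at", 30, 2))

# Full read: no watermark stored yet, so every row is returned.
full_sizes = [len(chunk) for chunk in read_chunks(conn, "events", "updated_at", None, 3)]
```

Because the chunks are produced lazily, memory use is bounded by `chunk_size` rather than by table size. In a real extractor each chunk would presumably be converted to a PyArrow table before being handed on.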
ref #321
Adds the `SqlDatabaseExtract` class, which ingest scripts can extend to ingest data from SQL databases. This is again based on the work from `elt-command-without-dlt`, but diverges significantly more than other PRs for this issue. The main differences are:

- […] `ResourceProperties` into a separate `ResourceWriteProperties` class to make it easier for users to specify them (see the usage example in `sqldatabase`)
- The `resource_properties` function on the extract class is a `Generator[tuple[str, ResourceProperties]]` rather than returning a `dict[str, ResourceProperties]` as was originally planned. This is so that the SQL extractor can hold a single active connection for the entire ingest process.

These will require some minor changes to `runner.py`, once that gets implemented.

Some open questions/follow up work:

- […] `resource_properties` for the main method of `Extract` classes?
- `SqlDatabaseSourceConfig` may need to change for real usage, and it needs E2E testing to be added once the whole pipeline is in place
- `WriteMode` is currently being duplicated between `extract` on this branch, and `typing` in Add non-dlt iceberg writer #337. Once both PRs are merged I'll combine the definitions into one
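The generator-versus-dict choice described above can be sketched roughly as follows. This is a simplified illustration, not the actual `SqlDatabaseExtract` implementation: `SketchSqlExtract`, `SketchResourceProperties` and `make_demo_connection` are hypothetical stand-ins, and `sqlite3` is used in place of SQLAlchemy so the example is self-contained.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class SketchResourceProperties:
    """Hypothetical stand-in for ResourceProperties: just a lazy row source."""

    rows: Iterator[tuple]


class SketchSqlExtract:
    """Hypothetical extractor that shares one connection across all tables."""

    def __init__(self, connect: Callable[[], sqlite3.Connection], tables: list):
        self._connect = connect
        self._tables = tables
        self.connections_opened = 0

    def resource_properties(self) -> Iterator[tuple[str, SketchResourceProperties]]:
        # The connection is opened when iteration starts and stays open
        # until the consumer has finished with the last resource.
        conn = self._connect()
        self.connections_opened += 1
        try:
            for table in self._tables:
                cursor = conn.execute(f'SELECT * FROM "{table}"')
                yield table, SketchResourceProperties(rows=cursor)
        finally:
            conn.close()


def make_demo_connection() -> sqlite3.Connection:
    # Demo-only helper that builds a small in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO orders VALUES (?)", [(10,)])
    return conn


extract = SketchSqlExtract(make_demo_connection, ["users", "orders"])
# Each resource's rows are consumed before the generator advances,
# so every read happens while the single connection is still open.
row_counts = {
    name: len(list(props.rows)) for name, props in extract.resource_properties()
}
```

Returning a `dict` would require building every entry before the function returns, so the connection would either have to outlive the call or be reopened for each resource. With a generator, the connection's lifetime is tied to the iteration itself, which is presumably why a runner consuming it needs minor changes.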