
Hi, I'm Tarun 👋

Data Engineer and AI Data Engineer based in Dallas, TX

I build production-grade data systems where AI meets data engineering. My work spans real-time streaming pipelines, RAG systems, agent-orchestrated observability, analytics engineering, and cloud data platforms on AWS.

Currently at Fifth Third Bank, building ELT platforms on AWS and Snowflake that process 2B+ transactions per day.


What I Build

AI Data Engineering: RAG pipelines, LLM agents, tool calling, hallucination detection

Data Engineering: Kafka streaming, ETL/ELT, cloud pipelines, AWS S3 and EC2

Analytics Engineering: dbt Core, dimensional modeling, KPI marts, data contracts

Data Quality: observability pipelines, validation frameworks, health scoring

BI and Reporting: Power BI, DAX, KPI dashboards, executive reporting

Featured Projects

FinDocRAG — Financial Document RAG Pipeline on AWS

Natural language question answering over real SEC 10-K filings from JPMorgan, Goldman Sachs, and Apple. Deployed live on AWS EC2 with source citations and hallucination detection on every answer.

Stack: Python, AWS S3, AWS EC2, ChromaDB, sentence-transformers, Claude API

View Project | Live Demo
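To make "source citations and hallucination detection" concrete, here is a minimal stdlib-only sketch of the idea, not the project's code. It swaps sentence-transformers embeddings and ChromaDB for bag-of-words cosine similarity, and it flags an answer sentence as unsupported when too few of its words appear in the retrieved chunks. The chunk IDs, the `min_overlap` threshold, and all helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a crude stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    # Rank chunks by similarity to the question; the returned IDs double as citations.
    q = Counter(tokenize(question))
    scored = [(cid, cosine(q, Counter(tokenize(text)))) for cid, text in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def unsupported_sentences(answer: str, context: str, min_overlap: float = 0.5) -> list[str]:
    # Naive grounding check (hypothetical threshold): flag answer sentences
    # whose words mostly do not appear in the retrieved context.
    vocab = set(tokenize(context))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = tokenize(sentence)
        if words and sum(w in vocab for w in words) / len(words) < min_overlap:
            flagged.append(sentence)
    return flagged
```

A production system would use dense embeddings and usually an LLM-based or entailment-based grounding check; the word-overlap heuristic here only illustrates where such a check sits in the flow.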


Retail Data Quality and Observability Pipeline

Agent-orchestrated pipeline that uses Claude API tool calling to autonomously profile data, validate it, explain anomalies in plain English, and generate HTML quality reports. Tracks the data health score over time in DuckDB.

Stack: Python, Claude API (Haiku), DuckDB, Great Expectations, matplotlib

View Project
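One plausible definition of a data health score, assumed here rather than taken from the project, is the weighted pass rate of validation checks on a 0 to 100 scale. The sketch below computes that score and appends it to a history table so it can be charted over time. It uses the stdlib `sqlite3` module in place of DuckDB so it runs without extra installs; the check names, weights, and `health_history` table are hypothetical.

```python
import sqlite3
from datetime import date


def health_score(results: dict[str, bool], weights: dict[str, float]) -> float:
    # Weighted share of checks that passed, scaled to 0-100 (assumed definition).
    total = sum(weights[name] for name in results)
    passed = sum(weights[name] for name, ok in results.items() if ok)
    return round(100 * passed / total, 1) if total else 100.0


def record_score(conn: sqlite3.Connection, run_date: date, score: float) -> None:
    # Append one row per pipeline run so the score can be tracked over time.
    conn.execute("CREATE TABLE IF NOT EXISTS health_history (run_date TEXT, score REAL)")
    conn.execute("INSERT INTO health_history VALUES (?, ?)", (run_date.isoformat(), score))
    conn.commit()
```

For example, if checks weighted 3, 2, and 1 run and only the weight-2 check fails, the score is 100 × 4 / 6, which rounds to 66.7.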


RetailStream — Real-Time Kafka Streaming Pipeline

End-to-end streaming pipeline ingesting retail transaction events through Confluent Cloud Kafka, transforming in Python, storing results in SQLite, and visualizing live KPIs on a Streamlit dashboard that auto-refreshes every 5 seconds.

Stack: Python, Confluent Cloud Kafka, SQLite, Streamlit

View Project
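The transform-and-store step can be sketched as follows, assuming each Kafka message value is a JSON object with hypothetical `order_id`, `store`, and `amount` fields. Raw byte strings stand in for messages polled from Confluent Cloud, so the sketch runs without a broker; in a `confluent_kafka` consumer loop the same function would be called on each `msg.value()`. The `store_kpis` table is also hypothetical.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Running per-store totals, one row per store.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store_kpis ("
        "store TEXT PRIMARY KEY, revenue REAL NOT NULL, txn_count INTEGER NOT NULL)"
    )


def process_event(conn: sqlite3.Connection, raw: bytes) -> None:
    # Parse one event and upsert the store's running revenue and transaction count
    # (SQLite UPSERT syntax, available since SQLite 3.24).
    event = json.loads(raw)
    conn.execute(
        "INSERT INTO store_kpis (store, revenue, txn_count) VALUES (?, ?, 1) "
        "ON CONFLICT(store) DO UPDATE SET revenue = revenue + excluded.revenue, "
        "txn_count = txn_count + 1",
        (event["store"], float(event["amount"])),
    )
    conn.commit()
```

Keeping the aggregate in a small keyed table means a dashboard that refreshes every few seconds only has to run a cheap `SELECT` rather than re-scan every event.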


Ecommerce KPI Mart (dbt + DuckDB)

Production analytics engineering pipeline with dbt Core, star schema dimensional modeling, KPI definitions, dbt tests, and documentation. Built on a 100k+ row ecommerce dataset.

Stack: dbt Core, DuckDB, SQL, dimensional modeling

View Project
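As a rough illustration of how a star schema feeds a KPI mart, the query below joins a fact table to a date dimension and computes monthly revenue, order count, and average order value. The table and column names are hypothetical, and `sqlite3` stands in for DuckDB; in a dbt project the `SELECT` would live in its own model file.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to a date dimension.
SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fct_orders (
    order_id INTEGER PRIMARY KEY,
    date_key INTEGER REFERENCES dim_date,
    revenue REAL
);
"""

# KPI model: monthly revenue, order count, and average order value (AOV).
KPI_SQL = """
SELECT d.month,
       SUM(f.revenue) AS revenue,
       COUNT(*) AS orders,
       ROUND(SUM(f.revenue) / COUNT(*), 2) AS aov
FROM fct_orders AS f
JOIN dim_date AS d ON f.date_key = d.date_key
GROUP BY d.month
ORDER BY d.month
"""


def monthly_kpis(conn: sqlite3.Connection) -> list[tuple]:
    # Run the KPI query and return one (month, revenue, orders, aov) row per month.
    return conn.execute(KPI_SQL).fetchall()
```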


AI Transaction Classifier

LLM-powered benchmarking system comparing the Claude API against a rule-based baseline on 2,000 financial transactions. The LLM achieved 95.3% accuracy vs. 58.7% for the baseline, measured with a structured evaluation framework covering per-category precision, recall, F1, confidence scoring, and cost analysis ($0.0003 per transaction).

Stack: Python, Claude API, pandas, scikit-learn, evaluation framework

View Project
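The per-category metrics above follow the standard definitions, and scikit-learn's `classification_report` computes the same numbers. Here is a stdlib-only sketch of how they are derived; the labels in any example are made up. For scale, at $0.0003 per transaction the full 2,000-transaction benchmark costs about 2,000 × $0.0003 = $0.60.

```python
from collections import Counter


def per_category_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, float]]:
    # Count true positives, false positives, and false negatives per category.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this category, but it was wrong
            fn[truth] += 1  # missed the true category
    report = {}
    for cat in sorted(set(y_true) | set(y_pred)):
        precision = tp[cat] / (tp[cat] + fp[cat]) if tp[cat] + fp[cat] else 0.0
        recall = tp[cat] / (tp[cat] + fn[cat]) if tp[cat] + fn[cat] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cat] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    # Share of transactions whose predicted category matches the label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```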


Revenue Leakage and Margin Dashboard

End-to-end margin analysis pipeline identifying revenue leakage across 9,994 transactions using SQL, Python, and DuckDB. Includes a Tableau dashboard with KPI drill-down.

Stack: Python, SQL, DuckDB, Tableau

View Project
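The description does not define "revenue leakage", so the sketch below assumes one common framing: revenue given up through discounts, plus losses on transactions sold below cost, alongside overall gross margin. It illustrates the calculation only and is not how the project's figures were derived; the `Txn` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    list_price: float  # hypothetical: undiscounted sale value
    sales: float       # revenue actually booked after discount
    cost: float        # hypothetical: cost of goods sold


def leakage(txns: list[Txn]) -> dict[str, float]:
    # Discount leakage: revenue given up relative to list price.
    discount = sum(t.list_price - t.sales for t in txns)
    # Loss leakage: amount by which cost exceeds booked revenue on loss-making sales.
    losses = sum(t.cost - t.sales for t in txns if t.cost > t.sales)
    # Gross margin across all transactions.
    margin = sum(t.sales - t.cost for t in txns) / sum(t.sales for t in txns)
    return {"discount_leakage": discount, "loss_leakage": losses, "gross_margin": margin}
```

Splitting leakage into separate components keeps the dashboard drill-down actionable: discount leakage points at pricing policy, while loss leakage points at specific products or orders.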


Tech Stack

Languages: Python, SQL, Scala, Bash, DAX

Data Engineering: Apache Spark, Kafka, Airflow, dbt Core, AWS Glue, Azure Data Factory, Databricks, Fivetran, Delta Lake

Cloud: AWS (S3, EC2, Glue, Redshift, EMR, Lambda), Azure (ADF, Synapse, ADLS), GCP (BigQuery)

Databases: Snowflake, PostgreSQL, DuckDB, SQL Server, Redshift, Hive, MongoDB

AI and LLMs: Claude API, RAG pipelines, LLM agents, tool calling, ChromaDB, sentence-transformers, prompt engineering, LLM evaluation

BI: Power BI, Tableau, Looker, Streamlit, DAX, SSRS

DevOps: Docker, Kubernetes, Terraform, GitHub Actions, CI/CD


Domain Experience

Banking and Financial Services, Retail and eCommerce, Healthcare, Insurance, HR Tech, Supply Chain, Procurement


Connect

Email: buildwithtarun@gmail.com

LinkedIn: linkedin.com/in/tarun-b-k

Location: Dallas, TX

Open to: Data Engineer, AI Data Engineer, Analytics Engineer, Senior Data Engineer roles

Pinned Repositories

  1. financial-doc-rag-aws (Python)

    Production RAG pipeline on AWS EC2 and S3 answering natural language questions against SEC 10-K filings with source citations and hallucination detection. Live demo included.

  2. retail-data-quality-observability (Python)

    Agent-orchestrated data quality pipeline using Claude API tool calling to profile, validate, explain anomalies in plain English, and track health scores over time across 1M+ rows.

  3. retailstream-kafka-pipeline (Python)

    Real-time retail event streaming pipeline with Confluent Cloud Kafka, a Python consumer, SQLite aggregation, and a live Streamlit dashboard showing revenue and transaction KPIs.

  4. ecommerce-kpi-mart (Python)

    Analytics engineering pipeline with dbt Core, star schema dimensional modeling, a KPI mart, dbt tests, and documentation, built on a 100k+ row ecommerce dataset using DuckDB.

  5. transaction-classifier-ai (Python)

    LLM-powered transaction classification system with a structured evaluation framework benchmarking Claude API against a rule-based baseline with confidence scoring and cost tracking.

  6. revenue-margin-analysis (Python)

    End-to-end margin analysis identifying $566k in revenue leakage across 9,994 transactions using Python, DuckDB, SQL, and Tableau.