Tarun Kumar Bosupally Tarun-B-12

Hi, I'm Tarun 👋

Data Engineer and AI Data Engineer based in Dallas, TX

I build production-grade data systems where AI meets data engineering. My work spans real-time streaming pipelines, RAG systems, agent-orchestrated observability, analytics engineering, and cloud data platforms on AWS.

Currently at Fifth Third Bank, building ELT platforms on AWS and Snowflake that process 2B+ transactions daily.

What I Build

| Area | What I Work On |
| --- | --- |
| AI Data Engineering | RAG pipelines, LLM agents, tool calling, hallucination detection |
| Data Engineering | Kafka streaming, ETL/ELT, cloud pipelines, AWS S3 and EC2 |
| Analytics Engineering | dbt Core, dimensional modeling, KPI marts, data contracts |
| Data Quality | Observability pipelines, validation frameworks, health scoring |
| BI and Reporting | Power BI, DAX, KPI dashboards, executive reporting |

Featured Projects

FinDocRAG — Financial Document RAG Pipeline on AWS

Natural language question answering over real SEC 10-K filings from JPMorgan, Goldman Sachs, and Apple. Deployed live on AWS EC2 with source citations and hallucination detection on every answer.

Stack: Python, AWS S3, AWS EC2, ChromaDB, sentence-transformers, Claude API

View Project | Live Demo
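
To illustrate the idea of retrieval with source citations plus a grounding check, here is a minimal sketch. It is not the FinDocRAG code: a bag-of-words vector stands in for sentence-transformers embeddings, an in-memory list stands in for ChromaDB, and the chunk texts, source labels, and the `grounding_ratio` heuristic are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Assumption: a word-count vector stands in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return up to k chunks with non-zero similarity, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def grounding_ratio(answer, retrieved):
    """Share of answer tokens that also appear in the retrieved context.

    A low ratio is a crude signal that the answer may not be supported.
    """
    context = set()
    for c in retrieved:
        context.update(tokenize(c["text"]))
    tokens = tokenize(answer)
    if not tokens:
        return 0.0
    return sum(t in context for t in tokens) / len(tokens)


# Invented example chunks; these are not excerpts from real filings.
CHUNKS = [
    {"source": "filing_a.txt#p1",
     "text": "Net revenue increased due to higher investment banking fees."},
    {"source": "filing_b.txt#p4",
     "text": "The company repurchased shares of common stock during the year."},
    {"source": "filing_c.txt#p2",
     "text": "Operating expenses rose because of technology spending."},
]

hits = retrieve("Why did net revenue increase?", CHUNKS, k=1)
citations = [c["source"] for c in hits]
```

Each retrieved chunk carries its `source` label, so an answer can cite where its context came from. A production system would likely replace the token-overlap check with something stronger, such as an LLM-based or entailment-based verifier.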

Retail Data Quality and Observability Pipeline

Agent-orchestrated pipeline that uses Claude API tool calling to autonomously profile and validate data, explain anomalies in plain English, and generate HTML quality reports. Tracks a data health score over time in DuckDB.

Stack: Python, Claude API (Haiku), DuckDB, Great Expectations, matplotlib

View Project
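
One simple way to turn validation results into a health score is the share of checks that pass. The sketch below assumes that formula. The column names, the three checks, and the sample rows are hypothetical, and plain Python stands in for Great Expectations and DuckDB.

```python
def null_rate(rows, column):
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def run_checks(rows):
    """Run a few illustrative checks and return {check_name: passed}."""
    prices = [r["price"] for r in rows if r.get("price") is not None]
    ids = [r["order_id"] for r in rows if r.get("order_id") is not None]
    return {
        "price_not_null": null_rate(rows, "price") == 0.0,
        "price_non_negative": all(p >= 0 for p in prices),
        "order_id_unique": len(ids) == len(set(ids)),
    }


def health_score(results):
    """Assumed scoring rule: percentage of checks that passed."""
    if not results:
        return 0.0
    return round(100.0 * sum(results.values()) / len(results), 1)


# Invented sample data with one missing price.
ROWS = [
    {"order_id": 1, "price": 19.99},
    {"order_id": 2, "price": None},
    {"order_id": 3, "price": 5.00},
]

results = run_checks(ROWS)
score = health_score(results)
```

Appending each run's `score` with a timestamp to a table is enough to chart the trend over time.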

RetailStream — Real-Time Kafka Streaming Pipeline

End-to-end streaming pipeline ingesting retail transaction events through Confluent Cloud Kafka, transforming in Python, storing results in SQLite, and visualizing live KPIs on a Streamlit dashboard that auto-refreshes every 5 seconds.

Stack: Python, Confluent Cloud Kafka, SQLite, Streamlit

View Project
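
The transform-and-store step of a pipeline like this can be sketched with the standard library alone. This is an illustration under stated assumptions, not the RetailStream code: a list of JSON strings stands in for messages consumed from Kafka, and the event fields (`store`, `quantity`, `unit_price`) and the `sales` table are hypothetical.

```python
import json
import sqlite3


def transform(raw):
    """Parse one JSON event and derive its revenue."""
    event = json.loads(raw)
    return (event["store"], event["quantity"] * event["unit_price"])


def ingest(conn, messages):
    """Transform a batch of raw messages and append them to SQLite."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (store TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [transform(m) for m in messages],
    )
    conn.commit()


def revenue_by_store(conn):
    """KPI query a dashboard could re-run on each refresh."""
    cursor = conn.execute(
        "SELECT store, SUM(revenue) FROM sales GROUP BY store ORDER BY store"
    )
    return dict(cursor.fetchall())


# Invented events standing in for a Kafka topic.
MESSAGES = [
    '{"store": "dallas", "quantity": 2, "unit_price": 10.0}',
    '{"store": "austin", "quantity": 1, "unit_price": 4.5}',
    '{"store": "dallas", "quantity": 3, "unit_price": 1.0}',
]

conn = sqlite3.connect(":memory:")
ingest(conn, MESSAGES)
kpis = revenue_by_store(conn)
```

In a live setup, `ingest` would be called from the consumer loop and the dashboard would call `revenue_by_store` on its refresh interval.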

ecommerce KPI Mart (dbt + DuckDB)

Production analytics engineering pipeline with dbt Core, star schema dimensional modeling, KPI definitions, dbt tests, and documentation. Built on a 100k+ row ecommerce dataset.

Stack: dbt Core, DuckDB, SQL, dimensional modeling

View Project

AI Transaction Classifier

LLM-powered benchmarking system comparing the Claude API against a rule-based baseline on 2,000 financial transactions. The LLM achieved 95.3% accuracy versus 58.7% for the baseline. A structured evaluation framework covers per-category precision, recall, F1, confidence scoring, and cost analysis ($0.0003 per transaction).

Stack: Python, Claude API, pandas, scikit-learn, evaluation framework

View Project
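
Per-category precision, recall, and F1 can be computed directly from true and predicted labels. The sketch below shows the standard definitions in plain Python. The category names and labels are invented for illustration and are unrelated to the benchmark results above; scikit-learn's `precision_recall_fscore_support` provides the same metrics.

```python
def per_category_metrics(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels: one "groceries" transaction is misclassified as "travel".
Y_TRUE = ["groceries", "groceries", "travel", "dining"]
Y_PRED = ["groceries", "travel", "travel", "dining"]

metrics = per_category_metrics(Y_TRUE, Y_PRED)
overall = accuracy(Y_TRUE, Y_PRED)
```

Running the same function on the LLM's predictions and on the baseline's predictions gives a side-by-side comparison per category.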

Revenue Leakage and Margin Dashboard

End-to-end margin analysis pipeline identifying revenue leakage across 9,994 transactions using SQL, Python, and DuckDB. Includes a Tableau dashboard with KPI drill-down.

Stack: Python, SQL, DuckDB, Tableau

View Project
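
As a rough sketch of margin analysis, the snippet below computes a profit margin per transaction and flags those below a threshold. The definition of leakage used here (margin below `min_margin`), the field names, and the sample transactions are assumptions for illustration, not the project's actual logic or data.

```python
def margin(txn):
    """Profit margin as profit divided by sales; 0.0 when sales is zero."""
    return txn["profit"] / txn["sales"] if txn["sales"] else 0.0


def find_leakage(transactions, min_margin=0.0):
    """Assumed rule: a transaction leaks revenue if its margin is below min_margin."""
    return [t for t in transactions if margin(t) < min_margin]


def total_loss(transactions):
    """Sum of losses across transactions with negative profit."""
    return sum(-t["profit"] for t in transactions if t["profit"] < 0)


# Invented sample transactions.
TRANSACTIONS = [
    {"id": "T1", "sales": 100.0, "profit": 20.0},
    {"id": "T2", "sales": 50.0, "profit": -5.0},
    {"id": "T3", "sales": 200.0, "profit": 0.0},
]

flagged = [t["id"] for t in find_leakage(TRANSACTIONS)]
loss = total_loss(TRANSACTIONS)
```

Raising `min_margin` (for example to 0.1) widens the net to low-margin transactions as well as loss-making ones.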

Tech Stack

Languages: Python, SQL, Scala, Bash, DAX

Data Engineering: Apache Spark, Kafka, Airflow, dbt Core, AWS Glue, Azure Data Factory, Databricks, Fivetran, Delta Lake

Cloud: AWS (S3, EC2, Glue, Redshift, EMR, Lambda), Azure (ADF, Synapse, ADLS), GCP (BigQuery)

Databases: Snowflake, PostgreSQL, DuckDB, SQL Server, Redshift, Hive, MongoDB

AI and LLMs: Claude API, RAG pipelines, LLM agents, tool calling, ChromaDB, sentence-transformers, prompt engineering, LLM evaluation

BI: Power BI, Tableau, Looker, Streamlit, DAX, SSRS

DevOps: Docker, Kubernetes, Terraform, GitHub Actions, CI/CD

Domain Experience

Banking and Financial Services, Retail and eCommerce, Healthcare, Insurance, HR Tech, Supply Chain, Procurement

Connect

Email: buildwithtarun@gmail.com

LinkedIn: linkedin.com/in/tarun-b-k

Location: Dallas, TX

Open to: Data Engineer, AI Data Engineer, Analytics Engineer, Senior Data Engineer roles
