Skip to content

ashu273k/Sheet2Neon

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Sheet2Neon

Data Engineering Assignment: Google Sheets β†’ NeonDB Migration

πŸ“‹ Project Overview

This project migrates Google Sheets workflows to PostgreSQL/NeonDB with automated ETL, SQL optimization, and Google Apps Script automation.


πŸ—‚οΈ Project Structure

sheet2neon/
β”œβ”€β”€ src/
β”‚ β”œβ”€β”€ etl/
β”‚ β”‚ └── etl_pipeline.py # Main ETL orchestrator
β”‚ β”œβ”€β”€ sql/
β”‚ β”‚ β”œβ”€β”€ queries.sql # Analytics queries
β”‚ β”‚ └── views_and_procedures.sql # Views & stored procedures
β”‚ └── schemas/
β”‚ β”œβ”€β”€ student_enrollment_schema.sql
β”‚ └── seed_data.sql
β”œβ”€β”€ tests/
β”‚ β”œβ”€β”€ test_etl.py # Unit tests
β”‚ β”œβ”€β”€ test_api_integration.py # Integration tests
β”‚ └── test_sql_performance.py # Performance tests
β”œβ”€β”€ config/
β”‚ β”œβ”€β”€ credentials.json # Google API credentials (GITIGNORED)
β”‚ └── settings.py # Configuration
β”œβ”€β”€ data/
β”‚ β”œβ”€β”€ raw/ # Raw CSV data
β”‚ └── processed/ # Transformed data
β”œβ”€β”€ logs/
β”‚ β”œβ”€β”€ etl_pipeline.log # ETL execution logs
β”‚ └── etl_log.json # Structured ETL logs
β”œβ”€β”€ docs/
β”‚ β”œβ”€β”€ ER_DIAGRAM.png # Database design
β”‚ β”œβ”€β”€ DATA_AUDIT_REPORT.md # Data quality analysis
β”‚ └── API_DOCUMENTATION.md # API reference
β”œβ”€β”€ google_app_scripts/
β”‚ └── auto_registration.gs # Google Apps Script code
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ .env.example # Environment variables template
└── README.md # This file

πŸš€ Quick Start

Prerequisites

  • Python 3.9+
  • PostgreSQL/NeonDB account
  • Google Cloud Project with Sheets API enabled
  • Git

1. Clone Repository

git clone https://github.com/ashu273k/Sheet2Neon
cd sheet2neon

2. Set Up Environment

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env
# Edit .env with your NeonDB credentials

3. Initialize Database

# Create schema
psql -h <your-host> -U <your-user> -d neondb -f src/schemas/student_enrollment_schema.sql

# Load seed data
psql -h <your-host> -U <your-user> -d neondb -f src/schemas/seed_data.sql

4. Test Connection

python src/scripts/test_connection.py
# Expected output: βœ… Connected to PostgreSQL/NeonDB!

5. Run ETL Pipeline

python src/etl/etl_pipeline.py
# Manually uncomment and configure source file in main block

6. Run Tests

pytest tests/ -v

πŸ“Š Database Schema

Entities

Table Purpose Relationships
department Department info 1:N to Student, 1:N to Course
student Student info N:M to Course (via Enrollment)
course Course info N:M to Student (via Enrollment)
enrollment Course registrations Links Student ↔ Course

ER Diagram

See docs/ER_DIAGRAM.png


πŸ”„ ETL Pipeline

Flow

Google Sheets / CSV
↓
EXTRACT (Read raw data)
↓
TRANSFORM (Clean, validate, deduplicate)
↓
LOAD (Insert into PostgreSQL)
↓
NeonDB (Final destination)

Running ETL

from src.etl.etl_pipeline import ETLPipeline

pipeline = ETLPipeline()
pipeline.run_etl('data/raw/students.csv', entity_type='students')

Logging

  • Logs: logs/etl_pipeline.log
  • Structured data: logs/etl_log.json

πŸ“ˆ SQL Queries & Optimization

Common Queries

Run from src/sql/queries.sql:

-- Students per department
SELECT d.name, COUNT(s.student_id)
FROM department d
LEFT JOIN student s ON d.department_id = s.department_id
GROUP BY d.department_id, d.name;

-- Student GPA calculation
SELECT s.name, ROUND(AVG(grade_points), 2) AS gpa
FROM student s
LEFT JOIN enrollment e ON s.student_id = e.student_id
GROUP BY s.student_id, s.name;

Views

  • v_student_summary - Student overview with GPA
  • v_course_analytics - Course enrollment metrics

Stored Procedures

  • register_student_course() - Enroll student in course
  • calculate_student_gpa() - Compute student GPA

πŸ€– Google Apps Script Automation

Setup

  1. Create Google Apps Script project
  2. Copy code from google_app_scripts/auto_registration.gs
  3. Set triggers: onEdit trigger on Google Sheet
  4. Configure API endpoint in script

Workflow

New row added to Google Sheet
↓
onEdit trigger fires
↓
Validate row data
↓
If valid: Register in NeonDB
If invalid: Highlight row + send email
↓
Mark status in Sheet

βœ… Testing

Run Tests

# All tests
pytest tests/ -v

# Specific test file
pytest tests/test_etl.py -v

# With coverage
pytest tests/ --cov=src

Test Coverage

  • βœ… Validation functions
  • βœ… Transformation logic
  • βœ… Database operations
  • βœ… SQL queries
  • βœ… API integration

πŸ“š Documentation

  • Data Audit: docs/DATA_AUDIT_REPORT.md
  • Database Design: docs/ER_DIAGRAM.png + docs/SCHEMA_DOCUMENTATION.md
  • ETL Process: docs/ETL_PIPELINE.md
  • API Guide: docs/API_DOCUMENTATION.md

πŸ” Security

  • .env file added to .gitignore - Never commit credentials
  • Google credentials in config/ - Added to .gitignore
  • Use environment variables for all sensitive data
  • Validate all user inputs

πŸ“ž Troubleshooting

Connection Issues

Error: "Could not connect to PostgreSQL"
Solution: Check DATABASE_URL in .env file

ETL Errors

Error: "Validation failed"
Solution: Check data format in CSV - review logs/etl_pipeline.log

Google API Issues

Error: "Credentials not found"
Solution: Download credentials.json from Google Cloud Console

πŸ“ž Support

For issues:

  1. Check logs/ folder for error messages
  2. Review test output (pytest -v)
  3. Check README troubleshooting section
  4. Contact instructor with logs attached

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors