This project migrates Google Sheets workflows to PostgreSQL/NeonDB with automated ETL, SQL optimization, and Google Apps Script automation.
sheet2neon/
βββ src/
β βββ etl/
β β βββ etl_pipeline.py # Main ETL orchestrator
β βββ sql/
β β βββ queries.sql # Analytics queries
β β βββ views_and_procedures.sql # Views & stored procedures
β βββ schemas/
β βββ student_enrollment_schema.sql
β βββ seed_data.sql
βββ tests/
β βββ test_etl.py # Unit tests
β βββ test_api_integration.py # Integration tests
β βββ test_sql_performance.py # Performance tests
βββ config/
β βββ credentials.json # Google API credentials (GITIGNORED)
β βββ settings.py # Configuration
βββ data/
β βββ raw/ # Raw CSV data
β βββ processed/ # Transformed data
βββ logs/
β βββ etl_pipeline.log # ETL execution logs
β βββ etl_log.json # Structured ETL logs
βββ docs/
β βββ ER_DIAGRAM.png # Database design
β βββ DATA_AUDIT_REPORT.md # Data quality analysis
β βββ API_DOCUMENTATION.md # API reference
βββ google_app_scripts/
β βββ auto_registration.gs # Google Apps Script code
βββ requirements.txt # Python dependencies
βββ .env.example # Environment variables template
βββ README.md # This file
- Python 3.9+
- PostgreSQL/NeonDB account
- Google Cloud Project with Sheets API enabled
- Git
git clone https://github.com/ashu273k/Sheet2Neon
cd sheet2neon# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file
cp .env.example .env
# Edit .env with your NeonDB credentials# Create schema
psql -h <your-host> -U <your-user> -d neondb -f src/schemas/student_enrollment_schema.sql
# Load seed data
psql -h <your-host> -U <your-user> -d neondb -f src/schemas/seed_data.sqlpython src/scripts/test_connection.py
# Expected output: β
Connected to PostgreSQL/NeonDB!python src/etl/etl_pipeline.py
# Manually uncomment and configure source file in main blockpytest tests/ -v| Table | Purpose | Relationships |
|---|---|---|
department |
Department info | 1:N to Student, 1:N to Course |
student |
Student info | N:M to Course (via Enrollment) |
course |
Course info | N:M to Student (via Enrollment) |
enrollment |
Course registrations | Links Student β Course |
See docs/ER_DIAGRAM.png
Google Sheets / CSV
β
EXTRACT (Read raw data)
β
TRANSFORM (Clean, validate, deduplicate)
β
LOAD (Insert into PostgreSQL)
β
NeonDB (Final destination)
from src.etl.etl_pipeline import ETLPipeline
pipeline = ETLPipeline()
pipeline.run_etl('data/raw/students.csv', entity_type='students')- Logs:
logs/etl_pipeline.log - Structured data:
logs/etl_log.json
Run from src/sql/queries.sql:
-- Students per department
SELECT d.name, COUNT(s.student_id)
FROM department d
LEFT JOIN student s ON d.department_id = s.department_id
GROUP BY d.department_id, d.name;
-- Student GPA calculation
SELECT s.name, ROUND(AVG(grade_points), 2) AS gpa
FROM student s
LEFT JOIN enrollment e ON s.student_id = e.student_id
GROUP BY s.student_id, s.name;v_student_summary- Student overview with GPAv_course_analytics- Course enrollment metrics
register_student_course()- Enroll student in coursecalculate_student_gpa()- Compute student GPA
- Create Google Apps Script project
- Copy code from
google_app_scripts/auto_registration.gs - Set triggers:
onEdittrigger on Google Sheet - Configure API endpoint in script
New row added to Google Sheet
β
onEdit trigger fires
β
Validate row data
β
If valid: Register in NeonDB
If invalid: Highlight row + send email
β
Mark status in Sheet
# All tests
pytest tests/ -v
# Specific test file
pytest tests/test_etl.py -v
# With coverage
pytest tests/ --cov=src- β Validation functions
- β Transformation logic
- β Database operations
- β SQL queries
- β API integration
- Data Audit:
docs/DATA_AUDIT_REPORT.md - Database Design:
docs/ER_DIAGRAM.png+docs/SCHEMA_DOCUMENTATION.md - ETL Process:
docs/ETL_PIPELINE.md - API Guide:
docs/API_DOCUMENTATION.md
.envfile added to.gitignore- Never commit credentials- Google credentials in
config/- Added to.gitignore - Use environment variables for all sensitive data
- Validate all user inputs
Error: "Could not connect to PostgreSQL"
Solution: Check DATABASE_URL in .env file
Error: "Validation failed"
Solution: Check data format in CSV - review logs/etl_pipeline.log
Error: "Credentials not found"
Solution: Download credentials.json from Google Cloud Console
For issues:
- Check
logs/folder for error messages - Review test output (
pytest -v) - Check README troubleshooting section
- Contact instructor with logs attached