This project is a complete end-to-end HR Employee Attrition analysis using MySQL. The dataset contains detailed employee information, and this project demonstrates skills in:
- Database design and schema creation
- Data loading and validation
- Data cleaning and preprocessing
- Basic, intermediate, and advanced analytical queries
The goal of this project is to showcase SQL expertise and analytical thinking suitable for entry-level data analyst roles.
- Source file:
hr_employee_attrition.csv - File format: CSV (UTF-8)
- Contains columns such as Age, Department, JobRole, MonthlyIncome, Attrition, OverTime, YearsAtCompany, and many others.
HR-Employee-Attrition-SQL/
│
├── dataset/
│ └── hr_employee_attrition.csv
│
├── schema/
│ └── create_schema.sql
│
├── data_loading/
│ ├── load_data.sql # Documentation of CSV import
│ └── validate_data.sql # Validation queries post-import
│
├── cleaning/
│ └── data_cleaning.sql # Data cleaning & preprocessing queries
│
├── analysis/
│ ├── basic_queries.sql # Basic analysis queries
│ ├── intermediate_queries.sql # Intermediate analysis queries
│ └── advanced_queries.sql # Advanced analysis queries
│
└── README.md # Project documentation
- Database:
hr_analytics - Table:
employee_attrition - Primary Key:
Employee_id(AUTO_INCREMENT) - Columns include all HR features such as Age, Attrition, Department, JobRole, MonthlyIncome, OverTime, YearsAtCompany, etc.
- Schema stored in:
schema/create_schema.sql
-
CSV file imported using MySQL Workbench Table Data Import Wizard
-
Header row included, Employee_id auto-generated
-
Post-import validation queries ensure:
- Correct row count
- Auto-increment integrity
- No NULL or invalid ENUM values
-
Files:
data_loading/load_data.sql(documentation)data_loading/validate_data.sql(validation queries)
- Detect duplicates excluding
Employee_id - Check NULL values across columns
- Standardize categorical data (Attrition, Gender, OverTime, BusinessTravel, MaritalStatus, Department)
- Validate numeric ranges (Age, YearsAtCompany, MonthlyIncome, PercentSalaryHike)
- Logical consistency checks (e.g., YearsInCurrentRole ≤ YearsAtCompany)
- Derived columns: age groups, income bands, attrition flags
- File:
cleaning/data_cleaning.sql
- Total employees
- Attrition counts & percentages
- Department-wise counts
- Average income by job role
- Distinct job roles
- File:
analysis/basic_queries.sql
- Attrition rate by department
- Top 5 job roles with highest attrition
- Average income by Attrition status
- Employees with >10 years who left
- Average age by job level
- File:
analysis/intermediate_queries.sql
- Department ranking by attrition rate
- Overtime correlation with attrition
- High-risk employee segments based on multiple factors
- File:
analysis/advanced_queries.sql
- MySQL 8.x
- MySQL Workbench
- GitHub (for version control and project publishing)
- Dataset: HR Employee Attrition CSV
- Database design & schema creation
- Data loading, validation, and cleaning
- Advanced SQL queries: aggregates, window functions, conditional logic
- Analytical thinking for HR attrition insights
- Professional GitHub project structure
- Clone the repository from GitHub.
- Use
schema/create_schema.sqlto create the database and table. - Load the CSV using MySQL Workbench as documented in
data_loading/load_data.sql. - Run
data_loading/validate_data.sqlto verify successful import. - Execute queries from
cleaning/data_cleaning.sqlfor data quality checks. - Perform analyses using
analysis/*.sqlfiles.
Author: S Mohan