
- Google Colab
- CSV file (hosted on GitHub)
The dataset used is a retail transaction dataset containing:
- Transaction ID: Unique sale ID
- Date: Date of purchase
- Customer ID: ID of the buyer
- Gender: Gender of the buyer
- Age: Age of the customer
- Product Category: Item category
- Quantity: Number of items purchased
- Price per Unit: Price of a single item
- Total Amount: Final billed amount (Quantity × Price per Unit)
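The relationship described for the last column can be checked directly. The following is a minimal sketch that assumes the column names listed above; the three sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up sample rows using the column names described above
sample = pd.DataFrame({
    "Quantity": [3, 2, 1],
    "Price per Unit": [50, 500, 30],
    "Total Amount": [150, 1000, 30],
})

# Recompute the total and keep only rows where it disagrees with the stored value
expected_total = sample["Quantity"] * sample["Price per Unit"]
mismatches = sample[sample["Total Amount"] != expected_total]

print(f"Rows with inconsistent totals: {len(mismatches)}")
```

On the real data, a non-zero count would suggest discounts, taxes, or data-entry errors worth investigating before aggregating.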
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
> pandas is used for data manipulation, and matplotlib for visualizations.
---
Step 2: Load the Dataset from GitHub
url = "https://raw.githubusercontent.com/LegendSeyi/Retail-Data-Analysis/main/retail_sales_dataset.csv"
data = pd.read_csv(url)
> The dataset is loaded directly into a pandas DataFrame from the raw GitHub URL.
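After loading, a quick look at the shape, column names, and missing values helps confirm the file was read as expected. The sketch below uses a small in-memory CSV with made-up rows in place of the GitHub URL so that it runs offline; with the real dataset, pass `url` to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Made-up rows standing in for the real CSV; headers follow the column list above
csv_text = """Transaction ID,Date,Customer ID,Gender,Age,Product Category,Quantity,Price per Unit,Total Amount
1,2023-01-05,C001,Male,30,Beauty,3,50,150
2,2023-02-10,C002,Female,41,Clothing,2,500,1000
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # column headers as read from the file
print(data.isna().sum())   # count of missing values per column
```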
---
Step 3: Clean Column Names
data.columns = data.columns.str.strip()
> This removes leading and trailing whitespace from the column headers, which helps prevent KeyError exceptions later.
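To illustrate why this matters, the sketch below builds a small DataFrame with hypothetical padded headers (not taken from the actual file) and shows the lookup failing before the cleanup and succeeding after it.

```python
import pandas as pd

# Hypothetical headers with stray whitespace, as can happen in exported CSVs
df = pd.DataFrame({" Total Amount ": [150, 1000], "Gender ": ["Male", "Female"]})

try:
    df["Total Amount"]
except KeyError:
    print("KeyError: 'Total Amount' not found before stripping")

# Strip leading and trailing whitespace from every header
df.columns = df.columns.str.strip()

print(df["Total Amount"].sum())
```

Note that `str.strip()` leaves spaces inside a name untouched, so "Total Amount" keeps its internal space.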
---
Step 4: Convert Date Column to Datetime Format
data['Date'] = pd.to_datetime(data['Date'])
> Converting to datetime makes it easy to extract the month and to sort records chronologically.
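If some date strings might be malformed (an assumption; the source does not say whether the file contains any), `errors="coerce"` can be passed so that unparseable values become `NaT` instead of raising. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up values, including one that cannot be parsed as a date
dates = pd.Series(["2023-11-24", "2023-02-27", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising an error
parsed = pd.to_datetime(dates, errors="coerce")

print(parsed.isna().sum())            # number of values that failed to parse
print(parsed.dropna().sort_values())  # valid dates in chronological order
```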
---
Step 5: Extract Month
data['Month'] = data['Date'].dt.month
> Adds a new Month column to analyze month-wise sales.
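Note that `dt.month` returns only the month number (1 to 12), so rows from the same month in different years share a value. If the data spans more than one year (an assumption; the source does not state the date range), a year-month period keeps them apart. A short sketch with made-up dates:

```python
import pandas as pd

# Made-up dates: two Januaries from different years and one February
dates = pd.to_datetime(pd.Series(["2023-01-15", "2024-01-03", "2023-02-10"]))

month_number = dates.dt.month         # both Januaries map to 1
year_month = dates.dt.to_period("M")  # 2023-01 and 2024-01 stay distinct

print(month_number.nunique(), year_month.nunique())
```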
---
Step 6: Analyze Monthly Sales
monthly_sales = data.groupby('Month')['Total Amount'].sum()
> Groups sales data by month and calculates total sales in each month.
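The grouping step can be tried on a small example. The rows below are made up; the column names match the ones used in this project.

```python
import pandas as pd

# Made-up rows: two sales in month 1, one each in months 2 and 3
sample = pd.DataFrame({
    "Month": [1, 1, 2, 3],
    "Total Amount": [100, 50, 200, 75],
})

# Sum the billed amount within each month
monthly_sales = sample.groupby("Month")["Total Amount"].sum()

print(monthly_sales)
print("Best month:", monthly_sales.idxmax())  # index label of the largest total
```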
---
Step 7: Analyze Sales by Product Category
category_sales = data.groupby('Product Category')['Total Amount'].sum().sort_values(ascending=False)
> Ranks product categories by total revenue, with the highest-earning category first.
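Beyond the ranking, each category's share of total revenue is often useful. This is a sketch of one way to compute it; the category names and amounts are made up for illustration.

```python
import pandas as pd

# Made-up rows; category names are illustrative only
sample = pd.DataFrame({
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing"],
    "Total Amount": [150, 1000, 600, 250],
})

category_sales = (
    sample.groupby("Product Category")["Total Amount"]
    .sum()
    .sort_values(ascending=False)
)

# Each category's percentage of overall revenue, rounded to one decimal place
share = (category_sales / category_sales.sum() * 100).round(1)

print(category_sales)
print(share)
```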
---
Step 8: Analyze Sales by Gender
gender_sales = data.groupby('Gender')['Total Amount'].sum()
> Shows how total sales are distributed between male and female customers.
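A total alone can hide differences in how many purchases each group made. As a possible extension (not part of the original notebook), `agg` can return the sum, mean, and transaction count in one pass. The rows below are made up.

```python
import pandas as pd

# Made-up rows for illustration
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Total Amount": [150, 1000, 250, 600],
})

# Total revenue, average transaction value, and number of transactions per gender
gender_summary = sample.groupby("Gender")["Total Amount"].agg(["sum", "mean", "count"])

print(gender_summary)
```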
---
Step 9: Visualize the Data
Monthly Sales Bar Chart
monthly_sales.plot(kind='bar', color='skyblue')
Product Category Sales Bar Chart
category_sales.plot(kind='bar', color='lightgreen')
Gender-wise Sales Pie Chart
gender_sales.plot(kind='pie', autopct='%1.1f%%')
> These visualizations help interpret the trends more intuitively.
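The one-line plot calls above produce charts without titles or axis labels. The sketch below shows one way to label a chart and save it to a file, using made-up totals in place of `monthly_sales`. The `Agg` backend line is only there so the example runs without a display; it is not needed inside Colab.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

# Made-up totals standing in for the monthly_sales Series
monthly_sales = pd.Series([150, 200, 75], index=pd.Index([1, 2, 3], name="Month"))

fig, ax = plt.subplots(figsize=(8, 4))
monthly_sales.plot(kind="bar", color="skyblue", ax=ax)
ax.set_title("Monthly Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Total Amount")
fig.tight_layout()

# File name matches the chart listed for the screenshots folder
fig.savefig("monthly_sales_chart.png")
plt.close(fig)
```

Passing an explicit `ax` to each `plot` call also keeps the three charts on separate figures when the code is run as a single script rather than cell by cell.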
---
Visual Output (See /screenshots/ Folder)

| Visualization | File Name |
|---|---|
| Dataset Preview | preview.png |
| Printed Column Names | column_names.png |
| Monthly Sales Output | monthly_sales_output.png |
| Product Category Sales Output | category_sales_output.png |
| Gender Sales Output | gender_sales_output.png |
| Monthly Sales Bar Chart | monthly_sales_chart.png |
| Product Category Bar Chart | category_sales_chart.png |
| Gender Pie Chart | gender_sales_chart.png |
---
Concepts Used

| Concept | Used in Code |
|---|---|
| read_csv() | Load dataset |
| str.strip() | Clean headers |
| to_datetime() | Convert Date |
| groupby() + sum() | Aggregations |
| plot(kind='bar') | Bar charts |
| plot(kind='pie') | Pie chart |
| sort_values() | Sorting results |
---
Project Folder Structure
task5-retail-data-analysis/
│
├── retail_sales_analysis.ipynb   # Final Google Colab notebook
├── README.md                     # Full documentation
└── screenshots/                  # All required visual outputs
    ├── preview.png
    ├── column_names.png
    ├── monthly_sales_output.png
    ├── category_sales_output.png
    ├── gender_sales_output.png
    ├── monthly_sales_chart.png
    ├── category_sales_chart.png
    └── gender_sales_chart.png
---
How to Run the Code
1. Open Google Colab
2. Upload retail_sales_analysis.ipynb
3. Select Runtime → Run all
4. Observe printed insights and generated charts
5. Save as PDF or take screenshots for documentation
---
Submitted by: Rohith K N
Python Intern @ [Company/Program Name]
Data Analysis Enthusiast | Web Developer
GitHub: https://github.com/Rohith-a441
---