Add a PR section for Evans in the README. by ericgitangu · Pull Request #1 · ericgitangu/data_analysis

ericgitangu · 2025-01-21T15:16:09Z

Pull Request: Kwanza Tukule Data Analyst Assessment Submission

Summary

This pull request contains the completed submission for the Kwanza Tukule Data Analyst Assessment. The solution meets all the outlined criteria, providing insights and actionable recommendations based on the given dataset. Below is a breakdown of the work completed and how it aligns with the assignment's requirements.

Criteria and Achievements

Section 1: Data Cleaning and Preparation (20 points)

Criteria: Inspect the dataset for missing values, duplicates, and inconsistent data types. Create a Month-Year column.
Achievements:
- Missing values and duplicates were identified and addressed.
- A Month-Year column was successfully added using feature engineering.
- Validation:
  - Logs are color-coded for clarity in the console.
  - Tests confirm the integrity of data cleaning and feature engineering.
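The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: the column names (`date`, `business`, `quantity`, `value`), the `YYYY-MM-DD` date format, and the choice to drop incomplete rows rather than impute them are all assumptions.

```python
from datetime import datetime

# Hypothetical sample rows; the real dataset's columns are not shown in this PR.
RAW_ROWS = [
    {"date": "2024-01-15", "business": "Business-A", "quantity": "3", "value": "450.0"},
    {"date": "2024-01-15", "business": "Business-A", "quantity": "3", "value": "450.0"},  # exact duplicate
    {"date": "2024-02-02", "business": "Business-B", "quantity": "", "value": "120.0"},   # missing quantity
    {"date": "2024-02-10", "business": "Business-B", "quantity": "5", "value": "600.0"},
]


def clean_rows(rows):
    """Drop exact duplicates and incomplete rows, cast types, add month_year."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate of a row already processed
        seen.add(key)
        if any(value == "" for value in row.values()):
            continue  # assumption: incomplete rows are dropped, not imputed
        parsed = datetime.strptime(row["date"], "%Y-%m-%d")
        cleaned.append({
            "date": parsed,
            "business": row["business"],
            "quantity": int(row["quantity"]),
            "value": float(row["value"]),
            "month_year": parsed.strftime("%Y-%m"),  # e.g. "2024-01"
        })
    return cleaned


cleaned = clean_rows(RAW_ROWS)
print([(r["business"], r["month_year"], r["quantity"]) for r in cleaned])
```

Using a `YYYY-MM` string for the Month-Year value keeps it sortable as text, which is convenient for the trend analysis in the next section.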

Section 2: Exploratory Data Analysis (30 points)

Criteria: Provide insights into total quantity and value by:
- Category
- Business
- Trends over time
Achievements:
- Aggregated sales by anonymized category and business.
- Time-series analysis shows trends in sales over time.
- Visualizations:
  - Bar charts for category and business analysis.
  - Line chart for sales trends over time.
- Validation:
  - Tests confirm the correctness of calculations and visualizations.
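As a rough sketch of the aggregation described above, the same grouping function can produce totals by category, by business, or by month. The field names and sample records are hypothetical, and the PR's own implementation may differ.

```python
from collections import defaultdict

# Hypothetical cleaned records with anonymized labels.
SALES = [
    {"category": "Category-1", "business": "Business-A", "month_year": "2024-01", "quantity": 3, "value": 450.0},
    {"category": "Category-2", "business": "Business-B", "month_year": "2024-01", "quantity": 2, "value": 100.0},
    {"category": "Category-1", "business": "Business-B", "month_year": "2024-02", "quantity": 1, "value": 150.0},
    {"category": "Category-2", "business": "Business-A", "month_year": "2024-02", "quantity": 4, "value": 200.0},
]


def totals_by(records, key):
    """Sum quantity and value for each distinct value of `key`."""
    totals = defaultdict(lambda: {"quantity": 0, "value": 0.0})
    for record in records:
        group = totals[record[key]]
        group["quantity"] += record["quantity"]
        group["value"] += record["value"]
    return dict(totals)


by_category = totals_by(SALES, "category")
by_business = totals_by(SALES, "business")
# Sorting by the month_year key gives the series in chronological order.
trend = sorted(totals_by(SALES, "month_year").items())
print(by_category)
print(trend)
```

The resulting dictionaries map directly onto the bar charts (category, business) and the line chart (trend over time).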

Section 3: Advanced Analysis (30 points)

Criteria:
- Segment businesses based on purchasing behavior.
- Forecast total sales for the next three months.
- Detect anomalies in sales data.
Achievements:
- Customer segmentation classifies businesses into high, medium, and low-value groups.
- Anomaly detection identifies unusual sales patterns.
- Validation:
  - Tests validate the segmentation logic and ensure correct groupings.
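One possible way to implement the segmentation and anomaly detection above is sketched here. The PR does not state which rules it uses, so both the tertile-based segmentation and the z-score threshold of 2.0 are assumptions, as are the sample figures.

```python
from statistics import mean, pstdev, quantiles


def segment_businesses(totals):
    """Label each business low/medium/high using tertiles of total value."""
    low_cut, high_cut = quantiles(totals.values(), n=3)
    labels = {}
    for business, total in totals.items():
        if total <= low_cut:
            labels[business] = "low"
        elif total <= high_cut:
            labels[business] = "medium"
        else:
            labels[business] = "high"
    return labels


def flag_anomalies(values, threshold=2.0):
    """Return the indices whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical totals per business and per month.
business_totals = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 500, "F": 600}
monthly_totals = [100, 110, 95, 105, 400, 102]

segments = segment_businesses(business_totals)
anomalies = flag_anomalies(monthly_totals)
print(segments)
print(anomalies)
```

Quantile cut points adapt to the data, so each group stays roughly the same size; fixed monetary thresholds would be the alternative if the business defines value tiers explicitly.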

Section 4: Strategic Insights and Recommendations (20 points)

Criteria:
- Recommend product strategies, customer retention approaches, and operational efficiencies.
- Document these insights.
Achievements:
- Generated recommendations for:
  - Product strategy based on top-performing categories.
  - Customer retention strategies for declining businesses.
  - Operational improvements for inventory optimization.
- Recommendations are output to console and saved in:
  - outputs/product_strategy.txt
  - outputs/customer_retention.txt
  - outputs/operational_efficiency.txt
- Validation:
  - Tests confirm the creation and correctness of these files.
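The file-output step could look roughly like the following. The three file names come from the list above; the function name `save_recommendations` and the placeholder text are hypothetical, and a temporary directory stands in for the real `outputs/` folder.

```python
import tempfile
from pathlib import Path


def save_recommendations(recommendations, output_dir):
    """Write each recommendation list to <output_dir>/<name>.txt."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, lines in recommendations.items():
        path = output_dir / f"{name}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written


# Placeholder content only; the actual recommendations are generated by the pipeline.
recommendations = {
    "product_strategy": ["placeholder recommendation"],
    "customer_retention": ["placeholder recommendation"],
    "operational_efficiency": ["placeholder recommendation"],
}

with tempfile.TemporaryDirectory() as tmp:
    paths = save_recommendations(recommendations, Path(tmp) / "outputs")
    names = sorted(p.name for p in paths)
    all_exist = all(p.exists() for p in paths)

print(names, all_exist)
```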

Section 5: Dashboard and Reporting (20 points)

Criteria:
- Create an interactive dashboard summarizing:
  - Sales by category and business.
  - Time-series trends.
  - Segmentation summaries.
Achievements:
- Built an interactive dashboard using plotly.express with:
  - Bar and line charts.
  - Segmentation summaries.
- Validation:
  - Data preparation steps for the dashboard are tested.

Bonus Section: Open-Ended Problem (10 points)

Criteria:
- Address scalability for a larger dataset.
- Suggest predictive analysis techniques.
Achievements:
- Discussed scalability improvements using distributed storage and indexing.
- Proposed predictive analysis techniques such as ARIMA and ML models.
- Bonus insights are saved in outputs/bonus_questions.txt.
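For context on the predictive techniques mentioned above, here is a deliberately simple baseline: an ordinary least-squares linear trend projected three months ahead. It is not ARIMA and not necessarily what this submission implements; the function name and the sample history are hypothetical. A baseline like this is mainly useful as a reference point when evaluating heavier models.

```python
def linear_trend_forecast(history, steps=3):
    """Fit y = intercept + slope * t by least squares and extrapolate `steps` periods."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    # Sums of squares and cross-products around the means.
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly sales totals.
monthly_sales = [100.0, 120.0, 140.0, 160.0]
forecast = linear_trend_forecast(monthly_sales, steps=3)
print(forecast)
```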

Testing and Validation

Automated tests were implemented using pytest to validate:
- Data loading, cleaning, and feature engineering.
- Sales overview and trends analysis.
- Recommendation generation and output file creation.

Command to run the whole pipeline:

    python3 src/kwanza_tukule_analysis.py

Check the console output, the generated output files, and the browser visualizations against the assessment's requirements.

Command to run tests:

    pytest tests/ --tb=short --disable-warnings
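A test that pytest would collect from `tests/` might look like the sketch below. The helper `add_month_year` is a hypothetical stand-in for the project's feature-engineering function, which is not shown in this PR; pytest discovers any function whose name starts with `test_` and treats a failing `assert` as a test failure.

```python
from datetime import date


def add_month_year(record):
    """Hypothetical helper: return a copy of the record with a month_year field."""
    return {**record, "month_year": record["date"].strftime("%Y-%m")}


def test_month_year_is_added():
    record = {"date": date(2024, 3, 9), "value": 10.0}
    assert add_month_year(record)["month_year"] == "2024-03"


def test_original_fields_are_preserved():
    record = {"date": date(2024, 3, 9), "value": 10.0}
    result = add_month_year(record)
    assert result["value"] == 10.0
    assert "month_year" not in record  # the input record is not mutated
```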

Achievements Summary

This submission addresses all required sections with the following highlights:

- Cleaned and prepared the dataset, with missing values and duplicates handled.
- Performed advanced analytics, including segmentation and anomaly detection.
- Generated actionable recommendations and interactive visualizations.
- Provided outputs in the form of files and a dashboard.
- Implemented robust testing to validate functionality.

Notes

- All output files for submission are located in the strategic_insights_recommendations/ and bonus_questions/ directories.
- Dashboard generation relies on plotly.express, which renders visuals in the browser.
- Please feel free to suggest further optimizations or improvements!

Thank you for reviewing this submission, Evans! 🙏

ericgitangu · 2025-02-06T13:32:59Z

@evansonbiwot here's a summary of the achievements in addition to the README file

ericgitangu added 3 commits January 21, 2025 11:24

- 996b12a: Add a PR section for Evans in the README.
- a9030a3: Added an insights view to get a quick glance of the 'interesting' pieces of data.
- a03b2df: Added an insights view to get a quick glance of the 'interesting' pieces of data - Merge conflict resolve

ericgitangu force-pushed the kwanza_tukule_case_study_assessment_evans_pr_branch branch from f01bac4 to a03b2df Compare January 21, 2025 16:35

ericgitangu requested a review from evansonbiwot February 6, 2025 13:34

ericgitangu self-assigned this Feb 6, 2025

ericgitangu wants to merge 3 commits into main from kwanza_tukule_case_study_assessment_evans_pr_branch
