
feat: S3 Deep Dive, billing analysis mode, enhanced detection & dashboard #4

Open
shubhamperforce wants to merge 4 commits into p4cloudops:main from shubhamperforce:shubham


@shubhamperforce


Summary

This PR adds significant new capabilities to the Cloud Cost Waste Hunter tool, built for Perforce Global Jam 2026.


New Features

🪣 S3 Deep Dive Dashboard Tab

A dedicated S3 Deep Dive tab in the Streamlit dashboard providing full visibility into S3 bucket spend and waste:

  • 4 metric cards: total S3 spend, monthly savings (with % of S3 spend recoverable), annual savings opportunity, total stored GB
  • Access tier distribution bar chart (Active / Infrequent / Cold / Frozen) with bucket count and hover cost details
  • Current cost vs potential saving grouped bar chart (top 10 most expensive buckets, side-by-side red/green bars)
  • Days idle per bucket timeline with dashed reference lines at 30d / 60d / 90d tier thresholds — visually shows which buckets have crossed each tier boundary
  • Savings breakdown by action type donut chart (Delete / Archive to Glacier / Move to Glacier / Switch to S3-IA)
  • Cost by environment grouped bar chart (prod / staging / dev / sandbox) showing current spend and recoverable savings per environment
  • Filterable bucket cards with tier colour-coding, idle days badge, recommendation, and optional CLI remediation commands
  • Termination candidates section listing frozen dev/sandbox buckets with aws s3 rb delete commands and total deletion savings
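
The termination candidates rule above can be sketched roughly as follows. This is an illustrative sketch, not the PR's code: the `Bucket` dataclass, its field names, and the helper functions are hypothetical, and the optional `--force` flag (which empties a bucket before removing it) is an assumption about how the generated commands might look.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical record shape; the PR's actual per-bucket fields may differ.
    name: str
    environment: str   # "prod", "staging", "dev", or "sandbox"
    access_tier: str   # "Active", "Infrequent", "Cold", or "Frozen"
    monthly_cost: float


def termination_candidates(buckets):
    """Keep only Frozen buckets that live in dev or sandbox."""
    return [
        b for b in buckets
        if b.access_tier == "Frozen" and b.environment in {"dev", "sandbox"}
    ]


def delete_command(bucket, force=False):
    """Build an `aws s3 rb` command string; nothing is executed here."""
    cmd = f"aws s3 rb s3://{bucket.name}"
    # Assumption: --force is opt-in because it deletes every object first.
    return f"{cmd} --force" if force else cmd


buckets = [
    Bucket("prod-logs", "prod", "Frozen", 120.0),
    Bucket("dev-scratch", "dev", "Frozen", 14.5),
    Bucket("sandbox-tmp", "sandbox", "Cold", 3.0),
    Bucket("sandbox-old", "sandbox", "Frozen", 5.5),
]

candidates = termination_candidates(buckets)
total_deletion_saving = sum(b.monthly_cost for b in candidates)
commands = [delete_command(b) for b in candidates]
```

Keeping command generation separate from execution means the dashboard can display the commands for review without touching any bucket.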

💰 AWS Cost Explorer Billing Analysis Mode

detection_engine.py now auto-detects the CSV format on startup:

  • Resource-level export (costs.csv from Cost Explorer) → runs run_billing_analysis() which:
    • Parses 500+ resource columns, identifies S3 buckets via ARN cross-reference
    • Infers AWS service for each resource ID (EC2, EBS, RDS, CloudFront, S3, Lambda, EFS, SNS, SQS, EventBridge, Amplify, and more)
    • Extracts last_active / first_active dates per resource from per-date cost rows
    • Writes billing_report.json with service breakdown, top S3 spenders, and zero-cost bucket list
    • Prints a formatted terminal summary with cost bars and S3 bucket ranking
  • Service-level export → clear error message explaining it lacks per-resource metrics
  • Inventory CSV (original format) → existing waste detection pipeline unchanged
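
One way the three-way format check above could work is by inspecting the CSV header row. The sketch below is a guess at the approach, not the PR's `_detect_csv_format`: the inventory column names (`resource_id`, `resource_type`) and the rule that the first header cell is `Resource` or `Service` in a Cost Explorer export are assumptions made for illustration.

```python
import csv
import io

# Hypothetical column names that mark the original inventory format.
INVENTORY_COLUMNS = {"resource_id", "resource_type"}


def detect_csv_format(header):
    """Classify a CSV by its header row.

    Returns "inventory", "resource_level", "service_level", or "unknown".
    """
    cols = [c.strip() for c in header]
    lowered = {c.lower() for c in cols}
    if INVENTORY_COLUMNS <= lowered:
        return "inventory"
    first = cols[0].lower() if cols else ""
    # Assumption: the export labels its first column with the group-by dimension.
    if first == "resource":
        return "resource_level"
    if first == "service":
        return "service_level"
    return "unknown"


def detect_from_text(text):
    """Read only the first row of CSV text and classify it."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    return detect_csv_format(header)


def dispatch(fmt):
    """Map a detected format to the behaviour described in the list above."""
    if fmt == "resource_level":
        return "run billing analysis"
    if fmt == "service_level":
        return "error: service-level export lacks per-resource metrics"
    if fmt == "inventory":
        return "run waste detection pipeline"
    return "error: unrecognised CSV format"
```

Reading only the header keeps detection cheap even when the export has hundreds of resource columns.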

🔍 Enhanced S3 Waste Detection

  • detect_cold_s3() enriched with size_gb, access_tier, days_since_access, and last_accessed fields
  • detect_s3_infrequent_access() — new rule flagging buckets accessed 30–59 days ago as candidates for S3-IA tier (45% saving estimate)
  • _build_s3_analysis() — generates s3_analysis.json with full per-bucket breakdown (tier, size, cost, saving, CLI fix, termination flag) and tier summary aggregates
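
The tier assignment and the S3-IA estimate can be illustrated with a small sketch. The 30–59 day Infrequent band and the 45% saving rate come from the description above; the Cold (60–89 days) and Frozen (90+ days) boundaries are inferred from the 30d / 60d / 90d thresholds and may not match the code exactly. Function names here are hypothetical.

```python
from datetime import date

S3_IA_SAVING_RATE = 0.45  # 45% saving estimate quoted in the PR description


def days_since(last_accessed, today):
    """Whole days between the last access date and today."""
    return (today - last_accessed).days


def access_tier(days_since_access):
    """Map idle days to a tier (Cold/Frozen cut-offs are an assumption)."""
    if days_since_access < 30:
        return "Active"
    if days_since_access < 60:
        return "Infrequent"
    if days_since_access < 90:
        return "Cold"
    return "Frozen"


def infrequent_access_saving(monthly_cost, days_since_access):
    """Estimated monthly saving from moving an Infrequent bucket to S3-IA."""
    if access_tier(days_since_access) != "Infrequent":
        return 0.0
    return round(monthly_cost * S3_IA_SAVING_RATE, 2)


idle = days_since(date(2026, 1, 1), date(2026, 2, 15))
tier = access_tier(idle)
saving = infrequent_access_saving(100.0, idle)
```

For a bucket last accessed 45 days ago that costs $100 a month, this yields the Infrequent tier and an estimated $45 monthly saving.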

🖥️ CLI File Argument

Both scripts now accept an optional positional filepath argument:

python3 detection_engine.py                    # defaults to aws_cost_data.csv
python3 detection_engine.py costs.csv          # Cost Explorer billing export
python3 detection_engine.py /path/to/any.csv   # any inventory CSV
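
An optional positional argument with a default like this is commonly implemented with `argparse` and `nargs="?"`. The sketch below shows that pattern; whether the scripts actually use `argparse` (rather than reading `sys.argv` directly) is not stated in the PR, and `parse_args` is a hypothetical helper name.

```python
import argparse

DEFAULT_CSV = "aws_cost_data.csv"  # default named in the usage lines above


def parse_args(argv=None):
    """Parse an optional positional CSV path, falling back to the default."""
    parser = argparse.ArgumentParser(
        description="Detect cloud cost waste from a CSV file."
    )
    parser.add_argument(
        "filepath",
        nargs="?",            # zero or one positional value
        default=DEFAULT_CSV,
        help="inventory CSV or Cost Explorer export (default: %(default)s)",
    )
    return parser.parse_args(argv)


no_arg = parse_args([])
with_arg = parse_args(["costs.csv"])
```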

Changed Files

| File | Change |
| --- | --- |
| detection_engine.py | Billing analysis mode, S3 enrichment, new detection rules, CLI arg, _detect_csv_format, run_billing_analysis, _build_s3_analysis |
| dashboard.py | S3 Deep Dive tab, improved metric cards, grouped cost/saving charts, threshold reference lines, savings donut, environment cost chart |
| .gitignore | Added Python venv/cache patterns, ignore billing exports and generated JSON outputs |

How to Run

# Activate venv (Python 3.11 required — 3.14 breaks numpy on macOS)
source venv/bin/activate

# Detect waste from the sample inventory CSV
python3 detection_engine.py aws_cost_data.csv

# Analyse a real Cost Explorer resource-level export
python3 detection_engine.py costs.csv

# Launch the Streamlit dashboard
streamlit run dashboard.py

- detection_engine.py
  - Auto-detect CSV format: inventory vs Cost Explorer resource-level export
  - Billing analysis mode (costs.csv): parses AWS Cost Explorer resource-level
    export, identifies services via ARN patterns, cross-references bare S3
    bucket names, writes billing_report.json with service breakdown and top
    S3 bucket cost ranking
  - S3 enrichment: size_gb, access_tier (Active/Infrequent/Cold/Frozen),
    days_since_access, last_accessed on all S3 findings
  - detect_s3_infrequent_access(): new rule flagging 30-59 day idle buckets
    for S3-IA tier (45% saving estimate)
  - _build_s3_analysis(): full per-bucket analysis with termination flags for
    frozen dev/sandbox buckets; writes s3_analysis.json
  - CLI argument support: python3 detection_engine.py [filepath]

- dashboard.py
  - S3 Deep Dive tab (new): metric cards, tier bar chart, grouped cost vs
    potential-saving bar chart, days-idle timeline with 30/60/90d threshold
    reference lines, savings breakdown donut, cost by environment bar,
    filterable bucket cards with CLI remediation commands, termination
    candidates section
  - Metric cards: annual savings opportunity, % of S3 spend recoverable

- dashboard_AI.py: standalone AI-powered dashboard variant
- s3_analysis.json: sample S3 analysis output from aws_cost_data.csv
- .gitignore: excludes venv/, costs.csv, billing_report.json, SM_api_key
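
The ARN-based service identification mentioned for billing analysis mode relies on the standard ARN layout `arn:partition:service:region:account-id:resource`. The sketch below shows one plausible reading of "cross-references bare S3 bucket names": collect bucket names from S3 ARNs, then treat any bare resource ID matching one of them as S3. The function names and that interpretation are assumptions, not the PR's implementation.

```python
def service_from_arn(resource_id):
    """Return the service field of an ARN, or None if the ID is not an ARN."""
    parts = resource_id.split(":", 5)
    if len(parts) == 6 and parts[0] == "arn":
        return parts[2]
    return None


def s3_bucket_names(resource_ids):
    """Collect bucket names from S3 ARNs such as arn:aws:s3:::my-bucket."""
    names = set()
    for rid in resource_ids:
        parts = rid.split(":", 5)
        if len(parts) == 6 and parts[0] == "arn" and parts[2] == "s3":
            # Drop any object key that follows the bucket name.
            names.add(parts[5].split("/", 1)[0])
    return names


def infer_service(resource_id, known_buckets):
    """Prefer the ARN service field; fall back to the known-bucket lookup."""
    service = service_from_arn(resource_id)
    if service:
        return service
    if resource_id in known_buckets:
        return "s3"
    return "unknown"


resource_ids = [
    "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc",
    "arn:aws:s3:::build-artifacts",
    "build-artifacts",
    "i-0def",
]
known = s3_bucket_names(resource_ids)
services = [infer_service(rid, known) for rid in resource_ids]
```

A real export would need further patterns for non-ARN IDs (for example instance or volume prefixes), which this sketch leaves as `unknown`.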