diff --git a/reporting-toolkit/CLAUDE.md b/reporting-toolkit/CLAUDE.md new file mode 100644 index 0000000..249ae90 --- /dev/null +++ b/reporting-toolkit/CLAUDE.md @@ -0,0 +1,545 @@ +# OpenStack CI Analysis Reporting Toolkit + +This directory contains a complete toolkit for analyzing OpenStack CI job health, performance, and configuration. Use these scripts to generate comprehensive assessment reports. + +## Overview + +The toolkit provides data-driven analysis of OpenStack CI jobs by: +1. Extracting job inventory from CI configuration files +2. Fetching runtime metrics from Sippy API +3. Analyzing job health, coverage, and optimization opportunities +4. Comparing OpenStack against other cloud platforms +5. Categorizing failures by root cause + +## Prerequisites + +Before running the scripts, ensure: + +```bash +# Python 3.6+ with pyyaml +python3 -m pip install pyyaml +``` + +## Running from Any Path + +All scripts support the `--output-dir` parameter, allowing you to run them from anywhere in the filesystem: + +```bash +# Run from any directory, specify output location +python3 /path/to/reporting-toolkit/extract_openstack_jobs.py \ + --config-dir /path/to/release/ci-operator/config \ + --output-dir /tmp/my-analysis \ + --summary + +# Scripts will read input files from and write output files to --output-dir +python3 /path/to/reporting-toolkit/fetch_job_metrics.py --output-dir /tmp/my-analysis +``` + +### Using the Shell Script + +The easiest way to run all analysis is with the shell script: + +```bash +# From repo root - outputs to current directory +./hack/openstack-ci-analysis/reporting-toolkit/run_analysis.sh + +# From anywhere - specify both directories +/path/to/reporting-toolkit/run_analysis.sh \ + --config-dir /path/to/release/ci-operator/config \ + --output-dir /tmp/my-analysis + +# View help +./run_analysis.sh --help +``` + +### Common Options + +All scripts support: +- `--output-dir DIR`: Directory for input/output files (default: script directory or current directory) +- `--help`: Show usage information + +Additional script-specific options: +- `extract_openstack_jobs.py`: `--config-dir` to specify CI config location +- `fetch_job_metrics.py`: `--force` to refetch cached data + +## Script Execution Order + +**IMPORTANT:** Scripts have dependencies and must be run in the correct order. + +### Phase 1: Data Collection + +Run these scripts first to gather raw data: + +```bash +# Set your output directory +OUTPUT_DIR=/tmp/openstack-analysis + +# 1. Extract job inventory from CI configuration +python3 hack/openstack-ci-analysis/reporting-toolkit/extract_openstack_jobs.py \ + --config-dir ci-operator/config \ + --output-dir $OUTPUT_DIR \ + --summary + +# 2. Fetch job metrics from Sippy API +python3 hack/openstack-ci-analysis/reporting-toolkit/fetch_job_metrics.py \ + --output-dir $OUTPUT_DIR + +# 3. Calculate extended metrics (requires step 2) +python3 hack/openstack-ci-analysis/reporting-toolkit/fetch_extended_metrics.py \ + --output-dir $OUTPUT_DIR + +# 4. 
Fetch platform comparison data +python3 hack/openstack-ci-analysis/reporting-toolkit/fetch_comparison_data.py \ + --output-dir $OUTPUT_DIR +``` + +### Phase 2: Configuration Analysis + +These scripts analyze the job configuration (from Phase 1, step 1): + +```bash +# Analyze potential redundancy +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_redundancy.py \ + --output-dir $OUTPUT_DIR + +# Analyze coverage gaps across releases +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_coverage.py \ + --output-dir $OUTPUT_DIR + +# Analyze trigger optimization opportunities +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_triggers.py \ + --output-dir $OUTPUT_DIR +``` + +### Phase 3: Runtime Analysis + +These scripts analyze runtime metrics (requires Phase 1): + +```bash +# Analyze platform comparison +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_platform_comparison.py \ + --output-dir $OUTPUT_DIR + +# Analyze workflow pass rates +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_workflow_passrate.py \ + --output-dir $OUTPUT_DIR + +# Categorize failures by root cause +python3 hack/openstack-ci-analysis/reporting-toolkit/categorize_failures.py \ + --output-dir $OUTPUT_DIR +``` + +## Script Descriptions + +### Data Collection Scripts + +#### extract_openstack_jobs.py +Extracts all OpenStack CI jobs from `ci-operator/config/` files. + +**Input:** CI configuration YAML files +**Output:** +- `openstack_jobs_inventory.csv` - Complete job inventory +- `openstack_jobs_inventory.json` - Job inventory (JSON format) + +**Key fields extracted:** +- job_name, cluster_profile, job_type (presubmit/periodic) +- workflow, schedule, minimum_interval +- optional, always_run, skip_if_only_changed, run_if_changed +- org, repo, branch, variant, config_file + +**Options:** +- `--config-dir`: Path to config directory (default: ci-operator/config) +- `--output-csv`: Output CSV file path +- `--output-json`: Output JSON file path +- `--summary`: Print summary statistics + +#### fetch_job_metrics.py +Fetches job pass rate metrics from Sippy API. + +**Input:** None (fetches from Sippy API) +**Output:** +- `sippy_jobs_raw.json` - Raw Sippy API data (cached) +- `job_metrics_report.md` - Pass rate metrics report +- `job_metrics_summary.json` - Metrics summary + +**Data collected per job:** +- current_pass_percentage, current_runs, current_passes +- previous_pass_percentage, previous_runs, previous_passes +- open_bugs, last_pass date + +**Options:** +- `--force`: Refetch data even if cache exists + +#### fetch_extended_metrics.py +Calculates extended metrics combining current + previous periods (~14 days). + +**Requires:** `sippy_jobs_raw.json` from fetch_job_metrics.py + +**Output:** +- `extended_metrics.json` - Extended metrics data +- `extended_metrics_jobs.json` - Per-job extended metrics +- `extended_metrics_report.md` - Extended metrics report + +**Calculations:** +- Combined pass rates across 14-day window +- Trend analysis (improving/degrading/stable) +- Problem job identification (<80% pass rate) +- Estimated job durations by cluster profile + +#### fetch_comparison_data.py +Fetches platform comparison data from Sippy API. 
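+
+For reference, the request pattern looks roughly like the sketch below. It is
+illustrative only (the variable names and response handling are assumptions);
+it uses the documented `/api/variants` endpoint and the 1-second delay the
+scripts apply between requests.
+
+```python
+import json
+import time
+import urllib.request
+
+BASE_URL = "https://sippy.dptools.openshift.org/api"
+releases = ["4.20", "4.21", "4.22"]  # adjust to the releases you analyze
+
+data = {}
+for release in releases:
+    # One request per release against the documented variants endpoint
+    with urllib.request.urlopen(f"{BASE_URL}/variants?release={release}") as resp:
+        data[release] = json.load(resp)
+    time.sleep(1)  # rate limiting, matching the scripts' 1-second delay
+
+with open("platform_comparison_raw.json", "w") as f:
+    json.dump(data, f, indent=2)
+```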
+ +**Input:** None (fetches from Sippy API) +**Output:** +- `platform_comparison_raw.json` - Raw platform data + +**Platforms compared:** +- OpenStack, AWS, GCP, Azure, vSphere, Metal + +**Data collected:** +- Job counts per platform per release +- Total runs and passes +- Pass rates by platform + +### Configuration Analysis Scripts + +#### analyze_redundancy.py +Identifies redundant jobs and consolidation opportunities. + +**Requires:** `openstack_jobs_inventory.json` + +**Output:** +- `redundant_jobs_report.md` - Redundancy analysis report +- `redundant_jobs_report_data.json` - Raw analysis data + +**Analyzes:** +- Jobs duplicated between openshift/ and openshift-priv/ +- Multiple jobs using same workflow + cluster in one repo +- Presubmit trigger patterns + +#### analyze_coverage.py +Analyzes test coverage across releases. + +**Requires:** `openstack_jobs_inventory.json` + +**Output:** +- `coverage_gaps_report.md` - Coverage analysis report +- `coverage_gaps_report_data.json` - Raw analysis data + +**Analyzes:** +- Jobs per release +- Cluster profile usage by release +- Coverage gaps (tests missing from some releases) + +#### analyze_triggers.py +Identifies trigger optimization opportunities. + +**Requires:** `openstack_jobs_inventory.json` + +**Output:** +- `trigger_optimization_report.md` - Trigger optimization report +- `trigger_optimization_report_data.json` - Raw analysis data + +**Analyzes:** +- Jobs missing skip_if_only_changed patterns +- Jobs missing run_if_changed patterns +- Repos that could benefit from smarter triggering + +### Runtime Analysis Scripts + +#### analyze_platform_comparison.py +Analyzes platform comparison data. + +**Requires:** `platform_comparison_raw.json` + +**Output:** +- `platform_comparison_analysis.json` - Analysis results +- `platform_comparison_report.md` - Platform comparison report + +**Provides:** +- Platform ranking by pass rate +- OpenStack vs other platforms comparison +- Per-release platform breakdown +- Gap analysis + +#### analyze_workflow_passrate.py +Analyzes pass rates grouped by workflow/test scenario. + +**Requires:** +- `openstack_jobs_inventory.json` +- `sippy_jobs_raw.json` +- `extended_metrics_jobs.json` (optional, enhances analysis) + +**Output:** +- `workflow_passrate_analysis.json` - Analysis results +- `workflow_passrate_report.md` - Workflow pass rate report + +**Workflow classification:** +- Extracts workflow type from job names +- Groups jobs by scenario (fips, dualstack, serial, etc.) +- Categorizes as Critical (<50%), Warning (50-70%), OK (>70%) + +#### categorize_failures.py +Categorizes job failures using heuristic classification. 
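+
+In spirit, the heuristic maps pass-rate bands and bug counts to a category,
+along the lines of the sketch below. The function name and exact cutoffs are
+illustrative assumptions; the real rules live in the script (see the criteria
+table below).
+
+```python
+def categorize(job_name: str, pass_rate: float, open_bugs: int) -> str:
+    # Install/provision jobs that fail consistently point at infrastructure
+    if pass_rate < 30 and ("install" in job_name or "provision" in job_name):
+        return "infrastructure"
+    # Inconsistent results in the middle band look flaky
+    if 30 <= pass_rate <= 70:
+        return "flaky"
+    # Persistently failing jobs with bugs already filed
+    if pass_rate < 30 and open_bugs > 0:
+        return "product_bug"
+    # Everything else needs a human look
+    return "needs_triage"
+```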
+ +**Requires:** +- `extended_metrics_jobs.json` +- `sippy_jobs_raw.json` (optional, for bug counts) + +**Output:** +- `failure_categories.json` - Categorized failures +- `failure_categories_report.md` - Failure categorization report + +**Categories:** +| Category | Criteria | +|----------|----------| +| Infrastructure | Low pass rate on install/provision jobs | +| Flaky | 30-70% pass rate (inconsistent) | +| Product Bug | 0% or low pass rate with bugs filed | +| Needs Triage | Unknown cause, requires investigation | + +## Output Files Summary + +| Category | File | Description | +|----------|------|-------------| +| **Inventory** | openstack_jobs_inventory.json | Complete job inventory | +| | openstack_jobs_inventory.csv | Job inventory (CSV) | +| **Config Analysis** | redundant_jobs_report.md | Workflow duplication analysis | +| | coverage_gaps_report.md | Cross-release coverage gaps | +| | trigger_optimization_report.md | Trigger pattern analysis | +| **Sippy Metrics** | sippy_jobs_raw.json | Cached Sippy API data | +| | job_metrics_report.md | Pass rate metrics | +| | extended_metrics_report.md | 14-day combined metrics | +| **Platform Comparison** | platform_comparison_raw.json | Raw platform data | +| | platform_comparison_report.md | Platform comparison report | +| **Workflow Analysis** | workflow_passrate_report.md | Workflow pass rate report | +| **Failure Categories** | failure_categories_report.md | Categorized failures | + +## Creating a Complete Assessment Report + +To create a comprehensive assessment report like `TEAM_REVIEW_OpenStack_CI_Assessment.md`, follow this process: + +### Step 1: Run All Scripts + +```bash +# Set up environment +cd /path/to/release + +# Phase 1: Data Collection +python3 hack/openstack-ci-analysis/reporting-toolkit/extract_openstack_jobs.py --summary +python3 hack/openstack-ci-analysis/reporting-toolkit/fetch_job_metrics.py +python3 hack/openstack-ci-analysis/reporting-toolkit/fetch_extended_metrics.py +python3 hack/openstack-ci-analysis/reporting-toolkit/fetch_comparison_data.py + +# Phase 2: Configuration Analysis +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_redundancy.py +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_coverage.py +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_triggers.py + +# Phase 3: Runtime Analysis +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_platform_comparison.py +python3 hack/openstack-ci-analysis/reporting-toolkit/analyze_workflow_passrate.py +python3 hack/openstack-ci-analysis/reporting-toolkit/categorize_failures.py +``` + +### Step 2: Review Generated Reports + +Read all generated `.md` reports to understand the findings: + +1. `job_metrics_report.md` - Overall pass rates by release +2. `extended_metrics_report.md` - 14-day trends and problem jobs +3. `platform_comparison_report.md` - OpenStack vs other platforms +4. `workflow_passrate_report.md` - Which workflows are problematic +5. `failure_categories_report.md` - Root cause categorization +6. `coverage_gaps_report.md` - Missing test coverage +7. `trigger_optimization_report.md` - Quick win optimizations +8. `redundant_jobs_report.md` - Potential consolidation + +### Step 3: Create Executive Summary + +Structure the report with these sections: + +1. **Executive Summary** + - Total jobs analyzed + - Overall pass rate + - Number of problem jobs + - Key priorities + +2. **Job Inventory Overview** + - Distribution by cluster profile + - Distribution by organization + - Jobs by type (presubmit/periodic) + +3. 
**Periodic Job Health Analysis**
+   - Overall health metrics
+   - Pass rate by release
+   - Critical failures (0% pass rate)
+   - Degrading jobs
+   - Platform comparison
+   - Workflow analysis
+   - Failure categorization
+
+4. **Trigger Optimization**
+   - Jobs missing filters
+   - Recommended patterns
+
+5. **Coverage Gaps**
+   - Missing tests across releases
+   - CAPI/other notable gaps
+
+6. **Action Items**
+   - Immediate actions
+   - Short-term improvements
+   - Medium-term investigations
+
+### Step 4: Key Data Points to Include
+
+From the JSON data files, extract these key metrics:
+
+**From extended_metrics.json:**
+```python
+import json
+data = json.load(open('extended_metrics.json'))
+print(f"Total jobs: {data['overall']['total_jobs']}")
+print(f"Pass rate: {data['overall']['combined_pass_rate']:.1f}%")
+print(f"Problem jobs: {data['overall']['problem_job_count']}")
+```
+
+**From platform_comparison_analysis.json:**
+```python
+data = json.load(open('platform_comparison_analysis.json'))
+for p in data['overall']['platforms']:
+    print(f"{p['platform']}: {p['pass_rate']:.1f}%")
+```
+
+**From workflow_passrate_analysis.json:**
+```python
+data = json.load(open('workflow_passrate_analysis.json'))
+critical = [w for w in data['workflows'] if w['severity'] == 'critical']
+for w in critical:
+    print(f"{w['workflow']}: {w['pass_rate']:.1f}%")
+```
+
+**From failure_categories.json:**
+```python
+data = json.load(open('failure_categories.json'))
+for cat, count in data['summary']['by_category'].items():
+    pct = data['summary']['percentages'][cat]
+    print(f"{cat}: {count} ({pct}%)")
+```
+
+## Customization
+
+### Adding New Cluster Profiles
+
+Edit `extract_openstack_jobs.py` to add new profiles:
+
+```python
+OPENSTACK_CLUSTER_PROFILES = [
+    "openstack-vexxhost",
+    "openstack-vh-mecha-central",
+    # Add new profiles here
+]
+```
+
+### Adjusting Pass Rate Thresholds
+
+Edit `categorize_failures.py` to change thresholds:
+
+```python
+# Current thresholds
+CRITICAL_THRESHOLD = 50  # Below this = critical
+WARNING_THRESHOLD = 70   # Below this = warning
+PROBLEM_THRESHOLD = 80   # Below this = problem job
+```
+
+### Adding New Workflow Patterns
+
+Edit `extract_workflow_from_name` in `analyze_workflow_passrate.py` to recognize new patterns (abridged sketch of the existing function):
+
+```python
+def extract_workflow_from_name(job_name):
+    name_lower = job_name.lower()
+    characteristics = []
+    # Add new patterns here
+    if "newpattern" in name_lower:
+        characteristics.append("newpattern")
+```
+
+## Sippy API Reference
+
+The scripts use these Sippy API endpoints:
+
+| Endpoint | Description |
+|----------|-------------|
+| `/api/jobs?release=X` | All jobs for a release |
+| `/api/variants?release=X` | Variant (platform) data |
+
+**Base URL:** https://sippy.dptools.openshift.org/api
+
+**Rate limiting:** Scripts include 1-second delays between requests.
+
+## Troubleshooting
+
+### "No Sippy data found"
+Run `fetch_job_metrics.py` before running analysis scripts that require Sippy data.
+
+### "No job inventory found"
+Run `extract_openstack_jobs.py` before running configuration analysis scripts.
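+
+Both errors mean a Phase 1 input file is missing from the output directory. A
+quick pre-flight check (a sketch assuming the `$OUTPUT_DIR` variable from the
+examples above) makes the gap obvious:
+
+```bash
+for f in openstack_jobs_inventory.csv sippy_jobs_raw.json; do
+    [ -f "$OUTPUT_DIR/$f" ] || echo "missing: $f (run the matching Phase 1 script)"
+done
+```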
+ +### Script fails with import error +Ensure pyyaml is installed: `python3 -m pip install pyyaml` + +### Old cached data +Use `--force` flag with fetch scripts to refresh cached data: +```bash +python3 fetch_job_metrics.py --force +``` + +## Example Analysis Session + +Here's a complete example of running an analysis and interpreting results: + +```bash +# Run all scripts +cd /path/to/release +for script in extract_openstack_jobs fetch_job_metrics fetch_extended_metrics \ + fetch_comparison_data analyze_redundancy analyze_coverage \ + analyze_triggers analyze_platform_comparison \ + analyze_workflow_passrate categorize_failures; do + echo "Running $script..." + python3 hack/openstack-ci-analysis/reporting-toolkit/${script}.py +done + +# Check key findings +echo "=== Key Findings ===" +python3 -c " +import json +ext = json.load(open('extended_metrics.json')) +plat = json.load(open('platform_comparison_analysis.json')) +fail = json.load(open('failure_categories.json')) + +print(f'Overall pass rate: {ext[\"overall\"][\"combined_pass_rate\"]:.1f}%') +print(f'Problem jobs: {ext[\"overall\"][\"problem_job_count\"]}') +print(f'OpenStack rank: #{plat[\"openstack_position\"][\"rank\"]} of {plat[\"openstack_position\"][\"total\"]}') +print(f'Flaky jobs: {fail[\"summary\"][\"by_category\"][\"flaky\"]}') +print(f'Needs triage: {fail[\"summary\"][\"by_category\"][\"needs_triage\"]}') +" +``` + +## Maintenance + +### Updating for New Releases + +When new OpenShift releases are added, update the RELEASES list in each script: + +```python +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22", "4.23"] +``` + +### Refreshing Data + +For fresh analysis, delete cached JSON files and re-run: + +```bash +rm -f *_raw.json *_jobs.json +# Then run all fetch scripts again +``` diff --git a/reporting-toolkit/README.md b/reporting-toolkit/README.md new file mode 100644 index 0000000..09b0d0c --- /dev/null +++ b/reporting-toolkit/README.md @@ -0,0 +1,224 @@ +# OpenStack CI Analysis Reporting Toolkit + +A portable toolkit for analyzing OpenStack CI job health, performance, and configuration in OpenShift CI infrastructure. 
+ +## Overview + +This toolkit provides comprehensive analysis of OpenStack CI jobs by: + +- **Extracting** job inventory from CI configuration files +- **Fetching** runtime metrics from the [Sippy API](https://sippy.dptools.openshift.org/) +- **Analyzing** job health, coverage gaps, and optimization opportunities +- **Comparing** OpenStack pass rates against other cloud platforms (AWS, GCP, Azure, vSphere) +- **Categorizing** failures by root cause (flaky, product bug, infrastructure, needs triage) + +## Prerequisites + +- Python 3.6+ +- PyYAML library + +```bash +pip install pyyaml +``` + +## Quick Start + +### Option 1: Using the Shell Script (Recommended) + +```bash +# Clone the release repository (or use existing clone) +git clone https://github.com/openshift/release.git +cd release + +# Run complete analysis - outputs to current directory +./path/to/reporting-toolkit/run_analysis.sh + +# Or specify custom paths +./path/to/reporting-toolkit/run_analysis.sh \ + --config-dir /path/to/release/ci-operator/config \ + --output-dir /tmp/my-analysis +``` + +### Option 2: Running Scripts Individually + +```bash +# Set your paths +TOOLKIT=/path/to/reporting-toolkit +CONFIG_DIR=/path/to/release/ci-operator/config +OUTPUT_DIR=/tmp/my-analysis + +# Phase 1: Data Collection +python3 $TOOLKIT/extract_openstack_jobs.py --config-dir $CONFIG_DIR --output-dir $OUTPUT_DIR --summary +python3 $TOOLKIT/fetch_job_metrics.py --output-dir $OUTPUT_DIR +python3 $TOOLKIT/fetch_extended_metrics.py --output-dir $OUTPUT_DIR +python3 $TOOLKIT/fetch_comparison_data.py --output-dir $OUTPUT_DIR + +# Phase 2: Configuration Analysis +python3 $TOOLKIT/analyze_redundancy.py --output-dir $OUTPUT_DIR +python3 $TOOLKIT/analyze_coverage.py --output-dir $OUTPUT_DIR +python3 $TOOLKIT/analyze_triggers.py --output-dir $OUTPUT_DIR + +# Phase 3: Runtime Analysis +python3 $TOOLKIT/analyze_platform_comparison.py --output-dir $OUTPUT_DIR +python3 $TOOLKIT/analyze_workflow_passrate.py --output-dir $OUTPUT_DIR +python3 $TOOLKIT/categorize_failures.py --output-dir $OUTPUT_DIR +``` + +## Scripts + +### Data Collection + +| Script | Description | +|--------|-------------| +| `extract_openstack_jobs.py` | Extracts job inventory from `ci-operator/config/` YAML files | +| `fetch_job_metrics.py` | Fetches pass rates and run counts from Sippy API | +| `fetch_extended_metrics.py` | Calculates 14-day combined metrics and trends | +| `fetch_comparison_data.py` | Fetches platform comparison data from Sippy | + +### Configuration Analysis + +| Script | Description | +|--------|-------------| +| `analyze_redundancy.py` | Identifies duplicate/overlapping jobs | +| `analyze_coverage.py` | Finds test coverage gaps across releases | +| `analyze_triggers.py` | Identifies trigger optimization opportunities | + +### Runtime Analysis + +| Script | Description | +|--------|-------------| +| `analyze_platform_comparison.py` | Compares OpenStack vs AWS/GCP/Azure/vSphere | +| `analyze_workflow_passrate.py` | Analyzes pass rates by workflow/test type | +| `categorize_failures.py` | Classifies failures by root cause | + +## Command Line Options + +All scripts support: +- `--output-dir DIR` - Directory for input/output files (default: script directory) +- `--help` - Show usage information + +Additional options: +- `extract_openstack_jobs.py`: `--config-dir` for CI config location +- `fetch_job_metrics.py`: `--force` to refresh cached data + +## Output Files + +### Reports (Markdown) + +| File | Description | +|------|-------------| +| `job_metrics_report.md` | Pass 
rate metrics by release | +| `extended_metrics_report.md` | 14-day trends and problem jobs | +| `platform_comparison_report.md` | OpenStack vs other platforms | +| `workflow_passrate_report.md` | Pass rates by workflow type | +| `failure_categories_report.md` | Failures by root cause | +| `coverage_gaps_report.md` | Missing test coverage | +| `trigger_optimization_report.md` | Trigger pattern improvements | +| `redundant_jobs_report.md` | Potential job consolidation | + +### Data (JSON) + +| File | Description | +|------|-------------| +| `openstack_jobs_inventory.json` | Complete job inventory | +| `sippy_jobs_raw.json` | Cached Sippy API data | +| `extended_metrics.json` | Extended metrics data | +| `platform_comparison_raw.json` | Platform comparison data | +| `workflow_passrate_analysis.json` | Workflow analysis data | +| `failure_categories.json` | Categorized failures | + +## Example Output + +After running the analysis, you'll see key findings like: + +``` +Platform Comparison: + 1. vSphere: 80.7% + 2. AWS: 73.9% + 3. GCP: 71.2% + 4. Metal: 69.8% + 5. Azure: 68.2% + 6. OpenStack: 50.4% <-- Gap to address + +Failure Categories: + - Flaky: 41.6% + - Needs Triage: 36.0% + - Product Bug: 22.5% + +Critical Workflows (0% pass rate): + - ccpmso + - upgrade + - singlestackv6 +``` + +## Portability + +This toolkit is designed to be portable: + +1. **No hardcoded paths** - All paths are configurable via command-line options +2. **Self-contained** - All scripts are in a single directory +3. **Minimal dependencies** - Only requires Python 3.6+ and PyYAML + +To use in another project: +```bash +# Copy the toolkit +cp -r reporting-toolkit /path/to/your/project/ + +# Run from anywhere +/path/to/your/project/reporting-toolkit/run_analysis.sh \ + --config-dir /path/to/release/ci-operator/config \ + --output-dir /path/to/output +``` + +## Data Sources + +- **CI Configuration**: `ci-operator/config/` in the [openshift/release](https://github.com/openshift/release) repository +- **Runtime Metrics**: [Sippy API](https://sippy.dptools.openshift.org/) - OpenShift CI analytics platform + +## Cluster Profiles Analyzed + +The toolkit analyzes jobs using these OpenStack cluster profiles: +- `openstack-vexxhost` +- `openstack-vh-mecha-central` +- `openstack-vh-mecha-az0` +- `openstack-vh-bm-rhos` +- `openstack-hwoffload` +- `openstack-nfv` + +## Refreshing Data + +Sippy data is cached to avoid repeated API calls. To refresh: + +```bash +# Refresh all data +./run_analysis.sh --force + +# Or refresh just job metrics +python3 fetch_job_metrics.py --output-dir $OUTPUT_DIR --force +``` + +## Troubleshooting + +### "No Sippy data found" +Run `fetch_job_metrics.py` before analysis scripts that require Sippy data. + +### "No job inventory found" +Run `extract_openstack_jobs.py` before configuration analysis scripts. + +### Import error for yaml +Install PyYAML: `pip install pyyaml` + +### Config directory not found +Ensure the path to `ci-operator/config` is correct. This should point to the config directory in the openshift/release repository. + +## For Claude Code Users + +See `CLAUDE.md` for detailed instructions on using this toolkit with Claude Code, including: +- Step-by-step execution guide +- Creating comprehensive assessment reports +- Customization options +- API reference + +## License + +This toolkit is part of the OpenShift CI infrastructure. See the main repository for license information. 
diff --git a/reporting-toolkit/analyze_coverage.py b/reporting-toolkit/analyze_coverage.py new file mode 100644 index 0000000..1dbedcc --- /dev/null +++ b/reporting-toolkit/analyze_coverage.py @@ -0,0 +1,403 @@ +#!/usr/bin/env python3 +""" +Analyze OpenStack CI test coverage across releases. + +This script identifies: +1. Coverage matrix (which tests run on which releases) +2. Coverage gaps (tests missing from certain releases) +3. Release-to-release differences +4. Workflow usage across releases +""" + +import argparse +import csv +import json +import sys +from collections import defaultdict +from pathlib import Path + + +# Current and recent releases to focus on +ACTIVE_RELEASES = [ + "release-4.17", + "release-4.18", + "release-4.19", + "release-4.20", + "release-4.21", + "release-4.22", + "release-4.23", +] + +MAIN_BRANCHES = ["main", "master"] + + +def load_inventory(csv_path): + """Load job inventory from CSV.""" + jobs = [] + with open(csv_path, 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + row['optional'] = row['optional'].lower() == 'true' + row['always_run'] = row['always_run'].lower() == 'true' + jobs.append(row) + return jobs + + +def normalize_branch(branch): + """Normalize branch name for comparison.""" + if branch in MAIN_BRANCHES: + return "main/master" + return branch + + +def get_release_version(branch): + """Extract version number from release branch.""" + if branch.startswith("release-"): + return branch.replace("release-", "") + return None + + +def build_test_matrix(jobs): + """ + Build a matrix of tests by (workflow, cluster_profile) across releases. + """ + # Group by (org, repo) to see coverage per repo + repo_coverage = defaultdict(lambda: defaultdict(set)) + + # Group by (workflow, cluster_profile) to see overall test coverage + test_coverage = defaultdict(set) + + for job in jobs: + org = job['org'] + repo = job['repo'] + branch = job['branch'] + workflow = job['workflow'] + cluster = job['cluster_profile'] + job_name = job['job_name'] + + # Normalize branch + normalized = normalize_branch(branch) + + # Track per-repo coverage + if workflow: + key = (org, repo, job_name, workflow, cluster) + repo_coverage[key][normalized].add(job['config_file']) + + # Track workflow coverage + if workflow: + test_key = (workflow, cluster, job_name) + test_coverage[test_key].add(normalized) + + return repo_coverage, test_coverage + + +def analyze_release_coverage(jobs): + """ + Analyze which releases have what coverage. + """ + # Count jobs per release + release_counts = defaultdict(int) + for job in jobs: + branch = normalize_branch(job['branch']) + release_counts[branch] += 1 + + # Count unique test types per release + release_tests = defaultdict(set) + for job in jobs: + branch = normalize_branch(job['branch']) + key = (job['workflow'], job['cluster_profile'], job['job_name']) + release_tests[branch].add(key) + + return release_counts, release_tests + + +def find_coverage_gaps(jobs): + """ + Find tests that exist in some releases but not others. 
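+
+    For example, if a repo has jobs on release-4.21 through release-4.23 but a
+    given job exists only on release-4.21 and release-4.22, that job is
+    reported as a gap for release-4.23.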
+ """ + # Group jobs by (org, repo, job_name) + job_releases = defaultdict(set) + for job in jobs: + key = (job['org'], job['repo'], job['job_name']) + job_releases[key].add(job['branch']) + + # Get all releases present per repo + repo_releases = defaultdict(set) + for job in jobs: + key = (job['org'], job['repo']) + repo_releases[key].add(job['branch']) + + # Find gaps + gaps = [] + for (org, repo, job_name), present_releases in job_releases.items(): + all_releases = repo_releases[(org, repo)] + + # Focus on active releases + active_present = set(r for r in present_releases if r in ACTIVE_RELEASES) + active_all = set(r for r in all_releases if r in ACTIVE_RELEASES) + + # If job exists in some active releases but not all, it's a gap + missing = active_all - active_present + if missing and active_present: # Has some active releases but missing others + gaps.append({ + 'org': org, + 'repo': repo, + 'job_name': job_name, + 'present': sorted(active_present), + 'missing': sorted(missing), + }) + + return gaps + + +def analyze_workflow_coverage(jobs): + """ + Analyze which workflows are used in which releases. + """ + workflow_releases = defaultdict(lambda: defaultdict(int)) + + for job in jobs: + workflow = job['workflow'] + if not workflow: + continue + branch = job['branch'] + if branch in ACTIVE_RELEASES or branch in MAIN_BRANCHES: + normalized = normalize_branch(branch) + workflow_releases[workflow][normalized] += 1 + + return workflow_releases + + +def analyze_cluster_profile_usage(jobs): + """ + Analyze cluster profile usage by release. + """ + profile_releases = defaultdict(lambda: defaultdict(int)) + + for job in jobs: + profile = job['cluster_profile'] + branch = job['branch'] + if branch in ACTIVE_RELEASES or branch in MAIN_BRANCHES: + normalized = normalize_branch(branch) + profile_releases[profile][normalized] += 1 + + return profile_releases + + +def generate_report(jobs, output_file): + """Generate comprehensive coverage report.""" + report = [] + report.append("# OpenStack CI Test Coverage Analysis Report\n") + report.append(f"Total jobs analyzed: {len(jobs)}\n") + + # Release Coverage Summary + report.append("\n## 1. Jobs by Release\n\n") + release_counts, release_tests = analyze_release_coverage(jobs) + + # Sort releases naturally + def sort_key(r): + if r == "main/master": + return (1, "zzz") + if r.startswith("release-"): + ver = r.replace("release-", "") + parts = ver.split(".") + return (0, tuple(int(p) for p in parts if p.isdigit())) + return (2, r) + + sorted_releases = sorted(release_counts.keys(), key=sort_key) + + report.append("| Release | Total Jobs | Unique Tests |\n") + report.append("|---------|------------|---------------|\n") + for release in sorted_releases[-15:]: # Last 15 releases + count = release_counts[release] + unique = len(release_tests[release]) + report.append(f"| {release} | {count} | {unique} |\n") + + # Cluster Profile Usage + report.append("\n## 2. 
Cluster Profile Usage by Release\n\n")
+    profile_usage = analyze_cluster_profile_usage(jobs)
+
+    # Header (use a set: "main" and "master" both normalize to "main/master")
+    active_releases_sorted = sorted(
+        {normalize_branch(r) for r in ACTIVE_RELEASES + MAIN_BRANCHES},
+        key=sort_key
+    )[-6:]  # Last 6
+
+    report.append("| Cluster Profile | " +
+                  " | ".join(active_releases_sorted) + " |\n")
+    report.append("|" + "-" * 17 + "|" +
+                  "|".join(["-" * 8 for _ in active_releases_sorted]) + "|\n")
+
+    for profile in sorted(profile_usage.keys()):
+        counts = [str(profile_usage[profile].get(r, 0))
+                  for r in active_releases_sorted]
+        report.append(f"| {profile} | " + " | ".join(counts) + " |\n")
+
+    # Workflow Usage
+    report.append("\n## 3. Workflow Usage by Release\n\n")
+    workflow_usage = analyze_workflow_coverage(jobs)
+
+    report.append("| Workflow | " +
+                  " | ".join(active_releases_sorted) + " |\n")
+    report.append("|" + "-" * 40 + "|" +
+                  "|".join(["-" * 8 for _ in active_releases_sorted]) + "|\n")
+
+    for workflow in sorted(workflow_usage.keys()):
+        counts = [str(workflow_usage[workflow].get(r, 0))
+                  for r in active_releases_sorted]
+        report.append(f"| {workflow} | " + " | ".join(counts) + " |\n")
+
+    # Coverage Gaps
+    report.append("\n## 4. Coverage Gaps\n")
+    report.append("Tests present in some active releases but missing from others.\n\n")
+
+    gaps = find_coverage_gaps(jobs)
+
+    if gaps:
+        # Group by repo
+        repo_gaps = defaultdict(list)
+        for gap in gaps:
+            repo_gaps[(gap['org'], gap['repo'])].append(gap)
+
+        report.append(f"Found {len(gaps)} coverage gaps across "
+                      f"{len(repo_gaps)} repositories.\n\n")
+
+        report.append("### By Repository\n\n")
+        for (org, repo), repo_gap_list in sorted(repo_gaps.items()):
+            report.append(f"#### {org}/{repo}\n\n")
+            report.append("| Job | Present | Missing |\n")
+            report.append("|-----|---------|----------|\n")
+            for gap in repo_gap_list[:10]:
+                present = ', '.join(gap['present'][:3])
+                if len(gap['present']) > 3:
+                    present += f" (+{len(gap['present'])-3})"
+                missing = ', '.join(gap['missing'][:3])
+                if len(gap['missing']) > 3:
+                    missing += f" (+{len(gap['missing'])-3})"
+                report.append(f"| {gap['job_name']} | {present} | {missing} |\n")
+            if len(repo_gap_list) > 10:
+                report.append(f"\n... and {len(repo_gap_list)-10} more gaps\n")
+            report.append("\n")
+    else:
+        report.append("No coverage gaps found in active releases.\n")
+
+    # Test Type Analysis
+    report.append("\n## 5. 
Test Type Coverage\n") + report.append("Summary of test types and their coverage.\n\n") + + # Categorize by test name patterns + test_categories = { + 'e2e-basic': [], + 'e2e-conformance': [], + 'e2e-csi': [], + 'e2e-nfv': [], + 'e2e-upgrade': [], + 'e2e-other': [], + } + + for job in jobs: + name = job['job_name'].lower() + if 'csi' in name or 'manila' in name or 'cinder' in name: + test_categories['e2e-csi'].append(job) + elif 'nfv' in name or 'sriov' in name or 'hwoffload' in name: + test_categories['e2e-nfv'].append(job) + elif 'upgrade' in name: + test_categories['e2e-upgrade'].append(job) + elif 'parallel' in name or 'serial' in name or 'conformance' in name: + test_categories['e2e-conformance'].append(job) + elif name.endswith('e2e-openstack') or name == 'e2e-openstack-ovn': + test_categories['e2e-basic'].append(job) + else: + test_categories['e2e-other'].append(job) + + report.append("| Category | Total Jobs | Unique Tests |\n") + report.append("|----------|------------|---------------|\n") + for category, cat_jobs in sorted(test_categories.items()): + unique = len(set((j['job_name'], j['workflow']) for j in cat_jobs)) + report.append(f"| {category} | {len(cat_jobs)} | {unique} |\n") + + # Recommendations + report.append("\n## 6. Coverage Recommendations\n\n") + + report.append("### Missing Coverage Areas\n\n") + + # Check for releases with low coverage + active_counts = {r: release_counts.get(r, 0) + for r in ACTIVE_RELEASES} + avg_count = sum(active_counts.values()) / len(active_counts) if active_counts else 0 + + low_coverage = [r for r, c in active_counts.items() if c < avg_count * 0.7] + if low_coverage: + report.append(f"1. **Low coverage releases**: {', '.join(sorted(low_coverage))} " + f"have fewer jobs than average.\n\n") + + # Check for profile gaps + for profile in sorted(profile_usage.keys()): + releases_with_profile = [r for r, c in profile_usage[profile].items() if c > 0] + if len(releases_with_profile) < len(active_releases_sorted) - 1: + missing = set(active_releases_sorted) - set(releases_with_profile) + if missing: + report.append(f"2. 
**{profile}**: Missing from {', '.join(sorted(missing))}\n\n") + + report.append("### Consolidation Opportunities\n\n") + report.append("- Jobs that appear in all releases with same config could use shared workflows\n") + report.append("- Consider periodic-only coverage for older releases (4.17, 4.18)\n") + report.append("- Evaluate if all cluster profiles need coverage in all releases\n") + + # Write report + with open(output_file, 'w', encoding='utf-8') as f: + f.write(''.join(report)) + + print(f"Report written to {output_file}", file=sys.stderr) + + # Also write machine-readable data + json_output = output_file.replace('.md', '_data.json') + with open(json_output, 'w', encoding='utf-8') as f: + json.dump({ + 'release_counts': dict(release_counts), + 'workflow_usage': {k: dict(v) for k, v in workflow_usage.items()}, + 'profile_usage': {k: dict(v) for k, v in profile_usage.items()}, + 'coverage_gaps': gaps[:100], + 'test_categories': {k: len(v) for k, v in test_categories.items()}, + }, f, indent=2) + + print(f"Data written to {json_output}", file=sys.stderr) + + +def main(): + import os + script_dir = os.path.dirname(os.path.abspath(__file__)) + + parser = argparse.ArgumentParser( + description="Analyze OpenStack CI test coverage" + ) + parser.add_argument( + "--output-dir", + default=script_dir, + help="Directory for input/output files (default: script directory)" + ) + parser.add_argument( + "--inventory", + default="openstack_jobs_inventory.csv", + help="Inventory CSV filename (default: openstack_jobs_inventory.csv)" + ) + + args = parser.parse_args() + + output_dir = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Coverage Analysis") + print("=" * 60) + print(f"Output directory: {output_dir}") + print() + + inventory_path = os.path.join(output_dir, args.inventory) + output_path = os.path.join(output_dir, "coverage_gaps_report.md") + + jobs = load_inventory(inventory_path) + generate_report(jobs, output_path) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/analyze_platform_comparison.py b/reporting-toolkit/analyze_platform_comparison.py new file mode 100644 index 0000000..3cc5658 --- /dev/null +++ b/reporting-toolkit/analyze_platform_comparison.py @@ -0,0 +1,293 @@ +#!/usr/bin/env python3 +""" +Analyze platform comparison data and generate report. +Compares OpenStack CI pass rates against AWS, GCP, Azure, vSphere. 
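+Reads platform_comparison_raw.json (produced by fetch_comparison_data.py) and,
+optionally, extended_metrics.json as a fallback for the OpenStack baseline.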
+""" + +import argparse +import json +import os +import sys +from datetime import datetime + +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] +TARGET_PLATFORMS = ["OpenStack", "AWS", "GCP", "Azure", "vSphere", "Metal"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Analyze platform comparison data and generate report" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + + +def load_comparison_data(): + """Load platform comparison raw data.""" + filepath = os.path.join(OUTPUT_DIR, "platform_comparison_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_extended_metrics(): + """Load extended metrics for OpenStack-specific data.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def analyze_platforms(data, openstack_metrics): + """Analyze platform comparison data.""" + results = { + "generated": datetime.now().isoformat(), + "overall": {}, + "by_release": {}, + "openstack_position": {}, + } + + # Overall platform comparison + overall = data.get("overall_by_platform", {}) + + # Calculate OpenStack baseline + openstack_rate = 0 + if "OpenStack" in overall: + openstack_rate = overall["OpenStack"].get("pass_rate", 0) + elif openstack_metrics: + openstack_rate = openstack_metrics.get("overall", {}).get("combined_pass_rate", 0) + + # Build comparison table + platforms = [] + for platform in TARGET_PLATFORMS: + if platform in overall: + pdata = overall[platform] + rate = pdata.get("pass_rate", 0) + delta = rate - openstack_rate if platform != "OpenStack" else 0 + platforms.append({ + "platform": platform, + "job_count": pdata.get("job_count", 0), + "total_runs": pdata.get("total_runs", 0), + "total_passes": pdata.get("total_passes", 0), + "pass_rate": rate, + "vs_openstack": delta, + }) + + # Sort by pass rate descending + platforms.sort(key=lambda x: -x["pass_rate"]) + results["overall"]["platforms"] = platforms + + # Find OpenStack position + for i, p in enumerate(platforms): + if p["platform"] == "OpenStack": + results["openstack_position"]["rank"] = i + 1 + results["openstack_position"]["total"] = len(platforms) + break + + # Per-release comparison + for release in RELEASES: + release_data = data.get("releases", {}).get(release, {}) + job_metrics = release_data.get("job_metrics", {}) + + release_platforms = [] + for platform in TARGET_PLATFORMS: + if platform in job_metrics: + pdata = job_metrics[platform] + release_platforms.append({ + "platform": platform, + "job_count": pdata.get("job_count", 0), + "total_runs": pdata.get("total_runs", 0), + "pass_rate": pdata.get("pass_rate", 0), + }) + + release_platforms.sort(key=lambda x: -x["pass_rate"]) + results["by_release"][release] = release_platforms + + return results + + +def generate_report(analysis): + """Generate markdown report for platform comparison.""" + report = [] + report.append("# Platform Comparison Report") + report.append("") + report.append(f"**Generated:** {analysis['generated']}") + report.append("") + report.append("This report compares OpenStack CI job pass rates against other cloud platforms.") + report.append("") + + # Executive summary + report.append("## Executive 
Summary") + report.append("") + + platforms = analysis.get("overall", {}).get("platforms", []) + pos = analysis.get("openstack_position", {}) + + if pos: + report.append(f"OpenStack ranks **#{pos.get('rank', '?')} of {pos.get('total', '?')}** platforms by pass rate.") + report.append("") + + # Find best performer for comparison + if platforms: + best = platforms[0] + openstack = next((p for p in platforms if p["platform"] == "OpenStack"), None) + if openstack and best["platform"] != "OpenStack": + gap = best["pass_rate"] - openstack["pass_rate"] + report.append(f"- **Gap to best ({best['platform']}):** {gap:+.1f}%") + if openstack: + report.append(f"- **OpenStack pass rate:** {openstack['pass_rate']:.1f}%") + report.append(f"- **OpenStack job volume:** {openstack['total_runs']:,} runs across {openstack['job_count']} jobs") + report.append("") + + # Overall comparison table + report.append("## Overall Platform Comparison") + report.append("") + report.append("| Rank | Platform | Jobs | Runs | Pass Rate | vs OpenStack |") + report.append("|------|----------|------|------|-----------|--------------|") + + for i, p in enumerate(platforms, 1): + delta = p.get("vs_openstack", 0) + delta_str = f"+{delta:.1f}%" if delta > 0 else (f"{delta:.1f}%" if delta < 0 else "baseline") + runs_str = f"{p['total_runs']:,}" if p['total_runs'] >= 1000 else str(p['total_runs']) + report.append( + f"| {i} | {p['platform']} | {p['job_count']} | {runs_str} | " + f"{p['pass_rate']:.1f}% | {delta_str} |" + ) + report.append("") + + # Key observations + report.append("## Key Observations") + report.append("") + + if platforms: + openstack = next((p for p in platforms if p["platform"] == "OpenStack"), None) + if openstack: + # Calculate how many platforms are better + better = [p for p in platforms if p["pass_rate"] > openstack["pass_rate"]] + worse = [p for p in platforms if p["pass_rate"] < openstack["pass_rate"]] + + if better: + report.append(f"### Platforms with Better Pass Rates ({len(better)})") + report.append("") + for p in better: + gap = p["pass_rate"] - openstack["pass_rate"] + report.append(f"- **{p['platform']}:** {p['pass_rate']:.1f}% (+{gap:.1f}% vs OpenStack)") + report.append("") + + if worse: + report.append(f"### Platforms with Lower Pass Rates ({len(worse)})") + report.append("") + for p in worse: + gap = openstack["pass_rate"] - p["pass_rate"] + report.append(f"- **{p['platform']}:** {p['pass_rate']:.1f}% (-{gap:.1f}% vs OpenStack)") + report.append("") + + # Per-release breakdown + report.append("## Pass Rate by Release") + report.append("") + report.append("| Release | " + " | ".join(TARGET_PLATFORMS) + " |") + report.append("|---------|" + "|".join(["-------"] * len(TARGET_PLATFORMS)) + "|") + + for release in RELEASES: + release_data = analysis.get("by_release", {}).get(release, []) + rates = {} + for p in release_data: + rates[p["platform"]] = p["pass_rate"] + + row = f"| {release} |" + for platform in TARGET_PLATFORMS: + if platform in rates: + row += f" {rates[platform]:.1f}% |" + else: + row += " - |" + report.append(row) + report.append("") + + # Analysis + report.append("## Analysis") + report.append("") + report.append("### Potential Causes for Pass Rate Differences") + report.append("") + report.append("1. **Infrastructure maturity**: Platforms with longer CI history may have more stable infrastructure") + report.append("2. **Test suite differences**: Each platform runs different test subsets") + report.append("3. 
**Job volume**: Higher volume platforms may have more resources/attention") + report.append("4. **Platform complexity**: Some platforms have inherent complexity differences") + report.append("") + + report.append("### Recommendations") + report.append("") + report.append("1. Investigate top-performing platform configurations for applicable improvements") + report.append("2. Compare test failure patterns across platforms") + report.append("3. Review infrastructure provisioning reliability") + report.append("") + + report.append("---") + report.append("") + report.append("*Data Source: [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Platform Comparison Analysis") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load data + data = load_comparison_data() + if not data: + print("Error: No platform comparison data found.") + print("Run fetch_comparison_data.py first.") + sys.exit(1) + + openstack_metrics = load_extended_metrics() + + print(f"Loaded data from: {data.get('fetched_at')}") + print() + + # Analyze + analysis = analyze_platforms(data, openstack_metrics) + + # Save analysis + analysis_path = os.path.join(OUTPUT_DIR, "platform_comparison_analysis.json") + with open(analysis_path, 'w') as f: + json.dump(analysis, f, indent=2) + print(f"Saved: {analysis_path}") + + # Generate report + report = generate_report(analysis) + report_path = os.path.join(OUTPUT_DIR, "platform_comparison_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Print summary + print() + print("=" * 60) + print("Summary:") + platforms = analysis.get("overall", {}).get("platforms", []) + for i, p in enumerate(platforms, 1): + marker = " <-- OpenStack" if p["platform"] == "OpenStack" else "" + print(f" {i}. {p['platform']}: {p['pass_rate']:.1f}%{marker}") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/analyze_redundancy.py b/reporting-toolkit/analyze_redundancy.py new file mode 100644 index 0000000..fb1c92f --- /dev/null +++ b/reporting-toolkit/analyze_redundancy.py @@ -0,0 +1,309 @@ +#!/usr/bin/env python3 +""" +Analyze OpenStack CI jobs for redundancy and consolidation opportunities. + +This script identifies: +1. Duplicate jobs between openshift and openshift-priv organizations +2. Similar tests running on the same code paths +3. Jobs with overlapping functionality +4. Presubmit jobs that could be consolidated +""" + +import argparse +import csv +import json +import sys +from collections import defaultdict +from pathlib import Path + + +def load_inventory(csv_path): + """Load job inventory from CSV.""" + jobs = [] + with open(csv_path, 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + # Convert boolean strings to actual booleans + row['optional'] = row['optional'].lower() == 'true' + row['always_run'] = row['always_run'].lower() == 'true' + jobs.append(row) + return jobs + + +def analyze_same_workflow_same_branch(jobs): + """ + Find cases where multiple jobs in the SAME repo/branch use identical + workflow + cluster_profile combinations. + + These might be testing overlapping functionality and could potentially + be consolidated, though they may have different env vars or test suites. + + NOTE: Jobs existing across different branches is EXPECTED, not redundant. 
+ NOTE: Jobs in openshift/ vs openshift-priv/ are separate GitHub gates, not redundant. + """ + duplicates = [] + + # Group jobs by (org, repo, branch, workflow, cluster_profile) + job_groups = defaultdict(list) + for job in jobs: + if not job['workflow']: + continue + key = ( + job['org'], + job['repo'], + job['branch'], + job['workflow'], + job['cluster_profile'] + ) + job_groups[key].append(job) + + for key, group in job_groups.items(): + if len(group) > 1: + duplicates.append({ + 'org': key[0], + 'repo': key[1], + 'branch': key[2], + 'workflow': key[3], + 'cluster_profile': key[4], + 'job_count': len(group), + 'jobs': [j['job_name'] for j in group], + 'files': list(set(j['config_file'] for j in group)) + }) + + return duplicates + + +def analyze_presubmit_triggers(jobs): + """ + Analyze presubmit job trigger patterns. + Identify jobs that are always_run=true without throttling. + """ + presubmit_jobs = [j for j in jobs if j['job_type'] == 'presubmit'] + + # Group by trigger pattern + always_run_no_throttle = [] + always_run_with_throttle = [] + optional_jobs = [] + conditional_jobs = [] + + for job in presubmit_jobs: + if job['always_run']: + if job['minimum_interval']: + always_run_with_throttle.append(job) + else: + always_run_no_throttle.append(job) + elif job['optional']: + optional_jobs.append(job) + elif job['run_if_changed'] or job['skip_if_only_changed']: + conditional_jobs.append(job) + else: + # Default presubmit (runs on PR but not always) + optional_jobs.append(job) + + return { + 'always_run_no_throttle': always_run_no_throttle, + 'always_run_with_throttle': always_run_with_throttle, + 'optional': optional_jobs, + 'conditional': conditional_jobs, + } + + +def analyze_branch_consistency(jobs): + """ + Find jobs that exist on some branches but not others. + Helps identify inconsistent coverage across releases. + """ + # Group by (org, repo, job_name) + job_groups = defaultdict(set) + for job in jobs: + key = (job['org'], job['repo'], job['job_name']) + job_groups[key].add(job['branch']) + + # Find jobs that have inconsistent branch coverage + inconsistencies = [] + repo_branches = defaultdict(set) + + for job in jobs: + repo_branches[(job['org'], job['repo'])].add(job['branch']) + + for (org, repo, job_name), branches in job_groups.items(): + all_branches = repo_branches[(org, repo)] + missing = all_branches - branches + if missing and len(branches) > 1: + inconsistencies.append({ + 'org': org, + 'repo': repo, + 'job_name': job_name, + 'present_branches': sorted(branches), + 'missing_branches': sorted(missing), + }) + + return inconsistencies + + +def generate_report(jobs, output_file): + """Generate comprehensive redundancy report.""" + report = [] + report.append("# OpenStack CI Job Redundancy Analysis Report\n") + report.append(f"Total jobs analyzed: {len(jobs)}\n") + + report.append("\n## Understanding This Report\n\n") + report.append("**What is NOT redundant:**\n") + report.append("- Jobs existing across different branches (release-4.20, release-4.21, etc.)\n") + report.append("- Jobs in both openshift/ and openshift-priv/ (separate GitHub gates)\n\n") + report.append("**What MAY be redundant:**\n") + report.append("- Multiple jobs in the SAME repo/branch using identical workflow+cluster\n") + report.append("- Jobs with overlapping test coverage\n\n") + + # Same Workflow/Cluster in Same Repo/Branch + report.append("\n## 1. 
Multiple Jobs with Same Workflow+Cluster\n") + report.append("Cases where multiple jobs in the SAME repo/branch use identical\n") + report.append("workflow + cluster_profile combinations.\n\n") + report.append("These MAY be intentional (different test suites, env vars) or\n") + report.append("could potentially be consolidated.\n\n") + + workflow_dups = analyze_same_workflow_same_branch(jobs) + if workflow_dups: + report.append(f"Found {len(workflow_dups)} cases of workflow duplication.\n\n") + report.append("| Org/Repo | Branch | Workflow | Jobs |\n") + report.append("|----------|--------|----------|------|\n") + for dup in sorted(workflow_dups, + key=lambda x: x['job_count'], reverse=True)[:20]: + jobs_str = ', '.join(dup['jobs'][:3]) + if len(dup['jobs']) > 3: + jobs_str += f" (+{len(dup['jobs'])-3} more)" + report.append( + f"| {dup['org']}/{dup['repo']} | {dup['branch']} | " + f"{dup['workflow']} | {jobs_str} |\n" + ) + else: + report.append("No workflow duplications found.\n") + + # Presubmit Trigger Analysis + report.append("\n## 2. Presubmit Trigger Analysis\n") + triggers = analyze_presubmit_triggers(jobs) + + report.append(f"\n### Trigger Pattern Summary\n\n") + report.append("| Pattern | Count | % of Presubmits |\n") + report.append("|---------|-------|------------------|\n") + + total_presubmit = sum(len(v) for v in triggers.values()) + for pattern, jobs_list in triggers.items(): + pct = len(jobs_list) / total_presubmit * 100 if total_presubmit else 0 + report.append(f"| {pattern} | {len(jobs_list)} | {pct:.1f}% |\n") + + # Always run without throttle is concerning + if triggers['always_run_no_throttle']: + report.append("\n### Always Run Jobs Without Throttling\n") + report.append("These run on every PR without minimum_interval.\n\n") + by_repo = defaultdict(list) + for job in triggers['always_run_no_throttle']: + by_repo[(job['org'], job['repo'])].append(job) + + report.append("| Org/Repo | Jobs |\n") + report.append("|----------|------|\n") + for (org, repo), jobs_list in sorted(by_repo.items()): + job_names = ', '.join(set(j['job_name'] for j in jobs_list))[:60] + report.append(f"| {org}/{repo} | {job_names} |\n") + + # Branch Consistency + report.append("\n## 3. Branch Coverage Inconsistencies\n") + report.append("Jobs present on some branches but missing from others.\n\n") + + inconsistencies = analyze_branch_consistency(jobs) + if inconsistencies: + # Filter to significant inconsistencies (missing recent releases) + significant = [i for i in inconsistencies + if any('release-4.2' in b or 'main' in b or 'master' in b + for b in i['missing_branches'])] + + report.append(f"Found {len(significant)} significant inconsistencies.\n\n") + + if significant: + report.append("| Org/Repo | Job | Missing Branches |\n") + report.append("|----------|-----|------------------|\n") + for inc in significant[:30]: + missing = ', '.join(inc['missing_branches'][:3]) + if len(inc['missing_branches']) > 3: + missing += f" (+{len(inc['missing_branches'])-3})" + report.append( + f"| {inc['org']}/{inc['repo']} | {inc['job_name']} | {missing} |\n" + ) + else: + report.append("No significant inconsistencies found.\n") + + # Consolidation Opportunities + report.append("\n## 4. Recommendations\n\n") + + report.append("### Review Items\n\n") + + if workflow_dups: + report.append("1. **Same workflow+cluster jobs**: Review jobs using identical\n") + report.append(" workflow+cluster in the same repo/branch. 
These may have\n") + report.append(" different env vars or test suites, but could potentially\n") + report.append(" be consolidated if testing overlapping functionality.\n") + report.append(f" - Cases to review: {len(workflow_dups)}\n\n") + + report.append("2. **Always-run jobs**: Review jobs marked `always_run: true` " + "without `minimum_interval` throttling.\n") + report.append(f" - Jobs to review: {len(triggers['always_run_no_throttle'])}\n\n") + + report.append("3. **Branch inconsistencies**: Consider adding missing jobs " + "to recent release branches for consistent coverage.\n") + report.append(f" - Inconsistencies found: {len(inconsistencies)}\n") + + # Write report + with open(output_file, 'w', encoding='utf-8') as f: + f.write(''.join(report)) + + print(f"Report written to {output_file}", file=sys.stderr) + + # Also write machine-readable data + json_output = output_file.replace('.md', '_data.json') + with open(json_output, 'w', encoding='utf-8') as f: + json.dump({ + 'same_workflow_same_branch': workflow_dups, + 'trigger_analysis': {k: len(v) for k, v in triggers.items()}, + 'branch_inconsistencies': inconsistencies[:100], + }, f, indent=2) + + print(f"Data written to {json_output}", file=sys.stderr) + + +def main(): + import os + script_dir = os.path.dirname(os.path.abspath(__file__)) + + parser = argparse.ArgumentParser( + description="Analyze OpenStack CI jobs for redundancy" + ) + parser.add_argument( + "--output-dir", + default=script_dir, + help="Directory for input/output files (default: script directory)" + ) + parser.add_argument( + "--inventory", + default="openstack_jobs_inventory.csv", + help="Inventory CSV filename (default: openstack_jobs_inventory.csv)" + ) + + args = parser.parse_args() + + output_dir = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Redundancy Analysis") + print("=" * 60) + print(f"Output directory: {output_dir}") + print() + + inventory_path = os.path.join(output_dir, args.inventory) + output_path = os.path.join(output_dir, "redundant_jobs_report.md") + + jobs = load_inventory(inventory_path) + generate_report(jobs, output_path) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/analyze_triggers.py b/reporting-toolkit/analyze_triggers.py new file mode 100644 index 0000000..02e2799 --- /dev/null +++ b/reporting-toolkit/analyze_triggers.py @@ -0,0 +1,403 @@ +#!/usr/bin/env python3 +""" +Analyze OpenStack CI job triggers for optimization opportunities. + +This script identifies: +1. Jobs missing file-change filters (skip_if_only_changed, run_if_changed) +2. Always-run jobs without throttling +3. Repos that could benefit from smarter triggering +4. 
Recommended patterns for skip_if_only_changed
+"""
+
+import argparse
+import csv
+import json
+import sys
+from collections import defaultdict
+from pathlib import Path
+
+
+# Common patterns for files that typically don't need E2E tests
+SKIP_PATTERNS = {
+    'documentation': [
+        r'^docs/',
+        r'\.md$',
+        r'^README',
+    ],
+    'ownership': [
+        r'(^|/)OWNERS(_ALIASES)?$',
+    ],
+    'github_config': [
+        r'^\.github/',
+    ],
+    'general': [
+        r'^CHANGELOG',
+        r'^LICENSE',
+        r'^DCO',
+        r'^SECURITY\.md$',
+    ],
+}
+
+# Suggested skip pattern for E2E tests
+SUGGESTED_SKIP_PATTERN = r'(^docs/)|(\.md$)|((^|/)OWNERS(_ALIASES)?$)'
+
+
+def load_inventory(csv_path):
+    """Load job inventory from CSV."""
+    jobs = []
+    with open(csv_path, 'r', encoding='utf-8') as f:
+        reader = csv.DictReader(f)
+        for row in reader:
+            row['optional'] = row['optional'].lower() == 'true'
+            row['always_run'] = row['always_run'].lower() == 'true'
+            jobs.append(row)
+    return jobs
+
+
+def analyze_trigger_patterns(jobs):
+    """
+    Analyze the current trigger patterns used across jobs.
+    """
+    patterns = {
+        'has_skip_if_only_changed': [],
+        'has_run_if_changed': [],
+        'has_minimum_interval': [],
+        'always_run_true': [],
+        'optional_true': [],
+        'no_filters': [],  # Jobs with no trigger optimization
+    }
+
+    for job in jobs:
+        if job['skip_if_only_changed']:
+            patterns['has_skip_if_only_changed'].append(job)
+        if job['run_if_changed']:
+            patterns['has_run_if_changed'].append(job)
+        if job['minimum_interval']:
+            patterns['has_minimum_interval'].append(job)
+        if job['always_run']:
+            patterns['always_run_true'].append(job)
+        if job['optional']:
+            patterns['optional_true'].append(job)
+
+        # Jobs that could benefit from trigger optimization
+        if (job['job_type'] == 'presubmit' and
+                not job['skip_if_only_changed'] and
+                not job['run_if_changed'] and
+                not job['optional']):
+            patterns['no_filters'].append(job)
+
+    return patterns
+
+
+def group_jobs_by_repo(jobs):
+    """Group jobs by org/repo for analysis."""
+    repos = defaultdict(list)
+    for job in jobs:
+        key = (job['org'], job['repo'])
+        repos[key].append(job)
+    return repos
+
+
+def analyze_repo_trigger_status(repos):
+    """
+    For each repo, determine if it would benefit from skip_if_only_changed.
+    """
+    repo_analysis = []
+
+    for (org, repo), jobs in repos.items():
+        presubmit_jobs = [j for j in jobs if j['job_type'] == 'presubmit']
+
+        if not presubmit_jobs:
+            continue
+
+        # Count jobs with/without filters
+        with_skip = len([j for j in presubmit_jobs if j['skip_if_only_changed']])
+        with_run_if = len([j for j in presubmit_jobs if j['run_if_changed']])
+        optional = len([j for j in presubmit_jobs if j['optional']])
+        always_run = len([j for j in presubmit_jobs if j['always_run']])
+        no_filter = len([j for j in presubmit_jobs
+                         if not j['skip_if_only_changed']
+                         and not j['run_if_changed']
+                         and not j['optional']])
+
+        # Determine if repo could benefit
+        could_benefit = no_filter > 0 and with_skip == 0
+
+        repo_analysis.append({
+            'org': org,
+            'repo': repo,
+            'total_presubmit': len(presubmit_jobs),
+            'with_skip_pattern': with_skip,
+            'with_run_if_changed': with_run_if,
+            'optional': optional,
+            'always_run': always_run,
+            'no_filter': no_filter,
+            'could_benefit': could_benefit,
+            'job_names': sorted(set(j['job_name'] for j in presubmit_jobs)),
+        })
+
+    return repo_analysis
+
+
+def analyze_always_run_jobs(jobs):
+    """
+    Find jobs that are always_run=true without throttling.
+    These run on every PR and should be reviewed.
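+
+    A sketch of the return shape, using hypothetical, abridged job rows
+    (the split key is minimum_interval):
+
+        {"with_throttle":    [{"job_name": "e2e-openstack", "minimum_interval": "24h"}],
+         "without_throttle": [{"job_name": "e2e-openstack-serial", "minimum_interval": ""}]}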
+ """ + always_run_jobs = [j for j in jobs + if j['always_run'] and j['job_type'] == 'presubmit'] + + # Group by whether they have minimum_interval + with_throttle = [j for j in always_run_jobs if j['minimum_interval']] + without_throttle = [j for j in always_run_jobs if not j['minimum_interval']] + + return { + 'with_throttle': with_throttle, + 'without_throttle': without_throttle, + } + + +def analyze_periodic_schedules(jobs): + """ + Analyze periodic job schedules for optimization. + """ + periodic_jobs = [j for j in jobs if j['job_type'] == 'periodic'] + + # Group by schedule pattern + schedules = defaultdict(list) + for job in periodic_jobs: + schedules[job['schedule']].append(job) + + return schedules + + +def generate_report(jobs, output_file): + """Generate comprehensive trigger optimization report.""" + report = [] + report.append("# OpenStack CI Trigger Optimization Report\n") + report.append(f"Total jobs analyzed: {len(jobs)}\n") + + presubmit_jobs = [j for j in jobs if j['job_type'] == 'presubmit'] + periodic_jobs = [j for j in jobs if j['job_type'] == 'periodic'] + + report.append(f"- Presubmit jobs: {len(presubmit_jobs)}\n") + report.append(f"- Periodic jobs: {len(periodic_jobs)}\n") + + # Trigger Pattern Analysis + report.append("\n## 1. Current Trigger Pattern Usage\n\n") + patterns = analyze_trigger_patterns(jobs) + + report.append("| Pattern | Count | % of Presubmits |\n") + report.append("|---------|-------|------------------|\n") + + total_pre = len(presubmit_jobs) + for pattern, pattern_jobs in patterns.items(): + count = len([j for j in pattern_jobs if j['job_type'] == 'presubmit']) + pct = count / total_pre * 100 if total_pre else 0 + report.append(f"| {pattern} | {count} | {pct:.1f}% |\n") + + # Jobs Missing Filters + report.append("\n## 2. Jobs Without Trigger Optimization\n") + report.append("Presubmit jobs without skip_if_only_changed, " + "run_if_changed, or optional flags.\n\n") + + no_filter_jobs = patterns['no_filters'] + if no_filter_jobs: + # Group by repo + by_repo = defaultdict(list) + for job in no_filter_jobs: + by_repo[(job['org'], job['repo'])].append(job) + + report.append(f"Found {len(no_filter_jobs)} jobs across " + f"{len(by_repo)} repositories that could benefit from " + f"trigger optimization.\n\n") + + report.append("| Org/Repo | Jobs Without Filters | Job Names |\n") + report.append("|----------|----------------------|-----------|\n") + + for (org, repo), repo_jobs in sorted(by_repo.items(), + key=lambda x: len(x[1]), + reverse=True)[:20]: + names = ', '.join(set(j['job_name'] for j in repo_jobs))[:50] + if len(names) >= 50: + names += "..." + report.append(f"| {org}/{repo} | {len(repo_jobs)} | {names} |\n") + else: + report.append("All presubmit jobs have some form of trigger optimization.\n") + + # Repository Analysis + report.append("\n## 3. 
Repository Trigger Analysis\n") + report.append("Repositories that could benefit from adding " + "`skip_if_only_changed` patterns.\n\n") + + repos = group_jobs_by_repo(jobs) + repo_analysis = analyze_repo_trigger_status(repos) + + # Filter to repos that could benefit + could_benefit = [r for r in repo_analysis if r['could_benefit']] + + if could_benefit: + report.append(f"Found {len(could_benefit)} repositories that could " + f"add skip patterns.\n\n") + + report.append("| Org/Repo | Presubmits | No Filter | Suggested Action |\n") + report.append("|----------|------------|-----------|------------------|\n") + + for repo in sorted(could_benefit, + key=lambda x: x['no_filter'], reverse=True)[:25]: + action = f"Add skip_if_only_changed to {repo['no_filter']} jobs" + report.append( + f"| {repo['org']}/{repo['repo']} | {repo['total_presubmit']} | " + f"{repo['no_filter']} | {action} |\n" + ) + else: + report.append("All repositories have adequate trigger patterns.\n") + + # Suggested Skip Pattern + report.append("\n## 4. Recommended skip_if_only_changed Patterns\n\n") + report.append("For OpenStack E2E tests, we recommend:\n\n") + report.append("```yaml\n") + report.append("skip_if_only_changed: ") + report.append(f"{SUGGESTED_SKIP_PATTERN}\n") + report.append("```\n\n") + + report.append("This pattern skips the job when changes only affect:\n") + report.append("- Documentation files (`docs/` directory)\n") + report.append("- Markdown files (`*.md`)\n") + report.append("- OWNERS files\n\n") + + report.append("### Individual Component Patterns\n\n") + for category, patterns_list in SKIP_PATTERNS.items(): + report.append(f"**{category}:**\n") + for p in patterns_list: + report.append(f"- `{p}`\n") + report.append("\n") + + # Periodic Schedule Analysis + report.append("\n## 5. Periodic Job Schedule Analysis\n\n") + schedules = analyze_periodic_schedules(jobs) + + if schedules: + report.append("| Schedule | Jobs | Examples |\n") + report.append("|----------|------|----------|\n") + + for schedule, sched_jobs in sorted(schedules.items()): + examples = ', '.join(set(j['job_name'] for j in sched_jobs))[:40] + if len(examples) >= 40: + examples += "..." + report.append(f"| {schedule} | {len(sched_jobs)} | {examples} |\n") + else: + report.append("No periodic jobs found.\n") + + # Optimization Recommendations + report.append("\n## 6. Optimization Recommendations\n\n") + + report.append("### High Impact\n\n") + + if could_benefit: + report.append(f"1. **Add skip_if_only_changed to {len(could_benefit)} repos**: " + f"Approximately {sum(r['no_filter'] for r in could_benefit)} jobs " + f"could skip runs on docs-only PRs.\n\n") + + # Calculate potential savings + total_no_filter = len(patterns['no_filters']) + report.append(f"2. **Total presubmit jobs without filters**: {total_no_filter}\n") + report.append(" These jobs run on every non-optional PR regardless of " + "which files changed.\n\n") + + report.append("### Medium Impact\n\n") + report.append("3. **Review always_run jobs**: Ensure jobs marked `always_run: true` " + "are truly required for every PR.\n\n") + + report.append("4. **Add minimum_interval to high-frequency jobs**: " + "Throttle jobs that don't need to run on every commit.\n\n") + + report.append("### Implementation Steps\n\n") + report.append("1. 
For each repo without skip patterns:\n")
+    report.append("   - Identify which test jobs are full E2E (vs unit tests)\n")
+    report.append("   - Add `skip_if_only_changed` to E2E tests\n")
+    report.append("   - Keep unit tests running on all changes\n\n")
+
+    report.append("2. Example config change:\n")
+    report.append("```yaml\n")
+    report.append("tests:\n")
+    report.append("- as: e2e-openstack\n")
+    report.append("  skip_if_only_changed: (^docs/)|(\\.md$)|((^|/)OWNERS$)\n")
+    report.append("  steps:\n")
+    report.append("    cluster_profile: openstack-vexxhost\n")
+    report.append("    workflow: openshift-e2e-openstack-ipi\n")
+    report.append("```\n")
+
+    # Write report
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(''.join(report))
+
+    print(f"Report written to {output_file}", file=sys.stderr)
+
+    # Also write machine-readable data
+    json_output = output_file.replace('.md', '_data.json')
+    with open(json_output, 'w', encoding='utf-8') as f:
+        json.dump({
+            'trigger_patterns': {k: len(v) for k, v in patterns.items()},
+            'repos_without_skip': [
+                {
+                    'org': r['org'],
+                    'repo': r['repo'],
+                    'jobs_without_filter': r['no_filter'],
+                    'job_names': r['job_names'],
+                }
+                for r in could_benefit
+            ],
+            'jobs_without_filter': [
+                {
+                    'org': j['org'],
+                    'repo': j['repo'],
+                    'branch': j['branch'],
+                    'job_name': j['job_name'],
+                }
+                for j in patterns['no_filters']
+            ],
+            'periodic_schedules': {k: len(v) for k, v in schedules.items()},
+            'suggested_pattern': SUGGESTED_SKIP_PATTERN,
+        }, f, indent=2)
+
+    print(f"Data written to {json_output}", file=sys.stderr)
+
+
+def main():
+    import os
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+
+    parser = argparse.ArgumentParser(
+        description="Analyze OpenStack CI job triggers for optimization"
+    )
+    parser.add_argument(
+        "--output-dir",
+        default=script_dir,
+        help="Directory for input/output files (default: script directory)"
+    )
+    parser.add_argument(
+        "--inventory",
+        default="openstack_jobs_inventory.csv",
+        help="Inventory CSV filename (default: openstack_jobs_inventory.csv)"
+    )
+
+    args = parser.parse_args()
+
+    output_dir = os.path.abspath(args.output_dir)
+
+    print("=" * 60)
+    print("OpenStack CI Trigger Optimization Analysis")
+    print("=" * 60)
+    print(f"Output directory: {output_dir}")
+    print()
+
+    inventory_path = os.path.join(output_dir, args.inventory)
+    output_path = os.path.join(output_dir, "trigger_optimization_report.md")
+
+    jobs = load_inventory(inventory_path)
+    generate_report(jobs, output_path)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/reporting-toolkit/analyze_workflow_passrate.py b/reporting-toolkit/analyze_workflow_passrate.py
new file mode 100644
index 0000000..5d26a60
--- /dev/null
+++ b/reporting-toolkit/analyze_workflow_passrate.py
@@ -0,0 +1,435 @@
+#!/usr/bin/env python3
+"""
+Analyze workflow pass rates by correlating job inventory with Sippy metrics.
+Maps inventory job names to Sippy data using substring matching.
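+
+Matching sketch (names are illustrative, not real jobs): an inventory entry
+"e2e-openstack-serial" is matched to the Sippy job
+"periodic-ci-openshift-release-master-nightly-4.20-e2e-openstack-serial"
+because the inventory name occurs as a substring of the full prow job name;
+the first Sippy job that matches wins.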
+""" + +import argparse +import json +import os +import sys +from datetime import datetime +from collections import defaultdict + +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Analyze workflow pass rates" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + + +def load_job_inventory(): + """Load job inventory.""" + filepath = os.path.join(OUTPUT_DIR, "openstack_jobs_inventory.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_sippy_data(): + """Load Sippy job data.""" + filepath = os.path.join(OUTPUT_DIR, "sippy_jobs_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_extended_metrics_jobs(): + """Load extended metrics per job.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics_jobs.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def extract_workflow_from_name(job_name): + """Extract workflow pattern from job name.""" + # Common workflow patterns in OpenStack job names + patterns = [ + "openshift-e2e-openstack-ipi", + "openshift-e2e-openstack-upi", + "openshift-upgrade-openstack", + "openshift-e2e-openstack", + "openshift-installer-openstack", + ] + + # Check for specific test scenarios in name + name_lower = job_name.lower() + + # Extract key test characteristics + characteristics = [] + if "serial" in name_lower: + characteristics.append("serial") + if "parallel" in name_lower: + characteristics.append("parallel") + if "fips" in name_lower: + characteristics.append("fips") + if "proxy" in name_lower: + characteristics.append("proxy") + if "dualstack" in name_lower: + characteristics.append("dualstack") + if "singlestackv6" in name_lower or "single-stack-v6" in name_lower: + characteristics.append("singlestackv6") + if "upgrade" in name_lower: + characteristics.append("upgrade") + if "nfv" in name_lower: + characteristics.append("nfv") + if "hwoffload" in name_lower: + characteristics.append("hwoffload") + if "ccpmso" in name_lower: + characteristics.append("ccpmso") + if "csi" in name_lower: + characteristics.append("csi") + if "manila" in name_lower: + characteristics.append("manila") + if "cinder" in name_lower: + characteristics.append("cinder") + if "externallb" in name_lower: + characteristics.append("externallb") + if "kuryr" in name_lower: + characteristics.append("kuryr") + if "hypershift" in name_lower: + characteristics.append("hypershift") + if "techpreview" in name_lower: + characteristics.append("techpreview") + if "etcd" in name_lower: + characteristics.append("etcd") + + if characteristics: + return "-".join(sorted(characteristics)) + return "e2e-default" + + +def correlate_jobs(inventory, sippy_data, extended_jobs): + """Correlate inventory jobs with Sippy data.""" + # Build Sippy job lookup by name + sippy_lookup = {} + for release, jobs in sippy_data.get("jobs_by_release", {}).items(): + for job in jobs: + name = job.get("name", "") + sippy_lookup[name] = { + "release": release, + "current_runs": job.get("current_runs", 0), + "current_passes": job.get("current_passes", 0), + "previous_runs": job.get("previous_runs", 0), + "previous_passes": 
job.get("previous_passes", 0), + "pass_rate": job.get("current_pass_percentage", 0), + } + + # Build extended metrics lookup + extended_lookup = {} + if extended_jobs: + for job in extended_jobs: + name = job.get("name", "") + extended_lookup[name] = job + + # Group inventory jobs by workflow + workflow_jobs = defaultdict(list) + + for inv_job in inventory: + job_name = inv_job.get("job_name", "") + workflow = inv_job.get("workflow", "") or extract_workflow_from_name(job_name) + job_type = inv_job.get("job_type", "") + + # Only analyze periodic jobs (which have Sippy data) + if job_type != "periodic": + continue + + # Try to find matching Sippy job + sippy_match = None + extended_match = None + + # Look for exact or partial match + for sippy_name, sippy_job in sippy_lookup.items(): + # Check if inventory job name is in Sippy job name or vice versa + if job_name in sippy_name or sippy_name.endswith(job_name): + sippy_match = sippy_job + extended_match = extended_lookup.get(sippy_name) + break + + job_info = { + "job_name": job_name, + "workflow": workflow, + "cluster_profile": inv_job.get("cluster_profile", ""), + "org": inv_job.get("org", ""), + "repo": inv_job.get("repo", ""), + "branch": inv_job.get("branch", ""), + "has_sippy_data": sippy_match is not None, + } + + if sippy_match: + job_info.update({ + "release": sippy_match.get("release", ""), + "current_runs": sippy_match.get("current_runs", 0), + "current_passes": sippy_match.get("current_passes", 0), + "previous_runs": sippy_match.get("previous_runs", 0), + "previous_passes": sippy_match.get("previous_passes", 0), + "pass_rate": sippy_match.get("pass_rate", 0), + }) + if extended_match: + job_info["combined_runs"] = extended_match.get("combined_runs", 0) + job_info["combined_pass_rate"] = extended_match.get("combined_pass_rate", 0) + job_info["trend"] = extended_match.get("trend", "") + + # Extract scenario from job name + scenario = extract_workflow_from_name(job_name) + workflow_jobs[scenario].append(job_info) + + return workflow_jobs + + +def analyze_workflows(workflow_jobs): + """Analyze pass rates by workflow.""" + results = { + "generated": datetime.now().isoformat(), + "workflows": [], + "summary": {}, + } + + workflow_stats = [] + + for workflow, jobs in workflow_jobs.items(): + jobs_with_data = [j for j in jobs if j.get("has_sippy_data")] + + if not jobs_with_data: + continue + + total_runs = sum(j.get("current_runs", 0) + j.get("previous_runs", 0) for j in jobs_with_data) + total_passes = sum(j.get("current_passes", 0) + j.get("previous_passes", 0) for j in jobs_with_data) + pass_rate = (total_passes / total_runs * 100) if total_runs > 0 else 0 + + # Count problem jobs + problem_jobs = [j for j in jobs_with_data if j.get("pass_rate", 100) < 80] + + # Calculate trend + improving = sum(1 for j in jobs_with_data if j.get("trend") == "improving") + degrading = sum(1 for j in jobs_with_data if j.get("trend") == "degrading") + + trend = "stable" + if improving > degrading and improving > 0: + trend = "improving" + elif degrading > improving and degrading > 0: + trend = "degrading" + + # Determine severity + severity = "ok" + if pass_rate < 50: + severity = "critical" + elif pass_rate < 70: + severity = "warning" + elif pass_rate < 80: + severity = "needs_attention" + + workflow_stats.append({ + "workflow": workflow, + "job_count": len(jobs_with_data), + "total_runs": total_runs, + "total_passes": total_passes, + "pass_rate": pass_rate, + "problem_job_count": len(problem_jobs), + "trend": trend, + "severity": severity, + "jobs": 
jobs_with_data, + }) + + # Sort by pass rate (lowest first = most problematic) + workflow_stats.sort(key=lambda x: x["pass_rate"]) + + results["workflows"] = workflow_stats + + # Summary + total_workflows = len(workflow_stats) + critical = sum(1 for w in workflow_stats if w["severity"] == "critical") + warning = sum(1 for w in workflow_stats if w["severity"] == "warning") + + results["summary"] = { + "total_workflows_analyzed": total_workflows, + "critical_workflows": critical, + "warning_workflows": warning, + "ok_workflows": total_workflows - critical - warning, + } + + return results + + +def generate_report(analysis): + """Generate markdown report for workflow analysis.""" + report = [] + report.append("# Workflow Pass Rate Analysis") + report.append("") + report.append(f"**Generated:** {analysis['generated']}") + report.append("") + report.append("This report analyzes pass rates grouped by test workflow/scenario type.") + report.append("") + + # Summary + summary = analysis.get("summary", {}) + report.append("## Summary") + report.append("") + report.append(f"| Metric | Count |") + report.append(f"|--------|-------|") + report.append(f"| Total Workflows Analyzed | {summary.get('total_workflows_analyzed', 0)} |") + report.append(f"| Critical (<50% pass rate) | {summary.get('critical_workflows', 0)} |") + report.append(f"| Warning (50-70% pass rate) | {summary.get('warning_workflows', 0)} |") + report.append(f"| OK (>70% pass rate) | {summary.get('ok_workflows', 0)} |") + report.append("") + + workflows = analysis.get("workflows", []) + + # Critical workflows + critical = [w for w in workflows if w["severity"] == "critical"] + if critical: + report.append("## Critical Workflows (Pass Rate < 50%)") + report.append("") + report.append("These workflows require immediate attention:") + report.append("") + report.append("| Workflow | Jobs | Runs | Pass Rate | Trend |") + report.append("|----------|------|------|-----------|-------|") + for w in critical: + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(w["trend"], "") + report.append( + f"| {w['workflow']} | {w['job_count']} | {w['total_runs']} | " + f"**{w['pass_rate']:.1f}%** | {trend_icon} |" + ) + report.append("") + + # Warning workflows + warning = [w for w in workflows if w["severity"] == "warning"] + if warning: + report.append("## Warning Workflows (Pass Rate 50-70%)") + report.append("") + report.append("| Workflow | Jobs | Runs | Pass Rate | Trend |") + report.append("|----------|------|------|-----------|-------|") + for w in warning: + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(w["trend"], "") + report.append( + f"| {w['workflow']} | {w['job_count']} | {w['total_runs']} | " + f"{w['pass_rate']:.1f}% | {trend_icon} |" + ) + report.append("") + + # All workflows table + report.append("## All Workflows by Pass Rate") + report.append("") + report.append("| Rank | Workflow | Jobs | Runs | Pass Rate | Problems | Trend |") + report.append("|------|----------|------|------|-----------|----------|-------|") + for i, w in enumerate(workflows, 1): + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(w["trend"], "") + severity_marker = "" + if w["severity"] == "critical": + severity_marker = " ⚠️" + elif w["severity"] == "warning": + severity_marker = " ⚡" + report.append( + f"| {i} | {w['workflow']}{severity_marker} | {w['job_count']} | " + f"{w['total_runs']} | {w['pass_rate']:.1f}% | {w['problem_job_count']} | {trend_icon} |" + ) + report.append("") + + # Recommendations + 
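+    # Severity labels used below mirror analyze_workflows(): "critical" is a
+    # combined pass rate under 50%, "warning" under 70%; only the five worst
+    # workflows in each bucket get an explicit action item.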
report.append("## Recommendations") + report.append("") + if critical: + report.append("### Immediate Actions") + report.append("") + for w in critical[:5]: + report.append(f"- **{w['workflow']}**: {w['pass_rate']:.1f}% pass rate with {w['total_runs']} runs - investigate root cause") + report.append("") + + if warning: + report.append("### Short-term Improvements") + report.append("") + for w in warning[:5]: + report.append(f"- **{w['workflow']}**: {w['pass_rate']:.1f}% pass rate - monitor and triage failures") + report.append("") + + report.append("---") + report.append("") + report.append("*Data Sources: Job inventory + [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Workflow Pass Rate Analysis") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load data + inventory = load_job_inventory() + if not inventory: + print("Error: No job inventory found. Run extract_openstack_jobs.py first.") + sys.exit(1) + print(f"Loaded inventory: {len(inventory)} jobs") + + sippy_data = load_sippy_data() + if not sippy_data: + print("Error: No Sippy data found. Run fetch_job_metrics.py first.") + sys.exit(1) + print(f"Loaded Sippy data from: {sippy_data.get('fetched_at')}") + + extended_jobs = load_extended_metrics_jobs() + print(f"Extended metrics loaded: {extended_jobs is not None}") + print() + + # Correlate and analyze + workflow_jobs = correlate_jobs(inventory, sippy_data, extended_jobs) + print(f"Found {len(workflow_jobs)} workflow types") + + analysis = analyze_workflows(workflow_jobs) + + # Save results + analysis_path = os.path.join(OUTPUT_DIR, "workflow_passrate_analysis.json") + with open(analysis_path, 'w') as f: + # Remove job details for smaller output + save_analysis = dict(analysis) + save_analysis["workflows"] = [ + {k: v for k, v in w.items() if k != "jobs"} + for w in analysis["workflows"] + ] + json.dump(save_analysis, f, indent=2) + print(f"Saved: {analysis_path}") + + # Generate report + report = generate_report(analysis) + report_path = os.path.join(OUTPUT_DIR, "workflow_passrate_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Print summary + print() + print("=" * 60) + print("Summary:") + summary = analysis.get("summary", {}) + print(f" Workflows analyzed: {summary.get('total_workflows_analyzed', 0)}") + print(f" Critical (<50%): {summary.get('critical_workflows', 0)}") + print(f" Warning (50-70%): {summary.get('warning_workflows', 0)}") + print(f" OK (>70%): {summary.get('ok_workflows', 0)}") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/categorize_failures.py b/reporting-toolkit/categorize_failures.py new file mode 100644 index 0000000..569c3f8 --- /dev/null +++ b/reporting-toolkit/categorize_failures.py @@ -0,0 +1,417 @@ +#!/usr/bin/env python3 +""" +Categorize job failures using heuristic classification. 
+Categories: Infrastructure, Flaky, Product Bug, Unknown/Needs Triage +""" + +import argparse +import json +import os +import sys +from datetime import datetime +from collections import defaultdict + +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Categorize job failures using heuristic classification" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + + +def load_extended_metrics(): + """Load extended metrics data.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_extended_jobs(): + """Load extended metrics per job.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics_jobs.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_sippy_data(): + """Load raw Sippy data for additional context.""" + filepath = os.path.join(OUTPUT_DIR, "sippy_jobs_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def categorize_job(job): + """ + Categorize a job failure based on heuristics. + + Categories: + - infrastructure: Likely infrastructure/provisioning issues + - flaky: Inconsistent pass rates (30-70%) + - product_bug: Consistent failures with bugs filed + - needs_triage: Unknown, requires investigation + """ + name = job.get("name", "").lower() + brief_name = job.get("brief_name", "").lower() + combined_rate = job.get("combined_pass_rate") + current_rate = job.get("current_pass_rate") + open_bugs = job.get("open_bugs", 0) + combined_runs = job.get("combined_runs", 0) + trend = job.get("trend", "") + + # Skip jobs with no data + if combined_rate is None or combined_runs < 2: + return None, "insufficient_data" + + # Jobs at or above 80% are not problem jobs + if combined_rate >= 80: + return None, "passing" + + # Category determination heuristics + + # 1. Product Bug: 0% pass rate or very low with bugs filed + if combined_rate == 0: + if open_bugs > 0: + return "product_bug", "0% pass rate with filed bugs" + else: + return "needs_triage", "0% pass rate, no bugs filed" + + # 2. Infrastructure indicators + infra_keywords = [ + "install", "provision", "bootstrap", "create", + "vpc", "network", "dns", "loadbalancer", "lb", + ] + is_infra_job = any(kw in name or kw in brief_name for kw in infra_keywords) + + if combined_rate < 30 and is_infra_job: + return "infrastructure", "Low pass rate on infrastructure-related job" + + # 3. Flaky: 30-70% pass rate (inconsistent) + if 30 <= combined_rate < 70: + if trend == "degrading": + return "flaky", "Inconsistent pass rate, trending worse" + elif trend == "improving": + return "flaky", "Inconsistent pass rate, trending better" + else: + return "flaky", "Inconsistent pass rate (30-70%)" + + # 4. Product Bug: Low pass rate with bugs + if combined_rate < 50 and open_bugs > 0: + return "product_bug", f"Low pass rate with {open_bugs} open bug(s)" + + # 5. 
Check for specific failure patterns in job name + if "etcd" in name or "scaling" in name: + return "product_bug", "Known problematic component" + + if "techpreview" in name: + return "needs_triage", "Tech preview feature - expected instability" + + # 6. Very low rate without bugs = needs investigation + if combined_rate < 30: + return "needs_triage", "Very low pass rate, needs investigation" + + # 7. Moderate failures (70-80%) + if 70 <= combined_rate < 80: + if trend == "degrading": + return "needs_triage", "Recently degraded, needs investigation" + else: + return "flaky", "Borderline pass rate" + + # Default + return "needs_triage", "Uncategorized failure" + + +def categorize_all_jobs(extended_jobs, sippy_data): + """Categorize all problem jobs.""" + results = { + "generated": datetime.now().isoformat(), + "categories": { + "infrastructure": [], + "flaky": [], + "product_bug": [], + "needs_triage": [], + }, + "summary": {}, + "by_release": defaultdict(lambda: defaultdict(list)), + } + + # Build Sippy lookup for additional context + sippy_bugs = {} + if sippy_data: + for release, jobs in sippy_data.get("jobs_by_release", {}).items(): + for job in jobs: + sippy_bugs[job.get("name", "")] = job.get("open_bugs", 0) + + # Categorize each job + for job in extended_jobs: + # Ensure we have bug info + if job.get("open_bugs") is None and job.get("name") in sippy_bugs: + job["open_bugs"] = sippy_bugs[job.get("name")] + + category, reason = categorize_job(job) + + if category is None: + continue + + job_info = { + "release": job.get("release", ""), + "name": job.get("name", ""), + "brief_name": job.get("brief_name", ""), + "combined_runs": job.get("combined_runs", 0), + "combined_pass_rate": job.get("combined_pass_rate"), + "current_pass_rate": job.get("current_pass_rate"), + "open_bugs": job.get("open_bugs", 0), + "trend": job.get("trend", ""), + "reason": reason, + } + + results["categories"][category].append(job_info) + results["by_release"][job.get("release", "")][category].append(job_info) + + # Sort each category by pass rate + for category in results["categories"]: + results["categories"][category].sort( + key=lambda x: x.get("combined_pass_rate") or 0 + ) + + # Summary statistics + total_problems = sum(len(jobs) for jobs in results["categories"].values()) + results["summary"] = { + "total_problem_jobs": total_problems, + "by_category": { + cat: len(jobs) for cat, jobs in results["categories"].items() + }, + "percentages": {}, + } + + if total_problems > 0: + for cat, count in results["summary"]["by_category"].items(): + results["summary"]["percentages"][cat] = round(count / total_problems * 100, 1) + + return results + + +def generate_report(analysis): + """Generate markdown report for failure categorization.""" + report = [] + report.append("# Failure Categorization Report") + report.append("") + report.append(f"**Generated:** {analysis['generated']}") + report.append("") + report.append("Jobs with pass rate below 80% are categorized by likely root cause.") + report.append("") + + # Summary + summary = analysis.get("summary", {}) + report.append("## Summary") + report.append("") + report.append(f"**Total Problem Jobs:** {summary.get('total_problem_jobs', 0)}") + report.append("") + report.append("| Category | Count | Percentage | Description |") + report.append("|----------|-------|------------|-------------|") + + category_descriptions = { + "infrastructure": "Provisioning/infra failures", + "flaky": "Inconsistent (30-70% pass rate)", + "product_bug": "Known bugs filed", + "needs_triage": 
"Requires investigation", + } + + by_cat = summary.get("by_category", {}) + percentages = summary.get("percentages", {}) + for cat in ["infrastructure", "flaky", "product_bug", "needs_triage"]: + count = by_cat.get(cat, 0) + pct = percentages.get(cat, 0) + desc = category_descriptions.get(cat, "") + report.append(f"| {cat.replace('_', ' ').title()} | {count} | {pct}% | {desc} |") + report.append("") + + # Category breakdowns + categories = analysis.get("categories", {}) + + # Infrastructure issues + infra = categories.get("infrastructure", []) + if infra: + report.append("## Infrastructure Issues") + report.append("") + report.append("Jobs likely failing due to OpenStack provisioning or infrastructure problems:") + report.append("") + report.append("| Release | Job | Pass Rate | Runs | Reason |") + report.append("|---------|-----|-----------|------|--------|") + for job in infra[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{job['combined_runs']} | {job['reason'][:30]} |" + ) + if len(infra) > 15: + report.append(f"| ... | *{len(infra) - 15} more* | | | |") + report.append("") + + # Flaky jobs + flaky = categories.get("flaky", []) + if flaky: + report.append("## Flaky Jobs") + report.append("") + report.append("Jobs with inconsistent pass rates (30-70%) indicating test or timing issues:") + report.append("") + report.append("| Release | Job | Pass Rate | Trend | Runs |") + report.append("|---------|-----|-----------|-------|------|") + for job in flaky[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(job["trend"], "") + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{trend_icon} | {job['combined_runs']} |" + ) + if len(flaky) > 15: + report.append(f"| ... | *{len(flaky) - 15} more* | | | |") + report.append("") + + # Product bugs + bugs = categories.get("product_bug", []) + if bugs: + report.append("## Product Bugs") + report.append("") + report.append("Jobs with known bugs filed - track via bug system:") + report.append("") + report.append("| Release | Job | Pass Rate | Open Bugs | Runs |") + report.append("|---------|-----|-----------|-----------|------|") + for job in bugs[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{job['open_bugs']} | {job['combined_runs']} |" + ) + if len(bugs) > 15: + report.append(f"| ... | *{len(bugs) - 15} more* | | | |") + report.append("") + + # Needs triage + triage = categories.get("needs_triage", []) + if triage: + report.append("## Needs Triage") + report.append("") + report.append("Jobs requiring investigation to determine root cause:") + report.append("") + report.append("| Release | Job | Pass Rate | Runs | Reason |") + report.append("|---------|-----|-----------|------|--------|") + for job in triage[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{job['combined_runs']} | {job['reason'][:30]} |" + ) + if len(triage) > 15: + report.append(f"| ... 
| *{len(triage) - 15} more* | | | |") + report.append("") + + # Recommendations + report.append("## Recommended Actions by Category") + report.append("") + report.append("### Infrastructure") + report.append("- Review OpenStack cloud health and quotas") + report.append("- Check for recurring provisioning failures") + report.append("- Validate network and DNS configuration") + report.append("") + report.append("### Flaky") + report.append("- Analyze test logs for timing-related failures") + report.append("- Consider adding retries for known flaky operations") + report.append("- Investigate environmental dependencies") + report.append("") + report.append("### Product Bug") + report.append("- Track existing bugs to resolution") + report.append("- Prioritize bugs blocking multiple jobs") + report.append("- Consider disabling jobs until bug is fixed") + report.append("") + report.append("### Needs Triage") + report.append("- Review recent job logs to identify patterns") + report.append("- File bugs with failure details") + report.append("- Categorize after investigation") + report.append("") + + report.append("---") + report.append("") + report.append("*Classification based on heuristics - manual review recommended*") + report.append("*Data Source: [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Failure Categorization") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load data + extended_jobs = load_extended_jobs() + if not extended_jobs: + print("Error: No extended metrics jobs data found.") + print("Run fetch_extended_metrics.py first.") + sys.exit(1) + print(f"Loaded {len(extended_jobs)} jobs") + + sippy_data = load_sippy_data() + print(f"Sippy data loaded: {sippy_data is not None}") + print() + + # Categorize + analysis = categorize_all_jobs(extended_jobs, sippy_data) + + # Convert defaultdict to regular dict for JSON serialization + analysis["by_release"] = {k: dict(v) for k, v in analysis["by_release"].items()} + + # Save results + analysis_path = os.path.join(OUTPUT_DIR, "failure_categories.json") + with open(analysis_path, 'w') as f: + json.dump(analysis, f, indent=2) + print(f"Saved: {analysis_path}") + + # Generate report + report = generate_report(analysis) + report_path = os.path.join(OUTPUT_DIR, "failure_categories_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Print summary + print() + print("=" * 60) + print("Summary:") + summary = analysis.get("summary", {}) + print(f" Total problem jobs: {summary.get('total_problem_jobs', 0)}") + for cat, count in summary.get("by_category", {}).items(): + pct = summary.get("percentages", {}).get(cat, 0) + print(f" {cat}: {count} ({pct}%)") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/extract_openstack_jobs.py b/reporting-toolkit/extract_openstack_jobs.py new file mode 100644 index 0000000..8dbc201 --- /dev/null +++ b/reporting-toolkit/extract_openstack_jobs.py @@ -0,0 +1,345 @@ +#!/usr/bin/env python3 +""" +Extract all OpenStack CI jobs from ci-operator/config files. + +This script parses CI configuration files and extracts job information +for tests using OpenStack cluster profiles. 
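+
+A matching test entry in a config file looks roughly like this (abridged,
+values illustrative):
+
+    tests:
+    - as: e2e-openstack
+      steps:
+        cluster_profile: openstack-vexxhost
+        workflow: openshift-e2e-openstack-ipi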
+ +Target cluster profiles: +- openstack-vexxhost +- openstack-vh-mecha-central +- openstack-vh-mecha-az0 +- openstack-vh-bm-rhos +- openstack-hwoffload +- openstack-nfv +""" + +import argparse +import csv +import json +import os +import re +import sys +from pathlib import Path + +try: + import yaml +except ImportError: + print("Error: PyYAML is required. Install with: pip install pyyaml", file=sys.stderr) + sys.exit(1) + + +# Target OpenStack cluster profiles +OPENSTACK_PROFILES = [ + "openstack-vexxhost", + "openstack-vh-mecha-central", + "openstack-vh-mecha-az0", + "openstack-vh-bm-rhos", + "openstack-hwoffload", + "openstack-nfv", +] + + +def get_cluster_profile(test): + """Extract cluster_profile from a test definition.""" + if "steps" in test: + steps = test["steps"] + if isinstance(steps, dict): + return steps.get("cluster_profile") + return None + + +def get_workflow(test): + """Extract workflow from a test definition.""" + if "steps" in test: + steps = test["steps"] + if isinstance(steps, dict): + return steps.get("workflow") + return None + + +def get_job_type(test): + """Determine job type based on scheduling fields. + + Jobs are classified as: + - periodic: if they have cron/interval, OR if they have minimum_interval + but no presubmit triggers (always_run, run_if_changed, optional) + - postsubmit: if explicitly marked as postsubmit + - presubmit: otherwise + + Note: Jobs with minimum_interval but no presubmit triggers are periodic jobs + that run on a schedule. They're generated into *-periodics.yaml files. + """ + # Explicit periodic scheduling + if test.get("interval") or test.get("cron"): + return "periodic" + + if test.get("postsubmit"): + return "postsubmit" + + # Implicit periodic: minimum_interval without presubmit triggers + # These jobs run periodically, not on PRs + if test.get("minimum_interval"): + has_presubmit_trigger = ( + test.get("always_run") or + test.get("run_if_changed") or + test.get("optional") is True or + test.get("skip_if_only_changed") + ) + if not has_presubmit_trigger: + return "periodic" + + return "presubmit" + + +def get_schedule(test): + """Extract schedule (interval or cron) from a test. + + For implicit periodic jobs (those with minimum_interval but no presubmit + triggers), the minimum_interval acts as the schedule. 
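+
+    A sketch of the mapping, using hypothetical test dicts:
+
+        {"interval": "168h"}                              -> "interval: 168h"
+        {"cron": "0 6 * * 1"}                             -> "cron: 0 6 * * 1"
+        {"minimum_interval": "168h"}                      -> "minimum_interval: 168h"
+        {"minimum_interval": "168h", "always_run": True}  -> ""  (presubmit)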
+ """ + if test.get("interval"): + return f"interval: {test['interval']}" + if test.get("cron"): + return f"cron: {test['cron']}" + # For implicit periodic jobs, minimum_interval is the effective schedule + if test.get("minimum_interval"): + has_presubmit_trigger = ( + test.get("always_run") or + test.get("run_if_changed") or + test.get("optional") is True or + test.get("skip_if_only_changed") + ) + if not has_presubmit_trigger: + return f"minimum_interval: {test['minimum_interval']}" + return "" + + +def parse_config_file(file_path): + """Parse a single CI config file and extract OpenStack jobs.""" + jobs = [] + + try: + with open(file_path, 'r', encoding='utf-8') as f: + config = yaml.safe_load(f) + except Exception as e: + print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr) + return jobs + + if not config or "tests" not in config: + return jobs + + # Extract metadata + metadata = config.get("zz_generated_metadata", {}) + org = metadata.get("org", "") + repo = metadata.get("repo", "") + branch = metadata.get("branch", "") + variant = metadata.get("variant", "") + + # Parse each test + for test in config.get("tests", []): + if not isinstance(test, dict): + continue + + cluster_profile = get_cluster_profile(test) + + # Check if this is an OpenStack job + if cluster_profile and any(profile in cluster_profile for profile in OPENSTACK_PROFILES): + job_name = test.get("as", "") + + job_info = { + "job_name": job_name, + "cluster_profile": cluster_profile, + "job_type": get_job_type(test), + "schedule": get_schedule(test), + "workflow": get_workflow(test) or "", + "optional": test.get("optional", False), + "always_run": test.get("always_run", False), + "minimum_interval": test.get("minimum_interval", ""), + "skip_if_only_changed": test.get("skip_if_only_changed", ""), + "run_if_changed": test.get("run_if_changed", ""), + "org": org, + "repo": repo, + "branch": branch, + "variant": variant, + "config_file": str(file_path), + } + + jobs.append(job_info) + + return jobs + + +def find_config_files(config_dir): + """Find all CI config YAML files.""" + config_path = Path(config_dir) + + yaml_files = [] + for pattern in ["**/*.yaml", "**/*.yml"]: + yaml_files.extend(config_path.glob(pattern)) + + return sorted(set(yaml_files)) + + +def extract_jobs(config_dir): + """Extract all OpenStack jobs from config directory.""" + all_jobs = [] + + config_files = find_config_files(config_dir) + print(f"Found {len(config_files)} config files to scan", file=sys.stderr) + + for file_path in config_files: + jobs = parse_config_file(file_path) + all_jobs.extend(jobs) + + print(f"Extracted {len(all_jobs)} OpenStack jobs", file=sys.stderr) + return all_jobs + + +def output_csv(jobs, output_file): + """Output jobs to CSV format.""" + if not jobs: + print("No jobs to output", file=sys.stderr) + return + + fieldnames = [ + "job_name", "cluster_profile", "job_type", "schedule", "workflow", + "optional", "always_run", "minimum_interval", "skip_if_only_changed", + "run_if_changed", "org", "repo", "branch", "variant", "config_file" + ] + + with open(output_file, 'w', newline='', encoding='utf-8') as f: + writer = csv.DictWriter(f, fieldnames=fieldnames) + writer.writeheader() + writer.writerows(jobs) + + print(f"Wrote {len(jobs)} jobs to {output_file}", file=sys.stderr) + + +def output_json(jobs, output_file): + """Output jobs to JSON format.""" + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(jobs, f, indent=2) + + print(f"Wrote {len(jobs)} jobs to {output_file}", file=sys.stderr) + + +def 
print_summary(jobs): + """Print summary statistics.""" + print("\n=== OpenStack CI Job Summary ===\n") + + # By cluster profile + profile_counts = {} + for job in jobs: + profile = job["cluster_profile"] + profile_counts[profile] = profile_counts.get(profile, 0) + 1 + + print("Jobs by Cluster Profile:") + for profile in sorted(profile_counts.keys()): + print(f" {profile}: {profile_counts[profile]}") + + # By job type + type_counts = {} + for job in jobs: + job_type = job["job_type"] + type_counts[job_type] = type_counts.get(job_type, 0) + 1 + + print("\nJobs by Type:") + for job_type in sorted(type_counts.keys()): + print(f" {job_type}: {type_counts[job_type]}") + + # By org + org_counts = {} + for job in jobs: + org = job["org"] or "unknown" + org_counts[org] = org_counts.get(org, 0) + 1 + + print("\nJobs by Organization:") + for org in sorted(org_counts.keys(), key=lambda x: org_counts[x], reverse=True)[:10]: + print(f" {org}: {org_counts[org]}") + + # Unique workflows + workflows = set(job["workflow"] for job in jobs if job["workflow"]) + print(f"\nUnique Workflows: {len(workflows)}") + + # Unique repos + repos = set(f"{job['org']}/{job['repo']}" for job in jobs if job['org'] and job['repo']) + print(f"Unique Repositories: {len(repos)}") + + # Release branches + branches = set(job["branch"] for job in jobs if job["branch"]) + release_branches = sorted([b for b in branches if "release-" in b or b in ["main", "master"]]) + print(f"\nRelease Branches:") + for branch in release_branches[-10:]: + count = len([j for j in jobs if j["branch"] == branch]) + print(f" {branch}: {count}") + + print() + + +def main(): + parser = argparse.ArgumentParser( + description="Extract OpenStack CI jobs from ci-operator config files" + ) + parser.add_argument( + "--config-dir", + default="ci-operator/config", + help="Path to ci-operator/config directory (default: ci-operator/config)" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for output files (default: script directory)" + ) + parser.add_argument( + "--output-csv", + default="openstack_jobs_inventory.csv", + help="Output CSV filename (default: openstack_jobs_inventory.csv)" + ) + parser.add_argument( + "--output-json", + default="openstack_jobs_inventory.json", + help="Output JSON filename (default: openstack_jobs_inventory.json)" + ) + parser.add_argument( + "--summary", + action="store_true", + help="Print summary statistics" + ) + + args = parser.parse_args() + + # Resolve output directory + output_dir = os.path.abspath(args.output_dir) + os.makedirs(output_dir, exist_ok=True) + + print("=" * 60) + print("OpenStack CI Job Extractor") + print("=" * 60) + print(f"Config directory: {args.config_dir}") + print(f"Output directory: {output_dir}") + print() + + # Ensure config directory exists + if not os.path.isdir(args.config_dir): + print(f"Error: Config directory not found: {args.config_dir}", file=sys.stderr) + sys.exit(1) + + # Extract jobs + jobs = extract_jobs(args.config_dir) + + # Output CSV + csv_path = os.path.join(output_dir, args.output_csv) + output_csv(jobs, csv_path) + + # Output JSON + json_path = os.path.join(output_dir, args.output_json) + output_json(jobs, json_path) + + # Print summary if requested + if args.summary: + print_summary(jobs) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/fetch_comparison_data.py b/reporting-toolkit/fetch_comparison_data.py new file mode 100644 index 0000000..dbb723a --- /dev/null +++ 
b/reporting-toolkit/fetch_comparison_data.py @@ -0,0 +1,224 @@ +#!/usr/bin/env python3 +""" +Fetch platform comparison data from Sippy API. +Fetches variant data for all platforms to compare OpenStack against AWS, GCP, Azure, vSphere. +""" + +import argparse +import json +import os +import sys +import time +from urllib.request import urlopen, Request +from urllib.error import URLError, HTTPError +from datetime import datetime + +SIPPY_BASE = "https://sippy.dptools.openshift.org/api" +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + +# Platform variants to compare +PLATFORMS = ["OpenStack", "AWS", "GCP", "Azure", "vSphere", "Metal"] + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Fetch platform comparison data from Sippy API" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for output files (default: script directory)" + ) + return parser.parse_args() + + +def fetch_json(url, retries=3, delay=2): + """Fetch JSON from URL with retries.""" + for attempt in range(retries): + try: + req = Request(url, headers={"User-Agent": "OpenStack-CI-Analysis/1.0"}) + with urlopen(req, timeout=60) as response: + return json.loads(response.read().decode()) + except (URLError, HTTPError) as e: + print(f" Attempt {attempt + 1} failed: {e}") + if attempt < retries - 1: + time.sleep(delay) + return None + + +def fetch_variants_for_release(release): + """Fetch variant data for a specific release.""" + url = f"{SIPPY_BASE}/variants?release={release}" + print(f"Fetching variants for release {release}...") + + data = fetch_json(url) + if data is None: + print(f" Failed to fetch variants for {release}") + return [] + + print(f" Retrieved {len(data)} variants") + return data + + +def extract_platform_variants(variants): + """Extract Platform:* variants from variant data.""" + platform_data = {} + + for variant in variants: + name = variant.get("name", "") + if name.startswith("Platform:"): + platform = name.replace("Platform:", "") + platform_data[platform] = { + "name": platform, + "variant_full_name": name, + "current_pass_percentage": variant.get("current_pass_percentage", 0), + "current_runs": variant.get("current_runs", 0), + "current_passes": variant.get("current_passes", 0), + "previous_pass_percentage": variant.get("previous_pass_percentage", 0), + "previous_runs": variant.get("previous_runs", 0), + "previous_passes": variant.get("previous_passes", 0), + "job_count": variant.get("job_count", 0), + } + + return platform_data + + +def fetch_jobs_for_release(release): + """Fetch all jobs for a release to get platform job counts.""" + url = f"{SIPPY_BASE}/jobs?release={release}" + print(f" Fetching jobs for platform counts...") + + data = fetch_json(url) + if data is None: + return {} + + # Count jobs by platform + platform_counts = {} + platform_runs = {} + platform_passes = {} + + for job in data: + name = job.get("name", "").lower() + runs = job.get("current_runs", 0) + job.get("previous_runs", 0) + passes = job.get("current_passes", 0) + job.get("previous_passes", 0) + + # Determine platform from job name + platform = None + if "openstack" in name: + platform = "OpenStack" + elif "aws" in name: + platform = "AWS" + elif "gcp" in name: + platform = "GCP" + elif "azure" in name: + platform = "Azure" + elif "vsphere" in name: + platform = "vSphere" + elif "metal" in name or "baremetal" in name: + platform = "Metal" + + 
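+        # The checks above form an ordered if/elif chain, so a job name
+        # containing both "openstack" and "metal" counts as OpenStack;
+        # names matching no platform keyword are skipped entirely.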
if platform: + platform_counts[platform] = platform_counts.get(platform, 0) + 1 + platform_runs[platform] = platform_runs.get(platform, 0) + runs + platform_passes[platform] = platform_passes.get(platform, 0) + passes + + result = {} + for platform in platform_counts: + runs = platform_runs.get(platform, 0) + passes = platform_passes.get(platform, 0) + result[platform] = { + "job_count": platform_counts[platform], + "total_runs": runs, + "total_passes": passes, + "pass_rate": (passes / runs * 100) if runs > 0 else 0, + } + + return result + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + # Create output directory if needed + os.makedirs(OUTPUT_DIR, exist_ok=True) + + print("=" * 60) + print("OpenStack CI Platform Comparison Data Fetcher") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + results = { + "fetched_at": datetime.now().isoformat(), + "releases": {}, + "overall_by_platform": {}, + } + + # Fetch data for each release + for release in RELEASES: + print(f"\n--- Release {release} ---") + + # Fetch variants + variants = fetch_variants_for_release(release) + platform_variants = extract_platform_variants(variants) if variants else {} + + # Fetch job counts + platform_jobs = fetch_jobs_for_release(release) + + # Combine data + release_data = { + "variants": platform_variants, + "job_metrics": platform_jobs, + } + results["releases"][release] = release_data + + time.sleep(1) # Be nice to the API + + # Calculate overall metrics by platform + overall = {} + for release, data in results["releases"].items(): + for platform, metrics in data.get("job_metrics", {}).items(): + if platform not in overall: + overall[platform] = { + "job_count": 0, + "total_runs": 0, + "total_passes": 0, + } + overall[platform]["job_count"] += metrics.get("job_count", 0) + overall[platform]["total_runs"] += metrics.get("total_runs", 0) + overall[platform]["total_passes"] += metrics.get("total_passes", 0) + + # Calculate pass rates + for platform, data in overall.items(): + runs = data["total_runs"] + passes = data["total_passes"] + data["pass_rate"] = (passes / runs * 100) if runs > 0 else 0 + + results["overall_by_platform"] = overall + + # Save results + output_path = os.path.join(OUTPUT_DIR, "platform_comparison_raw.json") + with open(output_path, 'w') as f: + json.dump(results, f, indent=2) + print(f"\nSaved: {output_path}") + + # Print summary + print("\n" + "=" * 60) + print("Summary by Platform (all releases):") + print("-" * 60) + print(f"{'Platform':<15} {'Jobs':>8} {'Runs':>10} {'Pass Rate':>10}") + print("-" * 60) + for platform in sorted(overall.keys(), key=lambda x: -overall[x]["pass_rate"]): + data = overall[platform] + print(f"{platform:<15} {data['job_count']:>8} {data['total_runs']:>10} {data['pass_rate']:>9.1f}%") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/fetch_extended_metrics.py b/reporting-toolkit/fetch_extended_metrics.py new file mode 100644 index 0000000..6b28834 --- /dev/null +++ b/reporting-toolkit/fetch_extended_metrics.py @@ -0,0 +1,383 @@ +#!/usr/bin/env python3 +""" +Fetch extended job metrics from Sippy API for OpenStack CI jobs. +Combines current + previous periods for ~14 day coverage. +Estimates job duration based on workflow/cluster profile. 
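+
+Despite the name, this script does not call the Sippy API itself: it reads the
+cached sippy_jobs_raw.json written by fetch_job_metrics.py, then sums the two
+Sippy windows (current + previous, roughly seven days each) per job before
+computing pass rates.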
+""" + +import argparse +import json +import os +import sys +from datetime import datetime, timedelta +from urllib.request import urlopen, Request +from urllib.error import URLError, HTTPError +import time + +SIPPY_BASE = "https://sippy.dptools.openshift.org/api" +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Calculate extended job metrics from Sippy data" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + +# Estimated durations by cluster profile (based on typical run times) +DURATION_ESTIMATES = { + "openstack-vexxhost": {"min": 60, "typical": 90, "max": 150}, + "openstack-vh-mecha-central": {"min": 60, "typical": 90, "max": 150}, + "openstack-vh-mecha-az0": {"min": 60, "typical": 100, "max": 180}, + "openstack-nfv": {"min": 90, "typical": 120, "max": 200}, + "openstack-hwoffload": {"min": 90, "typical": 120, "max": 200}, + "openstack-vh-bm-rhos": {"min": 120, "typical": 180, "max": 300}, +} + + +def load_collected_data(): + """Load previously collected Sippy data.""" + filepath = os.path.join(OUTPUT_DIR, "sippy_jobs_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_job_inventory(): + """Load job inventory for cluster profile info.""" + filepath = os.path.join(OUTPUT_DIR, "openstack_jobs_inventory.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def calculate_extended_metrics(sippy_data, inventory): + """Calculate extended metrics combining current + previous periods.""" + + results = { + "generated": datetime.now().isoformat(), + "period": "~14 days (current + previous Sippy windows)", + "releases": {}, + "overall": {}, + "problem_jobs": [], + "duration_estimates": {}, + } + + # Build a lookup for cluster profiles from inventory + cluster_profiles = {} + if inventory: + for job in inventory: + cluster_profiles[job.get("job_name", "")] = job.get("cluster_profile", "") + + all_jobs = [] + + for release, jobs in sippy_data.get("jobs_by_release", {}).items(): + release_stats = { + "total_jobs": len(jobs), + "current_runs": 0, + "current_passes": 0, + "previous_runs": 0, + "previous_passes": 0, + "combined_runs": 0, + "combined_passes": 0, + "pass_rate_current": 0, + "pass_rate_combined": 0, + "trend": "", + } + + for job in jobs: + name = job.get("name", "") + current_runs = job.get("current_runs", 0) + current_passes = job.get("current_passes", 0) + previous_runs = job.get("previous_runs", 0) + previous_passes = job.get("previous_passes", 0) + + combined_runs = current_runs + previous_runs + combined_passes = current_passes + previous_passes + + release_stats["current_runs"] += current_runs + release_stats["current_passes"] += current_passes + release_stats["previous_runs"] += previous_runs + release_stats["previous_passes"] += previous_passes + release_stats["combined_runs"] += combined_runs + release_stats["combined_passes"] += combined_passes + + # Calculate pass rates + current_rate = (current_passes / current_runs * 100) if current_runs > 0 else None + previous_rate = (previous_passes / previous_runs * 100) if previous_runs > 0 else None + combined_rate = (combined_passes / combined_runs * 100) if combined_runs > 0 else None + + # 
Determine trend + trend = "stable" + if current_rate is not None and previous_rate is not None: + diff = current_rate - previous_rate + if diff > 10: + trend = "improving" + elif diff < -10: + trend = "degrading" + + # Get cluster profile for duration estimate + cluster = cluster_profiles.get(name, "unknown") + duration_est = DURATION_ESTIMATES.get(cluster, {"min": 60, "typical": 90, "max": 180}) + + job_info = { + "release": release, + "name": name, + "brief_name": job.get("brief_name", name), + "cluster_profile": cluster, + "current_runs": current_runs, + "current_passes": current_passes, + "current_pass_rate": current_rate, + "previous_runs": previous_runs, + "previous_passes": previous_passes, + "previous_pass_rate": previous_rate, + "combined_runs": combined_runs, + "combined_passes": combined_passes, + "combined_pass_rate": combined_rate, + "trend": trend, + "last_pass": job.get("last_pass", ""), + "open_bugs": job.get("open_bugs", 0), + "estimated_duration_min": duration_est["typical"], + } + all_jobs.append(job_info) + + # Track problem jobs (< 80% and has runs) + if combined_rate is not None and combined_rate < 80 and combined_runs >= 2: + results["problem_jobs"].append(job_info) + + # Calculate release-level rates + if release_stats["current_runs"] > 0: + release_stats["pass_rate_current"] = ( + release_stats["current_passes"] / release_stats["current_runs"] * 100 + ) + if release_stats["combined_runs"] > 0: + release_stats["pass_rate_combined"] = ( + release_stats["combined_passes"] / release_stats["combined_runs"] * 100 + ) + + # Determine release trend + if release_stats["current_runs"] > 0 and release_stats["previous_runs"] > 0: + curr_rate = release_stats["current_passes"] / release_stats["current_runs"] + prev_rate = release_stats["previous_passes"] / release_stats["previous_runs"] + diff = (curr_rate - prev_rate) * 100 + if diff > 5: + release_stats["trend"] = "improving" + elif diff < -5: + release_stats["trend"] = "degrading" + else: + release_stats["trend"] = "stable" + + results["releases"][release] = release_stats + + # Overall statistics + total_current_runs = sum(r["current_runs"] for r in results["releases"].values()) + total_current_passes = sum(r["current_passes"] for r in results["releases"].values()) + total_combined_runs = sum(r["combined_runs"] for r in results["releases"].values()) + total_combined_passes = sum(r["combined_passes"] for r in results["releases"].values()) + + results["overall"] = { + "total_jobs": len(all_jobs), + "current_runs": total_current_runs, + "current_passes": total_current_passes, + "current_pass_rate": (total_current_passes / total_current_runs * 100) if total_current_runs > 0 else 0, + "combined_runs": total_combined_runs, + "combined_passes": total_combined_passes, + "combined_pass_rate": (total_combined_passes / total_combined_runs * 100) if total_combined_runs > 0 else 0, + "problem_job_count": len(results["problem_jobs"]), + } + + # Sort problem jobs by pass rate + results["problem_jobs"].sort(key=lambda x: x.get("combined_pass_rate", 0) or 0) + + # Duration estimates summary + jobs_by_profile = {} + for job in all_jobs: + profile = job.get("cluster_profile", "unknown") + if profile not in jobs_by_profile: + jobs_by_profile[profile] = [] + jobs_by_profile[profile].append(job) + + for profile, jobs in jobs_by_profile.items(): + est = DURATION_ESTIMATES.get(profile, {"min": 60, "typical": 90, "max": 180}) + total_runs = sum(j["combined_runs"] for j in jobs) + results["duration_estimates"][profile] = { + "job_count": len(jobs), + 
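+            # total_runs spans the combined ~14-day window; estimated_total_hours
+            # below is total_runs x typical duration, so it is a rough estimate,
+            # not a measured runtime.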
"total_runs": total_runs, + "typical_duration_min": est["typical"], + "estimated_total_hours": round(total_runs * est["typical"] / 60, 1), + } + + return results, all_jobs + + +def generate_extended_report(results, all_jobs): + """Generate markdown report with extended metrics.""" + report = [] + report.append("# OpenStack CI Extended Metrics Report") + report.append("") + report.append(f"**Generated:** {results['generated']}") + report.append(f"**Period:** {results['period']}") + report.append("") + + # Overall summary + report.append("## Executive Summary") + report.append("") + overall = results["overall"] + report.append(f"| Metric | Current (~7d) | Combined (~14d) |") + report.append(f"|--------|---------------|-----------------|") + report.append(f"| Total Jobs | {overall['total_jobs']} | {overall['total_jobs']} |") + report.append(f"| Total Runs | {overall['current_runs']} | {overall['combined_runs']} |") + report.append(f"| Pass Rate | {overall['current_pass_rate']:.1f}% | {overall['combined_pass_rate']:.1f}% |") + report.append(f"| Problem Jobs (<80%) | - | {overall['problem_job_count']} |") + report.append("") + + # Per-release breakdown + report.append("## Metrics by Release") + report.append("") + report.append("| Release | Jobs | Runs (14d) | Pass Rate | Trend |") + report.append("|---------|------|------------|-----------|-------|") + for release in RELEASES: + rel = results["releases"].get(release, {}) + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(rel.get("trend", ""), "") + report.append( + f"| {release} | {rel.get('total_jobs', 0)} | " + f"{rel.get('combined_runs', 0)} | " + f"{rel.get('pass_rate_combined', 0):.1f}% | {trend_icon} {rel.get('trend', '')} |" + ) + report.append("") + + # Problem jobs + report.append("## Problem Jobs (Pass Rate < 80%)") + report.append("") + problem_jobs = results.get("problem_jobs", []) + if problem_jobs: + report.append(f"**{len(problem_jobs)} jobs** need attention:") + report.append("") + report.append("| Release | Job | Runs | Pass Rate | Trend | Bugs |") + report.append("|---------|-----|------|-----------|-------|------|") + for job in problem_jobs[:25]: + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(job.get("trend", ""), "") + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:50]} | " + f"{job['combined_runs']} | {rate_str} | {trend_icon} | {job.get('open_bugs', 0)} |" + ) + if len(problem_jobs) > 25: + report.append(f"| ... | *{len(problem_jobs) - 25} more jobs* | | | | |") + else: + report.append("All jobs with sufficient runs have pass rate >= 80%.") + report.append("") + + # Duration estimates + report.append("## Estimated Job Durations by Cluster Profile") + report.append("") + report.append("*Note: Durations are estimates based on typical run times.*") + report.append("") + report.append("| Cluster Profile | Jobs | Runs (14d) | Typical Duration | Est. 
Total Hours |") + report.append("|-----------------|------|------------|------------------|------------------|") + for profile, est in sorted(results.get("duration_estimates", {}).items(), + key=lambda x: -x[1]["total_runs"]): + report.append( + f"| {profile} | {est['job_count']} | {est['total_runs']} | " + f"~{est['typical_duration_min']}min | {est['estimated_total_hours']}h |" + ) + report.append("") + + # Trend analysis + report.append("## Trend Analysis") + report.append("") + improving = [j for j in all_jobs if j.get("trend") == "improving" and j["combined_runs"] >= 2] + degrading = [j for j in all_jobs if j.get("trend") == "degrading" and j["combined_runs"] >= 2] + report.append(f"- **Improving jobs:** {len(improving)}") + report.append(f"- **Degrading jobs:** {len(degrading)}") + report.append(f"- **Stable jobs:** {len(all_jobs) - len(improving) - len(degrading)}") + report.append("") + + if degrading: + report.append("### Degrading Jobs (investigate)") + report.append("") + for job in sorted(degrading, key=lambda x: (x.get("current_pass_rate") or 100))[:10]: + curr = job.get("current_pass_rate") + prev = job.get("previous_pass_rate") + curr_str = f"{curr:.0f}%" if curr is not None else "N/A" + prev_str = f"{prev:.0f}%" if prev is not None else "N/A" + report.append(f"- **{job['brief_name'][:50]}** ({job['release']}): {prev_str} → {curr_str}") + report.append("") + + report.append("---") + report.append("") + report.append("*Data Source: [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Extended Metrics") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load existing data + sippy_data = load_collected_data() + if not sippy_data: + print("Error: No Sippy data found. 
Run fetch_job_metrics.py first.") + sys.exit(1) + + inventory = load_job_inventory() + print(f"Loaded Sippy data from: {sippy_data.get('fetched_at')}") + print(f"Job inventory loaded: {inventory is not None}") + print() + + # Calculate extended metrics + results, all_jobs = calculate_extended_metrics(sippy_data, inventory) + + # Save results + results_path = os.path.join(OUTPUT_DIR, "extended_metrics.json") + with open(results_path, 'w') as f: + json.dump(results, f, indent=2) + print(f"Saved: {results_path}") + + all_jobs_path = os.path.join(OUTPUT_DIR, "extended_metrics_jobs.json") + with open(all_jobs_path, 'w') as f: + json.dump(all_jobs, f, indent=2) + print(f"Saved: {all_jobs_path}") + + # Generate report + report = generate_extended_report(results, all_jobs) + report_path = os.path.join(OUTPUT_DIR, "extended_metrics_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Summary + print() + print("=" * 60) + print("Summary:") + overall = results["overall"] + print(f" Total jobs: {overall['total_jobs']}") + print(f" Combined runs (14d): {overall['combined_runs']}") + print(f" Combined pass rate: {overall['combined_pass_rate']:.1f}%") + print(f" Problem jobs: {overall['problem_job_count']}") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/fetch_job_metrics.py b/reporting-toolkit/fetch_job_metrics.py new file mode 100644 index 0000000..2df28c8 --- /dev/null +++ b/reporting-toolkit/fetch_job_metrics.py @@ -0,0 +1,319 @@ +#!/usr/bin/env python3 +""" +Fetch job metrics (pass rates, run counts) from Sippy API for OpenStack CI jobs. +Saves progress to files to allow resumption if interrupted. +""" + +import argparse +import json +import os +import sys +import time +from urllib.request import urlopen, Request +from urllib.error import URLError, HTTPError +from datetime import datetime + +SIPPY_BASE = "https://sippy.dptools.openshift.org/api" +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Fetch job metrics from Sippy API for OpenStack CI jobs" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for output files (default: script directory)" + ) + parser.add_argument( + "--force", + action="store_true", + help="Refetch data even if cache exists" + ) + return parser.parse_args() + +def fetch_json(url, retries=3, delay=2): + """Fetch JSON from URL with retries.""" + for attempt in range(retries): + try: + req = Request(url, headers={"User-Agent": "OpenStack-CI-Analysis/1.0"}) + with urlopen(req, timeout=60) as response: + return json.loads(response.read().decode()) + except (URLError, HTTPError) as e: + print(f" Attempt {attempt + 1} failed: {e}") + if attempt < retries - 1: + time.sleep(delay) + return None + +def fetch_openstack_jobs_for_release(release): + """Fetch all OpenStack jobs for a specific release.""" + url = f"{SIPPY_BASE}/jobs?release={release}" + print(f"Fetching jobs for release {release}...") + + data = fetch_json(url) + if data is None: + print(f" Failed to fetch data for {release}") + return [] + + # Filter for OpenStack jobs + openstack_jobs = [j for j in data if "openstack" in j.get("name", "").lower()] + print(f" Found {len(openstack_jobs)} OpenStack jobs out of {len(data)} total") + + return openstack_jobs + +def save_progress(data, filename): + 
"""Save data to file.""" + filepath = os.path.join(OUTPUT_DIR, filename) + with open(filepath, 'w') as f: + json.dump(data, f, indent=2) + print(f"Saved: {filepath}") + +def load_progress(filename): + """Load data from file if exists.""" + filepath = os.path.join(OUTPUT_DIR, filename) + if os.path.exists(filepath): + with open(filepath, 'r') as f: + return json.load(f) + return None + +def analyze_job_metrics(all_jobs_by_release): + """Analyze and summarize job metrics.""" + summary = { + "generated": datetime.now().isoformat(), + "releases": {}, + "overall_stats": {}, + "worst_jobs": [], + "best_jobs": [], + "jobs_by_pass_rate": {} + } + + all_jobs_flat = [] + + for release, jobs in all_jobs_by_release.items(): + if not jobs: + continue + + release_stats = { + "total_jobs": len(jobs), + "total_runs": sum(j.get("current_runs", 0) for j in jobs), + "total_passes": sum(j.get("current_passes", 0) for j in jobs), + "avg_pass_rate": 0, + "jobs_below_90": 0, + "jobs_below_80": 0, + "jobs_below_50": 0, + } + + pass_rates = [] + for job in jobs: + rate = job.get("current_pass_percentage", 0) + pass_rates.append(rate) + if rate < 90: + release_stats["jobs_below_90"] += 1 + if rate < 80: + release_stats["jobs_below_80"] += 1 + if rate < 50: + release_stats["jobs_below_50"] += 1 + + # Add to flat list for overall analysis + all_jobs_flat.append({ + "release": release, + "name": job.get("name", ""), + "brief_name": job.get("brief_name", ""), + "pass_rate": rate, + "runs": job.get("current_runs", 0), + "passes": job.get("current_passes", 0), + "previous_pass_rate": job.get("previous_pass_percentage", 0), + "improvement": job.get("net_improvement", 0), + "last_pass": job.get("last_pass", ""), + "open_bugs": job.get("open_bugs", 0), + }) + + if pass_rates: + release_stats["avg_pass_rate"] = sum(pass_rates) / len(pass_rates) + + summary["releases"][release] = release_stats + + # Find worst and best performing jobs + jobs_with_runs = [j for j in all_jobs_flat if j["runs"] > 0] + if jobs_with_runs: + # Worst jobs (lowest pass rate with at least 2 runs) + jobs_with_sufficient_runs = [j for j in jobs_with_runs if j["runs"] >= 2] + summary["worst_jobs"] = sorted(jobs_with_sufficient_runs, key=lambda x: x["pass_rate"])[:20] + + # Best jobs (100% pass rate with most runs) + perfect_jobs = [j for j in jobs_with_runs if j["pass_rate"] == 100] + summary["best_jobs"] = sorted(perfect_jobs, key=lambda x: -x["runs"])[:20] + + # Group by pass rate ranges + ranges = { + "100%": [j for j in jobs_with_runs if j["pass_rate"] == 100], + "90-99%": [j for j in jobs_with_runs if 90 <= j["pass_rate"] < 100], + "80-89%": [j for j in jobs_with_runs if 80 <= j["pass_rate"] < 90], + "50-79%": [j for j in jobs_with_runs if 50 <= j["pass_rate"] < 80], + "below_50%": [j for j in jobs_with_runs if j["pass_rate"] < 50], + } + summary["jobs_by_pass_rate"] = {k: len(v) for k, v in ranges.items()} + + # Overall stats + if all_jobs_flat: + all_runs = sum(j["runs"] for j in all_jobs_flat) + all_passes = sum(j["passes"] for j in all_jobs_flat) + summary["overall_stats"] = { + "total_jobs": len(all_jobs_flat), + "total_runs": all_runs, + "total_passes": all_passes, + "overall_pass_rate": (all_passes / all_runs * 100) if all_runs > 0 else 0, + } + + return summary, all_jobs_flat + +def generate_metrics_report(summary, all_jobs): + """Generate a markdown report of job metrics.""" + report = [] + report.append("# OpenStack CI Job Metrics Report") + report.append("") + report.append(f"**Generated:** {summary['generated']}") + report.append("") + + 
# Overall stats + report.append("## Overall Statistics") + report.append("") + stats = summary.get("overall_stats", {}) + report.append(f"| Metric | Value |") + report.append(f"|--------|-------|") + report.append(f"| Total OpenStack Jobs Tracked | {stats.get('total_jobs', 0)} |") + report.append(f"| Total Job Runs (current period) | {stats.get('total_runs', 0)} |") + report.append(f"| Total Passes | {stats.get('total_passes', 0)} |") + report.append(f"| Overall Pass Rate | {stats.get('overall_pass_rate', 0):.1f}% |") + report.append("") + + # Pass rate distribution + report.append("## Pass Rate Distribution") + report.append("") + report.append("| Pass Rate Range | Job Count |") + report.append("|-----------------|-----------|") + for range_name, count in summary.get("jobs_by_pass_rate", {}).items(): + report.append(f"| {range_name} | {count} |") + report.append("") + + # By release + report.append("## Metrics by Release") + report.append("") + report.append("| Release | Jobs | Total Runs | Avg Pass Rate | <90% | <80% | <50% |") + report.append("|---------|------|------------|---------------|------|------|------|") + for release in RELEASES: + rel_stats = summary.get("releases", {}).get(release, {}) + if rel_stats: + report.append(f"| {release} | {rel_stats.get('total_jobs', 0)} | {rel_stats.get('total_runs', 0)} | {rel_stats.get('avg_pass_rate', 0):.1f}% | {rel_stats.get('jobs_below_90', 0)} | {rel_stats.get('jobs_below_80', 0)} | {rel_stats.get('jobs_below_50', 0)} |") + report.append("") + + # Worst performing jobs + report.append("## Worst Performing Jobs (by pass rate)") + report.append("") + report.append("Jobs with at least 2 runs, sorted by lowest pass rate:") + report.append("") + report.append("| Release | Job Name | Pass Rate | Runs | Passes |") + report.append("|---------|----------|-----------|------|--------|") + for job in summary.get("worst_jobs", [])[:15]: + report.append(f"| {job['release']} | {job['brief_name'][:60]} | {job['pass_rate']:.1f}% | {job['runs']} | {job['passes']} |") + report.append("") + + # Best performing jobs with high volume + report.append("## Best Performing Jobs (100% pass rate, most runs)") + report.append("") + report.append("| Release | Job Name | Runs | Last Pass |") + report.append("|---------|----------|------|-----------|") + for job in summary.get("best_jobs", [])[:10]: + last_pass = job['last_pass'][:10] if job['last_pass'] else "N/A" + report.append(f"| {job['release']} | {job['brief_name'][:60]} | {job['runs']} | {last_pass} |") + report.append("") + + # Jobs needing attention + report.append("## Jobs Needing Attention") + report.append("") + attention_jobs = [j for j in all_jobs if j["pass_rate"] < 80 and j["runs"] >= 2] + if attention_jobs: + report.append(f"**{len(attention_jobs)} jobs** have pass rate below 80%:") + report.append("") + for job in sorted(attention_jobs, key=lambda x: x["pass_rate"]): + report.append(f"- **{job['brief_name']}** ({job['release']}): {job['pass_rate']:.1f}% ({job['passes']}/{job['runs']} runs)") + else: + report.append("All jobs with sufficient runs have pass rate >= 80%.") + report.append("") + + # Data source + report.append("---") + report.append("") + report.append("*Data Source: Sippy (https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + # Create output directory if needed + os.makedirs(OUTPUT_DIR, exist_ok=True) + + print("=" * 60) + print("OpenStack CI Job 
Metrics Collector") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Check for existing progress + progress_file = "sippy_jobs_raw.json" + existing_data = load_progress(progress_file) + + if existing_data and not args.force: + print(f"Found existing data from {existing_data.get('fetched_at', 'unknown')}") + print("Use --force to refetch") + all_jobs_by_release = existing_data.get("jobs_by_release", {}) + else: + all_jobs_by_release = {} + + for release in RELEASES: + jobs = fetch_openstack_jobs_for_release(release) + all_jobs_by_release[release] = jobs + + # Save progress after each release + save_progress({ + "fetched_at": datetime.now().isoformat(), + "releases_fetched": list(all_jobs_by_release.keys()), + "jobs_by_release": all_jobs_by_release + }, progress_file) + + time.sleep(1) # Be nice to the API + + print() + print("Analyzing metrics...") + summary, all_jobs = analyze_job_metrics(all_jobs_by_release) + + # Save summary + save_progress(summary, "job_metrics_summary.json") + save_progress(all_jobs, "job_metrics_all_jobs.json") + + # Generate report + report = generate_metrics_report(summary, all_jobs) + report_path = os.path.join(OUTPUT_DIR, "job_metrics_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Report saved: {report_path}") + + print() + print("=" * 60) + print("Summary:") + print(f" Total jobs: {summary['overall_stats'].get('total_jobs', 0)}") + print(f" Total runs: {summary['overall_stats'].get('total_runs', 0)}") + print(f" Overall pass rate: {summary['overall_stats'].get('overall_pass_rate', 0):.1f}%") + print("=" * 60) + +if __name__ == "__main__": + main() diff --git a/reporting-toolkit/run_analysis.sh b/reporting-toolkit/run_analysis.sh new file mode 100755 index 0000000..b3b934d --- /dev/null +++ b/reporting-toolkit/run_analysis.sh @@ -0,0 +1,163 @@ +#!/bin/bash +# +# Run all OpenStack CI analysis scripts in the correct order. 
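+# The scripts run in three phases: data collection, configuration analysis,
+# and runtime analysis. Later phases consume files written by Phase 1, so
+# the order below matters.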
+# +# Usage: +# ./run_analysis.sh [--config-dir /path/to/ci-operator/config] [--output-dir /path/to/output] +# +# If --config-dir is not specified, defaults to ../../../ci-operator/config +# (relative to script location, assuming standard repo layout) +# +# If --output-dir is not specified, outputs to current working directory +# This allows running from any location in the filesystem + +set -e + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Default config directory (relative to script location) +CONFIG_DIR="${SCRIPT_DIR}/../../../ci-operator/config" + +# Default output directory is current working directory +OUTPUT_DIR="$(pwd)" + +# Parse arguments +while [[ $# -gt 0 ]]; do + case $1 in + --config-dir) + CONFIG_DIR="$2" + shift 2 + ;; + --output-dir) + OUTPUT_DIR="$2" + shift 2 + ;; + --force) + FORCE="--force" + shift + ;; + --help) + echo "Usage: $0 [OPTIONS]" + echo "" + echo "Options:" + echo " --config-dir DIR Path to ci-operator/config directory" + echo " --output-dir DIR Directory for output files (default: current directory)" + echo " --force Refetch data from Sippy API" + echo "" + echo "Examples:" + echo " # Run from repo root, output to current directory" + echo " ./hack/openstack-ci-analysis/reporting-toolkit/run_analysis.sh" + echo "" + echo " # Run from anywhere, specify both directories" + echo " ./run_analysis.sh --config-dir /path/to/release/ci-operator/config --output-dir /tmp/analysis" + exit 0 + ;; + *) + echo "Unknown option: $1" + exit 1 + ;; + esac +done + +# Resolve to absolute paths +CONFIG_DIR="$(cd "$CONFIG_DIR" 2>/dev/null && pwd)" || { + echo "Error: Config directory not found: $CONFIG_DIR" + echo "Use --config-dir to specify the path to ci-operator/config" + exit 1 +} + +OUTPUT_DIR="$(mkdir -p "$OUTPUT_DIR" && cd "$OUTPUT_DIR" && pwd)" + +echo "============================================================" +echo "OpenStack CI Analysis Toolkit" +echo "============================================================" +echo "Script directory: $SCRIPT_DIR" +echo "Config directory: $CONFIG_DIR" +echo "Output directory: $OUTPUT_DIR" +echo "============================================================" +echo "" + +# Phase 1: Data Collection +echo "=== Phase 1: Data Collection ===" +echo "" + +echo "[1/4] Extracting job inventory..." +python3 "$SCRIPT_DIR/extract_openstack_jobs.py" \ + --config-dir "$CONFIG_DIR" \ + --output-dir "$OUTPUT_DIR" \ + --summary + +echo "" +echo "[2/4] Fetching job metrics from Sippy..." +python3 "$SCRIPT_DIR/fetch_job_metrics.py" \ + --output-dir "$OUTPUT_DIR" \ + ${FORCE:+"$FORCE"} + +echo "" +echo "[3/4] Calculating extended metrics..." +python3 "$SCRIPT_DIR/fetch_extended_metrics.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[4/4] Fetching platform comparison data..." +python3 "$SCRIPT_DIR/fetch_comparison_data.py" \ + --output-dir "$OUTPUT_DIR" + +# Phase 2: Configuration Analysis +echo "" +echo "=== Phase 2: Configuration Analysis ===" +echo "" + +echo "[1/3] Analyzing redundancy..." +python3 "$SCRIPT_DIR/analyze_redundancy.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[2/3] Analyzing coverage gaps..." +python3 "$SCRIPT_DIR/analyze_coverage.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[3/3] Analyzing trigger patterns..." +python3 "$SCRIPT_DIR/analyze_triggers.py" \ + --output-dir "$OUTPUT_DIR" + +# Phase 3: Runtime Analysis +echo "" +echo "=== Phase 3: Runtime Analysis ===" +echo "" + +echo "[1/3] Analyzing platform comparison..." 
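+# NOTE: the analyze_* steps below are assumed to read their Phase 1 inputs
+# (the JSON files above) from $OUTPUT_DIR; they take no other arguments here.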
+python3 "$SCRIPT_DIR/analyze_platform_comparison.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[2/3] Analyzing workflow pass rates..." +python3 "$SCRIPT_DIR/analyze_workflow_passrate.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[3/3] Categorizing failures..." +python3 "$SCRIPT_DIR/categorize_failures.py" \ + --output-dir "$OUTPUT_DIR" + +# Summary +echo "" +echo "============================================================" +echo "Analysis Complete!" +echo "============================================================" +echo "" +echo "Output directory: $OUTPUT_DIR" +echo "" +echo "Generated Reports:" +find "$OUTPUT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | while read -r f; do + echo " - $(basename "$f")" +done +echo "" +echo "Data Files:" +find "$OUTPUT_DIR" -maxdepth 1 -name "*.json" -type f 2>/dev/null | wc -l | xargs -I {} echo " {} JSON files generated" +echo "" +echo "To view key findings, run:" +echo " cd $OUTPUT_DIR" +echo " python3 -c \"import json; d=json.load(open('extended_metrics.json')); print(f'Pass rate: {d[\\\"overall\\\"][\\\"combined_pass_rate\\\"]:.1f}%')\"" +echo ""
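+# Optional: if jq is installed (not required by the toolkit), the same value
+# can be read with a one-liner.
+echo "Or, if jq is installed:"
+echo "  jq '.overall.combined_pass_rate' $OUTPUT_DIR/extended_metrics.json"
+echo ""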