Updates/20251204-01 by stewartshea · Pull Request #605 · runwhen-contrib/rw-cli-codecollection

stewartshea · 2025-12-04T22:04:15Z

Added comprehensive analysis for server errors, throttling, user errors, and storage capacity in the service bus metrics script, providing detailed metrics and recommendations for each issue.
Improved queue and topic health scripts to include analysis of disabled queues/topics and message backlog, enhancing visibility into operational issues.
Introduced context, investigation steps, and recommendations for each identified issue, aiding in troubleshooting and resolution.
Updated runbook to reflect new timeout settings for improved reliability during execution of cost health analysis tasks.

Note

Enhances Service Bus diagnostics, introduces data-driven storage savings (tiering/redundancy) and improved VM underutilization logic, and raises runbook timeouts for cost analyses.

Service Bus Health:
- Namespace Metrics (service_bus_metrics.sh): Adds detailed calculations and rich context for ServerErrors, ThrottledRequests, UserErrors, and Size (totals/max/averages, error rates, message imbalance) with actionable recommendations and updated issue titles.
- Queue Health (service_bus_queue_health.sh): Adds disabled-queue detection with counts/timestamps; expands backlog and size analyses with additional metrics (scheduled/transfer counts, config context) and structured guidance.
- Topic & Subscriptions (service_bus_topic_health.sh): Adds disabled-topic analysis; augments topic size checks (subscriptions, partitioning, status) and subscription checks (dead-letter, backlogs, disabled state) with detailed remediation steps.
Storage Cost Optimization (analyze_storage_optimization.sh):
- Introduces blob tier pricing helpers and capacity retrieval via Azure Monitor; computes per-account potential savings for missing lifecycle policies (tiering to Cool/Archive) and geo-redundancy downgrades, rolling up monthly/annual estimates and severity; enriches issue titles with savings.
VM Optimization (analyze_vm_optimization.sh):
- Refines underutilization detection when memory metrics are unavailable (average-CPU–based with peak guard), improves logging/recommendations, and surfaces savings/context in outputs.
Runbook (runbook.robot):
- Increases timeouts to 900s across tasks for more reliable long-running analyses.

Written by Cursor Bugbot for commit 63566f7. This will update automatically on new commits.

- Added logging for VM names during optimization analysis to improve traceability.
- Updated CPU-only analysis logic to use average CPU as the primary metric when memory metrics are unavailable, refining underutilization detection.
- Enhanced reporting for underutilized VMs, providing clearer recommendations based on average and peak CPU metrics.
- Improved documentation within the script to clarify the analysis approach and thresholds used for identifying underutilization.
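The CPU-only check described in this commit could be sketched roughly as follows. This is a minimal illustration, not the code in analyze_vm_optimization.sh: the function names and both thresholds (`AVG_CPU_THRESHOLD`, `PEAK_CPU_GUARD`) are hypothetical, and `awk` is used here only to compare fractional percentages.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical thresholds (percent); the real script's values are not shown in this PR.
AVG_CPU_THRESHOLD=20
PEAK_CPU_GUARD=60

# Succeeds (exit 0) when average CPU is below the threshold AND the peak
# stayed below the guard. Inputs may be fractional, e.g. "12.5".
is_underutilized_cpu_only() {
    local avg_cpu="$1" peak_cpu="$2"
    awk -v avg="$avg_cpu" -v peak="$peak_cpu" \
        -v avg_max="$AVG_CPU_THRESHOLD" -v peak_max="$PEAK_CPU_GUARD" \
        'BEGIN { exit !(avg < avg_max && peak < peak_max) }'
}

# Prints a one-line verdict that includes the VM name for traceability.
classify_vm() {
    local vm_name="$1" avg_cpu="$2" peak_cpu="$3"
    if is_underutilized_cpu_only "$avg_cpu" "$peak_cpu"; then
        echo "$vm_name: underutilized (avg ${avg_cpu}%, peak ${peak_cpu}%)"
    else
        echo "$vm_name: ok (avg ${avg_cpu}%, peak ${peak_cpu}%)"
    fi
}
```

The peak guard keeps a VM with a low average but occasional heavy bursts from being flagged for downsizing.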

…nalysis
- Added comprehensive analysis for server errors, throttling, user errors, and storage capacity in the service bus metrics script, providing detailed metrics and recommendations for each issue.
- Improved queue and topic health scripts to include analysis of disabled queues/topics and message backlog, enhancing visibility into operational issues.
- Introduced context, investigation steps, and recommendations for each identified issue, aiding in troubleshooting and resolution.
- Updated runbook to reflect new timeout settings for improved reliability during execution of cost health analysis tasks.

cursor

Bug: Undefined variables used in subscription backlog analysis

The MESSAGE BACKLOG ANALYSIS section references $status and $max_delivery_count variables that are only defined inside the dead-letter check block (lines 272-273). When a subscription has high active message count but NOT high dead-letter count, these variables will be unset. Since the script uses set -u, this causes a script failure. The service_bus_queue_health.sh file correctly fetches these variables within the active message count block, but this pattern was not followed in the topic health script.

codebundles/azure-servicebus-health/service_bus_topic_health.sh#L332-L334

rw-cli-codecollection/codebundles/azure-servicebus-health/service_bus_topic_health.sh

Lines 332 to 334 in 1dbe9ba

    
    - Topic: $topic_name
    - Subscription Status: $status
    - Max Delivery Count: $max_delivery_count
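One way to avoid the unset-variable failure under `set -u` is to fetch the values inside the block that prints them, as the review says the queue health script already does. The sketch below assumes a hypothetical helper, `get_subscription_property`, which stands in for the Azure CLI lookup and returns canned values so the example runs without Azure access. The function name `report_backlog` and its arguments are also illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the Azure CLI lookup the real script performs;
# it returns canned values so the sketch runs without Azure access.
get_subscription_property() {
    case "$1" in
        status) echo "Active" ;;
        maxDeliveryCount) echo "10" ;;
        *) echo "unknown" ;;
    esac
}

report_backlog() {
    local topic_name="$1" active_count="$2" threshold="$3"
    if (( active_count > threshold )); then
        # Fetch the values inside the block that uses them, so they are
        # always set even when the dead-letter branch did not run.
        local status max_delivery_count
        status="$(get_subscription_property status)"
        max_delivery_count="$(get_subscription_property maxDeliveryCount)"
        echo "- Topic: $topic_name"
        # The ":-unknown" fallback only covers an empty lookup result.
        echo "- Subscription Status: ${status:-unknown}"
        echo "- Max Delivery Count: ${max_delivery_count:-unknown}"
    fi
}
```

Declaring the variables `local` to the backlog block also prevents stale values from a previous subscription leaking into the report.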

codebundles/azure-servicebus-health/service_bus_metrics.sh

- Updated the message imbalance calculation to use `bc` for float-safe arithmetic, enhancing accuracy in metrics analysis.
- Improved comments for clarity on the calculation process, ensuring better understanding of the script's functionality.

cursor · 2025-12-05T11:03:13Z

codebundles/azure-subscription-cost-health/analyze_storage_optimization.sh

+            local savings_note="N/A"
+
+            # Calculate potential savings if we have capacity data
+            if [[ "$access_tier" == "Hot" ]] && (( $(echo "$capacity_gb > 0" | bc -l) )); then


Bug: Hot tier accounts miscounted when capacity data unavailable

The hot_tier_accounts counter is only incremented when both the access tier is "Hot" AND capacity_gb > 0. However, if there are Hot tier storage accounts whose capacity metrics are unavailable (returning 0), they won't be counted. Later, at lines 601-608, when hot_tier_accounts is 0, the message incorrectly states "No Hot tier accounts found" even though Hot tier accounts may exist - they just lack capacity data. This causes misleading output and incorrect severity assignment.

Additional Locations (1)

codebundles/azure-subscription-cost-health/analyze_storage_optimization.sh#L600-L608
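A possible fix is to count Hot tier accounts independently of whether capacity data is available, and to report the missing-metrics case separately. The sketch below is illustrative only: `hot_tier_with_capacity`, `has_capacity`, `tally_account`, `summarize_hot_tier`, and the message wording are assumptions, and `awk` replaces the script's `bc -l` comparison to keep the example dependency-light.

```shell
#!/usr/bin/env bash
set -euo pipefail

hot_tier_accounts=0          # every Hot tier account, with or without metrics
hot_tier_with_capacity=0     # Hot tier accounts that also have capacity data

# Succeeds when the (possibly fractional) capacity is greater than zero.
has_capacity() {
    awk -v c="$1" 'BEGIN { exit !(c > 0) }'
}

tally_account() {
    local access_tier="$1" capacity_gb="$2"
    if [[ "$access_tier" == "Hot" ]]; then
        # Count the account first; capacity only affects savings estimates.
        hot_tier_accounts=$((hot_tier_accounts + 1))
        if has_capacity "$capacity_gb"; then
            hot_tier_with_capacity=$((hot_tier_with_capacity + 1))
        fi
    fi
}

summarize_hot_tier() {
    if (( hot_tier_accounts == 0 )); then
        echo "No Hot tier accounts found"
    elif (( hot_tier_with_capacity == 0 )); then
        echo "$hot_tier_accounts Hot tier account(s) found, but capacity data is unavailable"
    else
        echo "$hot_tier_with_capacity of $hot_tier_accounts Hot tier account(s) have capacity data"
    fi
}
```

With two counters, the "No Hot tier accounts found" message is only printed when that is actually true, and severity can be derived from the right count.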

codebundles/azure-servicebus-health/service_bus_metrics.sh

- Updated the service bus metrics script to ensure that calculations for total errors, throttled requests, incoming messages, outgoing messages, and user errors default to zero when no data is available, improving reliability and preventing potential errors during execution.
- Enhanced the analysis of storage metrics to include similar default handling, ensuring consistent behavior across the script and better handling of edge cases.
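The default-to-zero handling described in this commit might look something like the sketch below. It is an assumption about the general pattern rather than the code in service_bus_metrics.sh: `metric_or_zero` and `error_rate_percent` are hypothetical names, and treating the literal string `null` as missing assumes the values come from a JSON query that prints `null` for absent fields.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Normalize a metric reading: empty, unset, or the literal "null"
# (assumed output for a missing JSON field) becomes 0.
metric_or_zero() {
    local value="${1:-}"
    if [[ -z "$value" || "$value" == "null" ]]; then
        echo 0
    else
        echo "$value"
    fi
}

# Compute errors as a percentage of requests, tolerating missing inputs.
error_rate_percent() {
    local errors requests
    errors="$(metric_or_zero "${1:-}")"
    requests="$(metric_or_zero "${2:-}")"
    # Avoid dividing by zero when no requests were recorded.
    awk -v e="$errors" -v r="$requests" \
        'BEGIN { if (r > 0) printf "%.2f\n", (e / r) * 100; else print "0.00" }'
}
```

Normalizing once at the point of retrieval means later arithmetic never sees an empty operand, which matters in a script that runs with `set -u`.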

stewartshea added 2 commits December 4, 2025 18:31

stewartshea requested a review from a team as a code owner December 4, 2025 22:04

cursor bot reviewed Dec 4, 2025


codebundles/azure-servicebus-health/service_bus_metrics.sh (outdated)

cursor bot reviewed Dec 5, 2025

stewartshea wants to merge 4 commits into runwhen-contrib:main from stewartshea:updates/20251204-01


1 participant
