Refactor Azure DevOps Health Monitoring Scripts to Enhance Error Handling and Authentication #626

Merged
stewartshea merged 2 commits into runwhen-contrib:main from stewartshea:updates/030426-01 on Mar 4, 2026

stewartshea (Contributor) commented Mar 4, 2026

  • Consolidated authentication setup across multiple scripts by introducing a common setup_azure_auth function, improving code maintainability and clarity.
  • Enhanced error handling for Azure DevOps API calls, providing clearer feedback on failures and next steps for users.
  • Updated scripts to utilize a consistent method for retrieving data, ensuring better reliability and reducing the likelihood of errors during execution.
  • Improved documentation and output messages to reflect changes in functionality and enhance user understanding of monitoring processes.

Note

Medium Risk
Touches many runbook scripts and changes how all Azure DevOps API calls are executed (timeouts/retries and auth setup), which could alter behavior in edge cases or in environments missing timeout or with a different CLI auth state.

Overview
Improves reliability of Azure DevOps health checks by introducing shared script helpers (_az_helpers.sh) that centralize Azure DevOps CLI extension setup + PAT/SP auth (setup_azure_auth) and wrap az calls with per-call timeouts, exponential-backoff retries, and standardized stdout capture (az_with_retry).
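
The timeout-plus-retry pattern described above can be sketched roughly as follows. This is not the contents of _az_helpers.sh or the real az_with_retry; the function name `retry_with_backoff`, its argument order, and the fallback when `timeout` is unavailable are all assumptions made for illustration.

```shell
# Hypothetical sketch of a per-call timeout with exponential-backoff retries.
# Usage: retry_with_backoff MAX_ATTEMPTS TIMEOUT_SECS BASE_DELAY_SECS CMD [ARGS...]
# Prints CMD's stdout on success; returns CMD's last exit code on failure.
retry_with_backoff() {
    local max_attempts=$1 timeout_secs=$2 delay=$3
    shift 3
    local attempt=1 output rc

    while true; do
        # Capture stdout and the exit code together, without tripping `set -e`.
        if command -v timeout >/dev/null 2>&1; then
            # GNU timeout exits with 124 when the time limit is hit.
            if output=$(timeout "$timeout_secs" "$@"); then rc=0; else rc=$?; fi
        else
            # Assumed fallback: run without a time limit if `timeout` is missing.
            if output=$("$@"); then rc=0; else rc=$?; fi
        fi

        if [ "$rc" -eq 0 ]; then
            printf '%s\n' "$output"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (last exit code: $rc)" >&2
            return "$rc"
        fi

        echo "attempt $attempt failed (exit code $rc); retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))        # double the wait before the next attempt
        attempt=$((attempt + 1))
    done
}
```

A call might look like `retry_with_backoff 3 30 2 az devops project list --organization "$org_url" --output json`, where the attempt count, timeout, and base delay are illustrative values rather than the ones used in this PR.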

Updates the existing health scripts to use these helpers instead of ad-hoc az invocations and error-log parsing, and tweaks issue messages to explicitly call out API unavailability/timeouts and suggest network/service-health checks.

Adds a new preflight-check.sh and wires it into runbook.robot suite initialization to report identity + per-scope access results; the runbook also now emits explicit issues when downstream scripts fail to produce valid JSON, including the preflight summary for troubleshooting.

Written by Cursor Bugbot for commit de8f2d7.

…ling and Authentication

- Consolidated authentication setup across multiple scripts by introducing a common `setup_azure_auth` function, improving code maintainability and clarity.
- Enhanced error handling for Azure DevOps API calls, providing clearer feedback on failures and next steps for users.
- Updated scripts to utilize a consistent method for retrieving data, ensuring better reliability and reducing the likelihood of errors during execution.
- Improved documentation and output messages to reflect changes in functionality and enhance user understanding of monitoring processes.
stewartshea requested a review from a team as a code owner on March 4, 2026 18:41
- Improved error handling in the `az_with_retry` function by capturing the exit code after executing the Azure command, ensuring that error messages are logged and processed correctly.
- Streamlined the removal of the error log file to occur only when the command succeeds, enhancing clarity in error reporting.
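
The two points in this commit message (record the exit code right after the command runs, and delete the error log only on success) can be illustrated with a minimal sketch. The helper name `run_with_error_log` and its interface are hypothetical and are not taken from the PR's code.

```shell
# Hypothetical sketch: capture the exit code immediately and keep the
# stderr log around only when the command fails.
# Usage: run_with_error_log ERR_LOG_PATH CMD [ARGS...]
run_with_error_log() {
    local err_log=$1
    shift
    local output rc=0

    # Record the exit code in the same statement as the command; any command
    # that ran in between would overwrite $?.
    output=$("$@" 2>"$err_log") || rc=$?

    if [ "$rc" -eq 0 ]; then
        rm -f "$err_log"            # success: the log has nothing worth keeping
        printf '%s\n' "$output"
    else
        echo "command failed with exit code $rc; details kept in $err_log" >&2
    fi
    return "$rc"
}
```

Keeping the log on failure means a caller can still read the command's stderr when deciding what to report.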
stewartshea merged commit d997b91 into runwhen-contrib:main on Mar 4, 2026
2 checks passed

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

    echo "$AZURE_DEVOPS_PAT" | az devops login --organization "$org_url"
else
    echo "Using service principal authentication..."
fi

AUTH_TYPE validation silently removed from shared auth function

Medium Severity

The new setup_azure_auth function treats any AUTH_TYPE value that isn't "pat" as service principal authentication without error. The old code in five scripts (agent-pools, pipeline-logs, queued-pipelines, service-connections, discover-projects) explicitly validated AUTH_TYPE and exited with an error for invalid values. Now, a misconfigured AUTH_TYPE (e.g., "pat_token") silently proceeds without the intended PAT login, causing subsequent API calls to fail with unclear auth errors instead of an immediate actionable message.
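
One way to restore the fail-fast behavior described here is an explicit check before any login attempt. The sketch below is an assumption-laden illustration: the function name `validate_auth_type` is hypothetical, and `service_principal` is an assumed spelling for the non-PAT value, since only "pat" is confirmed by the code shown in this review.

```shell
# Hypothetical sketch: reject unknown AUTH_TYPE values up front.
# Assumes the two accepted values are "pat" and "service_principal";
# only "pat" is confirmed by the reviewed code.
validate_auth_type() {
    local auth_type=${1:-}
    case "$auth_type" in
        pat|service_principal)
            return 0
            ;;
        *)
            echo "ERROR: invalid AUTH_TYPE '$auth_type' (expected 'pat' or 'service_principal')" >&2
            return 1
            ;;
    esac
}
```

A shared auth function could call `validate_auth_type "$AUTH_TYPE" || exit 1` as its first step, so that a value such as "pat_token" produces an immediate, actionable message instead of a later auth failure.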

Additional Locations (1)
