[AzureBackup] Fix telemetry misclassification, improve error UX, add command-boundary validation#2880
Open
shrja-ms wants to merge 5 commits into
Open
Conversation
- Update argument-hint to cover both telemetry and customer reports - Align icmdataro cluster name to icmdataro.centralus (matches query syntax) - Add field casing note to kql-customer-queries.md with cross-ref to kql-queries.md
…command-boundary validation - DppBackupOperations: InvalidOperationException -> KeyNotFoundException for policy-not-found fallback (fixes telemetry misclassification as MCP Bug instead of Customer error) - BackupStatusCommand: Improve 403 error message with RBAC hint (backupStatus/action permission, Backup Reader role, vault_get+protecteditem_get alternative) - VaultGetCommand: Add 403 error handler with RBAC guidance - GovernanceFindUnprotectedCommand: Add GetErrorMessage override with 403 RBAC hint - PolicyCreateValidator: Reject unknown --workload-type at command boundary with actionable list of supported types (fixes microsoft#2727) - ProtectedItemProtectCommand: Add --datasource-type validator checking both RSV and DPP registries at command boundary (fixes microsoft#2727) Telemetry context: Jun 9-15 analysis showed 16 MCP Tool Bugs. After PR microsoft#2805 (merged), the remaining InvalidOperationException in DppBackupOperations was the last misclassified throw. External customer failures (Antares Capital 6x 403 on backup_status, SIEMENS mcp-soc-scanner ValidationError) drove the error UX improvements.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the Azure Backup MCP tool to (1) reduce telemetry “MCP Tool Bug” misclassification by using more appropriate exception types, (2) provide more actionable customer-facing authorization errors, and (3) add command-boundary validation for workload/datasource tokens to prevent service-layer ArgumentException surfacing as 500s.
Changes:
- Reclassify DPP policy “not found / not parseable” fallback from
InvalidOperationExceptiontoKeyNotFoundExceptionfor more accurate telemetry. - Improve 403 (Forbidden) UX for
backup status,vault get, andgovernance find-unprotected. - Add command-boundary validation for
--workload-type(policy create validator) and--datasource-type(protecteditem protect).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/Azure.Mcp.Tools.AzureBackup/src/Services/Policy/PolicyCreateValidator.cs | Reject unknown --workload-type early with an actionable 400-style validation issue. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Services/DppBackupOperations.cs | Throw KeyNotFoundException for policy-not-found fallback to improve telemetry classification. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Vault/VaultGetCommand.cs | Add 403 handler with RBAC guidance. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/ProtectedItem/ProtectedItemProtectCommand.cs | Add --datasource-type validator at the command boundary. |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Governance/GovernanceFindUnprotectedCommand.cs | Add command-specific error messaging (ArgumentException + 403 guidance). |
| tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Backup/BackupStatusCommand.cs | Improve 403 guidance with required permission and alternate workflow. |
| tools/Azure.Mcp.Tools.AzureBackup/skills/azurebackup-telemetry-report/SKILL.md | Broaden skill “argument-hint” wording and clarify C360 cluster reference. |
| tools/Azure.Mcp.Tools.AzureBackup/skills/azurebackup-telemetry-report/references/kql-customer-queries.md | Add note about KQL bag key casing normalization and troubleshooting. |
Comment on lines
+54
to
+71
| command.Validators.Add(commandResult => | ||
| { | ||
| if (commandResult.HasOptionResult(AzureBackupOptionDefinitions.DatasourceType.Name)) | ||
| { | ||
| var value = commandResult.GetValue<string>(AzureBackupOptionDefinitions.DatasourceType.Name); | ||
| if (!string.IsNullOrEmpty(value) && | ||
| Services.RsvDatasourceRegistry.Resolve(value) is null && | ||
| !Services.DppDatasourceRegistry.AllProfiles.Any(p => | ||
| p.FriendlyName.Equals(value, StringComparison.OrdinalIgnoreCase) || | ||
| p.Aliases.Any(a => a.Equals(value, StringComparison.OrdinalIgnoreCase)))) | ||
| { | ||
| commandResult.AddError( | ||
| $"Unknown datasource type '{value}'. " + | ||
| $"RSV types: {string.Join(", ", Services.RsvDatasourceRegistry.KnownTypeNames)}. " + | ||
| $"DPP types: {string.Join(", ", Services.DppDatasourceRegistry.KnownTypeNames)}."); | ||
| } | ||
| } | ||
| }); |
Comment on lines
+54
to
+58
| command.Validators.Add(commandResult => | ||
| { | ||
| if (commandResult.HasOptionResult(AzureBackupOptionDefinitions.DatasourceType.Name)) | ||
| { | ||
| var value = commandResult.GetValue<string>(AzureBackupOptionDefinitions.DatasourceType.Name); |
Comment on lines
+139
to
+140
| RequestFailedException reqEx when reqEx.Status == (int)HttpStatusCode.Forbidden => | ||
| "Authorization failed listing vaults. Ensure you have 'Reader' or 'Backup Reader' role at subscription scope.", |
Comment on lines
+34
to
+38
| if (family == WorkloadFamily.Unknown && !string.IsNullOrWhiteSpace(workload)) | ||
| { | ||
| issues.Add(new PolicyValidationIssue( | ||
| $"--{AzureBackupOptionDefinitions.WorkloadTypeName}", | ||
| $"Unknown workload type '{workload}'. Supported types: " + |
… tests - ProtectedItemProtectCommand: Remove Services. prefix, add ArmResourceType and TryAutoDetect checks, use IsNullOrWhiteSpace to catch whitespace-only inputs - VaultGetCommand: Change 'listing vaults' to 'accessing vault information' for neutral phrasing across single/list paths - Add ProtectedItemProtectCommandTests: RejectsUnknownDatasourceType (garbage, SQL injection, whitespace) and AcceptsValidDatasourceType (RSV types, DPP types, ARM resource types, auto-detect base types) - Add PolicyCreateValidatorTests: Validate_UnknownWorkloadType_RejectsWithActionableMessage
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix telemetry misclassification, improve customer-facing error messages, and add command-boundary validation for
--workload-typeand--datasource-typeparameters.Changes (6 files, 49 insertions, 3 deletions)
P1 - Telemetry Misclassification Fix
InvalidOperationExceptiontoKeyNotFoundExceptionfor policy-not-found fallback inGetPolicyAsync. This was the last remainingInvalidOperationExceptionthrow in the Azure Backup toolset. Eliminates false MCP Bug counts in telemetry - the error is now correctly classified as a Customer error (404 resource not found).P2 - Error UX Improvements (driven by Jun 9-15 telemetry)
Microsoft.RecoveryServices/locations/backupStatus/actionpermission, suggestsBackup Readerrole at subscription scope, and recommendsvault_get + protecteditem_getas an alternative workflow. Directly addresses Antares Capital 6x 403 pattern.GetErrorMessageoverride - handlesArgumentException, 403 with RBAC hint, and generalRequestFailedException. Previously all errors fell through to the base class default message.P2 - Command-Boundary Validation (fixes #2727)
--workload-typevalues at the command boundary with an actionable list of all supported types (VM, SQL, SAPHANA, SAPASE, AzureFileShare, AzureDisk, AzureBlob, AKS, ElasticSAN, PostgreSQLFlexible, ADLS, CosmosDB).--datasource-typethat checks both RSV (RsvDatasourceRegistry) and DPP (DppDatasourceRegistry) registries at command boundary.Telemetry Context
Jun 9-15, 2026 analysis (334 calls, 188 failures, 16 MCP Tool Bugs):
InvalidOperationException422 fromListVaultsAsyncTask.WhenAll)InvalidOperationExceptionin DPP policy operationsInvalidOperationExceptionthrows remain in Azure Backup toolset (except the legitimate both-fail fallback)Invoking Livetests
Copilot submitted PRs are not trustworthy by default. Users with
writeaccess to the repo need to validate the contents of this PR before leaving a comment with the text/azp run mcp - pullrequest - live. This will trigger the necessary livetest workflows to complete required validation.