Skip to content

[AzureBackup] Fix telemetry misclassification, improve error UX, add command-boundary validation#2880

Open
shrja-ms wants to merge 5 commits into
microsoft:mainfrom
shrja-ms:user/azurebackup-error-ux-and-validation-jun15
Open

[AzureBackup] Fix telemetry misclassification, improve error UX, add command-boundary validation#2880
shrja-ms wants to merge 5 commits into
microsoft:mainfrom
shrja-ms:user/azurebackup-error-ux-and-validation-jun15

Conversation

@shrja-ms

Copy link
Copy Markdown
Contributor

Summary

Fix telemetry misclassification, improve customer-facing error messages, and add command-boundary validation for --workload-type and --datasource-type parameters.

Changes (6 files, 49 insertions, 3 deletions)

P1 - Telemetry Misclassification Fix

  • DppBackupOperations.cs: Change InvalidOperationException to KeyNotFoundException for policy-not-found fallback in GetPolicyAsync. This was the last remaining InvalidOperationException throw in the Azure Backup toolset. Eliminates false MCP Bug counts in telemetry - the error is now correctly classified as a Customer error (404 resource not found).

P2 - Error UX Improvements (driven by Jun 9-15 telemetry)

  • BackupStatusCommand.cs: Improve 403 error message - now mentions the required Microsoft.RecoveryServices/locations/backupStatus/action permission, suggests Backup Reader role at subscription scope, and recommends vault_get + protecteditem_get as an alternative workflow. Directly addresses Antares Capital 6x 403 pattern.
  • VaultGetCommand.cs: Add 403 error handler with Reader or Backup Reader RBAC guidance.
  • GovernanceFindUnprotectedCommand.cs: Add GetErrorMessage override - handles ArgumentException, 403 with RBAC hint, and general RequestFailedException. Previously all errors fell through to the base class default message.

P2 - Command-Boundary Validation (fixes #2727)

  • PolicyCreateValidator.cs: Reject unknown --workload-type values at the command boundary with an actionable list of all supported types (VM, SQL, SAPHANA, SAPASE, AzureFileShare, AzureDisk, AzureBlob, AKS, ElasticSAN, PostgreSQLFlexible, ADLS, CosmosDB).
  • ProtectedItemProtectCommand.cs: Add validator for --datasource-type that checks both RSV (RsvDatasourceRegistry) and DPP (DppDatasourceRegistry) registries at command boundary.

Telemetry Context

Jun 9-15, 2026 analysis (334 calls, 188 failures, 16 MCP Tool Bugs):

Invoking Livetests

Copilot submitted PRs are not trustworthy by default. Users with write access to the repo need to validate the contents of this PR before leaving a comment with the text /azp run mcp - pullrequest - live. This will trigger the necessary livetest workflows to complete required validation.

shrja-ms added 2 commits June 12, 2026 07:35
- Update argument-hint to cover both telemetry and customer reports
- Align icmdataro cluster name to icmdataro.centralus (matches query syntax)
- Add field casing note to kql-customer-queries.md with cross-ref to kql-queries.md
…command-boundary validation

- DppBackupOperations: InvalidOperationException -> KeyNotFoundException for policy-not-found
  fallback (fixes telemetry misclassification as MCP Bug instead of Customer error)
- BackupStatusCommand: Improve 403 error message with RBAC hint (backupStatus/action permission,
  Backup Reader role, vault_get+protecteditem_get alternative)
- VaultGetCommand: Add 403 error handler with RBAC guidance
- GovernanceFindUnprotectedCommand: Add GetErrorMessage override with 403 RBAC hint
- PolicyCreateValidator: Reject unknown --workload-type at command boundary with actionable
  list of supported types (fixes microsoft#2727)
- ProtectedItemProtectCommand: Add --datasource-type validator checking both RSV and DPP
  registries at command boundary (fixes microsoft#2727)

Telemetry context: Jun 9-15 analysis showed 16 MCP Tool Bugs. After PR microsoft#2805 (merged),
the remaining InvalidOperationException in DppBackupOperations was the last misclassified
throw. External customer failures (Antares Capital 6x 403 on backup_status, SIEMENS
mcp-soc-scanner ValidationError) drove the error UX improvements.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Azure Backup MCP tool to (1) reduce telemetry “MCP Tool Bug” misclassification by using more appropriate exception types, (2) provide more actionable customer-facing authorization errors, and (3) add command-boundary validation for workload/datasource tokens to prevent service-layer ArgumentException surfacing as 500s.

Changes:

  • Reclassify DPP policy “not found / not parseable” fallback from InvalidOperationException to KeyNotFoundException for more accurate telemetry.
  • Improve 403 (Forbidden) UX for backup status, vault get, and governance find-unprotected.
  • Add command-boundary validation for --workload-type (policy create validator) and --datasource-type (protecteditem protect).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tools/Azure.Mcp.Tools.AzureBackup/src/Services/Policy/PolicyCreateValidator.cs Reject unknown --workload-type early with an actionable 400-style validation issue.
tools/Azure.Mcp.Tools.AzureBackup/src/Services/DppBackupOperations.cs Throw KeyNotFoundException for policy-not-found fallback to improve telemetry classification.
tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Vault/VaultGetCommand.cs Add 403 handler with RBAC guidance.
tools/Azure.Mcp.Tools.AzureBackup/src/Commands/ProtectedItem/ProtectedItemProtectCommand.cs Add --datasource-type validator at the command boundary.
tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Governance/GovernanceFindUnprotectedCommand.cs Add command-specific error messaging (ArgumentException + 403 guidance).
tools/Azure.Mcp.Tools.AzureBackup/src/Commands/Backup/BackupStatusCommand.cs Improve 403 guidance with required permission and alternate workflow.
tools/Azure.Mcp.Tools.AzureBackup/skills/azurebackup-telemetry-report/SKILL.md Broaden skill “argument-hint” wording and clarify C360 cluster reference.
tools/Azure.Mcp.Tools.AzureBackup/skills/azurebackup-telemetry-report/references/kql-customer-queries.md Add note about KQL bag key casing normalization and troubleshooting.

Comment on lines +54 to +71
command.Validators.Add(commandResult =>
{
if (commandResult.HasOptionResult(AzureBackupOptionDefinitions.DatasourceType.Name))
{
var value = commandResult.GetValue<string>(AzureBackupOptionDefinitions.DatasourceType.Name);
if (!string.IsNullOrEmpty(value) &&
Services.RsvDatasourceRegistry.Resolve(value) is null &&
!Services.DppDatasourceRegistry.AllProfiles.Any(p =>
p.FriendlyName.Equals(value, StringComparison.OrdinalIgnoreCase) ||
p.Aliases.Any(a => a.Equals(value, StringComparison.OrdinalIgnoreCase))))
{
commandResult.AddError(
$"Unknown datasource type '{value}'. " +
$"RSV types: {string.Join(", ", Services.RsvDatasourceRegistry.KnownTypeNames)}. " +
$"DPP types: {string.Join(", ", Services.DppDatasourceRegistry.KnownTypeNames)}.");
}
}
});
Comment on lines +54 to +58
command.Validators.Add(commandResult =>
{
if (commandResult.HasOptionResult(AzureBackupOptionDefinitions.DatasourceType.Name))
{
var value = commandResult.GetValue<string>(AzureBackupOptionDefinitions.DatasourceType.Name);
Comment on lines +139 to +140
RequestFailedException reqEx when reqEx.Status == (int)HttpStatusCode.Forbidden =>
"Authorization failed listing vaults. Ensure you have 'Reader' or 'Backup Reader' role at subscription scope.",
Comment on lines +34 to +38
if (family == WorkloadFamily.Unknown && !string.IsNullOrWhiteSpace(workload))
{
issues.Add(new PolicyValidationIssue(
$"--{AzureBackupOptionDefinitions.WorkloadTypeName}",
$"Unknown workload type '{workload}'. Supported types: " +
shrja-ms added 3 commits June 15, 2026 17:12
… tests

- ProtectedItemProtectCommand: Remove Services. prefix, add ArmResourceType and
  TryAutoDetect checks, use IsNullOrWhiteSpace to catch whitespace-only inputs
- VaultGetCommand: Change 'listing vaults' to 'accessing vault information' for
  neutral phrasing across single/list paths
- Add ProtectedItemProtectCommandTests: RejectsUnknownDatasourceType (garbage,
  SQL injection, whitespace) and AcceptsValidDatasourceType (RSV types, DPP
  types, ARM resource types, auto-detect base types)
- Add PolicyCreateValidatorTests: Validate_UnknownWorkloadType_RejectsWithActionableMessage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Untriaged

Development

Successfully merging this pull request may close these issues.

AzureBackup: Apply command-boundary workload/datasource validation to DPP protect and policy-create surfaces (NEW-4 follow-up)

2 participants