[ONBOARD] Azure.Mcp.Tools.ResilienceManagement #2894

@LavishSingal

[ONBOARD] Azure.Mcp.Tools.ResilienceManagement (namespace: resilience)

Service / Tool Name

Azure Resilience Management (Azure.Mcp.Tools.ResilienceManagement)

Azure Resilience Management is the unified resource provider for goal-based resilience, recovery (failover / reprotect / finalize), and disaster recovery drills across Azure workloads.

This onboarding introduces a new Azure MCP toolset under the command namespace resilience, so that AI agents can inspect resilience posture, plan and execute failovers, run DR drills, and manage resilience goals from chat.

Contacts

Kumar Gautam (kgautam), Umesh Kumar Patel (umpatel), Anubhav (anubh)

Intended Agent Scenarios

Overview

Enable conversational operations for resilience posture review, planned failover, DR drill rehearsals, goal-based onboarding, and operational triage.

All flows assume --service-group <name> is passed on relevant commands.
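
As a rough illustration of the --service-group assumption, the sketch below assembles an argument list for a command in the resilience namespace. The helper name build_args, the sample service group name, and the argument layout are hypothetical; only the command paths and the --service-group flag come from this issue.

```python
from typing import List


def build_args(command: str, service_group: str) -> List[str]:
    """Split a command path into tokens and append --service-group.

    Hypothetical helper: the real tool's argument handling may differ.
    """
    if not command.startswith("resilience "):
        raise ValueError(f"not in the resilience namespace: {command!r}")
    if not service_group:
        raise ValueError("--service-group requires a non-empty name")
    return command.split() + ["--service-group", service_group]


# "contoso-sg" is a made-up service group name used only for illustration.
print(build_args("resilience recovery plan list", "contoso-sg"))
```

Rejecting an empty service group up front keeps the failure local to the caller instead of surfacing later as a service-side error.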

Command Inventory

Recovery > Plan

  • resilience recovery plan list (RecoveryPlans_List)
  • resilience recovery plan show (RecoveryPlans_Get)
  • resilience recovery plan create-or-update (RecoveryPlans_CreateOrUpdate)
  • resilience recovery plan update (RecoveryPlans_Update)
  • resilience recovery plan delete (RecoveryPlans_Delete)

Recovery > Plan Actions

  • resilience recovery plan check-readiness (RecoveryPlanActions_CheckReadiness)
  • resilience recovery plan failover (RecoveryPlanActions_Failover)
  • resilience recovery plan failover-commit (RecoveryPlanActions_FailoverCommit)
  • resilience recovery plan finalize (RecoveryPlanActions_Finalize)
  • resilience recovery plan reprotect (RecoveryPlanActions_Reprotect)
  • resilience recovery plan test-failover (RecoveryPlanActions_TestFailover)
  • resilience recovery plan test-failover-cleanup (RecoveryPlanActions_TestFailoverCleanup)
  • resilience recovery plan update-resources (RecoveryPlanActions_UpdateResources)
  • resilience recovery plan validate-for-failover (RecoveryPlanActions_ValidateForFailover)
  • resilience recovery plan validate-for-failover-commit (RecoveryPlanActions_ValidateForFailoverCommit)
  • resilience recovery plan validate-for-reprotect (RecoveryPlanActions_ValidateForReprotect)
  • resilience recovery plan validate-for-test-failover (RecoveryPlanActions_ValidateForTestFailover)
  • resilience recovery plan validate-for-test-failover-cleanup (RecoveryPlanActions_ValidateForTestFailoverCleanup)
  • resilience recovery plan validate-for-operation (RecoveryPlanActions_ValidateForOperation)

Recovery > Plan > Resource

  • resilience recovery plan resource list (RecoveryResources_List)
  • resilience recovery plan resource show (RecoveryResources_Get)

Recovery > Job

  • resilience recovery job list (RecoveryJobs_List)
  • resilience recovery job show (RecoveryJobs_Get)
  • resilience recovery job cancel (RecoveryJobs_Cancel)
  • resilience recovery job resume (RecoveryJobs_Resume)
  • resilience recovery job retry (RecoveryJobs_Retry)

Recovery > Job > Resource

  • resilience recovery job resource list (RecoveryJobResources_List)
  • resilience recovery job resource show (RecoveryJobResources_Get)

Drill

  • resilience drill list (Drills_List)
  • resilience drill show (Drills_Get)
  • resilience drill create-or-update (Drills_Create)
  • resilience drill update (Drills_Update)
  • resilience drill delete (Drills_Delete)
  • resilience drill add-or-update-resources (Drills_AddOrUpdateResources)
  • resilience drill start (Drills_Start)
  • resilience drill end (Drills_End)
  • resilience drill validate-for-execution (Drills_ValidateForExecution)
  • resilience drill resync-readiness-check (Drills_ResyncReadinessCheck)

Drill > Resource

  • resilience drill resource list (DrillResources_List)
  • resilience drill resource show (DrillResources_Get)

Drill > Run

  • resilience drill run list (DrillRuns_List)
  • resilience drill run show (DrillRuns_Get)
  • resilience drill run add-notes (DrillRuns_AddNotes)
  • resilience drill run failover (DrillRuns_FailOver)
  • resilience drill run mark-complete (DrillRuns_MarkAsComplete)
  • resilience drill run reprotect (DrillRuns_Reprotect)
  • resilience drill run resume (DrillRuns_Resume)

Drill > Run > Resource

  • resilience drill run resource list (DrillRunResources_List)
  • resilience drill run resource show (DrillRunResources_Get)

Goal > Assignment

  • resilience goal assignment list (GoalAssignments_List)
  • resilience goal assignment show (GoalAssignments_Get)
  • resilience goal assignment create-or-update (GoalAssignments_CreateOrUpdate)
  • resilience goal assignment update (GoalAssignments_Update)
  • resilience goal assignment delete (GoalAssignments_Delete)
  • resilience goal assignment recommend-capacity (GoalAssignments_RecommendCapacity)
  • resilience goal assignment refresh-resources (GoalAssignments_RefreshGoalResources)
  • resilience goal assignment update-resources (GoalAssignments_UpdateGoalResources)

Goal > Assignment > Resource

  • resilience goal assignment resource list (GoalResources_List)
  • resilience goal assignment resource show (GoalResources_Get)

Goal > Template

  • resilience goal template list (GoalTemplates_List)
  • resilience goal template show (GoalTemplates_Get)
  • resilience goal template create-or-update (GoalTemplates_CreateOrUpdate)
  • resilience goal template update (GoalTemplates_Update)
  • resilience goal template delete (GoalTemplates_Delete)

Unified Items

  • resilience servicegroup-metadata-item list (UnifiedResilienceItems_List)
  • resilience servicegroup-metadata-item show (UnifiedResilienceItems_Get)

Usage Plan

  • resilience usage-plan list (UsagePlans_ListBySubscription / UsagePlans_ListByResourceGroup)
  • resilience usage-plan show (UsagePlans_Get)
  • resilience usage-plan create-or-update (UsagePlans_CreateOrUpdate)
  • resilience usage-plan update (UsagePlans_Update)
  • resilience usage-plan delete (UsagePlans_Delete)

Usage Plan > Enrollment

  • resilience usage-plan enrollment list (Enrollments_List)
  • resilience usage-plan enrollment show (Enrollments_Get)
  • resilience usage-plan enrollment create-or-update (Enrollments_CreateOrUpdate)
  • resilience usage-plan enrollment delete (Enrollments_Delete)

Representative End-to-End Agent Flows

  • DR posture review: usage-plan list -> goal assignment list + goal assignment resource list -> servicegroup-metadata-item list
  • Planned failover: recovery plan show -> recovery plan check-readiness -> recovery plan validate-for-failover -> recovery plan failover -> recovery job list -> recovery plan failover-commit -> recovery plan reprotect
  • DR drill rehearsal: drill create-or-update -> drill validate-for-execution -> drill start -> drill run failover -> drill run mark-complete -> drill run add-notes -> drill end
  • Goal-based onboarding: goal template list -> goal assignment create-or-update -> goal assignment recommend-capacity -> goal assignment refresh-resources
  • Operational triage: recovery job list -> recovery job show -> recovery job cancel / recovery job retry / recovery job resume
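
One way an agent might sequence a flow such as the planned failover above is to run the steps in order and stop at the first failure, so that a failed validation never reaches the mutating failover step. This is a minimal sketch: run_flow and the invoke callback are hypothetical stand-ins for the real tool calls, and only the step order is taken from the flow list.

```python
from typing import Callable, List, Tuple

# Step order copied from the "Planned failover" flow above.
PLANNED_FAILOVER: List[str] = [
    "recovery plan show",
    "recovery plan check-readiness",
    "recovery plan validate-for-failover",
    "recovery plan failover",
    "recovery job list",
    "recovery plan failover-commit",
    "recovery plan reprotect",
]


def run_flow(steps: List[str], invoke: Callable[[str], bool]) -> Tuple[List[str], bool]:
    """Run steps in order and stop at the first failure.

    `invoke` is a hypothetical callback that executes one command and
    returns True on success. Returns (completed steps, overall success).
    """
    completed: List[str] = []
    for step in steps:
        if not invoke(step):
            return completed, False
        completed.append(step)
    return completed, True


# Simulate a failed validation: the failover step is never invoked.
done, ok = run_flow(
    PLANNED_FAILOVER,
    lambda step: step != "recovery plan validate-for-failover",
)
print(done, ok)
```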

Rollout Plan

  • PR 1: read-only surface across all groups (all list and show commands).
    • Goal: establish toolset plumbing end-to-end with zero blast radius.
    • Includes: IAreaSetup, BaseAzureResourceService, non-standard serviceGroupName path parameter handling, pagination, parameter shapes, telemetry, and live-test scaffolding.

Subsequent PRs (one per subgroup):

  • Recovery plans: create-or-update, update, delete
  • Recovery plan actions (safe/write split): check-readiness, update-resources, all validate-for-*
  • Recovery plan actions (mutating LRO): failover, failover-commit, reprotect, finalize, test-failover, test-failover-cleanup
  • Recovery jobs: cancel, resume, retry
  • Drills: create-or-update, update, delete, add-or-update-resources
  • Drill lifecycle: start, end, validate-for-execution, resync-readiness-check
  • Drill runs: failover, mark-complete, reprotect, resume, add-notes
  • Goal templates: create-or-update, update, delete
  • Goal assignments: create-or-update, update, delete
  • Goal assignment actions: recommend-capacity, refresh-resources, update-resources
  • Usage plans: create-or-update, update, delete
  • Enrollments: create-or-update, delete
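
The PR 1 scope above (all list and show commands) can be expressed as a simple filter on the final token of each command. The sketch below is illustrative only: is_read_only and pr1_scope are hypothetical names, and the sample is a small subset of the command inventory rather than the full list.

```python
from typing import Iterable, List

# PR 1 is defined in this issue as every list and show command.
READ_ONLY_VERBS = {"list", "show"}


def is_read_only(command: str) -> bool:
    """Return True when the final token of the command is list or show."""
    tokens = command.split()
    return bool(tokens) and tokens[-1] in READ_ONLY_VERBS


def pr1_scope(commands: Iterable[str]) -> List[str]:
    """Keep only the commands that belong in the read-only first PR."""
    return [c for c in commands if is_read_only(c)]


sample = [
    "resilience recovery plan list",
    "resilience recovery plan failover",
    "resilience drill run show",
    "resilience goal assignment recommend-capacity",
    "resilience recovery job resource list",
]
print(pr1_scope(sample))
```

Matching on the last token rather than on a substring keeps commands such as check-readiness and validate-for-* out of PR 1, consistent with their placement in a later PR.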

Out Of Scope (Initial Release)

  • Direct OperationStatus_Get command surface (LRO polling remains internal to the SDK/tool implementation).
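
For context on what "internal" polling could look like, here is a minimal sketch of a bounded polling loop. It is not the SDK's implementation: poll_until_done and the get_status callback are hypothetical, and the terminal state names assume the service follows the usual Azure Resource Manager async-operation convention (Succeeded, Failed, Canceled).

```python
import time
from typing import Callable

# Assumption: terminal states follow the common ARM async-operation convention.
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Call get_status until it returns a terminal state or attempts run out.

    `get_status` is a hypothetical callback standing in for the internal
    operation-status lookup.
    """
    for _ in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"operation not finished after {max_attempts} attempts")


# Simulated status sequence in place of a real service call.
statuses = iter(["InProgress", "InProgress", "Succeeded"])
print(poll_until_done(lambda: next(statuses), interval_s=0.0))
```

Capping the number of attempts keeps a stuck operation from blocking an agent turn indefinitely; the caller can then fall back to recovery job show for status.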

Timeline

June 2026

Metadata

Assignees: No one assigned

Labels

  • enhancement: New feature or request
  • needs-triage: Workflow: This is a new issue that needs to be triaged to the appropriate team.
  • onboarding: New service or MCP child server to onboard

Status: Untriaged

Milestone: No milestone