ACR High Availability - Bounded Staleness Consistency for Pushed Images in ACR Geo-Replication (Geo Sync Proxy) #880

@johnsonshi

Overview

Description

This roadmap item tracks the work to support bounded staleness consistency for pushed images in Azure Container Registry (ACR) geo-replication. The goal is that an image pushed to one geo-replica becomes pullable from the other geo-replicas within a bounded staleness window. Average and median staleness times will be published in the future.

Context

Today, when an image is pushed to a geo-replicated ACR registry, the image is replicated asynchronously to other replicas via ACR's existing eventual consistency data copy. There is no guarantee or visibility into when the pushed image will be pullable from other replicas. Clients pulling from a replica other than the one where the image was pushed may encounter delays or pull failures until replication completes.

How It Works

Bounded staleness consistency makes a pushed image pullable from other geo-replicas sooner, on demand. When a client pulls an image from a geo-replica that has not yet received it via the eventual consistency data copy, ACR proxies the pull request on the server side (it does not redirect the client) to a replica that has the image, so the client gets the image without waiting for full replication.
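The pull path described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `Replica`, `pull`, `ImageNotFound`, the region names, and the image tag are all hypothetical illustrations, not ACR APIs, and the real service's replica selection and failure handling are not described in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one geo-replica (hypothetical, not an ACR API)."""
    region: str
    images: set = field(default_factory=set)


class ImageNotFound(Exception):
    """Raised when no replica in the model holds the image."""


def pull(image: str, local: Replica, peers: list) -> str:
    """Return the region whose storage actually served the image."""
    if image in local.images:
        # The eventual consistency data copy has already arrived here.
        return local.region
    for peer in peers:
        if image in peer.images:
            # Server-side proxy: the client still talks only to `local`;
            # no redirect is sent back to the client.
            return peer.region
    raise ImageNotFound(image)


east = Replica("eastus", {"app:v2"})   # replica that received the push
west = Replica("westus")               # replication has not completed yet

served_by = pull("app:v2", local=west, peers=[east])
print(served_by)  # eastus
```

In this model the client's request always targets its local replica; only the source of the content changes while replication is still in flight.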

Important details:

  • No API for querying image availability. This feature will not expose an API for querying whether an image is available in a specific replica. Because bounded staleness works via server-side proxying to other replicas that have the image, and because cross-region proxying can fail during outages, an availability query API would not provide meaningful guarantees.

  • Eventual consistency data copy still applies. Whether or not an image is pulled, ACR's existing eventual consistency data copy will continue to replicate images to all other geo-replicas for storage backup and disaster resiliency purposes. Bounded staleness only makes images available sooner in other geo-replicas when they are pulled; it does not replace or change the existing asynchronous replication behavior.

  • Webhook behavior is unchanged. ACR webhooks will continue to fire when the eventual consistency data copy delivers an image to a geo-replica. Webhooks are scoped to a specific regional replica and are triggered when the image arrives at that replica via replication. This applies whether or not bounded staleness (geo sync proxy) is enabled, and whether or not the image has been pulled.
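The separation between proxied pulls and replication-driven webhooks can be illustrated with another toy model. It is a sketch only: the `Replica` class, `receive_replicated`, `proxied_pull`, and the `("push", image)` event tuple are hypothetical names for illustration. The model also assumes a proxied pull stores nothing in the local replica, which this issue does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of one geo-replica (hypothetical, not an ACR API)."""
    region: str
    images: set = field(default_factory=set)
    webhook_events: list = field(default_factory=list)

    def receive_replicated(self, image: str) -> None:
        # The background data copy delivered the image: store it, then
        # record the replica-scoped webhook event.
        self.images.add(image)
        self.webhook_events.append(("push", image))


def proxied_pull(image: str, local: Replica, origin: Replica) -> bool:
    # A pull through `local` succeeds if either replica has the image.
    # It records no webhook event and stores nothing in `local`
    # (an assumption of this model).
    return image in local.images or image in origin.images


origin = Replica("eastus", {"app:v2"})
remote = Replica("westus")

pulled = proxied_pull("app:v2", remote, origin)
events_after_pull = list(remote.webhook_events)

remote.receive_replicated("app:v2")
events_after_copy = list(remote.webhook_events)

print(pulled, events_after_pull, events_after_copy)
```

Here the pull succeeds before any webhook event exists for the remote replica; the event appears only once the simulated data copy arrives.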

Problem Statement

Currently:

  • Image replication across geo-replicas is asynchronous with no consistency guarantees exposed to the user.
  • Clients have no way to know when a pushed image will be pullable from a different replica.
  • CI/CD pipelines that push to one region and pull from another may encounter race conditions where the image is not yet available in the target replica.
  • The only workaround is to add arbitrary sleep/retry delays in pipelines or to push to all replicas individually.
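For reference, the sleep/retry workaround mentioned in the last bullet often looks something like the sketch below. The function name `pull_with_retry`, its parameters, and the backoff values are hypothetical; `try_pull` stands in for whatever pull command a pipeline actually runs.

```python
import time
from typing import Callable


def pull_with_retry(
    try_pull: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call try_pull until it succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if try_pull():
            return attempt
        if attempt < attempts:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise TimeoutError(f"image not pullable after {attempts} attempts")


# Simulate an image that becomes pullable on the third attempt.
# The delays are recorded instead of slept so the example runs instantly.
results = iter([False, False, True])
delays = []
used = pull_with_retry(lambda: next(results), sleep=delays.append)
print(used, delays)  # 3 [2.0, 4.0]
```

The delays here are guesses with no relationship to actual replication time, which is the weakness of this workaround that the Problem Statement describes.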

Use Case

  • CI/CD pipelines that push images in one region and deploy (pull) in another region can rely on bounded staleness guarantees instead of adding arbitrary sleep/retry delays.
  • Enterprises with geo-replicated registries can have confidence that images are pullable across replicas within a bounded staleness window.
  • Disaster recovery and failover workflows benefit from tighter consistency for image pulls across replicas.

Milestones

⏳ Private Preview

  • Private Preview available for customers to request access

⏳ Public Preview

  • Public Preview rollout in public regions
  • Public docs on MS Learn

⏳ GA

  • General Availability

Status

Active development — follow this issue for milestone updates and preview availability.

Metadata

Labels

  • feature-aks-integration: Issues related to integration with AKS
  • feature-high-availability: Issues related to geo-replication, zone redundancy and high availability
  • feature-request: Issues that request new features
  • roadmap: Features and asks that should show up on the public roadmap
  • triaged: Use after the issue is triaged

Projects

Status

In Progress (Development)
