Summary
When using StreamingPull with the default flow control settings (1000 outstanding messages), existing subscribers hoard messages in their client buffer and gRPC stream buffer. When new subscribers connect (e.g., via Kubernetes HPA autoscaling), they receive zero messages because all messages are locked inside existing subscribers' memory.
Scaling to zero and scaling back up fixes the problem immediately: all subscribers receive messages evenly. This confirms the issue lies in how messages are distributed between existing and new subscribers.
Environment
- Client library: google-cloud-pubsub (Java)
- Subscription type: StreamingPull
- Infrastructure: GKE (Google Kubernetes Engine) with HPA (Horizontal Pod Autoscaler)
- Processing: Single-threaded message processing (1 executor thread), each message triggers a large file download (~5 MB), so processing is slow (~30 seconds per message)
- Configuration:
  - `parallelPullCount = 2`, `executorProviderThreadCount = 1`, no explicit `setFlowControlSettings()`
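For reference, the configuration above corresponds roughly to the following subscriber construction. This is a sketch, not code from the report: `my-project`, `my-subscription`, and the trivial `receiver` are placeholders.

```java
import com.google.api.gax.core.InstantiatingExecutorProvider;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

// Placeholder names; substitute your own project and subscription.
ProjectSubscriptionName subscriptionName =
    ProjectSubscriptionName.of("my-project", "my-subscription");

// Placeholder receiver; the real one downloads a ~5 MB file per message.
MessageReceiver receiver = (message, consumer) -> consumer.ack();

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
    .setParallelPullCount(2) // two StreamingPull streams per pod
    .setExecutorProvider(
        InstantiatingExecutorProvider.newBuilder()
            .setExecutorThreadCount(1) // single-threaded processing
            .build())
    // No setFlowControlSettings(), so the default
    // maxOutstandingElementCount of 1000 applies.
    .build();
```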
Problem Description
Setup
- 5 subscriber pods running continuously (HPA min replicas = 5, max = 50)
- Each pod uses StreamingPull with default flow control (1000 messages outstanding)
- Each pod processes 1 message at a time (single threaded)
What happens
- A burst of messages arrives (e.g., 75,000 messages)
- The 5 existing pods receive all messages into their buffers
- HPA detects high lag and scales to 50 pods
- The new 45 pods connect but receive ZERO messages
- Only the original 5 pods process messages
- Lag keeps increasing despite 50 pods running
What fixes it
- Scale to 0 (all pods die, all messages released)
- Scale back to 50
- All 50 pods now receive messages evenly
- Lag drops to 0 within 15-30 minutes
Root Cause Analysis
1. Two buffers inside each subscriber pod
Based on the explanation by a Google engineer in googleapis/python-pubsub#237, there are two separate buffers inside each subscriber:
| Buffer | Location | Size | Ack deadline extended? |
| --- | --- | --- | --- |
| Client buffer | Inside the application (RAM) | Controlled by `maxOutstandingElementCount` (default: 1000) | Yes (the client library extends it automatically) |
| gRPC stream buffer | Between the gRPC layer and the application (RAM) | Server sends messages in packets up to 10 MB | No (messages can expire here) |
The Google engineer stated:
"Unless the flow control is being applied at the gRPC level, messages will be buffered in the gRPC stream, where more than the specified number of messages can be present due to the Pub/Sub server sending out messages in packets ≤ 10 MiB in size. We currently don't support server-side flow control."
2. Message hoarding calculation
With a message size of ~1 KB:
| What | Calculation | Result |
| --- | --- | --- |
| Messages in one 10 MB gRPC packet | 10,000 KB / 1 KB | ~10,000 messages |
| gRPC buffer per pod (2 streams) | 2 × 10,000 | ~20,000 messages |
| Client buffer per pod (default) | 1,000 | 1,000 messages |
| Total per pod | 20,000 + 1,000 | ~21,000 messages |
| Total for 5 pods | 5 × 21,000 | ~105,000 messages |
With a lag of 75,000 messages, all messages fit inside the 5 existing pods' buffers. Zero messages are available for new pods.
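The arithmetic above can be double-checked with a quick sketch. The ~1 KB message size and 10 MB packet size are the assumptions stated earlier, not measured values:

```java
// Back-of-the-envelope check of the hoarding numbers above.
public class BufferMath {
    public static void main(String[] args) {
        int msgsPerPacket = 10_000 / 1;        // 10,000 KB packet / ~1 KB per message
        int grpcPerPod    = 2 * msgsPerPacket; // parallelPullCount = 2 streams
        int clientPerPod  = 1_000;             // default maxOutstandingElementCount
        int totalPerPod   = grpcPerPod + clientPerPod;
        int totalAllPods  = 5 * totalPerPod;   // 5 pods before scale-up

        System.out.println("per pod: " + totalPerPod);    // 21000
        System.out.println("5 pods:  " + totalAllPods);   // 105000
        // A 75,000-message backlog fits entirely inside the 5 pods' buffers.
        System.out.println(totalAllPods >= 75_000);       // true
    }
}
```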
3. Server routing favors existing subscribers
From Google's troubleshooting documentation:
"The system dynamically directs more messages towards consumers that demonstrate higher capacity — those that acknowledge messages quickly and are not constrained by their flow control settings."
"Messages could be outstanding to clients already, and a backlog of unacknowledged messages doesn't necessarily mean you'll receive those messages on your next pull request."
Existing pods have ack history and appear to have high capacity (1000 outstanding messages, not constrained by flow control). New pods have zero history. The server keeps routing to existing pods.
4. Why the client buffer is the critical problem
- Client buffer (1000 messages): Ack deadline is extended automatically by the client library. These messages are permanently locked to the pod until processed. With 1 thread and ~30 seconds per message, processing 1000 messages takes ~8 hours.
- gRPC buffer (~20,000 messages): Ack deadline is NOT extended. These messages expire and return to the server. However, the server often re-delivers them to the same existing pods (because those pods appear to have higher capacity).
The combination of 1000 permanently-locked messages per pod + server routing preference = new subscribers are starved.
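As a sanity check on the "~8 hours" figure above, using the ~30 seconds/message processing time stated in the environment section:

```java
// Time for one single-threaded pod to drain a full default client buffer.
public class DrainTime {
    public static void main(String[] args) {
        int bufferedMessages  = 1_000; // default maxOutstandingElementCount
        int secondsPerMessage = 30;    // slow processing (~5 MB download each)
        double hours = bufferedMessages * secondsPerMessage / 3600.0;
        System.out.printf("~%.1f hours to drain one pod's client buffer%n", hours);
    }
}
```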
Workaround / Fix
Setting `maxOutstandingElementCount` to a low value (e.g., 10) significantly improves message distribution:

```java
import com.google.api.gax.batching.FlowControlSettings;
import com.google.api.gax.batching.FlowController;
import com.google.cloud.pubsub.v1.Subscriber;

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
    .setFlowControlSettings(
        FlowControlSettings.newBuilder()
            // Hold at most 10 messages in the client buffer (default: 1000).
            .setMaxOutstandingElementCount(10L)
            .setLimitExceededBehavior(FlowController.LimitExceededBehavior.Block)
            .build())
    .build();
```
With `maxOutstandingElementCount = 10`:
- Each pod locks only 10 messages (instead of 1000) with extended ack deadlines
- Pods become "constrained by flow control" almost instantly
- Server routing shifts to new pods that are "not constrained"
- New subscribers receive messages
However, the gRPC stream buffer still receives ~10,000 messages in the initial 10 MB packet (since there is no server-side flow control). These messages expire and return to the server, but this creates unnecessary churn.
Feature Request
Server-side flow control for StreamingPull
The Google engineer in #237 stated: "We currently don't support server-side flow control."
If the server respected the client's maxOutstandingElementCount when sending messages (i.e., not sending more than the client's limit), it would:
- Prevent gRPC buffer flooding (no unnecessary 10 MB packets)
- Reduce message expiry and re-delivery churn
- Improve message distribution to new subscribers immediately
- Make Pub/Sub work better with Kubernetes HPA autoscaling
Documentation improvement
The current documentation does not clearly explain:
- The two-buffer model (client buffer + gRPC stream buffer)
- How default flow control (1000 messages) causes message hoarding with slow processors
- The impact on Kubernetes HPA autoscaling (new pods getting zero messages)
- The recommended flow control settings for slow-processing subscribers
Adding a section to the StreamingPull documentation or troubleshooting guide about this scenario would help many users running Pub/Sub on Kubernetes with HPA.
References
- googleapis/python-pubsub#237 - Google engineer confirms two-buffer behavior
- Troubleshooting pull subscriptions - Server routing based on flow control constraints
- Testing Cloud Pub/Sub clients - Default flow control is 1000 messages