Summary
When using StreamingPull with the default flow control settings (1000 outstanding messages), existing subscribers hoard messages in their client buffer and gRPC stream buffer. When new subscribers connect (e.g., via Kubernetes HPA autoscaling), they receive zero messages because all messages are locked inside existing subscribers' memory.
Scaling to zero and scaling back up fixes the problem immediately: all subscribers receive messages evenly. This confirms the issue lies in how messages are distributed between existing and new subscribers.
Environment
- Client library: google-cloud-pubsub (Java)
- Subscription type: StreamingPull
- Infrastructure: GKE (Google Kubernetes Engine) with HPA (Horizontal Pod Autoscaler)
- Processing: Single-threaded message processing (1 executor thread), each message triggers a large file download (~5 MB), so processing is slow (~30 seconds per message)
- Configuration:
  - `parallelPullCount = 2`, `executorProviderThreadCount = 1`, no explicit `setFlowControlSettings()`
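For reference, the configuration above corresponds roughly to the following subscriber construction. This is a sketch, not code from the report: `my-project`, `my-subscription`, and the trivial `receiver` are placeholders.

```java
import com.google.api.gax.core.InstantiatingExecutorProvider;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

// Placeholder names; substitute your own project and subscription.
ProjectSubscriptionName subscriptionName =
    ProjectSubscriptionName.of("my-project", "my-subscription");

// Placeholder receiver; the real one downloads a ~5 MB file per message.
MessageReceiver receiver = (message, consumer) -> consumer.ack();

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
    .setParallelPullCount(2) // two StreamingPull streams per pod
    .setExecutorProvider(
        InstantiatingExecutorProvider.newBuilder()
            .setExecutorThreadCount(1) // single-threaded processing
            .build())
    // No setFlowControlSettings(), so the default
    // maxOutstandingElementCount of 1000 applies.
    .build();
```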
Problem Description
Setup
- 5 subscriber pods running continuously (HPA min replicas = 5, max = 50)
- Each pod uses StreamingPull with default flow control (1000 messages outstanding)
- Each pod processes 1 message at a time (single threaded)
What happens
- A burst of messages arrives (e.g., 75,000 messages)
- The 5 existing pods receive all messages into their buffers
- HPA detects high lag and scales to 50 pods
- The new 45 pods connect but receive ZERO messages
- Only the original 5 pods process messages
- Lag keeps increasing despite 50 pods running
What fixes it
- Scale to 0 (all pods die, all messages released)
- Scale back to 50
- All 50 pods now receive messages evenly
- Lag drops to 0 within 15-30 minutes
Root Cause Analysis
1. Two buffers inside each subscriber pod
Based on the explanation by a Google engineer in googleapis/python-pubsub#237, there are two separate buffers inside each subscriber:
| Buffer | Location | Size | Ack deadline extended? |
| --- | --- | --- | --- |
| Client buffer | Inside the application (RAM) | Controlled by `maxOutstandingElementCount` (default: 1000) | Yes (the client library extends it automatically) |
| gRPC stream buffer | Between the gRPC layer and the application (RAM) | Server sends messages in packets up to 10 MB | No (messages can expire here) |
The Google engineer stated:
"Unless the flow control is being applied at the gRPC level, messages will be buffered in the gRPC stream, where more than the specified number of messages can be present due to the Pub/Sub server sending out messages in packets ≤ 10 MiB in size. We currently don't support server-side flow control."
2. Message hoarding calculation
With a message size of ~1 KB:
| What | Calculation | Result |
| --- | --- | --- |
| Messages in one 10 MB gRPC packet | 10,000 KB / 1 KB | ~10,000 messages |
| gRPC buffer per pod (2 streams) | 2 × 10,000 | ~20,000 messages |
| Client buffer per pod (default) | 1,000 | 1,000 messages |
| Total per pod | 20,000 + 1,000 | ~21,000 messages |
| Total for 5 pods | 5 × 21,000 | ~105,000 messages |
With a lag of 75,000 messages, all messages fit inside the 5 existing pods' buffers. Zero messages are available for new pods.
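The arithmetic above can be double-checked with a quick sketch. The ~1 KB message size and 10 MB packet size are the assumptions stated earlier, not measured values:

```java
// Back-of-the-envelope check of the hoarding numbers above.
public class BufferMath {
    public static void main(String[] args) {
        int msgsPerPacket = 10_000 / 1;        // 10,000 KB packet / ~1 KB per message
        int grpcPerPod    = 2 * msgsPerPacket; // parallelPullCount = 2 streams
        int clientPerPod  = 1_000;             // default maxOutstandingElementCount
        int totalPerPod   = grpcPerPod + clientPerPod;
        int totalAllPods  = 5 * totalPerPod;   // 5 pods before scale-up

        System.out.println("per pod: " + totalPerPod);    // 21000
        System.out.println("5 pods:  " + totalAllPods);   // 105000
        // A 75,000-message backlog fits entirely inside the 5 pods' buffers.
        System.out.println(totalAllPods >= 75_000);       // true
    }
}
```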
3. Server routing favors existing subscribers
From Google's troubleshooting documentation:
"The system dynamically directs more messages towards consumers that demonstrate higher capacity — those that acknowledge messages quickly and are not constrained by their flow control settings."
"Messages could be outstanding to clients already, and a backlog of unacknowledged messages doesn't necessarily mean you'll receive those messages on your next pull request."
Existing pods have ack history and appear to have high capacity (1000 outstanding messages, not constrained by flow control). New pods have zero history. The server keeps routing to existing pods.
4. Why the client buffer is the critical problem
- Client buffer (1000 messages): Ack deadline is extended automatically by the client library. These messages are permanently locked to the pod until processed. With 1 thread and ~30 seconds per message, processing 1000 messages takes ~8 hours.
- gRPC buffer (~20,000 messages): Ack deadline is NOT extended. These messages expire and return to the server. However, the server often re-delivers them to the same existing pods (because those pods appear to have higher capacity).
The combination of 1000 permanently-locked messages per pod + server routing preference = new subscribers are starved.
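As a sanity check on the "~8 hours" figure above, using the ~30 seconds/message processing time stated in the environment section:

```java
// Time for one single-threaded pod to drain a full default client buffer.
public class DrainTime {
    public static void main(String[] args) {
        int bufferedMessages  = 1_000; // default maxOutstandingElementCount
        int secondsPerMessage = 30;    // slow processing (~5 MB download each)
        double hours = bufferedMessages * secondsPerMessage / 3600.0;
        System.out.printf("~%.1f hours to drain one pod's client buffer%n", hours);
    }
}
```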
Workaround / Fix
Setting `maxOutstandingElementCount` to a low value (e.g., 10) significantly improves message distribution:

```java
import com.google.api.gax.batching.FlowControlSettings;
import com.google.api.gax.batching.FlowController;
import com.google.cloud.pubsub.v1.Subscriber;

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
    .setFlowControlSettings(
        FlowControlSettings.newBuilder()
            // Hold at most 10 messages in the client buffer (default: 1000).
            .setMaxOutstandingElementCount(10L)
            .setLimitExceededBehavior(FlowController.LimitExceededBehavior.Block)
            .build())
    .build();
```
With `maxOutstandingElementCount = 10`:
- Each pod locks only 10 messages (instead of 1000) with extended ack deadlines
- Pods become "constrained by flow control" almost instantly
- Server routing shifts to new pods that are "not constrained"
- New subscribers receive messages
However, the gRPC stream buffer still receives ~10,000 messages in the initial 10 MB packet (since there is no server-side flow control). These messages expire and return to the server, but this creates unnecessary churn.
Feature Request
Server-side flow control for StreamingPull
The Google engineer in #237 stated: "We currently don't support server-side flow control."
If the server respected the client's maxOutstandingElementCount when sending messages (i.e., not sending more than the client's limit), it would:
- Prevent gRPC buffer flooding (no unnecessary 10 MB packets)
- Reduce message expiry and re-delivery churn
- Improve message distribution to new subscribers immediately
- Make Pub/Sub work better with Kubernetes HPA autoscaling
Documentation improvement
The current documentation does not clearly explain:
- The two-buffer model (client buffer + gRPC stream buffer)
- How default flow control (1000 messages) causes message hoarding with slow processors
- The impact on Kubernetes HPA autoscaling (new pods getting zero messages)
- The recommended flow control settings for slow-processing subscribers
Adding a section to the StreamingPull documentation or troubleshooting guide about this scenario would help many users running Pub/Sub on Kubernetes with HPA.
References
- googleapis/python-pubsub#237 - Google engineer confirms two-buffer behavior
- Troubleshooting pull subscriptions - Server routing based on flow control constraints
- Testing Cloud Pub/Sub clients - Default flow control is 1000 messages