56 changes: 36 additions & 20 deletions hyperfleet/architecture/architecture-summary.md
@@ -208,17 +208,16 @@ cluster_statuses

**Why**:
- **Centralized Orchestration Logic**: Single component decides "when" to reconcile
- - **Simple Max Age Strategy**: Time-based decisions using status.last_updated_time (updated on every adapter check)
+ - **Configurable Message Decision**: CEL-based decision logic with named params and boolean result expressions
- **Horizontal Scalability**: Sharding via label selectors (by region, environment, etc.)
- **Broker Abstraction**: Pluggable event publishers (GCP Pub/Sub, RabbitMQ, Stub)
- **Self-Healing**: Continuously retries without manual intervention
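
The label-selector sharding mentioned above can be pictured with a small sketch. It assumes that every selector entry must match (AND semantics), which the source does not state; `matches_selector` and the sample cluster records are hypothetical names used only for illustration.

```python
from typing import Dict, List


def matches_selector(labels: Dict[str, str], selector: List[Dict[str, str]]) -> bool:
    """Return True when every selector entry matches the resource's labels.

    Assumption: entries are ANDed together; an empty selector matches everything.
    """
    return all(labels.get(entry["label"]) == entry["value"] for entry in selector)


# Hypothetical shard definition, shaped like the resource_selector config entries.
us_east = [{"label": "region", "value": "us-east"}]

clusters = [
    {"id": "cls-123", "labels": {"region": "us-east"}},
    {"id": "cls-456", "labels": {"region": "eu-west"}},
]

# Only resources whose labels satisfy the selector belong to this shard.
shard = [c["id"] for c in clusters if matches_selector(c["labels"], us_east)]
```

Under this assumption, each Sentinel instance would only see the resources of its own shard, so adding a region means adding another instance with a different selector.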

**Responsibilities**:
1. **Fetch Resources**: Poll HyperFleet API for resources matching shard selector
2. **Decision Logic**: Determine if resource needs reconciliation based on:
-    - `status.phase` (Ready vs Not Ready)
-    - `status.last_updated_time` (time since last adapter check)
-    - Configured max age intervals (10s for not-ready, 30m for ready)
+    - Generation check: `resource.generation > observedGeneration` triggers immediate reconciliation
+    - Configurable message decision with named params (CEL expressions) and a boolean result expression
3. **Event Creation**: Create reconciliation event with resource context
4. **Event Publishing**: Publish event to configured message broker
5. **Metrics & Observability**: Expose Prometheus metrics for monitoring
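
As a rough illustration of the decision inputs in step 2, the sketch below expresses the age thresholds (10s for not-ready, 30m for ready) as named boolean parameters combined into a single result. It is plain Python rather than CEL, and `needs_event` and its arguments are hypothetical names, not part of Sentinel.

```python
from datetime import datetime, timedelta, timezone


def needs_event(
    is_ready: bool,
    ref_time: datetime,
    now: datetime,
    max_age_ready: timedelta = timedelta(minutes=30),
    max_age_not_ready: timedelta = timedelta(seconds=10),
) -> bool:
    """Combine named boolean params into one result, as a stand-in for CEL."""
    age = now - ref_time
    # Each named param is a boolean derived from earlier values.
    age_exceeded_ready = is_ready and age > max_age_ready
    age_exceeded_not_ready = (not is_ready) and age > max_age_not_ready
    # The result expression ORs the params together.
    return age_exceeded_ready or age_exceeded_not_ready


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = now - timedelta(seconds=30)

ready_recent = needs_event(True, recent, now)       # 30s old, ready
not_ready_recent = needs_event(False, recent, now)  # 30s old, not ready
```

A ready resource checked 30 seconds ago is left alone, while a not-ready resource of the same age is due for another event.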
@@ -228,8 +227,15 @@ cluster_statuses
  # sentinel-config.yaml (ConfigMap)
  resource_type: clusters
  poll_interval: 5s
- max_age_not_ready: 10s
- max_age_ready: 30m
+
+ message_decision:
+   params:
+     ref_time: 'conditionTime(resource, "Ready")'
+     is_ready: 'status(resource, "Ready") == "True"'
+     age_exceeded_ready: 'is_ready && now - timestamp(ref_time) > duration("30m")'
+     age_exceeded_not_ready: '!is_ready && now - timestamp(ref_time) > duration("10s")'
+   result: age_exceeded_ready OR age_exceeded_not_ready
+
  resource_selector:
    - label: region
      value: us-east
@@ -259,14 +265,14 @@ data:
**Decision Algorithm**:
```
  FOR EACH resource in FetchResources(resourceType, resourceSelector):
-   IF resource.status.phase != "Ready":
-     max_age = max_age_not_ready (10s)
+   IF resource.generation > observedGeneration:
+     PublishEvent(broker, CreateEvent(resource)) // immediate reconciliation
    ELSE:
-     max_age = max_age_ready (30m)
+     Evaluate message_decision params (CEL expressions, dependency-ordered)
+     Evaluate result expression (combines params with AND/OR)

-   IF now >= resource.status.last_updated_time + max_age:
-     event = CreateEvent(resource)
-     PublishEvent(broker, event)
+     IF result == true:
+       PublishEvent(broker, CreateEvent(resource))
```
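
The revised algorithm in this hunk (generation check first, then the configurable decision) can be sketched in Python as follows. This is a minimal illustration, not Sentinel's implementation: the `Resource` type, the `observed_generation` field, and the `message_decision` callable are assumptions, and publishing is reduced to collecting resource IDs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Resource:
    # Hypothetical shape; the real resource carries far more fields.
    id: str
    generation: int
    observed_generation: int


def select_for_reconciliation(
    resources: Iterable[Resource],
    message_decision: Callable[[Resource], bool],
) -> List[str]:
    """Return the IDs of resources that would receive a reconciliation event."""
    to_publish: List[str] = []
    for resource in resources:
        if resource.generation > resource.observed_generation:
            # The spec changed since it was last observed: publish immediately.
            to_publish.append(resource.id)
        elif message_decision(resource):
            # Otherwise defer to the configured decision (CEL in the real system).
            to_publish.append(resource.id)
    return to_publish


resources = [
    Resource("cls-1", generation=2, observed_generation=1),  # spec changed
    Resource("cls-2", generation=1, observed_generation=1),  # fresh, no event
    Resource("cls-3", generation=1, observed_generation=1),  # stale per decision
]
stale_ids = {"cls-3"}
published = select_for_reconciliation(resources, lambda r: r.id in stale_ids)
```

Passing the decision in as a callable mirrors the intent of the change: the loop stays fixed while the "when" logic is supplied by configuration.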

**Benefits**:
@@ -541,7 +547,7 @@ sequenceDiagram
DB-->>API: [cluster list]
API-->>Sentinel: [{id, status, ...}]

- Note over Sentinel: Decision: phase != "Ready" &&<br/>last_updated_time + 10s < now
+ Note over Sentinel: Decision: generation ><br/>observedGeneration? OR<br/>message_decision result == true?

Sentinel->>Broker: Publish event<br/>{resourceType: "clusters",<br/>resourceId: "cls-123"}

@@ -564,12 +570,12 @@ sequenceDiagram
DB-->>API: ClusterStatus saved
API-->>Adapter: 201 Created

- Note over Sentinel: Next poll cycle (10s later)
+ Note over Sentinel: Next poll cycle

Sentinel->>API: GET /clusters
API-->>Sentinel: [{id, status.last_updated_time = now(), ...}]

- Note over Sentinel: Decision: Create event again<br/>(cycle continues for other adapters)
+ Note over Sentinel: Decision: evaluate message_decision<br/>(cycle continues for other adapters)

Sentinel->>Broker: Publish event
```
@@ -610,7 +616,7 @@ sequenceDiagram
DB-->>API: [cluster list with updated last_updated_time]
API-->>Sentinel: [{id, status, ...}]

- Note over Sentinel: Decision: Create event<br/>if max-age expired
+ Note over Sentinel: Decision: generation ><br/>observedGeneration? OR<br/>message_decision result == true?
```

---
@@ -737,8 +743,13 @@ See [Status Guide](../docs/status-guide.md) for complete details on the status c
  # sentinel-us-east-config.yaml (ConfigMap)
  resource_type: clusters
  poll_interval: 5s
- max_age_not_ready: 10s
- max_age_ready: 30m
+ message_decision:
+   params:
+     ref_time: 'conditionTime(resource, "Ready")'
+     is_ready: 'status(resource, "Ready") == "True"'
+     age_exceeded_ready: 'is_ready && now - timestamp(ref_time) > duration("30m")'
+     age_exceeded_not_ready: '!is_ready && now - timestamp(ref_time) > duration("10s")'
+   result: age_exceeded_ready OR age_exceeded_not_ready
  resource_selector:
    - label: region
      value: us-east
@@ -758,8 +769,13 @@ message_data:
  # sentinel-eu-west-config.yaml (ConfigMap)
  resource_type: clusters
  poll_interval: 5s
- max_age_not_ready: 15s
- max_age_ready: 1h
+ message_decision:
+   params:
+     ref_time: 'conditionTime(resource, "Ready")'
+     is_ready: 'status(resource, "Ready") == "True"'
+     age_exceeded_ready: 'is_ready && now - timestamp(ref_time) > duration("1h")'
+     age_exceeded_not_ready: '!is_ready && now - timestamp(ref_time) > duration("15s")'
+   result: age_exceeded_ready OR age_exceeded_not_ready
  resource_selector:
    - label: region
      value: eu-west
Original file line number Diff line number Diff line change
@@ -90,9 +90,9 @@ sequenceDiagram
S->>API: GET /api/hyperfleet/v1/clusters?labels=shard
API-->>S: List of clusters

- Note over S: For each cluster:<br/>Check if requires event?<br/>(10s for Not Ready, 30m for Ready)
+ Note over S: For each cluster:<br/>Check generation, then evaluate<br/>message_decision result

- S->>S: Evaluate: now >= lastEventTime + max_age
+ S->>S: Evaluate: message_decision params + result

alt Requires event
S->>B: Publish CloudEvent<br/>{resourceType: "clusters", resourceId: "cls-123"}
49 changes: 0 additions & 49 deletions hyperfleet/components/sentinel/sentinel-config.yaml

This file was deleted.
