4 changes: 4 additions & 0 deletions README.md
@@ -93,6 +93,10 @@ CI builds, publishes the npm package with provenance, and pushes Docker tags `1.

The bottleneck is the Postgres database, not this server. For production load, point `PG_CONN` at multiple read replicas — the server fans queries across them and recovers automatically as hosts come and go. A recent benchmark on a 12-core / 32 GB box (API + Postgres co-located) sustained ~800 req/s with p99 latency of 39 ms. Use `npm run benchmark` to size your own deployment.

## Deployment

Reference Kubernetes and production Docker Compose manifests — with liveness/readiness probes, resource limits, autoscaling, and a hardened pod security context — live in [`deploy/`](./deploy/). Read [`docs/security.md`](./docs/security.md) for the deployment contract (TLS gateway, read-only DB role, private Postgres).

## Contributing

- AI coding agents: read [`AGENTS.md`](./AGENTS.md) first.
45 changes: 45 additions & 0 deletions deploy/README.md
@@ -0,0 +1,45 @@
# Reference deployment artifacts

Opinionated starting points for running the Archive Node API in production. They
are references to adapt, not turnkey configs — review image tags, sizing, and
secret management for your environment. Read [`docs/security.md`](../docs/security.md)
first for the deployment contract (TLS gateway, read-only DB role, private
Postgres).

## Kubernetes — [`kubernetes.yaml`](./kubernetes.yaml)

A `Deployment` + `Service` + `HorizontalPodAutoscaler` (and a placeholder
`Secret`) with the production defaults baked in:

- **Liveness** probe on `/healthcheck` (process up) and **readiness** probe on
  `/readiness` (database reachable), so a pod that cannot reach its database
  stops receiving traffic without being restarted.
- **Resource** requests/limits and an HPA that scales between 2 and 6 replicas,
  targeting 70% average CPU utilization.
- Hardened pod: non-root, `readOnlyRootFilesystem`,
  `allowPrivilegeEscalation: false`, all capabilities dropped, `RuntimeDefault`
  seccomp.
- Prometheus scrape annotations pointing at `/metrics`.
- `terminationGracePeriodSeconds: 30` to match the app's graceful-shutdown drain.
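
As a rough illustration of the HPA bullet above, the sketch below applies the scaling rule from the Kubernetes documentation, `desired = ceil(current * utilization / target)`, with the 70% CPU target and the 2 and 6 replica bounds set in `kubernetes.yaml`. The function name `hpa_desired_replicas` is hypothetical and not part of this repository. The sketch also ignores the controller's tolerance band and stabilization windows, so treat it as an approximation.

```sh
# Hypothetical helper: estimate the replica count the HPA would ask for.
# Arguments: current replicas, average CPU utilization in percent of the
# CPU request, then optional target (default 70), min (2), and max (6).
hpa_desired_replicas() {
  current=$1
  utilization=$2
  target=${3:-70}
  min=${4:-2}
  max=${5:-6}
  # Integer ceiling of current * utilization / target.
  desired=$(( (current * utilization + target - 1) / target ))
  # Clamp to the configured replica bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired_replicas 2 140   # prints 4
```

Utilization is measured against the CPU request (`250m` in the manifest), not the limit, so two pods averaging 140% of their request scale to four.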

```sh
# edit the Secret's PG_CONN (use a read-only role) and the image tag first
kubectl apply -f deploy/kubernetes.yaml
```
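
Once the manifest is applied, a check along these lines can confirm that the rollout finished and the autoscaler is attached. The resource names and label come from `kubernetes.yaml`. The `verify_rollout` wrapper and the `KUBECTL` override are assumptions for illustration and not part of this repository.

```sh
# Hypothetical post-apply check for the resources in deploy/kubernetes.yaml.
# Set KUBECTL to use a wrapper, or to "echo" to preview the commands.
verify_rollout() {
  kubectl_bin="${KUBECTL:-kubectl}"
  # Block until the rollout completes or the timeout expires.
  "$kubectl_bin" rollout status deployment/archive-node-api --timeout=120s || return 1
  # List the pods selected by the Deployment and Service label.
  "$kubectl_bin" get pods -l app=archive-node-api || return 1
  # Show the autoscaler's current target metrics and replica count.
  "$kubectl_bin" get hpa archive-node-api
}
```

Run `verify_rollout` after `kubectl apply`. A pod that stays `Running` but never becomes ready usually points at the readiness probe, which in this manifest means the database is unreachable.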

Put a TLS-terminating Ingress/gateway in front (it must set `X-Forwarded-For`
for per-client rate limiting) — see [`docs/security.md`](../docs/security.md).

## Docker Compose — [`docker-compose.prod.yml`](./docker-compose.prod.yml)

Runs only the published image against an external Postgres (contrast with the
repo-root `docker-compose.yml`, which is for local dev with a bundled DB).

```sh
PG_CONN='postgres://archive_api_ro:...@db:5432/archive' \
docker compose -f deploy/docker-compose.prod.yml up -d
```
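
The Compose file refuses to start when `PG_CONN` is unset or empty. A preflight along these lines can also catch an obviously malformed value before the container starts. The helper name `check_pg_conn` is hypothetical, and the sketch assumes the URI form shown above rather than any other connection-string format the server may accept.

```sh
# Hypothetical preflight helper, not part of this repository.
# Assumes PG_CONN uses the postgres:// or postgresql:// URI form.
check_pg_conn() {
  case "${1:-}" in
    '')
      echo 'PG_CONN is empty' >&2
      return 1
      ;;
    postgres://* | postgresql://*)
      return 0
      ;;
    *)
      echo 'PG_CONN should start with postgres:// or postgresql://' >&2
      return 1
      ;;
  esac
}
```

For example, `check_pg_conn "$PG_CONN" && docker compose -f deploy/docker-compose.prod.yml up -d` only starts the stack when the value passes the check.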

## Sizing

The bottleneck is Postgres, not this server; point `PG_CONN` at read replicas for
throughput. See the benchmark note in the root [`README.md`](../README.md#hardware-requirements)
and use `npm run benchmark` to size your own deployment.
22 changes: 22 additions & 0 deletions deploy/docker-compose.prod.yml
@@ -0,0 +1,22 @@
# Production-shaped Compose for the Archive Node API alone (bring your own
# Postgres). Unlike the repo-root docker-compose.yml — which stands up Postgres +
# Jaeger for local development — this runs only the published image against an
# external, read-only archive database. See deploy/README.md and docs/security.md.
services:
  archive-node-api:
    # Pin a specific version in production rather than :latest.
    image: ghcr.io/o1-labs/archive-node-api:latest
    restart: unless-stopped
    ports:
      - '8080:8080'
    environment:
      # Required. Point at a read-only Postgres role (see docs/security.md).
      PG_CONN: ${PG_CONN:?set PG_CONN to your archive-node Postgres connection string}
      PORT: '8080'
      # Restrict cross-origin access — leave unset for same-origin only.
      # CORS_ORIGIN: 'https://app.example.com'
    # Resource caps (Compose v2).
    cpus: 1.0
    mem_limit: 512m
    # The image ships a HEALTHCHECK against /healthcheck; Compose surfaces it as
    # the container health status.
123 changes: 123 additions & 0 deletions deploy/kubernetes.yaml
@@ -0,0 +1,123 @@
# Reference Kubernetes manifest for the Archive Node API.
#
# This is a starting point, not a turnkey production deploy — review the image
# tag, replica count, resource sizing, and secret management for your cluster.
# See deploy/README.md and docs/security.md for the full deployment contract.
---
apiVersion: v1
kind: Secret
metadata:
  name: archive-node-api
type: Opaque
stringData:
  # Point at a read-only Postgres role (see docs/security.md). Replace before use.
  PG_CONN: 'postgres://archive_api_ro:CHANGE_ME@postgres:5432/archive'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: archive-node-api
  labels:
    app: archive-node-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: archive-node-api
  template:
    metadata:
      labels:
        app: archive-node-api
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8080'
        prometheus.io/path: /metrics
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: archive-node-api
          # Pin a specific version in production rather than :latest.
          image: ghcr.io/o1-labs/archive-node-api:latest
          ports:
            - containerPort: 8080
          env:
            - name: PORT
              value: '8080'
            - name: PG_CONN
              valueFrom:
                secretKeyRef:
                  name: archive-node-api
                  key: PG_CONN
            # Restrict cross-origin access (see docs/security.md). Leave unset for
            # same-origin only, or set an explicit allowlist.
            # - name: CORS_ORIGIN
            #   value: 'https://app.example.com'
          resources:
            requests:
              cpu: '250m'
              memory: '256Mi'
            limits:
              cpu: '1'
              memory: '512Mi'
          # Liveness: process is up. Readiness: the database is reachable.
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /readiness
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
      # Give in-flight requests time to drain on rollout (matches the app's
      # graceful-shutdown window).
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: archive-node-api
  labels:
    app: archive-node-api
spec:
  type: ClusterIP
  selector:
    app: archive-node-api
  ports:
    - name: http
      port: 80
      targetPort: 8080
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: archive-node-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: archive-node-api
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70