71 changes: 71 additions & 0 deletions alembic/versions/b2c52ee8ff12_add_ingestion_status.py
@@ -0,0 +1,71 @@
"""Add ingestion status

Revision ID: b2c52ee8ff12
Revises: 28bee3aa2429
Create Date: 2026-05-11 16:16:03.768893

"""

from typing import Sequence, Union

import sqlalchemy as sa

from alembic import op

# revision identifiers, used by Alembic.
revision: str = "b2c52ee8ff12"
down_revision: Union[str, Sequence[str], None] = "28bee3aa2429"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    conn = op.get_bind()
    dialect = conn.dialect.name
    if dialect == "postgresql":
        op.execute(
            "CREATE TYPE ingestionstatus AS ENUM ('QUEUED', 'COPYING', 'COPIED', "
            "'VALIDATING', 'VALIDATED', 'COMPLETED', 'COPY_FAILED', "
            "'VALIDATION_FAILED')"
        )
    with op.batch_alter_table("simulations", schema=None) as batch_op:
        batch_op.add_column(
            sa.Column(
                "ingestion_status",
                sa.Enum(
                    "QUEUED",
                    "COPYING",
                    "COPIED",
                    "VALIDATING",
                    "VALIDATED",
                    "COMPLETED",
                    "COPY_FAILED",
                    "VALIDATION_FAILED",
                    name="ingestionstatus",
                ),
                nullable=True,
            )
        )
        batch_op.add_column(sa.Column("ingestion_version", sa.Integer(), nullable=True))
    op.execute(
        "UPDATE simulations SET ingestion_status = 'COMPLETED' WHERE ingestion_status "
        "IS NULL"
    )
    op.execute(
        "UPDATE simulations SET ingestion_version = 0 WHERE ingestion_version IS NULL"
    )
    with op.batch_alter_table("simulations", schema=None) as batch_op:
        batch_op.alter_column("ingestion_status", nullable=False)
        batch_op.alter_column("ingestion_version", nullable=False)


def downgrade() -> None:
    """Downgrade schema."""
    with op.batch_alter_table("simulations", schema=None) as batch_op:
        batch_op.drop_column("ingestion_version")
        batch_op.drop_column("ingestion_status")
    conn = op.get_bind()
    dialect = conn.dialect.name
    if dialect == "postgresql":
        op.execute("DROP TYPE ingestionstatus")
77 changes: 77 additions & 0 deletions docs/celery.md
@@ -0,0 +1,77 @@
# Celery async task processing

SimDB uses [Celery](https://docs.celeryproject.org/) to run asynchronous background
tasks such as copying simulation files and completing the ingestion pipeline.

## Overview

When simulations are uploaded via the REST API, the server offloads heavy operations
to Celery workers instead of blocking the HTTP request. Tasks are defined in
`src/simdb/workers/tasks.py`:

- `copy_files_task` — copies input/output files from source locations to the server's
upload folder and updates the simulation's ingestion status.
- `complete_ingestion_task` — marks a simulation as fully ingested.
- `validate_imas_task` — runs validation checks on IMAS data (placeholder).
- `send_email_task` — sends email notifications.
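The tasks above move a simulation through the states of the `ingestionstatus` enum created by the `add_ingestion_status` migration (`b2c52ee8ff12`). The sketch below mirrors those values as a Python enum. The `is_terminal` helper, and the choice of which states count as terminal, are assumptions for illustration and not part of SimDB.

```python
import enum


class IngestionStatus(str, enum.Enum):
    """Illustrative mirror of the database `ingestionstatus` enum."""

    QUEUED = "QUEUED"
    COPYING = "COPYING"
    COPIED = "COPIED"
    VALIDATING = "VALIDATING"
    VALIDATED = "VALIDATED"
    COMPLETED = "COMPLETED"
    COPY_FAILED = "COPY_FAILED"
    VALIDATION_FAILED = "VALIDATION_FAILED"

    @property
    def is_terminal(self) -> bool:
        # Assumption: success and the two failure states end the pipeline.
        return self in (
            IngestionStatus.COMPLETED,
            IngestionStatus.COPY_FAILED,
            IngestionStatus.VALIDATION_FAILED,
        )


print([s.value for s in IngestionStatus if s.is_terminal])
```

The migration backfills existing rows with `COMPLETED`, so simulations uploaded before this change are treated as fully ingested.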

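The upload endpoint chains the copy and completion tasks:

```python
from simdb.workers.tasks import complete_ingestion_task, copy_files_task

# .si() builds immutable signatures: complete_ingestion_task receives only the
# simulation UUID, not the return value of copy_files_task.
copy_files = copy_files_task.si(simulation.uuid, ...)
complete = complete_ingestion_task.si(simulation.uuid)
# `|` chains the tasks: completion runs only after copying succeeds.
_ = (copy_files | complete).apply_async()
```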

## Configuration

Celery is configured in the `[celery]` section of `app.cfg`:

| Section | Option         | Required | Description                                                                 |
|---------|----------------|----------|-----------------------------------------------------------------------------|
| celery  | broker_url     | no       | Redis URL of the message broker. Defaults to `redis://localhost:6379/0`.   |
| celery  | result_backend | no       | Redis URL of the result backend. Defaults to `redis://localhost:6379/0`.   |

Example:

```ini
[celery]
broker_url = redis://localhost:6379/0
result_backend = redis://localhost:6379/0
```
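Both options fall back to the documented default when they are missing. A minimal sketch of that lookup with the standard library's `configparser`, assuming a hypothetical `celery_settings` helper (SimDB's own config loader may work differently) and a made-up `redis.example` host:

```python
import configparser

# Default documented for both options in the table above.
DEFAULT_REDIS_URL = "redis://localhost:6379/0"


def celery_settings(cfg: configparser.ConfigParser) -> dict:
    """Hypothetical helper: read [celery] options, applying the defaults."""
    return {
        # fallback= is returned when the option or the whole section is absent
        "broker_url": cfg.get("celery", "broker_url", fallback=DEFAULT_REDIS_URL),
        "result_backend": cfg.get(
            "celery", "result_backend", fallback=DEFAULT_REDIS_URL
        ),
    }


cfg = configparser.ConfigParser()
cfg.read_string(
    """
[celery]
broker_url = redis://redis.example:6379/1
"""
)
settings = celery_settings(cfg)
print(settings)
```

The resulting dictionary could then be applied to a Celery app, for example with `app.conf.update(settings)`.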

## Running workers

### Standalone worker

Start a Celery worker with the `simdb_worker` command provided by the package (requires the `celery` extra):

```bash
simdb_worker
```

### Worker with beat scheduler

Periodic tasks (e.g. cleanup or reports) also need the beat scheduler, which runs alongside a worker:

```bash
# Terminal 1: worker
simdb_worker

# Terminal 2: beat scheduler
simdb_beat
```
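Beat only sends tasks that appear in a schedule. None of the tasks listed in the overview is periodic, so the entry below uses a hypothetical `cleanup_task`. It is a sketch of Celery's `beat_schedule` format (a registered task name plus a `schedule`, here a `timedelta`), not existing SimDB configuration.

```python
from datetime import timedelta

# Hypothetical schedule: "simdb.workers.tasks.cleanup_task" is not one of the
# documented tasks and only illustrates the entry format.
beat_schedule = {
    "daily-cleanup": {
        "task": "simdb.workers.tasks.cleanup_task",  # registered task name
        "schedule": timedelta(days=1),  # send once every 24 hours
        "args": (),  # positional arguments passed to the task
    },
}
```

The dictionary would be assigned to the app configuration, for example `app.conf.beat_schedule = beat_schedule`, before beat is started.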

### Flower monitoring

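[Flower](https://flower.readthedocs.io/) provides a web UI for monitoring Celery
workers and tasks. It is not included in the `celery` extra, so install it
separately (for example with `pip install flower`) before running:

```bash
celery -A simdb.workers.celery flower --port=5555
```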

## Testing with eager mode

In tests, set `task_always_eager = True` in the Celery app configuration so that
tasks run synchronously in the calling process, without a broker or worker.
16 changes: 16 additions & 0 deletions docs/developer_guide.md
@@ -45,6 +45,22 @@ simdb_server

This will start a server on port 5000. You can check that the server is running by opening http://localhost:5000 in a browser.

## Running Celery workers

During development you will often want Celery tasks to run synchronously, without
a broker or worker. Set `task_always_eager = True` in the Celery configuration to
do this; the tests enable it in `tests/remote/api/v1.3/test_simulations3.py`.

To run real background workers during development (the broker configured in
`broker_url` must be reachable):

```bash
# Worker
simdb_worker

# Beat scheduler (if needed)
simdb_beat
```

See the [Celery documentation](celery.md) for full details.
## Swagger API documentation

SimDB provides interactive Swagger API documentation for each API version. The documentation is automatically generated and accessible at different endpoints depending on the API version you want to explore.
6 changes: 6 additions & 0 deletions docs/maintenance_guide.md
@@ -332,6 +332,12 @@ service nginx restart

You should now be able to check that the SimDB server is running by opening the HTTP address defined in your nginx site (localhost:80 in the example above).

## Celery background workers

SimDB uses Celery to run asynchronous background tasks such as copying simulation
files. See the [Celery documentation](celery.md) for details on configuration and
running workers.

#### Nginx Request Entity Size

You may need to increase the maximum upload size that Nginx accepts. For SimDB this should be at least 100 MB.
10 changes: 8 additions & 2 deletions pyproject.toml
@@ -92,13 +92,19 @@ build-docs = [
 postgres = [
     "psycopg2-binary>=2.8.0",
 ]
+celery = [
+    "celery>=5.3.0",
+    "redis>=5.0.0",
+]
 all = [
-    "imas-simdb[server, imas-validator, postgres]"
+    "imas-simdb[server, imas-validator, postgres, celery]",
 ]
 
 [project.scripts]
 simdb = "simdb.cli.simdb:main"
 simdb_server = "simdb.remote.wsgi:run"
+simdb_worker = "simdb.workers.cli:worker"
+simdb_beat = "simdb.workers.cli:beat"
 
 [project.urls]
 Homepage = "https://simdb.iter.org/dashboard/"
@@ -168,5 +174,5 @@ dev = [
     "pytest-cov>=5.0.0",
     "ruff~=0.15.0",
     "ty==0.0.34",
-    "imas-simdb[server, imas-validator, postgres, auth]"
+    "imas-simdb[server, imas-validator, postgres, auth, celery]"
 ]