
feat: Background job retry & monitoring system#861

inventelpk-cell wants to merge 1 commit into rohitdash08:main from inventelpk-cell:feat/background-job-retry

inventelpk-cell commented Apr 13, 2026

Summary

Implements a complete background job retry and monitoring system for FinMind, closing #130.

  • BackgroundJob model with status tracking (pending/running/completed/failed), JSON payload, retry count, max retries, exponential backoff scheduling, and error capture
  • JobExecutionLog model for per-attempt audit trail with timestamps and full error tracebacks
  • Job executor engine with automatic retry on failure using exponential backoff (base 5, delay = 5^attempt seconds: 5s, 25s, 125s), permanent failure once max retries is exceeded, and a run_due_jobs() scheduler entry point
  • Pluggable handler registry — new job types added via @register_handler("type") decorator
  • Admin API endpoints (JWT-protected, ADMIN role required):
    • GET /admin/jobs — paginated listing with status and job_type filters
    • GET /admin/jobs/<id> — job detail with full execution history
    • POST /admin/jobs/<id>/retry — manual retry for permanently failed jobs
  • Reminder integration: send_reminder job type wraps existing reminder delivery with automatic retry on failure
  • PostgreSQL schema migration for background_jobs and job_execution_logs tables with appropriate indexes
  • 18 pytest tests covering the execution engine, admin API, and reminder integration
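The retry flow described above can be sketched in plain Python. This is a minimal in-memory sketch, not the PR's code: `Job`, `HANDLERS`, `backoff_delay`, and `execute` are hypothetical names, `register_handler` and `run_due_jobs` reuse the names from the summary but their bodies here are guesses, jobs live in a list instead of the `background_jobs` table, and it assumes a job waiting for its next attempt goes back to `pending` while `failed` is reserved for permanent failure. With `max_retries=3` the delays come out as 5^1, 5^2, 5^3 = 5s, 25s, 125s, and the fourth failed attempt marks the job `failed`.

```python
import traceback
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory handler registry: job_type -> handler(payload).
HANDLERS: Dict[str, Callable[[Dict[str, Any]], None]] = {}

BACKOFF_BASE = 5  # delay = 5 ** retry_count seconds -> 5s, 25s, 125s


def register_handler(job_type: str):
    """Decorator that maps a job_type string to its handler function."""
    def decorator(fn):
        HANDLERS[job_type] = fn
        return fn
    return decorator


@dataclass
class Job:
    # Hypothetical minimal job record; the real model is a database row.
    job_type: str
    payload: Dict[str, Any]
    status: str = "pending"  # pending | running | completed | failed
    retry_count: int = 0
    max_retries: int = 3
    last_error: Optional[str] = None
    next_retry_at: Optional[datetime] = None


def backoff_delay(retry_count: int) -> timedelta:
    return timedelta(seconds=BACKOFF_BASE ** retry_count)


def execute(job: Job, now: datetime) -> None:
    """Run one attempt and update status, error, and next_retry_at."""
    job.status = "running"
    try:
        # An unknown job_type raises KeyError here and is retried like any failure.
        HANDLERS[job.job_type](job.payload)
    except Exception:
        job.last_error = traceback.format_exc()
        job.retry_count += 1
        if job.retry_count > job.max_retries:
            job.status = "failed"  # permanent; only a manual retry revives it
            job.next_retry_at = None
        else:
            job.status = "pending"
            job.next_retry_at = now + backoff_delay(job.retry_count)
    else:
        job.status = "completed"
        job.next_retry_at = None


def run_due_jobs(jobs: List[Job], now: Optional[datetime] = None) -> int:
    """Execute every pending job whose retry time has arrived; return the count."""
    now = now or datetime.now(timezone.utc)
    due = [j for j in jobs
           if j.status == "pending"
           and (j.next_retry_at is None or j.next_retry_at <= now)]
    for job in due:
        execute(job, now)
    return len(due)


if __name__ == "__main__":
    @register_handler("always_fails")
    def always_fails(payload):
        raise RuntimeError("delivery failed")

    job = Job("always_fails", {"reminder_id": 1})
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    while job.status == "pending":
        run_due_jobs([job], now)
        if job.next_retry_at:
            delay = (job.next_retry_at - now).total_seconds()
            print("retry", job.retry_count, "in", delay, "s")
            now = job.next_retry_at  # jump the clock to the next due time
    print(job.status, job.retry_count)
```

Running the module prints the three retry delays (5.0, 25.0, 125.0 seconds) followed by `failed 4`. Keeping the backoff as a pure function of `retry_count` makes the schedule easy to assert in tests without sleeping.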

Acceptance Criteria

  • BackgroundJob model with all required fields (id, job_type, status, payload, retry_count, max_retries, last_error, created_at, updated_at, next_retry_at)
  • Exponential backoff retry (5s, 25s, 125s)
  • Permanent failure after max retries exceeded
  • Per-attempt execution logging with timestamps and errors
  • Admin endpoints with JWT auth and ADMIN role check
  • Pagination and filtering on job list endpoint
  • Manual retry endpoint for failed jobs
  • Reminder sends wrapped in job framework
  • Database schema migration included
  • pytest test coverage (18 tests, all passing)
  • README updated with new endpoints and documentation
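The acceptance criteria name the job fields, the list endpoint's filters, and the manual retry rule. The sketch below pins those down as a plain dataclass, the filtering, newest-first ordering, and page slicing a GET /admin/jobs handler would perform, and the status guard for POST /admin/jobs/<id>/retry. Field types, the `page`/`per_page` parameter names, the ordering, the `list_jobs`/`manual_retry` helper names, and resetting `retry_count` on manual retry are all assumptions rather than details taken from the PR; the real backend would push the filtering into a database query.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class BackgroundJob:
    # Field names follow the acceptance criteria; the types are assumptions.
    id: int
    job_type: str
    payload: Dict[str, Any]
    status: str = "pending"  # pending | running | completed | failed
    retry_count: int = 0
    max_retries: int = 3
    last_error: Optional[str] = None
    created_at: datetime = field(default_factory=_utcnow)
    updated_at: datetime = field(default_factory=_utcnow)
    next_retry_at: Optional[datetime] = None


def list_jobs(jobs: List[BackgroundJob], page: int = 1, per_page: int = 20,
              status: Optional[str] = None,
              job_type: Optional[str] = None) -> Dict[str, Any]:
    """Filter by status/job_type, order newest first, then return one page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    matched = [j for j in jobs
               if (status is None or j.status == status)
               and (job_type is None or j.job_type == job_type)]
    matched.sort(key=lambda j: (j.created_at, j.id), reverse=True)
    start = (page - 1) * per_page
    return {
        "items": matched[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total": len(matched),
        "pages": (len(matched) + per_page - 1) // per_page,  # ceiling division
    }


def manual_retry(job: BackgroundJob) -> BackgroundJob:
    """Re-queue a permanently failed job; resetting retry_count is an assumption."""
    if job.status != "failed":
        raise ValueError(f"job {job.id} is {job.status}; only failed jobs can be retried")
    job.status = "pending"
    job.retry_count = 0
    job.next_retry_at = None  # eligible on the next run_due_jobs() pass
    job.updated_at = _utcnow()
    return job
```

Rejecting manual retries for anything other than a failed job keeps the endpoint from double-queueing a job the scheduler is still handling.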

Test plan

  • Run pytest tests/test_jobs.py — all 18 tests pass
  • Verify existing test suite still passes with Docker Compose (sh scripts/test-backend.sh)
  • Manual smoke test: create a job via the service layer, verify retry behavior
  • Verify admin endpoints return 403 for non-admin users
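The 403 check in the test plan amounts to a role guard in front of each admin view. A framework-free sketch, assuming the JWT has already been decoded into a claims dict carrying a `role` key; `require_role`, `list_jobs_view`, and the claim layout are hypothetical, and in the actual backend the claims would come from whatever JWT library it uses rather than a function argument.

```python
from functools import wraps
from typing import Any, Callable, Dict, Optional, Tuple

# (JSON body, HTTP status) pair, mirroring a typical web-framework view return.
Response = Tuple[Dict[str, Any], int]


def require_role(role: str) -> Callable:
    """Reject the call unless the decoded claims carry the given role."""
    def decorator(view: Callable[..., Response]) -> Callable[..., Response]:
        @wraps(view)
        def wrapper(claims: Optional[Dict[str, Any]], *args, **kwargs) -> Response:
            if not claims:
                return {"error": "missing or invalid token"}, 401
            if claims.get("role") != role:
                return {"error": "admin role required"}, 403
            return view(claims, *args, **kwargs)
        return wrapper
    return decorator


@require_role("ADMIN")
def list_jobs_view(claims: Dict[str, Any], page: int = 1) -> Response:
    # Hypothetical admin view; the real one would query background_jobs.
    return {"items": [], "page": page}, 200
```

Separating "no valid token" (401) from "valid token, wrong role" (403) lets the non-admin test assert exactly the 403 case.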

- Add BackgroundJob and JobExecutionLog models with retry tracking
- Implement job executor with exponential backoff (5s, 25s, 125s)
- Add admin API endpoints for job listing, detail, and manual retry
- Wrap reminder sends in job framework for reliable delivery
- Add PostgreSQL schema migration for new tables
- Add 18 pytest tests covering execution engine, API, and integration
- Update README with new endpoints and job framework documentation

Closes rohitdash08#130
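Per-attempt logging can stay independent of the retry decision: each attempt appends exactly one JobExecutionLog row, whether it succeeds or fails. A minimal sketch, assuming hypothetical field names (`attempt`, `succeeded`, `error`), a hypothetical `run_attempt` helper, and an in-memory list in place of the job_execution_logs table; `send_reminder_handler` is a hypothetical stand-in for the existing reminder delivery that the send_reminder job type wraps.

```python
import traceback
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


@dataclass
class JobExecutionLog:
    # Hypothetical shape of one audit row per attempt.
    job_id: int
    attempt: int
    started_at: datetime
    finished_at: datetime
    succeeded: bool
    error: Optional[str] = None  # full traceback text on failure


def send_reminder_handler(payload: Dict[str, Any]) -> None:
    """Hypothetical stand-in for the existing reminder delivery call."""
    if not payload.get("email"):
        raise ValueError("reminder has no recipient email")


def run_attempt(job_id: int, attempt: int,
                handler: Callable[[Dict[str, Any]], None],
                payload: Dict[str, Any], logs: List[JobExecutionLog]) -> bool:
    """Run the handler once and append exactly one log row, success or failure."""
    started = datetime.now(timezone.utc)
    error: Optional[str] = None
    try:
        handler(payload)
    except Exception:
        error = traceback.format_exc()  # keep the whole traceback for debugging
    logs.append(JobExecutionLog(job_id, attempt, started,
                                datetime.now(timezone.utc),
                                succeeded=error is None, error=error))
    return error is None
```

The boolean return value is what an executor would feed into its backoff logic, while the log list keeps the full history the GET /admin/jobs/<id> detail view exposes.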