feat: add resilient background job retry and monitoring #863

Open
Mohye24k wants to merge 1 commit into rohitdash08:main from Mohye24k:fix/issue-130-job-retry

Conversation

@Mohye24k

Summary

/claim #130

What this PR does

  • JobManager with exponential backoff retry
  • Job statuses: pending, running, success, failed, retrying, dead
  • Dead letter queue for exhausted jobs
  • Monitoring: GET /jobs/stats, /jobs/recent, /jobs/dead-letter, /jobs/:id
  • @with_retry decorator
  • 12 test cases
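A minimal sketch of how a decorator like @with_retry could behave, for reviewers who want to reason about it without opening the diff. This is not the PR's code: the parameter names max_retries, base_delay and backoff_factor come from the commit message, while max_delay's default, the injectable `sleep` argument, and the choice to re-raise the last exception are assumptions made for illustration.

```python
import functools
import time


def with_retry(max_retries=3, base_delay=1.0, backoff_factor=2.0,
               max_delay=60.0, sleep=time.sleep):
    """Retry the wrapped function with exponential backoff (hypothetical sketch)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # One initial attempt plus up to max_retries retries.
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    # Wait base_delay * backoff_factor ** attempt, capped.
                    sleep(min(base_delay * backoff_factor ** attempt, max_delay))
        return wrapper
    return decorator


# Usage: a function that fails twice, then succeeds on the third call.
waits = []
state = {"calls": 0}


@with_retry(max_retries=3, sleep=waits.append)  # record waits instead of sleeping
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(flaky(), waits)  # ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps tests fast, since the waits can be recorded instead of actually slept.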

How to test

pytest tests/test_jobs.py -v

Fixes #130

- JobManager with exponential backoff retry (configurable max_retries, base_delay, backoff_factor)
- Job status tracking: pending, running, success, failed, retrying, dead
- Dead letter queue for exhausted jobs
- Monitoring endpoints: GET /jobs/stats, /jobs/recent, /jobs/dead-letter, /jobs/:id
- @with_retry decorator for wrapping any function
- 12 test cases covering success, retry, dead letter, stats, backoff, decorator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Mohye24k
Author

Proof of Functionality

Architecture

  • JobManager class with configurable retry parameters
  • Exponential backoff: delay = base_delay * (backoff_factor ^ attempt), capped at max_delay
  • Status flow: pending -> running -> success OR running -> retrying -> ... -> dead
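The backoff formula above can be checked with a few lines of Python. This is a sketch rather than the PR's implementation: the function name `backoff_delay`, the default values, and the 0-based attempt index are assumptions, chosen so that the output matches a 1.0, 2.0, 4.0, 8.0 progression.

```python
def backoff_delay(attempt: int, base_delay: float = 1.0,
                  backoff_factor: float = 2.0, max_delay: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based), capped at max_delay."""
    return min(base_delay * (backoff_factor ** attempt), max_delay)


# With max_delay=8.0 the delay doubles until it reaches the cap, then stays flat.
delays = [backoff_delay(n, max_delay=8.0) for n in range(6)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 8.0, 8.0]
```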

Endpoints

GET /jobs/stats - total_submitted, total_succeeded, total_failed, total_retries, active_jobs, dead_letter_count
GET /jobs/recent - last 20 jobs sorted by creation time
GET /jobs/dead-letter - all failed jobs that exhausted retries
GET /jobs/:id - single job details
POST /jobs/dead-letter/clear - flush the dead letter queue
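To make the counters returned by GET /jobs/stats concrete, here is a small in-memory sketch of a manager that tracks status transitions and a dead letter queue. It is illustrative only and not taken from the PR: the `Job` dataclass, the synchronous `submit` method, and the mapping of total_failed to jobs that ended as dead are all assumptions, and the sketch skips the backoff sleeps and the separate failed status.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Job:
    id: str
    status: str = "pending"
    attempts: int = 0
    error: Optional[str] = None


class JobManager:
    """Hypothetical in-memory manager; runs jobs synchronously for clarity."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.jobs: Dict[str, Job] = {}
        self.dead_letter: List[Job] = []
        self.total_retries = 0

    def submit(self, func: Callable[[], Any]) -> Job:
        job = Job(id=uuid.uuid4().hex)
        self.jobs[job.id] = job
        for attempt in range(self.max_retries + 1):
            job.status = "running"
            job.attempts += 1
            try:
                func()
                job.status = "success"
                return job
            except Exception as exc:
                job.error = str(exc)
                if attempt < self.max_retries:
                    job.status = "retrying"
                    self.total_retries += 1
        # Every attempt failed: park the job in the dead letter queue.
        job.status = "dead"
        self.dead_letter.append(job)
        return job

    def stats(self) -> Dict[str, int]:
        statuses = [job.status for job in self.jobs.values()]
        active = ("pending", "running", "retrying")
        return {
            "total_submitted": len(statuses),
            "total_succeeded": statuses.count("success"),
            "total_failed": statuses.count("dead"),
            "total_retries": self.total_retries,
            "active_jobs": sum(s in active for s in statuses),
            "dead_letter_count": len(self.dead_letter),
        }


def always_fails() -> None:
    raise RuntimeError("boom")


manager = JobManager(max_retries=3)
ok_job = manager.submit(lambda: None)
dead_job = manager.submit(always_fails)
print(ok_job.status, dead_job.status, dead_job.attempts)  # success dead 4
print(manager.stats())
```

A job that never succeeds makes one initial attempt plus max_retries retries before it is moved to the dead letter queue, which is why the failing job above records four attempts.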

Test Coverage (12 tests)

  1. test_job_success - single attempt success
  2. test_job_retry_then_success - fails twice, succeeds third
  3. test_job_dead_letter - exhausts retries, enters DLQ
  4. test_stats - correct counters
  5. test_recent_jobs - ordered by recency
  6. test_get_job - lookup by ID
  7. test_clear_dead_letter - DLQ flush
  8. test_backoff_delay - exponential with cap
  9. test_with_retry_decorator - function decorator
  10. test_with_retry_exhausted - raises after max retries
  11. test_stats_endpoint - integration test
  12. test_jobs_requires_auth - 401 check

@rohitdash08

@Mohye24k
Author

Live API Demo

GET /jobs/stats

JobManager with exponential backoff verified via unit tests:

  • Job succeeds on first attempt
  • Job retries 2x then succeeds on 3rd
  • Job exhausts retries and enters dead letter queue
  • Backoff delay: 1.0 -> 2.0 -> 4.0 -> 8.0 (capped at max_delay)

All 12 tests pass.


Development

Successfully merging this pull request may close these issues.

Resilient background job retry & monitoring