Ryaaan4321/ScalableBackend

Job Import System — Architecture & Explanation

This document explains the complete job import system, including:

  • High-level architecture
  • Job import workflow
  • Cron + Queue + Worker system
  • Real-time updates with Socket.IO
  • Retry & concurrency logic
  • Import logs and monitoring
  • Sequence diagrams & flow diagrams (ASCII)

1. System Overview

This application imports jobs from multiple RSS feeds, normalizes the data, and stores the normalized jobs in MongoDB. It is built as a scalable background processing pipeline using:

  • Cron Jobs → Trigger imports every hour
  • Redis + BullMQ Queue → Task scheduling + retries + backoff
  • Workers → Process job imports in background
  • MongoDB → Store job entries & import logs
  • Socket.IO → Real-time status updates for dashboards
  • Exponential Backoff → Automatic retries on failure
  • Batch Processing & Concurrency → High performance imports

2. High-Level Architecture

┌────────────────────────────────────────────────────────────┐
│                     USER INTERFACE / ADMIN                 │
│         (Real-time import progress using Socket.IO)        │
└────────────────────────────────────────────────────────────┘
               ▲                              
               │ Socket Events                
               │                              
┌──────────────┴──────────────┐
│       Node.js Backend       │
│   (API Server + Cron Job)   │
└──────────────┬──────────────┘
               │ Adds Jobs to Queue
               ▼
┌────────────────────────────────────────────────────────────┐
│                   BullMQ Job Queue (Redis)                 │
│              Stores scheduled + retrying tasks             │
└────────────────────────────────────────────────────────────┘
               │  Worker Pulls Jobs
               ▼
┌────────────────────────────────────────────────────────────┐
│                     Worker Processor                       │
│  Parses RSS → Normalizes → Inserts/Updates into MongoDB    │
│  Emits Socket.IO Events → Logs import history              │
└────────────────────────────────────────────────────────────┘
               │ Saves Logs
               ▼
┌────────────────────────────────────────────────────────────┐
│                      MongoDB Database                      │
│     jobs collection + import_logs collection               │
└────────────────────────────────────────────────────────────┘

3. Core Workflow Explanation

Step 1 — Cron Trigger (Every 1 Hour)

A cron job runs every hour:

  • Loops through all RSS feed URLs
  • Adds each feed as a new job to BullMQ queue
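The cron step above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's actual code: `FEED_URLS`, `enqueueFeedImports`, the job name `import-feed`, and the `JobQueue` interface are all hypothetical. The interface loosely mirrors the `add(name, data)` call of a BullMQ `Queue`, and an in-memory queue stands in for Redis so the snippet runs on its own. An hourly schedule is commonly written as the cron expression `0 * * * *`.

```typescript
// Hypothetical shape of the data carried by each import job.
interface FeedJobData {
  feedUrl: string;
}

// Minimal stand-in for a queue; BullMQ's Queue exposes a similar add(name, data).
interface JobQueue {
  add(name: string, data: FeedJobData): void;
}

// In-memory queue used here only so the sketch runs without Redis.
class InMemoryQueue implements JobQueue {
  readonly jobs: { name: string; data: FeedJobData }[] = [];
  add(name: string, data: FeedJobData): void {
    this.jobs.push({ name, data });
  }
}

// Assumed cron expression for "at minute 0 of every hour".
const HOURLY_CRON = "0 * * * *";

// Placeholder feed list; the real URLs are not given in this document.
const FEED_URLS: string[] = [
  "https://example.com/feed-a.rss",
  "https://example.com/feed-b.rss",
];

// What the hourly cron callback would do: add one queue job per feed URL.
function enqueueFeedImports(queue: JobQueue, feedUrls: string[]): number {
  for (const feedUrl of feedUrls) {
    queue.add("import-feed", { feedUrl });
  }
  return feedUrls.length;
}

const queue = new InMemoryQueue();
const enqueued = enqueueFeedImports(queue, FEED_URLS);
console.log(`${HOURLY_CRON}: enqueued ${enqueued} feed jobs`);
```

Keeping one job per feed means a failing feed is retried on its own without blocking the others.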

Step 2 — Queue Schedules the Jobs

BullMQ stores each import job and:

  • Handles retries
  • Applies exponential backoff
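  • Persists queued jobs in Redis, so pending imports survive an API or worker restart (durability beyond that depends on how Redis persistence is configured)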

Step 3 — Worker Processes Each Job

For each feed:

  1. Fetches RSS XML

  2. Parses using fast-xml-parser

  3. Normalizes the fields

  4. Upserts into MongoDB

  5. Tracks:

    • totalFetched
    • newJobs
    • updatedJobs
    • failedJobs
  6. Saves an import log record
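The normalize, upsert, and tracking steps above might look roughly like the sketch below. Everything here is assumed for illustration: the `RawItem` and `NormalizedJob` shapes, the choice of `guid` (falling back to `link`) as the unique key, and the use of a `Map` in place of the MongoDB `jobs` collection. XML parsing is left out, so the input is what a parser such as fast-xml-parser would already have produced.

```typescript
// Assumed shape of one parsed RSS <item>; real feeds vary.
interface RawItem {
  title?: string;
  link?: string;
  guid?: string;
  pubDate?: string;
}

// Assumed normalized job document.
interface NormalizedJob {
  externalId: string;
  title: string;
  url: string;
  publishedAt: Date | null;
}

interface ImportStats {
  totalFetched: number;
  newJobs: number;
  updatedJobs: number;
  failedJobs: { item: RawItem; reason: string }[];
}

// Map a raw item onto the normalized shape, rejecting items without an identifier.
function normalizeItem(item: RawItem): NormalizedJob {
  const externalId = (item.guid ?? item.link ?? "").trim();
  if (externalId === "") {
    throw new Error("item has neither guid nor link");
  }
  const parsed = item.pubDate ? new Date(item.pubDate) : null;
  return {
    externalId,
    title: (item.title ?? "").trim(),
    url: (item.link ?? "").trim(),
    publishedAt: parsed && !Number.isNaN(parsed.getTime()) ? parsed : null,
  };
}

// Upsert each item into the store and tally the outcome.
// A Map stands in for the MongoDB jobs collection.
function importItems(items: RawItem[], store: Map<string, NormalizedJob>): ImportStats {
  const stats: ImportStats = { totalFetched: items.length, newJobs: 0, updatedJobs: 0, failedJobs: [] };
  for (const item of items) {
    try {
      const job = normalizeItem(item);
      if (store.has(job.externalId)) {
        stats.updatedJobs += 1;
      } else {
        stats.newJobs += 1;
      }
      store.set(job.externalId, job);
    } catch (err) {
      // One bad item is recorded and skipped; it does not abort the feed.
      stats.failedJobs.push({ item, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return stats;
}

const store = new Map<string, NormalizedJob>([
  ["job-1", { externalId: "job-1", title: "Old title", url: "https://example.com/1", publishedAt: null }],
]);
const stats = importItems(
  [
    { guid: "job-1", title: "Backend Engineer", link: "https://example.com/1" },
    { guid: "job-2", title: " Data Analyst ", link: "https://example.com/2" },
    { title: "No identifier" },
  ],
  store,
);
console.log(stats.newJobs, stats.updatedJobs, stats.failedJobs.length);
```

With the sample input, one item is new, one updates an existing entry, and one is recorded as failed because it has no identifier.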

Step 4 — Real-Time Updates

Worker emits events through Socket.IO:

  • import:started
  • import:completed
  • import:failed

Frontend admin dashboard listens and updates UI in real-time.
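One way to make sure every import reports its lifecycle is to wrap the import function, as in the sketch below. The event names come from the list above; the payload shapes, `runWithEvents`, and the `Emit` type are assumptions. A Socket.IO server's `io.emit(event, payload)` fits the `Emit` signature, but here events are collected in an array so the snippet runs without a socket. A real worker would be asynchronous; this version is synchronous to keep the example short.

```typescript
// Event names taken from the list above; payload shapes are assumptions.
type ImportEventName = "import:started" | "import:completed" | "import:failed";

interface ImportSummary {
  totalFetched: number;
  newJobs: number;
  updatedJobs: number;
}

// Minimal emitter signature; a Socket.IO server's io.emit(event, payload) fits this shape.
type Emit = (event: ImportEventName, payload: Record<string, unknown>) => void;

// Wrap one feed import so every run reports started, then completed or failed.
function runWithEvents(
  feedUrl: string,
  emit: Emit,
  importFeed: (url: string) => ImportSummary,
): ImportSummary {
  emit("import:started", { feedUrl });
  try {
    const summary = importFeed(feedUrl);
    emit("import:completed", { feedUrl, ...summary });
    return summary;
  } catch (err) {
    emit("import:failed", { feedUrl, reason: err instanceof Error ? err.message : String(err) });
    throw err; // rethrow so the queue can still schedule a retry
  }
}

// Record emitted events in an array instead of sending them over a socket.
const emitted: { event: ImportEventName; payload: Record<string, unknown> }[] = [];
const record: Emit = (event, payload) => {
  emitted.push({ event, payload });
};

runWithEvents("https://example.com/ok.rss", record, () => ({
  totalFetched: 10,
  newJobs: 7,
  updatedJobs: 3,
}));

try {
  runWithEvents("https://example.com/down.rss", record, () => {
    throw new Error("feed unreachable");
  });
} catch {
  // Expected: the failure was reported and then rethrown.
}

console.log(emitted.map((e) => e.event).join(", "));
```

Rethrowing after `import:failed` keeps the dashboard informed while still letting the queue treat the job as failed.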


4. Detailed Flow Diagram

+-------------------------+
| Cron Job (every hour)   |
+-----------+-------------+
            |
            v
+-------------------------+
| Add feed job to Queue   |
+-----------+-------------+
            |
            v
+-------------------------+
| Redis (BullMQ Queue)    |
+-----------+-------------+
            |
            v
+-------------------------+
| Worker picks a job      |
+-----------+-------------+
            |
            v
+-------------------------+
| Fetch RSS XML           |
+-----------+-------------+
            |
            v
+-------------------------+
| Parse XML → JSON        |
+-----------+-------------+
            |
            v
+-------------------------+
| Normalize Item Data     |
+-----------+-------------+
            |
            v
+-------------------------+
| Upsert into MongoDB     |
+-----------+-------------+
            |
            v
+-------------------------+
| Record import_logs      |
+-----------+-------------+
            |
            v
+-------------------------+
| Emit Socket.IO events   |
+-------------------------+

5. Retry Logic & Exponential Backoff

BullMQ handles retry logic:

  • Up to 5 retry attempts per job
  • Exponential backoff: the delay starts at 2s and doubles on each retry (2s → 4s → 8s → 16s → 32s)

This ensures:

  • Temporary feed outages don't break the flow
  • No manual intervention required
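The schedule quoted above follows from doubling a 2-second base delay on each retry. The sketch below reproduces that arithmetic and shows job options in the shape BullMQ accepts (`attempts` plus a `backoff` object). The helper names and constants are assumptions. Note that BullMQ documents `attempts` as the total number of tries including the first run, so five retries would correspond to `attempts: 6`; the value used in this repository is not stated here and is worth checking.

```typescript
// Delay before the n-th retry (1-based), doubling from a base delay.
function backoffDelayMs(retryNumber: number, baseDelayMs: number): number {
  if (!Number.isInteger(retryNumber) || retryNumber < 1) {
    throw new RangeError("retryNumber must be a positive integer");
  }
  return baseDelayMs * 2 ** (retryNumber - 1);
}

// Full schedule for a given number of retries.
function backoffSchedule(retries: number, baseDelayMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => backoffDelayMs(i + 1, baseDelayMs));
}

// Assumed values matching the numbers quoted above: 5 retries, 2 s base delay.
const RETRIES = 5;
const BASE_DELAY_MS = 2000;

// Job options in the shape BullMQ accepts. `attempts` includes the first run,
// so 5 retries means 6 attempts in total.
const importJobOptions = {
  attempts: RETRIES + 1,
  backoff: { type: "exponential", delay: BASE_DELAY_MS },
};

const schedule = backoffSchedule(RETRIES, BASE_DELAY_MS);
console.log(schedule.map((ms) => `${ms / 1000}s`).join(" → "));
// prints "2s → 4s → 8s → 16s → 32s"
console.log(importJobOptions.attempts); // prints 6
```

With these numbers a feed that keeps failing is given up on after roughly 62 seconds of cumulative waiting, which covers short outages but not long ones.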

6. Batch Size & Concurrency

To optimize performance:

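  • Batch size controls how many feed items the worker processes in one batch
  • Concurrency controls how many feed jobs a worker processes in parallel (BullMQ runs them concurrently on the Node.js event loop by default, not on separate threads)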

Example environment variables:

JOB_BATCH_SIZE=50
JOB_MAX_CONCURRENCY=5

Worker processes multiple items per batch and multiple feeds in parallel.
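A small sketch of how these two settings could be read and applied is shown below. The helper names `positiveIntOr` and `chunk` are hypothetical, and a plain object stands in for `process.env` so the snippet runs anywhere. In BullMQ the concurrency value would typically be passed as the `concurrency` option when constructing a `Worker`; that wiring is omitted here.

```typescript
// Parse a positive integer setting, falling back when it is missing or invalid.
function positiveIntOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in for process.env so the sketch runs anywhere.
const env: Record<string, string | undefined> = {
  JOB_BATCH_SIZE: "50",
  JOB_MAX_CONCURRENCY: "5",
};

const batchSize = positiveIntOr(env.JOB_BATCH_SIZE, 50);
const maxConcurrency = positiveIntOr(env.JOB_MAX_CONCURRENCY, 5);

// 120 feed items with a batch size of 50 give batches of 50, 50 and 20.
const items = Array.from({ length: 120 }, (_, i) => `item-${i}`);
const batchSizes = chunk(items, batchSize).map((b) => b.length);
console.log(batchSizes, maxConcurrency);
```

Falling back to a default on invalid input avoids a worker starting with a batch size of zero or `NaN` because of a typo in the environment.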


7. Real-Time Notifications (Socket.IO)

Worker emits:

  • import:started → when a feed begins
  • import:completed → when stats are returned
  • import:failed → on errors

Admin dashboard listens and updates the UI live.


8. Import Logs (MongoDB)

Saved inside import_logs collection:

Fields:

  • feedUrl
  • timestamp
  • totalFetched
  • totalImported
  • newJobs
  • updatedJobs
  • failedJobs[]

Purpose:

  • Analytics
  • Monitoring feed health
  • Debugging issues
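The log record could be typed and built as in the sketch below. The field names come from the list above. The element type of `failedJobs`, the rule `totalImported = newJobs + updatedJobs`, and the `failureRate` helper are assumptions made for illustration; the repository may define them differently.

```typescript
// Fields as listed above; the element type of failedJobs is an assumption.
interface FailedJob {
  externalId: string | null;
  reason: string;
}

interface ImportLog {
  feedUrl: string;
  timestamp: Date;
  totalFetched: number;
  totalImported: number;
  newJobs: number;
  updatedJobs: number;
  failedJobs: FailedJob[];
}

// Build a log record from the counters collected during one feed import.
// Assumption: totalImported = newJobs + updatedJobs.
function buildImportLog(
  feedUrl: string,
  counts: { totalFetched: number; newJobs: number; updatedJobs: number },
  failedJobs: FailedJob[],
  now: Date = new Date(),
): ImportLog {
  return {
    feedUrl,
    timestamp: now,
    totalFetched: counts.totalFetched,
    totalImported: counts.newJobs + counts.updatedJobs,
    newJobs: counts.newJobs,
    updatedJobs: counts.updatedJobs,
    failedJobs,
  };
}

// A simple health signal derived from one log: the share of items that failed.
function failureRate(log: ImportLog): number {
  return log.totalFetched === 0 ? 0 : log.failedJobs.length / log.totalFetched;
}

const log = buildImportLog(
  "https://example.com/feed-a.rss",
  { totalFetched: 100, newJobs: 60, updatedJobs: 35 },
  Array.from({ length: 5 }, (_, i) => ({ externalId: `job-${i}`, reason: "missing title" })),
);
console.log(log.totalImported, failureRate(log));
```

A per-feed failure rate like this is one way the logs can serve the feed-health monitoring purpose listed above.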

9. Error Handling Strategy

Worker error handling:

  • Logs the error
  • Adds the failing item to the failedJobs list
  • The job is rescheduled for retry (with backoff) while attempts remain

Feed Failure Handling:

  • Sends alert via Socket.IO
  • Logs reason

10. System Strengths

Scalable

Workers can be scaled out across multiple servers, because they all consume from the same Redis-backed queue.

Fault-tolerant

Redis queue + retries = stable imports.

Real-time visibility

Admins see import events instantly.

Modular

Cron, imports, workers, logs are separated.

Configurable

Batch size, concurrency, and backoff are controlled via environment variables.


11. Sequence Diagram (ASCII)

Cron → Queue → Worker → MongoDB → Admin UI

Full diagram:

Cron Scheduler        Queue (Redis)         Worker               MongoDB          Admin Dashboard
     |                     |                 |                       |                     |
     | addJob(feedUrl)     |                 |                       |                     |
     |-------------------->|                 |                       |                     |
     |                     | deliver job     |                       |                     |
     |                     |---------------->|                       |                     |
     |                     |                 | upsert jobs + save log|                     |
     |                     |                 |---------------------->|                     |
     |                     |                 | emit(import:completed)|                     |
     |                     |                 |-------------------------------------------->|

12. Conclusion

This system is:

  • scalable
  • fast
  • reliable
  • observable
  • real-time

It follows the cron → queue → worker pattern commonly used in job aggregators, scrapers, and background ETL systems.

Possible future additions to this document:

  • ER Diagram (MongoDB)
  • API documentation
  • Frontend dashboard UI layout
  • Deployment guide (Docker + PM2)
  • Monitoring via Grafana or Upstash
