
JobHunter


Automated job monitoring tool for work-study (alternance) opportunities in Systems & Network Administration. Aggregates multiple sources, filters offers by criteria, and provides an interactive tracking dashboard.


Overview

JobHunter is a self-hosted job search assistant that automates the tedious part of job hunting. It collects offers from official APIs, job boards, and company career pages, filters them against predefined criteria, and presents everything in a web dashboard with full application tracking.

Key principles:

  • Semi-automated — The tool finds and filters offers; the user decides when and where to apply
  • Privacy-first — Runs locally with SQLite, no data sent to external services (except Claude API for cover letter generation)
  • Multi-source — Aggregates France Travail API, Welcome to the Jungle, Indeed, and company career sites
  • Trackable — Built-in application tracker with status management, follow-up reminders, and statistics

What this tool does NOT do:

  • It does not send applications automatically
  • It does not log into user accounts on job platforms
  • It does not store sensitive data online

Features

Job Aggregation

  • Multi-source collection via official APIs and web scraping
  • Keyword-based filtering (title and description matching)
  • Location, contract type, and education level filters
  • Cross-source duplicate detection
  • Daily automated execution via scheduler
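
Cross-source duplicate detection can be done by hashing a normalized (title, company) pair. The helper below is a minimal sketch; the field names are illustrative and the real schema in `models.py` may differ:

```python
import hashlib
import re

def offer_fingerprint(title: str, company: str) -> str:
    """Build a stable fingerprint from a normalized title/company pair."""
    def normalize(s: str) -> str:
        return re.sub(r"\s+", " ", s.strip().lower())
    key = f"{normalize(title)}|{normalize(company)}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def drop_duplicates(offers: list[dict]) -> list[dict]:
    """Keep the first occurrence of each (title, company) across all sources."""
    seen, unique = set(), []
    for offer in offers:
        fp = offer_fingerprint(offer["title"], offer["company"])
        if fp not in seen:
            seen.add(fp)
            unique.append(offer)
    return unique
```

Normalizing case and whitespace catches the common case where two boards list the same offer with slightly different formatting.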

Application Tracker

  • Interactive tracking table with per-offer status management
  • Checkbox columns: CV sent, follow-up done
  • Date fields: date sent, follow-up date
  • Status workflow: New → Applied → Followed up → Interview → Accepted / Rejected / No response
  • Free-text notes per offer
  • Filters by status, source, company, and date range
  • Column sorting and full-text search
  • CSV export

Cover Letter Generation

  • AI-generated draft per offer via Anthropic Claude API
  • Personalized based on user CV + job description
  • Saved in database to avoid regeneration
  • One-click copy to clipboard
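
A minimal sketch of the Claude integration, assuming the official `anthropic` Python SDK; the model id and prompt wording are placeholders, not what `services/cover_letter.py` necessarily uses:

```python
def build_prompt(cv_text: str, offer: dict) -> str:
    """Combine the user's CV and the offer into a single generation prompt."""
    return (
        "Write a short, personalized cover letter in French for this job offer.\n\n"
        f"Job title: {offer['title']}\n"
        f"Company: {offer['company']}\n"
        f"Description: {offer['description']}\n\n"
        f"Candidate CV:\n{cv_text}"
    )

def generate_cover_letter(cv_text: str, offer: dict,
                          model: str = "claude-sonnet-4-20250514") -> str:
    # Lazy import; requires ANTHROPIC_API_KEY in the environment.
    import anthropic
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model=model,
        max_tokens=800,
        messages=[{"role": "user", "content": build_prompt(cv_text, offer)}],
    )
    return msg.content[0].text
```

Persisting the returned draft in the `cover_letter_drafts` table is what avoids paying for regeneration on every page view.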

Statistics

  • Total offers found and new offers this week
  • CVs sent, follow-ups done, interviews obtained
  • Response rate tracking

Tech Stack

| Layer             | Technology                                |
|-------------------|-------------------------------------------|
| Backend           | Flask with Python 3.11+                   |
| Database          | SQLite (lightweight, no server required)  |
| Scraping          | Requests + BeautifulSoup                  |
| Advanced scraping | Selenium (JS-rendered sites)              |
| Scheduler         | APScheduler                               |
| Frontend          | HTML/CSS/JS with Jinja2 + DataTables.js   |
| AI                | Anthropic API (Claude)                    |

Job Sources

| Source                           | Method                     | Priority       |
|----------------------------------|----------------------------|----------------|
| France Travail                   | Official REST API (OAuth2) | High           |
| Welcome to the Jungle            | Web scraping               | High           |
| Company career sites (see below) | Custom scrapers            | High           |
| Indeed                           | Web scraping               | Medium         |
| LinkedIn                         | Public listings scraping   | Low / Optional |
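
The France Travail API uses an OAuth2 client-credentials flow. The sketch below shows the general shape; the endpoint URL, `realm` parameter, and scope names are assumptions to be checked against the francetravail.io developer documentation:

```python
# Illustrative endpoint; the real URL comes from the francetravail.io portal.
TOKEN_URL = "https://entreprise.francetravail.fr/connexion/oauth2/access_token"

def build_token_payload(client_id: str, client_secret: str) -> dict:
    """OAuth2 client-credentials payload for the offers API."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "api_offresdemploiv2 o2dsoffre",  # assumed scope names
    }

def fetch_access_token(client_id: str, client_secret: str) -> str:
    # Lazy import so the payload helper runs without the dependency installed.
    import requests
    resp = requests.post(
        TOKEN_URL,
        params={"realm": "/partenaire"},  # assumed realm parameter
        data=build_token_payload(client_id, client_secret),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```

The returned token is short-lived, so a real scraper would cache it and refresh on expiry rather than requesting one per search call.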

Target Company Career Sites

| Company          | Career Page                                        | Sector              |
|------------------|----------------------------------------------------|---------------------|
| Thales           | https://careers.thalesgroup.com                    | Defense / Aerospace |
| Safran           | https://www.safran-group.com/fr/emplois            | Aerospace           |
| Capgemini        | https://www.capgemini.com/fr-fr/carrieres          | IT Services         |
| Sopra Steria     | https://www.soprasteria.com/rejoignez-nous         | IT Services         |
| Atos / Eviden    | https://jobs.atos.net                              | IT Services         |
| Orange           | https://orange.jobs                                | Telecom             |
| Airbus           | https://www.airbus.com/en/careers                  | Aerospace           |
| CGI              | https://www.cgi.com/france/fr-fr/carrieres         | IT Services         |
| Alten            | https://www.alten.com/rejoignez-nous               | IT Services         |
| Bouygues Telecom | https://www.bouyguestelecom.fr/groupe/recrutement  | Telecom             |

Search Criteria

Keywords

KEYWORDS = [
    "administrateur systèmes et réseaux",
    "administrateur systèmes",
    "administrateur réseaux",
    "admin sys",
    "admin réseau",
    "technicien systèmes et réseaux",
    "ingénieur systèmes",
    "ingénieur infrastructure",
    "technicien infrastructure",
    "technicien informatique",
    "administrateur infrastructure",
    "ingénieur réseaux",
    "sysadmin",
]

Filters

FILTERS = {
    "contract_type": "alternance",
    "location": "Île-de-France",
    "departments": ["75", "78", "91", "92", "93", "94", "95", "77"],
    "min_level": "bac+3",
    "max_level": "bac+5",
    "duration": "24 months",
}
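
A sketch of how the filter engine could apply these criteria, as pure functions over an offer dict. The offer field names (`title`, `description`, `contract_type`, `department`) are assumptions, and only an excerpt of the keyword list is shown:

```python
KEYWORDS = ["administrateur systèmes", "administrateur réseaux", "sysadmin"]  # excerpt

FILTERS = {
    "contract_type": "alternance",
    "departments": ["75", "78", "91", "92", "93", "94", "95", "77"],
}

def matches_keywords(offer: dict, keywords=KEYWORDS) -> bool:
    """True if any keyword appears in the title or description."""
    text = f"{offer.get('title', '')} {offer.get('description', '')}".lower()
    return any(kw in text for kw in keywords)

def passes_filters(offer: dict, filters=FILTERS) -> bool:
    """Check contract type and department against the configured filters."""
    if offer.get("contract_type", "").lower() != filters["contract_type"]:
        return False
    return offer.get("department") in filters["departments"]

def keep_offer(offer: dict) -> bool:
    return matches_keywords(offer) and passes_filters(offer)
```

Keeping the predicates pure (no database or network access) makes them trivial to unit-test in `tests/test_filters.py`.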

Target Companies (Bonus Scoring)

Offers from major companies receive a higher relevance score:

TARGET_COMPANIES = [
    "Thales", "Safran", "Capgemini", "Sopra Steria", "Atos", "Eviden",
    "Orange", "Airbus", "CGI", "Alten", "Bouygues Telecom", "SFR",
    "Société Générale", "BNP Paribas", "AXA", "Engie", "EDF",
    "Dassault", "Naval Group", "SNCF", "RATP", "Renault", "PSA",
]
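
The company bonus could be folded into a relevance score roughly like this; the weights are illustrative, and only an excerpt of the company list is shown:

```python
TARGET_COMPANIES = ["Thales", "Safran", "Capgemini", "Orange"]  # excerpt

def relevance_score(offer: dict, keyword_hits: int,
                    company_bonus: int = 20) -> int:
    """Base score from keyword hits, plus a flat bonus for target companies."""
    score = keyword_hits * 10
    company = offer.get("company", "").lower()
    # Substring match so "Thales Services" still earns the bonus.
    if any(target.lower() in company for target in TARGET_COMPANIES):
        score += company_bonus
    return score
```

Sorting the dashboard table by this score surfaces the most relevant offers first.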

Architecture

┌─────────────────────────────────────────────────────┐
│                  SCHEDULER (APScheduler)             │
│               Daily execution at 8:00 AM             │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                   COLLECTORS                        │
│                                                     │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ │
│  │ France   │ │ Welcome  │ │  Career  │ │ Indeed │ │
│  │ Travail  │ │ to the   │ │  Sites   │ │        │ │
│  │  (API)   │ │ Jungle   │ │ (custom) │ │        │ │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └───┬────┘ │
└───────┼─────────────┼───────────┼────────────┼──────┘
        │             │           │            │
        ▼             ▼           ▼            ▼
┌─────────────────────────────────────────────────────┐
│                   FILTER ENGINE                     │
│                                                     │
│  Keywords · Location · Contract type · Dedup        │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                 DATABASE (SQLite)                   │
│                                                     │
│  offers ──── tracking ──── cover_letter_drafts      │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                WEB DASHBOARD (Flask)                │
│                                                     │
│  Offer table · Tracking · Filters · Stats · Export  │
│                                                     │
│  http://localhost:5000                              │
└─────────────────────────────────────────────────────┘
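
The daily 8:00 AM trigger in the diagram could be wired with APScheduler roughly as follows. The collector wiring is a sketch, and APScheduler is imported lazily so the pure pipeline function runs on its own:

```python
def run_daily_collection(collectors):
    """Run each collector callable and concatenate the offers it returns."""
    offers = []
    for collect in collectors:
        offers.extend(collect())
    # Real pipeline: filter, deduplicate, then insert new rows into SQLite.
    return offers

def start_scheduler(collectors):
    # Lazy import so this sketch has no hard dependency at import time.
    from apscheduler.schedulers.background import BackgroundScheduler
    scheduler = BackgroundScheduler()
    scheduler.add_job(lambda: run_daily_collection(collectors),
                      "cron", hour=8, minute=0, id="daily_scrape")
    scheduler.start()
    return scheduler
```

`BackgroundScheduler` runs in-process alongside Flask, which fits the self-hosted, no-external-services design.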

Project Structure

JobHunter/
├── app/
│   ├── __init__.py              # Flask initialization
│   ├── routes.py                # Dashboard routes
│   ├── models.py                # SQLite models (offers, tracking)
│   ├── database.py              # Database connection and init
│   ├── scrapers/
│   │   ├── __init__.py
│   │   ├── base_scraper.py      # Abstract base class
│   │   ├── france_travail.py    # France Travail API
│   │   ├── wttj.py              # Welcome to the Jungle
│   │   ├── indeed.py            # Indeed
│   │   ├── linkedin.py          # LinkedIn (optional)
│   │   └── career_sites/
│   │       ├── __init__.py
│   │       ├── thales.py
│   │       ├── safran.py
│   │       ├── capgemini.py
│   │       └── ...
│   ├── services/
│   │   ├── __init__.py
│   │   ├── filter_engine.py     # Offer filtering
│   │   ├── deduplication.py     # Duplicate detection
│   │   ├── cover_letter.py      # Claude API integration
│   │   └── scheduler.py         # Task scheduling
│   ├── templates/
│   │   ├── base.html
│   │   ├── dashboard.html
│   │   ├── offer_detail.html
│   │   └── stats.html
│   └── static/
│       ├── css/
│       │   └── style.css
│       └── js/
│           └── dashboard.js
├── data/
│   ├── jobhunter.db             # SQLite database (gitignored)
│   └── cv.txt                   # CV for cover letter generation
├── scripts/
│   ├── run_scrapers.py          # Manual scraper execution
│   └── init_db.py               # Database initialization
├── tests/
│   ├── test_scrapers.py
│   └── test_filters.py
├── .env.example
├── .gitignore
├── config.py
├── requirements.txt
├── ROADMAP.md
└── README.md

Installation

Prerequisites

  • Python 3.11+ and pip
  • Git
  • France Travail API credentials (client ID and secret)
  • Anthropic API key (for cover letter generation)

Setup

git clone https://github.com/Kiwi6212/JobHunter.git
cd JobHunter

python -m venv venv
# Windows: venv\Scripts\activate
# macOS/Linux: source venv/bin/activate

pip install -r requirements.txt
cp .env.example .env

Edit .env with your API credentials:

FRANCE_TRAVAIL_CLIENT_ID=your_client_id
FRANCE_TRAVAIL_CLIENT_SECRET=your_client_secret
ANTHROPIC_API_KEY=your_api_key
FLASK_SECRET_KEY=a_random_secret_key
FLASK_DEBUG=true

Running

# Initialize the database
python scripts/init_db.py

# Run scrapers manually
python scripts/run_scrapers.py

# Launch the dashboard
python -m flask run

The dashboard is available at http://localhost:5000.


Backup & Restore

Automatic daily backup (cron)

scripts/backup.py copies data/jobhunter.db to a timestamped file in /home/ubuntu/backups/ and keeps only the 7 most recent backups, deleting older ones.

Add this line to your crontab (crontab -e) to run the backup every night at 02:00:

0 2 * * * cd /home/ubuntu/JobHunter && /home/ubuntu/JobHunter/venv/bin/python scripts/backup.py >> /home/ubuntu/backups/backup.log 2>&1
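
For reference, the core of such a backup script can be sketched in a few lines. The paths, filename pattern, and retention count mirror the description above; the real scripts/backup.py may differ:

```python
import os
import shutil
from datetime import datetime
from pathlib import Path

def backup_database(db_path: str, backup_dir: str, keep: int = 7) -> Path:
    """Copy the SQLite file to a timestamped backup and prune old copies."""
    # BACKUP_DIR environment variable overrides the default directory.
    target = Path(os.environ.get("BACKUP_DIR", backup_dir))
    target.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = target / f"jobhunter_{stamp}.db"
    shutil.copy2(db_path, dest)
    # Keep only the `keep` most recent backups (names sort chronologically).
    for old in sorted(target.glob("jobhunter_*.db"))[:-keep]:
        old.unlink()
    return dest
```

Because SQLite is a single file, a plain copy taken while the app is idle (02:00) is a sufficient backup strategy here.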

Weekly dead link check (cron)

scripts/check_dead_links.py verifies offer URLs and marks dead links (HTTP 404/410, connection refused) as inactive. Only offers from the last 30 days are checked. Add this to your crontab to run every Sunday at 3:00 UTC:

0 3 * * 0 cd /home/ubuntu/JobHunter && /home/ubuntu/JobHunter/venv/bin/python scripts/check_dead_links.py >> /home/ubuntu/logs/dead_links.log 2>&1

Monthly inactive user cleanup (cron)

scripts/cleanup_inactive_users.py deletes non-admin user accounts that have been inactive for more than 90 days (no login), along with all associated data (tracking, documents, password resets). Add this to your crontab to run on the 1st of each month at 04:00 UTC:

0 4 1 * * cd /home/ubuntu/JobHunter && /home/ubuntu/JobHunter/venv/bin/python scripts/cleanup_inactive_users.py >> /home/ubuntu/logs/cleanup_users.log 2>&1

Weekly email digest (cron)

scripts/weekly_email.py sends a weekly email digest to subscribed users with the top 10 job offers (Match IA > 50%) from the last 7 days. Users can unsubscribe via a one-click link in the email or from their profile. Add this to your crontab to run every Monday at 09:00 UTC:

0 9 * * 1 cd /home/ubuntu/JobHunter && /home/ubuntu/JobHunter/venv/bin/python scripts/weekly_email.py >> /home/ubuntu/logs/weekly_email.log 2>&1

You can override the backup directory via the BACKUP_DIR environment variable:

BACKUP_DIR=/mnt/nas/backups python scripts/backup.py

Manual restore

scripts/restore.py restores a chosen backup over the live database. It creates a pre-restore safety copy before overwriting.

python scripts/restore.py /home/ubuntu/backups/jobhunter_20260308_020000.db

The script prompts for explicit confirmation (YES) before any data is overwritten.


Development Roadmap

Phase 1 — Foundations

  1. Project setup (folder structure, dependencies, configuration)
  2. Database models (offers and tracking tables)
  3. Basic Flask dashboard with empty table

Phase 2 — First Source

  1. France Travail API integration (OAuth2, search, parsing)
  2. Filter engine (keywords, location, contract type)
  3. Offer display in dashboard table

Phase 3 — Tracking

  1. Interactive columns (checkboxes, date fields, status dropdown)
  2. AJAX persistence (save changes without page reload)
  3. Filters and column sorting

Phase 4 — Additional Sources

  1. Welcome to the Jungle scraper
  2. Career site scrapers (Thales, Safran, then others)
  3. Cross-source deduplication

Phase 5 — Intelligence

  1. Cover letter generation (Claude API)
  2. Relevance scoring based on profile match

Phase 6 — Automation & Polish

  1. APScheduler for daily execution
  2. Statistics dashboard header
  3. CSV export
  4. Indeed scraper
  5. LinkedIn scraper (optional)
  6. UI/UX improvements

See ROADMAP.md for detailed progress.


Maintenance & Error Pages

Maintenance mode (Nginx)

To enable maintenance mode without stopping the app, create a flag file:

touch /home/ubuntu/JobHunter/maintenance_on

To disable maintenance mode:

rm /home/ubuntu/JobHunter/maintenance_on

Nginx configuration

Add the following to your Nginx server block to serve custom error pages and enable maintenance mode:

# --- Maintenance mode ---
# If the flag file exists, return 503 for all requests
if (-f /home/ubuntu/JobHunter/maintenance_on) {
    return 503;
}

# --- Custom error pages ---
error_page 502 /static/error.html;
error_page 503 /static/maintenance.html;

location = /static/maintenance.html {
    root /home/ubuntu/JobHunter/app;
    internal;
}

location = /static/error.html {
    root /home/ubuntu/JobHunter/app;
    internal;
}
  • 502 (Bad Gateway) — served when Gunicorn/Flask is down or unresponsive
  • 503 (Service Unavailable) — served when maintenance mode is enabled
  • 404 and 500 — handled by Flask with custom templates (templates/404.html, templates/500.html)

The internal directive ensures these pages are only served by Nginx error handling, not directly accessible via URL.


Security

  • API keys — Stored in .env, never committed to version control
  • Database — Local SQLite file excluded from Git
  • Scraping — Respects robots.txt, includes delays between requests, realistic user-agent headers
  • Rate limiting — Built-in delays to avoid IP blocking
  • Personal data — CV stored locally only, transmitted exclusively to Claude API for cover letter generation

License

MIT License — see LICENSE for details.


Credits

Mathias Quillateau (GitHub · LinkedIn)

Code assisted by Claude Code (Anthropic).
