Resume Matching Engine

Redrob AI Campus Hackathon — Individual Competition

Problem Statement

Built a Resume Matching Engine that:

Normalizes noisy resume skill data from 10 Indian university students
Computes TF-IDF vectors for resumes
Builds binary vectors for 3 Job Descriptions from Korean tech companies
Calculates cosine similarity between resumes and JDs
Outputs the Top 3 matching candidates per JD

How It Works

Step 1 — Skill Normalization

Split raw skills on commas
Convert to lowercase
Match multi-word phrases before single tokens (sorted by length descending)
Apply SKILL_ALIASES mapping exactly as provided
Discard tokens not present in the alias map
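
The normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the code in resume_matcher.py: the SKILL_ALIASES entries shown here are hypothetical stand-ins for the mapping provided by the hackathon, and the function name normalize_skills is an assumption.

```python
# Hypothetical stand-in for the provided SKILL_ALIASES mapping (not the real one).
SKILL_ALIASES = {
    "machine learning": "machine learning",
    "ml": "machine learning",
    "py": "python",
    "python": "python",
    "react js": "react",
}


def normalize_skills(raw, aliases=SKILL_ALIASES):
    """Map a raw comma-separated skill string to canonical skill names."""
    # Longest alias keys first, so multi-word phrases win over single tokens.
    keys = sorted(aliases, key=len, reverse=True)
    canonical = []
    for chunk in raw.split(","):
        # Lowercase and collapse repeated whitespace.
        text = " ".join(chunk.lower().split())
        while text:
            for key in keys:
                # Match only on a token boundary, never inside a word.
                if text == key or text.startswith(key + " "):
                    canonical.append(aliases[key])
                    text = text[len(key):].lstrip()
                    break
            else:
                # No alias starts here: discard one token and continue.
                parts = text.split(" ", 1)
                text = parts[1] if len(parts) > 1 else ""
    return canonical


print(normalize_skills("Python, ML, React JS, Excel"))
# ['python', 'machine learning', 'react']
```

"Excel" is dropped because it has no entry in the (hypothetical) alias map. Duplicates are deliberately kept at this stage, since deduplication is a separate step.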

Step 2 — Deduplication

Each canonical skill appears only once per resume
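
One simple way to do this while keeping first-seen order is shown below. The helper name dedupe is an assumption, and whether the actual solution preserves order or uses a set is not stated here.

```python
def dedupe(skills):
    """Keep the first occurrence of each canonical skill, preserving order."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(skills))


print(dedupe(["python", "machine learning", "python"]))
# ['python', 'machine learning']
```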

Step 3 — Vocabulary Construction

Built from normalized and deduplicated resume skills only
Sorted alphabetically (48 unique terms)
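
As a rough sketch, the vocabulary is the sorted union of all resume skill lists. The function name build_vocabulary and the two sample resumes are hypothetical, used only to show the shape of the data.

```python
def build_vocabulary(resumes):
    """Return the alphabetically sorted set of skills across all resumes."""
    return sorted({skill for skills in resumes.values() for skill in skills})


# Hypothetical sample data, not the hackathon resumes.
sample = {
    "Candidate A": ["sql", "python"],
    "Candidate B": ["python", "react"],
}
print(build_vocabulary(sample))
# ['python', 'react', 'sql']
```

JD skills never enter the vocabulary, so a skill that appears only in a JD has no dimension in the vector space.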

Step 4 — TF-IDF Vectors (Resumes only)

TF  = 1 / N               (N = total unique skills in resume)
IDF = ln(10 / df)         (natural log, no smoothing)
TF-IDF = TF × IDF
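
The formulas above can be sketched as follows. This assumes the 10 in the IDF formula is the number of resumes, so the sketch uses len(resumes) instead of a hard-coded constant; the function name tfidf_vectors is hypothetical.

```python
import math


def tfidf_vectors(resumes, vocab):
    """Build one TF-IDF vector per resume over a shared, ordered vocabulary."""
    n_docs = len(resumes)  # 10 in the hackathon data set (assumption)
    # Document frequency: number of resumes that contain each term.
    df = {t: sum(1 for skills in resumes.values() if t in skills) for t in vocab}
    vectors = {}
    for name, skills in resumes.items():
        unique = set(skills)
        tf = 1 / len(unique) if unique else 0.0  # TF = 1 / N
        vectors[name] = [
            # IDF = ln(n_docs / df), natural log, no smoothing.
            tf * math.log(n_docs / df[t]) if t in unique else 0.0
            for t in vocab
        ]
    return vectors
```

Because there is no smoothing, a skill that appears in every resume gets IDF = ln(1) = 0 and contributes nothing to any score.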

Step 5 — JD Binary Vectors

1 if skill present in JD, 0 if not
Built over same shared vocabulary
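
A minimal sketch of this step, assuming the JD skills have already been normalized the same way as resume skills (the function name jd_binary_vector is hypothetical):

```python
def jd_binary_vector(jd_skills, vocab):
    """Return a 0/1 vector marking which vocabulary terms the JD requires."""
    present = set(jd_skills)
    return [1 if term in present else 0 for term in vocab]


vocab = ["python", "react", "sql"]
print(jd_binary_vector(["sql", "python", "kubernetes"], vocab))
# [1, 0, 1]
```

In this sketch "kubernetes" is silently ignored, because the vocabulary is built from resume skills only.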

Step 6 — Cosine Similarity & Ranking

Cosine(A, B) = (A · B) / (|A| × |B|)

A = Resume TF-IDF vector
B = JD binary vector
Top 3 ranked per JD, ties broken alphabetically
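
The scoring and ranking can be sketched like this. Two details are assumptions rather than facts about the actual solution: a zero-length vector scores 0.0, and ties are judged on the scores after rounding to 2 decimal places. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: no overlap information means score 0
    return dot / (norm_a * norm_b)


def top_matches(resume_vectors, jd_vector, k=3):
    """Return the k best (name, score) pairs for one JD."""
    scored = [
        (name, round(cosine(vec, jd_vector), 2))
        for name, vec in resume_vectors.items()
    ]
    # Highest score first; equal scores fall back to alphabetical name order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy vectors, not the hackathon data.
vectors = {"Zed": [1.0, 0.0], "Amy": [1.0, 0.0], "Bob": [0.0, 1.0], "Cal": [1.0, 1.0]}
print(top_matches(vectors, [1, 0]))
# [('Amy', 1.0), ('Zed', 1.0), ('Cal', 0.71)]
```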

Results

JD-1 — Kakao (ML Engineer)
Sneha Patel (0.57), Karan Mehta (0.53), Arjun Sharma (0.40)

JD-2 — Naver (Backend Engineer)
Rahul Gupta (0.81), Ananya Krishnan (0.28), Deepika Rao (0.19)

JD-3 — Line (Frontend Engineer)
Aditya Kumar (0.67), Priya Nair (0.58), Ananya Krishnan (0.35)

Tech Stack

Item	Detail
Language	Python
Libraries	Standard library only (`math`)
External Libraries	None (not allowed)
AI Tool Used	Redrob AI

File Structure

├── resume_matcher.py   # Main solution file
└── README.md           # This file

How to Run

Google Colab:

Open colab.research.google.com
Create a new notebook
Paste resume_matcher.py contents into a cell
Press Shift + Enter to run

VS Code / Terminal:

python resume_matcher.py

No installations needed — uses Python standard library only.

Rules Followed

✅ Only standard library used (math) — no numpy, pandas, sklearn
✅ SKILL_ALIASES used exactly as provided, not modified
✅ Multi-word phrases matched before single tokens
✅ Vocabulary built from resume skills only
✅ TF-IDF computed for resumes only, not JDs
✅ IDF = ln(10/df), natural log, no smoothing
✅ Cosine similarity uses the Euclidean norms of the resume and JD vectors, as in the Step 6 formula
✅ Scores rounded to 2 decimal places
✅ Ties broken alphabetically by candidate name

Redrob AI Campus Hackathon · Powered by McKinley Rice
