Built a Resume Matching Engine that:
- Normalizes noisy resume skill data from 10 Indian university students
- Computes TF-IDF vectors for resumes
- Builds binary vectors for 3 Job Descriptions from Korean tech companies
- Calculates cosine similarity between resumes and JDs
- Outputs the Top 3 matching candidates per JD
Skill normalization:
- Split raw skills on commas
- Convert to lowercase
- Match multi-word phrases before single tokens (alias keys sorted by length, descending)
- Apply the `SKILL_ALIASES` mapping exactly as provided
- Discard tokens not present in the alias map
- Deduplicate so each canonical skill appears only once per resume
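The normalization steps can be sketched as follows. This is a minimal illustration, not the code in `resume_matcher.py`: the `SKILL_ALIASES` entries shown are a hypothetical subset (the real mapping ships with the task), and whole-word regex matching inside each comma-separated chunk is an assumption about how phrases and tokens are located.

```python
import re

# Hypothetical subset of SKILL_ALIASES; the real mapping is provided with the task.
SKILL_ALIASES = {
    "machine learning": "machine learning",
    "ml": "machine learning",
    "python": "python",
    "py": "python",
    "react.js": "react",
    "reactjs": "react",
    "react": "react",
}


def normalize_skills(raw: str, aliases: dict[str, str]) -> list[str]:
    """Map a raw comma-separated skill string to unique canonical skills."""
    # Longest alias keys first, so multi-word phrases win over their parts.
    keys = sorted(aliases, key=len, reverse=True)
    found: list[str] = []
    for chunk in raw.split(","):
        text = chunk.strip().lower()
        for key in keys:
            # Whole-word match (assumption); escape metacharacters like "." or "+".
            pattern = r"(?<!\w)" + re.escape(key) + r"(?!\w)"
            if re.search(pattern, text):
                canonical = aliases[key]
                if canonical not in found:  # keep each canonical skill once
                    found.append(canonical)
                # Blank out the match so shorter keys cannot re-match inside it.
                text = re.sub(pattern, " ", text)
        # Whatever is left in `text` is not in the alias map and is discarded.
    return found


print(normalize_skills("Machine Learning, ML, Py, React.js, Excel", SKILL_ALIASES))
# ['machine learning', 'python', 'react']
```

Sorting keys by length means `react.js` is consumed before `react` gets a chance to match, which is the point of the "phrases before tokens" rule.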
Vocabulary:
- Built from normalized, deduplicated resume skills only
- Sorted alphabetically (48 unique terms)
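Building the shared vocabulary amounts to a set union over the normalized resumes followed by a sort. The sketch below is illustrative only; `build_vocabulary` and the three resumes are hypothetical stand-ins for the real 10-student data.

```python
def build_vocabulary(resume_skills: dict[str, list[str]]) -> list[str]:
    """Collect every canonical skill seen in any resume, sorted alphabetically."""
    vocab: set[str] = set()
    for skills in resume_skills.values():
        vocab.update(skills)  # JD skills are deliberately not added
    return sorted(vocab)


# Hypothetical normalized resumes; the real data has 10 students and 48 terms.
resumes = {
    "Candidate A": ["python", "sql"],
    "Candidate B": ["python", "docker"],
    "Candidate C": ["react"],
}
print(build_vocabulary(resumes))  # ['docker', 'python', 'react', 'sql']
```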
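TF-IDF (resumes only):
- TF = 1 / N for each skill in the resume (0 otherwise), where N is the total number of unique skills in that resume
- IDF = ln(10 / df), where 10 is the number of resumes and df is the number of resumes containing the skill (natural log, no smoothing)
- TF-IDF = TF × IDF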
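A minimal sketch of these formulas, assuming skills are already normalized and deduplicated; `tfidf_vectors` is a hypothetical helper name, and it uses `len(resume_skills)` in place of the hard-coded 10 so it also works on the small example below.

```python
import math


def tfidf_vectors(
    resume_skills: dict[str, list[str]], vocab: list[str]
) -> dict[str, list[float]]:
    """TF-IDF per resume with TF = 1/N and IDF = ln(n_docs / df), no smoothing."""
    n_docs = len(resume_skills)  # 10 in the task, so IDF = ln(10 / df)
    # df = number of resumes containing each term; always >= 1 because the
    # vocabulary is built from the resumes themselves.
    df = {t: sum(t in skills for skills in resume_skills.values()) for t in vocab}
    vectors: dict[str, list[float]] = {}
    for name, skills in resume_skills.items():
        unique = set(skills)
        tf = 1 / len(unique) if unique else 0.0
        vectors[name] = [
            tf * math.log(n_docs / df[t]) if t in unique else 0.0 for t in vocab
        ]
    return vectors


# Hypothetical data: 3 resumes instead of 10.
resumes = {
    "Candidate A": ["python", "sql"],
    "Candidate B": ["python", "docker"],
    "Candidate C": ["react"],
}
vocab = ["docker", "python", "react", "sql"]
vecs = tfidf_vectors(resumes, vocab)
print([round(x, 4) for x in vecs["Candidate A"]])  # [0.0, 0.2027, 0.0, 0.5493]
```

Because there is no smoothing, a skill listed on every resume gets IDF = ln(1) = 0 and contributes nothing to any match.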
JD vectors (binary):
- 1 if the skill is present in the JD, 0 if not
- Built over the same shared vocabulary
Cosine similarity:
- Cosine(A, B) = (A · B) / (|A| × |B|), where |A| and |B| are Euclidean norms
- A = resume TF-IDF vector
- B = JD binary vector
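Cosine similarity on plain Python lists, as a minimal sketch. The zero-norm guard is an assumption (a resume whose skills all have IDF 0, or a JD with no vocabulary overlap, would otherwise divide by zero); the source does not say how that case is handled.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """(A . B) / (|A| * |B|) using Euclidean norms; 0.0 if either norm is 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an empty vector as no match
    return dot / (norm_a * norm_b)


print(cosine([3.0, 4.0], [1, 0]))  # 0.6
```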
Ranking:
- Top 3 candidates per JD, ties broken alphabetically by candidate name
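Ranking with an alphabetical tie-break can be expressed as a single sort on the key `(-score, name)`. This sketch assumes ties are judged on the 2-decimal rounded scores; `top_k` and the candidate names are hypothetical.

```python
def top_k(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Highest rounded score first; equal scores ordered by candidate name (A-Z)."""
    rounded = {name: round(score, 2) for name, score in scores.items()}
    ranked = sorted(rounded.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores: Zara and Amit both round to 0.50.
scores = {"Zara": 0.504, "Amit": 0.498, "Bela": 0.71, "Chen": 0.12}
print(top_k(scores))  # [('Bela', 0.71), ('Amit', 0.5), ('Zara', 0.5)]
```

Negating the score lets one ascending sort handle both criteria: descending by score, then ascending by name.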
Results (cosine score in parentheses):
- JD-1, Kakao (ML Engineer): Sneha Patel (0.57), Karan Mehta (0.53), Arjun Sharma (0.40)
- JD-2, Naver (Backend Engineer): Rahul Gupta (0.81), Ananya Krishnan (0.28), Deepika Rao (0.19)
- JD-3, Line (Frontend Engineer): Aditya Kumar (0.67), Priya Nair (0.58), Ananya Krishnan (0.35)
| Item | Detail |
|---|---|
| Language | Python |
| Libraries | Standard library only (math) |
| External Libraries | None (not allowed) |
| AI Tool Used | Redrob AI |
Project structure:
- `resume_matcher.py`: main solution file
- `README.md`: this file
Google Colab:
- Open colab.research.google.com
- Create a new notebook
- Paste the contents of `resume_matcher.py` into a cell
- Press `Shift + Enter` to run
VS Code / Terminal:
`python resume_matcher.py`

No installations are needed; the script uses only the Python standard library.
Compliance checklist:
- ✅ Only the standard library is used (`math`); no numpy, pandas, or sklearn
- ✅ `SKILL_ALIASES` used exactly as provided, not modified
- ✅ Multi-word phrases matched before single tokens
- ✅ Vocabulary built from resume skills only
- ✅ TF-IDF computed for resumes only, not JDs
- ✅ IDF = ln(10/df), natural log, no smoothing
- ✅ Cosine similarity uses Euclidean norm of resume vector
- ✅ Scores rounded to 2 decimal places
- ✅ Ties broken alphabetically by candidate name
Redrob AI Campus Hackathon · Powered by McKinley Rice