We want to create a resume-vacancy matcher based on NLP embeddings and other ML technologies, with the following API:
- Given a vacancy, return a list of the resumes that match it best.
- Extending the resume DB implies updating the model weights.
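The first API item can be sketched as a nearest-neighbour lookup over embedding vectors. This is only a minimal illustration, assuming that resumes and vacancies are already embedded in the same vector space and that cosine similarity is an acceptable stand-in for the match score; the names `cosine_similarity` and `top_matching_resumes`, and the toy vectors, are hypothetical and not taken from the notebooks.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matching_resumes(
    vacancy_vec: Sequence[float],
    resume_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k resume ids with the highest similarity to the vacancy."""
    scored = [
        (resume_id, cosine_similarity(vacancy_vec, vec))
        for resume_id, vec in resume_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings (made up for illustration only).
resumes = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [1.0, 1.0]}
ranked = top_matching_resumes([1.0, 0.1], resumes, k=2)
print(ranked)
```

In a real setup the similarity would presumably be replaced by the learned matching function, and the resume vectors would be recomputed whenever the model weights change.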
So far, the following goals have been set:
- Get a resume-vacancy database with response history.
- Support different resume/vacancy formats (PDF, DOCX, etc.) and create a proper parser for each format, ideally one that can extract features such as highlighted words, headers, etc.
- For the unified resume/vacancy models (common to all formats), create an embedding vector.
- Use StarSpace embeddings as a baseline.
- Learn the function Resumes x Vacancies -> R, where the output measures how well the resume and the vacancy match each other.
- Cluster the resumes by industry domain.
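One possible shape for the Resumes x Vacancies -> R function is a small logistic model trained on the response history. The sketch below is an assumption, not the project's actual model: it takes the element-wise product of the two embeddings as features, treats a recorded response as label 1 and no response as label 0, and fits the weights with plain stochastic gradient descent. All names (`pair_features`, `match_score`, `train`) and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float], int]  # (resume_vec, vacancy_vec, label)


def pair_features(resume_vec: Sequence[float], vacancy_vec: Sequence[float]) -> List[float]:
    """Element-wise product of the two embeddings (an assumed feature choice)."""
    return [r * v for r, v in zip(resume_vec, vacancy_vec)]


def match_score(weights: Sequence[float], bias: float,
                resume_vec: Sequence[float], vacancy_vec: Sequence[float]) -> float:
    """Real-valued match score; higher means a better match."""
    feats = pair_features(resume_vec, vacancy_vec)
    return bias + sum(w * f for w, f in zip(weights, feats))


def train(pairs: Sequence[Pair], dim: int, epochs: int = 200,
          lr: float = 0.5, seed: int = 0) -> Tuple[List[float], float]:
    """Fit logistic-regression weights with stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [0.0] * dim
    bias = 0.0
    data = list(pairs)
    for _ in range(epochs):
        rng.shuffle(data)
        for resume_vec, vacancy_vec, label in data:
            z = match_score(weights, bias, resume_vec, vacancy_vec)
            prob = 1.0 / (1.0 + math.exp(-z))
            grad = prob - label  # gradient of the log loss w.r.t. z
            feats = pair_features(resume_vec, vacancy_vec)
            weights = [w - lr * grad * f for w, f in zip(weights, feats)]
            bias -= lr * grad
    return weights, bias


# Toy response history: aligned vectors got a response, orthogonal ones did not.
history = [
    ([1.0, 0.0], [1.0, 0.0], 1),
    ([0.0, 1.0], [0.0, 1.0], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
    ([0.0, 1.0], [1.0, 0.0], 0),
]
weights, bias = train(history, dim=2)
good = match_score(weights, bias, [1.0, 0.0], [1.0, 0.0])
bad = match_score(weights, bias, [1.0, 0.0], [0.0, 1.0])
print(good > bad)
```

The raw score is left unbounded to match the -> R signature; passing it through a sigmoid would give a response probability instead.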
There are two notebook files, src/run.ipynb and src/clusterization.ipynb. The first covers the main objective, while the second covers clustering, which can be done without a labeled resume-vacancy dataset.
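To illustrate why clustering needs no labels, here is a minimal k-means sketch over resume embeddings. It is not the method used in src/clusterization.ipynb, which may differ; the choice of k-means, the function names, and the toy points are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           iterations: int = 20, seed: int = 0) -> List[int]:
    """Assign each point to one of k clusters; returns one cluster index per point."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy resume embeddings forming two obvious groups (made up for illustration).
resume_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = kmeans(resume_vectors, k=2)
print(labels)
```

Only the embedding vectors are used here; no resume-vacancy pairs or response history are required, so the resulting clusters would still need to be inspected to see whether they correspond to industry domains.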