uk-soc-normaliser is an open-source Python library that maps free-text job titles and descriptions into official UK SOC 2020 occupation codes.
It enriches job text with normalised titles, skills, and seniority bands for use in recruitment platforms, applicant tracking systems (ATS), labour market analysis, and job matching engines.
- 🧹 Normalise messy job titles into standard UK SOC 2020 codes.
- 🏷 Detect common seniority levels (junior, senior, lead, principal).
- 🛠 Extract skill keywords (from ESCO/ONS skills lists).
- 📊 Confidence scoring for downstream filtering.
- 📦 Lightweight & Apache 2.0 licensed.
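Seniority detection of the kind listed above can be done with simple keyword rules. The following is a minimal sketch under that assumption, not the library's actual implementation: the `detect_seniority` function and the extra synonyms (`sr`, `head of`, `graduate`, and so on) are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword rules. The four bands come from the feature list;
# the abbreviations and synonyms are illustrative assumptions.
# Order matters: more senior bands are checked first.
SENIORITY_PATTERNS = {
    "principal": r"\bprincipal\b",
    "lead": r"\b(?:lead|head of)\b",
    "senior": r"\b(?:senior|snr|sr)\b",
    "junior": r"\b(?:junior|jnr|jr|graduate|entry[- ]level)\b",
}


def detect_seniority(text: str) -> Optional[str]:
    """Return the most senior band whose pattern matches, or None."""
    lowered = text.lower()
    for band, pattern in SENIORITY_PATTERNS.items():
        if re.search(pattern, lowered):
            return band
    return None


print(detect_seniority("Senior Software Engineer with Python and React"))  # senior
print(detect_seniority("Senior Lead Engineer"))  # lead
```

Checking bands from most to least senior means a title such as "Senior Lead Engineer" resolves to `lead`. A fuller rule set would likely need exceptions for phrases such as "lead generation".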
```bash
git clone https://github.com/deep12650/uk-ai-soc-normaliser.git
cd uk-ai-soc-normaliser
pip install -e .
```

A PyPI release is planned; once published, the package will install with `pip install uk-soc-normaliser`.
```python
from uk_soc_normaliser import Normaliser

normaliser = Normaliser()
text = "Senior Software Engineer with Python and React"
result = normaliser.predict(text)

print(result.soc_code)          # e.g. "2136"
print(result.normalized_title)  # "Software Engineer"
print(result.skills)            # ["python", "react"]
print(result.seniority)         # "senior"
print(result.confidence)        # 0.7
```

Example output:

```text
2136
Software Engineer
['python', 'react']
senior
0.7
```
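The `confidence` value is intended for downstream filtering, for example routing low-scoring predictions to manual review. A minimal sketch, assuming a hypothetical `Prediction` dataclass whose fields mirror the attributes printed in the usage example and an arbitrary 0.6 threshold; neither is part of the library's documented API, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    # Stand-in for the library's result object (hypothetical);
    # field names mirror the attributes printed in the usage example.
    soc_code: str
    normalized_title: str
    skills: List[str] = field(default_factory=list)
    seniority: Optional[str] = None
    confidence: float = 0.0


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.6
) -> Tuple[List[Prediction], List[Prediction]]:
    """Partition predictions into (accepted, needs_review) by confidence."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return accepted, needs_review


# Illustrative values only, not real model output.
batch = [
    Prediction("2136", "Software Engineer", ["python", "react"], "senior", 0.7),
    Prediction("2136", "Software Engineer", ["java"], None, 0.4),
]
accepted, needs_review = split_by_confidence(batch)
print(len(accepted), len(needs_review))  # 1 1
```

The right threshold depends on how costly a wrong code is for the application; a labour market analysis might tolerate more noise than an ATS that auto-files candidates.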
This project builds on open datasets:
- ONS SOC 2020 – occupational codes & titles.
- ESCO skills taxonomy – skill descriptors.
- Sample UK job ads (public domain or synthetic).
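Skill extraction against a fixed vocabulary, such as a list derived from the ESCO taxonomy, can be sketched as whole-word phrase matching. The vocabulary below is a tiny hypothetical stand-in for the real ESCO/ONS lists, and `build_skill_pattern` and `extract_skills` are illustrative names, not part of the library's API.

```python
import re
from typing import Iterable, List

# Hypothetical mini-vocabulary standing in for an ESCO/ONS skills list.
SKILLS = ["python", "react", "machine learning", "sql", "project management"]


def build_skill_pattern(skills: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive regex that matches any skill as whole words."""
    # Longest phrases first, so a multi-word skill wins over a shorter overlap.
    ordered = sorted(skills, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def extract_skills(text: str, pattern: "re.Pattern[str]") -> List[str]:
    """Return matched skills in lower case, de-duplicated, in order of appearance."""
    found: List[str] = []
    for match in pattern.finditer(text):
        skill = match.group(0).lower()
        if skill not in found:
            found.append(skill)
    return found


pattern = build_skill_pattern(SKILLS)
print(extract_skills("Senior Software Engineer with Python and React", pattern))
# ['python', 'react']
```

Word boundaries stop `python` from matching inside words such as "Pythonic". Skills containing symbols (for example "C++") would need a different boundary rule.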
- Rule-based title + seniority mapping
- Expand dictionary with full SOC 2020 titles/aliases
- Add ESCO skill embeddings for fuzzy extraction
- ML model (BERT/SBERT) for improved text → SOC prediction
- Publish to PyPI
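The first two roadmap items, rule-based mapping and a fuller alias dictionary, amount to a lookup from cleaned-up title strings to canonical titles and SOC codes. The sketch below assumes a hypothetical `ALIASES` table and `normalise_title` helper; its fuzzy fallback uses Python's standard-library `difflib` string similarity, a simpler stand-in for the ESCO embeddings and SBERT model on the roadmap. The `2136` code is copied from the example output above purely for illustration.

```python
import difflib
import re
from typing import Optional, Tuple

# Hypothetical alias table; a full version would cover all SOC 2020 titles/aliases.
ALIASES = {
    "software engineer": "Software Engineer",
    "software developer": "Software Engineer",
    "programmer": "Software Engineer",
}
# Code copied from the README example output, for illustration only.
SOC_CODES = {"Software Engineer": "2136"}

SENIORITY_WORDS = re.compile(r"\b(?:junior|senior|lead|principal)\b", re.IGNORECASE)


def normalise_title(raw: str, cutoff: float = 0.8) -> Optional[Tuple[str, str]]:
    """Map a raw title to (canonical_title, soc_code), or None if nothing matches."""
    # Remove seniority words, lower-case, and collapse whitespace.
    key = " ".join(SENIORITY_WORDS.sub(" ", raw).lower().split())
    canonical = ALIASES.get(key)
    if canonical is None:
        # Fuzzy fallback: closest alias by difflib similarity ratio.
        close = difflib.get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
        if not close:
            return None
        canonical = ALIASES[close[0]]
    return canonical, SOC_CODES[canonical]


print(normalise_title("Senior Software Engineer"))  # ('Software Engineer', '2136')
print(normalise_title("Sofware Developer"))         # ('Software Engineer', '2136')
print(normalise_title("Nurse"))                     # None
```

Stripping seniority words before the lookup keeps the alias table small: "Senior Software Engineer" and "Software Engineer" share one entry, and the seniority band is reported separately.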
See Issues for planned features & contributions.
Clone the repository, install the development dependencies, and run the test suite:

```bash
git clone https://github.com/deep12650/uk-ai-soc-normaliser.git
cd uk-ai-soc-normaliser
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
```

We welcome contributions! Please see CONTRIBUTING.md for setup, coding style, and the PR process.
If you discover a security vulnerability, please see SECURITY.md for reporting instructions.
This project is licensed under the Apache License 2.0.
You may freely use, modify, and distribute this project in commercial and non-commercial applications. Please retain attribution in derived works.
This library is part of the UK AI Toolkit series, alongside: