🇬🇧 uk-soc-normaliser

uk-soc-normaliser is an open-source Python library that maps free-text job titles and descriptions to official UK SOC 2020 occupation codes.

It enriches job text with normalised titles, skills, and seniority bands for use in recruitment platforms, applicant tracking systems (ATS), labour market analysis, and job matching engines.


✨ Features

  • 🧹 Normalise messy job titles into standard UK SOC 2020 codes.
  • 🏷 Detect common seniority levels (junior, senior, lead, principal).
  • 🛠 Extract skill keywords (from ESCO/ONS skills lists).
  • 📊 Confidence scoring for downstream filtering.
  • 📦 Lightweight & Apache 2.0 licensed.
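
To illustrate what rule-based seniority detection and skill keyword extraction can look like, here is a minimal, self-contained sketch. It is not the library's implementation: the function names, the keyword table, and the tiny skill vocabulary are hypothetical, and a real vocabulary would be loaded from the ESCO/ONS lists rather than hard-coded.

```python
import re
from typing import List, Optional

# Hypothetical keyword table. The four bands come from the feature list;
# the abbreviations ("sr", "jr") are an assumption for illustration.
SENIORITY_KEYWORDS = {
    "principal": "principal",
    "lead": "lead",
    "senior": "senior",
    "sr": "senior",
    "junior": "junior",
    "jr": "junior",
}

# Hypothetical stand-in for a skills list sourced from ESCO/ONS.
SKILL_VOCABULARY = {"python", "react", "sql"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def detect_seniority(text: str) -> Optional[str]:
    """Return the band of the first seniority keyword found, or None."""
    for token in tokenize(text):
        band = SENIORITY_KEYWORDS.get(token)
        if band is not None:
            return band
    return None


def extract_skills(text: str) -> List[str]:
    """Return known skills in order of first appearance, without duplicates."""
    found: List[str] = []
    for token in tokenize(text):
        if token in SKILL_VOCABULARY and token not in found:
            found.append(token)
    return found


text = "Senior Software Engineer with Python and React"
print(detect_seniority(text))  # senior
print(extract_skills(text))    # ['python', 'react']
```

Exact token matching keeps the sketch predictable but will misread phrases such as "lead generation", which is one reason a confidence score is useful downstream.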

📦 Installation

git clone https://github.com/deep12650/uk-ai-soc-normaliser.git
cd uk-ai-soc-normaliser
pip install -e .

(A PyPI release is planned; once published, install with: pip install uk-soc-normaliser)


🚀 Quick Start

from uk_soc_normaliser import Normaliser

normaliser = Normaliser()

text = "Senior Software Engineer with Python and React"

result = normaliser.predict(text)

print(result.soc_code)          # e.g. "2136"
print(result.normalized_title)  # "Software Engineer"
print(result.skills)            # ["python", "react"]
print(result.seniority)         # "senior"
print(result.confidence)        # 0.7

Example output:

2136
Software Engineer
['python', 'react']
senior
0.7
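
The confidence score can be used to decide which predictions to accept automatically and which to send for manual review. The sketch below assumes a stand-in `Prediction` dataclass whose fields mirror the attributes printed in the Quick Start; the class, the `split_by_confidence` helper, and the 0.6 threshold are hypothetical and not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    """Hypothetical stand-in for the result object shown in the Quick Start."""
    soc_code: str
    normalized_title: str
    skills: List[str] = field(default_factory=list)
    seniority: Optional[str] = None
    confidence: float = 0.0


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.6
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (accepted, needs_review) around a threshold."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return accepted, needs_review


# Illustrative values only; the SOC code is copied from the Quick Start output.
predictions = [
    Prediction("2136", "Software Engineer", ["python", "react"], "senior", 0.7),
    Prediction("2136", "Software Engineer", [], None, 0.3),
]

accepted, needs_review = split_by_confidence(predictions)
print(len(accepted), len(needs_review))  # 1 1
```

The threshold is an arbitrary starting point; in practice it would be tuned against a labelled sample of job titles.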

📊 Data Sources

This project builds on open datasets:


🛠 Roadmap

  • Rule-based title + seniority mapping
  • Expand dictionary with full SOC 2020 titles/aliases
  • Add ESCO skill embeddings for fuzzy extraction
  • ML model (BERT/SBERT) for improved text → SOC prediction
  • Publish to PyPI

See Issues for planned features & contributions.


🧪 Development

Clone and install dev dependencies:

git clone https://github.com/deep12650/uk-ai-soc-normaliser.git
cd uk-ai-soc-normaliser
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for setup, coding style, and PR process.


🔒 Security

If you discover a security vulnerability, please see SECURITY.md for reporting instructions.


📜 License

This project is licensed under the Apache License 2.0.

You may freely use, modify, and distribute this project in commercial and non-commercial applications. Please retain attribution in derived works.


💡 Part of the UK AI Toolkit

This library is part of the UK AI Toolkit series, alongside:
