strutopy

strutopy is a Python package for the Structural Topic Model and machine-assisted reading of large text corpora. The implementation aims for computational efficiency as well as ease of use.

The Structural Topic Model (Roberts et al. 2014) extends classical topic modelling approaches by including document-level metadata. The metadata can enter the estimation procedure in two ways, via:

- topical content covariates that shape the word usage within topics
- topical prevalence covariates that shape the frequency of topic occurrences.
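To make the role of prevalence covariates concrete, here is a minimal sketch, not strutopy's API. In the model described by Roberts et al., a document's topic proportions follow a logistic-normal distribution whose mean is a linear function of the prevalence covariates. The function below only maps that linear predictor through a softmax, with the last topic as the reference category. The function name `mean_topic_proportions` and the values in `gamma` are assumptions chosen for illustration.

```python
import math


def mean_topic_proportions(x, gamma):
    """Map one document's covariates to topic proportions at the prior mean.

    x:     covariate row of length P (use a leading 1.0 for an intercept)
    gamma: P x (K-1) coefficient matrix as nested lists; the K-th topic is
           the reference category with its linear predictor fixed at 0
    """
    n_free = len(gamma[0])
    # Linear predictor for the K-1 free topics.
    eta = [sum(x[p] * gamma[p][k] for p in range(len(x))) for k in range(n_free)]
    eta.append(0.0)  # reference topic
    # Numerically stable softmax.
    shift = max(eta)
    weights = [math.exp(e - shift) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative coefficients (hypothetical): intercept row, then one covariate.
gamma = [[0.5, -0.2], [1.0, 0.0]]
theta_a = mean_topic_proportions([1.0, 0.0], gamma)  # covariate switched off
theta_b = mean_topic_proportions([1.0, 1.0], gamma)  # covariate switched on
print(theta_a)
print(theta_b)
```

With a positive coefficient on the first topic, switching the covariate on raises that topic's share and lowers the others, which is what "shaping the frequency of topic occurrences" means in practice.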

Package Structure

The package consists of the following main parts:

Text Reading for various filetypes (*.csv, *.json)
Text Preparation

Pre-processing
- Stopword-Removal
- Stemming
- Dropping Documents
- Removing Punctuation
- n-gram algorithm
Corpus creation
- list of documents containing word indices and their counts
- vector of words associated with the indices
- metadata matrix with document covariates
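The three corpus components listed above can be sketched in plain Python as follows. This is an illustration of the data layout only, assuming already tokenised documents; the function name `build_corpus` and the example covariates are hypothetical and do not reflect the package's actual interface.

```python
from collections import Counter


def build_corpus(token_lists, metadata_rows):
    """Build the three corpus components from tokenised documents.

    token_lists:   list of documents, each a list of string tokens
    metadata_rows: list of per-document covariate rows, in the same order
    """
    if len(token_lists) != len(metadata_rows):
        raise ValueError("need exactly one metadata row per document")

    # Vector of words: sorted vocabulary, where position equals word index.
    vocab = sorted({tok for doc in token_lists for tok in doc})
    index_of = {word: i for i, word in enumerate(vocab)}

    # Each document becomes a sorted list of (word index, count) pairs.
    documents = []
    for doc in token_lists:
        counts = Counter(index_of[tok] for tok in doc)
        documents.append(sorted(counts.items()))

    # Metadata matrix: one row of covariates per document.
    metadata = [list(row) for row in metadata_rows]
    return documents, vocab, metadata


docs = [["topic", "model", "topic"], ["model", "text"]]
meta = [[2014, 0], [2016, 1]]  # hypothetical covariates: year, source flag
documents, vocab, metadata = build_corpus(docs, meta)
print(vocab)      # ['model', 'text', 'topic']
print(documents)  # [[(0, 1), (2, 2)], [(0, 1), (1, 1)]]
```

Storing only the non-zero (index, count) pairs keeps the representation sparse, which matters for large vocabularies.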

Model Estimation

Spectral Initialisation
Topical Prevalence Model
- interaction terms, standard transforms and non-linear relations, such as splines
Topical Content Model
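Interaction terms and non-linear relations in the prevalence model amount to extra columns in the covariate design matrix. The sketch below builds one such row with a piecewise-linear spline (truncated power basis of degree 1) and an interaction term. It is an assumption-laden illustration: `design_row`, the covariates `x` and `group`, and the knot positions are hypothetical, and the package may construct its design matrix differently.

```python
def design_row(x, group, knots):
    """Design-matrix row for one document.

    x:     continuous covariate (hypothetical, e.g. a time index)
    group: binary covariate coded 0/1 (hypothetical)
    knots: knot positions for a piecewise-linear spline in x
    """
    # Hinge terms let the slope in x change at each knot.
    hinge_terms = [max(0.0, x - k) for k in knots]
    # Columns: intercept, x, hinge terms, group, x-by-group interaction.
    return [1.0, float(x), *hinge_terms, float(group), float(x) * group]


observations = [(1.0, 0), (3.0, 1), (5.0, 1)]
rows = [design_row(x, g, knots=[2.0, 4.0]) for x, g in observations]
for row in rows:
    print(row)
```

Each row would then take the place of a plain covariate vector when the prevalence coefficients are estimated.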

Model Evaluation

Semantic Coherence Measure: a topic scores well when its most probable words frequently co-occur in the same documents
Exclusivity: the degree to which a topic's words are exclusive to that topic
FREX: weighted harmonic mean of a word's frequency and exclusivity within a topic
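The evaluation measures can be sketched as follows. Semantic coherence is computed here in the co-occurrence form of Mimno et al. (2011), and FREX as a weighted harmonic mean of the within-topic empirical CDF ranks of a word's exclusivity and its frequency, which is the definition used in the R stm package. Both are assumptions about what this package computes; the function names and toy data are hypothetical.

```python
import math


def semantic_coherence(top_words, documents):
    """Co-occurrence coherence for one topic.

    top_words: the topic's most probable words, most probable first;
               each is assumed to occur in at least one document
    documents: iterable of token collections, one per document
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / doc_freq(top_words[j]))
    return score


def ecdf(values):
    """Empirical CDF value of each entry: share of entries <= it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def frex(beta, topic, weight=0.5):
    """FREX score per word for one topic.

    beta:   K x V nested list of topic-word probabilities (all columns > 0)
    weight: weight placed on exclusivity (0.5 = equal weighting)
    """
    n_words = len(beta[topic])
    exclusivity = [
        beta[topic][v] / sum(row[v] for row in beta) for v in range(n_words)
    ]
    excl_rank = ecdf(exclusivity)
    freq_rank = ecdf(beta[topic])
    return [
        1.0 / (weight / e + (1.0 - weight) / f)
        for e, f in zip(excl_rank, freq_rank)
    ]


docs = [["topic", "model"], ["topic", "model", "text"], ["text", "corpus"]]
coherent = semantic_coherence(["topic", "model"], docs)
incoherent = semantic_coherence(["topic", "corpus"], docs)

beta = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # toy topic-word probabilities
scores = frex(beta, topic=0)
print(coherent, incoherent, scores)
```

Words that always appear together yield a higher coherence than words that never share a document, and the word that is both frequent in and exclusive to topic 0 receives the highest FREX score.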

Visualisation

Corpus visualisations:
- wordclouds
- word frequencies
- tf-idf features projected with t-SNE
Estimate visualisation:
- Metadata estimates can be visualised w.r.t. their effect on the expected topic proportions as well as on the topical content
- Visualisation of identified topics and their distances in a topic graph
