A research-grade framework for non-intrusive, on-device affective computing using behavioral biometrics.
- Achieved ROC-AUC of up to ~0.85 on pseudo-labelled data using Local Outlier Factor for anomaly-based fatigue detection
- Designed a real-time, on-device cognitive load estimation system using keystroke dynamics
- Demonstrated cross-user generalization through Leave-One-User-Out (LOUO) evaluation
- Implemented a privacy-preserving pipeline that stores neither key identities nor text content
Background. Sustained cognitive load and mental fatigue are established antecedents of decision-making degradation, occupational errors, and reduced learning efficacy. Existing fatigue-monitoring systems predominantly rely on physiological sensors (EEG, fNIRS, ECG) that are intrusive, expensive, and impractical for naturalistic deployment.
Scientific Contribution. This work introduces FlowState, a privacy-preserving behavioural biometrics framework that estimates real-time cognitive load from keystroke dynamics — specifically, the statistical structure of inter-keystroke intervals (IKIs) — without recording key identities or text content. The primary contributions are:
- A seven-dimensional feature space for typing-rhythm characterisation, extending beyond mean and variance to include Coefficient of Variation, Shannon Entropy, Skewness, Excess Kurtosis, and Hjorth Mobility.
- A comparative anomaly-detection study benchmarking Isolation Forest, One-Class SVM, and Local Outlier Factor under identical feature representations, evaluated via ROC-AUC on pseudo-labelled data.
- A Privacy-by-Design architecture in which key identity is discarded at the hardware-event layer before persistence, enabling fully on-device inference in which neither key identities nor text content are ever stored.
Significance. FlowState demonstrates that high-fidelity cognitive-state estimation is achievable through passive, non-intrusive instrumentation of typing behavior. The framework provides a reproducible baseline for Privacy-First Affective Computing research applicable to adaptive learning, human-computer interaction, and occupational health.
```text
flowstate/
├── src/
│   ├── config.py        # Hyperparameter registry: all params live here
│   ├── features.py      # 7-feature extractor: Mean, Std, CV, Entropy, Skewness, Hjorth Mobility, Kurtosis
│   ├── collector.py     # Privacy-first pynput IKI collector
│   ├── train.py         # Comparative study: IF vs OC-SVM vs LOF
│   ├── validate.py      # Pseudo-label pipeline + ROC/AUC evaluation
│   ├── ground_truth.py  # NASA-TLX labels, dual-task protocol, Cohen's κ
│   ├── multi_user.py    # LOUO cross-validation, personalised evaluation
│   └── report.py        # Auto-generates RESULTS.md from CSV outputs
├── data/                # Raw IKI CSVs and labelled feature matrices (gitignored)
├── models/              # Serialised sklearn Pipelines (gitignored)
├── reports/             # ROC curves, PR curves, comparison tables
└── README.md
```
```bash
pip install scikit-learn pynput pandas numpy scipy matplotlib joblib

# 1. Collect a keystroke session (press ESC to stop)
python src/collector.py

# 2. Label + generate first ROC curves (pseudo-labelling, no extra data needed)
python src/validate.py --session_csv data/features_live.csv --model_dir models/

# 3. Train all three models
python src/train.py

# 4. Re-evaluate with trained models
python src/validate.py --session_csv data/features_live.csv --model_dir models/

# 5. Generate final paper-ready report
python src/report.py --model_csv reports/model_comparison.csv --output_dir reports/
```

| Feature | Description | Cognitive Load Sensitivity |
|---|---|---|
| Mean IKI | Average inter-keystroke interval (ms) | Increases under fatigue |
| Std IKI | Standard deviation of IKIs (ms); raw variability | Captures instability |
| Coefficient of Variation | Normalised variability (σ/μ) | Robust to typing speed differences |
| Skewness | Distribution asymmetry | Detects hesitation pauses |
| Shannon Entropy | Rhythm unpredictability | Increases with cognitive load |
| Hjorth Mobility | Temporal frequency estimate (EEG-derived) | Decreases under fatigue |
| Excess Kurtosis | Tail heaviness | Captures attentional lapses |
All features are extracted over a rolling window of 50 keystrokes. Each window produces one feature vector passed to the anomaly detector.
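The per-window extraction can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `src/features.py`: each window is treated as 50 consecutive IKIs (the source does not say whether it counts keystrokes or intervals), Shannon entropy is computed over a 10-bin histogram, and the window stride `step` is unspecified in the source, so it is left as a parameter. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def shannon_entropy(x, bins=10):
    """Shannon entropy (bits) of the IKI histogram; bins=10 is an assumed choice."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def hjorth_mobility(x):
    """Hjorth Mobility: sqrt(var(first difference) / var(signal))."""
    var_x = np.var(x)
    if var_x == 0:
        return 0.0
    return float(np.sqrt(np.var(np.diff(x)) / var_x))


def extract_features(iki_ms):
    """Map one window of IKIs (ms) to the seven-dimensional feature vector."""
    x = np.asarray(iki_ms, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)
    return {
        "mean_iki": float(mean),
        "std_iki": float(std),
        "cv": float(std / mean) if mean > 0 else 0.0,          # sigma / mu
        "skewness": float(skew(x)),
        "entropy": shannon_entropy(x),
        "hjorth_mobility": hjorth_mobility(x),
        "excess_kurtosis": float(kurtosis(x, fisher=True)),  # normal -> 0
    }


def rolling_windows(iki_ms, window=50, step=1):
    """Yield successive windows of `window` IKIs; the stride `step` is an assumption."""
    x = np.asarray(iki_ms, dtype=float)
    for start in range(0, len(x) - window + 1, step):
        yield x[start:start + window]


# Usage on synthetic IKIs (illustrative only, not collected data).
rng = np.random.default_rng(42)
ikis = rng.lognormal(mean=5.0, sigma=0.4, size=500)
features = [extract_features(w) for w in rolling_windows(ikis, window=50, step=10)]
```

With `step=1` this matches pandas-style rolling semantics. A larger stride reduces the strong correlation between neighbouring feature vectors.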
FlowState uses a combination of pseudo-labelling and experimental ground-truth strategies.
Time-block pseudo-labelling (default — no extra data required):
- First 33% of session → Rested (+1)
- Middle 34% → Excluded (ambiguous transition zone)
- Last 33% → Fatigued (−1)
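The time-block rule above can be sketched as a labelling function over the ordered windows of one session. This is an assumed implementation, not the one in `src/validate.py`. Window counts are truncated to integers, so the excluded middle block absorbs any rounding remainder. The label `0` for "excluded" is a hypothetical convention.

```python
import numpy as np


def time_block_labels(n_windows, head_frac=0.33, tail_frac=0.33):
    """Label windows by session position: +1 rested, -1 fatigued, 0 excluded."""
    labels = np.zeros(n_windows, dtype=int)   # default: ambiguous transition zone
    n_head = int(head_frac * n_windows)
    n_tail = int(tail_frac * n_windows)
    labels[:n_head] = 1                       # first ~33% of the session -> rested
    labels[n_windows - n_tail:] = -1          # last ~33% of the session -> fatigued
    return labels


y = time_block_labels(100)
keep = y != 0   # drop the middle block before training and evaluation
```

The `+1`/`−1` coding matches scikit-learn's inlier/outlier convention for `predict`, which keeps the labels directly comparable with detector outputs.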
Optional ground-truth strategies (research-grade):
- NASA-TLX subjective workload ratings collected every 10 minutes; windows labelled by Weighted Workload Score threshold
- Dual-task induced cognitive load protocol: alternating baseline and high-load (serial subtraction) phases with known labels
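For the NASA-TLX strategy, the standard Weighted Workload Score averages six 0–100 subscale ratings, weighted by how often each subscale was chosen across the 15 pairwise comparisons. The sketch below computes that score. The threshold of 50 and the rule that gives each window the label of the next probe at or after its end time are hypothetical, since the source does not specify them.

```python
import numpy as np

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def weighted_workload(ratings, weights):
    """NASA-TLX Weighted Workload: sum(rating_i * weight_i) / 15.

    ratings: subscale -> 0..100 rating
    weights: subscale -> tally from the 15 pairwise comparisons (tallies sum to 15)
    """
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15.0


def tlx_label(wwl, threshold=50.0):
    """Hypothetical cut-off: WWL at or above threshold -> high load (-1), else +1."""
    return -1 if wwl >= threshold else 1


def assign_probe_labels(window_end_s, probe_time_s, probe_labels):
    """Give each window the label of the next TLX probe at or after its end time.

    Windows after the last probe get 0 (excluded). This mapping rule is an assumption.
    """
    idx = np.searchsorted(probe_time_s, window_end_s, side="left")
    labels = np.zeros(len(window_end_s), dtype=int)
    valid = idx < len(probe_time_s)
    labels[valid] = np.asarray(probe_labels)[idx[valid]]
    return labels
```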
Validation metrics:
- ROC-AUC and Precision-Recall curves per model
- Cohen's κ for agreement between labelling strategies (κ > 0.60 indicates substantial or better agreement)
- Leave-One-User-Out (LOUO) cross-validation for cross-user generalisation
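The first two metrics can be sketched with scikit-learn's standard scoring functions. The sketch assumes "fatigued" (−1) is the positive class. It also assumes the detector's `score_samples` output, where higher means more normal, is negated to obtain an anomaly score. Average precision is used here as the summary of the Precision-Recall curve. The function names are hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score, cohen_kappa_score, roc_auc_score


def detection_metrics(normality_scores, y_pm1):
    """ROC-AUC and average precision with 'fatigued' (-1) as the positive class.

    normality_scores: e.g. pipeline.score_samples(X); higher means more normal,
    so the sign is flipped to obtain an anomaly score.
    """
    y_fatigued = (np.asarray(y_pm1) == -1).astype(int)
    anomaly = -np.asarray(normality_scores, dtype=float)
    return {
        "roc_auc": roc_auc_score(y_fatigued, anomaly),
        "average_precision": average_precision_score(y_fatigued, anomaly),
    }


def label_agreement(labels_a, labels_b):
    """Cohen's kappa between two labelling strategies (e.g. time-block vs NASA-TLX)."""
    return cohen_kappa_score(labels_a, labels_b)
```

Excluded (`0`) windows should be dropped before calling either function, so that both label vectors only contain `+1` and `−1`.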
| Model | Precision | Recall | F1 | AUC-ROC |
|---|---|---|---|---|
| IsolationForest | 0.7143 | 0.7500 | 0.7317 | 0.8241 |
| OneClassSVM | 0.6522 | 0.7143 | 0.6818 | 0.7934 |
| LocalOutlierFactor | 0.7500 | 0.6818 | 0.7143 | 0.8512 |
LOUO Cross-Validation (N=10, LocalOutlierFactor):
| User | AUC-ROC | F1 |
|---|---|---|
| P01–P10 | 0.73–0.88 | 0.60–0.77 |
| Mean ± Std | 0.8031 ± 0.052 | 0.6891 ± 0.060 |
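The LOUO protocol can be sketched with scikit-learn's `LeaveOneGroupOut`, using participant IDs as groups. The following is an assumed reconstruction, not `src/multi_user.py`. In each fold the scaler and LOF are fit only on the other users' rested windows, then the held-out user's windows are scored. A fold is skipped when the held-out user has only one label class, because ROC-AUC is undefined there. The synthetic data at the end is illustrative only and does not reproduce the table above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def louo_auc(X, y, users):
    """Per-user ROC-AUC under Leave-One-User-Out.

    X: (n_windows, 7) features; y: +1 rested / -1 fatigued (excluded windows removed);
    users: participant id per window.
    """
    results = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=users):
        rested = train_idx[y[train_idx] == 1]        # one-class: fit on rested only
        model = make_pipeline(
            StandardScaler(),
            LocalOutlierFactor(n_neighbors=20, contamination=0.05, novelty=True),
        )
        model.fit(X[rested])
        y_test = y[test_idx]
        if np.unique(y_test).size < 2:               # AUC undefined with one class
            continue
        anomaly = -model.score_samples(X[test_idx])  # higher = more anomalous
        results[users[test_idx[0]]] = roc_auc_score((y_test == -1).astype(int), anomaly)
    return results


# Synthetic example: three users, fatigued windows shifted away from rested ones.
rng = np.random.default_rng(0)
users = np.repeat(["P01", "P02", "P03"], 60)
y = np.tile(np.r_[np.ones(30, dtype=int), -np.ones(30, dtype=int)], 3)
X = rng.normal(size=(180, 7)) + 3.0 * (y == -1)[:, None]
aucs = louo_auc(X, y, users)
print(f"{np.mean(list(aucs.values())):.3f} ± {np.std(list(aucs.values())):.3f}")
```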
Full results are generated automatically. Run `python src/report.py` to regenerate them after new experiments.
All models are wrapped in a `sklearn.pipeline.Pipeline` with a prepended `StandardScaler`. Training uses only presumed-normal (rested-state) windows, which is the standard protocol for one-class (novelty) anomaly detection.
| Model | Key Hyperparameters | Complexity |
|---|---|---|
| IsolationForest | n_estimators=200, contamination=0.05 | O(n log n) |
| OneClassSVM | kernel=rbf, nu=0.05, gamma=scale | O(n² · d) |
| LocalOutlierFactor | n_neighbors=20, contamination=0.05, novelty=True | O(n² · d) |
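The three configurations in the table can be sketched as scikit-learn pipelines fit only on rested windows. This sketch assumes the step names `"scaler"` and `"detector"` and the helper names, none of which come from the source. `novelty=True` is what lets `LocalOutlierFactor` expose `predict` and `score_samples` on unseen windows. The synthetic data is illustrative only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

RANDOM_STATE = 42      # mirrors random_state in src/config.py
CONTAMINATION = 0.05


def build_models():
    """StandardScaler + detector pipelines with the hyperparameters listed above."""
    def wrap(detector):
        return Pipeline([("scaler", StandardScaler()), ("detector", detector)])

    return {
        "IsolationForest": wrap(IsolationForest(
            n_estimators=200, contamination=CONTAMINATION, random_state=RANDOM_STATE)),
        "OneClassSVM": wrap(OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")),
        "LocalOutlierFactor": wrap(LocalOutlierFactor(
            n_neighbors=20, contamination=CONTAMINATION, novelty=True)),
    }


def fit_on_rested(models, X, y):
    """One-class protocol: fit every pipeline on rested (+1) windows only."""
    X_rested = X[y == 1]
    for pipeline in models.values():
        pipeline.fit(X_rested)
    return models


# Usage on synthetic 7-D feature vectors.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 7))
y_demo = np.r_[np.ones(60, dtype=int), -np.ones(60, dtype=int)]
models = fit_on_rested(build_models(), X_demo, y_demo)
```

Each fitted pipeline could then be persisted with `joblib.dump(pipeline, path)`. The file naming inside `models/` is not specified in the source.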
Key identity is discarded at the hardware-event layer inside collector.py, before any data is written to disk. The system records only:
`timestamp_s` (float) | `iki_ms` (float)
No key characters, text content, or n-gram sequences are ever stored. This constitutes a Privacy-by-Design implementation consistent with GDPR Article 25 (data protection by design and by default), and it is the architectural property that distinguishes FlowState from conventional keystroke-dynamics systems.
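The discard-before-persist pattern can be sketched by splitting timing bookkeeping from the keyboard hook. The sketch below is an assumption-laden illustration, not `src/collector.py`. The `IKIRecorder` class, `run_collector`, and the output path are hypothetical. `timestamp_s` here comes from `time.perf_counter()`, so it is relative to an arbitrary origin. The pynput callback inspects the key only to detect Esc and never passes it to the recorder.

```python
import csv
import time


class IKIRecorder:
    """Keeps only (timestamp_s, iki_ms) pairs; no key identity ever reaches this class."""

    def __init__(self):
        self._last_t = None
        self.rows = []

    def on_keystroke(self, t=None):
        # perf_counter is a monotonic, high-resolution clock, so timestamp_s is
        # relative to an arbitrary origin (an assumption about the real schema).
        t = time.perf_counter() if t is None else t
        if self._last_t is not None:
            self.rows.append((t, (t - self._last_t) * 1000.0))
        self._last_t = t

    def to_csv(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp_s", "iki_ms"])
            writer.writerows(self.rows)


def run_collector(path="data/iki_session.csv"):
    """Hypothetical entry point: record until Esc, then write the IKI CSV."""
    from pynput import keyboard  # lazy import: the recorder above needs no display

    recorder = IKIRecorder()

    def on_press(key):
        if key == keyboard.Key.esc:
            return False          # returning False stops the listener; Esc is not recorded
        recorder.on_keystroke()   # `key` is not passed on and is never stored

    with keyboard.Listener(on_press=on_press) as listener:
        listener.join()
    recorder.to_csv(path)
```

Keeping the recorder free of any key argument makes the privacy property structural: the persistence layer has no way to receive key identity.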
- Demonstrates feasibility of passive cognitive load estimation using keystroke dynamics alone
- Introduces a statistically rich, seven-dimensional feature representation for behavioral biometrics
- Provides a privacy-preserving alternative to sensor-based affective computing systems
- Establishes a reproducible experimental pipeline with convergent validation for unsupervised fatigue detection
- Applies Hjorth parameter analysis (Mobility), originally developed for EEG, to keystroke timing signals
- Epp, C., Lippold, M., & Mandryk, R. L. (2011). Identifying emotional states using keystroke dynamics. Proc. ACM CHI.
- Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology, 52, 139–183.
- Hjorth, B. (1970). EEG analysis based on time domain properties. Electroencephalography and Clinical Neurophysiology, 29(3), 306–310.
- Ackerman, P. L., & Kanfer, R. (2009). Test length and cognitive fatigue. Journal of Experimental Psychology: Applied, 15(2), 163–181.
- Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation forest. Proc. ICDM, 413–422.
- Revett, K. (2009). A survey of user authentication using keystroke dynamics. ICCSA.
All hyperparameters are centralised in src/config.py. Every experiment result is regenerable from the commands in the Quickstart section. Trained models are saved as joblib pipelines in models/; results tables are saved as CSVs in reports/.
```python
random_state = 42
rolling_window_size = 50
contamination = 0.05
test_size = 0.25
```

