This repository contains the code used to run controlled split-head teacher/student experiments on MNIST and EMNIST. The implementation is intentionally lightweight: each call to src/run_subliminal.py performs one experiment, writes tidy CSV/JSONL outputs, and exposes the experimental controls through command-line flags. Multi-seed manuscript sweeps are provided as shell scripts under scripts/.
subliminal_learning/
├── README.md # This file
├── LICENSE # BSD 3-Clause license
├── subliminal-cpu.yml # Conda environment for CPU execution
├── src/
│ ├── run_subliminal.py # Command-line entry point; one run per invocation
│ └── subliminal_core.py # Models, data, training, perturbations, metrics, logging
└── scripts/
  ├── figure2/ # Multi-seed m-sweeps over layer-initialisation controls
  ├── figure3/ # Architecture and EMNIST class-count sweeps
  ├── figure4/ # Shared m/N-sweeps
  ├── figure5/ # Student-head perturbation sweeps
  └── figure6/ # Hidden-dimension and head-freezing sweeps
The scripts/*/output and scripts/*/outputs directories, if present, are generated run outputs or example output stubs. They can be deleted and regenerated by rerunning the corresponding scripts.
Create and activate the supplied CPU environment:
conda env create -f subliminal-cpu.yml
conda activate subliminal-cpu

The environment specification is:
name: subliminal-cpu
channels:
- conda-forge
dependencies:
- python=3.10
- pip
- numpy=2.2
- pytorch-cpu=2.8
- torchvision=0.23

Equivalent mamba commands may also be used. GPU execution is supported by the code through --device cuda, but the included environment and scripts are CPU-oriented.
Run a single default MNIST MLP→MLP experiment from the repository root:
python src/run_subliminal.py --device cpu --outdir ./outputs/baseline

This uses the default configuration: seed 42, two hidden layers of width 256, auxiliary dimension m=10, 5 teacher epochs, 5 student epochs, 60 synthetic-noise batches per student epoch, and uniform noise. The first run downloads MNIST into ./MNIST_DATA unless --data-dir is changed.
To inspect the full command-line interface:
python src/run_subliminal.py --help

The implementation follows a controlled split-head setup:
- A teacher and a student are instantiated with separate feature extractors and two output heads.
  - class_head outputs classification logits and is used for supervised teacher training.
  - aux_head outputs auxiliary logits and is used for student distillation on task-unrelated synthetic noise.
- The teacher is trained on labeled MNIST or EMNIST with cross-entropy loss.
- The student is trained to match the teacher auxiliary logits on synthetic noise using mean-squared error.
- The student class head receives no direct supervised gradient during the distillation phase; even when trainable, it is affected only indirectly, through the shared features.
- Initial/final teacher and student weights are snapshotted and compared layer-wise.
src/subliminal_core.py contains the reusable components:
| Component | Purpose |
|---|---|
| `ExperimentConfig` | Central dataclass for all experiment settings. |
| `SplitHeadMLP` | MLP feature extractor with class_head and aux_head. |
| `SplitHeadCNN` | Configurable CNN feature extractor with split heads. |
| Data utilities | MNIST/EMNIST loading, optional class truncation, deterministic dataloader seeds. |
| Noise utilities | Uniform, Gaussian, and Perlin-noise sampling. |
| Training utilities | Teacher cross-entropy training and student auxiliary-MSE distillation. |
| Perturbation utilities | Layer-wise additive Gaussian perturbations at named experiment timings. |
| Logging utilities | Per-run, per-layer, perturbation, and config output files. |
Each invocation appends to a seed-specific output directory:
<outdir>/seed_000042/runs.csv
<outdir>/seed_000042/layer_metrics.csv
<outdir>/seed_000042/perturbations.csv
<outdir>/seed_000042/configs.jsonl
| File | Contents |
|---|---|
| `runs.csv` | One row per experiment, including configuration, seeds, parameter counts, teacher/student accuracies and losses, auxiliary MSE, and selected compact layer-metric aliases. |
| `layer_metrics.csv` | Per-layer tensor comparisons between teacher/student and initial/final snapshots. Metrics include cosine similarity, norm differences, relative differences, shapes, and status fields for missing or mismatched layers. |
| `perturbations.csv` | Perturbation diagnostics. Empty unless perturbations are requested. |
| `configs.jsonl` | Full effective configuration, resolved layer initialisation/trainability configs, architecture descriptions, and derived seed bookkeeping. |
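As a rough illustration of the kind of per-layer comparison stored in layer_metrics.csv, the sketch below computes a cosine similarity and a relative difference between two flattened weight vectors. The function names are hypothetical, and the relative difference is assumed here to be ||a - b|| / ||b||; the exact definitions in src/subliminal_core.py may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # undefined for an all-zero tensor
    return dot / (norm_a * norm_b)

def relative_difference(a, b):
    """Assumed definition: ||a - b|| / ||b||, with b as the reference."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm_b = math.sqrt(sum(y * y for y in b))
    return diff / norm_b if norm_b > 0 else float("nan")

# Toy vectors standing in for one layer of the teacher and the student.
teacher_w = [1.0, 2.0, 2.0]
student_w = [2.0, 4.0, 4.0]
print(cosine_similarity(student_w, teacher_w))
print(relative_difference(student_w, teacher_w))
```

The toy student vector is a scaled copy of the teacher vector, so the two metrics disagree on purpose: direction is identical while the norm is not.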
With --append-global, aggregate files are also appended under <outdir>:
<outdir>/runs_all_seeds.csv
<outdir>/layer_metrics_all_seeds.csv
<outdir>/perturbations_all_seeds.csv
Use global appends for sequential sweeps. For many concurrent jobs, prefer per-seed outputs unless the filesystem safely handles concurrent appends.
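For concurrent jobs, one alternative to global appends is to collect the per-seed files after the fact. The sketch below is a minimal example of that idea, not part of the repository: the helper names are hypothetical, the six-digit zero padding is inferred from the seed_000042 example, and each CSV is assumed to start with a header row.

```python
import csv
from pathlib import Path

def seed_dir(outdir, seed):
    # Six-digit zero padding is inferred from the "seed_000042" example.
    return Path(outdir) / f"seed_{seed:06d}"

def merge_seed_csvs(outdir, seeds, filename="runs.csv"):
    """Read <outdir>/seed_XXXXXX/<filename> for each seed and return all rows."""
    rows = []
    for seed in seeds:
        path = seed_dir(outdir, seed) / filename
        if not path.exists():
            continue  # tolerate seeds that have not been run yet
        with path.open(newline="") as fh:
            rows.extend(csv.DictReader(fh))  # assumes a header row
    return rows

rows = merge_seed_csvs("./outputs/baseline", range(20))
print(len(rows))
```

Because each job only ever writes inside its own seed directory, this merge step does not depend on how the filesystem handles concurrent appends.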
The shell scripts in scripts/ are multi-seed sweeps. They use relative paths such as ../../src/run_subliminal.py, so run each script from its own directory:
cd scripts/figure2
bash same_init.sh

Do not run them from the repository root as bash scripts/figure2/same_init.sh unless you first modify the relative paths.
Most scripts use 20 seeds (seq 0 19). They are intended for figure-level experiments and can take substantially longer than the quick-start run.
All Figure 2 scripts run MNIST MLP→MLP sweeps over
m = 3, 10, 25, 50, 100, 250
seeds = 0, ..., 19
teacher hidden dims = 256,256
student hidden dims = 256,256
noise = uniform
For a two-hidden-layer MLP, positional layer configs follow
fc1, fc2, class_head, aux_head
| Script | Student initialisation condition |
|---|---|
| `same_init.sh` | A,A,A,A: all student layers share source A with the teacher. |
| `rand_first_hid.sh` | random,A,A,A: random student fc1. |
| `rand_sec_hid.sh` | A,random,A,A: random student fc2. |
| `rand_both_hid.sh` | random,random,A,A: random student hidden layers. |
| `rand_class_head.sh` | A,A,random,A: random student class_head. |
| `rand_aux_head.sh` | A,A,A,random: random student aux_head. |
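The Figure 2 grid can be enumerated in a few lines, which makes the size of each sweep explicit. This is a sketch built only from the flags and values documented in this README; the function name and the output directory are hypothetical, and the actual shell scripts may pass additional flags.

```python
from itertools import product

M_VALUES = [3, 10, 25, 50, 100, 250]
SEEDS = range(20)

def figure2_commands(student_init, outdir="./outputs"):
    """Build one argument list per (m, seed) pair for a given student init."""
    commands = []
    for m, seed in product(M_VALUES, SEEDS):
        commands.append([
            "python", "../../src/run_subliminal.py",
            "--device", "cpu",
            "--m", str(m),
            "--seed", str(seed),
            "--teacher-init", "A,A,A,A",
            "--student-init", student_init,
            "--outdir", outdir,  # hypothetical output location
        ])
    return commands

commands = figure2_commands("random,A,A,A")
print(len(commands))  # 6 values of m x 20 seeds = 120 runs
print(" ".join(commands[0]))
```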
| Script | Sweep |
|---|---|
| `emnist_sweep.sh` | EMNIST balanced class-count sweep with K=2,...,47, m=50, random hidden layers, shared heads. |
| `student_first_layer_d_sweep_m10.sh` | MNIST sweep over the first student hidden width with m=10. |
| `student_first_layer_d_sweep_m50.sh` | Same first-layer-width sweep with m=50. |
| `student_minus_one_layer_m10.sh` | Teacher has two hidden layers; student has one hidden layer; m=10. |
| `student_minus_one_layer_m50.sh` | Same one-layer-smaller student setting with m=50. |
| `student_plus_one_layer_m10.sh` | Teacher has two hidden layers; student has three hidden layers; m=10. |
| `student_plus_one_layer_m50.sh` | Same one-layer-larger student setting with m=50. |
| `mlp_teacher_cnn_student_m10.sh` | Cross-architecture sweep with an MLP teacher and CNN student; m=10. |
| `mlp_teacher_cnn_student_m50.sh` | Same MLP-teacher/CNN-student setting with m=50. |
The first-layer-width scripts use:
D1 = 8, 11, 16, 23, 32, 45, 64, 91, 128, 181, 256, 362,
512, 724, 1024, 1448, 2048, 2896, 4096
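The listed widths look like half-octave steps starting at 8: each value is roughly √2 times the previous one, rounded to the nearest integer. That reading is inferred from the numbers and is not necessarily how the scripts define the list; the sketch below only checks that the formula reproduces it.

```python
# Widths as listed for the first-layer-width scripts.
LISTED_D1 = [8, 11, 16, 23, 32, 45, 64, 91, 128, 181, 256, 362,
             512, 724, 1024, 1448, 2048, 2896, 4096]

# Assumed generating rule: 8 * 2^(k/2), rounded, for k = 0..18.
generated = [round(8 * 2 ** (k / 2)) for k in range(19)]

print(generated == LISTED_D1)
```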
| Script | Sweep |
|---|---|
| `noise_m_sweep.sh` | MNIST sweep over the auxiliary head size m and the number of noise steps (--noise-steps). |
| `m1_noise_sweep.sh` | Same sweep restricted to a single auxiliary neuron (m=1). |
| `noise1_m_sweep.sh` | Same sweep with a fixed budget of noise steps (N=10^3). |
| Script | Perturbed layer |
|---|---|
| `perturb_student_aux_head.sh` | Student aux_head. |
| `perturb_student_class_head.sh` | Student class_head. |
Both scripts use MNIST MLP→MLP with all layers initialised from source A, all layers trainable, m=10, and perturb the selected student head immediately before student training:
perturbation std = 40 linearly spaced values from 0.0 to 0.2
seeds = 0, ..., 19
timing = before_student_training
include_weight = true
include_bias = true
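The perturbation grid above can be written out as follows. This sketch assumes that both endpoints (0.0 and 0.2) are included, as numpy.linspace would do by default; the helper name is hypothetical and the number formatting is a choice made here, not taken from the scripts.

```python
NUM_STDS = 40
SEEDS = range(20)

# 40 evenly spaced values; endpoints assumed inclusive.
stds = [0.2 * i / (NUM_STDS - 1) for i in range(NUM_STDS)]

def perturb_flag(layer, std):
    """Build a --perturb shorthand string in the documented format."""
    return (f"student:{layer},std={std:.6f},timing=before_student_training,"
            "include_weight=true,include_bias=true")

print(perturb_flag("aux_head", stds[1]))
print(len(stds) * len(SEEDS))  # 40 std values x 20 seeds = 800 runs per script
```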
Figure 6 scripts: hidden-dimension and head-freezing sweeps
All Figure 6 scripts sweep the second hidden-layer width D:
D = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 16, 23, 32, 45, 64,
91, 128, 181, 256, 362, 512, 724, 1024, 1448, 2048, 2896,
4096, 5793, 8192, 11585
seeds = 0, ..., 19
teacher hidden dims = 256,D
student hidden dims = 256,D
noise = uniform
| Script | Dataset / condition |
|---|---|
| `d_sweep_baseline.sh` | MNIST baseline with m=10, all layers trainable. |
| `d_sweep_baseline_emnist.sh` | EMNIST baseline with m=50, all layers trainable. |
| `d_sweep_fixed_class_head.sh` | MNIST, class heads frozen for teacher and student. |
| `d_sweep_fixed_aux_head.sh` | MNIST, auxiliary heads frozen for teacher and student. |
| `d_sweep_fixed_aux_class_head.sh` | MNIST, both class and auxiliary heads frozen for teacher and student. |
| Flag | Default | Meaning |
|---|---|---|
| `--outdir` | `./outputs` | Root directory for generated outputs. |
| `--data-dir` | `./MNIST` | Dataset storage directory. |
| `--seed` | `42` | Base seed. Deterministic derived seeds are used internally for model initialisation, training, dataloader shuffling, perturbations, and noise evaluation. |
| `--num-workers` | `0` | Number of PyTorch dataloader workers. |
| `--device` | `auto` | Device string: auto, cpu, cuda, cuda:0, etc. |
| `--sweep-name` | `default` | Free-form sweep label written to outputs. |
| `--run-label` | empty | Additional free-form label written to outputs. |
| `--append-global` | off | Also append to aggregate CSV files under --outdir. |
| Flag | Default | Meaning |
|---|---|---|
| `--dataset` | `mnist` | Dataset: mnist or emnist. |
| `--emnist-split` | `balanced` | EMNIST split: balanced or letters. Ignored for MNIST. |
| `--class-count` | `None` | If set, keep only labels 0,...,K-1 after any EMNIST target transform and remap them to zero-based labels. |
| `--class-selection` | `first` | Class-truncation rule. Currently only first is implemented. |
MNIST has 10 classes. EMNIST balanced has 47 classes. EMNIST letters has 26 classes after shifting labels to zero-based indexing.
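The class-truncation rule described for --class-count can be sketched on plain (image, label) pairs. The function below is hypothetical; it assumes raw EMNIST letters labels start at 1, which the zero-based shift mentioned above implies, and it stands in for whatever the data utilities actually do.

```python
def truncate_first_classes(samples, class_count, label_shift=0):
    """Keep samples whose shifted label lies in 0..class_count-1.

    With the "first" selection rule the kept labels are already
    zero-based, so no further remapping is needed in this sketch.
    """
    kept = []
    for image, label in samples:
        label = label + label_shift  # e.g. -1 for EMNIST letters (assumed)
        if 0 <= label < class_count:
            kept.append((image, label))
    return kept

# Toy data: strings stand in for images.
letters = [("img_a", 1), ("img_c", 3), ("img_z", 26)]
print(truncate_first_classes(letters, class_count=3, label_shift=-1))
```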
| Flag | Default | Meaning |
|---|---|---|
| `--teacher-type` | `mlp` | Teacher architecture: mlp or cnn. |
| `--student-type` | `mlp` | Student architecture: mlp or cnn. |
| `--teacher-hidden-dims` | `256,256` | Comma-separated teacher MLP hidden widths. For the default CNN, the last value sets the final linear feature dimension. |
| `--student-hidden-dims` | `256,256` | Same for the student. |
| `--teacher-arch-spec` | `None` | JSON string or path to a JSON list describing teacher CNN feature layers. Ignored for MLP. |
| `--student-arch-spec` | `None` | JSON string or path to a JSON list describing student CNN feature layers. Ignored for MLP. |
| `--m` | `10` | Auxiliary head output dimension. |
MLP layer names are stable and ordered as:
fc1, fc2, ..., class_head, aux_head
CNN layer names are taken from the architecture spec, followed by:
class_head, aux_head
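The naming rules above determine how positional layer configs line up with layers. A minimal sketch of both rules follows; the function names are hypothetical, and only the ordering is taken from this README.

```python
def mlp_layer_names(hidden_dims):
    """Layer names for an MLP given a comma-separated width string."""
    widths = [int(w) for w in hidden_dims.split(",") if w.strip()]
    names = [f"fc{i}" for i in range(1, len(widths) + 1)]
    return names + ["class_head", "aux_head"]

def cnn_layer_names(arch_spec):
    """Layer names for a CNN: spec names first, then the two heads."""
    return [layer["name"] for layer in arch_spec] + ["class_head", "aux_head"]

print(mlp_layer_names("256,256"))
print(mlp_layer_names("256,256,256"))
print(cnn_layer_names([{"name": "conv1"}, {"name": "fc1"}]))
```

A three-hidden-layer student therefore needs five positional values, while a two-hidden-layer teacher needs four.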
If no CNN architecture spec is supplied, the default CNN feature extractor is:
conv1: 32 channels, 3x3, padding 1, ReLU, max-pool 2
conv2: 64 channels, 3x3, padding 1, ReLU, max-pool 2
fc1: final linear feature layer
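To see what feeds the final linear layer of the default CNN, the spatial sizes can be traced by hand. The sketch below assumes 28×28 single-channel inputs, convolution stride 1, and pooling stride equal to the pooling kernel size (the PyTorch defaults); none of these are stated explicitly for the default CNN, so treat the result as an estimate.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output side length of pooling with stride equal to the kernel."""
    return (size - kernel) // kernel + 1

side = 28  # assumed input resolution
for _ in ("conv1", "conv2"):
    side = conv_out(side, kernel=3, padding=1)  # 3x3, padding 1 keeps the size
    side = pool_out(side, kernel=2)             # max-pool 2 halves it

flat_features = 64 * side * side  # conv2 has 64 output channels
print(side, flat_features)
```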
A CNN spec can be passed either as a JSON string or as a path to a JSON file. Example:
[
{
"name": "conv1",
"type": "conv2d",
"out_channels": 32,
"kernel_size": 3,
"padding": 1,
"activation": "relu",
"pool": {"type": "max", "kernel_size": 2}
},
{
"name": "fc1",
"type": "linear",
"out_features": 256,
"activation": "relu"
}
]

Supported CNN feature-layer types are conv2d and linear. Supported activations are relu, gelu, tanh, sigmoid, and identity-style values such as none. Supported pools include max pooling, average pooling, and adaptive average pooling.
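Before launching a sweep it can help to check a spec offline. The sketch below accepts either a JSON string or a path, as the flags do, and rejects unsupported layer types. The function name and the "starts with [" heuristic for telling a string from a path are assumptions made here, not the repository's actual parsing logic.

```python
import json
from pathlib import Path

SUPPORTED_TYPES = {"conv2d", "linear"}

def load_arch_spec(value):
    """Parse a CNN spec given as a JSON string or a path to a JSON file."""
    text = value.strip()
    if not text.startswith("["):  # heuristic: otherwise treat it as a path
        text = Path(value).read_text()
    layers = json.loads(text)
    if not isinstance(layers, list):
        raise ValueError("the spec must be a JSON list of layers")
    for layer in layers:
        if layer.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported layer type: {layer.get('type')!r}")
    return layers

spec = load_arch_spec('[{"name": "fc1", "type": "linear", "out_features": 256}]')
print([layer["name"] for layer in spec])
```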
| Flag | Default | Meaning |
|---|---|---|
| `--teacher-init` | `all A` | Per-layer teacher initialisation source. |
| `--student-init` | `all A` | Per-layer student initialisation source. |
| `--teacher-trainable` | `all true` | Per-layer teacher trainability. |
| `--student-trainable` | `all true` | Per-layer student trainability. |
Initialisation sources:
A deterministic source A
B deterministic source B
random keep the model's independently seeded random initialisation
Layer configs may be positional:
--teacher-init A,A,A,A
--student-trainable true,true,true,true

or named:
--teacher-init fc1:A,fc2:A,class_head:A,aux_head:A
--student-trainable all:true

The named form accepts all:<value> as a default override. Named configs are recommended whenever teacher and student architectures have different numbers of layers.
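One way to picture how the two forms resolve to a per-layer mapping is the sketch below. It is a hypothetical parser, not the one in src/subliminal_core.py; in particular it assumes that an explicit layer entry takes precedence over all:<value>, and that positional and named entries are not mixed.

```python
def parse_layer_config(text, layer_names, default="A"):
    """Resolve a positional or named layer config to {layer: value}."""
    entries = [e.strip() for e in text.split(",") if e.strip()]
    is_named = [":" in e for e in entries]
    if not any(is_named):
        # Positional form: one value per layer, in layer order.
        if len(entries) != len(layer_names):
            raise ValueError("positional config needs one value per layer")
        return dict(zip(layer_names, entries))
    if not all(is_named):
        raise ValueError("do not mix positional and named entries")
    values = dict(e.split(":", 1) for e in entries)
    fallback = values.pop("all", default)  # all:<value> sets the default
    unknown = set(values) - set(layer_names)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    return {name: values.get(name, fallback) for name in layer_names}

names = ["fc1", "fc2", "class_head", "aux_head"]
print(parse_layer_config("random,random,A,A", names))
print(parse_layer_config("all:true,class_head:false", names))
```

The named form keeps working when the student has a different number of hidden layers, which is why it is the safer choice for mismatched architectures.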
| Flag | Default | Meaning |
|---|---|---|
| `--teacher-epochs` | `5` | Number of supervised teacher epochs. |
| `--student-epochs` | `5` | Number of student distillation epochs. |
| `--data-bsize` | `1024` | Batch size for supervised data loaders. |
| `--noise-bsize` | `1000` | Batch size for synthetic-noise samples during student training. |
| `--noise-steps` | `60` | Number of noise batches per student epoch. Samples per student epoch are noise_bsize × noise_steps. |
| `--teacher-lr` | `1e-3` | Teacher Adam learning rate. |
| `--student-lr` | `1e-3` | Student Adam learning rate. |
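With the defaults in the table, the amount of synthetic noise seen by the student works out as below. The helper name is hypothetical; the arithmetic follows directly from noise_bsize × noise_steps.

```python
def noise_samples(noise_bsize=1000, noise_steps=60, student_epochs=5):
    """Return (samples per student epoch, samples over all student epochs)."""
    per_epoch = noise_bsize * noise_steps
    return per_epoch, per_epoch * student_epochs

per_epoch, total = noise_samples()
print(per_epoch)  # 1000 x 60 = 60000
print(total)      # 60000 x 5 = 300000
```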
| Flag | Default | Meaning |
|---|---|---|
| `--noise-dist` | `uniform` | Synthetic noise distribution: uniform, normal, or perlin. |
| `--perlin-res` | `8` | Perlin grid resolution when --noise-dist perlin is used. |
| `--normalize-noise` | off | Apply MNIST normalisation constants to synthetic noise. |
| `--eval-noise-batches` | `10` | Number of noise batches for auxiliary-MSE evaluation. |
| `--eval-noise-bsize` | `1000` | Batch size for auxiliary-MSE evaluation. |
Uniform noise is sampled in [-1, 1]. Normal noise is sampled from a standard Gaussian. Perlin noise is generated on 28×28 images and scaled by its per-sample maximum absolute value. Dataset images are normalised with MNIST constants, mean 0.1307 and standard deviation 0.3081.
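The sampling and scaling rules above can be illustrated on a flat list of pixel values. This is a standard-library sketch with hypothetical function names; the repository operates on PyTorch tensors, and Perlin generation itself is not reproduced here, only the max-abs scaling applied to it.

```python
import random

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def uniform_noise(n_pixels, rng):
    """Uniform noise in [-1, 1] for one flattened image."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]

def scale_by_max_abs(values):
    """Per-sample scaling so the largest magnitude becomes 1."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values] if peak > 0 else list(values)

def normalize(values):
    """Apply the MNIST normalisation constants."""
    return [(v - MNIST_MEAN) / MNIST_STD for v in values]

rng = random.Random(0)
image = uniform_noise(28 * 28, rng)
scaled = scale_by_max_abs(image)
print(min(image) >= -1.0 and max(image) <= 1.0)
print(max(abs(v) for v in scaled))
```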
Perturbations can be specified with repeatable shorthand flags:
--perturb "student:aux_head,std=0.1,timing=before_student_training,include_weight=true,include_bias=true"

or with a JSON object/list:
--perturb-spec '[{"target":"student","layers":["aux_head"],"std":0.1,"timing":"before_student_training"}]'

Available perturbation fields:
| Field | Meaning |
|---|---|
| `target` | teacher or student. |
| `layers` | Layer name, list of layer names, all, or shorthand such as fc1+aux_head. |
| `std` / `sigma` | Standard deviation of the additive Gaussian perturbation. |
| `timing` | When to apply the perturbation. |
| `include_weight` | Whether to perturb layer weights. |
| `include_bias` | Whether to perturb layer biases. |
| `distribution` | Currently normal. |
Allowed timings:
before_teacher_training
after_teacher_training
before_student_training
after_student_training
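The relationship between the shorthand and JSON forms can be sketched as a small converter. This is a hypothetical parser written from the field list above, not the repository's own; it assumes the shorthand always begins with target:layers and that layer shorthands are joined with +.

```python
ALLOWED_TIMINGS = {
    "before_teacher_training", "after_teacher_training",
    "before_student_training", "after_student_training",
}

def parse_perturb(text):
    """Convert a --perturb shorthand string into a spec dictionary."""
    head, *pairs = [part.strip() for part in text.split(",")]
    target, layers = head.split(":", 1)
    spec = {"target": target, "layers": layers.split("+")}
    for pair in pairs:
        key, value = pair.split("=", 1)
        if key in ("std", "sigma"):
            spec["std"] = float(value)
        elif key in ("include_weight", "include_bias"):
            spec[key] = value.lower() == "true"
        else:
            spec[key] = value
    if spec.get("timing") not in ALLOWED_TIMINGS:
        raise ValueError(f"unknown timing: {spec.get('timing')!r}")
    return spec

print(parse_perturb(
    "student:aux_head,std=0.1,timing=before_student_training,"
    "include_weight=true,include_bias=true"
))
```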
Change only the seed:
python src/run_subliminal.py --device cpu --seed 123 --outdir ./outputs/seed_123_test

Use a larger auxiliary dimension:
python src/run_subliminal.py --device cpu --m 100 --outdir ./outputs/m100

Use EMNIST balanced with the first 20 classes:
python src/run_subliminal.py \
--device cpu \
--dataset emnist \
--emnist-split balanced \
--class-count 20 \
--data-dir ./EMNIST_DATA \
--outdir ./outputs/emnist_k20

Randomise the student hidden layers but keep the heads shared:
python src/run_subliminal.py \
--device cpu \
--teacher-init A,A,A,A \
--student-init random,random,A,A \
--outdir ./outputs/random_student_features

Freeze both student heads during distillation:
python src/run_subliminal.py \
--device cpu \
--student-trainable fc1:true,fc2:true,class_head:false,aux_head:false \
--outdir ./outputs/frozen_student_heads

Use Perlin noise:
python src/run_subliminal.py \
--device cpu \
--noise-dist perlin \
--perlin-res 8 \
--outdir ./outputs/perlin_res8

Perturb the student auxiliary head before student training:
python src/run_subliminal.py \
--device cpu \
--outdir ./outputs/perturb_aux \
--perturb "student:aux_head,std=0.1,timing=before_student_training,include_weight=true,include_bias=true"

- Each invocation performs exactly one run.
- The base `--seed` is expanded into deterministic derived seeds for model initialisation, dataloader shuffling, teacher training, student noise training, perturbations, and noise evaluation.
- The code records resolved configurations and derived seed bookkeeping in `configs.jsonl`.
- Linear and convolutional layers use PyTorch initialisation unless reset through deterministic sources `A` or `B`.
- The sweep scripts do not perform plotting or bootstrap aggregation. They generate CSV/JSONL files intended for downstream analysis.
- If a shell script cannot be executed directly, run it via `bash script_name.sh` or make it executable with `chmod +x script_name.sh`.
- Run figure scripts from their own directory so that their relative paths resolve correctly.
- The first MNIST/EMNIST run may spend additional time downloading data.
- For quick implementation checks, prefer a single command such as `python src/run_subliminal.py --device cpu --outdir ./outputs/smoke_test` before launching a full multi-seed sweep.
This repository is distributed under the BSD 3-Clause license; see LICENSE.