A project for training handwritten Chinese character recognition models from .gnt handwriting datasets.
- Convert .gnt files into grayscale PNG character images
- Load generated PNG files directly as training data
- Apply default preprocessing during training: resize, float scaling, and normalization
- Train a CNN classifier and save a checkpoint with class mappings
- Python 3.10 or newer
- A virtual environment is recommended
Windows PowerShell:
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install torch torchvision numpy pillow
macOS or Linux:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install torch torchvision numpy pillow
The Docker files are scaffolded, but the current container setup expects a populated requirements.txt. If you want to use Docker, update that file first.
- Place your raw .gnt files in data/raw/.
- Run the converter on a directory or a specific .gnt file.
Convert every .gnt file in data/raw/ into the default output directory:
python scripts/convert_gnt.py data/raw
Convert a single file:
python scripts/convert_gnt.py data/raw/001-f.gnt
Write images and the manifest to a custom directory:
python scripts/convert_gnt.py data/raw --output-dir .\converted
Useful flags:
- --output-dir to choose where PNG files are written
- --manifest-path to choose where samples.txt is written
- --glob to control which files are matched when the input is a directory
- --append-manifest to append metadata instead of replacing it
- --verbose to print one line per converted sample
- The converter writes PNG files and metadata to the output directory used by the training pipeline.
Current output locations:
- PNG files: output/ by default
- Manifest: output/samples.txt by default
The training loader scans output/ first and choom/output/ as a fallback.
For meaningful training, convert multiple .gnt files. A single file can leave you with only one sample per class, which is not enough for useful validation.
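To make the conversion step more concrete, here is a minimal sketch of reading records from a .gnt file. It assumes the commonly documented CASIA layout (a 4-byte little-endian record size, a 2-byte character code, 2-byte width and height, then width x height grayscale bytes). The name iter_gnt_samples is hypothetical, and scripts/convert_gnt.py may be implemented differently.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Assumed record header: sample size (uint32), 2-byte tag code, width, height.
HEADER = struct.Struct("<I2sHH")


def iter_gnt_samples(stream: BinaryIO) -> Iterator[Tuple[str, int, int, bytes]]:
    """Yield (label_hex, width, height, pixels) for each record in a .gnt stream."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        sample_size, tag, width, height = HEADER.unpack(header)
        pixel_count = width * height
        if sample_size != HEADER.size + pixel_count:
            raise ValueError("record size does not match width * height")
        pixels = stream.read(pixel_count)
        if len(pixels) < pixel_count:
            raise ValueError("truncated bitmap")
        # The tag bytes in hex form give a label such as "b0a1".
        yield tag.hex(), width, height, pixels


# Usage with a synthetic in-memory file holding two identical 2x2 samples.
record = HEADER.pack(HEADER.size + 4, bytes.fromhex("b0a1"), 2, 2) + bytes([0, 255, 255, 0])
samples = list(iter_gnt_samples(io.BytesIO(record * 2)))
print(samples[0][:3])
```

With Pillow installed, each yielded bitmap could then be saved as a grayscale PNG through Image.frombytes("L", (width, height), pixels).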
Optional follow-up scripts:
- python scripts/preprocess.py prints a quick summary of the generated PNG dataset.
- python scripts/augment.py --copies 2 creates an output_augmented/ directory with originals plus augmented PNGs.
- Train on augmented data with python -m choom.training.train --img-dir .\output_augmented.
python -m choom.training.train --epochs 10 --batch-size 64 --device cpu
If you have CUDA available, use:
python -m choom.training.train --epochs 10 --batch-size 64 --device cuda
Useful flags:
- --img-dir to point at a specific generated image directory
- --epochs to control training length
- --batch-size to control memory usage
- --validation-split to reserve part of the dataset for validation
- --output-path to choose where the checkpoint is written
- --num-workers to increase DataLoader throughput
If each class only has one sample, use --validation-split 0. Once you have multiple samples per class, a value like 0.2 is reasonable.
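One way to decide between the two values is to count samples per class from the PNG filenames before training. The sketch below is illustrative only: it assumes filenames look like b0a1_0001.png (label prefix, underscore, index) and that the prefix is the hexadecimal GB2312 code of the character. The helper names are hypothetical and not part of the project, and the rule used here is deliberately conservative: it returns 0 as soon as any class has fewer than two samples.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable


def label_from_name(filename: str) -> str:
    """Return the label prefix, assuming names look like '<label>_<index>.png'."""
    return PurePath(filename).stem.split("_", 1)[0].lower()


def label_to_char(label_hex: str) -> str:
    """Decode a hex label such as 'b0a1', assuming it is a GB2312 code."""
    return bytes.fromhex(label_hex).decode("gb2312")


def suggest_validation_split(filenames: Iterable[str], default: float = 0.2) -> float:
    """Return 0.0 if any class has fewer than two samples, else the default."""
    counts = Counter(label_from_name(name) for name in filenames)
    if not counts or min(counts.values()) < 2:
        return 0.0
    return default


# Usage: one sample per class versus two samples per class.
single = ["b0a1_0001.png", "b0a2_0001.png"]
double = single + ["b0a1_0002.png", "b0a2_0002.png"]
print(suggest_validation_split(single), suggest_validation_split(double))
```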
- Scans the generated PNG files in output/ or choom/output/.
- Builds labels from each filename prefix, such as b0a1.
- Applies default image transforms:
  - Resizes each image to 64x64.
  - Converts the image to float32 and scales pixel values to [0, 1].
  - Normalizes the image with mean 0.5 and standard deviation 0.5.
- Creates the CharacterCNN model and trains it with CrossEntropyLoss and AdamW.
- Saves a checkpoint to choom/output/character_cnn.pt unless you override the path.
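The scaling and normalization steps above amount to simple per-pixel arithmetic. The sketch below works it through in plain Python, assuming 8-bit grayscale input (values 0 to 255); normalize_pixels is a hypothetical name, and the training code most likely does this on tensors rather than lists.

```python
from typing import List


def normalize_pixels(pixels: List[int], mean: float = 0.5, std: float = 0.5) -> List[float]:
    """Scale 8-bit pixels to [0, 1], then normalize with the given mean and std."""
    return [((p / 255.0) - mean) / std for p in pixels]


# Black maps to -1.0 and white maps to 1.0, so inputs end up in [-1, 1].
values = normalize_pixels([0, 255])
print(values)
```

With mean 0.5 and standard deviation 0.5, the [0, 1] range is mapped onto [-1, 1], which centers the inputs around zero.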
One quick smoke test on CPU:
python -m choom.training.train --epochs 1 --batch-size 256 --validation-split 0 --device cpu
A longer run after converting more .gnt files:
python -m choom.training.train --epochs 20 --batch-size 64 --validation-split 0.2 --device cuda
Training creates or updates these artifacts:
- Generated PNG samples in output/ by default
- Sample metadata in output/samples.txt by default
- Model checkpoint in choom/output/character_cnn.pt
MIT License