AI-powered synthetic defect image generation for industrial quality control
Training AI defect detectors in manufacturing requires a large volume of defect images — but real production lines generate defects only rarely. This data scarcity makes it difficult to build reliable inspection models.
This project addresses that gap by using generative AI to synthesize defect images, augmenting training datasets where real examples are limited.
Image → [VQGAN] → Tokens → [MaskGIT] → Filled Tokens → [VQGAN] → Synthesized Image
VQGAN (Tokenizer) — Compresses an image into 256 discrete tokens (like breaking a photo into a 256-piece puzzle)
MaskGIT (Generator) — Fills in masked tokens conditioned on a defect class (e.g., "scratches"), learning the visual style of each defect type
Inpainting — Replaces a region of a normal image with the generated defect patch
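One way to realize the inpainting step is at the token level. A pixel box on the 256×256 image maps to a block of cells on the 16×16 token grid; those cells are masked and filled by MaskGIT, and every other cell keeps the normal image's token. This is a minimal sketch that assumes token-space composition (the project's `sampling/inpainting.py` may blend in pixel space instead). `box_to_token_mask` and `inpaint_tokens` are hypothetical helpers.

```python
import numpy as np

IMAGE_SIZE = 256                 # input resolution (from the README)
DOWNSAMPLE = 16                  # VQGAN downsampling factor (from the README)
GRID = IMAGE_SIZE // DOWNSAMPLE  # 16x16 token grid -> 256 tokens

def box_to_token_mask(x0, y0, x1, y1):
    """Boolean (GRID, GRID) mask of tokens overlapping the pixel box
    [x0, x1) x [y0, y1). Hypothetical helper for illustration."""
    mask = np.zeros((GRID, GRID), dtype=bool)
    # Round outward so every token that touches the box is included.
    tx0, ty0 = x0 // DOWNSAMPLE, y0 // DOWNSAMPLE
    tx1 = -(-x1 // DOWNSAMPLE)   # ceiling division
    ty1 = -(-y1 // DOWNSAMPLE)
    mask[ty0:ty1, tx0:tx1] = True
    return mask

def inpaint_tokens(normal_tokens, generated_tokens, mask):
    """Take generated tokens inside the mask and the normal image's tokens
    everywhere else. Both token arrays have shape (GRID, GRID)."""
    return np.where(mask, generated_tokens, normal_tokens)
```

For example, `box_to_token_mask(64, 64, 128, 100)` selects token rows 4 to 6 and columns 4 to 7, i.e. 12 of the 256 tokens; only those positions would be regenerated.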
| Component | Description | Role |
|---|---|---|
| LlamaGen VQGAN | Image tokenizer with 8-dim codebook | Image ↔ token conversion |
| Halton-MaskGIT | Low-discrepancy mask scheduling (ICLR 2025) | Token → image generation |
| Classifier-Free Guidance | Blends conditional and unconditional outputs | Generation quality |
| Adaptive GAN Loss | Auto-balances reconstruction vs. GAN loss | VQGAN training stability |
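The three techniques in the table can be sketched in a few lines each. Assumptions, stated plainly: `halton_order` ranks the 16×16 token positions by first appearance in a 2-D Halton sequence (bases 2 and 3), which is the idea behind a Halton scheduler, but the exact ordering and per-step token counts used by Halton-MaskGIT and `sampling/halton.py` may differ; `cfg_logits` uses the common `uncond + scale * (cond - uncond)` convention; `adaptive_weight` follows the gradient-norm ratio of the original VQGAN loss, with gradients passed in as plain arrays instead of being computed by autograd. All function names are hypothetical.

```python
import numpy as np

def radical_inverse(i: int, base: int) -> float:
    """i-th term of the van der Corput sequence in `base` (always in [0, 1))."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * f
        f /= base
    return result

def halton_order(grid: int = 16) -> list[int]:
    """Rank all grid*grid token positions by their first hit in a 2-D Halton
    sequence (bases 2 and 3). Revealing tokens in this order spreads them
    evenly over the image instead of clustering them."""
    order, seen, i = [], set(), 1
    while len(order) < grid * grid:
        row = int(radical_inverse(i, 2) * grid)
        col = int(radical_inverse(i, 3) * grid)
        if (row, col) not in seen:
            seen.add((row, col))
            order.append(row * grid + col)  # flat token index
        i += 1
    return order

def cfg_logits(cond, uncond, scale: float):
    """Classifier-free guidance: push the class-conditional logits further
    away from the unconditional ones. scale=1 returns `cond` unchanged."""
    return uncond + scale * (cond - uncond)

def adaptive_weight(grad_rec, grad_gan, delta: float = 1e-4,
                    max_weight: float = 1e4) -> float:
    """Adaptive GAN weight: ratio of the gradient norms of the reconstruction
    and GAN losses w.r.t. the decoder's last layer, clipped for stability."""
    w = np.linalg.norm(grad_rec) / (np.linalg.norm(grad_gan) + delta)
    return float(np.clip(w, 0.0, max_weight))
```

A sampler built on this would reveal `order[:k]` after each step, with `k` growing according to whatever schedule is chosen. The adaptive weight makes the GAN term large when its gradients are small relative to the reconstruction gradients, and vice versa, which is what keeps either loss from dominating.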
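| Dataset | Images | Source |
|---|---|---|
| NEU-DET | 1,440 | Kaggle |
| SD-saliency-900 | 900 | Kaggle |
| X-SDD | 319 | Kaggle |
| Total | 2,659 | |

With 8× augmentation, the 2,659 images yield 2,659 × 8 = 21,272 training samples.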
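The README does not say which transforms make up the 8× augmentation. One plausible choice, assumed here and not confirmed by the source, is the 8 symmetries of a square: 4 rotations, each with and without a horizontal flip. `dihedral_8` is a hypothetical helper, not the API of `data/augmentation.py`.

```python
import numpy as np

def dihedral_8(image):
    """Return the 8 symmetries of a square image (H, W) or (H, W, C):
    rotations by 0/90/180/270 degrees, each also flipped left-right."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)        # rotates the first two axes
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirror along the width axis
    return variants

# 2,659 source images x 8 variants each
assert 2659 * 8 == 21272
```

Whether rotations suit direction-dependent classes such as rolled-in_scale (streaks that follow the rolling direction) is a design judgment; a flip-only or rotation-by-180 subset would preserve orientation at the cost of fewer variants.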
| Class | Description |
|---|---|
| crazing | Fine web-like surface cracks |
| inclusion | Foreign particles embedded in the surface |
| patches | Irregular blotchy regions |
| pitted_surface | Small pits or craters |
| rolled-in_scale | Linear streaks from the rolling process |
| scratches | Linear scratch marks |
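Conditioning MaskGIT on these classes with classifier-free guidance typically means reserving one extra label id for "no class" and randomly substituting it during training, so the same network also learns the unconditional prediction. The sketch assumes class ids follow the table order and a 10% drop rate; both, along with the names `condition_label` and `NULL_ID`, are illustrative assumptions.

```python
import random

CLASSES = ["crazing", "inclusion", "patches", "pitted_surface",
           "rolled-in_scale", "scratches"]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}
NULL_ID = len(CLASSES)  # hypothetical extra id for the unconditional branch

def condition_label(name: str, drop_prob: float = 0.1, rng=random) -> int:
    """Map a class name to its conditioning id. With probability drop_prob,
    return NULL_ID instead so the model also learns to generate without a
    class, which classifier-free guidance needs at sampling time."""
    if rng.random() < drop_prob:
        return NULL_ID
    return CLASS_TO_ID[name]
```

At sampling time the model would be run twice per step, once with the real class id and once with `NULL_ID`, and the two sets of logits blended.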
Codebook: 16,384 tokens, 8-dim embeddings
Downsampling: 16× (256×256 → 16×16 = 256 tokens)
Fine-tuning result: Edge IoU +10.6%
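These figures imply that each 256×256 image becomes a 16×16 grid of ids, each id chosen from 16,384 codes (14 bits), so an image is described by 256 × 14 = 3,584 bits versus 1,572,864 bits for raw 8-bit RGB, roughly 439× smaller. The sketch below shows plain nearest-neighbour vector quantization of an encoder output against an 8-dim codebook. LlamaGen's VQ layer can also L2-normalize latents and codebook entries before the lookup; that step is omitted here, and `quantize` is a hypothetical name.

```python
import math
import numpy as np

CODEBOOK_SIZE = 16_384  # from the README
EMBED_DIM = 8           # from the README
GRID = 256 // 16        # 16x16 token grid

def quantize(latents, codebook):
    """Nearest-neighbour vector quantization.
    latents:  (GRID, GRID, EMBED_DIM) encoder output
    codebook: (CODEBOOK_SIZE, EMBED_DIM)
    Returns integer token ids of shape (GRID, GRID)."""
    flat = latents.reshape(-1, latents.shape[-1])  # (256, D)
    # Squared L2 distance expanded as ||a||^2 - 2 a.b + ||b||^2
    dist = (np.sum(flat ** 2, axis=1, keepdims=True)
            - 2.0 * flat @ codebook.T
            + np.sum(codebook ** 2, axis=1))
    return np.argmin(dist, axis=1).reshape(latents.shape[:2])

# Information content of one tokenized image
BITS_PER_IMAGE = GRID * GRID * math.log2(CODEBOOK_SIZE)  # 256 * 14
```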
Parameters: ~69M (Small config)
Architecture: 12 layers, 8 heads, 512 hidden dim
Features: AdaLayerNorm, SwiGLU FFN, QK Norm, Weight Tying
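The four MaskGIT features listed above can be illustrated as small NumPy functions. This is a single-head sketch with no learnable norm gains, dropout, or multi-head split; the real `models/layers.py` is presumably PyTorch and may differ in detail (for instance, QK Norm is often implemented with RMSNorm rather than LayerNorm). All names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LayerNorm over the last axis, without learnable gain/bias."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ada_layer_norm(x, cond, w_scale, w_shift):
    """AdaLayerNorm: the condition vector (e.g. a class embedding) predicts a
    per-channel scale and shift. x: (B, N, D), cond: (B, C), w_*: (C, D)."""
    scale = (cond @ w_scale)[:, None, :]
    shift = (cond @ w_shift)[:, None, :]
    return layer_norm(x) * (1 + scale) + shift

def silu(x):
    return x / (1 + np.exp(-x))

def swiglu_ffn(x, w1, w3, w2):
    """SwiGLU feed-forward: SiLU-gated linear unit, projected back to D."""
    return (silu(x @ w1) * (x @ w3)) @ w2

def softmax(s):
    s = s - s.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

def qk_norm_attention(q, k, v):
    """Single-head attention with QK Norm: queries and keys are normalized
    before the dot product, which keeps attention logits bounded."""
    q, k = layer_norm(q), layer_norm(k)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def tied_logits(h, token_embedding):
    """Weight tying: the input token-embedding matrix (V, D) doubles as the
    output projection, so logits have shape (..., V)."""
    return h @ token_embedding.T
```

With a zero condition vector, `ada_layer_norm` reduces to plain LayerNorm, which is why AdaLN-style blocks are often initialized so the modulation starts near zero.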
```
Metal-Defect-Synthesis/
├── V0/ # PoC notebooks (Colab-ready)
│ ├── metal_defect_synthesis(PoCFinal).ipynb # VQGAN fine-tuning
│ ├── metal_defect_HaltonMaskGIT(PoCFinal).ipynb # MaskGIT training
│ ├── metal_defect_gradio_demo_LlamaGen_Halton(PoCFinal).ipynb # Demo
│ └── Metal_Defect_Synthesis_PRD_v2_0.pdf # Technical spec
│
├── src/metal_defect_synthesis/ # Modularized Python package
│ ├── models/ # Model architectures
│ │ ├── layers.py # Transformer building blocks
│ │ ├── maskgit.py # MaskGIT Transformer
│ │ └── vqgan_wrapper.py # VQGAN wrapper
│ ├── data/ # Data pipeline
│ │ ├── dataset.py # Dataset class
│ │ ├── augmentation.py # 8× data augmentation
│ │ └── token_cache.py # Token cache builder
│ ├── training/ # Training modules
│ │ ├── vqgan_trainer.py # VQGAN training logic
│ │ ├── maskgit_trainer.py # MaskGIT training logic
│ │ └── scheduler.py # LR scheduler
│ ├── sampling/ # Inference
│ │ ├── halton.py # Halton sequence
│ │ ├── sampler.py # Image sampler
│ │ └── inpainting.py # Defect synthesis
│ └── utils/ # Utilities
│ ├── image.py # Image transforms
│ ├── seed.py # Seed management
│ └── metrics.py # Evaluation metrics
│
├── app/gradio_demo.py # Gradio demo UI
├── scripts/ # CLI entry points
│ ├── train_vqgan.py
│ ├── train_maskgit.py
│ └── generate.py
├── configs/ # YAML configs
│ ├── vqgan.yaml
│ ├── maskgit.yaml
│ └── inference.yaml
└── docs/
└── portfolio_narrative.md
```
Run on Google Colab (Recommended)
```bash
git clone https://github.com/LimPark996/Metal-Defect-Synthesis.git
cd Metal-Defect-Synthesis
pip install -r requirements.txt

# Training
python scripts/train_vqgan.py --config configs/vqgan.yaml
python scripts/train_maskgit.py --config configs/maskgit.yaml

# Image generation
python scripts/generate.py --class scratches --num 10
```
| Component | Status | Notes |
|---|---|---|
| VQGAN Fine-tuning | ✅ Done | Edge IoU +10.6% |
| MaskGIT Training | 🔄 Converging | Loss 6.77 (target: ~4.0) |
| Gradio Demo | ✅ Live | Generation quality needs improvement |
| Code Modularization | ✅ Done | `src/` package structure |
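Two numbers in the status table can be unpacked with a little code. The README does not define Edge IoU; the sketch assumes it is the intersection-over-union of binary edge maps from the original and reconstructed images, with a simple gradient-magnitude threshold standing in for whatever edge detector `utils/metrics.py` actually uses. For the MaskGIT loss, assuming it is mean token cross-entropy in nats, `exp(loss)` gives a perplexity: 6.77 corresponds to roughly 871 equally likely codes, the ~4.0 target to about 55, and a uniform guess over the 16,384-entry codebook to a loss of ln(16384) ≈ 9.70. The helper names are hypothetical.

```python
import math
import numpy as np

def edge_map(gray, threshold=0.1):
    """Binary edge map from finite-difference gradient magnitude.
    gray: 2-D float array in [0, 1]. Assumed stand-in for a real detector."""
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy) > threshold

def edge_iou(original, reconstructed, threshold=0.1):
    """IoU between the edge maps of two grayscale images."""
    a = edge_map(original, threshold)
    b = edge_map(reconstructed, threshold)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # neither image has edges: treat as a perfect match
    return float(np.logical_and(a, b).sum() / union)

def perplexity(ce_loss_nats: float) -> float:
    """Effective number of equally likely tokens implied by a mean
    cross-entropy loss measured in nats."""
    return math.exp(ce_loss_nats)
```

Read this way, moving from 6.77 to 4.0 would shrink the model's per-token uncertainty by a factor of about 16 (871 / 55).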
Insufficient MaskGIT training data (21K samples vs. recommended 1M+)
Limited inter-class visual variation in generated outputs
Texture consistency needs improvement
Near-term (current architecture)
| Version | Date | Changes |
|---|---|---|
| v2.1 | 2026-04-01 | Code modularization, config management, README rewrite |
| v2.0 | 2024-12-12 | Migrated to LlamaGen VQGAN + Halton-MaskGIT |
| v1.0 | 2024-12-11 | Initial draft (taming VQGAN + custom MaskGIT) |