General-purpose Python image crawler — Selenium-based scraper that downloads images from search engines based on any keyword list. Build custom image datasets for ML training, research, or visual analysis.
A general-purpose Python utility that uses Selenium WebDriver to scrape images from search engines. Pass any list of keywords and the crawler downloads dozens of representative photos per keyword, saving each keyword's images in its own folder. Because the folder name doubles as the class label, this makes it easy to build a labeled image dataset for any domain.
The tool is keyword-agnostic: animals, products, places, faces, objects, paintings, anything indexed by image search engines.
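The download step itself is not spelled out here. Below is a minimal sketch of what it could look like. It assumes the image URLs have already been collected from the results page, for example by reading the `src` attribute of `img` elements with Selenium's `get_attribute`. `download_images` and its `max_count` cap are hypothetical and do not come from `index.py`.

```python
import hashlib
import urllib.request
from typing import Iterable, List


def download_images(urls: Iterable[str], max_count: int = 50, timeout: float = 10.0) -> List[bytes]:
    """Hypothetical helper (not part of index.py): fetch up to max_count distinct images."""
    images: List[bytes] = []
    seen = set()  # SHA-256 digests of images already kept
    for url in urls:
        if len(images) >= max_count:
            break
        if not url:  # Selenium's get_attribute returns None when src is missing
            continue
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
        except (OSError, ValueError):
            # URLError/HTTPError/timeouts subclass OSError; ValueError covers malformed URLs
            continue
        digest = hashlib.sha256(data).hexdigest()
        if data and digest not in seen:  # skip empty responses and exact duplicates
            seen.add(digest)
            images.append(data)
    return images
```

Hashing the bytes drops exact duplicates, which image search often returns. `urllib.request.urlopen` also accepts `data:` URIs, which some results pages use for inline thumbnails.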
- 📊 ML / AI training data collection — Build labeled image datasets for classification or detection models
- 🔬 Visual research — Bulk-collect images for academic analysis
- 🛒 Product / market analysis — Scrape product images by category
- 🎨 Reference libraries — Build mood boards or visual archives
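Each keyword gets its own folder, so the folder name can serve as the class label for training. The following minimal sketch turns the crawler's output into (path, label) pairs. It assumes the `dataset/<keyword>/` layout described in this README. `index_dataset` is a hypothetical helper. Sorting class names to assign indices mirrors the convention used by torchvision's `ImageFolder`.

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def index_dataset(root: str = "dataset") -> Tuple[List[Tuple[Path, int]], List[str]]:
    """Hypothetical helper: return (image_path, class_index) pairs plus sorted class names."""
    root_path = Path(root)
    # Each sub-folder is one keyword, i.e. one class label
    classes = sorted(p.name for p in root_path.iterdir() if p.is_dir())
    samples: List[Tuple[Path, int]] = []
    for idx, name in enumerate(classes):
        for img in sorted((root_path / name).iterdir()):
            # Keep only files with a common image extension (case-insensitive)
            if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((img, idx))
    return samples, classes
```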
Originally built as the data pipeline for the kpopface AI face matcher — given a list of K-Pop idol names, it downloaded hundreds of representative photos per idol to train a Teachable Machine model. The same crawler works just as well with any other keyword set.
| Layer | Technology |
|---|---|
| Language | Python 3.9 |
| Browser Automation | Selenium WebDriver |
```bash
git clone https://github.com/moony01/py-image-crawling.git
cd py-image-crawling
pip install -r requirements.txt
# Edit search keywords in index.py, then run
python index.py
```

Downloaded images land in the `dataset/` directory, organized into one folder per keyword.
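Here is a minimal sketch of the folder-per-keyword layout. It assumes the crawler already holds each image as raw bytes. `safe_folder_name` and `save_images` are hypothetical helpers, not the actual code in `index.py`, and the `.jpg` extension is also an assumption.

```python
import re
from pathlib import Path
from typing import Iterable, List


def safe_folder_name(keyword: str) -> str:
    """Hypothetical helper: turn a search keyword into a filesystem-safe folder name."""
    # Collapse whitespace and characters Windows forbids in paths into "_"
    name = re.sub(r'[\\/:*?"<>|\s]+', "_", keyword.strip())
    return name or "unnamed"


def save_images(keyword: str, images: Iterable[bytes], root: str = "dataset") -> List[Path]:
    """Hypothetical helper: write images to <root>/<keyword>/0001.jpg, 0002.jpg, ..."""
    folder = Path(root) / safe_folder_name(keyword)
    folder.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for i, data in enumerate(images, start=1):
        path = folder / f"{i:04d}.jpg"  # extension assumed; real data may be PNG or WebP
        path.write_bytes(data)
        saved.append(path)
    return saved
```

Sanitizing matters for keywords like `AC/DC`, where a slash would otherwise create a nested folder. In this sketch, rerunning with the same keyword overwrites the numbered files rather than appending to them.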
MIT License © 2024–2026 moony01