This project demonstrates web scraping using Python by extracting book-related data from the Books to Scrape website.
The scraper collects detailed information about books and stores it in structured CSV files for analysis and practice.
The goal of this project is to practice:
- Web scraping fundamentals
- HTML parsing
- Data extraction and structuring
- Working with real-world scraped datasets
All scraping logic is implemented in a Jupyter Notebook for easy understanding and modification.
The project uses:
- Python 🐍
- Requests – HTTP requests
- BeautifulSoup – HTML parsing
- Pandas – Data handling
- Jupyter Notebook
The repository contains:
- 📓 BooksScraping.ipynb – Main notebook with scraping logic
- 📊 Books.csv – Basic extracted dataset
- 📊 BookInfo.csv – Detailed book information
- 📊 BookDataSet(Scraped).csv – Combined and cleaned dataset
The scraper collects:
- Book title
- Price
- Availability
- Rating
- Category
- Additional book details (where available)
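
As a rough illustration of how some of these fields can be pulled out of a listing page, here is a minimal sketch using BeautifulSoup. It is not taken from the notebook. The sample markup, the class names (`product_pod`, `price_color`, `availability`, `star-rating`), the book data, and the helper name `parse_listing` are all assumptions made for this example, so check them against the live page before relying on them. Category and the additional details are not covered here.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample: a trimmed listing card with made-up book data.
# The class names are assumed to match the site's markup.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="Example Book">Example Bo...</a></h3>
  <div class="product_price">
    <p class="price_color">£12.34</p>
    <p class="instock availability">
      In stock
    </p>
  </div>
</article>
"""

# Ratings are assumed to be encoded as a word in the class list.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_listing(html):
    """Return one dict per book card found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("article.product_pod"):
        link = card.select_one("h3 a")
        price_tag = card.select_one("p.price_color")
        stock_tag = card.select_one("p.availability")
        rating_tag = card.select_one("p.star-rating")

        # Map the rating word (e.g. "Three") to a number, or None if absent.
        rating_classes = rating_tag.get("class", []) if rating_tag else []
        rating = next(
            (RATING_WORDS[c] for c in rating_classes if c in RATING_WORDS), None
        )

        # Keep only the numeric part so currency symbols are ignored.
        price_match = (
            re.search(r"\d+(?:\.\d+)?", price_tag.get_text()) if price_tag else None
        )

        books.append({
            "title": link.get("title") if link else None,
            "price": float(price_match.group()) if price_match else None,
            "availability": stock_tag.get_text(strip=True) if stock_tag else None,
            "rating": rating,
        })
    return books


if __name__ == "__main__":
    print(parse_listing(SAMPLE_HTML))
```

Each field falls back to `None` when its tag is missing, which matches the "where available" wording above and keeps one malformed card from stopping the whole run.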
The notebook:
- Sends requests to web pages
- Parses HTML content using BeautifulSoup
- Extracts structured book data
- Cleans and organizes data using Pandas
- Stores results in CSV format
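
The steps above can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the function names (`fetch_html`, `clean_books`, `save_books`), the column names, and the cleaning rules are assumptions chosen for the example.

```python
import pandas as pd
import requests


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def clean_books(records):
    """Turn raw scraped records (list of dicts) into a tidy DataFrame."""
    df = pd.DataFrame(records)
    df["title"] = df["title"].str.strip()
    # Extract the numeric part of strings such as "£51.77".
    df["price"] = (
        df["price"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
    )
    # Derive a boolean flag from the availability text.
    df["in_stock"] = df["availability"].str.contains("In stock", case=False)
    # Drop rows that repeat a title already seen.
    return df.drop_duplicates(subset="title").reset_index(drop=True)


def save_books(df, path):
    """Write the cleaned data to CSV without the index column."""
    df.to_csv(path, index=False)


if __name__ == "__main__":
    # Hypothetical raw records, standing in for parsed page content.
    raw = [
        {"title": " Example Book ", "price": "£12.34", "availability": "In stock"},
        {"title": "Example Book", "price": "£12.34", "availability": "In stock"},
        {"title": "Another Book", "price": "£8.00", "availability": "Out of stock"},
    ]
    print(clean_books(raw))
```

In a real run, the output of `fetch_html` would be parsed with BeautifulSoup into records like `raw`, and `save_books(df, "Books.csv")` would write the result. Keeping fetching, cleaning, and saving in separate functions makes each step easy to rerun on its own in a notebook cell.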
This project is built to:
- Learn and practice web scraping
- Work with structured and unstructured web data
- Improve Python and data handling skills
- Build beginner-friendly data projects
This project is for educational purposes only.
The target website (Books to Scrape) is designed specifically for practicing web scraping.
Possible future improvements:
- Scrape multiple categories automatically
- Store data in a database (SQLite / MongoDB)
- Add data visualization
- Build a scraping pipeline
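
For the database idea, a minimal sketch with Python's built-in `sqlite3` module might look like the following. The table layout, the choice of `title` as the primary key, and the helper name `save_to_sqlite` are assumptions made for illustration. A real dataset may contain repeated titles and need a different key.

```python
import sqlite3


def save_to_sqlite(rows, db_path=":memory:"):
    """Store book dicts in a SQLite table and return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS books (
            title TEXT PRIMARY KEY,
            price REAL,
            availability TEXT,
            rating INTEGER,
            category TEXT
        )
        """
    )
    # Named placeholders are filled from the keys of each dict.
    # INSERT OR REPLACE overwrites a row when the title already exists.
    conn.executemany(
        "INSERT OR REPLACE INTO books "
        "VALUES (:title, :price, :availability, :rating, :category)",
        rows,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # Hypothetical rows standing in for scraped data.
    sample = [
        {"title": "Example Book", "price": 12.34, "availability": "In stock",
         "rating": 3, "category": "Fiction"},
        {"title": "Another Book", "price": 8.0, "availability": "Out of stock",
         "rating": 5, "category": "Poetry"},
    ]
    conn = save_to_sqlite(sample)
    for row in conn.execute("SELECT title, price FROM books ORDER BY price"):
        print(row)
    conn.close()
```

Passing a file path such as `"books.db"` instead of the default `":memory:"` keeps the data between runs.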
Author: Anupam Singh, aspiring Data Analyst & Developer