2dehands Scraper

A Python scraper for extracting product listings from 2dehands.be using the ScrapingAnt API.

Features

  • Scrapes product listings from 2dehands.be category pages
  • Extracts: title, price, description, location, condition, shipping info, seller name
  • Supports URL-based pagination (multiple pages per category)
  • Supports multiple categories (electronics, phones, gaming, fashion, etc.)
  • Exports results to CSV and JSON formats
  • Automatic deduplication of listings

Prerequisites

Note: The ScrapingAnt free plan has a concurrency limit of 1 thread. For faster scraping with multiple concurrent requests, consider upgrading to a paid plan.

Installation

  1. Clone the repository:
git clone https://github.com/kami4ka/2dehands-scraper.git
cd 2dehands-scraper
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set your ScrapingAnt API key:
export SCRAPINGANT_API_KEY="your-api-key-here"
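
On the Python side, the exported key can be read from the environment at startup. The following is only a sketch of one way to do that, not necessarily how config.py does it; the helper name load_api_key is hypothetical.

```python
import os


def load_api_key(env_var: str = "SCRAPINGANT_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; the project may read the key differently.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the scraper."
        )
    return key
```

Failing early here avoids spending a request on a call that would be rejected for a missing key.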

Usage

Basic Usage

Scrape default categories (Windows Laptops + iPhone):

python main.py -v

Scrape Specific Categories

python main.py --categories "computers-en-software/windows-laptops" -v

Scrape Multiple Pages

python main.py --pages 3 -v

Scrape All Categories

python main.py --all-categories -v

List Available Categories

python main.py --list-categories

Custom Output Paths

python main.py --output-csv my_listings.csv --output-json my_listings.json -v

Available Categories

Electronics

  • Computers en Software: computers-en-software
  • Windows Laptops: computers-en-software/windows-laptops
  • Apple MacBook: computers-en-software/apple-macbook
  • Monitoren: computers-en-software/monitoren
  • Tablets: computers-en-software/tablets

Audio & TV

  • Audio, TV en Foto: audio-tv-en-foto
  • Televisies: audio-tv-en-foto/televisies
  • Luidsprekers: audio-tv-en-foto/luidsprekers
  • Koptelefoons: audio-tv-en-foto/koptelefoons

Phones

  • Mobiele Telefoons: telecommunicatie/mobiele-telefoons
  • Apple iPhone: telecommunicatie/mobiele-telefoons-apple-iphone
  • Samsung Phones: telecommunicatie/mobiele-telefoons-samsung

Gaming

  • Spelcomputers en Games: spelcomputers-en-games
  • PlayStation: spelcomputers-en-games/playstation
  • Xbox: spelcomputers-en-games/xbox
  • Nintendo: spelcomputers-en-games/nintendo

Home & Garden

  • Huis en Inrichting: huis-en-inrichting
  • Meubels: huis-en-inrichting/meubels
  • Tuin en Terras: tuin-en-terras

Vehicles

  • Fietsen en Brommers: fietsen-en-brommers
  • Heren Fietsen: fietsen-en-brommers/fietsen-heren
  • Dames Fietsen: fietsen-en-brommers/fietsen-dames

Fashion

  • Kleding Dames: kleding-dames
  • Kleding Heren: kleding-heren
  • Sieraden, Tassen en Uiterlijk: sieraden-tassen-en-uiterlijk

Output Format

CSV Fields

  • title - Product name
  • price - Listed price (e.g., "€ 99,95", "Bieden", "Gratis")
  • description - Product description
  • location - Seller location (e.g., "Brussel", "Antwerpen")
  • listing_url - Direct link to the listing
  • category - Category name
  • condition - Item condition (Nieuw, Gebruikt, Zo goed als nieuw, Refurbished)
  • shipping - Shipping options (Verzenden, Ophalen, Ophalen of Verzenden)
  • seller_name - Seller name
  • date_posted - Posting date (e.g., "Vandaag", "Gisteren", "12 jan")
  • scraped_at - UTC timestamp of when data was scraped
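
The price field is stored as displayed text, so numeric analysis needs a conversion step. Below is a minimal sketch of such a conversion. It assumes Dutch-style formatting (dot as thousands separator, comma as decimal separator), treats "Gratis" (free) as 0.0, and returns None for non-numeric values such as "Bieden" (bidding). The function parse_price is hypothetical and not part of the project.

```python
import re
from typing import Optional


def parse_price(raw: str) -> Optional[float]:
    """Convert a displayed price such as "€ 99,95" to a float.

    Assumption: "." is a thousands separator and "," the decimal separator.
    """
    text = raw.strip()
    if text.lower() == "gratis":
        return 0.0  # free item
    match = re.search(r"\d[\d.]*(?:,\d+)?", text)
    if match is None:
        return None  # no number present, e.g. "Bieden"
    number = match.group(0).replace(".", "").replace(",", ".")
    return float(number)


if __name__ == "__main__":
    for sample in ["€ 99,95", "Bieden", "Gratis"]:
        print(sample, "->", parse_price(sample))
```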

Sample Output

title,price,description,location,listing_url,category,condition,shipping,seller_name,date_posted,scraped_at
Lenovo ThinkPad Core i5,"€ 249,00",Refurbished laptop met garantie,Brussel,https://www.2dehands.be/v/computers-en-software/windows-laptops/a1234567-lenovo-thinkpad,Windows Laptops,Refurbished,Ophalen,TechShop,Vandaag,2025-01-13T10:00:00+00:00
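
Prices such as "€ 249,00" contain a comma, so that field has to be quoted in the CSV to keep the columns aligned. Python's standard csv module does this by default. The sketch below round-trips one row using the field names listed above; the helpers rows_to_csv and csv_to_rows are hypothetical and not taken from the project.

```python
import csv
import io
from typing import Dict, List

FIELDS = [
    "title", "price", "description", "location", "listing_url", "category",
    "condition", "shipping", "seller_name", "date_posted", "scraped_at",
]


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text; fields containing commas are quoted automatically."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text back into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

When reading the exported file with other tools, use a real CSV parser rather than splitting lines on commas.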

Project Structure

2dehands-scraper/
├── main.py           # CLI entry point
├── scraper.py        # Main scraper class
├── models.py         # Data models (Listing, ListingCollection)
├── config.py         # Configuration settings
├── utils.py          # Utility functions
├── requirements.txt  # Python dependencies
├── output/           # Output directory for CSV/JSON files
└── README.md

How It Works

  1. API Request: Uses ScrapingAnt's browser rendering with a Belgian residential proxy to access 2dehands.be
  2. Wait for Content: Waits for listing cards to load using the CSS selector .hz-Listing
  3. Pagination: Supports URL-based pagination (/l/category/p/2/, /l/category/p/3/, etc.)
  4. Parsing: Extracts listing data from card elements using BeautifulSoup
  5. Deduplication: Removes duplicate listings based on URL
  6. Export: Saves results to CSV and JSON formats
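
The pagination and deduplication steps above can be sketched as follows. This is an illustration under assumptions, not the code in scraper.py: the names page_url and dedupe_by_url are hypothetical, and the sketch assumes the first page is served at /l/category/ with no /p/1/ suffix.

```python
from typing import Dict, Iterable, List

BASE_URL = "https://www.2dehands.be"


def page_url(category: str, page: int) -> str:
    """Build a category page URL following the /l/<category>/p/<n>/ pattern."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    first_page = f"{BASE_URL}/l/{category.strip('/')}/"
    # Assumption: page 1 has no /p/1/ suffix.
    return first_page if page == 1 else f"{first_page}p/{page}/"


def dedupe_by_url(listings: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each listing_url, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        url = listing["listing_url"]
        if url in seen:
            continue
        seen.add(url)
        unique.append(listing)
    return unique
```

Keying on the listing URL means a listing that appears on two pages, or in both a parent category and a subcategory, is exported only once.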

Rate Limiting

The scraper includes built-in delays between requests to respect rate limits. For the ScrapingAnt free tier with 1 thread concurrency, expect approximately 30-60 seconds per page.
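
A delay between requests can be implemented as a small generator that pauses before every item except the first. The sketch below is illustrative only. The helper paced and the 2 to 5 second range are assumptions, not the values the scraper actually uses.

```python
import random
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(
    items: Iterable[T],
    min_delay: float = 2.0,   # assumed value, for illustration
    max_delay: float = 5.0,   # assumed value, for illustration
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one by one, sleeping a random interval between them."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

Iterating over page URLs with `for url in paced(urls): ...` then spaces out the requests without adding sleep calls to the fetching code.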

License

MIT License
