TheScrapper is a versatile web scraping tool designed to extract emails, phone numbers, and social media accounts from websites. You can use the gathered information for various purposes, such as further research or contacting the website's owners.
- Clone the repository:
git clone https://github.com/champmq/TheScrapper.git
- Change the directory:
cd TheScrapper
- Install all the requirements:
pip3 install -r requirements.txt

A browser-based interface is available. It supports single URL scraping and batch scraping from CSV or Excel files, with a live progress bar and direct download of results.
streamlit run app.py

- Simple scan:
python3 TheScrapper.py --url URL
- Scan and crawl found URLs:
python3 TheScrapper.py --url URL --crawl
- Retrieve more information about found social media accounts:
python3 TheScrapper.py --url URL --social-extract
- Scrape from a CSV or Excel file:
python3 TheScrapper.py --csv targets.csv
python3 TheScrapper.py --csv targets.xlsx --csv-column website
Results are automatically written to output/<filename>_results.csv or .xlsx.
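To prepare an input file for the batch mode, something like the following sketch may help. It writes a one-column CSV and works out where the results would land under the naming rule stated above. The helper names `write_targets` and `expected_results_path` are hypothetical and not part of TheScrapper, and it is an assumption that `<filename>` means the input name without its extension and that the result keeps the input's extension.

```python
import csv
from pathlib import Path


def write_targets(path, urls, column="website"):
    """Write one URL per row under a single header column.

    The column name matches the value passed to --csv-column.
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow([column])
        for url in urls:
            writer.writerow([url])


def expected_results_path(input_path, output_dir="output"):
    """Apply the output/<filename>_results.<ext> naming rule.

    Assumption: <filename> is the input file name without its extension,
    and the result keeps the same extension as the input.
    """
    src = Path(input_path)
    return Path(output_dir) / f"{src.stem}_results{src.suffix}"
```

For example, after calling `write_targets("targets.csv", ["https://example.com"])`, you could run `python3 TheScrapper.py --csv targets.csv --csv-column website` and look for the file returned by `expected_results_path("targets.csv")`.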
For all available flags:
python3 TheScrapper.py --help
If you wish to add more social media sites for scraping, you can do so by appending them to the socials.txt file.
Feel free to contribute by submitting a pull request if you'd like to share your additions with the community.
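If you add several sites at once, a small script can append them without creating duplicates. This is only a sketch: it assumes socials.txt holds one entry per line, which you should confirm against the file in the repository, and `add_socials` is a hypothetical helper rather than something TheScrapper ships.

```python
from pathlib import Path


def add_socials(new_entries, path="socials.txt"):
    """Append entries that are not already present and return those added.

    Assumption: the file stores one entry per line.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8") if file.exists() else ""
    existing = {line.strip() for line in text.splitlines() if line.strip()}

    added = []
    for entry in new_entries:
        entry = entry.strip()
        if entry and entry not in existing:
            existing.add(entry)
            added.append(entry)

    if added:
        with file.open("a", encoding="utf-8") as fh:
            # Start on a fresh line if the file lacks a trailing newline.
            if text and not text.endswith("\n"):
                fh.write("\n")
            fh.write("\n".join(added) + "\n")
    return added
```

Calling `add_socials(["mastodon.social"])` from the repository root would append that entry only if it is not already listed.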
When the URL you scan belongs to a site that is already listed in the socials.txt file, the --social-extract flag may produce less
useful output. To avoid this, consider excluding such URLs or omitting the flag.
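One way to exclude such URLs before a batch run is to filter your target list against the entries in socials.txt. The sketch below assumes those entries are bare domains such as `twitter.com`, which may not match the actual file format. `is_listed_social` and `split_targets` are hypothetical helpers, not part of TheScrapper.

```python
from urllib.parse import urlparse


def is_listed_social(url, social_domains):
    """Return True if the URL's host equals or is a subdomain of a listed domain.

    URLs need a scheme (e.g. https://); without one no host is parsed
    and the URL is treated as not listed.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in social_domains)


def split_targets(urls, social_domains):
    """Split URLs into (keep, skipped) based on the listed domains."""
    domains = {d.strip().lower() for d in social_domains if d.strip()}
    keep = [u for u in urls if not is_listed_social(u, domains)]
    skipped = [u for u in urls if is_listed_social(u, domains)]
    return keep, skipped
```

The `keep` list can then be scanned with --social-extract, while the `skipped` URLs can be scanned without the flag.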
LICENSE - GNU
Built by champmq -- also check out CoSINT, an AI-powered OSINT runtime.