champmq/TheScrapper

TheScrapper

TheScrapper is a versatile web scraping tool designed to extract emails, phone numbers, and social media accounts from websites. You can use the gathered information for various purposes, such as further research or contacting the website's owners.

Installation & Setup

  1. Clone the repository:
git clone https://github.com/champmq/TheScrapper.git
  2. Change the directory:
cd TheScrapper
  3. Install all the requirements:
pip3 install -r requirements.txt

Web UI

A browser-based interface is available. It supports single URL scraping and batch scraping from CSV or Excel files, with a live progress bar and direct download of results. Start it with:

streamlit run app.py

CLI Usage

  • Simple scan:
python3 TheScrapper.py --url URL
  • Scan and crawl found URLs:
python3 TheScrapper.py --url URL --crawl
  • Retrieve more information about found social media accounts:
python3 TheScrapper.py --url URL --social-extract
  • Scrape from a CSV or Excel file:
python3 TheScrapper.py --csv targets.csv
python3 TheScrapper.py --csv targets.xlsx --csv-column website

Results are written automatically to output/<filename>_results.csv or output/<filename>_results.xlsx.
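If your target list lives in a script or another data source, a small helper can produce a CSV that batch mode can read. The following is a minimal sketch, not part of TheScrapper: `write_targets` is a hypothetical helper, and the `website` header simply mirrors the `--csv-column website` example above. Which column the tool reads when `--csv-column` is omitted is not stated here, so the sketch names the column explicitly.

```python
import csv
from pathlib import Path


def write_targets(path, urls, column="website"):
    """Write one URL per row under a single header column (hypothetical helper)."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])  # header row naming the URL column
        for url in urls:
            writer.writerow([url.strip()])  # drop stray whitespace around each URL
    return path
```

After calling `write_targets("targets.csv", ["https://example.com", "https://example.org"])`, the file could be passed to the CLI with `python3 TheScrapper.py --csv targets.csv --csv-column website`.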

For all available flags:

python3 TheScrapper.py --help

Adding More Social Media Sites

To scrape additional social media sites, append them to the socials.txt file. If you would like to share your additions with the community, feel free to submit a pull request.
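Appending by hand works fine for one or two sites. For a longer list, something like the sketch below avoids duplicate entries. It assumes that socials.txt holds one entry per line, which this README does not confirm, so check the existing file before relying on it. `add_social_sites` is a hypothetical helper, not part of TheScrapper.

```python
from pathlib import Path


def add_social_sites(new_sites, socials_file="socials.txt"):
    """Append entries not already listed; return the entries that were added.

    Assumes one entry per line in the file (an assumption, not confirmed).
    """
    path = Path(socials_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = {line.strip().lower() for line in text.splitlines() if line.strip()}

    added = []
    for site in new_sites:
        entry = site.strip()
        # Compare case-insensitively so "Twitter.com" does not duplicate "twitter.com".
        if entry and entry.lower() not in existing:
            existing.add(entry.lower())
            added.append(entry)

    if added:
        with path.open("a", encoding="utf-8") as handle:
            # Start on a fresh line if the file does not end with a newline.
            if text and not text.endswith("\n"):
                handle.write("\n")
            handle.write("\n".join(added) + "\n")
    return added
```

Run from the repository root, `add_social_sites(["mastodon.social"])` would append that entry only if it is not already present.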

Known Problems

When the target website is itself listed in the socials.txt file, the --social-extract flag may produce less useful output. To avoid this, exclude such URLs from the scan or omit the flag.
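One way to exclude such URLs before a batch run is to filter the target list against the known social domains. The sketch below is an illustration under assumptions: it treats the entries as bare domain names such as `twitter.com` (the actual format of socials.txt is not described here), and `is_social_url` and `split_targets` are hypothetical helpers, not part of TheScrapper.

```python
from urllib.parse import urlparse


def is_social_url(url, social_domains):
    """Return True if the URL's host is a listed domain or a subdomain of one.

    URLs without a scheme have no parsed hostname and are treated as non-social.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in social_domains)


def split_targets(urls, social_domains):
    """Separate URLs to scan with --social-extract from those to leave out."""
    # Normalise the domain list once; blank entries are ignored.
    domains = {d.strip().lower() for d in social_domains if d.strip()}
    keep = [u for u in urls if not is_social_url(u, domains)]
    skip = [u for u in urls if is_social_url(u, domains)]
    return keep, skip
```

The `keep` list can then be scanned with --social-extract, while the `skip` list can be scanned without the flag or dropped.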

LICENSE - GNU


Built by champmq -- also check out CoSINT, an AI-powered OSINT runtime.
