TheScrapper

TheScrapper is a web scraping tool that extracts emails, phone numbers, and social media accounts from websites. The gathered information can support further research or help you contact a website's owners.

Installation & Setup

Clone the repository:

git clone https://github.com/champmq/TheScrapper.git

Change the directory:

cd TheScrapper

Install all the requirements:

pip3 install -r requirements.txt

Web UI

A browser-based interface is available. It supports single-URL scraping and batch scraping from CSV or Excel files, with a live progress bar and direct download of results. Start it with:

streamlit run app.py

CLI Usage

Simple scan:

python3 TheScrapper.py --url URL

Scan and crawl found URLs:

python3 TheScrapper.py --url URL --crawl

Retrieve more information about found social media accounts:

python3 TheScrapper.py --url URL --social-extract

Scrape from a CSV or Excel file:

python3 TheScrapper.py --csv targets.csv
python3 TheScrapper.py --csv targets.xlsx --csv-column website

Results are automatically written to output/<filename>_results.csv or output/<filename>_results.xlsx.
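
To make the batch workflow concrete, here is a minimal Python sketch of reading target URLs from a CSV column and deriving a results path of the form shown above. It is an illustration only, not TheScrapper's actual code: the helper names `read_targets` and `results_path` are hypothetical, the CSV is assumed to have a header row with a `website` column, and keeping the input file's extension for the output is an assumption.

```python
import csv
import io
from pathlib import Path


def read_targets(csv_text, column="website"):
    """Return the non-empty values of `column` from CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or blank.
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]


def results_path(input_file, output_dir="output"):
    """Build output/<filename>_results.<ext>; reusing the input extension is an assumption."""
    source = Path(input_file)
    return Path(output_dir) / f"{source.stem}_results{source.suffix}"


# Hypothetical sample data standing in for targets.csv.
sample = "name,website\nExample,https://example.com\nEmpty,\nOther,https://example.org\n"
print(read_targets(sample))
print(results_path("targets.csv").as_posix())
```

Rows with an empty `website` cell are skipped in this sketch; how TheScrapper itself treats such rows is not documented here.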

For all available flags:

python3 TheScrapper.py --help
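
If you want to drive the CLI from another script, the commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; the helper name `build_command` is hypothetical, and whether `--crawl` and `--social-extract` can be combined in one run is an assumption not confirmed here.

```python
import shlex


def build_command(url, crawl=False, social_extract=False):
    """Assemble the argument list for one TheScrapper run (hypothetical helper)."""
    command = ["python3", "TheScrapper.py", "--url", url]
    if crawl:
        command.append("--crawl")
    if social_extract:
        command.append("--social-extract")
    return command


# Print the shell form of the command instead of executing it.
print(shlex.join(build_command("https://example.com", crawl=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from inside the TheScrapper directory.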

Adding More Social Media Sites

To scrape additional social media sites, append them to the socials.txt file. If you'd like to share your additions with the community, submit a pull request.
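
As a rough illustration of appending an entry safely, the sketch below adds a site to a file only if it is not already listed. The format of socials.txt is not described here, so one entry per line is an assumption; check the existing file before relying on it. The helper name `add_social_site` and the sample entries are hypothetical.

```python
import tempfile
from pathlib import Path


def add_social_site(path, site):
    """Append `site` on its own line unless already present; return True if added."""
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if site in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line even if the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}{site}\n")
    return True


# Demonstrate on a temporary copy rather than the real socials.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "socials.txt"
    demo.write_text("twitter.com\nfacebook.com", encoding="utf-8")
    print(add_social_site(demo, "mastodon.social"))
    print(add_social_site(demo, "mastodon.social"))
    print(demo.read_text(encoding="utf-8").splitlines())
```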

Known Problems

When the target website is itself listed in the socials.txt file, the --social-extract flag may produce less useful output. To avoid this, exclude such URLs from your targets or run the scan without the flag.
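
One way to apply this workaround to a batch of targets is to filter out URLs whose host matches a listed site before using --social-extract. The sketch below is an assumption-laden illustration, not part of TheScrapper: the helper `is_listed_social` is hypothetical, and the `socials` set stands in for entries read from socials.txt, assumed here to be bare domains.

```python
from urllib.parse import urlparse


def is_listed_social(url, social_domains):
    """Return True if the URL's host equals, or is a subdomain of, a listed domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in social_domains)


socials = {"twitter.com", "facebook.com"}  # hypothetical socials.txt contents
targets = ["https://example.com", "https://www.facebook.com/somepage"]

# Keep only the targets that are not themselves listed social media sites.
safe_for_social_extract = [url for url in targets if not is_listed_social(url, socials)]
print(safe_for_social_extract)
```

URLs without a scheme (for example `facebook.com/somepage`) have no parsed hostname and pass through this filter unchanged, so normalize them first if your target list contains any.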

LICENSE - GNU

Built by champmq. Also check out CoSINT, an AI-powered OSINT runtime.
