Manually scanning or searching multiple websites for specific data can be tedious and time-consuming. We can fix that by automating these tasks with a Python script that extracts the domain name and IP address, performs an Nmap scan, and retrieves WHOIS information for a given website URL. The script also stores the results in text files for easy readability.
Disclaimer: Scanning websites without explicit permission is illegal. For this project, I used the intentionally vulnerable website https://www.hackthissite.org to test my script.
- Visual Studio Code: https://code.visualstudio.com/download
- Python: https://www.python.org/downloads/
- Nmap: https://nmap.org/download.html
- import os (operating system): https://docs.python.org/3/library/os.html
- pip install tld (Top Level Domain): https://pypi.org/project/tld/
- import socket: https://docs.python.org/3/library/socket.html
- import subprocess: https://docs.python.org/3/library/subprocess.html
- pip install python-whois: https://pypi.org/project/python-whois/
- First, I imported os so the script can interact with the operating system to read and write files. Then I created two core helper functions that we will reuse later: 'create_dir', which creates a directory if it does not already exist, and 'write_file', which opens a file in write mode and stores our collected data in it.
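Here is a minimal sketch of those two helpers, assuming they live in their own helper file (I'm calling it general.py here; the file name is my own choice):

```python
import os

# Create a directory only if it does not already exist
def create_dir(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)

# Open a file in write mode and store the collected data in it
def write_file(path, data):
    f = open(path, 'w')
    f.write(data)
    f.close()
```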
- Next, I created a new file named domain_name.py with a function that accepts a URL and returns its domain name. Pip install the tld package in your terminal and import get_fld. We will also use urllib.parse to break the URL into its parts so we can check whether a scheme (http/https) was included.
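A sketch of what domain_name.py might look like based on that description (the exact scheme check is my own guess at the approach):

```python
from urllib.parse import urlparse
from tld import get_fld

def get_domain_name(url):
    # If no scheme (http/https) was supplied, add one so get_fld can parse the URL
    if not urlparse(url).scheme:
        url = 'https://' + url
    # get_fld returns the first-level domain, e.g. 'hackthissite.org'
    return get_fld(url)
```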
- You can also confirm that the tld package installed correctly from the command prompt, for example with pip show tld.
- If you are having trouble getting the package to work like I did, hit Ctrl+Shift+P, search for 'Python: Select Interpreter', and make sure you have selected the correct Python environment. I accidentally didn't select the 'Add Python to PATH' option when I initially installed Python. You can either add it manually by running "where python" in cmd and copying the path into your PATH environment variable, or uninstall, reinstall, and check the 'Add Python to PATH' box.
- Let's run our get_domain_name code to see if we get our desired output.
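For example, calling the sketch above against our test site should print the first-level domain:

```python
# Quick test of the get_domain_name sketch
print(get_domain_name('https://www.hackthissite.org'))
# hackthissite.org
```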
- Now we can move on to our IP grabber by importing socket. We could have imported get_fld to extract the domain again, but I wanted to try something different by using .replace and .split to strip the URL down to its hostname. I wrapped the lookup in a try/except block in case the code throws an error, then ran it to confirm I got the desired output.
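Something along these lines (the function name and the exact .replace/.split handling are my own assumptions):

```python
import socket

def get_ip_address(url):
    try:
        # Strip the scheme with .replace and keep only the hostname with .split
        host = url.replace('https://', '').replace('http://', '').split('/')[0]
        return socket.gethostbyname(host)
    except socket.gaierror as error:
        return 'Could not resolve IP address: ' + str(error)

# Quick test
print(get_ip_address('https://www.hackthissite.org'))
```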
- This Nmap section was a little tricky for me and took the most time to get the syntax right. After you download Nmap, confirm from the command line that it installed correctly and that it was added to your PATH environment variable, just like the Python troubleshooting steps from earlier:
- nmap --version (should not throw an error)
- where nmap (copy the path it prints and manually add it to your environment variables if you need to)
- The subprocess module lets us start other applications/programs from Python and pass arguments to them. Let's import subprocess and create a function that accepts two parameters: 'options', which will be one or more Nmap flags the user inputs, and the IP address. I ran my code to confirm it works as expected.
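A sketch of how that function could look with subprocess.run (the function name, the use of run() rather than Popen, and the example options are my own assumptions):

```python
import subprocess

def get_nmap(options, ip):
    # Build the command list: nmap, the user-supplied options, then the target IP
    command = ['nmap'] + options.split() + [ip]
    # Run Nmap and capture its output as text (requires Python 3.7+)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout

# Example call - substitute the IP address returned by the previous step
# print(get_nmap('-sV -F', '<ip address>'))
```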
- Finally, our WHOIS function. Luckily, this script is the easiest to code. Pip install python-whois in the terminal, then import os and whois. Create a function that accepts a URL and calls whois.whois on it to get the website's WHOIS information. I was getting an error in the terminal because the output couldn't be written since it wasn't a string, so I converted the output to a str and got the desired output.
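A sketch of that WHOIS function (get_whois is my own name for it):

```python
import whois

def get_whois(url):
    # whois.whois returns a WhoisEntry object; casting it to str makes it writable to a file
    return str(whois.whois(url))

# Quick test
print(get_whois('https://www.hackthissite.org'))
```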
- Now that we have all of our scripts ready, let's create a new file titled main.py to tie everything together. Start by importing all of our scripts and creating a root directory named whatever you would like.
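Something like this at the top of main.py (the module names and root directory name below are my own placeholders; adjust them to match whatever you named your files):

```python
# main.py
import general        # create_dir / write_file helpers
import domain_name
import ip_address
import nmap_scan
import whois_info

# Root directory where every scan result will be stored
ROOT_DIR = 'websites'
general.create_dir(ROOT_DIR)
```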
- The scan_data function takes two parameters: the name of the website and the full URL. When it is called, it runs each of our scripts, stores the results in their respective variables, and finally calls the scan_results function (which we will create next) with all of those results as arguments so they can be written to text files.
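A sketch of scan_data under the same assumed module and function names:

```python
def scan_data(name, url):
    # Call each of our scripts and hold on to the results
    domain = domain_name.get_domain_name(url)
    ip = ip_address.get_ip_address(url)
    nmap = nmap_scan.get_nmap('-F', ip)       # '-F' (fast scan) is just an example option
    whois_result = whois_info.get_whois(url)
    # Hand everything off to scan_results, which writes the files
    scan_results(name, url, domain, ip, nmap, whois_result)
```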
- The scan_results function is called from scan_data. It appends the name of the website we want to scan to our root directory path and creates a new sub-directory, then writes the scanned data/results into txt files inside that new directory.
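And a sketch of scan_results (the individual file names are my own choices):

```python
def scan_results(name, url, domain, ip, nmap, whois_result):
    # Create a sub-directory for this website under the root directory
    website_dir = ROOT_DIR + '/' + name
    general.create_dir(website_dir)
    # Write each piece of scanned data into its own txt file in that directory
    general.write_file(website_dir + '/full_url.txt', url)
    general.write_file(website_dir + '/domain_name.txt', domain)
    general.write_file(website_dir + '/ip_address.txt', ip)
    general.write_file(website_dir + '/nmap.txt', nmap)
    general.write_file(website_dir + '/whois.txt', whois_result)
```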
- Now, we can finally hit run on our code and the scripts should save everything!
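At the bottom of main.py, a call like this kicks off the whole pipeline; once it finishes, the results should appear under the root directory:

```python
# Start the scan against the site we have permission to test
scan_data('hackthissite', 'https://www.hackthissite.org')
```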