SiteScraper is an ethical hacking tool developed in Python that enables users to clone websites for educational purposes by copying their HTML, CSS, JavaScript, and PHP files.
Note: Use this tool responsibly and only on sites where you have explicit permission, as unauthorized scraping can lead to legal issues.
Warning: Use SiteScraper only on websites you own or have explicit permission to test and analyze. Unauthorized use of this tool on external sites may violate laws and terms of service.
git clone https://github.com/s-r-e-e-r-a-j/SiteScraper.git
cd SiteScraper
pip3 install -r requirements.txt
cd 'Site Scraper'
sudo python3 install.py
Then enter y to install.
Run SiteScraper from the command line with the following options:
sitescraper <URL> [options]
<URL> The URL of the website to clone
-d, --depth (Optional) Set the maximum crawl depth (default: 3)
-o, --output (Optional) Set the output directory (default: website_clone). You can also pass a full path, e.g., -o /home/kali/Desktop/result
To clone a website up to a depth of 2 and save it in a directory named my_clone, use the following command:
sitescraper https://example.com -d 2 -o /home/kali/Desktop/my_clone
After the cloning process is complete, a directory named after the domain (e.g., http.example.com) will be created inside my_clone.
To view the cloned website, open the index.html file in a browser.
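For example, on a Linux desktop you can launch it straight from the terminal (the path here assumes the my_clone example above):
xdg-open /home/kali/Desktop/my_clone/http.example.com/index.html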
If you see .php files in the directory, it means the website has a PHP backend, and you need to start a PHP server to run it properly.
- Navigate to the Cloned Website Directory
cd /home/kali/Desktop/my_clone/http.example.com
- Start the PHP Server
Replace yourmachineipaddress with your actual local IP (e.g., 192.168.1.5):
php -S yourmachineipaddress:8080
Example:
php -S 192.168.1.5:8080
- Open the Cloned Website in a Browser
In your web browser, enter:
http://yourmachineipaddress:8080
Example:
http://192.168.1.5:8080
Now, you should be able to access and interact with the cloned website.
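If the clone turns out to contain only static files (no .php), a PHP server isn't required; Python's built-in HTTP server is a convenient alternative (the IP and port here match the example above):
python3 -m http.server 8080 --bind 192.168.1.5
Then browse to http://192.168.1.5:8080 as before.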
SiteScraper follows these steps:
- Initial Crawl: Downloads the main page of the target site.
- Recursive Crawling: Finds all internal links, then recursively crawls and saves them.
- Asset Handling: Downloads and saves linked assets (CSS, JS, images).
- File Structure Preservation: Saves files with the same structure as the original website, maintaining directories and paths.
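For illustration, here is a minimal sketch of that crawl loop in Python. This is not SiteScraper's actual source code; it assumes the requests and beautifulsoup4 packages and omits asset downloading and link rewriting:

from urllib.parse import urljoin, urldefrag, urlparse
from pathlib import Path
import requests
from bs4 import BeautifulSoup

def crawl(url, out_dir, depth=3, seen=None):
    # Stop at the depth limit and skip pages we have already saved.
    seen = set() if seen is None else seen
    if depth < 0 or url in seen:
        return
    seen.add(url)
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        return
    # Preserve the original path layout under out_dir.
    rel = urlparse(url).path.lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"
    dest = Path(out_dir) / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(resp.content)
    # Recurse into internal links only (same hostname).
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup.find_all("a", href=True):
        link, _ = urldefrag(urljoin(url, tag["href"]))
        if urlparse(link).netloc == urlparse(url).netloc:
            crawl(link, out_dir, depth - 1, seen)

crawl("https://example.com", "website_clone", depth=2)

The seen set of visited URLs prevents infinite loops when pages link back to each other.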
To uninstall, run the installer again:
cd SiteScraper
cd 'Site Scraper'
sudo python3 install.py
Then enter n to uninstall.
This project is licensed under the MIT License.