Snaptrack is a site-snapshot and change-tracking tool. It captures the HTML of any given site (or set of pages), stores snapshots (including HTTP headers) in a local SQLite database, and highlights differences between consecutive snapshots.
- Overview
- Features
- Getting Started
- Usage
- Storing HTTP Headers & Status Codes
- Use Cases
- Configuration & Customization
- Contributing
- License
Snaptrack is a Go-based application designed to monitor websites for changes over time. It does this by:
- Fetching the raw HTML (and HTTP headers) from a URL or recursively crawling an entire domain.
- Storing each snapshot in a local SQLite database.
- Comparing each new snapshot to the previous version for that page and presenting a unified diff of what changed.
The tool can be run in CLI mode for batch usage (crawl, check, etc.) or in a TUI (Text-based User Interface) for interactive exploration of snapshots.
- Recursive Crawl: Optionally follow links within the same domain to capture snapshots of multiple pages.
- SQLite Database: Stores snapshots locally—simple, lightweight, no external DB required.
- Diff Highlights: Compares new HTML to the previous version for each page, generating a unified diff (with optional color).
- TUI Interface: A text-based interface lets you browse tracked URLs, see changes, and re-check pages on demand.
- Stores Request/Response Headers: Helpful for auditing server headers, analyzing status codes, and monitoring security headers over time.
- Raw HTTP Approach (No Headless Browser): Faster and simpler for static or server-rendered pages.
- (If needed, revert to a headless browser approach for JavaScript-heavy sites.)
- Go (version 1.18+ recommended).
- A SQLite driver (e.g., github.com/mattn/go-sqlite3), installed automatically via go mod tidy. That driver uses cgo, so a C compiler such as gcc must be available.
- (Optional) A color-supporting terminal for color-coded diffs.
- Clone this repository:
git clone https://github.com/copyleftdev/snaptrack.git
- Change to the directory:
cd snaptrack
- Install dependencies:
go mod tidy
Use the Makefile:
make build
make run
Or manually:
go build -o bin/snapstack ./cmd/snapstack
The snapstack executable is placed in ./bin/.
Snaptrack can be invoked via CLI subcommands or launched in a TUI if no arguments are provided.
./bin/snapstack crawl https://example.com --max-depth=2
- Crawl the specified domain (example.com) recursively, up to 2 levels deep.
- Store HTML snapshots (and headers/status codes) in snapshots.db.
- Show diff output if changes are detected on subsequent crawls.
./bin/snapstack check https://example.com
(Example usage, if implemented: checks a single page.)
./bin/snapstack diff https://example.com
(Example usage, if implemented: shows a unified diff of the page's last two snapshots.)
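Comparing the last two snapshots comes down to a line diff. The sketch below builds a longest-common-subsequence table and prefixes each line with a space, - or + in the style of a unified diff. It omits the @@ hunk headers and context trimming that tools like diff -u produce. The lineDiff and colorize names are hypothetical. colorize shows one way to add optional ANSI coloring: red for removals, green for additions.

```go
package main

import (
	"fmt"
	"strings"
)

// lineDiff compares two texts line by line using a longest-common-subsequence
// (LCS) table. Unchanged lines are prefixed with " ", removed lines with "-",
// and added lines with "+". Hunk headers (@@ ... @@) are not produced.
func lineDiff(oldText, newText string) []string {
	a := strings.Split(oldText, "\n")
	b := strings.Split(newText, "\n")
	// lcs[i][j] holds the LCS length of a[i:] and b[j:].
	lcs := make([][]int, len(a)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			if a[i] == b[j] {
				lcs[i][j] = lcs[i+1][j+1] + 1
			} else if lcs[i+1][j] >= lcs[i][j+1] {
				lcs[i][j] = lcs[i+1][j]
			} else {
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}
	// Walk the table, emitting kept, removed, and added lines in order.
	var out []string
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, " "+a[i])
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			out = append(out, "-"+a[i])
			i++
		default:
			out = append(out, "+"+b[j])
			j++
		}
	}
	for ; i < len(a); i++ {
		out = append(out, "-"+a[i])
	}
	for ; j < len(b); j++ {
		out = append(out, "+"+b[j])
	}
	return out
}

// colorize wraps removed lines in red and added lines in green ANSI escapes.
func colorize(line string) string {
	switch {
	case strings.HasPrefix(line, "-"):
		return "\x1b[31m" + line + "\x1b[0m"
	case strings.HasPrefix(line, "+"):
		return "\x1b[32m" + line + "\x1b[0m"
	default:
		return line
	}
}

func main() {
	prev := "<h1>Title</h1>\n<p>Old text</p>\n<footer>2023</footer>"
	curr := "<h1>Title</h1>\n<p>New text</p>\n<footer>2023</footer>"
	for _, line := range lineDiff(prev, curr) {
		fmt.Println(colorize(line))
	}
}
```

The table's memory use grows with the product of the two line counts. That is fine for typical pages but worth keeping in mind for very large documents.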
./bin/snapstack
- Launches a text-based interface to:
- List all tracked URLs in your DB.
- Select a URL to see if it changed.
- Press d for diff output, r to recapture, etc.
- Press q or Esc to quit.
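Dispatching between subcommands and the TUI can be sketched with the standard flag package. One detail needs handling: flag parsing stops at the first non-flag argument. In crawl https://example.com --max-depth=2, the flag would therefore be ignored. The sketch below separates flags from positional arguments first. The command type, parseArgs, and the max-depth default of 1 are hypothetical, and only the --flag=value form is supported (a space-separated value would be treated as a positional argument).

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// command holds a parsed invocation; names mirror the usage examples above.
type command struct {
	Name     string // "crawl", "check", "diff", or "tui" when no arguments are given
	URL      string
	MaxDepth int
}

// parseArgs maps os.Args[1:] to a command. No arguments means "launch the TUI".
func parseArgs(args []string) (command, error) {
	if len(args) == 0 {
		return command{Name: "tui"}, nil
	}
	cmd := command{Name: args[0]}
	fs := flag.NewFlagSet(cmd.Name, flag.ContinueOnError)
	fs.IntVar(&cmd.MaxDepth, "max-depth", 1, "link levels to follow (hypothetical default)")

	// flag stops at the first non-flag argument, so split flags from the URL
	// ourselves; this lets the flag appear after the URL.
	var flags, positional []string
	for _, a := range args[1:] {
		if len(a) > 1 && a[0] == '-' {
			flags = append(flags, a)
		} else {
			positional = append(positional, a)
		}
	}
	if err := fs.Parse(flags); err != nil {
		return command{}, err
	}

	switch cmd.Name {
	case "crawl", "check", "diff":
		if len(positional) != 1 {
			return command{}, fmt.Errorf("%s: expected exactly one URL", cmd.Name)
		}
		cmd.URL = positional[0]
	default:
		return command{}, fmt.Errorf("unknown command %q", cmd.Name)
	}
	return cmd, nil
}

func main() {
	cmd, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cmd)
}
```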
By default, Snaptrack now captures and stores:
- Request Headers (the final headers sent, such as User-Agent).
- Response Headers (e.g., Content-Type, Set-Cookie, Cache-Control).
- HTTP Status Code (e.g., 200, 301, 404).
Headers are stored as JSON in the request_headers and response_headers columns of the snapshots table. The status code is stored in an integer status_code column. You can optionally parse or display this data in the TUI or in logs to monitor header changes or track server responses over time.
- Maintain a historical record of content changes over time.
- Quickly identify any unapproved modifications or mistakes in text or layout.
- Monitor a public site for unexpected or malicious header changes or inserted scripts.
- Diff after each deployment or scheduled check to confirm the site hasn’t been tampered with.
- Helps detect defacement, backdoors, or suspicious header values if an attacker alters responses.
- Compare staging and production pages by capturing snapshots from each environment.
- Confirm no undesired changes slipped into a new release, both in HTML and server headers.
- Record each build’s output so you can see exactly what changed from one version to the next.
- Track how metadata, headings, or content changes might affect SEO.
- Keep a historical log of keyword or content modifications.
- Database Path: Defaults to snapshots.db in the current directory. Change it in main.go or via environment variables as desired.
- Crawl Depth & Concurrency: --max-depth, together with internal concurrency settings, controls the scope and speed of crawling.
- Timeout: Each HTTP request uses a default timeout of about 15 seconds. Adjust it in capture.go if needed.
- Unified Diff: Snaptrack produces a standard unified diff. For color highlighting, make sure your terminal supports ANSI escape codes, or pipe the output to an external tool.
We welcome contributions! Please:
- Fork this repo & create a feature branch.
- Submit a pull request when ready.
- Open an issue to discuss features, request improvements, or report bugs.
Snaptrack is licensed under the MIT License. See the LICENSE file for more info.