
Snaptrack

Snaptrack is a site-snapshot and change-tracking tool. It captures the HTML of any given site (or set of pages), stores snapshots (including HTTP headers) in a local SQLite database, and highlights differences between consecutive snapshots.

Table of Contents

  1. Overview
  2. Features
  3. Getting Started
    1. Prerequisites
    2. Installation
    3. Building from Source
  4. Usage
    1. Crawling a Domain
    2. Checking or Diffing a Single URL
    3. TUI (Interactive Mode)
  5. Storing HTTP Headers & Status Codes
  6. Use Cases
    1. Site Owners & Content Managers
    2. Security Teams & Professionals
    3. Testers & QA Engineers
    4. SEO & Marketing Analysts
  7. Configuration & Customization
  8. Contributing
  9. License

Overview

Snaptrack is a Go-based application designed to monitor websites for changes over time. It does this by:

  1. Fetching the raw HTML (and HTTP headers) from a URL or recursively crawling an entire domain.
  2. Storing each snapshot in a local SQLite database.
  3. Comparing each new snapshot to the previous version for that page and presenting a unified diff of what changed.

The tool can be run in CLI mode for batch usage (crawl, check, etc.) or in a TUI (Text-based User Interface) for interactive exploration of snapshots.


Features

  • Recursive Crawl: Optionally follow links within the same domain to capture snapshots of multiple pages.
  • SQLite Database: Stores snapshots locally. Simple and lightweight, with no external database required.
  • Diff Highlights: Compares new HTML to the previous version for each page, generating a unified diff (with optional color).
  • TUI Interface: A text-based interface lets you browse tracked URLs, see changes, and re-check pages on demand.
  • Stores Request/Response Headers: Helpful for auditing server headers, analyzing status codes, and monitoring security headers over time.
  • Raw HTTP Approach (No Headless Browser): Faster and simpler for static or server-rendered pages.
    • (For JavaScript-heavy sites, a headless browser approach may still be needed, since content rendered on the client side is not present in the raw HTML.)

Getting Started

1. Prerequisites

  • Go (version 1.18+ recommended).
  • A SQLite driver (e.g., github.com/mattn/go-sqlite3), installed automatically via go mod tidy. Note that go-sqlite3 uses cgo, so a C compiler must be available.
  • (Optional) A color-supporting terminal for color-coded diffs.

2. Installation

  1. Clone this repository:
    git clone https://github.com/copyleftdev/snaptrack.git
  2. Change to the directory:
    cd snaptrack
  3. Install dependencies:
    go mod tidy

3. Building from Source

Use the Makefile:

make build
make run

Or manually:

go build -o bin/snapstack ./cmd/snapstack

The executable snapstack is placed in ./bin/.


Usage

Snaptrack can be invoked via CLI subcommands or launched in a TUI if no arguments are provided.

1. Crawling a Domain

./bin/snapstack crawl https://example.com --max-depth=2
  • Crawl the specified domain (example.com) recursively up to 2 levels.
  • Store HTML snapshots (and headers/status code) in snapshots.db.
  • Show diff logs if changes are detected on subsequent crawls.

2. Checking or Diffing a Single URL

./bin/snapstack check https://example.com

(Example usage, if implemented: checks a single page.)

./bin/snapstack diff https://example.com

(Example usage, if implemented: shows a unified diff of that page’s last two snapshots.)

3. TUI (Interactive Mode)

./bin/snapstack
  • Launches a text-based interface to:
    • List all tracked URLs in your DB.
    • Select a URL to see if it changed.
    • Press d for diff output, r to recapture, etc.
    • Press q or Esc to quit.

Storing HTTP Headers & Status Codes

By default, Snaptrack captures and stores:

  • Request Headers (the final headers sent, such as User-Agent).
  • Response Headers (e.g. Content-Type, Set-Cookie, Cache-Control).
  • HTTP Status Code (e.g. 200, 301, 404).

They’re stored as JSON in the snapshots table under request_headers and response_headers columns, along with an integer status_code. You can optionally parse or display this data in your TUI or logs to monitor header changes or track server responses over time.


Use Cases

1. Site Owners & Content Managers

  • Maintain a historical record of content changes over time.
  • Quickly identify any unapproved modifications or mistakes in text or layout.

2. Security Teams & Professionals

  • Monitor a public site for unexpected or malicious header changes or inserted scripts.
  • Diff after each deployment or scheduled check to confirm the site hasn’t been tampered with.
  • Helps detect defacement, backdoors, or suspicious header values if an attacker alters responses.

3. Testers & QA Engineers

  • Compare staging and production pages by capturing snapshots from each environment.
  • Confirm no undesired changes slipped into a new release, both in HTML and server headers.
  • Record each build’s output so you can see exactly what changed from one version to the next.

4. SEO & Marketing Analysts

  • Track how metadata, headings, or content changes might affect SEO.
  • Keep a historical log of keyword or content modifications.

Configuration & Customization

  • Database Path: Defaults to snapshots.db in the current directory. Change it in main.go or via environment variables as desired.
  • Crawl Depth & Concurrency: --max-depth plus internal concurrency settings let you control the scope and speed of crawling.
  • Timeout: Each HTTP request uses a default timeout of about 15 seconds. Adjust it in capture.go if needed.
  • Unified Diff: We produce a standard “unified diff.” For color highlighting, ensure your terminal supports ANSI or integrate with external tools.

Contributing

We welcome contributions! Please:

  1. Fork this repo & create a feature branch.
  2. Submit a pull request when ready.
  3. Open an issue to discuss features, request improvements, or report bugs.

License

Snaptrack is licensed under the MIT License. See the LICENSE file for more info.
