HuB committed on
Commit ·
cc4181a
1
Parent(s): f71e0c8
Expand API endpoints and add comprehensive documentation (README_FULL.md)
- README.md +23 -1
- README_FULL.md +60 -0
- api/routes.py +44 -6
README.md
CHANGED
|
@@ -53,5 +53,27 @@ To enable full functionality, set the following in your `.env`:
|
|
| 53 |
- `GOOGLE_API_KEY`: For Safe Browsing checks.
|
| 54 |
- `FAST_CHECK_INTERVAL`: Frequency of uptime checks (default: 60s).
|
| 55 |
|
| 56 |
## Deployment
|
| 57 |
-
WebGuard is pre-configured for deployment on Hugging Face Spaces using the provided `Dockerfile`. It follows the standard port (7860) and user (1000) requirements for secure execution.
|
|
|
|
| 53 |
- `GOOGLE_API_KEY`: For Safe Browsing checks.
|
| 54 |
- `FAST_CHECK_INTERVAL`: Frequency of uptime checks (default: 60s).
|
| 55 |
|
| 56 |
+
## API Reference
|
| 57 |
+
|
| 58 |
+
WebGuard provides a comprehensive REST API for triggering scans and retrieving monitoring data.
|
| 59 |
+
|
| 60 |
+
### 1. Unified Full Scan
|
| 61 |
+
`POST /api/scan/full`
|
| 62 |
+
Runs all available checks (SSL, DNS, HTTP, Port, Ping, Blacklist) and provides an AI-powered analysis summary.
|
| 63 |
+
|
| 64 |
+
### 2. Protocol-Specific Scans
|
| 65 |
+
- `POST /api/scan/ssl`: Targeted SSL certificate audit.
|
| 66 |
+
- `POST /api/scan/dns`: Deep DNS record verification.
|
| 67 |
+
- `POST /api/scan/ping`: Latency and reachability check.
|
| 68 |
+
- `POST /api/scan/port`: Common service discovery.
|
| 69 |
+
- `POST /api/scan/http`: Web server and redirect analysis.
|
| 70 |
+
- `POST /api/scan/blacklist`: Reputation and phishing database lookup.
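As a rough illustration of calling the scan endpoints listed above, the sketch below builds a request with the standard library only. It assumes the JSON body carries `url` and `monitor` fields (the route handlers in this commit read `req.url` and `req.monitor`); the helper name `build_scan_request` and the base URL `http://localhost:7860` are hypothetical.

```python
import json
import urllib.request

# Scan kinds taken from the endpoint list above.
SCAN_KINDS = {"full", "ssl", "dns", "ping", "port", "http", "blacklist"}


def build_scan_request(base_url: str, kind: str, target: str,
                       monitor: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/scan/<kind>."""
    if kind not in SCAN_KINDS:
        raise ValueError(f"unknown scan kind: {kind!r}")
    # Assumed body shape: {"url": ..., "monitor": ...}
    body = json.dumps({"url": target, "monitor": monitor}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/scan/{kind}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_scan_request("http://localhost:7860", "ssl", "https://example.com")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before pointing the client at a live instance.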
|
| 71 |
+
|
| 72 |
+
### 3. Monitoring Management
|
| 73 |
+
- `GET /api/urls`: List all currently monitored URLs.
|
| 74 |
+
- `POST /api/monitor/add?url={url}`: Start continuous monitoring for a site.
|
| 75 |
+
- `POST /api/monitor/remove?url={url}`: Stop monitoring.
|
| 76 |
+
- `GET /api/results/{url}`: Retrieve historical scan data for a specific site.
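The monitoring endpoints above take the target URL either as a query parameter or as part of the path, so it needs escaping. The following is a minimal sketch of building those URLs; the helper names and the base URL are hypothetical, the `limit` parameter is taken from the `get_results` handler in `api/routes.py`, and percent-encoding the path segment is an assumption about what the server accepts.

```python
from urllib.parse import quote, urlencode


def monitor_url(base_url: str, action: str, target: str) -> str:
    """URL for POST /api/monitor/add or /api/monitor/remove."""
    if action not in ("add", "remove"):
        raise ValueError(f"unknown action: {action!r}")
    query = urlencode({"url": target})  # escapes ':', '/', '?', '=' and so on
    return f"{base_url.rstrip('/')}/api/monitor/{action}?{query}"


def results_url(base_url: str, target: str, limit: int = 20) -> str:
    """URL for GET /api/results/{url}, with the target escaped as one segment."""
    return f"{base_url.rstrip('/')}/api/results/{quote(target, safe='')}?limit={limit}"


if __name__ == "__main__":
    print(monitor_url("http://localhost:7860", "add", "https://example.com"))
    print(results_url("http://localhost:7860", "https://example.com", limit=5))
```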
|
| 77 |
+
|
| 78 |
## Deployment
|
| 79 |
+
WebGuard is pre-configured for deployment on Hugging Face Spaces using the provided `Dockerfile`. It follows the standard port (7860) and user (1000) requirements for secure execution.
|
README_FULL.md
ADDED
|
@@ -0,0 +1,60 @@
|
|
| 1 |
+
# WebGuard: Full Project Architecture & Technical Specification
|
| 2 |
+
|
| 3 |
+
## 1. Project Overview
|
| 4 |
+
WebGuard is a modular, asynchronous monitoring engine designed to provide high-fidelity insights into web infrastructure. Unlike traditional uptime monitors that only perform simple HTTP "pings," WebGuard audits the entire stack, from low-level ICMP reachability to high-level visual consistency.
|
| 5 |
+
|
| 6 |
+
## 2. Technical Stack & Tool Selection
|
| 7 |
+
|
| 8 |
+
### Backend: FastAPI
|
| 9 |
+
- **Why**: Chosen for its native support for `asyncio`, which is critical for a monitoring tool that performs hundreds of concurrent network requests. It provides automatic OpenAPI documentation and high performance.
|
| 10 |
+
- **Workflow**: Serves as the API gateway and orchestrates the check lifecycle via a Pydantic-validated request/response model.
|
| 11 |
+
|
| 12 |
+
### Automation: Playwright (Chromium)
|
| 13 |
+
- **Why**: A widely used browser automation tool. Compared with Selenium, it tends to be faster and more reliable, and it handles modern SPAs (Single Page Applications) natively.
|
| 14 |
+
- **Role**: Powers the `crawler` for subpage auditing, `visual_regression` for layout shift detection, and `screenshot` for visual evidence of outages.
|
| 15 |
+
|
| 16 |
+
### Scheduler: APScheduler
|
| 17 |
+
- **Why**: A flexible, in-process scheduler that doesn't require an external message broker like Redis or RabbitMQ.
|
| 18 |
+
- **Workflows**:
|
| 19 |
+
- **Fast (60s)**: Lightweight checks (Ping, HTTP, SSL).
|
| 20 |
+
- **Medium (300s)**: Heuristic checks (Blacklists, Headers, DNS).
|
| 21 |
+
- **Heavy (900s)**: Resource-intensive checks (Browser crawling, Visual regression).
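The three tiers above can be pictured with a small standard-library sketch. This is not the APScheduler setup itself, only an illustration of which checks would fall due at a given elapsed time; the `TIERS` table, the check names, and `due_checks` are hypothetical.

```python
# Tier name -> (interval in seconds, checks in that tier), mirroring the list above.
TIERS = {
    "fast": (60, ["ping", "http", "ssl"]),
    "medium": (300, ["blacklist", "headers", "dns"]),
    "heavy": (900, ["crawler", "visual_regression"]),
}


def due_checks(elapsed_seconds: int) -> list[str]:
    """Return the checks whose tier interval evenly divides the elapsed time."""
    due: list[str] = []
    for interval, checks in TIERS.values():
        if elapsed_seconds > 0 and elapsed_seconds % interval == 0:
            due.extend(checks)
    return due


if __name__ == "__main__":
    for t in (60, 300, 900):
        print(t, due_checks(t))
```

With an in-process scheduler, each tier would more likely be registered as its own interval job; the modulo check here only shows how the tiers overlap.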
|
| 22 |
+
|
| 23 |
+
### Database: SQLite & SQLAlchemy
|
| 24 |
+
- **Why**: SQLite is a zero-config, serverless database ideal for self-hosted apps on platforms like Hugging Face Spaces. SQLAlchemy provides a robust ORM to handle complex queries for historical data analysis.
|
| 25 |
+
|
| 26 |
+
### AI Engine: Anthropic Claude (API)
|
| 27 |
+
- **Why**: Superior reasoning capabilities for technical debugging.
|
| 28 |
+
- **Role**: Analyzes raw JSON results from all scanners and provides a human-readable "semantic" explanation of what exactly is failing (e.g., "The site is up, but your SSL certificate is mismatched for the WWW subdomain").
|
| 29 |
+
|
| 30 |
+
## 3. Directory Structure
|
| 31 |
+
|
| 32 |
+
```text
|
| 33 |
+
uptime/
|
| 34 |
+
├── ai/ # AI Analysis logic (Anthropic integration)
|
| 35 |
+
├── api/ # FastAPI routes and Pydantic schemas
|
| 36 |
+
├── browser/ # Playwright-based browser automation (Crawler, Screenshots)
|
| 37 |
+
├── checkers/ # Protocol-specific scanner modules (SSL, DNS, Port, etc.)
|
| 38 |
+
├── config/ # Application settings and logging configuration
|
| 39 |
+
├── frontend/ # Static HTML/JS for the dashboard
|
| 40 |
+
├── scheduler/ # Background task management and monitoring cycles
|
| 41 |
+
├── storage/ # Database models, migrations, and initialization
|
| 42 |
+
├── app.py # Main entry point (FastAPI initialization)
|
| 43 |
+
└── Dockerfile # Containerization for Hugging Face Spaces
|
| 44 |
+
```
|
| 45 |
+
|
| 46 |
+
## 4. How It Works (The Lifecycle)
|
| 47 |
+
|
| 48 |
+
1. **Request**: A user or the scheduler triggers a scan for a URL.
|
| 49 |
+
2. **Orchestration**: The `runner.py` in the scheduler module picks up the request. It dynamically imports and executes the relevant modules from the `checkers/` and `browser/` directories.
|
| 50 |
+
3. **Execution**: Scanners run in parallel using `asyncio.gather` to maximize performance.
|
| 51 |
+
4. **Logging**: Every step is captured by the centralized logging system, providing a real-time audit trail in `webguard.log`.
|
| 52 |
+
5. **Persistence**: Results are normalized and stored in the SQLite database.
|
| 53 |
+
6. **AI Analysis**: If issues are detected, the raw data is sent to the AI Agent. The agent returns a root-cause summary.
|
| 54 |
+
7. **Response**: The API returns the combined results, including the AI's semantic interpretation.
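The parallel execution step in the lifecycle above can be sketched as follows. This is a minimal illustration, not the project's `runner.py`: `fake_check` stands in for real checker modules, and `run_checks` is a hypothetical helper. Passing `return_exceptions=True` to `asyncio.gather` keeps one failing checker from discarding the results of the others.

```python
import asyncio


async def fake_check(name: str, delay: float, fail: bool = False) -> dict:
    """Stand-in for a real checker; sleeps, then succeeds or raises."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} check failed")
    return {"status": "ok"}


async def run_checks(checks: dict) -> dict:
    """Run all checker coroutines concurrently and collect per-check results."""
    names = list(checks)
    outcomes = await asyncio.gather(
        *(checks[name] for name in names), return_exceptions=True
    )
    results = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            # Record the failure instead of aborting the whole scan.
            results[name] = {"status": "error", "detail": str(outcome)}
        else:
            results[name] = outcome
    return results


if __name__ == "__main__":
    report = asyncio.run(run_checks({
        "ssl": fake_check("ssl", 0.01),
        "dns": fake_check("dns", 0.0, fail=True),
    }))
    print(report)
```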
|
| 55 |
+
|
| 56 |
+
## 5. Security & Deployment Standards
|
| 57 |
+
WebGuard is built with security-first principles:
|
| 58 |
+
- **Isolation**: Runs as a non-privileged user (UID 1000) inside Docker.
|
| 59 |
+
- **Data Privacy**: All monitoring data stays on your local storage; only sanitized scan results are sent to the optional AI provider.
|
| 60 |
+
- **Reliability**: Checks monitored URLs against PhishTank and Google Safe Browsing so that blacklisting or phishing reports affecting your brand reputation are flagged early.
|
api/routes.py
CHANGED
|
@@ -1,16 +1,24 @@
|
|
| 1 |
from fastapi import APIRouter, HTTPException
|
| 2 |
-
from fastapi.responses import FileResponse
|
| 3 |
from api.schemas import ScanRequest, ScanResponse
|
| 4 |
from scheduler.runner import run_all_now, add_url, remove_url, _monitored_urls
|
| 5 |
-
from storage.db import get_latest_results, get_all_urls
|
| 6 |
-
import
|
| 7 |
|
|
|
|
| 8 |
router = APIRouter()
|
| 9 |
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
    try:
|
|
|
|
| 14 |
        result = await run_all_now(req.url)
|
| 15 |
        if req.monitor:
|
| 16 |
            add_url(req.url)
|
|
@@ -20,8 +28,38 @@ async def scan(req: ScanRequest):
|
|
| 20 |
            results=result["results"],
|
| 21 |
        )
|
| 22 |
    except Exception as e:
|
|
|
|
| 23 |
        raise HTTPException(status_code=500, detail=str(e))
|
| 24 |
|
| 25 |
|
| 26 |
@router.get("/results/{url:path}")
|
| 27 |
async def get_results(url: str, limit: int = 20):
|
|
|
|
| 1 |
from fastapi import APIRouter, HTTPException
|
|
|
|
| 2 |
from api.schemas import ScanRequest, ScanResponse
|
| 3 |
from scheduler.runner import run_all_now, add_url, remove_url, _monitored_urls
|
| 4 |
+
from storage.db import get_latest_results, get_all_urls
|
| 5 |
+
import checkers.ssl_check as ssl_check
|
| 6 |
+
import checkers.dns_check as dns_check
|
| 7 |
+
import checkers.http_check as http_check
|
| 8 |
+
import checkers.ping_check as ping_check
|
| 9 |
+
import checkers.port_check as port_check
|
| 10 |
+
import checkers.blacklist_check as blacklist_check
|
| 11 |
+
import checkers.multi_scanner as multi_scanner
|
| 12 |
+
from config.logging import get_logger
|
| 13 |
|
| 14 |
+
logger = get_logger("api.routes")
|
| 15 |
router = APIRouter()
|
| 16 |
|
| 17 |
+
@router.post("/scan/full", response_model=ScanResponse)
|
| 18 |
+
async def scan_full(req: ScanRequest):
|
| 19 |
+
    """Run all available checks and return a unified report."""
|
| 20 |
    try:
|
| 21 |
+
        logger.info(f"Full scan requested for: {req.url}")
|
| 22 |
        result = await run_all_now(req.url)
|
| 23 |
        if req.monitor:
|
| 24 |
            add_url(req.url)
|
|
|
|
| 28 |
            results=result["results"],
|
| 29 |
        )
|
| 30 |
    except Exception as e:
|
| 31 |
+
        logger.error(f"Full scan failed: {e}")
|
| 32 |
        raise HTTPException(status_code=500, detail=str(e))
|
| 33 |
|
| 34 |
+
@router.post("/scan/ssl")
|
| 35 |
+
async def scan_ssl(req: ScanRequest):
|
| 36 |
+
    return await ssl_check.run(req.url)
|
| 37 |
+
|
| 38 |
+
@router.post("/scan/dns")
|
| 39 |
+
async def scan_dns(req: ScanRequest):
|
| 40 |
+
    return await dns_check.run(req.url)
|
| 41 |
+
|
| 42 |
+
@router.post("/scan/http")
|
| 43 |
+
async def scan_http(req: ScanRequest):
|
| 44 |
+
    return await http_check.run(req.url)
|
| 45 |
+
|
| 46 |
+
@router.post("/scan/ping")
|
| 47 |
+
async def scan_ping(req: ScanRequest):
|
| 48 |
+
    return await ping_check.run(req.url)
|
| 49 |
+
|
| 50 |
+
@router.post("/scan/port")
|
| 51 |
+
async def scan_port(req: ScanRequest):
|
| 52 |
+
    return await port_check.run(req.url)
|
| 53 |
+
|
| 54 |
+
@router.post("/scan/blacklist")
|
| 55 |
+
async def scan_blacklist(req: ScanRequest):
|
| 56 |
+
    return await blacklist_check.run(req.url)
|
| 57 |
+
|
| 58 |
+
@router.post("/scan", response_model=ScanResponse)
|
| 59 |
+
async def scan_legacy(req: ScanRequest):
|
| 60 |
+
    # Keep original /scan for backward compatibility
|
| 61 |
+
    return await scan_full(req)
|
| 62 |
+
|
| 63 |
|
| 64 |
@router.get("/results/{url:path}")
|
| 65 |
async def get_results(url: str, limit: int = 20):
|