HuB committed
Commit cc4181a · 1 Parent(s): f71e0c8

Expand API endpoints and add comprehensive documentation (README_FULL.md)

Files changed (3)
  1. README.md +23 -1
  2. README_FULL.md +60 -0
  3. api/routes.py +44 -6
README.md CHANGED
@@ -53,5 +53,27 @@ To enable full functionality, set the following in your `.env`:
  - `GOOGLE_API_KEY`: For Safe Browsing checks.
  - `FAST_CHECK_INTERVAL`: Frequency of uptime checks (default: 60s).

+ ## API Reference
+
+ WebGuard provides a comprehensive REST API for triggering scans and retrieving monitoring data.
+
+ ### 1. Unified Full Scan
+ `POST /api/scan/full`
+ Runs all available checks (SSL, DNS, HTTP, Port, Ping, Blacklist) and provides an AI-powered analysis summary.
+
+ ### 2. Protocol-Specific Scans
+ - `POST /api/scan/ssl`: Targeted SSL certificate audit.
+ - `POST /api/scan/dns`: Deep DNS record verification.
+ - `POST /api/scan/ping`: Latency and reachability check.
+ - `POST /api/scan/port`: Common service discovery.
+ - `POST /api/scan/http`: Web server and redirect analysis.
+ - `POST /api/scan/blacklist`: Reputation and phishing database lookup.
+
+ ### 3. Monitoring Management
+ - `GET /api/urls`: List all currently monitored URLs.
+ - `POST /api/monitor/add?url={url}`: Start continuous monitoring for a site.
+ - `POST /api/monitor/remove?url={url}`: Stop monitoring.
+ - `GET /api/results/{url}`: Retrieve historical scan data for a specific site.
+
  ## Deployment
- WebGuard is pre-configured for deployment on Hugging Face Spaces using the provided `Dockerfile`. It follows the standard port (7860) and user (1000) requirements for secure execution.
+ WebGuard is pre-configured for deployment on Hugging Face Spaces using the provided `Dockerfile`. It follows the standard port (7860) and user (1000) requirements for secure execution.
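As a rough illustration of how a client might call the endpoints listed in the API Reference above, the sketch below builds (but does not send) the HTTP requests using only the Python standard library. Several things here are assumptions rather than facts from the commit: the base URL `http://localhost:7860` is guessed from the port named in the Deployment section, the helper names `build_scan_request` and `build_monitor_request` are hypothetical, and the JSON body fields `url` and `monitor` are inferred from how `req.url` and `req.monitor` are used in `api/routes.py`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local instance listening on the Spaces default port.
BASE_URL = "http://localhost:7860"


def build_scan_request(kind: str, target: str, monitor: bool = False) -> Request:
    """Build a POST request for /api/scan/{kind} with a JSON body.

    `kind` is one of the path segments from the README
    (full, ssl, dns, ping, port, http, blacklist).
    """
    body = json.dumps({"url": target, "monitor": monitor}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/scan/{kind}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_monitor_request(action: str, target: str) -> Request:
    """Build a POST request for /api/monitor/add or /api/monitor/remove.

    The monitored URL travels as a query parameter, so it has to be
    percent-encoded; urlencode takes care of that.
    """
    query = urlencode({"url": target})
    return Request(f"{BASE_URL}/api/monitor/{action}?{query}", method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it. Encoding the `url` query parameter matters because a monitored URL that itself contains `?` or `&` would otherwise be split into separate parameters.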
README_FULL.md ADDED
@@ -0,0 +1,60 @@
+ # WebGuard: Full Project Architecture & Technical Specification
+
+ ## 1. Project Overview
+ WebGuard is a modular, asynchronous monitoring engine designed to provide high-fidelity insights into web infrastructure. Unlike traditional uptime monitors that only perform simple HTTP "pings," WebGuard audits the entire stack, from low-level ICMP reachability to high-level visual consistency.
+
+ ## 2. Technical Stack & Tool Selection
+
+ ### Backend: FastAPI
+ - **Why**: Chosen for its native support for `asyncio`, which is critical for a monitoring tool that performs hundreds of concurrent network requests. It provides automatic OpenAPI documentation and high performance.
+ - **Workflow**: Serves as the API gateway and orchestrates the check lifecycle via a Pydantic-validated request/response model.
+
+ ### Automation: Playwright (Chromium)
+ - **Why**: Industry-leading browser automation tool. Unlike Selenium, it is faster, more reliable, and handles modern SPAs (Single Page Applications) natively.
+ - **Role**: Powers the `crawler` for subpage auditing, `visual_regression` for layout shift detection, and `screenshot` for visual evidence of outages.
+
+ ### Scheduler: APScheduler
+ - **Why**: A flexible, in-process scheduler that doesn't require an external message broker like Redis or RabbitMQ.
+ - **Workflows**:
+   - **Fast (60s)**: Lightweight checks (Ping, HTTP, SSL).
+   - **Medium (300s)**: Heuristic checks (Blacklists, Headers, DNS).
+   - **Heavy (900s)**: Resource-intensive checks (Browser crawling, Visual regression).
+
+ ### Database: SQLite & SQLAlchemy
+ - **Why**: SQLite is a zero-config, serverless database ideal for self-hosted apps on platforms like Hugging Face Spaces. SQLAlchemy provides a robust ORM to handle complex queries for historical data analysis.
+
+ ### AI Engine: Anthropic Claude (API)
+ - **Why**: Superior reasoning capabilities for technical debugging.
+ - **Role**: Analyzes raw JSON results from all scanners and provides a human-readable "semantic" explanation of what exactly is failing (e.g., "The site is up, but your SSL certificate is mismatched for the WWW subdomain").
+
+ ## 3. Directory Structure
+
+ ```text
+ uptime/
+ ├── ai/ # AI Analysis logic (Anthropic integration)
+ ├── api/ # FastAPI routes and Pydantic schemas
+ ├── browser/ # Playwright-based browser automation (Crawler, Screenshots)
+ ├── checkers/ # Protocol-specific scanner modules (SSL, DNS, Port, etc.)
+ ├── config/ # Application settings and logging configuration
+ ├── frontend/ # Static HTML/JS for the dashboard
+ ├── scheduler/ # Background task management and monitoring cycles
+ ├── storage/ # Database models, migrations, and initialization
+ ├── app.py # Main entry point (FastAPI initialization)
+ └── Dockerfile # Containerization for Hugging Face Spaces
+ ```
+
+ ## 4. How It Works (The Lifecycle)
+
+ 1. **Request**: A user or the scheduler triggers a scan for a URL.
+ 2. **Orchestration**: The `runner.py` in the scheduler module picks up the request. It dynamically imports and executes the relevant modules from the `checkers/` and `browser/` directories.
+ 3. **Execution**: Scanners run in parallel using `asyncio.gather` to maximize performance.
+ 4. **Logging**: Every step is captured by the centralized logging system, providing a real-time audit trail in `webguard.log`.
+ 5. **Persistence**: Results are normalized and stored in the SQLite database.
+ 6. **AI Analysis**: If issues are detected, the raw data is sent to the AI Agent. The agent returns a root-cause summary.
+ 7. **Response**: The API returns the combined results, including the AI's semantic interpretation.
+
+ ## 5. Security & Deployment Standards
+ WebGuard is built with security-first principles:
+ - **Isolation**: Runs as a non-privileged user (UID 1000) inside Docker.
+ - **Data Privacy**: All monitoring data stays on your local storage; only sanitized scan results are sent to the optional AI provider.
+ - **Reliability**: Uses PhishTank and Google Safe Browsing to provide a "Shield" for your brand reputation.
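Step 3 of the lifecycle in README_FULL.md says scanners run in parallel via `asyncio.gather`. A minimal sketch of that pattern, using stand-in checker coroutines, might look like the following. The names `fake_ssl_check`, `fake_ping_check`, and `run_checks` are hypothetical and not taken from the repository, and passing `return_exceptions=True` is only an assumption about how one failing check could be kept from discarding the others; the actual `runner.py` is not shown in this commit and may handle errors differently.

```python
import asyncio


async def fake_ssl_check(url):
    """Stand-in for a real checker: pretends the certificate is fine."""
    await asyncio.sleep(0.01)
    return {"check": "ssl", "ok": True}


async def fake_ping_check(url):
    """Stand-in for a real checker: simulates an unreachable host."""
    await asyncio.sleep(0.01)
    raise TimeoutError("no reply")


async def run_checks(url, checkers):
    """Run every checker concurrently and collect results by name.

    `checkers` maps a check name to a coroutine function taking a URL.
    With return_exceptions=True, gather returns raised exceptions as
    values (in input order) instead of propagating the first one.
    """
    names = list(checkers)
    outcomes = await asyncio.gather(
        *(checkers[name](url) for name in names),
        return_exceptions=True,
    )
    results = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            results[name] = {"ok": False, "error": str(outcome)}
        else:
            results[name] = outcome
    return results
```

Calling `asyncio.run(run_checks("https://example.com", {"ssl": fake_ssl_check, "ping": fake_ping_check}))` returns one entry per check, with the failed ping recorded as an error entry rather than aborting the SSL result.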
api/routes.py CHANGED
@@ -1,16 +1,24 @@
  from fastapi import APIRouter, HTTPException
- from fastapi.responses import FileResponse
  from api.schemas import ScanRequest, ScanResponse
  from scheduler.runner import run_all_now, add_url, remove_url, _monitored_urls
- from storage.db import get_latest_results, get_all_urls, save_alert
- import os
+ from storage.db import get_latest_results, get_all_urls
+ import checkers.ssl_check as ssl_check
+ import checkers.dns_check as dns_check
+ import checkers.http_check as http_check
+ import checkers.ping_check as ping_check
+ import checkers.port_check as port_check
+ import checkers.blacklist_check as blacklist_check
+ import checkers.multi_scanner as multi_scanner
+ from config.logging import get_logger

+ logger = get_logger("api.routes")
  router = APIRouter()

-
- @router.post("/scan", response_model=ScanResponse)
- async def scan(req: ScanRequest):
+ @router.post("/scan/full", response_model=ScanResponse)
+ async def scan_full(req: ScanRequest):
+     """Run all available checks and return a unified report."""
      try:
+         logger.info(f"Full scan requested for: {req.url}")
          result = await run_all_now(req.url)
          if req.monitor:
              add_url(req.url)
@@ -20,8 +28,38 @@ async def scan(req: ScanRequest):
              results=result["results"],
          )
      except Exception as e:
+         logger.error(f"Full scan failed: {e}")
          raise HTTPException(status_code=500, detail=str(e))

+ @router.post("/scan/ssl")
+ async def scan_ssl(req: ScanRequest):
+     return await ssl_check.run(req.url)
+
+ @router.post("/scan/dns")
+ async def scan_dns(req: ScanRequest):
+     return await dns_check.run(req.url)
+
+ @router.post("/scan/http")
+ async def scan_http(req: ScanRequest):
+     return await http_check.run(req.url)
+
+ @router.post("/scan/ping")
+ async def scan_ping(req: ScanRequest):
+     return await ping_check.run(req.url)
+
+ @router.post("/scan/port")
+ async def scan_port(req: ScanRequest):
+     return await port_check.run(req.url)
+
+ @router.post("/scan/blacklist")
+ async def scan_blacklist(req: ScanRequest):
+     return await blacklist_check.run(req.url)
+
+ @router.post("/scan", response_model=ScanResponse)
+ async def scan_legacy(req: ScanRequest):
+     # Keep original /scan for backward compatibility
+     return await scan_full(req)
+

  @router.get("/results/{url:path}")
  async def get_results(url: str, limit: int = 20):
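The six protocol-specific handlers added in this diff all have the same shape (await one checker's `run` with the request URL) and, unlike `scan_full`, they neither log failures nor wrap them. One possible way to factor out that shared shape is a dispatch table plus a single helper, sketched below with the standard library only. This is not code from the commit: `run_single_check`, `ScanFailed`, and the logger name are hypothetical, and `ScanFailed` merely stands in for the `HTTPException(status_code=500, ...)` that `scan_full` raises.

```python
import asyncio
import logging

logger = logging.getLogger("api.routes.sketch")  # hypothetical logger name


class ScanFailed(Exception):
    """Stand-in for the HTTP 500 error a route handler would raise."""

    def __init__(self, kind, detail):
        super().__init__(f"{kind} scan failed: {detail}")
        self.kind = kind
        self.detail = detail


async def run_single_check(kind, url, checkers):
    """Look up one checker by name, run it, and log/wrap any failure.

    `checkers` maps a scan kind ("ssl", "dns", ...) to a coroutine
    function that accepts a URL, mirroring the `*_check.run` calls
    in the diff.
    """
    try:
        checker = checkers[kind]
    except KeyError:
        raise ValueError(f"unknown scan type: {kind}") from None
    try:
        return await checker(url)
    except Exception as exc:
        logger.error("%s scan failed for %s: %s", kind, url, exc)
        raise ScanFailed(kind, str(exc)) from exc
```

With a table such as `{"ssl": ssl_check.run, "dns": dns_check.run}`, each route body would reduce to a single call to the helper. The trade-off is one more layer of indirection in exchange for consistent logging across endpoints.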