# Lars Urlaubs-Deals: Technical Documentation
## Overview
This project is an AI-powered vacation deal finder specializing in dog-friendly accommodations across multiple platforms (Airbnb, Booking.com).
It is designed for:
- resilient scraping under rate limiting,
- unified scoring and filtering,
- transparent run diagnostics via observability and health data,
- practical dashboard usability for day-to-day deal checks.
## Recent Improvements (2026-02)
The latest enhancement cycle added:
- **Run-level observability** with run IDs, KPI counters, and structured events.
- **Central deal validation** before ranking.
- **Robust price alerts** (dedupe, cooldown, per-deal threshold overrides).
- **CLI scheduler mode** for periodic background searches.
- **Dashboard UX upgrades** (client-side filters, better empty/error states, price trend chips).
- **Currency normalization to EUR** for fair cross-market ranking.
## Core Features
### 1. Smart Scrapers (Multi-Strategy)
Both Airbnb and Booking.com scrapers follow a tiered strategy:
- **Strategy 1: Local Curl/HTTP** (fastest, cheapest)
- **Strategy 2: Firecrawl Cloud** (reliable rendered fallback)
- **Strategy 3: Static fallback data** (keeps UI functional if everything else fails)
Each strategy attempt is instrumented with its source, strategy name, duration, and success status.
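The tiered fallback can be pictured roughly as below. This is a minimal sketch, not the project's scraper code: the three fetch functions, `run_strategies`, and the attempt record fields are hypothetical names chosen for illustration, and the first two tiers are stubbed to fail so the fallback path is visible.

```python
import time
from typing import Callable

# Hypothetical stand-ins for the three tiers; the real scrapers are not shown here.
def fetch_local_http(city: str) -> list[dict]:
    raise ConnectionError("simulated rate limit")  # pretend tier 1 was throttled

def fetch_firecrawl(city: str) -> list[dict]:
    return []  # pretend the rendered page contained no listings

def fetch_static_fallback(city: str) -> list[dict]:
    return [{"name": "Sample stay", "location": city, "source": "static"}]

STRATEGIES: list[tuple[str, Callable[[str], list[dict]]]] = [
    ("local_http", fetch_local_http),
    ("firecrawl", fetch_firecrawl),
    ("static_fallback", fetch_static_fallback),
]

def run_strategies(city: str, strategies=STRATEGIES) -> tuple[list[dict], list[dict]]:
    """Try each tier in order and stop at the first one that yields deals."""
    attempts = []
    for name, fetch in strategies:
        started = time.perf_counter()
        error = None
        try:
            deals = fetch(city)
        except Exception as exc:  # any failure just moves on to the next tier
            deals, error = [], repr(exc)
        attempts.append({
            "strategy": name,
            "success": bool(deals),
            "duration_s": time.perf_counter() - started,
            "error": error,
        })
        if deals:
            return deals, attempts
    return [], attempts
```

Calling `run_strategies("Amsterdam")` with these stubs returns the static sample deal plus three attempt records, one per tier, which is the kind of data a metrics layer could consume.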
### 2. Rate Limit Bypass
- **User-Agent rotation**
- **Adaptive delays** that increase under pressure
- **Exponential backoff** for repeated throttling
- **Optional session warming** for more realistic request patterns
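The following sketch illustrates three of these techniques in isolation. All names (`next_user_agent`, `backoff_delay`, `AdaptiveDelay`), the default numbers, and the placeholder User-Agent strings are assumptions made for the example, not values taken from the project.

```python
import itertools
import random

# Placeholder strings; a real deployment would use genuine browser User-Agents.
USER_AGENTS = ["ExampleBrowser/1.0", "ExampleBrowser/2.0", "ExampleBrowser/3.0"]
_ua_cycle = itertools.cycle(USER_AGENTS)

def next_user_agent() -> str:
    """Rotate through the User-Agent list in round-robin order."""
    return next(_ua_cycle)

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0) -> float:
    """Exponential backoff: base * 2**attempt, capped, plus optional random jitter."""
    delay = min(cap, base * 2 ** attempt)
    return delay + random.uniform(0.0, jitter * delay)

class AdaptiveDelay:
    """Delay that grows after a throttled response and shrinks after a success."""

    def __init__(self, initial: float = 1.0, factor: float = 1.5,
                 minimum: float = 0.5, maximum: float = 30.0):
        self.current = initial
        self.factor = factor
        self.minimum = minimum
        self.maximum = maximum

    def on_throttled(self) -> None:
        self.current = min(self.maximum, self.current * self.factor)

    def on_success(self) -> None:
        self.current = max(self.minimum, self.current / self.factor)
```

With `jitter` left at zero the backoff is deterministic (1 s, 2 s, 4 s, ... up to the cap); a non-zero jitter spreads retries out so that parallel workers do not retry in lockstep.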
### 3. Central Validation Pipeline
All raw deals are validated before ranking:
- required fields (name, location, source, url)
- numeric sanity checks (price/rating/reviews)
- budget boundaries
- pet-friendly enforcement when pets are requested
Validation output is returned in API/agent results (`valid_count`, `rejected_count`, reasons).
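A validation pass along these lines might look like the sketch below. The deal field names (`price_per_night`, `pet_friendly`), the reason strings, and the function names are hypothetical; only `valid_count`, `rejected_count`, and the presence of reasons come from the description above.

```python
REQUIRED_FIELDS = ("name", "location", "source", "url")

def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def rejection_reasons(deal: dict, max_price: float, pets_requested: bool) -> list[str]:
    """Return every reason a deal fails validation (empty list means valid)."""
    reasons = [f"missing_{field}" for field in REQUIRED_FIELDS if not deal.get(field)]

    price = deal.get("price_per_night")  # assumed field name
    if not _is_number(price) or price <= 0:
        reasons.append("invalid_price")
    elif price > max_price:
        reasons.append("over_budget")

    for field in ("rating", "reviews"):
        value = deal.get(field)
        if value is not None and (not _is_number(value) or value < 0):
            reasons.append(f"invalid_{field}")

    if pets_requested and not deal.get("pet_friendly"):  # assumed field name
        reasons.append("not_pet_friendly")
    return reasons

def validate_deals(deals: list[dict], max_price: float, pets_requested: bool) -> dict:
    """Split deals into valid and rejected, counting rejection reasons."""
    valid, reasons = [], {}
    for deal in deals:
        found = rejection_reasons(deal, max_price, pets_requested)
        if found:
            for reason in found:
                reasons[reason] = reasons.get(reason, 0) + 1
        else:
            valid.append(deal)
    return {
        "valid": valid,
        "valid_count": len(valid),
        "rejected_count": len(deals) - len(valid),
        "reasons": reasons,
    }
```

Collecting all reasons per deal, rather than stopping at the first failure, makes the rejection counters more useful when diagnosing a source that suddenly returns malformed data.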
### 4. Observability & KPI Tracking
A lightweight observability layer tracks each search run:
- unique run ID
- lifecycle events (`run_started`, source cache hits/misses, errors, `run_finished`)
- per-run counters (cache hits, misses, valid deals, triggered alerts, etc.)
- run summaries retained for health diagnostics
`/health` includes an observability snapshot with active/recent runs.
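As an illustration of what such a layer could record, here is a minimal in-memory sketch. `RunTracker`, its method names, and the summary shape are assumptions for this example; only the `run_started` and `run_finished` event names appear in the description above.

```python
import time
import uuid
from collections import Counter

class RunTracker:
    """Collects structured events and counters for a single search run."""

    def __init__(self):
        self.run_id = uuid.uuid4().hex  # unique per run
        self.events: list[dict] = []
        self.counters: Counter = Counter()
        self.event("run_started")

    def event(self, name: str, **fields) -> None:
        """Append one structured event tagged with the run ID and a timestamp."""
        self.events.append({"run_id": self.run_id, "event": name,
                            "ts": time.time(), **fields})

    def incr(self, counter: str, amount: int = 1) -> None:
        self.counters[counter] += amount

    def finish(self) -> dict:
        """Close the run and return a summary suitable for a health snapshot."""
        self.event("run_finished")
        return {
            "run_id": self.run_id,
            "event_count": len(self.events),
            "counters": dict(self.counters),
        }
```

A caller would create one tracker per search, call `incr("cache_hits")` or `event("cache_miss", source="airbnb")` as work proceeds, and keep the dictionary returned by `finish()` in a bounded list of recent runs.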
### 5. Price Alert System
Price alerts are persisted and now include robustness controls:
- configurable drop threshold (global + per-deal override)
- dedupe window for repeated identical updates
- cooldown window to suppress duplicate alerts at the same price
- capped history size per property
### 6. Intelligent Caching
- Local JSON cache (`.search_cache.json`) with TTL
- repeated searches with the same parameters are served from the cache and return quickly
- cache metrics included in observability
### 7. Deal Ranking and Currency Normalization
Deals are scored by price/rating/reviews with pet/weather multipliers.
All ranking prices are normalized to **EUR** using built-in FX rates (or optional custom per-deal `fx_rate_to_eur`), while preserving original currency/price in output metadata.
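The normalization step could look roughly like this. Only the `fx_rate_to_eur` override name comes from the text above; the rate table holds made-up placeholder numbers (not real exchange rates), and the other field names (`price`, `currency`, `price_eur`, `original_price`, `original_currency`) are hypothetical.

```python
# Placeholder numbers for illustration only; these are NOT real exchange rates.
FX_RATES_TO_EUR = {"EUR": 1.0, "USD": 0.9, "GBP": 1.2}

def normalize_to_eur(deal: dict) -> dict:
    """Return a copy of the deal with an EUR price added for ranking."""
    currency = deal.get("currency", "EUR")
    # A per-deal override takes precedence over the built-in table.
    rate = deal.get("fx_rate_to_eur")
    if rate is None:
        if currency not in FX_RATES_TO_EUR:
            raise ValueError(f"no FX rate for {currency!r}")
        rate = FX_RATES_TO_EUR[currency]

    normalized = dict(deal)
    normalized["price_eur"] = round(deal["price"] * rate, 2)
    # Preserve the original values so the UI can still show them.
    normalized["original_price"] = deal["price"]
    normalized["original_currency"] = currency
    return normalized

def rank_by_eur_price(deals: list[dict]) -> list[dict]:
    """Cheapest-first ordering on the normalized price (a stand-in for full scoring)."""
    return sorted((normalize_to_eur(d) for d in deals), key=lambda d: d["price_eur"])
```

Sorting on `price_eur` here is only a stand-in: the full score described above also weighs rating, reviews, and pet/weather multipliers, whose exact formula is not reproduced in this sketch.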
### 8. Dashboard UX
The web dashboard includes:
- source tabs and sorting modes,
- client-side filters (minimum rating, max EUR/night, pet-only),
- explicit empty-state messaging ("no source results" vs "filtered out"),
- improved fetch error handling,
- price trend badges when previous price context is available.
## Scheduler Mode (CLI)
You can run periodic searches from CLI:
```bash
python main.py \
--cities "Amsterdam,Rotterdam" \
--checkin 2026-03-01 \
--checkout 2026-03-05 \
--schedule-minutes 30 \
--max-runs 6
```
- `--schedule-minutes 0` keeps one-shot behavior (default).
- `--max-runs 0` means unlimited scheduled cycles.
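The flag semantics above can be summarized in a short sketch. The flag names match the example command, but `build_parser`, `run_scheduled`, the argument types, and the `--max-runs` default of 0 are assumptions, not a copy of `main.py`.

```python
import argparse
import time

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Scheduler sketch")
    parser.add_argument("--cities", help="comma-separated city names")
    parser.add_argument("--checkin")
    parser.add_argument("--checkout")
    parser.add_argument("--schedule-minutes", type=int, default=0,
                        help="0 runs once; N > 0 repeats every N minutes")
    parser.add_argument("--max-runs", type=int, default=0,  # default is an assumption
                        help="0 means unlimited scheduled cycles")
    return parser

def run_scheduled(search, schedule_minutes: int, max_runs: int,
                  sleep=time.sleep) -> int:
    """Call search() once, or repeatedly on a fixed interval; return the run count."""
    runs = 0
    while True:
        search()
        runs += 1
        if schedule_minutes <= 0:
            break  # one-shot mode
        if max_runs and runs >= max_runs:
            break  # reached the requested number of cycles
        sleep(schedule_minutes * 60)
    return runs
```

Injecting `sleep` as a parameter lets tests replace the real wait with a recorder, so the loop can be verified without pausing for 30 minutes between cycles.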
## Tech Stack
- **Backend:** Python, FastAPI, Uvicorn
- **Scraping:** httpx, BeautifulSoup4, Firecrawl API
- **Frontend:** Responsive HTML/JS dashboard (Tailwind-style utility classes)
- **Persistence:** Local JSON files for cache and alerts
## Testing
Primary regression coverage for the new features includes:
- `tests/test_price_alerts.py` (dedupe/cooldown/override + agent integration)
- `tests/test_agent_validation.py` (pet filter + validation counters)
- `tests/test_currency_normalization.py` (EUR normalization + custom FX override)
- `tests/test_scheduler_cli.py` (scheduler CLI argument parsing)
- `tests/test_caching.py` (cache behavior still valid)
Example run:
```bash
PYTHONPATH=. pytest -q \
tests/test_price_alerts.py \
tests/test_agent_validation.py \
tests/test_currency_normalization.py \
tests/test_scheduler_cli.py
```
## Deployment Notes
- Local web mode: `uvicorn api:app --reload`
- Health check endpoint: `/health`
- Search endpoint: `/search`
## Repository / Distribution
- Main source repository and deployment references remain unchanged.