
Lars Urlaubs-Deals: Technical Documentation

Overview

This project is an AI-powered vacation deal finder specializing in dog-friendly accommodations across multiple platforms (Airbnb, Booking.com).

It is designed for:

  • resilient scraping under rate limiting,
  • unified scoring and filtering,
  • transparent run diagnostics via observability and health data,
  • practical dashboard usability for day-to-day deal checks.

Recent Improvements (2026-02)

The latest enhancement cycle added:

  • Run-level observability with run IDs, KPI counters, and structured events.
  • Central deal validation before ranking.
  • Robust price alerts (dedupe, cooldown, per-deal threshold overrides).
  • CLI scheduler mode for periodic background searches.
  • Dashboard UX upgrades (client-side filters, better empty/error states, price trend chips).
  • Currency normalization to EUR for fair cross-market ranking.

Core Features

1. Smart Scrapers (Multi-Strategy)

Both Airbnb and Booking.com scrapers follow a tiered strategy:

  • Strategy 1: Local Curl/HTTP (fastest, cheapest)
  • Strategy 2: Firecrawl Cloud (reliable rendered fallback)
  • Strategy 3: Static fallback data (keeps UI functional if everything else fails)

Each strategy attempt is instrumented with per-source and per-strategy duration and success metrics.
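The tiered fallback can be sketched as a loop that tries each strategy in order and records one metric entry per attempt. This is a minimal illustration, not the project's scraper code. The strategy callables, the `run_tiered` name and the metric field names are hypothetical. The sketch also assumes that an empty result counts as a miss, so the next tier still gets a chance.

```python
from __future__ import annotations

import time
from typing import Callable

# Hypothetical signature: a strategy takes search params and returns a list of
# deal dicts, or raises on failure (blocked, timeout, parse error, ...).
Strategy = Callable[[dict], list]


def run_tiered(
    source: str,
    strategies: list[tuple[str, Strategy]],
    params: dict,
    metrics: list[dict],
) -> list[dict]:
    """Try each strategy in order and return the first non-empty result.

    Every attempt appends one metric record (source, strategy, duration, success).
    """
    for name, strategy in strategies:
        start = time.perf_counter()
        try:
            deals = strategy(params)
            ok = bool(deals)  # assumption: an empty list is treated as a miss
        except Exception:
            deals, ok = [], False
        metrics.append({
            "source": source,
            "strategy": name,
            "duration_s": time.perf_counter() - start,
            "success": ok,
        })
        if ok:
            return deals
    return []  # every tier failed
```

In this shape, the static fallback is just the last entry in `strategies`, for example `[("local_http", ...), ("firecrawl", ...), ("static", ...)]`.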

2. Rate Limit Bypass

  • User-Agent rotation
  • Adaptive delays that lengthen when throttling is detected
  • Exponential backoff for repeated throttling
  • Optional session warming for more realistic request patterns
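As a rough sketch of how user-agent rotation, adaptive delays and exponential backoff fit together, the class below keeps a single "consecutive throttles" counter and derives the next wait from it. The class name, the placeholder user-agent strings, the treatment of HTTP 429/503 as throttling signals and the default delays are all assumptions for illustration. The real scrapers may track delay and backoff separately.

```python
import itertools
import random


class AdaptiveThrottle:
    """Per-host request pacing (illustrative sketch, not the project's code)."""

    def __init__(self, user_agents, base_delay=1.0, max_delay=60.0, factor=2.0):
        self._ua_cycle = itertools.cycle(user_agents)
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.factor = factor
        self.consecutive_throttles = 0

    def next_user_agent(self) -> str:
        # Simple round-robin rotation through the configured pool.
        return next(self._ua_cycle)

    def record(self, status_code: int) -> None:
        # Assumption: 429 and 503 signal throttling; any other status resets the streak.
        if status_code in (429, 503):
            self.consecutive_throttles += 1
        else:
            self.consecutive_throttles = 0

    def next_delay(self, jitter: float = 0.0) -> float:
        # Exponential backoff: base * factor**streak, capped at max_delay.
        delay = min(self.base_delay * self.factor ** self.consecutive_throttles, self.max_delay)
        # Optional random jitter so parallel retries do not line up.
        return delay + random.uniform(0, jitter) if jitter else delay
```

A caller would sleep for `throttle.next_delay(jitter=0.5)` before each request, send `throttle.next_user_agent()` as the `User-Agent` header, and pass the response status to `throttle.record(...)`.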

3. Central Validation Pipeline

All raw deals are validated before ranking:

  • required fields (name, location, source, url)
  • numeric sanity checks (price/rating/reviews)
  • budget boundaries
  • pet-friendly enforcement when pets are requested

Validation results (valid_count, rejected_count, and rejection reasons) are included in API and agent responses.
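The checks listed above can be sketched as one function that returns the first rejection reason for a deal, plus a wrapper that produces the counts. This is a hypothetical illustration. The field names `price_per_night`, `rating`, `reviews` and `pet_friendly`, the reason strings, and the assumption that ratings are already on a 0 to 5 scale are not taken from the project.

```python
from dataclasses import dataclass, field
from typing import Optional

REQUIRED_FIELDS = ("name", "location", "source", "url")
RATING_MAX = 5.0  # assumption: ratings are normalized to a 0-5 scale before validation


@dataclass
class ValidationResult:
    valid: list = field(default_factory=list)
    reasons: dict = field(default_factory=dict)  # reason -> number of rejected deals

    @property
    def valid_count(self) -> int:
        return len(self.valid)

    @property
    def rejected_count(self) -> int:
        return sum(self.reasons.values())


def _reject_reason(deal, budget_min, budget_max, pets_requested) -> Optional[str]:
    for key in REQUIRED_FIELDS:  # required fields must be present and non-empty
        if not deal.get(key):
            return f"missing_{key}"
    price = deal.get("price_per_night")  # hypothetical field name
    if not isinstance(price, (int, float)) or price <= 0:
        return "invalid_price"
    rating = deal.get("rating")
    if rating is not None and (not isinstance(rating, (int, float)) or not 0 <= rating <= RATING_MAX):
        return "invalid_rating"
    reviews = deal.get("reviews", 0)
    if not isinstance(reviews, int) or reviews < 0:
        return "invalid_reviews"
    if not budget_min <= price <= budget_max:
        return "out_of_budget"
    if pets_requested and not deal.get("pet_friendly"):
        return "not_pet_friendly"
    return None


def validate_deals(deals, budget_min=0.0, budget_max=float("inf"), pets_requested=False):
    result = ValidationResult()
    for deal in deals:
        reason = _reject_reason(deal, budget_min, budget_max, pets_requested)
        if reason is None:
            result.valid.append(deal)
        else:
            result.reasons[reason] = result.reasons.get(reason, 0) + 1
    return result
```

Only `result.valid` would then be passed on to ranking.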

4. Observability & KPI Tracking

A lightweight observability layer tracks each search run:

  • unique run ID
  • lifecycle events (run_started, source cache hits/misses, errors, run_finished)
  • per-run counters (cache hits, misses, valid deals, triggered alerts, etc.)
  • run summaries retained for health diagnostics

The /health endpoint includes an observability snapshot of active and recent runs.
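A lightweight run tracker of this kind can be sketched with a UUID per run, an event list, a `Counter` for KPIs and a bounded deque of finished-run summaries. The class and method names here are hypothetical. A /health handler could return something like `tracker.snapshot()`, but the project's actual payload shape is not documented here.

```python
import time
import uuid
from collections import Counter, deque


class RunTracker:
    """Minimal per-run observability (hypothetical sketch)."""

    def __init__(self, max_recent=20):
        self.active = {}                         # run_id -> in-progress run state
        self.recent = deque(maxlen=max_recent)   # summaries of finished runs

    def start(self) -> str:
        run_id = uuid.uuid4().hex
        self.active[run_id] = {"started": time.time(), "events": [], "counters": Counter()}
        self.event(run_id, "run_started")
        return run_id

    def event(self, run_id, name, **data):
        # Structured event with a timestamp and arbitrary key/value context.
        self.active[run_id]["events"].append({"ts": time.time(), "event": name, **data})

    def incr(self, run_id, counter, n=1):
        self.active[run_id]["counters"][counter] += n

    def finish(self, run_id) -> dict:
        self.event(run_id, "run_finished")
        run = self.active.pop(run_id)
        summary = {
            "run_id": run_id,
            "duration_s": time.time() - run["started"],
            "counters": dict(run["counters"]),
            "event_count": len(run["events"]),
        }
        self.recent.append(summary)  # oldest summaries fall off automatically
        return summary

    def snapshot(self) -> dict:
        return {"active_runs": list(self.active), "recent_runs": list(self.recent)}
```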

5. Price Alert System

Price alerts are persisted and now include robustness controls:

  • configurable drop threshold (global + per-deal override)
  • dedupe window for repeated identical updates
  • cooldown window to suppress duplicate alerts at the same price
  • capped history size per property

6. Intelligent Caching

  • Local JSON cache (.search_cache.json) with a TTL
  • Repeated searches with the same parameters are served from the cache instead of being re-scraped
  • Cache hit/miss metrics are included in observability
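A file-backed TTL cache with hit/miss counters might look like the sketch below. Only the `.search_cache.json` filename comes from this document. The class name, the default TTL and the key scheme (a SHA-256 hash of the sorted search parameters, so argument order does not matter) are assumptions.

```python
import hashlib
import json
import time
from pathlib import Path


class JsonTTLCache:
    """Search-result cache stored in one JSON file (hypothetical sketch)."""

    def __init__(self, path=".search_cache.json", ttl_s=3600.0):
        self.path = Path(path)
        self.ttl_s = ttl_s
        self.hits = 0     # exposed so observability can report cache metrics
        self.misses = 0

    @staticmethod
    def make_key(params: dict) -> str:
        # sort_keys makes the key independent of parameter order.
        raw = json.dumps(params, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def _load(self) -> dict:
        if not self.path.exists():
            return {}
        try:
            return json.loads(self.path.read_text())
        except json.JSONDecodeError:
            return {}  # treat a corrupt file as an empty cache

    def get(self, params, now=None):
        now = time.time() if now is None else now
        entry = self._load().get(self.make_key(params))
        if entry and now - entry["stored_at"] < self.ttl_s:
            self.hits += 1
            return entry["value"]
        self.misses += 1  # absent or expired
        return None

    def set(self, params, value, now=None):
        now = time.time() if now is None else now
        data = self._load()
        data[self.make_key(params)] = {"stored_at": now, "value": value}
        self.path.write_text(json.dumps(data))
```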

7. Deal Ranking and Currency Normalization

Deals are scored by price/rating/reviews with pet/weather multipliers.

All ranking prices are normalized to EUR using built-in FX rates (or an optional per-deal fx_rate_to_eur override), while the original currency and price are preserved in the output metadata.
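The normalization step can be sketched as a pure function that adds a `price_eur` field and keeps the original values alongside it. The rate table below uses placeholder round numbers for illustration only. They are not current market rates and are not the project's built-in table. The field names other than `fx_rate_to_eur` are hypothetical.

```python
# Placeholder rates (EUR per 1 unit of currency), illustrative only.
FX_TO_EUR = {"EUR": 1.0, "USD": 0.9, "GBP": 1.2}


def normalize_to_eur(deal: dict) -> dict:
    currency = deal.get("currency", "EUR").upper()
    # A per-deal fx_rate_to_eur takes precedence over the built-in table.
    rate = deal.get("fx_rate_to_eur") or FX_TO_EUR.get(currency)
    if rate is None:
        raise ValueError(f"no FX rate for {currency}")
    out = dict(deal)  # do not mutate the caller's dict
    out["original_price"] = deal["price"]
    out["original_currency"] = currency
    out["price_eur"] = round(deal["price"] * rate, 2)
    return out


def rank_by_eur_price(deals):
    # Comparing price_eur (not the raw price) is what makes cross-market ranking fair.
    return sorted((normalize_to_eur(d) for d in deals), key=lambda d: d["price_eur"])
```

In the real ranker, `price_eur` would feed the combined price, rating and review score rather than a plain sort.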

8. Dashboard UX

The web dashboard includes:

  • source tabs and sorting modes,
  • client-side filters (minimum rating, max EUR/night, pet-only),
  • explicit empty-state messaging ("no source results" vs "filtered out"),
  • improved fetch error handling,
  • price trend badges when previous price context is available.

Scheduler Mode (CLI)

You can run periodic searches from the CLI:

python main.py \
  --cities "Amsterdam,Rotterdam" \
  --checkin 2026-03-01 \
  --checkout 2026-03-05 \
  --schedule-minutes 30 \
  --max-runs 6

  • --schedule-minutes 0 runs a single one-shot search (the default).
  • --max-runs 0 means unlimited scheduled cycles.
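The flag semantics above can be sketched with `argparse` and a simple loop. The flag names mirror the documented command, but the `parse_args`/`run_scheduler` helpers and the injected `search` callable are hypothetical, and main.py's real parser may differ. Injecting `sleep` keeps the loop testable. This sketch also treats a negative interval as one-shot.

```python
import argparse
import time


def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Periodic vacation deal search (sketch)")
    p.add_argument("--cities", required=True, help="Comma-separated list of cities")
    p.add_argument("--checkin", required=True)
    p.add_argument("--checkout", required=True)
    p.add_argument("--schedule-minutes", type=int, default=0,
                   help="Interval between runs; 0 = single one-shot run (default)")
    p.add_argument("--max-runs", type=int, default=0,
                   help="Number of scheduled cycles; 0 = unlimited")
    args = p.parse_args(argv)
    args.cities = [c.strip() for c in args.cities.split(",") if c.strip()]
    return args


def run_scheduler(args, search, sleep=time.sleep) -> int:
    """Call search(args) once, or every --schedule-minutes; return the run count."""
    if args.schedule_minutes <= 0:
        search(args)
        return 1
    runs = 0
    while args.max_runs == 0 or runs < args.max_runs:
        search(args)
        runs += 1
        if args.max_runs and runs >= args.max_runs:
            break  # no trailing sleep after the final run
        sleep(args.schedule_minutes * 60)
    return runs
```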

Tech Stack

  • Backend: Python, FastAPI, Uvicorn
  • Scraping: httpx, BeautifulSoup4, Firecrawl API
  • Frontend: Responsive HTML/JS dashboard (Tailwind-style utility classes)
  • Persistence: Local JSON files for cache and alerts

Testing

Primary regression coverage for the new features includes:

  • tests/test_price_alerts.py (dedupe/cooldown/override + agent integration)
  • tests/test_agent_validation.py (pet filter + validation counters)
  • tests/test_currency_normalization.py (EUR normalization + custom FX override)
  • tests/test_scheduler_cli.py (scheduler CLI argument parsing)
  • tests/test_caching.py (confirms existing cache behavior is unchanged)

Example run:

PYTHONPATH=. pytest -q \
  tests/test_price_alerts.py \
  tests/test_agent_validation.py \
  tests/test_currency_normalization.py \
  tests/test_scheduler_cli.py

Deployment Notes

  • Local web mode: uvicorn api:app --reload
  • Health check endpoint: /health
  • Search endpoint: /search

Repository / Distribution

  • Main source repository and deployment references remain unchanged.