LarsHoliday / IMPLEMENTATION_ROADMAP.md
phhttps
feat: enhance scraper reliability, observability and scheduling
5dc68a0

Implementation Roadmap: Stability, Observability, Alerts, UX

Goal

Implement the agreed improvement bundle end-to-end with measurable KPIs:

  • Better stability and transparency during scraping
  • Strict validation before ranking
  • Robust price alerts without spam
  • Optional periodic background searches
  • Better dashboard filtering and feedback states
  • Basic currency normalization foundation

Phase 1 — Observability + KPI baseline

Scope

  • Introduce a lightweight run tracker with run_id, counters, and structured events.
  • Add per-run KPI counters in search orchestration.
  • Expose observability snapshot in API health endpoint.
  • Record scraper strategy attempts (success/failure/duration/result_count).

Files

  • observability.py (new)
  • holland_agent.py
  • api.py
  • booking_scraper.py
  • patchright_airbnb_scraper.py

Success Criteria

  • Every search run has a run_id.
  • API /health includes recent run summary.
  • Strategy metrics in scraper_metrics.json are populated over time.

Phase 2 — Validation + ranking normalization

Scope

  • Central deal validation before ranking:
    • required fields (name, location, price_per_night, url, source)
    • strict numeric checks (price > 0, rating bounds, reviews >= 0)
    • pet filter enforcement if requested
  • Keep invalid deals with reason stats for diagnostics.
  • Add basic currency normalization (EUR base) in ranking.

Files

  • holland_agent.py
  • deal_ranker.py

Success Criteria

  • Invalid deals are excluded with reason counters.
  • Ranker can normalize known currencies (EUR, USD, GBP) into EUR.

Phase 3 — Price alerts hardening

Scope

  • Add cooldown-based dedupe for repeated alerts.
  • Add per-property threshold overrides.
  • Keep alert metadata to avoid duplicate spam.

Files

  • rate_limit_bypass.py
  • holland_agent.py (pass threshold/cooldown config hooks)

Success Criteria

  • Repeated identical drops in short window do not trigger duplicate alerts.
  • Alerts still trigger for meaningful new drops.

Phase 4 — Scheduler (CLI)

Scope

  • Add CLI scheduling mode:
    • run search periodically every N minutes
    • optional fixed number of runs
  • Keep one-shot behavior as default.

Files

  • main.py

Success Criteria

  • main.py can execute periodic runs without external cron.

Phase 5 — Dashboard UX upgrades

Scope

  • Add client-side filters:
    • max nightly price
    • minimum rating
    • pet-friendly-only toggle
  • Add clear filter action and live result status text.
  • Improve error/empty communication and add price trend badge when available.

Files

  • frontend_dashboard.html

Success Criteria

  • Users can narrow results without new API call.
  • Filtered counts are visible and states are understandable.

Phase 6 — Test & docs

Scope

  • Update/add tests for validation, cache behavior, alert dedupe/cooldown, and scheduler flow.
  • Document changes and operations guide.

Files

  • tests/test_agent_validation.py
  • tests/test_price_alerts.py
  • tests/test_caching.py
  • DOCUMENTATION.md

Success Criteria

  • Core new behavior covered by tests.
  • Operational usage documented.

Dependencies and Order

  1. Phase 1 (foundation)
  2. Phase 2 (data quality + ranking)
  3. Phase 3 (alerts)
  4. Phase 4 (scheduler)
  5. Phase 5 (UX)
  6. Phase 6 (tests/docs)

KPI Targets

  • 429 incident rate trend visible in health metrics
  • Cache hit rate tracked per run
  • Validation rejection reasons available for troubleshooting
  • Price alert duplicates reduced via cooldown/dedupe
  • Dashboard usability improved via local filtering and clearer states