Spaces:

PHhTTPS
/

LarsHoliday

Sleeping

App Files Files Community

LarsHoliday / IMPLEMENTATION_ROADMAP.md

phhttps

feat: enhance scraper reliability, observability and scheduling

5dc68a0 3 months ago

|

history blame contribute delete

3.68 kB

Implementation Roadmap: Stability, Observability, Alerts, UX

Goal

Implement the agreed improvement bundle end-to-end with measurable KPIs:

Better stability and transparency during scraping
Strict validation before ranking
Robust price alerts without spam
Optional periodic background searches
Better dashboard filtering and feedback states
Basic currency normalization foundation

Phase 1 — Observability + KPI baseline

Scope

Introduce a lightweight run tracker with run_id, counters, and structured events.
Add per-run KPI counters in search orchestration.
Expose observability snapshot in API health endpoint.
Record scraper strategy attempts (success/failure/duration/result_count).

Files

observability.py (new)
holland_agent.py
api.py
booking_scraper.py
patchright_airbnb_scraper.py

Success Criteria

Every search run has a run_id.
API /health includes recent run summary.
Strategy metrics in scraper_metrics.json are populated over time.

Phase 2 — Validation + ranking normalization

Scope

Central deal validation before ranking:
- required fields (name, location, price_per_night, url, source)
- strict numeric checks (price > 0, rating bounds, reviews >= 0)
- pet filter enforcement if requested
Keep invalid deals with reason stats for diagnostics.
Add basic currency normalization (EUR base) in ranking.

Files

holland_agent.py
deal_ranker.py

Success Criteria

Invalid deals are excluded with reason counters.
Ranker can normalize known currencies (EUR, USD, GBP) into EUR.

Phase 3 — Price alerts hardening

Scope

Add cooldown-based dedupe for repeated alerts.
Add per-property threshold overrides.
Keep alert metadata to avoid duplicate spam.

Files

rate_limit_bypass.py
holland_agent.py (pass threshold/cooldown config hooks)

Success Criteria

Repeated identical drops in short window do not trigger duplicate alerts.
Alerts still trigger for meaningful new drops.

Phase 4 — Scheduler (CLI)

Scope

Add CLI scheduling mode:
- run search periodically every N minutes
- optional fixed number of runs
Keep one-shot behavior as default.

Files

main.py

Success Criteria

main.py can execute periodic runs without external cron.

Phase 5 — Dashboard UX upgrades

Scope

Add client-side filters:
- max nightly price
- minimum rating
- pet-friendly-only toggle
Add clear filter action and live result status text.
Improve error/empty communication and add price trend badge when available.

Files

frontend_dashboard.html

Success Criteria

Users can narrow results without new API call.
Filtered counts are visible and states are understandable.

Phase 6 — Test & docs

Scope

Update/add tests for validation, cache behavior, alert dedupe/cooldown, and scheduler flow.
Document changes and operations guide.

Files

tests/test_agent_validation.py
tests/test_price_alerts.py
tests/test_caching.py
DOCUMENTATION.md

Success Criteria

Core new behavior covered by tests.
Operational usage documented.

Dependencies and Order

Phase 1 (foundation)
Phase 2 (data quality + ranking)
Phase 3 (alerts)
Phase 4 (scheduler)
Phase 5 (UX)
Phase 6 (tests/docs)

KPI Targets

429 incident rate trend visible in health metrics
Cache hit rate tracked per run
Validation rejection reasons available for troubleshooting
Price alert duplicates reduced via cooldown/dedupe
Dashboard usability improved via local filtering and clearer states