Spaces: jofaichow/roamify

jofaichow committed 17 days ago

Commit e65d7f0 (1 parent: 8521e5c)

v0.1.16 — Prewarm all 12 remaining cities (61/61 now cached)

- Prewarmed: Montreal, Moscow, Oslo, Reykjavik, Santiago, Stockholm,
Tel Aviv, Toronto, Vancouver, Venice, Warsaw, Washington
- 67/84 combos succeeded, LLM cache 344→411 entries
- 17 None combos not cached (v0.1.15 guard) — retry on next search
- New scripts/prewarm_12_remaining.py — targeted prewarm script
- README: city count 49→61, cache sizes updated
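
The figures in this message can be cross-checked with simple arithmetic. The sketch below only restates numbers from the commit message; the variable names are illustrative and do not come from the repository.

```python
# Cross-check of the figures quoted in the commit message.
# Variable names are illustrative; only the numbers come from the message.
cities_prewarmed = 12
categories = 7
combos_attempted = cities_prewarmed * categories       # 12 x 7

combos_succeeded = 67
combos_none = combos_attempted - combos_succeeded      # combos that returned None

llm_cache_before = 344
llm_cache_after = llm_cache_before + combos_succeeded  # only successes are cached

cities_before = 49
cities_after = cities_before + cities_prewarmed

print(combos_attempted, combos_none, llm_cache_after, cities_after)
# prints: 84 17 411 61
```

The result matches the message: 84 combos attempted, 17 left uncached, 411 cache entries, and 61 cities.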

Files changed (5)

.geocode_cache.json +0 -0
.image_cache.json +0 -0
.llm_cache.json +0 -0
README.md +10 -9
scripts/prewarm_12_remaining.py +134 -0

.geocode_cache.json CHANGED

The diff for this file is too large to render. See raw diff

.image_cache.json CHANGED

The diff for this file is too large to render. See raw diff

.llm_cache.json CHANGED

The diff for this file is too large to render. See raw diff

README.md CHANGED

@@ -82,14 +82,14 @@ A provider is skipped if its API key is empty. Just set `OPENROUTER_API_KEY` and
 ## Features
-- **49 cities** across Asia, Europe, Africa, Americas & Oceania
+- **61 cities** across Asia, Europe, Africa, Americas & Oceania
 - **7 travel categories**: Landmark, Culture, Nature, Gems, Photo, Food, Shopping
 - **AI-generated recommendations** with descriptions, tips, and coordinates
 - **5-tier image fallback + emoji**: Wikipedia → Wikidata → Commons → Local name → Unsplash → emoji (🏛️)
 - **Real coordinates** from Nominatim geocoding with LLM-coord fast-path
 - **Leaflet map** with spider markers, card↔map hover sync
 - **Multi-language translation**: Traditional Chinese, Japanese, Korean, French, Spanish, German
-- **Japanese & Traditional Chinese pre-warmed** — 49 cities × 7 categories translated upfront
+- **Japanese & Traditional Chinese pre-warmed** — 61 cities × 7 categories translated upfront
 - **Disk-persisted caches** — repeat searches are instant, survive restarts
 - **Deterministic mode** (Search) vs **Creative mode** (Surprise Me button)
 - **Dark Cyborg theme** with large fonts
@@ -127,7 +127,7 @@ python scripts/warmup.py -c "Hong Kong" -c Singapore
 python scripts/warmup.py --fix
 ```
-Generates up to 322 city × category combos (6,100+ items across 4 caches).
+Generates up to 427 city × category combos (8,100+ items across 4 caches).
 Resumable — interrupted runs pick up where they left off.
 `scripts/prewarm_remaining.py` targets remaining uncached cities — useful
@@ -149,7 +149,7 @@ python scripts/prewarm_translations.py --lang Japanese --force
 python scripts/prewarm_translations.py --lang Korean --lang French
 ```
-~344 LLM cache entries × 2 languages = ~688 translation calls. Each translates
+~411 LLM cache entries × 2 languages = ~822 translation calls. Each translates
 all 19 items in a single LLM call. Takes ~2-4 hours to complete.
 ## Project Structure
@@ -168,14 +168,15 @@ roamify/
 │   ├── warmup.py                  # Full 28-city unified warmup (LLM + images + geocode)
 │   ├── prewarm_translations.py    # Translation pre-warm (JA, TC, etc.)
 │   ├── prewarm_remaining.py       # Prewarm remaining uncached cities
+│   ├── prewarm_12_remaining.py    # Targeted prewarm for specific city list
 │   ├── check_cache.py             # Cache health check & repair
 │   ├── fix_images.py              # Parallel image enrichment pass
 │   └── clear_poor_entries.py      # Clear cache for re-warmup
 ├── .streamlit/
 │   └── config.toml              # Streamlit server and theme config
-├── .llm_cache.json              # Disk-persisted recommendation cache (~2.7MB)
-├── .image_cache.json            # Disk-persisted image URL cache (~900KB)
-├── .geocode_cache.json          # Disk-persisted geocoding cache (~500KB)
+├── .llm_cache.json              # Disk-persisted recommendation cache (~3.3MB)
+├── .image_cache.json            # Disk-persisted image URL cache (~1.1MB)
+├── .geocode_cache.json          # Disk-persisted geocoding cache (~560KB)
 ├── .translation_cache.json      # Disk-persisted translation cache (~7.3MB)
 ├── Dockerfile                   # HF Spaces deployment
 ├── requirements.txt
@@ -191,8 +192,8 @@ roamify/
 5. Set secrets in HF Space Settings (same keys as your `.env`)
 Large cache files are normal — they're JSON and compress well in git.
-`.llm_cache.json` is typically ~2.7MB, translation cache ~7.3MB,
-images cache is URL-only (~900KB).
+`.llm_cache.json` is typically ~3.3MB, translation cache ~7.3MB,
+images cache is URL-only (~1.1MB).
 ## License
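
The README's image fallback chain (Wikipedia → Wikidata → Commons → Local name → Unsplash → emoji) reads as a first-match-wins lookup. The following is a minimal sketch of that pattern only, not the project's implementation: `resolve_image`, the `Resolver` type, and the stub lookups are hypothetical names introduced for illustration, and the URLs are placeholders.

```python
from typing import Callable, Optional, Sequence

# Hypothetical resolver type: takes a place name, returns an image URL or None.
Resolver = Callable[[str], Optional[str]]

DEFAULT_EMOJI = "🏛️"  # final fallback named in the README


def resolve_image(place: str, resolvers: Sequence[Resolver]) -> str:
    """Try each resolver in order; return the first hit, else the emoji."""
    for resolver in resolvers:
        try:
            url = resolver(place)
        except Exception:
            # Treat a failing tier as a miss and move on to the next one.
            url = None
        if url:
            return url
    return DEFAULT_EMOJI


# Stub tiers standing in for real lookups (assumptions, no network access).
def wikipedia_stub(place: str) -> Optional[str]:
    return None  # pretend this tier has no image


def commons_stub(place: str) -> Optional[str]:
    return f"https://example.org/commons/{place.replace(' ', '_')}.jpg"


print(resolve_image("Oslo Opera House", [wikipedia_stub, commons_stub]))
print(resolve_image("Unknown Place", [wikipedia_stub]))
```

With these stubs, the first call falls through to the second tier and the second call ends at the emoji fallback.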

scripts/prewarm_12_remaining.py ADDED

@@ -0,0 +1,134 @@

+#!/usr/bin/env python3
+"""Pre-warm LLM cache for the 12 remaining uncached cities.
+12 cities × 7 categories = 84 combos. Runs with 2 concurrent workers.
+Saves incrementally via get_recommendations_cached(). Reports progress
+to stdout which gets captured by the background process.
+Usage:
+    cd roamify && python scripts/prewarm_12_remaining.py
+"""
+import json
+import os
+import random
+import sys
+import threading
+import time
+from concurrent.futures import ThreadPoolExecutor, as_completed
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "src"))
+from dotenv import load_dotenv
+load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), "..", ".env"), override=True)
+from services.recommender import (
+    get_recommendations_cached,
+    _LLM_CACHE,
+    _save_llm_cache,
+    _save_image_cache,
+    _save_geocode_cache,
+)
+CATEGORY_NAMES = ["Landmark", "Culture", "Nature", "Gems", "Photo", "Food", "Shopping"]
+REMAINING_CITIES = [
+    "Montreal", "Moscow", "Oslo", "Reykjavik", "Santiago",
+    "Stockholm", "Tel Aviv", "Toronto", "Vancouver", "Venice",
+    "Warsaw", "Washington",
+]
+_COMBO_STATS = {"success": 0, "skipped": 0, "fail": 0, "total": 0}
+_COMBO_LOCK = threading.Lock()
+def process_combo(city: str, cat_name: str, combo_idx: int, total: int) -> None:
+    """Process a single city/category combo and update stats."""
+    categories = {name: (name == cat_name) for name in CATEGORY_NAMES}
+    cat_hash = json.dumps(categories, sort_keys=True)
+    if (city, cat_hash) in _LLM_CACHE:
+        with _COMBO_LOCK:
+            _COMBO_STATS["skipped"] += 1
+        print(f"  [{combo_idx:>3}/{total}] ⏭️  {city} / {cat_name} — already cached", flush=True)
+        return
+    # Print whole lines only: with 2 workers, a partial line (end=" ") can
+    # interleave with the other thread's output.
+    print(f"  [{combo_idx:>3}/{total}] 🔍 {city} / {cat_name}...", flush=True)
+    start = time.time()
+    try:
+        result = get_recommendations_cached(
+            city=city,
+            num_attractions=6,
+            categories=categories,
+            temperature=0,
+        )
+        elapsed = time.time() - start
+        if result:
+            items = len(result)
+            with _COMBO_LOCK:
+                _COMBO_STATS["success"] += 1
+            print(f"  [{combo_idx:>3}/{total}] ✅ {city} / {cat_name}: {items} items in {elapsed:.1f}s", flush=True)
+        else:
+            with _COMBO_LOCK:
+                _COMBO_STATS["fail"] += 1
+            print(f"  [{combo_idx:>3}/{total}] ❌ {city} / {cat_name}: returned None in {elapsed:.1f}s", flush=True)
+    except Exception as e:
+        elapsed = time.time() - start
+        with _COMBO_LOCK:
+            _COMBO_STATS["fail"] += 1
+        print(f"  [{combo_idx:>3}/{total}] ❌ {city} / {cat_name}: error after {elapsed:.1f}s: {e}", flush=True)
+def prewarm():
+    """Run all combos concurrently with 2 workers."""
+    total_combos = len(REMAINING_CITIES) * len(CATEGORY_NAMES)
+    _COMBO_STATS["total"] = total_combos
+    llm_before = len(_LLM_CACHE)
+    print(f"Pre-warming caches: {len(REMAINING_CITIES)} cities × {len(CATEGORY_NAMES)} categories = {total_combos} combos")
+    print(f"  Workers: 2 (concurrent) — each uses random DeepSeek provider")
+    print(f"  Existing LLM cache entries: {llm_before}")
+    print()
+    # Build all combos, shuffle for load distribution across workers
+    all_combos = []
+    idx = 0
+    for city in REMAINING_CITIES:
+        for cat_name in CATEGORY_NAMES:
+            idx += 1
+            all_combos.append((city, cat_name, idx))
+    random.shuffle(all_combos)
+    # Re-assign sequential indices after shuffle (for display only)
+    for i, (city, cat_name, _) in enumerate(all_combos):
+        all_combos[i] = (city, cat_name, i + 1)
+    with ThreadPoolExecutor(max_workers=2) as pool:
+        futures = [
+            pool.submit(process_combo, city, cat_name, idx, total_combos)
+            for city, cat_name, idx in all_combos
+        ]
+        for future in as_completed(futures):
+            try:
+                future.result()
+            except Exception:
+                pass
+    llm_new = len(_LLM_CACHE) - llm_before
+    print()
+    print("═" * 55)
+    print("Pre-warm complete!")
+    print(f"  Combos: {_COMBO_STATS['success']} succeeded, {_COMBO_STATS['skipped']} skipped, {_COMBO_STATS['fail']} failed")
+    print(f"  New LLM cache entries: {llm_new} (total: {len(_LLM_CACHE)})")
+    _save_llm_cache()
+    _save_image_cache()
+    _save_geocode_cache()
+    print()
+    print("All caches saved to disk ✅")
+if __name__ == "__main__":
+    prewarm()
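
`process_combo` skips a combo when `(city, cat_hash)` is already in `_LLM_CACHE`. The sketch below reproduces just that key construction, with a plain dict standing in for the real cache. Whether `get_recommendations_cached` stores entries under exactly this key is an assumption the diff does not confirm, and `make_cache_key` and `fake_cache` are hypothetical names.

```python
import json
from typing import Tuple

CATEGORY_NAMES = ["Landmark", "Culture", "Nature", "Gems", "Photo", "Food", "Shopping"]


def make_cache_key(city: str, cat_name: str) -> Tuple[str, str]:
    """Build the (city, cat_hash) tuple the same way process_combo does."""
    # Exactly one category is switched on per combo.
    categories = {name: (name == cat_name) for name in CATEGORY_NAMES}
    # sort_keys=True makes the JSON string independent of dict insertion order.
    return city, json.dumps(categories, sort_keys=True)


# A plain dict standing in for _LLM_CACHE (assumption for illustration only).
fake_cache = {make_cache_key("Oslo", "Food"): ["placeholder item"]}

print(make_cache_key("Oslo", "Food") in fake_cache)    # True
print(make_cache_key("Oslo", "Nature") in fake_cache)  # False
```

If the stored key format differed from this tuple, the pre-check would never match and the "already cached" path would never be taken, leaving it to `get_recommendations_cached` to handle cache hits itself.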