jofaichow committed
Commit e65d7f0 · 1 Parent(s): 8521e5c

v0.1.16 — Prewarm all 12 remaining cities (61/61 now cached)


- Prewarmed: Montreal, Moscow, Oslo, Reykjavik, Santiago, Stockholm,
  Tel Aviv, Toronto, Vancouver, Venice, Warsaw, Washington
- 67/84 combos succeeded; LLM cache grew 344→411 entries
- 17 combos returned None and were not cached (v0.1.15 guard); they are retried on the next search
- New scripts/prewarm_12_remaining.py: targeted prewarm script
- README: city count 49→61, cache sizes updated
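
The figures quoted above are mutually consistent. The snippet below is only an illustrative sanity check and not part of the commit; the variable names are invented here, and the numbers are taken from the commit message and the README diff.

```python
# Figures quoted in the commit message and README diff.
cities_added = 12
categories = 7
succeeded = 67
cache_before = 344
cities_total = 61

combos = cities_added * categories        # combos attempted in this run
failed = combos - succeeded               # combos that returned None
cache_after = cache_before + succeeded    # only successful combos are cached
max_combos = cities_total * categories    # the README's "up to" figure

print(combos, failed, cache_after, max_combos)  # prints: 84 17 411 427
```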

.geocode_cache.json CHANGED (diff too large to render)
.image_cache.json CHANGED (diff too large to render)
.llm_cache.json CHANGED (diff too large to render)
README.md CHANGED
@@ -82,14 +82,14 @@ A provider is skipped if its API key is empty. Just set `OPENROUTER_API_KEY` and
 
 ## Features
 
-- **49 cities** across Asia, Europe, Africa, Americas & Oceania
+- **61 cities** across Asia, Europe, Africa, Americas & Oceania
 - **7 travel categories**: Landmark, Culture, Nature, Gems, Photo, Food, Shopping
 - **AI-generated recommendations** with descriptions, tips, and coordinates
 - **5-tier image fallback + emoji**: Wikipedia → Wikidata → Commons → Local name → Unsplash → emoji (🏛️)
 - **Real coordinates** from Nominatim geocoding with LLM-coord fast-path
 - **Leaflet map** with spider markers, card↔map hover sync
 - **Multi-language translation**: Traditional Chinese, Japanese, Korean, French, Spanish, German
-- **Japanese & Traditional Chinese pre-warmed** — 49 cities × 7 categories translated upfront
+- **Japanese & Traditional Chinese pre-warmed** — 61 cities × 7 categories translated upfront
 - **Disk-persisted caches** — repeat searches are instant, survive restarts
 - **Deterministic mode** (Search) vs **Creative mode** (Surprise Me button)
 - **Dark Cyborg theme** with large fonts
@@ -127,7 +127,7 @@ python scripts/warmup.py -c "Hong Kong" -c Singapore
 python scripts/warmup.py --fix
 ```
 
-Generates up to 322 city × category combos (6,100+ items across 4 caches).
+Generates up to 427 city × category combos (8,100+ items across 4 caches).
 Resumable — interrupted runs pick up where they left off.
 
 `scripts/prewarm_remaining.py` targets remaining uncached cities — useful
@@ -149,7 +149,7 @@ python scripts/prewarm_translations.py --lang Japanese --force
 python scripts/prewarm_translations.py --lang Korean --lang French
 ```
 
-~344 LLM cache entries × 2 languages = ~688 translation calls. Each translates
+~411 LLM cache entries × 2 languages = ~822 translation calls. Each translates
 all 19 items in a single LLM call. Takes ~2-4 hours to complete.
 
 ## Project Structure
@@ -168,14 +168,15 @@ roamify/
 │ ├── warmup.py # Full 28-city unified warmup (LLM + images + geocode)
 │ ├── prewarm_translations.py # Translation pre-warm (JA, TC, etc.)
 │ ├── prewarm_remaining.py # Prewarm remaining uncached cities
+│ ├── prewarm_12_remaining.py # Targeted prewarm for specific city list
 │ ├── check_cache.py # Cache health check & repair
 │ ├── fix_images.py # Parallel image enrichment pass
 │ └── clear_poor_entries.py # Clear cache for re-warmup
 ├── .streamlit/
 │ └── config.toml # Streamlit server and theme config
-├── .llm_cache.json # Disk-persisted recommendation cache (~2.7MB)
-├── .image_cache.json # Disk-persisted image URL cache (~900KB)
-├── .geocode_cache.json # Disk-persisted geocoding cache (~500KB)
+├── .llm_cache.json # Disk-persisted recommendation cache (~3.3MB)
+├── .image_cache.json # Disk-persisted image URL cache (~1.1MB)
+├── .geocode_cache.json # Disk-persisted geocoding cache (~560KB)
 ├── .translation_cache.json # Disk-persisted translation cache (~7.3MB)
 ├── Dockerfile # HF Spaces deployment
 ├── requirements.txt
@@ -191,8 +192,8 @@ roamify/
 5. Set secrets in HF Space Settings (same keys as your `.env`)
 
 Large cache files are normal — they're JSON and compress well in git.
-`.llm_cache.json` is typically ~2.7MB, translation cache ~7.3MB,
-images cache is URL-only (~900KB).
+`.llm_cache.json` is typically ~3.3MB, translation cache ~7.3MB,
+images cache is URL-only (~1.1MB).
 
 ## License
 
scripts/prewarm_12_remaining.py ADDED
@@ -0,0 +1,134 @@
+#!/usr/bin/env python3
+"""Pre-warm LLM cache for the 12 remaining uncached cities.
+
+12 cities × 7 categories = 84 combos. Runs with 2 concurrent workers.
+Saves incrementally via get_recommendations_cached(). Reports progress
+to stdout which gets captured by the background process.
+
+Usage:
+    cd roamify && python scripts/prewarm_12_remaining.py
+"""
+
+import json
+import os
+import random
+import sys
+import threading
+import time
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "src"))
+
+from dotenv import load_dotenv
+load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), "..", ".env"), override=True)
+
+from services.recommender import (
+    get_recommendations_cached,
+    _LLM_CACHE,
+    _save_llm_cache,
+    _save_image_cache,
+    _save_geocode_cache,
+)
+
+CATEGORY_NAMES = ["Landmark", "Culture", "Nature", "Gems", "Photo", "Food", "Shopping"]
+
+REMAINING_CITIES = [
+    "Montreal", "Moscow", "Oslo", "Reykjavik", "Santiago",
+    "Stockholm", "Tel Aviv", "Toronto", "Vancouver", "Venice",
+    "Warsaw", "Washington",
+]
+
+_COMBO_STATS = {"success": 0, "skipped": 0, "fail": 0, "total": 0}
+_COMBO_LOCK = threading.Lock()
+
+
+def process_combo(city: str, cat_name: str, combo_idx: int, total: int) -> None:
+    """Process a single city/category combo and update stats."""
+    categories = {name: (name == cat_name) for name in CATEGORY_NAMES}
+    cat_hash = json.dumps(categories, sort_keys=True)
+
+    if (city, cat_hash) in _LLM_CACHE:
+        with _COMBO_LOCK:
+            _COMBO_STATS["skipped"] += 1
+        print(f" [{combo_idx:>3}/{total}] ⏭️ {city} / {cat_name} — already cached", flush=True)
+        return
+
+    print(f" [{combo_idx:>3}/{total}] 🔍 {city} / {cat_name}...", end=" ", flush=True)
+    start = time.time()
+    try:
+        result = get_recommendations_cached(
+            city=city,
+            num_attractions=6,
+            categories=categories,
+            temperature=0,
+        )
+        elapsed = time.time() - start
+        if result:
+            items = len(result)
+            with _COMBO_LOCK:
+                _COMBO_STATS["success"] += 1
+            print(f"✅ {items} items in {elapsed:.1f}s", flush=True)
+        else:
+            with _COMBO_LOCK:
+                _COMBO_STATS["fail"] += 1
+            print(f"❌ returned None in {elapsed:.1f}s", flush=True)
+    except Exception as e:
+        elapsed = time.time() - start
+        with _COMBO_LOCK:
+            _COMBO_STATS["fail"] += 1
+        print(f"❌ error after {elapsed:.1f}s: {e}", flush=True)
+
+
+def prewarm():
+    """Run all combos concurrently with 2 workers."""
+    total_combos = len(REMAINING_CITIES) * len(CATEGORY_NAMES)
+    _COMBO_STATS["total"] = total_combos
+
+    llm_before = len(_LLM_CACHE)
+
+    print(f"Pre-warming caches: {len(REMAINING_CITIES)} cities × {len(CATEGORY_NAMES)} categories = {total_combos} combos")
+    print(f" Workers: 2 (concurrent) — each uses random DeepSeek provider")
+    print(f" Existing LLM cache entries: {llm_before}")
+    print()
+
+    # Build all combos, shuffle for load distribution across workers
+    all_combos = []
+    idx = 0
+    for city in REMAINING_CITIES:
+        for cat_name in CATEGORY_NAMES:
+            idx += 1
+            all_combos.append((city, cat_name, idx))
+
+    random.shuffle(all_combos)
+    # Re-assign sequential indices after shuffle (for display only)
+    for i, (city, cat_name, _) in enumerate(all_combos):
+        all_combos[i] = (city, cat_name, i + 1)
+
+    with ThreadPoolExecutor(max_workers=2) as pool:
+        futures = [
+            pool.submit(process_combo, city, cat_name, idx, total_combos)
+            for city, cat_name, idx in all_combos
+        ]
+        for future in as_completed(futures):
+            try:
+                future.result()
+            except Exception:
+                pass
+
+    llm_new = len(_LLM_CACHE) - llm_before
+
+    print()
+    print("═" * 55)
+    print("Pre-warm complete!")
+    print(f" Combos: {_COMBO_STATS['success']} succeeded, {_COMBO_STATS['skipped']} skipped, {_COMBO_STATS['fail']} failed")
+    print(f" New LLM cache entries: {llm_new} (total: {len(_LLM_CACHE)})")
+
+    _save_llm_cache()
+    _save_image_cache()
+    _save_geocode_cache()
+    print()
+    print("All caches saved to disk ✅")
+
+
+if __name__ == "__main__":
+    prewarm()
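
To try the script's control flow without API keys, the following self-contained sketch reproduces the same pattern under stated assumptions: a JSON-serialised category mask as part of the cache key, a lock-guarded stats counter, two worker threads, and no caching of `None` results. `fake_fetch`, `run_prewarm`, `cache_key`, and the in-memory `cache` dict are hypothetical stand-ins introduced here. The real `get_recommendations_cached()` and `_LLM_CACHE` live in `services.recommender`, handle caching internally, and may behave differently.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor

CATEGORY_NAMES = ["Landmark", "Culture", "Nature"]  # shortened for the demo


def cache_key(city, cat_name):
    """Build a (city, category-mask) key the way the script does."""
    mask = {name: (name == cat_name) for name in CATEGORY_NAMES}
    return (city, json.dumps(mask, sort_keys=True))


def fake_fetch(city, cat_name):
    """Hypothetical stand-in for the LLM call; returns None for one combo."""
    if (city, cat_name) == ("Oslo", "Nature"):
        return None
    return [f"{city} {cat_name} spot {i}" for i in range(3)]


def run_prewarm(cities, cache, fetch=fake_fetch, workers=2):
    stats = {"success": 0, "skipped": 0, "fail": 0}
    lock = threading.Lock()

    def process(city, cat_name):
        key = cache_key(city, cat_name)
        with lock:
            if key in cache:
                stats["skipped"] += 1
                return
        result = fetch(city, cat_name)  # slow call runs outside the lock
        with lock:
            if result:
                cache[key] = result     # only non-empty results are stored
                stats["success"] += 1
            else:
                stats["fail"] += 1      # left uncached so a later run retries

    combos = [(c, n) for c in cities for n in CATEGORY_NAMES]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any worker exception surfaces here
        list(pool.map(lambda combo: process(*combo), combos))
    return stats


cache = {}
first = run_prewarm(["Oslo", "Venice"], cache)
second = run_prewarm(["Oslo", "Venice"], cache)
print(first)   # {'success': 5, 'skipped': 0, 'fail': 1}
print(second)  # {'success': 0, 'skipped': 5, 'fail': 1}
```

The second run skips the five cached combos and retries only the one that returned `None`, which mirrors the "retry on next search" behaviour described in the commit message.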