Commit 8521e5c · committed by jofaichow · 1 Parent(s): a004ebb

v0.1.15 — Cache cleanup: 11 None entries purged across 6 cities

- Removed 11 null cache entries from .llm_cache.json (Athens, Bali, LA, Madrid, Milan, Shanghai)
- Hardened _geocode_city() to disambiguate non-city results (e.g. Athens, GA vs Athens, GR)
- None results no longer cached — failed searches retry fresh on next request
- Timeout reduced 120s→30s for faster provider fallback
- Removed extra_body={'think': False} for Ollama Cloud (caused hangs)
- All 42 re-warmed combos verified with full recommendations
- README cache sizes updated; progress log v0.1.15 added

.geocode_cache.json CHANGED (diff too large to render)
.image_cache.json CHANGED (diff too large to render)
.llm_cache.json CHANGED (diff too large to render)

README.md CHANGED
@@ -45,7 +45,9 @@ The app uses a fallback chain of LLM providers. It tries each in order until one
 | 1 (primary) | OpenRouter | `deepseek/deepseek-v4-flash:free` | `OPENROUTER_API_KEY` | ✅ Highly recommended |
 | 2 (fallback) | Ollama Cloud | `deepseek-v4-flash:cloud` | `OLLAMA_API_KEY` | Optional |
 | 3 (fallback) | OpenRouter (Gemma) | `google/gemma-4-26b-a4b-it:free` | (uses same `OPENROUTER_API_KEY`) | Optional |
-| 4 (last resort) | Gemini | `gemini-2.5-flash` | `GEMINI_API_KEY` | Optional |
+| 4 (last resort) | Gemini | `gemini-2.5-flash` | `GEMINI_API_KEY` | Optional (free quota may be exhausted) |
+
+> **Note:** Ollama Cloud requires an up-to-date `certifi` CA bundle. If the Python OpenAI client times out against ollama.com, run `pip install --upgrade certifi`.
 
 All providers use OpenAI-compatible API endpoints. Temperature is configurable:
 - **Search** → temperature=0 (deterministic, cached results)
@@ -91,7 +93,7 @@ A provider is skipped if its API key is empty. Just set `OPENROUTER_API_KEY` and
 - **Disk-persisted caches** — repeat searches are instant, survive restarts
 - **Deterministic mode** (Search) vs **Creative mode** (Surprise Me button)
 - **Dark Cyborg theme** with large fonts
-- **Responsive 4-row stacking** — search controls auto-stack when category pills need 2+ rows, content-aware JS detects exact wrap point
+- **Responsive 4-row stacking** — search controls auto-stack into rows when viewport is narrower than 50% of screen width, content-aware JS detects exact wrap point
 
 ## Caches
 
@@ -171,10 +173,10 @@ roamify/
 │ └── clear_poor_entries.py # Clear cache for re-warmup
 ├── .streamlit/
 │ └── config.toml # Streamlit server and theme config
-├── .llm_cache.json # Disk-persisted recommendation cache (~2.6MB)
-├── .image_cache.json # Disk-persisted image URL cache (~850KB)
-├── .geocode_cache.json # Disk-persisted geocoding cache (~460KB)
-├── .translation_cache.json # Disk-persisted translation cache (~6.9MB)
+├── .llm_cache.json # Disk-persisted recommendation cache (~2.7MB)
+├── .image_cache.json # Disk-persisted image URL cache (~900KB)
+├── .geocode_cache.json # Disk-persisted geocoding cache (~500KB)
+├── .translation_cache.json # Disk-persisted translation cache (~7.3MB)
 ├── Dockerfile # HF Spaces deployment
 ├── requirements.txt
 └── README.md
@@ -189,8 +191,8 @@ roamify/
 5. Set secrets in HF Space Settings (same keys as your `.env`)
 
 Large cache files are normal — they're JSON and compress well in git.
-`.llm_cache.json` is typically ~800KB-1.6MB, translation cache ~220KB,
-images cache is URL-only (~200KB-350KB).
+`.llm_cache.json` is typically ~2.7MB, translation cache ~7.3MB,
+images cache is URL-only (~900KB).
 
 ## License
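
The README describes a provider fallback chain in which providers with an empty API key are skipped, and this commit shortens the per-call timeout to 30 seconds so a hung provider yields sooner. A minimal sketch of that control flow follows. The `Provider` dataclass, its `call` field, and `first_successful` are hypothetical names for illustration only; the project's own `_Provider` type and `_call_model` retry loop are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Provider:
    """Hypothetical provider record; `call` takes (prompt, timeout_seconds)."""
    name: str
    api_key: str
    call: Callable[[str, float], Optional[str]]


def first_successful(
    providers: list[Provider],
    prompt: str,
    timeout: float = 30.0,
) -> Optional[tuple[str, str]]:
    """Try providers in order; return (provider name, reply) or None."""
    for provider in providers:
        if not provider.api_key:
            continue  # no key configured: skip this provider
        try:
            reply = provider.call(prompt, timeout)
        except Exception:
            # Broad catch for brevity in this sketch: a timeout or any
            # other failure just moves on to the next provider.
            continue
        if reply:
            return provider.name, reply
    return None
```

A shorter timeout matters most when the chain is long: with three attempts per provider, a 120-second limit can stall a request for minutes before the next provider is even tried.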
 
src/services/recommender.py CHANGED
@@ -667,9 +667,26 @@ def _nominatim_search_cached(query: str, timeout: int = 10) -> tuple[dict | None
 
 def _geocode_city(city: str) -> tuple[float, float, list[float]] | None:
     """Geocode a city center via Nominatim (cached). Returns (lat, lon, boundingbox) or None."""
-    result, _ = _nominatim_search_cached(city)
+    result, was_cached = _nominatim_search_cached(city)
     if not result:
         return None
+    # Check if the result is actually a city — if not (e.g. small town USA
+    # with same name), retry with a country-agnostic query that prefers cities
+    if result.get("type") != "city" and result.get("class") != "place":
+        # Try with country qualifier via structured params
+        url = "https://nominatim.openstreetmap.org/search?" + urllib.parse.urlencode({
+            "q": city, "format": "json", "limit": 5, "accept-language": "en",
+        })
+        data = _http_get_json(url, timeout=10, retries=1)
+        if data and isinstance(data, list):
+            # Pick the first result that looks like a real city
+            for item in data:
+                if item.get("type") == "city" or item.get("class") == "place":
+                    result = item
+                    # Update cache
+                    _GEOCODE_CACHE[city] = item
+                    _save_geocode_cache()
+                    break
     try:
         lat = float(result["lat"])
         lon = float(result["lon"])
@@ -983,8 +1000,6 @@ Attractions:
         temperature=0,
         max_tokens=512,
     )
-    if verifier.name == "ollama-cloud":
-        kwargs["extra_body"] = {"think": False}
     response = client.chat.completions.create(**kwargs)
     raw = response.choices[0].message.content
     if raw and raw.strip():
@@ -1008,8 +1023,6 @@ def _call_model(provider: _Provider, prompt: str, temperature: float = 0.1) -> l
     """Call a single provider, parse JSON response, return items or None.
     Uses generous timeout and retries. Includes a system message to suppress
     internal reasoning — cuts response time by ~60% on reasoning models.
-    For Ollama Cloud, also passes extra_body={"think": False} to disable
-    the model's internal thinking/reasoning trace at the API level.
     """
     client = OpenAI(api_key=provider.api_key, base_url=provider.base_url)
     kwargs = dict(
@@ -1020,11 +1033,8 @@ def _call_model(provider: _Provider, prompt: str, temperature: float = 0.1) -> l
         ],
         temperature=temperature,
         max_tokens=4096,
-        timeout=120,
+        timeout=30,
     )
-    # Ollama Cloud supports the "think" parameter natively via extra_body
-    if provider.name == "ollama-cloud":
-        kwargs["extra_body"] = {"think": False}
     for attempt in range(3):
         try:
             response = client.chat.completions.create(**kwargs)
@@ -1360,7 +1370,7 @@ def get_recommendations_cached(
         cached = _LLM_CACHE[key]
         if cached is not None:
             return cached[:num_attractions]
-        return None
+        # Don't cache None — allow retry on next request
     # Request the maximum (15 user max + 4 padding = 19 internal)
     # This ensures any num_attractions choice hits the cache
     result = get_recommendations(
@@ -1368,9 +1378,9 @@ def get_recommendations_cached(
         categories=categories, temperature=0,
         provider_log=provider_log,
     )
-    _LLM_CACHE[key] = result
-    _save_llm_cache()
     if result is not None:
+        _LLM_CACHE[key] = result
+        _save_llm_cache()
         return result[:num_attractions]
     return None
 
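
The selection step in `_geocode_city()` can be looked at in isolation. The sketch below is one possible variant, not the committed logic: it ranks candidates so that an exact `type == "city"` match wins over a generic `class == "place"` match, whereas the commit accepts either one on first sight. `pick_city` is a hypothetical helper and the sample records are invented; only the `class` and `type` keys follow Nominatim's `format=json` output.

```python
from typing import Optional


def pick_city(candidates: list[dict]) -> Optional[dict]:
    """Pick the candidate most likely to be a city.

    Preference order: type == "city", then class == "place",
    then whatever came first in the response.
    """
    if not candidates:
        return None

    def rank(item: dict) -> int:
        if item.get("type") == "city":
            return 0
        if item.get("class") == "place":
            return 1
        return 2

    # min() returns the first item among equally ranked ones, so the
    # service's own ordering is kept within each rank.
    return min(candidates, key=rank)


# Invented sample data for illustration only.
sample = [
    {"class": "boundary", "type": "administrative", "display_name": "Example County"},
    {"class": "place", "type": "town", "display_name": "Example Town"},
    {"class": "place", "type": "city", "display_name": "Example City"},
]
best = pick_city(sample)
```

Here `best` is the "Example City" record. Because `class == "place"` also covers towns and villages, ranking on `type` first may reduce the chance of picking a small same-named settlement, though whether that matters depends on how the service orders its results.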