Spaces: jofaichow/roamify
jofaichow committed 16 days ago

Commit 8521e5c (1 parent: a004ebb)

v0.1.15 — Cache cleanup: 11 None entries purged across 6 cities

- Removed 11 null cache entries from .llm_cache.json (Athens, Bali, LA, Madrid, Milan, Shanghai)
- Hardened _geocode_city() to disambiguate non-city results (e.g. Athens, GA vs Athens, GR)
- None results no longer cached — failed searches retry fresh on next request
- Timeout reduced 120s→30s for faster provider fallback
- Removed extra_body={'think': False} for Ollama Cloud (caused hangs)
- All 42 re-warmed combos verified with full recommendations
- README cache sizes updated; progress log v0.1.15 added
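
The geocoding disambiguation described above comes down to picking the first city-like entry from a list of candidates. The following is a minimal sketch of that selection step only, assuming each candidate is a dict with `class` and `type` keys as in the diff for `_geocode_city()`. The helper name `pick_city_like` and the sample candidates are hypothetical and hand-written, not real Nominatim output.

```python
from typing import Optional


def pick_city_like(candidates: list[dict]) -> Optional[dict]:
    """Return the first candidate that looks like a city, or None if none does."""
    for item in candidates:
        # Same test as the commit: accept type == "city" or class == "place"
        if item.get("type") == "city" or item.get("class") == "place":
            return item
    return None


# Hypothetical candidates for illustration only
candidates = [
    {"display_name": "Example County", "class": "boundary", "type": "administrative"},
    {"display_name": "Example City", "class": "place", "type": "city"},
]
best = pick_city_like(candidates)
print(best["display_name"] if best else "no city-like result")
```

Keeping the selection in a pure function like this makes it possible to test the preference rule without any network access.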

Files changed (5)

.geocode_cache.json +0 -0
.image_cache.json +0 -0
.llm_cache.json +0 -0
README.md +10 -8
src/services/recommender.py +22 -12

.geocode_cache.json CHANGED

The diff for this file is too large to render.

.image_cache.json CHANGED

The diff for this file is too large to render.

.llm_cache.json CHANGED

The diff for this file is too large to render.
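
Since the cache diffs are not rendered, here is a rough sketch of how null entries could be purged from a JSON cache file. It assumes the cache is a flat JSON object whose values are either a list of items or `null`; the actual layout of `.llm_cache.json` is not shown in this commit, and `purge_null_entries` is a hypothetical helper, not the repository's cleanup script.

```python
import json
from pathlib import Path


def purge_null_entries(cache_path: Path) -> int:
    """Drop keys whose value is null and return how many were removed."""
    cache = json.loads(cache_path.read_text(encoding="utf-8"))
    kept = {key: value for key, value in cache.items() if value is not None}
    removed = len(cache) - len(kept)
    if removed:
        # Rewrite the file only when something actually changed
        cache_path.write_text(json.dumps(kept, ensure_ascii=False), encoding="utf-8")
    return removed
```

Calling `purge_null_entries(Path(".llm_cache.json"))` would return the number of purged entries under these assumptions.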

README.md CHANGED

@@ -45,7 +45,9 @@ The app uses a fallback chain of LLM providers. It tries each in order until one
 | 1 (primary) | OpenRouter | `deepseek/deepseek-v4-flash:free` | `OPENROUTER_API_KEY` | ✅ Highly recommended |
 | 2 (fallback) | Ollama Cloud | `deepseek-v4-flash:cloud` | `OLLAMA_API_KEY` | Optional |
 | 3 (fallback) | OpenRouter (Gemma) | `google/gemma-4-26b-a4b-it:free` | (uses same `OPENROUTER_API_KEY`) | Optional |
-| 4 (last resort) | Gemini | `gemini-2.5-flash` | `GEMINI_API_KEY` | Optional |
 All providers use OpenAI-compatible API endpoints. Temperature is configurable:
 - **Search** → temperature=0 (deterministic, cached results)
@@ -91,7 +93,7 @@ A provider is skipped if its API key is empty. Just set `OPENROUTER_API_KEY` and
 - **Disk-persisted caches** — repeat searches are instant, survive restarts
 - **Deterministic mode** (Search) vs **Creative mode** (Surprise Me button)
 - **Dark Cyborg theme** with large fonts
-- **Responsive 4-row stacking** — search controls auto-stack when category pills need 2+ rows, content-aware JS detects exact wrap point
 ## Caches
@@ -171,10 +173,10 @@ roamify/
 │   └── clear_poor_entries.py      # Clear cache for re-warmup
 ├── .streamlit/
 │   └── config.toml              # Streamlit server and theme config
-├── .llm_cache.json              # Disk-persisted recommendation cache (~2.6MB)
-├── .image_cache.json            # Disk-persisted image URL cache (~850KB)
-├── .geocode_cache.json          # Disk-persisted geocoding cache (~460KB)
-├── .translation_cache.json      # Disk-persisted translation cache (~6.9MB)
 ├── Dockerfile                   # HF Spaces deployment
 ├── requirements.txt
 └── README.md
@@ -189,8 +191,8 @@ roamify/
 5. Set secrets in HF Space Settings (same keys as your `.env`)
 Large cache files are normal — they're JSON and compress well in git.
-`.llm_cache.json` is typically ~800KB-1.6MB, translation cache ~220KB,
-images cache is URL-only (~200KB-350KB).
 ## License

 | 1 (primary) | OpenRouter | `deepseek/deepseek-v4-flash:free` | `OPENROUTER_API_KEY` | ✅ Highly recommended |
 | 2 (fallback) | Ollama Cloud | `deepseek-v4-flash:cloud` | `OLLAMA_API_KEY` | Optional |
 | 3 (fallback) | OpenRouter (Gemma) | `google/gemma-4-26b-a4b-it:free` | (uses same `OPENROUTER_API_KEY`) | Optional |
+| 4 (last resort) | Gemini | `gemini-2.5-flash` | `GEMINI_API_KEY` | Optional (free quota may be exhausted) |
+> **Note:** Ollama Cloud requires an up-to-date `certifi` CA bundle. If the Python OpenAI client times out against ollama.com, run `pip install --upgrade certifi`.
 All providers use OpenAI-compatible API endpoints. Temperature is configurable:
 - **Search** → temperature=0 (deterministic, cached results)
 - **Disk-persisted caches** — repeat searches are instant, survive restarts
 - **Deterministic mode** (Search) vs **Creative mode** (Surprise Me button)
 - **Dark Cyborg theme** with large fonts
+- **Responsive 4-row stacking** — search controls auto-stack into rows when viewport is narrower than 50% of screen width, content-aware JS detects exact wrap point
 ## Caches
 │   └── clear_poor_entries.py      # Clear cache for re-warmup
 ├── .streamlit/
 │   └── config.toml              # Streamlit server and theme config
+├── .llm_cache.json              # Disk-persisted recommendation cache (~2.7MB)
+├── .image_cache.json            # Disk-persisted image URL cache (~900KB)
+├── .geocode_cache.json          # Disk-persisted geocoding cache (~500KB)
+├── .translation_cache.json      # Disk-persisted translation cache (~7.3MB)
 ├── Dockerfile                   # HF Spaces deployment
 ├── requirements.txt
 └── README.md
 5. Set secrets in HF Space Settings (same keys as your `.env`)
 Large cache files are normal — they're JSON and compress well in git.
+`.llm_cache.json` is typically ~2.7MB, translation cache ~7.3MB,
+images cache is URL-only (~900KB).
 ## License

src/services/recommender.py CHANGED

@@ -667,9 +667,26 @@ def _nominatim_search_cached(query: str, timeout: int = 10) -> tuple[dict | None
 def _geocode_city(city: str) -> tuple[float, float, list[float]] | None:
     """Geocode a city center via Nominatim (cached). Returns (lat, lon, boundingbox) or None."""
-    result, _ = _nominatim_search_cached(city)
     if not result:
         return None
     try:
         lat = float(result["lat"])
         lon = float(result["lon"])
@@ -983,8 +1000,6 @@ Attractions:
             temperature=0,
             max_tokens=512,
         )
-        if verifier.name == "ollama-cloud":
-            kwargs["extra_body"] = {"think": False}
         response = client.chat.completions.create(**kwargs)
         raw = response.choices[0].message.content
         if raw and raw.strip():
@@ -1008,8 +1023,6 @@ def _call_model(provider: _Provider, prompt: str, temperature: float = 0.1) -> l
     """Call a single provider, parse JSON response, return items or None.
     Uses generous timeout and retries. Includes a system message to suppress
     internal reasoning — cuts response time by ~60% on reasoning models.
-    For Ollama Cloud, also passes extra_body={"think": False} to disable
-    the model's internal thinking/reasoning trace at the API level.
     """
     client = OpenAI(api_key=provider.api_key, base_url=provider.base_url)
     kwargs = dict(
@@ -1020,11 +1033,8 @@ def _call_model(provider: _Provider, prompt: str, temperature: float = 0.1) -> l
         ],
         temperature=temperature,
         max_tokens=4096,
-        timeout=120,
     )
-    # Ollama Cloud supports the "think" parameter natively via extra_body
-    if provider.name == "ollama-cloud":
-        kwargs["extra_body"] = {"think": False}
     for attempt in range(3):
         try:
             response = client.chat.completions.create(**kwargs)
@@ -1360,7 +1370,7 @@ def get_recommendations_cached(
         cached = _LLM_CACHE[key]
         if cached is not None:
             return cached[:num_attractions]
-        return None
     # Request the maximum (15 user max + 4 padding = 19 internal)
     # This ensures any num_attractions choice hits the cache
     result = get_recommendations(
@@ -1368,9 +1378,9 @@ def get_recommendations_cached(
         categories=categories, temperature=0,
         provider_log=provider_log,
     )
-    _LLM_CACHE[key] = result
-    _save_llm_cache()
     if result is not None:
         return result[:num_attractions]
     return None

 def _geocode_city(city: str) -> tuple[float, float, list[float]] | None:
     """Geocode a city center via Nominatim (cached). Returns (lat, lon, boundingbox) or None."""
+    result, was_cached = _nominatim_search_cached(city)
     if not result:
         return None
+    # Check whether the top result is actually a city. If not (e.g. a small US
+    # town with the same name), re-query for several candidates and prefer a city.
+    if result.get("type") != "city" and result.get("class") != "place":
+        # Free-form query without a country qualifier; request up to 5 candidates
+        url = "https://nominatim.openstreetmap.org/search?" + urllib.parse.urlencode({
+            "q": city, "format": "json", "limit": 5, "accept-language": "en",
+        })
+        data = _http_get_json(url, timeout=10, retries=1)
+        if data and isinstance(data, list):
+            # Pick the first result that looks like a real city
+            for item in data:
+                if item.get("type") == "city" or item.get("class") == "place":
+                    result = item
+                    # Update cache
+                    _GEOCODE_CACHE[city] = item
+                    _save_geocode_cache()
+                    break
     try:
         lat = float(result["lat"])
         lon = float(result["lon"])
             temperature=0,
             max_tokens=512,
         )
         response = client.chat.completions.create(**kwargs)
         raw = response.choices[0].message.content
         if raw and raw.strip():
     """Call a single provider, parse JSON response, return items or None.
     Uses generous timeout and retries. Includes a system message to suppress
     internal reasoning — cuts response time by ~60% on reasoning models.
     """
     client = OpenAI(api_key=provider.api_key, base_url=provider.base_url)
     kwargs = dict(
         ],
         temperature=temperature,
         max_tokens=4096,
+        timeout=30,
     )
     for attempt in range(3):
         try:
             response = client.chat.completions.create(**kwargs)
         cached = _LLM_CACHE[key]
         if cached is not None:
             return cached[:num_attractions]
+        # Don't cache None — allow retry on next request
     # Request the maximum (15 user max + 4 padding = 19 internal)
     # This ensures any num_attractions choice hits the cache
     result = get_recommendations(
         categories=categories, temperature=0,
         provider_log=provider_log,
     )
     if result is not None:
+        _LLM_CACHE[key] = result
+        _save_llm_cache()
         return result[:num_attractions]
     return None