v0.1.15 — Cache cleanup: 11 None entries purged across 6 cities
- Removed 11 null cache entries from .llm_cache.json (Athens, Bali, LA, Madrid, Milan, Shanghai)
- Hardened _geocode_city() to disambiguate non-city results (e.g. Athens, GA vs Athens, GR)
- None results no longer cached — failed searches retry fresh on next request
- Timeout reduced 120s→30s for faster provider fallback
- Removed extra_body={'think': False} for Ollama Cloud (caused hangs)
- All 42 re-warmed combos verified with full recommendations
- README cache sizes updated; progress log v0.1.15 added
- .geocode_cache.json +0 -0
- .image_cache.json +0 -0
- .llm_cache.json +0 -0
- README.md +10 -8
- src/services/recommender.py +22 -12
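The "None results no longer cached" bullet follows a common memoization rule: store only successful results, so a failed lookup is retried on the next call instead of being served from the cache. The sketch below is illustrative, not the project's code. `cached_recommendations`, `fetch`, and `MAX_INTERNAL` are hypothetical names, and the value 19 mirrors the "15 user max + 4 padding = 19 internal" comment in `get_recommendations_cached()`.

```python
from typing import Callable, Optional

# Hypothetical constant mirroring "15 user max + 4 padding = 19 internal".
MAX_INTERNAL = 19


def cached_recommendations(
    cache: dict[str, list[str]],
    key: str,
    num_attractions: int,
    fetch: Callable[[int], Optional[list[str]]],
) -> Optional[list[str]]:
    """Return up to num_attractions items, caching only non-None results.

    `fetch` is a hypothetical stand-in for the LLM call: it returns a list
    of attractions, or None when every provider failed.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached[:num_attractions]
    # Always request the maximum so any later num_attractions hits the cache.
    result = fetch(MAX_INTERNAL)
    if result is not None:
        cache[key] = result  # only successes are stored
        return result[:num_attractions]
    return None  # nothing stored, so the next call retries
```

Because the cache key does not depend on `num_attractions`, one successful fetch serves every later request size by slicing.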
.geocode_cache.json
CHANGED
The diff for this file is too large to render.
.image_cache.json
CHANGED
The diff for this file is too large to render.
.llm_cache.json
CHANGED
The diff for this file is too large to render.
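The commit message reports 11 null entries removed from `.llm_cache.json`. A one-off cleanup of that kind can be sketched as below, assuming the cache file is a single JSON object mapping keys to results (the real layout is not visible in this diff). `purge_null_entries` is a hypothetical helper. It writes to a temporary file in the same directory and swaps it in with `os.replace`, so an interrupted run does not leave a half-written cache.

```python
import json
import os
import tempfile


def purge_null_entries(path: str) -> int:
    """Remove keys whose value is JSON null; return how many were removed.

    Assumes the file holds one JSON object (a dict at the top level).
    """
    with open(path, encoding="utf-8") as f:
        cache = json.load(f)
    kept = {k: v for k, v in cache.items() if v is not None}
    removed = len(cache) - len(kept)
    if removed:
        # Write next to the original, then atomically replace it.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                json.dump(kept, tmp, ensure_ascii=False)
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
    return removed
```

Running it against the cache file, e.g. `purge_null_entries(".llm_cache.json")`, returns the number of dropped keys; a second run returns 0 and leaves the file untouched.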
README.md
CHANGED
@@ -45,7 +45,9 @@ The app uses a fallback chain of LLM providers. It tries each in order until one
 | 1 (primary) | OpenRouter | `deepseek/deepseek-v4-flash:free` | `OPENROUTER_API_KEY` | ✅ Highly recommended |
 | 2 (fallback) | Ollama Cloud | `deepseek-v4-flash:cloud` | `OLLAMA_API_KEY` | Optional |
 | 3 (fallback) | OpenRouter (Gemma) | `google/gemma-4-26b-a4b-it:free` | (uses same `OPENROUTER_API_KEY`) | Optional |
-| 4 (last resort) | Gemini | `gemini-2.5-flash` | `GEMINI_API_KEY` | Optional |
+| 4 (last resort) | Gemini | `gemini-2.5-flash` | `GEMINI_API_KEY` | Optional (free quota may be exhausted) |
+
+> **Note:** Ollama Cloud requires an up-to-date `certifi` CA bundle. If the Python OpenAI client times out against ollama.com, run `pip install --upgrade certifi`.
 
 All providers use OpenAI-compatible API endpoints. Temperature is configurable:
 - **Search** → temperature=0 (deterministic, cached results)
@@ -91,7 +93,7 @@ A provider is skipped if its API key is empty. Just set `OPENROUTER_API_KEY` and
 - **Disk-persisted caches** — repeat searches are instant, survive restarts
 - **Deterministic mode** (Search) vs **Creative mode** (Surprise Me button)
 - **Dark Cyborg theme** with large fonts
-- **Responsive 4-row stacking** — search controls auto-stack when
+- **Responsive 4-row stacking** — search controls auto-stack into rows when viewport is narrower than 50% of screen width, content-aware JS detects exact wrap point
 
 ## Caches
 
@@ -171,10 +173,10 @@ roamify/
 │ └── clear_poor_entries.py # Clear cache for re-warmup
 ├── .streamlit/
 │ └── config.toml # Streamlit server and theme config
-├── .llm_cache.json # Disk-persisted recommendation cache (~2.
-├── .image_cache.json # Disk-persisted image URL cache (~
-├── .geocode_cache.json # Disk-persisted geocoding cache (~
-├── .translation_cache.json # Disk-persisted translation cache (~
+├── .llm_cache.json # Disk-persisted recommendation cache (~2.7MB)
+├── .image_cache.json # Disk-persisted image URL cache (~900KB)
+├── .geocode_cache.json # Disk-persisted geocoding cache (~500KB)
+├── .translation_cache.json # Disk-persisted translation cache (~7.3MB)
 ├── Dockerfile # HF Spaces deployment
 ├── requirements.txt
 └── README.md
@@ -189,8 +191,8 @@ roamify/
 5. Set secrets in HF Space Settings (same keys as your `.env`)
 
 Large cache files are normal — they're JSON and compress well in git.
-`.llm_cache.json` is typically ~
-images cache is URL-only (~
+`.llm_cache.json` is typically ~2.7MB, translation cache ~7.3MB,
+images cache is URL-only (~900KB).
 
 ## License
 
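The README table above fixes the provider order, and the hunk header notes that a provider is skipped if its API key is empty. The control flow of such a chain can be sketched without any network access by wrapping each provider in a callable that raises on failure. `Provider`, `first_successful`, and the `call` field are hypothetical names, not the project's `_Provider` class; a provider hitting the new 30 s limit (passed as `timeout=30` in the real request kwargs) is simulated by raising `TimeoutError`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Provider:
    """Hypothetical stand-in for one row of the fallback table."""
    name: str
    api_key: str
    call: Callable[[str], Optional[list]]  # raises, or returns None/[] on failure


def first_successful(providers: list[Provider], prompt: str, log: list[str]) -> Optional[list]:
    """Try providers in order and return the first non-empty result."""
    for p in providers:
        if not p.api_key:
            log.append(f"{p.name}: skipped (no API key)")
            continue
        try:
            items = p.call(prompt)
        except Exception as exc:  # e.g. timeout, HTTP error, unparsable JSON
            log.append(f"{p.name}: failed ({type(exc).__name__})")
            continue
        if items:
            log.append(f"{p.name}: ok")
            return items
        log.append(f"{p.name}: empty response")
    return None
```

The chain only advances once a provider gives up. `_call_model()` loops `for attempt in range(3)`, and the OpenAI Python client can also retry timed-out requests on its own (its `max_retries` option), so a stalled provider can hold up the chain for several multiples of the per-request timeout. That is why lowering it from 120 s to 30 s speeds up fallback.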
src/services/recommender.py
CHANGED
@@ -667,9 +667,26 @@ def _nominatim_search_cached(query: str, timeout: int = 10) -> tuple[dict | None
 
 def _geocode_city(city: str) -> tuple[float, float, list[float]] | None:
     """Geocode a city center via Nominatim (cached). Returns (lat, lon, boundingbox) or None."""
-    result,
+    result, was_cached = _nominatim_search_cached(city)
     if not result:
         return None
+    # Check if the result is actually a city — if not (e.g. small town USA
+    # with same name), retry with a country-agnostic query that prefers cities
+    if result.get("type") != "city" and result.get("class") != "place":
+        # Try with country qualifier via structured params
+        url = "https://nominatim.openstreetmap.org/search?" + urllib.parse.urlencode({
+            "q": city, "format": "json", "limit": 5, "accept-language": "en",
+        })
+        data = _http_get_json(url, timeout=10, retries=1)
+        if data and isinstance(data, list):
+            # Pick the first result that looks like a real city
+            for item in data:
+                if item.get("type") == "city" or item.get("class") == "place":
+                    result = item
+                    # Update cache
+                    _GEOCODE_CACHE[city] = item
+                    _save_geocode_cache()
+                    break
     try:
         lat = float(result["lat"])
         lon = float(result["lon"])
@@ -983,8 +1000,6 @@ Attractions:
             temperature=0,
             max_tokens=512,
         )
-        if verifier.name == "ollama-cloud":
-            kwargs["extra_body"] = {"think": False}
         response = client.chat.completions.create(**kwargs)
         raw = response.choices[0].message.content
         if raw and raw.strip():
@@ -1008,8 +1023,6 @@ def _call_model(provider: _Provider, prompt: str, temperature: float = 0.1) -> l
     """Call a single provider, parse JSON response, return items or None.
     Uses generous timeout and retries. Includes a system message to suppress
     internal reasoning — cuts response time by ~60% on reasoning models.
-    For Ollama Cloud, also passes extra_body={"think": False} to disable
-    the model's internal thinking/reasoning trace at the API level.
     """
     client = OpenAI(api_key=provider.api_key, base_url=provider.base_url)
     kwargs = dict(
@@ -1020,11 +1033,8 @@ def _call_model(provider: _Provider, prompt: str, temperature: float = 0.1) -> l
         ],
         temperature=temperature,
         max_tokens=4096,
-        timeout=
+        timeout=30,
     )
-    # Ollama Cloud supports the "think" parameter natively via extra_body
-    if provider.name == "ollama-cloud":
-        kwargs["extra_body"] = {"think": False}
     for attempt in range(3):
         try:
             response = client.chat.completions.create(**kwargs)
@@ -1360,7 +1370,7 @@ def get_recommendations_cached(
         cached = _LLM_CACHE[key]
         if cached is not None:
             return cached[:num_attractions]
-
+    # Don't cache None — allow retry on next request
     # Request the maximum (15 user max + 4 padding = 19 internal)
     # This ensures any num_attractions choice hits the cache
     result = get_recommendations(
@@ -1368,9 +1378,9 @@ def get_recommendations_cached(
         categories=categories, temperature=0,
         provider_log=provider_log,
     )
-    _LLM_CACHE[key] = result
-    _save_llm_cache()
     if result is not None:
+        _LLM_CACHE[key] = result
+        _save_llm_cache()
         return result[:num_attractions]
     return None
 
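The disambiguation added to `_geocode_city()` reduces to one predicate applied to Nominatim candidates: accept a hit whose `type` is `"city"` or whose `class` is `"place"`, and otherwise keep the original top hit. The sketch below isolates that selection as a pure function so it can be checked without network access. `looks_like_city`, `pick_city`, and `Candidate` are hypothetical names, and the sample dicts used with it carry only the `class` and `type` fields the diff reads.

```python
from typing import Any, Optional

# A Nominatim result as returned with format=json (only a few keys are used here).
Candidate = dict[str, Any]


def looks_like_city(hit: Candidate) -> bool:
    # Same test as the diff: a "city" type, or any hit in the "place" class.
    return hit.get("type") == "city" or hit.get("class") == "place"


def pick_city(first: Optional[Candidate], candidates: list[Candidate]) -> Optional[Candidate]:
    """Keep `first` if it already looks like a city; else take the first city-like candidate."""
    if not first:
        return None
    if looks_like_city(first):
        return first
    for hit in candidates:
        if looks_like_city(hit):
            return hit
    return first  # no better match: keep the original hit, as the diff does
```

In OpenStreetMap the `place` class also covers towns and villages, so the second arm of the predicate may accept more than true cities.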