Phase 2: Multi-Interest Recommender Walkthrough
What Was Built
A PinnerSage-style multi-interest recommendation engine that replaces Phase 1's raw-ID Qdrant queries with computed EWMA user profile embeddings, Ward hierarchical clustering for interest detection, heuristic re-ranking, and MMR diversity enforcement.
The old pipeline (Phase 1):
User saves papers → raw IDs → Qdrant BEST_SCORE → results
The new pipeline (Phase 2):
User saves papers
↓
EWMA profiles update (background, non-blocking)
↓
Ward clustering → K distinct interest medoids (auto K per user)
↓
Qdrant prefetch + RRF fusion (~15-25ms, single API call) [⚠️ Replaced by Quota Fusion in Phase 4]
↓
Heuristic re-ranking of ~100 candidates (~1-2ms)
↓
MMR diversity selection → top 10-12 papers (<1ms)
↓
Exploration injection → 1-2 serendipitous papers
↓
Render HTML via HTMX
Total pipeline latency: <30ms (excluding metadata fetch if cold)
Why This Architecture
This architecture was chosen after deep research documented in 03-MultiInterest-Recommender-Architecture.md. The key insights:
The Interest Collapse Problem
A single average embedding for a user interested in both NLP and computer vision lands in a meaningless region of embedding space; Pinterest called this the "energy-boosting breakfast" problem. PinnerSage (KDD 2020) solved it with multiple user vectors.
Why EWMA Over Rolling Windows
Rolling windows (last 30 days) lose valuable historical signal abruptly. EWMA (Exponentially Weighted Moving Average) provides smooth decay:
- Long-term (α=0.10): Effective window ~20 interactions. Tracks enduring research interests.
- Short-term (α=0.40): Effective window ~3-5 interactions. Captures current session context.
- Negative (α=0.15): Tracks papers the user explicitly dislikes.
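A minimal sketch of the blending step, assuming the profile is L2-normalised after each update and that a missing profile is initialised from the first embedding. Both behaviours are assumptions made for illustration; the real ewma_update in profiles.py may differ.

```python
import numpy as np

def ewma_update(current, new_embedding, alpha):
    """Blend a new embedding into a running profile vector.

    Assumptions (not confirmed by the project): a missing profile is
    seeded with the first embedding, and the result is L2-normalised.
    """
    new_embedding = np.asarray(new_embedding, dtype=np.float32)
    if current is None:
        blended = new_embedding
    else:
        current = np.asarray(current, dtype=np.float32)
        # Higher alpha means the newest interaction counts for more.
        blended = (1.0 - alpha) * current + alpha * new_embedding
    norm = np.linalg.norm(blended)
    return blended / norm if norm > 0 else blended

# Two saves in orthogonal directions, long-term alpha = 0.10.
profile = None
for vec in ([1.0, 0.0], [0.0, 1.0]):
    profile = ewma_update(profile, vec, alpha=0.10)
```

With α=0.10 the second save only nudges the profile toward the new direction, which is the smooth decay described above. The effective windows follow the usual rule of thumb N ≈ 2/α − 1, giving about 19 interactions for α=0.10 and 4 for α=0.40.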
Why Ward Over K-Means
K-Means requires pre-specifying K (number of clusters). Ward hierarchical clustering auto-determines K per user via a distance threshold: a user with 2 interests gets 2 clusters, a user with 5 gets 5. No hyperparameter tuning per user.
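As a rough sketch of threshold-based cluster counting, the example below uses SciPy's linkage and fcluster. The function names, the threshold of 1.0, and the maxclust fallback are illustrative assumptions rather than the contents of clustering.py; only the cap of 7 clusters comes from this document.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_with_auto_k(embeddings, distance_threshold=1.0, max_clusters=7):
    """Cut a Ward dendrogram at a distance threshold so K is data-driven."""
    X = np.asarray(embeddings, dtype=np.float64)
    Z = linkage(X, method="ward")  # Ward linkage on Euclidean distances
    labels = fcluster(Z, t=distance_threshold, criterion="distance")
    if labels.max() > max_clusters:
        # Assumed fallback: cap the number of clusters when the cut is too fine.
        labels = fcluster(Z, t=max_clusters, criterion="maxclust")
    return labels

def medoid_index(X, member_idx):
    """Return the member with the smallest summed distance to the others."""
    sub = X[member_idx]
    dists = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
    return int(member_idx[int(dists.sum(axis=1).argmin())])

# Toy data: two tight groups standing in for two research interests.
X = np.array([[1.0, 0.0], [0.98, 0.05], [0.95, -0.05],
              [0.0, 1.0], [0.05, 0.98], [-0.05, 0.95]])
labels = cluster_with_auto_k(X)
first_medoid = medoid_index(X, np.flatnonzero(labels == labels[0]))
```

A medoid is an actual saved paper rather than an averaged point, so it can be sent to the vector index directly as a query.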
Why LightGBM Over BGE-reranker
The older Research-Recommender_Technical_Roadmap.md suggested BGE-reranker-v2 at ~800ms for 100 candidates on CPU. LightGBM scores 500 candidates in 2-5ms. On Render Free Tier (CPU-only, 512MB RAM), this is the only viable option. The current implementation uses a heuristic scorer with the same feature interface, so LightGBM can be dropped in once enough training data accumulates.
3-Tier Cascading Fallback
The recommender degrades gracefully based on how much data the user has:
| User State | Tier | Strategy | Latency |
|---|---|---|---|
| ≥5 saves | Tier 1 | Clustering → RRF → Rerank → MMR → Explore | ~25ms |
| 3-4 saves | Tier 2 | EWMA long-term vector → ANN search | ~10ms |
| 1-2 saves | Tier 3 | Qdrant BEST_SCORE (Phase 1 path) | ~15ms |
| 0 saves | Empty | "Save at least 1 paper..." | 0ms |
Each tier falls through to the next if it can't produce results.
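The fall-through logic can be sketched as follows. The recommend function and its callable arguments are hypothetical stand-ins for the router code, and treating an empty result list as "can't produce results" is an assumption.

```python
def recommend(num_saves, tier1, tier2, tier3):
    """Try the richest eligible tier first, then fall through on empty results.

    tier1/tier2/tier3 are zero-argument callables returning a list of papers
    (hypothetical stand-ins for the real pipeline steps).
    """
    if num_saves == 0:
        return "empty", []
    tiers = []
    if num_saves >= 5:
        tiers.append(("tier1", tier1))
    if num_saves >= 3:
        tiers.append(("tier2", tier2))
    tiers.append(("tier3", tier3))
    for name, run in tiers:
        results = run()
        if results:
            return name, results
    return "empty", []

# A user with 6 saves whose Tier 1 pipeline returns nothing drops to Tier 2.
fallback = recommend(6, lambda: [], lambda: ["p1"], lambda: ["p2"])
```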
New Files Created
app/recommend/__init__.py
Package init for the recommendation engine module.
app/recommend/profiles.py
EWMA temporal embedding profiles:
- ewma_update(current, new_embedding, alpha): core blending function
- update_on_save(user_id, paper_embedding): updates both LT and ST profiles
- update_on_dismiss(user_id, paper_embedding): updates negative profile
- load_profile() / save_profile(): SQLite persistence as binary numpy blobs (4KB each)
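To illustrate the blob persistence, here is a minimal round-trip sketch. The table schema, column names, and function signatures are assumptions made for the example; only the function names and the 4KB size come from this document. A 1024-dimensional float32 vector (the BGE-M3 dense size) serialises to exactly 4096 bytes.

```python
import sqlite3
import numpy as np

DIM = 1024  # BGE-M3 dense vectors: 1024 float32 values = 4096 bytes

def save_profile(conn, user_id, vector, count):
    """Store one profile vector as a raw float32 blob (hypothetical schema)."""
    blob = np.asarray(vector, dtype=np.float32).tobytes()
    conn.execute(
        "INSERT OR REPLACE INTO user_profiles "
        "(user_id, lt_vector, interaction_count) VALUES (?, ?, ?)",
        (user_id, blob, count),
    )
    conn.commit()

def load_profile(conn, user_id):
    """Return (vector, interaction_count), or (None, 0) for unknown users."""
    row = conn.execute(
        "SELECT lt_vector, interaction_count FROM user_profiles WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None, 0
    # frombuffer gives a read-only view of the bytes; copy to make it writable.
    return np.frombuffer(row[0], dtype=np.float32).copy(), row[1]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_profiles ("
    "user_id TEXT PRIMARY KEY, lt_vector BLOB, interaction_count INTEGER)"
)
vec = np.linspace(0, 1, DIM, dtype=np.float32)
save_profile(conn, "u1", vec, 3)
loaded, count = load_profile(conn, "u1")
```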
app/recommend/clustering.py
Ward hierarchical clustering:
- compute_clusters(paper_ids, embeddings) → list of InterestCluster
- Each cluster: medoid paper ID, medoid embedding, member paper IDs, importance score
- Auto K (1-7 clusters), recency-weighted importance
- Falls back to single cluster if <5 saved papers
app/recommend/reranker.py
Heuristic scorer (LightGBM-ready):
- compute_features(): 4 features per candidate: cosine_sim_LT, cosine_sim_ST, paper_age, rrf_position
- heuristic_score(): weighted sum: 45% relevance, 25% session, 20% recency, 10% rank
- rerank_candidates(): end-to-end: features → scores → sorted output
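The weighted sum might look like the sketch below. The 45/25/20/10 weights come from the list above; how paper_age and rrf_position are mapped onto a 0-1 scale is not stated, so the exponential half-life decay and the linear rank transform here are assumptions, as are the parameter names.

```python
def heuristic_score(cosine_sim_lt, cosine_sim_st, paper_age_days, rrf_position,
                    half_life_days=365.0, pool_size=100):
    """Weighted sum of four features, each scaled to roughly [0, 1].

    Assumptions for illustration: recency halves every half_life_days,
    and rrf_position is a 0-based rank within a pool of pool_size candidates.
    """
    recency = 0.5 ** (paper_age_days / half_life_days)
    rank = 1.0 - min(rrf_position, pool_size) / pool_size
    return (0.45 * cosine_sim_lt   # relevance to long-term interests
            + 0.25 * cosine_sim_st  # fit with the current session
            + 0.20 * recency        # newer papers score higher
            + 0.10 * rank)          # earlier retrieval rank scores higher

# Long-term relevance outweighs session similarity when all else is equal.
long_term_match = heuristic_score(0.8, 0.2, paper_age_days=0, rrf_position=0)
session_match = heuristic_score(0.2, 0.8, paper_age_days=0, rrf_position=0)
```

Because a trained model would consume the same four features, swapping the body of this function for a model prediction leaves the callers unchanged.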
app/recommend/diversity.py
MMR diversity + exploration:
- mmr_rerank(query, candidates, scores, λ=0.6, top_k=20): greedy diverse selection
- inject_exploration(selected, pool, n_explore=2): random serendipity injection
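A compact sketch of greedy MMR selection follows. It is named mmr_select (a hypothetical name) to make clear it is not the project's mmr_rerank: it takes precomputed relevance scores instead of a query vector and returns candidate indices.

```python
import numpy as np

def mmr_select(candidates, scores, lam=0.6, top_k=20):
    """Greedy Maximal Marginal Relevance.

    Each step picks the candidate maximising
    lam * relevance - (1 - lam) * max cosine similarity to items already picked.
    """
    C = np.asarray(candidates, dtype=np.float64)
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    sim = C @ C.T  # pairwise cosine similarity between candidates
    scores = np.asarray(scores, dtype=np.float64)
    selected, remaining = [], list(range(len(C)))
    while remaining and len(selected) < top_k:
        if selected:
            redundancy = sim[np.ix_(remaining, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        mmr = lam * scores[remaining] - (1.0 - lam) * redundancy
        best = remaining[int(mmr.argmax())]
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidate 1 nearly duplicates candidate 0; candidate 2 is a different topic.
picked = mmr_select([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]],
                    scores=[0.9, 0.85, 0.6], lam=0.6, top_k=2)
```

Here the near-duplicate loses its second-place slot to the less relevant but dissimilar paper, which is the trade-off λ controls.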
Modified Files
app/db.py
- Added user_profiles table: EWMA vectors as BLOBs with interaction counts
- Added user_clusters table: Ward clustering results (medoid IDs, importance, paper lists)
- Added 4 helper functions: get_user_profile, upsert_user_profile, save_user_clusters, get_user_clusters
app/qdrant_svc.py
- Added get_paper_vectors(): fetch actual BGE-M3 embeddings from Qdrant (needed for EWMA)
- Added search_by_vector(): raw ANN search by embedding vector
- Added multi_interest_search(): prefetch + RRF fusion in a single API call
- Imported new Qdrant models: Prefetch, FusionQuery, Fusion
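Qdrant performs the fusion server-side, so the project does not implement it itself. For intuition only, Reciprocal Rank Fusion over one ranked list per interest medoid can be sketched in plain Python. The rrf_fuse name is hypothetical, and k=60 is the constant commonly used in the RRF literature, not a confirmed Qdrant setting.

```python
def rrf_fuse(ranked_lists, k=60, limit=100):
    """Fuse several ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)[:limit]

# One result list per interest medoid; "b" appears in both and rises to the top.
fused = rrf_fuse([["a", "b", "c"], ["b", "d"]])
```

Only ranks are used, never raw similarity scores, so lists retrieved from different medoids can be merged without calibrating their score scales.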
app/routers/events.py
- Save handler now triggers background EWMA profile update (LT + ST) via asyncio.create_task
- Dismiss handler triggers background negative profile update
- Both are non-blocking: user response is sent before the update completes
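The fire-and-forget pattern can be sketched as below. The handler and helper names are hypothetical, and the sleep stands in for the real vector fetch and EWMA update. Keeping a reference to each task follows the asyncio documentation's advice, since the event loop holds only weak references to tasks.

```python
import asyncio

background_tasks = set()  # strong references so tasks are not garbage-collected
updated = []              # records completed updates for this demo

async def update_profile(user_id, paper_id):
    await asyncio.sleep(0.01)  # stand-in for the vector fetch + EWMA update
    updated.append((user_id, paper_id))

async def handle_save(user_id, paper_id):
    """Hypothetical save handler: schedule the update, respond immediately."""
    task = asyncio.create_task(update_profile(user_id, paper_id))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"status": "saved"}

async def main():
    response = await handle_save("u1", "p1")
    responded_first = len(updated) == 0  # update has not finished yet
    await asyncio.gather(*background_tasks)  # demo only: wait before exiting
    return response, responded_first

response, responded_first = asyncio.run(main())
```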
app/routers/recommendations.py
- Complete rewrite with 3-tier cascading fallback
- Tier 1: full 5-step pipeline (cluster → retrieve → rerank → MMR → explore)
- Tier 2: EWMA long-term single-vector search
- Tier 3: original BEST_SCORE (unchanged from Phase 1)
requirements.txt
- Added numpy>=1.24: vector computations
- Added scipy>=1.11: Ward hierarchical clustering
What Was NOT Changed
These files are intentionally untouched:
- app/user_state.py: still manages ID deques for the hot cache
- app/routers/search.py: search is a separate concern (see PHASE2-Hybrid-Search-Plan)
- app/routers/saved.py: saved papers page is unaffected
- All templates: no UI changes needed, same HTMX partials
Test Coverage
| Test File | Tests | Description |
|---|---|---|
| test_profiles.py | 11 | EWMA math, convergence, normalisation, DB round-trips |
| test_clustering.py | 10 | Ward clustering, medoid validity, max clusters, DB persistence |
| test_reranker_diversity.py | 13 | Heuristic scoring, MMR diversity, exploration injection |
| Existing tests | 52 | Integration, events, saved page, qdrant_svc |
| Total | 86 passed | 2 pre-existing live Qdrant failures (network-dependent) |
Upgrade Path: Heuristic → LightGBM
The heuristic scorer in reranker.py needs no training data and is designed as a drop-in placeholder for LightGBM:
- When: Interactions table has ≥500 save/dismiss rows
- How: Train offline with lgb.train(params={'objective': 'lambdarank'}, ...)
- Where: Save model to models/reranker.lgb, replace heuristic_score() with model.predict(features)
- Impact: Same features, same interface, so zero code changes in the router
Key Design Decisions & Rationale
| Decision | Chosen | Rejected | Why |
|---|---|---|---|
| User profile | EWMA (3 vectors) | Rolling window | Smooth decay, no abrupt signal loss |
| Clustering | Ward hierarchical | Fixed K-Means | Auto-determines K per user |
| Re-ranking | Heuristic → LightGBM | BGE-reranker-v2 | 800ms → 2ms on CPU |
| Diversity | MMR (λ=0.6) | Random sampling | Principled relevance/diversity trade-off |
| Exploration | Random injection (2 papers) | None | Prevents filter bubbles |
| Multi-query | Qdrant prefetch+RRF | Sequential queries | Single network round-trip |