# Phase 2 — Multi-Interest Recommender Walkthrough ## What Was Built A PinnerSage-style multi-interest recommendation engine that replaces Phase 1's raw-ID Qdrant queries with computed EWMA user profile embeddings, Ward hierarchical clustering for interest detection, heuristic re-ranking, and MMR diversity enforcement. **The old pipeline (Phase 1):** ``` User saves papers → raw IDs → Qdrant BEST_SCORE → results ``` **The new pipeline (Phase 2):** ``` User saves papers ↓ EWMA profiles update (background, non-blocking) ↓ Ward clustering → K distinct interest medoids (auto K per user) ↓ Qdrant prefetch + RRF fusion (~15-25ms, single API call) [⚠️ Replaced by Quota Fusion in Phase 4] ↓ Heuristic re-ranking of ~100 candidates (~1-2ms) ↓ MMR diversity selection → top 10-12 papers (<1ms) ↓ Exploration injection → 1-2 serendipitous papers ↓ Render HTML via HTMX ``` **Total pipeline latency: <30ms** (excluding metadata fetch if cold) --- ## Why This Architecture This architecture was chosen after deep research documented in [03-MultiInterest-Recommender-Architecture.md](../research/03-MultiInterest-Recommender-Architecture.md). The key insights: ### The Interest Collapse Problem A single average embedding for a user interested in both *NLP* and *computer vision* lands in meaningless embedding space — Pinterest called this the "energy-boosting breakfast" problem. PinnerSage (KDD 2020) solved it with multiple user vectors. ### Why EWMA Over Rolling Windows Rolling windows (last 30 days) lose valuable historical signal abruptly. EWMA (Exponentially Weighted Moving Average) provides smooth decay: - **Long-term (α=0.10):** Effective window ~20 interactions. Tracks enduring research interests. - **Short-term (α=0.40):** Effective window ~3-5 interactions. Captures current session context. - **Negative (α=0.15):** Tracks papers the user explicitly dislikes. ### Why Ward Over K-Means K-Means requires pre-specifying K (number of clusters). Ward hierarchical clustering auto-determines K per user via a distance threshold — a user with 2 interests gets 2 clusters, a user with 5 gets 5. No hyperparameter tuning per user. ### Why LightGBM Over BGE-reranker The older `Research-Recommender_Technical_Roadmap.md` suggested BGE-reranker-v2 at ~800ms for 100 candidates on CPU. LightGBM scores 500 candidates in 2-5ms. On Render Free Tier (CPU-only, 512MB RAM), this is the only viable option. Currently using a heuristic scorer with the same feature interface — drop-in LightGBM upgrade when training data accumulates. --- ## 3-Tier Cascading Fallback The recommender degrades gracefully based on how much data the user has: | User State | Tier | Strategy | Latency | |---|---|---|---| | ≥5 saves | **Tier 1** | Clustering → RRF → Rerank → MMR → Explore | ~25ms | | 3-4 saves | **Tier 2** | EWMA long-term vector → ANN search | ~10ms | | 1-2 saves | **Tier 3** | Qdrant BEST_SCORE (Phase 1 path) | ~15ms | | 0 saves | Empty | "Save at least 1 paper..." | 0ms | Each tier falls through to the next if it can't produce results. --- ## New Files Created ### `app/recommend/__init__.py` Package init for the recommendation engine module. ### `app/recommend/profiles.py` EWMA temporal embedding profiles: - `ewma_update(current, new_embedding, alpha)` — core blending function - `update_on_save(user_id, paper_embedding)` — updates both LT and ST profiles - `update_on_dismiss(user_id, paper_embedding)` — updates negative profile - `load_profile()` / `save_profile()` — SQLite persistence as binary numpy blobs (4KB each) ### `app/recommend/clustering.py` Ward hierarchical clustering: - `compute_clusters(paper_ids, embeddings)` → list of `InterestCluster` - Each cluster: medoid paper ID, medoid embedding, member paper IDs, importance score - Auto K (1-7 clusters), recency-weighted importance - Falls back to single cluster if <5 saved papers ### `app/recommend/reranker.py` Heuristic scorer (LightGBM-ready): - `compute_features()` → 4 features per candidate: cosine_sim_LT, cosine_sim_ST, paper_age, rrf_position - `heuristic_score()` → weighted sum: 45% relevance, 25% session, 20% recency, 10% rank - `rerank_candidates()` → end-to-end: features → scores → sorted output ### `app/recommend/diversity.py` MMR diversity + exploration: - `mmr_rerank(query, candidates, scores, λ=0.6, top_k=20)` — greedy diverse selection - `inject_exploration(selected, pool, n_explore=2)` — random serendipity injection --- ## Modified Files ### `app/db.py` - Added `user_profiles` table — EWMA vectors as BLOBs with interaction counts - Added `user_clusters` table — Ward clustering results (medoid IDs, importance, paper lists) - Added 4 helper functions: `get_user_profile`, `upsert_user_profile`, `save_user_clusters`, `get_user_clusters` ### `app/qdrant_svc.py` - Added `get_paper_vectors()` — fetch actual BGE-M3 embeddings from Qdrant (needed for EWMA) - Added `search_by_vector()` — raw ANN search by embedding vector - Added `multi_interest_search()` — prefetch + RRF fusion in a single API call - Imported new Qdrant models: `Prefetch`, `FusionQuery`, `Fusion` ### `app/routers/events.py` - Save handler now triggers background EWMA profile update (LT + ST) via `asyncio.create_task` - Dismiss handler triggers background negative profile update - Both are non-blocking — user response is sent before the update completes ### `app/routers/recommendations.py` - Complete rewrite with 3-tier cascading fallback - Tier 1: full 5-step pipeline (cluster → retrieve → rerank → MMR → explore) - Tier 2: EWMA long-term single-vector search - Tier 3: original BEST_SCORE (unchanged from Phase 1) ### `requirements.txt` - Added `numpy>=1.24` — vector computations - Added `scipy>=1.11` — Ward hierarchical clustering --- ## What Was NOT Changed These files are intentionally untouched: - `app/user_state.py` — still manages ID deques for the hot cache - `app/routers/search.py` — search is a separate concern (see PHASE2-Hybrid-Search-Plan) - `app/routers/saved.py` — saved papers page is unaffected - All templates — no UI changes needed, same HTMX partials --- ## Test Coverage | Test File | Tests | Description | |---|---|---| | `test_profiles.py` | 11 | EWMA math, convergence, normalisation, DB round-trips | | `test_clustering.py` | 10 | Ward clustering, medoid validity, max clusters, DB persistence | | `test_reranker_diversity.py` | 13 | Heuristic scoring, MMR diversity, exploration injection | | Existing tests | 52 | Integration, events, saved page, qdrant_svc | | **Total** | **86 passed** | 2 pre-existing live Qdrant failures (network-dependent) | --- ## Upgrade Path: Heuristic → LightGBM The heuristic scorer in `reranker.py` is designed for a zero-data-required drop-in to LightGBM: 1. **When:** Interactions table has ≥500 save/dismiss rows 2. **How:** Train offline with `lgb.train(params={'objective': 'lambdarank'}, ...)` 3. **Where:** Save model to `models/reranker.lgb`, replace `heuristic_score()` with `model.predict(features)` 4. **Impact:** Same features, same interface — zero code changes in the router --- ## Key Design Decisions & Rationale | Decision | Chosen | Rejected | Why | |---|---|---|---| | User profile | EWMA (3 vectors) | Rolling window | Smooth decay, no abrupt signal loss | | Clustering | Ward hierarchical | Fixed K-Means | Auto-determines K per user | | Re-ranking | Heuristic → LightGBM | BGE-reranker-v2 | 800ms → 2ms on CPU | | Diversity | MMR (λ=0.6) | Random sampling | Principled relevance/diversity trade-off | | Exploration | Random injection (2 papers) | None | Prevents filter bubbles | | Multi-query | Qdrant prefetch+RRF | Sequential queries | Single network round-trip |