ResearchIT / docs /walkthroughs /02-Phase2-MultiInterest-Recommender.md
siddhm11
Phase 4 complete + Phase 4.5 instrumentation foundation
61d5f0d
# Phase 2 β€” Multi-Interest Recommender Walkthrough
## What Was Built
A PinnerSage-style multi-interest recommendation engine that replaces Phase 1's raw-ID Qdrant queries with computed EWMA user profile embeddings, Ward hierarchical clustering for interest detection, heuristic re-ranking, and MMR diversity enforcement.
**The old pipeline (Phase 1):**
```
User saves papers β†’ raw IDs β†’ Qdrant BEST_SCORE β†’ results
```
**The new pipeline (Phase 2):**
```
User saves papers
↓
EWMA profiles update (background, non-blocking)
↓
Ward clustering β†’ K distinct interest medoids (auto K per user)
↓
Qdrant prefetch + RRF fusion (~15-25ms, single API call) [⚠️ Replaced by Quota Fusion in Phase 4]
↓
Heuristic re-ranking of ~100 candidates (~1-2ms)
↓
MMR diversity selection β†’ top 10-12 papers (<1ms)
↓
Exploration injection β†’ 1-2 serendipitous papers
↓
Render HTML via HTMX
```
**Total pipeline latency: <30ms** (excluding metadata fetch if cold)
---
## Why This Architecture
This architecture was chosen after deep research documented in [03-MultiInterest-Recommender-Architecture.md](../research/03-MultiInterest-Recommender-Architecture.md). The key insights:
### The Interest Collapse Problem
A single average embedding for a user interested in both *NLP* and *computer vision* lands in meaningless embedding space β€” Pinterest called this the "energy-boosting breakfast" problem. PinnerSage (KDD 2020) solved it with multiple user vectors.
### Why EWMA Over Rolling Windows
Rolling windows (last 30 days) lose valuable historical signal abruptly. EWMA (Exponentially Weighted Moving Average) provides smooth decay:
- **Long-term (Ξ±=0.10):** Effective window ~20 interactions. Tracks enduring research interests.
- **Short-term (Ξ±=0.40):** Effective window ~3-5 interactions. Captures current session context.
- **Negative (Ξ±=0.15):** Tracks papers the user explicitly dislikes.
### Why Ward Over K-Means
K-Means requires pre-specifying K (number of clusters). Ward hierarchical clustering auto-determines K per user via a distance threshold β€” a user with 2 interests gets 2 clusters, a user with 5 gets 5. No hyperparameter tuning per user.
### Why LightGBM Over BGE-reranker
The older `Research-Recommender_Technical_Roadmap.md` suggested BGE-reranker-v2 at ~800ms for 100 candidates on CPU. LightGBM scores 500 candidates in 2-5ms. On Render Free Tier (CPU-only, 512MB RAM), this is the only viable option. Currently using a heuristic scorer with the same feature interface β€” drop-in LightGBM upgrade when training data accumulates.
---
## 3-Tier Cascading Fallback
The recommender degrades gracefully based on how much data the user has:
| User State | Tier | Strategy | Latency |
|---|---|---|---|
| β‰₯5 saves | **Tier 1** | Clustering β†’ RRF β†’ Rerank β†’ MMR β†’ Explore | ~25ms |
| 3-4 saves | **Tier 2** | EWMA long-term vector β†’ ANN search | ~10ms |
| 1-2 saves | **Tier 3** | Qdrant BEST_SCORE (Phase 1 path) | ~15ms |
| 0 saves | Empty | "Save at least 1 paper..." | 0ms |
Each tier falls through to the next if it can't produce results.
---
## New Files Created
### `app/recommend/__init__.py`
Package init for the recommendation engine module.
### `app/recommend/profiles.py`
EWMA temporal embedding profiles:
- `ewma_update(current, new_embedding, alpha)` β€” core blending function
- `update_on_save(user_id, paper_embedding)` β€” updates both LT and ST profiles
- `update_on_dismiss(user_id, paper_embedding)` β€” updates negative profile
- `load_profile()` / `save_profile()` β€” SQLite persistence as binary numpy blobs (4KB each)
### `app/recommend/clustering.py`
Ward hierarchical clustering:
- `compute_clusters(paper_ids, embeddings)` β†’ list of `InterestCluster`
- Each cluster: medoid paper ID, medoid embedding, member paper IDs, importance score
- Auto K (1-7 clusters), recency-weighted importance
- Falls back to single cluster if <5 saved papers
### `app/recommend/reranker.py`
Heuristic scorer (LightGBM-ready):
- `compute_features()` β†’ 4 features per candidate: cosine_sim_LT, cosine_sim_ST, paper_age, rrf_position
- `heuristic_score()` β†’ weighted sum: 45% relevance, 25% session, 20% recency, 10% rank
- `rerank_candidates()` β†’ end-to-end: features β†’ scores β†’ sorted output
### `app/recommend/diversity.py`
MMR diversity + exploration:
- `mmr_rerank(query, candidates, scores, Ξ»=0.6, top_k=20)` β€” greedy diverse selection
- `inject_exploration(selected, pool, n_explore=2)` β€” random serendipity injection
---
## Modified Files
### `app/db.py`
- Added `user_profiles` table β€” EWMA vectors as BLOBs with interaction counts
- Added `user_clusters` table β€” Ward clustering results (medoid IDs, importance, paper lists)
- Added 4 helper functions: `get_user_profile`, `upsert_user_profile`, `save_user_clusters`, `get_user_clusters`
### `app/qdrant_svc.py`
- Added `get_paper_vectors()` β€” fetch actual BGE-M3 embeddings from Qdrant (needed for EWMA)
- Added `search_by_vector()` β€” raw ANN search by embedding vector
- Added `multi_interest_search()` β€” prefetch + RRF fusion in a single API call
- Imported new Qdrant models: `Prefetch`, `FusionQuery`, `Fusion`
### `app/routers/events.py`
- Save handler now triggers background EWMA profile update (LT + ST) via `asyncio.create_task`
- Dismiss handler triggers background negative profile update
- Both are non-blocking β€” user response is sent before the update completes
### `app/routers/recommendations.py`
- Complete rewrite with 3-tier cascading fallback
- Tier 1: full 5-step pipeline (cluster β†’ retrieve β†’ rerank β†’ MMR β†’ explore)
- Tier 2: EWMA long-term single-vector search
- Tier 3: original BEST_SCORE (unchanged from Phase 1)
### `requirements.txt`
- Added `numpy>=1.24` β€” vector computations
- Added `scipy>=1.11` β€” Ward hierarchical clustering
---
## What Was NOT Changed
These files are intentionally untouched:
- `app/user_state.py` β€” still manages ID deques for the hot cache
- `app/routers/search.py` β€” search is a separate concern (see PHASE2-Hybrid-Search-Plan)
- `app/routers/saved.py` β€” saved papers page is unaffected
- All templates β€” no UI changes needed, same HTMX partials
---
## Test Coverage
| Test File | Tests | Description |
|---|---|---|
| `test_profiles.py` | 11 | EWMA math, convergence, normalisation, DB round-trips |
| `test_clustering.py` | 10 | Ward clustering, medoid validity, max clusters, DB persistence |
| `test_reranker_diversity.py` | 13 | Heuristic scoring, MMR diversity, exploration injection |
| Existing tests | 52 | Integration, events, saved page, qdrant_svc |
| **Total** | **86 passed** | 2 pre-existing live Qdrant failures (network-dependent) |
---
## Upgrade Path: Heuristic β†’ LightGBM
The heuristic scorer in `reranker.py` is designed for a zero-data-required drop-in to LightGBM:
1. **When:** Interactions table has β‰₯500 save/dismiss rows
2. **How:** Train offline with `lgb.train(params={'objective': 'lambdarank'}, ...)`
3. **Where:** Save model to `models/reranker.lgb`, replace `heuristic_score()` with `model.predict(features)`
4. **Impact:** Same features, same interface β€” zero code changes in the router
---
## Key Design Decisions & Rationale
| Decision | Chosen | Rejected | Why |
|---|---|---|---|
| User profile | EWMA (3 vectors) | Rolling window | Smooth decay, no abrupt signal loss |
| Clustering | Ward hierarchical | Fixed K-Means | Auto-determines K per user |
| Re-ranking | Heuristic β†’ LightGBM | BGE-reranker-v2 | 800ms β†’ 2ms on CPU |
| Diversity | MMR (Ξ»=0.6) | Random sampling | Principled relevance/diversity trade-off |
| Exploration | Random injection (2 papers) | None | Prevents filter bubbles |
| Multi-query | Qdrant prefetch+RRF | Sequential queries | Single network round-trip |