Spaces:

siddhm11
/

ResearchIT

Sleeping

App Files Files Community

ResearchIT / docs /walkthroughs /02-Phase2-MultiInterest-Recommender.md

siddhm11

Phase 4 complete + Phase 4.5 instrumentation foundation

61d5f0d about 1 month ago

preview code

raw

history blame contribute delete

7.83 kB

	# Phase 2 — Multi-Interest Recommender Walkthrough

	## What Was Built

	A PinnerSage-style multi-interest recommendation engine that replaces Phase 1's raw-ID Qdrant queries with computed EWMA user profile embeddings, Ward hierarchical clustering for interest detection, heuristic re-ranking, and MMR diversity enforcement.

	The old pipeline (Phase 1):
	```
	User saves papers → raw IDs → Qdrant BEST_SCORE → results
	```

	The new pipeline (Phase 2):
	```
	User saves papers
	↓
	EWMA profiles update (background, non-blocking)
	↓
	Ward clustering → K distinct interest medoids (auto K per user)
	↓
	Qdrant prefetch + RRF fusion (~15-25ms, single API call) [⚠️ Replaced by Quota Fusion in Phase 4]
	↓
	Heuristic re-ranking of ~100 candidates (~1-2ms)
	↓
	MMR diversity selection → top 10-12 papers (<1ms)
	↓
	Exploration injection → 1-2 serendipitous papers
	↓
	Render HTML via HTMX
	```

	Total pipeline latency: <30ms (excluding metadata fetch if cold)

	---

	## Why This Architecture

	This architecture was chosen after deep research documented in [03-MultiInterest-Recommender-Architecture.md](../research/03-MultiInterest-Recommender-Architecture.md). The key insights:

	### The Interest Collapse Problem
	A single average embedding for a user interested in both NLP and computer vision lands in meaningless embedding space — Pinterest called this the "energy-boosting breakfast" problem. PinnerSage (KDD 2020) solved it with multiple user vectors.

	### Why EWMA Over Rolling Windows
	Rolling windows (last 30 days) lose valuable historical signal abruptly. EWMA (Exponentially Weighted Moving Average) provides smooth decay:
	- Long-term (α=0.10): Effective window ~20 interactions. Tracks enduring research interests.
	- Short-term (α=0.40): Effective window ~3-5 interactions. Captures current session context.
	- Negative (α=0.15): Tracks papers the user explicitly dislikes.

	### Why Ward Over K-Means
	K-Means requires pre-specifying K (number of clusters). Ward hierarchical clustering auto-determines K per user via a distance threshold — a user with 2 interests gets 2 clusters, a user with 5 gets 5. No hyperparameter tuning per user.

	### Why LightGBM Over BGE-reranker
	The older `Research-Recommender_Technical_Roadmap.md` suggested BGE-reranker-v2 at ~800ms for 100 candidates on CPU. LightGBM scores 500 candidates in 2-5ms. On Render Free Tier (CPU-only, 512MB RAM), this is the only viable option. Currently using a heuristic scorer with the same feature interface — drop-in LightGBM upgrade when training data accumulates.

	---

	## 3-Tier Cascading Fallback

	The recommender degrades gracefully based on how much data the user has:

	\| User State \| Tier \| Strategy \| Latency \|
	\|---\|---\|---\|---\|
	\| ≥5 saves \| Tier 1 \| Clustering → RRF → Rerank → MMR → Explore \| ~25ms \|
	\| 3-4 saves \| Tier 2 \| EWMA long-term vector → ANN search \| ~10ms \|
	\| 1-2 saves \| Tier 3 \| Qdrant BEST_SCORE (Phase 1 path) \| ~15ms \|
	\| 0 saves \| Empty \| "Save at least 1 paper..." \| 0ms \|

	Each tier falls through to the next if it can't produce results.

	---

	## New Files Created

	### `app/recommend/__init__.py`
	Package init for the recommendation engine module.

	### `app/recommend/profiles.py`
	EWMA temporal embedding profiles:
	- `ewma_update(current, new_embedding, alpha)` — core blending function
	- `update_on_save(user_id, paper_embedding)` — updates both LT and ST profiles
	- `update_on_dismiss(user_id, paper_embedding)` — updates negative profile
	- `load_profile()` / `save_profile()` — SQLite persistence as binary numpy blobs (4KB each)

	### `app/recommend/clustering.py`
	Ward hierarchical clustering:
	- `compute_clusters(paper_ids, embeddings)` → list of `InterestCluster`
	- Each cluster: medoid paper ID, medoid embedding, member paper IDs, importance score
	- Auto K (1-7 clusters), recency-weighted importance
	- Falls back to single cluster if <5 saved papers

	### `app/recommend/reranker.py`
	Heuristic scorer (LightGBM-ready):
	- `compute_features()` → 4 features per candidate: cosine_sim_LT, cosine_sim_ST, paper_age, rrf_position
	- `heuristic_score()` → weighted sum: 45% relevance, 25% session, 20% recency, 10% rank
	- `rerank_candidates()` → end-to-end: features → scores → sorted output

	### `app/recommend/diversity.py`
	MMR diversity + exploration:
	- `mmr_rerank(query, candidates, scores, λ=0.6, top_k=20)` — greedy diverse selection
	- `inject_exploration(selected, pool, n_explore=2)` — random serendipity injection

	---

	## Modified Files

	### `app/db.py`
	- Added `user_profiles` table — EWMA vectors as BLOBs with interaction counts
	- Added `user_clusters` table — Ward clustering results (medoid IDs, importance, paper lists)
	- Added 4 helper functions: `get_user_profile`, `upsert_user_profile`, `save_user_clusters`, `get_user_clusters`

	### `app/qdrant_svc.py`
	- Added `get_paper_vectors()` — fetch actual BGE-M3 embeddings from Qdrant (needed for EWMA)
	- Added `search_by_vector()` — raw ANN search by embedding vector
	- Added `multi_interest_search()` — prefetch + RRF fusion in a single API call
	- Imported new Qdrant models: `Prefetch`, `FusionQuery`, `Fusion`

	### `app/routers/events.py`
	- Save handler now triggers background EWMA profile update (LT + ST) via `asyncio.create_task`
	- Dismiss handler triggers background negative profile update
	- Both are non-blocking — user response is sent before the update completes

	### `app/routers/recommendations.py`
	- Complete rewrite with 3-tier cascading fallback
	- Tier 1: full 5-step pipeline (cluster → retrieve → rerank → MMR → explore)
	- Tier 2: EWMA long-term single-vector search
	- Tier 3: original BEST_SCORE (unchanged from Phase 1)

	### `requirements.txt`
	- Added `numpy>=1.24` — vector computations
	- Added `scipy>=1.11` — Ward hierarchical clustering

	---

	## What Was NOT Changed

	These files are intentionally untouched:
	- `app/user_state.py` — still manages ID deques for the hot cache
	- `app/routers/search.py` — search is a separate concern (see PHASE2-Hybrid-Search-Plan)
	- `app/routers/saved.py` — saved papers page is unaffected
	- All templates — no UI changes needed, same HTMX partials

	---

	## Test Coverage

	\| Test File \| Tests \| Description \|
	\|---\|---\|---\|
	\| `test_profiles.py` \| 11 \| EWMA math, convergence, normalisation, DB round-trips \|
	\| `test_clustering.py` \| 10 \| Ward clustering, medoid validity, max clusters, DB persistence \|
	\| `test_reranker_diversity.py` \| 13 \| Heuristic scoring, MMR diversity, exploration injection \|
	\| Existing tests \| 52 \| Integration, events, saved page, qdrant_svc \|
	\| Total \| 86 passed \| 2 pre-existing live Qdrant failures (network-dependent) \|

	---

	## Upgrade Path: Heuristic → LightGBM

	The heuristic scorer in `reranker.py` is designed for a zero-data-required drop-in to LightGBM:

	1. When: Interactions table has ≥500 save/dismiss rows
	2. How: Train offline with `lgb.train(params={'objective': 'lambdarank'}, ...)`
	3. Where: Save model to `models/reranker.lgb`, replace `heuristic_score()` with `model.predict(features)`
	4. Impact: Same features, same interface — zero code changes in the router

	---

	## Key Design Decisions & Rationale

	\| Decision \| Chosen \| Rejected \| Why \|
	\|---\|---\|---\|---\|
	\| User profile \| EWMA (3 vectors) \| Rolling window \| Smooth decay, no abrupt signal loss \|
	\| Clustering \| Ward hierarchical \| Fixed K-Means \| Auto-determines K per user \|
	\| Re-ranking \| Heuristic → LightGBM \| BGE-reranker-v2 \| 800ms → 2ms on CPU \|
	\| Diversity \| MMR (λ=0.6) \| Random sampling \| Principled relevance/diversity trade-off \|
	\| Exploration \| Random injection (2 papers) \| None \| Prevents filter bubbles \|
	\| Multi-query \| Qdrant prefetch+RRF \| Sequential queries \| Single network round-trip \|