ResearchIT: Master Task Tracker
Purpose: Single source of truth for all completed, in-progress, and upcoming work.
Last updated: 2026-05-05
Current phase: Phase 6.5 (Instrumentation) - COMPLETE ✅ | Phase 7 next
Legend
- `[x]` Done
- `[/]` In progress
- `[ ]` Not started
- `[~]` Intentionally deferred (blocked by data/users/scale)
- `[!]` Backlog item (documented, not yet coded)
Phase 1: Zero-ML Recommender - COMPLETE
Built the foundation: Qdrant connection, arXiv search, save/dismiss, cookie identity, HTMX frontend.
- Qdrant Cloud connection (1.6M BGE-M3 papers, BQ, HNSW m=32)
  - Collection: `arxiv_bgem3_dense`, 1024-dim dense vectors
  - File: `app/qdrant_svc.py` → `_get_client()`
- BEST_SCORE Recommend API (raw paper IDs → Qdrant)
  - File: `app/qdrant_svc.py` → `recommend()`
- arXiv keyword API search (placeholder, replaced in Phase 3)
  - File: `app/arxiv_svc.py` → `search()`
- arXiv metadata fetching + SQLite cache
  - File: `app/arxiv_svc.py` → `fetch_metadata_batch()`
- SQLite database schema (interactions, paper_metadata)
  - File: `app/db.py` → `init_db()`
  - WAL mode, async via aiosqlite
- Cookie-based user identity
  - File: `app/config.py` → `COOKIE_NAME`
- User state management (positive/negative deques)
  - File: `app/user_state.py` → `UserState`
- Save/Dismiss event logging
  - File: `app/routers/events.py`
- HTMX + Jinja2 frontend (search, recs, save, dismiss)
  - Files: `app/templates/` (base.html, index.html, search.html, saved.html, partials/)
- Test suite: 55 tests passing
Gaps: None.
Phase 2a: EWMA Profile Embeddings - COMPLETE
Replaced raw ID-list approach with temporal decay vectors so recent interests outweigh old ones.
- Create `app/recommend/` module with `__init__.py`
- Create `app/recommend/profiles.py`: EWMA computation + storage
  - Long-term: α=0.03 ✅ (corrected from 0.10 per Doc 06)
  - Short-term: α=0.40
  - Negative: α=0.15
  - All embeddings L2-normalized
- Modify `app/db.py`: add `user_profiles` table + `user_clusters` table
- Modify `app/qdrant_svc.py`: add `get_paper_vectors()` and `search_by_vector()`
- Modify `app/routers/events.py`: trigger EWMA updates on save/dismiss
- Modify `app/routers/recommendations.py`: EWMA vector search with Tier 2 fallback
- Add `numpy` + `scipy` to `requirements.txt`
- Tests for profiles module: 11 passed
- Full test suite: no regressions
Doc 06 correction applied: α_long 0.10 → 0.03 (PinnerSage rejected 0.10 as too recent-biased).
Gaps: None.
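Here is a minimal sketch of the EWMA update. It assumes the standard recursive form `p ← (1 - α)·p + α·e`, with the profile re-normalized after every event. The helper names (`ewma_update`, `l2_normalize`) and the rule that the first event seeds the profile are illustrative assumptions. This is not the contents of `app/recommend/profiles.py`.

```python
from typing import Optional

import numpy as np

# Smoothing factors from the tracker (long-term value after the Doc 06 correction).
ALPHA_LONG = 0.03
ALPHA_SHORT = 0.40
ALPHA_NEG = 0.15


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v if norm == 0 else v / norm


def ewma_update(profile: Optional[np.ndarray], paper_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Blend one paper embedding into a profile: p <- (1 - alpha) * p + alpha * e, then renormalize.

    Hypothetical helper: the first event simply seeds the profile with the normalized paper vector.
    """
    e = l2_normalize(np.asarray(paper_vec, dtype=np.float64))
    if profile is None:
        return e
    return l2_normalize((1.0 - alpha) * profile + alpha * e)
```

Ignoring re-normalization, an event's weight halves after ln 0.5 / ln(1 - α) further events. That is about 23 events at α=0.03, versus about 1.4 at α=0.40. This is why the long-term profile drifts slowly while the short-term one tracks the current session.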
Phase 2b: Ward Clustering + Multi-Interest Retrieval - COMPLETE
Detect distinct user interests via hierarchical clustering, retrieve candidates per interest.
- Create `app/recommend/clustering.py`: Ward clustering + medoid extraction
  - L2-normalize embeddings before Ward ✅ (Doc 06 correction)
  - Adaptive gap-based threshold (no fixed K)
  - Medoid representation (real papers, not centroids) ✅
  - Dynamic K (1-7 clusters, auto-determined)
  - Recency-weighted importance scores
- Modify `app/qdrant_svc.py`: add `multi_interest_search()` with prefetch + RRF
- Modify `app/routers/recommendations.py`: 3-tier cascading pipeline
  - Tier 1 (≥5 saves): multi-interest clustering → prefetch + RRF
  - Tier 2 (≥3 saves): EWMA long-term vector → single ANN search
  - Tier 3 (≥1 save): Qdrant BEST_SCORE Recommend API
- Tests for clustering module: 10 passed
- Full test suite: no regressions
Doc 06 corrections applied: L2-normalization before Ward, medoid not centroid.
Gaps (deferred to Phase 4):
- [!] RRF → quota fusion (dominant clusters can swamp minority interests)
- [!] Hungarian matching for cluster ID stability across reclusterings
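The clustering steps above can be sketched with SciPy's hierarchical clustering. The sketch makes three assumptions:
- The "adaptive gap-based threshold" is read as cutting the dendrogram inside the largest jump between consecutive Ward merge heights.
- A hypothetical `min_gap` keeps a single cluster when no jump stands out.
- The medoid is the member with the highest summed cosine similarity to the rest of its cluster.

Recency-weighted importance is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_interests(emb: np.ndarray, max_k: int = 7, min_gap: float = 0.1):
    """Return (labels, medoid_indices) for an (n, d) matrix of saved-paper embeddings.

    Hypothetical sketch: the cut rule and the min_gap value are assumptions.
    """
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize before Ward
    n = len(x)
    labels = np.ones(n, dtype=int)                        # default: a single interest
    if n >= 3:
        z = linkage(x, method="ward")
        heights = z[:, 2]                                 # merge distances (monotone for Ward)
        gaps = np.diff(heights)                           # jump between consecutive merges
        cut = int(np.argmax(gaps))
        if gaps[cut] >= min_gap:                          # a clear jump separates interests
            threshold = (heights[cut] + heights[cut + 1]) / 2.0
            labels = fcluster(z, t=threshold, criterion="distance")
            if labels.max() > max_k:                      # cap K at max_k clusters
                labels = fcluster(z, t=max_k, criterion="maxclust")
    medoids = []
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        sims = x[idx] @ x[idx].T                          # within-cluster cosine similarities
        medoids.append(int(idx[np.argmax(sims.sum(axis=1))]))  # most central real paper
    return labels, medoids
```

Using a medoid index rather than a mean vector keeps each interest anchored to a paper the user actually saved.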
Phase 2c: Heuristic Re-ranking + MMR Diversity - COMPLETE
Added scoring and diversity layers on top of retrieval to produce the final feed.
- Create `app/recommend/reranker.py`: 5-feature heuristic scorer
  - Feature 1: cosine_sim_longterm (weight 0.40)
  - Feature 2: cosine_sim_shortterm (weight 0.25)
  - Feature 3: paper_age_days / recency (weight 0.15)
  - Feature 4: rrf_position (weight 0.10)
  - Feature 5: cosine_sim_negative (weight -0.15) ✅ (Doc 06 addition)
- Create `app/recommend/diversity.py`: MMR + exploration injection
  - MMR with λ=0.6
  - 2 serendipitous exploration papers per feed
- Modify `app/routers/recommendations.py`: full 5-step pipeline
  - Step 1: Clustering → Step 2: Retrieval → Step 3: Rerank → Step 4: MMR → Step 5: Exploration
- Tests for reranker + diversity: 13 passed
- Full test suite: 88 passed (86 + 2 pre-existing live Qdrant failures resolved)
Doc 06 correction applied: Negative EWMA profile wired as Feature 5 with 0.15 penalty.
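The five-feature heuristic scorer can be sketched as a weighted sum with the weights listed above. Feature 5 enters with a negative sign, so similarity to the negative EWMA profile lowers the score. The feature transforms are assumptions, not taken from `reranker.py`:
- Recency uses an exponential decay on `paper_age_days` with a placeholder 365-day half-life.
- The RRF position becomes `1 / (1 + rank)`.

```python
# Weights from the Phase 2c feature list; Feature 5 is a penalty.
WEIGHTS = {
    "cos_long": 0.40,
    "cos_short": 0.25,
    "recency": 0.15,
    "rrf_position": 0.10,
    "cos_negative": -0.15,
}
HALF_LIFE_DAYS = 365.0  # hypothetical recency half-life


def heuristic_score(cos_long: float, cos_short: float, paper_age_days: float,
                    rrf_rank: int, cos_negative: float) -> float:
    """Weighted sum of the five heuristic features for one candidate."""
    features = {
        "cos_long": cos_long,
        "cos_short": cos_short,
        "recency": 0.5 ** (paper_age_days / HALF_LIFE_DAYS),  # 1.0 today, 0.5 after one half-life
        "rrf_position": 1.0 / (1 + rrf_rank),                   # rank 0 maps to 1.0
        "cos_negative": cos_negative,                            # similarity to the negative profile
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())
```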
Gaps: None. LightGBM model now integrated (Phase 6 ✅).
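Here is a sketch of Steps 4 and 5. Greedy MMR with λ=0.6 repeatedly picks the candidate that maximizes `λ·relevance - (1 - λ)·max_similarity_to_already_picked`, and then two exploration papers are added. The sketch assumes:
- Relevance is the reranker score.
- Embeddings are unit-normalized, so the dot product equals the cosine.
- Exploration papers are drawn uniformly from a separate pool and appended at the end.

The real slot placement and pool may differ.

```python
import random
from typing import Optional

import numpy as np


def mmr_select(scores: np.ndarray, emb: np.ndarray, k: int, lam: float = 0.6) -> list:
    """Greedy Maximal Marginal Relevance over unit-normalized embeddings."""
    sims = emb @ emb.T                      # pairwise cosine similarities
    selected: list = []
    remaining = list(range(len(scores)))
    while remaining and len(selected) < k:
        best, best_val = None, -np.inf
        for i in remaining:
            # Redundancy = similarity to the closest paper already in the feed.
            redundancy = max((sims[i, j] for j in selected), default=0.0)
            val = lam * scores[i] - (1 - lam) * redundancy
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        remaining.remove(best)
    return selected


def inject_exploration(feed: list, pool: list, n_explore: int = 2,
                       seed: Optional[int] = None) -> list:
    """Append up to n_explore random papers from an exploration pool not already in the feed."""
    rng = random.Random(seed)
    fresh = [p for p in pool if p not in feed]
    return feed + rng.sample(fresh, min(n_explore, len(fresh)))
```

With λ=0.6, a near-duplicate of an already-picked paper loses 0.4 × its similarity. In the test case, this lets a weaker but novel candidate win the second slot.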
Phase 2d: Advanced Models - DEFERRED (blocked by data/users)
These logically belong to the recommendation engine but cannot be built without real user data or scale.
- [~] LightGBM lambdarank model: requires ≥500 labeled save/dismiss interactions → Phase 6
- [~] Collaborative filtering features: requires ≥500 users → Phase 9
- [~] DPP diversity: explicitly ruled out for v1 by Doc 06 → Phase 9+
- [~] Two-Tower model: requires GPU + large dataset → Phase 9+
Phase 3: Hybrid Semantic Search - COMPLETE
Replace the arXiv keyword API placeholder with real vector-based semantic search using Qdrant dense + Zilliz sparse + RRF.
Detailed plan: `docs/phases/PHASE3-Hybrid-Semantic-Search.md`
Prototype reference: `docs/phases/PHASE2-Hybrid-Search-Plan.md`
Deployment target: Hugging Face Spaces (Docker SDK, 16GB RAM, 2 vCPUs)
New files created
- `app/embed_svc.py`: BGE-M3 model singleton (loads `BAAI/bge-m3` once at startup, ~570M parameters, ~15s cold)
  - `encode_query(text)` → `(dense: np.ndarray[1024], sparse: dict)`
  - LRU cache for repeat queries
  - Thread-safe, lazy loading with double-check locking
- `app/zilliz_svc.py`: Zilliz Cloud sparse search client
  - Collection: `arxiv_bgem3_sparse`
  - Schema: `id` (INT64 auto PK), `arxiv_id` (VARCHAR), `sparse_vector` (SPARSE_FLOAT_VECTOR)
  - Index: SPARSE_INVERTED_INDEX, metric_type=IP
  - Sparse format: `{int_token_id: float_weight}` (BGE-M3 lexical weights, NOT string words)
  - `search_sparse(sparse_dict, limit)` → `list[dict]` with arxiv_id + score
  - gRPC reconnect handling
- `app/groq_svc.py`: LLM query rewriter (Groq / llama-3.3-70b)
  - `rewrite(user_query)` → academic query string
  - Graceful fallback to original query on error
  - Academic-detection heuristic to skip unnecessary rewrites
  - 2s hard timeout
- `app/hybrid_search_svc.py`: search orchestrator
  - Rewrite → Encode → Parallel (Qdrant dense + Zilliz sparse) → RRF → Rerank
  - Each step has independent failure handling
  - Recency reranking: 0.80 RRF + 0.20 recency
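The fusion and recency steps of the orchestrator can be sketched as follows. RRF scores each paper by summing `1 / (k + rank)` across the dense and sparse result lists. The sketch then blends 0.80 × RRF with 0.20 × recency. It assumes:
- `k = 60`, the commonly used RRF default, not confirmed for this codebase.
- RRF scores are min-max normalized before blending.
- Recency follows an exponential decay with a placeholder one-year half-life.

Function names are hypothetical.

```python
from collections import defaultdict

RRF_K = 60              # conventional RRF constant (assumption for this sketch)
W_RRF, W_RECENCY = 0.80, 0.20
HALF_LIFE_DAYS = 365.0  # hypothetical recency half-life


def rrf_fuse(*ranked_lists: list, k: int = RRF_K) -> dict:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every list a paper appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, arxiv_id in enumerate(ranking, start=1):
            scores[arxiv_id] += 1.0 / (k + rank)
    return dict(scores)


def blend_with_recency(rrf: dict, age_days: dict) -> list:
    """Order papers by 0.80 * min-max-normalized RRF + 0.20 * recency."""
    lo, hi = min(rrf.values()), max(rrf.values())
    span = (hi - lo) or 1.0

    def final(pid: str) -> float:
        rrf_norm = (rrf[pid] - lo) / span
        # Unknown age -> infinite age -> no recency boost.
        recency = 0.5 ** (age_days.get(pid, float("inf")) / HALF_LIFE_DAYS)
        return W_RRF * rrf_norm + W_RECENCY * recency

    return sorted(rrf, key=final, reverse=True)
```

Because RRF uses only ranks, the dense cosine scores and sparse inner-product scores never need to be put on a common scale.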
Files modified
- `app/config.py`: added `ZILLIZ_URI`, `ZILLIZ_TOKEN`, `ZILLIZ_COLLECTION`, `GROQ_API_KEY`, `BGE_M3_MODEL`, `BGE_M3_DEVICE`, `ENCODE_CACHE_SIZE`, search weights, `APP_PORT`
- `app/qdrant_svc.py`: added `search_dense(dense_vec, limit)` for raw vector search returning scores
- `app/routers/search.py`: swapped `arxiv_svc.search()` → `hybrid_search_svc.search()` with arXiv fallback
- `app/main.py`: added graceful BGE-M3 warm-up to lifespan
- `requirements.txt`: added `FlagEmbedding`, `pymilvus`, `groq`
- `run.py`: configurable port (7860 default for HF Spaces)
Deployment files created
- `Dockerfile`: HF Spaces Docker SDK, CPU-only PyTorch, pre-baked BGE-M3 model
- `.dockerignore`: excludes notebooks, PDFs, databases, caches
Implementation steps completed
- Step 1: BGE-M3 model service (`embed_svc.py`) + unit tests
- Step 2: Zilliz client (`zilliz_svc.py`)
- Step 3: Dense search in Qdrant service
- Step 4: Groq rewriter (`groq_svc.py`)
- Step 5: Hybrid search orchestrator (`hybrid_search_svc.py`)
- Step 6: Swap search router
- Step 7: Model warm-up + deployment config
- Step 8: Tests: 21 new tests passing (RRF, recency, Groq heuristics, embed edge cases, orchestrator mocks)
Test results
- 88 original tests: ✅ All pass (zero regressions)
- 21 Phase 3 unit tests: ✅ All pass (RRF, recency, Groq, embed, orchestrator mocks)
- 6 search router tests: ✅ All pass (ranking, fallback, HTMX, saved state)
- 8 live service tests: ✅ All pass (Qdrant dense, Zilliz sparse, Groq rewrite, parallel)
- Total: 123 tests passing
Latency budget
| Stage | Time |
|---|---|
| LLM rewrite (Groq) | ~300ms (skippable) |
| BGE-M3 encode (CPU) | ~300ms first, ~0ms cached |
| Qdrant + Zilliz (parallel) | ~300ms |
| RRF + rerank | <5ms |
| Total (warm) | ~600ms |
Phase 3.5: Turso ArXiv Metadata DB - COMPLETE
Bulk-loaded 1.23 GB of arXiv paper metadata + citation data to Turso (libSQL) cloud DB.
Eliminates the unstable arXiv API dependency for metadata fetching (Phase 4.2 solved early).
Integrated into codebase and deployed to HF Spaces.
Infrastructure
- Turso cloud DB created: `arxiv-data` on `aws-ap-south-1`
  - URL: `https://arxiv-data-siddhm11.aws-ap-south-1.turso.io`
  - Auth: Platform token + DB auth token (minted via CLI)
- Table: `papers` with columns:
  - `arxiv_id` (TEXT, UNIQUE INDEX `idx_papers_arxiv_id`)
  - `title` (TEXT)
  - `authors` (TEXT)
  - `categories` (TEXT)
  - `primary_topic` (TEXT)
  - `update_date` (TEXT)
  - `abstract_preview` (TEXT, truncated to 500 chars)
  - `citation_count` (INTEGER, default 0)
  - `influential_citations` (INTEGER, default 0)
- Data sources:
  - `arxiv_comprehensive_papers.csv` (Kaggle: siddhm11/arxivdata)
  - `arxiv_citations_summary.csv` (Kaggle: siddhm11/citation-data-letsgoo)
  - Joined on `id = arxiv_id_clean`, deduplicated
- Row count verified: local ↔ remote match
- Unique index on `arxiv_id` for fast lookups
Integration (DONE)
- Added `TURSO_URL` and `TURSO_DB_TOKEN` to `config.py` / `.env` / HF Secrets
- Created `app/turso_svc.py`: metadata lookup service
  - `fetch_metadata_batch(arxiv_ids)` → `{arxiv_id: paper_dict}`
  - Uses Turso HTTP pipeline API (zero new Python deps, just httpx)
  - Includes citation_count + influential_citations
- `app/routers/search.py`: Turso primary, arXiv API fallback (only for IDs not in Turso)
- Created `tests/test_turso_timing.py`: timing benchmark
- Verified: 10/10 title match, 6.1x end-to-end speedup on HF Spaces
- Impact: Avg search time dropped from ~10.7s to ~1.75s on HF Spaces
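Here is a sketch of the batch lookup against Turso's HTTP pipeline endpoint. It uses only JSON payloads, so it needs no driver beyond an HTTP client. Verify these assumptions against Turso's documentation:
- The endpoint is `{TURSO_URL}/v2/pipeline`, called with a Bearer token.
- Statement arguments are typed values such as `{"type": "text", "value": ...}`.
- Result cells come back typed, with integers encoded as strings.

`build_pipeline_body` and `parse_rows` are hypothetical names, and the network call appears only as a comment.

```python
def build_pipeline_body(arxiv_ids: list) -> dict:
    """JSON body for POST {TURSO_URL}/v2/pipeline: one parameterized SELECT, then close."""
    placeholders = ", ".join("?" for _ in arxiv_ids)
    sql = ("SELECT arxiv_id, title, citation_count, influential_citations "
           f"FROM papers WHERE arxiv_id IN ({placeholders})")
    return {
        "requests": [
            {"type": "execute",
             "stmt": {"sql": sql,
                      "args": [{"type": "text", "value": a} for a in arxiv_ids]}},
            {"type": "close"},
        ]
    }


def _decode(cell: dict):
    """Integers arrive as strings; a NULL cell carries no 'value' key."""
    if cell["type"] == "null":
        return None
    if cell["type"] == "integer":
        return int(cell["value"])
    return cell["value"]


def parse_rows(response: dict) -> dict:
    """Turn the first execute result into {arxiv_id: paper_dict}."""
    first = response["results"][0]
    if first["type"] != "ok":
        raise RuntimeError(first.get("error"))
    result = first["response"]["result"]
    names = [col["name"] for col in result["cols"]]
    papers = {}
    for row in result["rows"]:
        paper = dict(zip(names, (_decode(cell) for cell in row)))
        papers[paper["arxiv_id"]] = paper
    return papers

# Sending (not executed here):
# httpx.post(f"{TURSO_URL}/v2/pipeline", json=build_pipeline_body(ids),
#            headers={"Authorization": f"Bearer {TURSO_DB_TOKEN}"}, timeout=10.0)
```

Binding IDs as parameters rather than formatting them into the SQL keeps the lookup safe against odd characters in user-supplied IDs.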
Phase 4: Recommendation Pipeline Fixes - COMPLETE
Fixed the known architectural debt in the recommendation pipeline.
Detailed plan: `docs/phases/PHASE4-Recommendation-Pipeline-Fixes.md`
4.1 - Replace RRF with Importance-Weighted Quota Fusion
- Create `app/recommend/fusion.py`: quota allocation logic
  - `w_k = importance_k / sum(importance_k)`
  - `slot_k = max(floor(F × w_k), F_min=3)`: every cluster gets at least 3 slots
  - Distribute remainder by largest fractional part
- Create `tests/test_fusion.py`: 20 unit tests for quota allocation
  - Proportionality, floor enforcement, total invariant, edge cases, Doc 06 worked examples
- Refactor `_multi_interest_recommend()` in `recommendations.py`
  - Replace `multi_interest_search()` with per-cluster separate ANN queries
  - Use `asyncio.gather()` for concurrent searches (~15ms wall-clock)
  - Allocate feed slots proportionally via `allocate_quotas()`
  - Deduplicate across clusters (first-occurrence = highest-ranked cluster wins)
  - MMR over merged union (unchanged)
- Keep `qdrant_svc.multi_interest_search()` in codebase (no deletion)
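The quota rule can be sketched directly from the formulas above:
1. Normalize importances into `w_k`.
2. Give each cluster `max(floor(F × w_k), F_min)` slots.
3. Hand out leftover slots by largest fractional part.

The sketch assumes `F ≥ K × F_min`. When the floor pushes the total above `F`, it takes slots back one at a time from the largest cluster still above the floor. That is an assumption about how the "total invariant" is kept, not a description of `fusion.py`.

```python
import math


def allocate_quotas(importance: list, feed_size: int, f_min: int = 3) -> list:
    """Split feed_size slots across clusters in proportion to importance, at least f_min each."""
    total = sum(importance)
    exact = [feed_size * imp / total for imp in importance]   # F * w_k
    slots = [max(math.floor(x), f_min) for x in exact]
    remainder = feed_size - sum(slots)
    if remainder > 0:
        # Leftover slots go to the clusters with the largest fractional parts.
        order = sorted(range(len(exact)), key=lambda k: exact[k] - math.floor(exact[k]), reverse=True)
        for k in order[:remainder]:
            slots[k] += 1
    while remainder < 0:
        # Floor enforcement overshot F: take a slot from the largest cluster above f_min.
        k = max((i for i in range(len(slots)) if slots[i] > f_min), key=lambda i: slots[i])
        slots[k] -= 1
        remainder += 1
    return slots
```

With F=20 and importances 8:1:1, the proportional split would be 16/2/2. The floor lifts the two minority clusters to 3 each, and the overshoot comes from the dominant cluster, giving 14/3/3.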
4.2 - Pre-populate Metadata Store - DONE (via Turso)
- Bulk-loaded arXiv metadata from Kaggle to Turso cloud DB (Phase 3.5)
- 1.23 GB, includes citation counts from Semantic Scholar
- Wired Turso service into `search.py` (Turso primary, arXiv API fallback)
- arXiv API is now fallback only for genuinely new papers
- Impact: Search time dropped from ~10.7s to ~1.75s on HF Spaces
4.3 - Hungarian Matching for Cluster Stability
- Add `stabilize_cluster_ids()` function to `clustering.py`
  - Uses `scipy.optimize.linear_sum_assignment` (already a dependency)
  - Cost matrix: `1 - cosine_sim(new_medoid, old_medoid)`, trivial at K≤7
  - Matched clusters keep old indices; new clusters get next available
  - Min cosine threshold (0.5) rejects unrelated matches
- Call between `compute_clusters()` and `save_clusters_to_db()` in recommendations.py
- 10 tests in `test_clustering.py`: perturbed clusters preserve indices, unrelated match rejection, K growth/shrink, custom thresholds
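Here is a sketch of `stabilize_cluster_ids()` under stated assumptions:
- Medoid embeddings are compared by cosine similarity.
- `scipy.optimize.linear_sum_assignment` minimizes the total `1 - cosine` cost. It accepts rectangular matrices, so K can grow or shrink.
- Matches below 0.5 are discarded.
- Unmatched new clusters receive IDs above the largest old one.

The signature is hypothetical.

```python
from typing import List, Optional

import numpy as np
from scipy.optimize import linear_sum_assignment


def stabilize_cluster_ids(new_medoids: np.ndarray, old_medoids: np.ndarray,
                          old_ids: List[int], min_cos: float = 0.5) -> List[int]:
    """Give each new cluster the ID of its best-matching old cluster, or a fresh ID."""
    def unit(m: np.ndarray) -> np.ndarray:
        return m / np.linalg.norm(m, axis=1, keepdims=True)

    sim = unit(new_medoids) @ unit(old_medoids).T      # (K_new, K_old) cosine similarities
    rows, cols = linear_sum_assignment(1.0 - sim)      # minimize total (1 - cosine)
    ids: List[Optional[int]] = [None] * len(new_medoids)
    for r, c in zip(rows, cols):
        if sim[r, c] >= min_cos:                        # reject unrelated matches
            ids[r] = old_ids[c]
    next_id = max(old_ids, default=-1) + 1              # never collides with an old ID
    for i, cid in enumerate(ids):
        if cid is None:
            ids[i] = next_id
            next_id += 1
    return ids
```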
4.4 - Category-Level Negative Suppression
- Add `get_suppressed_categories()` to `db.py`
  - Joins `interactions` + `paper_metadata` to find categories with ≥3 dismissals
  - Primary category only (decision: avoid over-suppression)
  - 14-day window (standard default, τ_neg = 14 days)
- Add suppression filter in `_multi_interest_recommend()` after reranking
- Cache Turso metadata to `paper_metadata` via `cache_turso_metadata_batch()`
- 8 tests in `test_db.py`: threshold, partitioning, user isolation, custom threshold
- [~] Per-item short-term decay: deferred to Phase 6 (LightGBM feature)
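Here is a sketch of the suppression query on SQLite. It assumes:
- `interactions` has `user_id`, `arxiv_id`, `action` (with the value `'dismiss'`), and a `created_at` timestamp in SQLite's `YYYY-MM-DD HH:MM:SS` format.
- The primary category is the first space-separated token of `paper_metadata.categories`.

Apart from the two table names, the column names are hypothetical.

```python
import sqlite3

SUPPRESSION_SQL = """
SELECT primary_cat, COUNT(*) AS n_dismissals
FROM (
    SELECT CASE WHEN instr(pm.categories, ' ') > 0
                THEN substr(pm.categories, 1, instr(pm.categories, ' ') - 1)
                ELSE pm.categories END AS primary_cat      -- first listed category only
    FROM interactions AS i
    JOIN paper_metadata AS pm ON pm.arxiv_id = i.arxiv_id
    WHERE i.user_id = ?
      AND i.action = 'dismiss'
      AND i.created_at >= datetime('now', ?)              -- e.g. '-14 days'
)
GROUP BY primary_cat
HAVING COUNT(*) >= ?
"""


def get_suppressed_categories(conn: sqlite3.Connection, user_id: str,
                              threshold: int = 3, window_days: int = 14) -> set:
    """Primary categories this user dismissed at least `threshold` times within the window."""
    rows = conn.execute(SUPPRESSION_SQL, (user_id, f"-{window_days} days", threshold))
    return {category for category, _ in rows}
```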
Gaps: None.
Phase 4.5: Instrumentation Foundation - COMPLETE
Added telemetry columns to the interactions table so every saved/dismissed paper can be attributed to its pipeline tier, cluster origin, and ranker version. Doc 07 (ADR A4) identified this as the single most valuable early investment: retrofitting these fields after real user data exists is painful and blocks all later counterfactual evaluation.
Schema changes
- Add `ranker_version TEXT` to `interactions` table: pipeline version tag
- Add `candidate_source TEXT` to `interactions`: e.g. `cluster_0`, `exploration`, `ewma_longterm`, `qdrant_recommend`, `short_term_supplement`
- Add `cluster_id INTEGER` to `interactions`: interest cluster index (NULL if N/A)
- ALTER TABLE migration for existing DBs (safe try/except, idempotent)
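Here is a sketch of the idempotent migration using the standard-library `sqlite3` module. The app itself uses `aiosqlite`, whose calls are awaited but otherwise similar. `ALTER TABLE ... ADD COLUMN` raises `OperationalError` with a "duplicate column name" message when the column already exists. The sketch swallows only that case and re-raises anything else. `migrate_interactions` is a hypothetical name.

```python
import sqlite3

# Telemetry columns added in Phase 4.5 (names and types from the tracker).
NEW_COLUMNS = {
    "ranker_version": "TEXT",
    "candidate_source": "TEXT",
    "cluster_id": "INTEGER",
}


def migrate_interactions(conn: sqlite3.Connection) -> list:
    """Add any missing telemetry columns; safe to run on every startup. Returns columns added."""
    added = []
    for name, col_type in NEW_COLUMNS.items():
        try:
            conn.execute(f"ALTER TABLE interactions ADD COLUMN {name} {col_type}")
            added.append(name)
        except sqlite3.OperationalError as exc:
            if "duplicate column name" not in str(exc):
                raise                     # a real error, not "already migrated"
    conn.commit()
    return added
```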
Pipeline tagging
- Add `_RANKER_VERSION` constant to `recommendations.py`
- Tag Tier 1 papers with cluster origin, exploration status, short-term supplement
- Tag Tier 2 papers as `ewma_longterm`
- Tag Tier 3 papers as `qdrant_recommend`
- Build `paper_cluster_map` before quota merge (first-occurrence = cluster attribution)
- Exploration papers tagged as `candidate_source='exploration'`
End-to-end flow
- `recommendations.py` embeds tags in paper dicts
- `action_buttons.html` includes tags in `hx-vals` JSON
- `events.py` accepts `ranker_version`, `candidate_source`, `cluster_id` Form fields
- `db.log_interaction()` stores all three new columns
Files modified: app/db.py, app/routers/events.py, app/routers/recommendations.py, app/templates/partials/action_buttons.html
Gaps: None. The `propensity` and `policy_id` fields were originally deferred until ε-greedy exploration (Phase 9); they were later added early in Phase 6.5 (B2).
Phase 5: Cold-Start Onboarding - COMPLETE
Onboarding wizard for new users: category selection + seed paper search + trending fallback.
Reference: Doc 06: "4-37% lift even once behavioral data exists"
5.1 - arXiv Category Multi-Select ✅
- UI screen on first visit: select 1-8 arXiv category groups
- Store selections in SQLite (`user_onboarding` table)
- Use as pool filter for recommendations (via `get_user_category_filter()`)
- Preserve as LightGBM feature permanently (Feature 26: `onboarding_category_match`)
- Does NOT create "subject vectors"; just filters
5.2 - Seed Paper Import ✅
- Let users search for and save seed papers during onboarding
- Immediately create EWMA profiles + Ward clusters on next feed request
- Uses hybrid search (Phase 3) for discovery
5.3 - ORCID / Semantic Scholar Import - REMOVED
S2 author import was implemented but removed because it is not the onboarding direction we want. Onboarding focuses on category selection + manual seed paper search.
5.4 - Popularity Fallback ✅
- Category-filtered trending papers served via `turso_svc.fetch_trending_by_categories()`
- 1-hour TTL trending cache for performance
Phase 6: LightGBM Re-ranker - COMPLETE
Replaced heuristic scorer with a trained LightGBM lambdarank model.
Unblocked via citation-graph pseudo-labels from Semantic Scholar.
Handoff doc: `docs/PHASE6-HANDOFF.md`
Model repo: siddhm11/researchit-reranker-phase6
6.1 - ML Intern: Data Pipeline + Model Training ✅
- Export 1.6M arXiv IDs from Turso → `arxiv_ids.txt` (`scripts/export_arxiv_ids.py`)
- Fetch 242K citation edges from Semantic Scholar Batch API (`01_fetch_citation_edges.py`)
- Generate 98K training triples with pseudo-labels: cited=2, co-cited=1, negative=0 (`02_generate_training_triples.py`)
- 37-feature schema (20 content, 11 user behavior, 6 cross-features)
- Train LightGBM LambdaRank model: 141 trees, 63 leaves, lr=0.05 (`03_train_lightgbm.py`)
- nDCG@10 = 0.879 (+233% vs heuristic baseline)
- All artifacts pushed to HuggingFace
6.2 - Opus: Integration into ResearchIT ✅
- Rewrite `app/recommend/reranker.py`: 5 features → 37 features
- LightGBM model loading at import time with heuristic fallback
- Multi-path model file search (env var → relative → absolute)
- Backward-compatible `rerank_candidates()` signature (old callers unaffected)
- Add `lightgbm>=4.0,<5.0` to `requirements.txt`
- Fix CRLF → LF line endings in model file (Windows Git issue)
- 7 integration tests, all passing (`tests/test_reranker_integration.py`)
- Latency verified: 0.223ms per 100 candidates (target: <1ms) ✅
6.3 - Antigravity: Feature Wiring + Deployment Verification ✅
- Wire all 37 features into `recommendations.py` caller (was legacy 6-arg signature)
- Per-candidate `cluster_importance` (N,) from `paper_cluster_map`
- Per-candidate `cluster_medoid` (N, 1024) per source cluster
- Pre-computed `is_suppressed_category` and `onboarding_category_match` arrays
- Pass `qdrant_scores`, `user_total_saves`, `user_total_dismissals`
- `reranker.py` supports both scalar broadcast and per-candidate arrays
- Add model accessors: `is_model_loaded()`, `get_num_trees()`, `get_loaded_model_path()`
- Add per-request feature activation logging
- Create `GET /healthz/reranker` endpoint (`app/routers/health.py`)
- Bug B fix: persist `medoid_embedding_blob` BLOB in `user_clusters` table
- Bug B fix: fall back to persisted blob instead of zero vector in Hungarian matching
- DB migration: `ALTER TABLE user_clusters ADD COLUMN medoid_embedding_blob BLOB`
- 9 new tests, all passing (`tests/test_phase6_feature_wiring.py`)
- Full suite: 203+ tests passing, 0 failures
- Updated `CLAUDE.md`, `PHASE6-HANDOFF.md`, `README.md`
6.4 - Retraining [~] DEFERRED
Phase 6.4 retraining is deferred. The published model `siddhm11/researchit-reranker-phase6` was trained on citation pseudo-labels with features 23-30 set to zero. Retraining is gated on either (a) the synthetic-user simulator (Phase 6.4b, ~30 days out) or (b) crossing 100 real users with ≥10 saves each. Until then, the Phase 6.1 + 6.2 + 6.3 plumbing is the unit of deliverable work.
- [~] Synthetic user simulator (`scripts/simulate_users.py`): target +30d
- [~] Real-user retrain at 100-user threshold: target +90d or threshold
- [~] HF model card backfill (library_name, pipeline_tag, metrics, schema)
Phase 6.5: Instrumentation - COMPLETE
Purpose: Stabilize the recommendation pipeline and prepare telemetry substrate for Phase 7 evaluation.
A1 - Real Qdrant cosine scores
- Switch `search_by_vector()` → `search_by_vector_with_scores()` in per-cluster + short-term searches
- Build `qdrant_score_map` from real cosines (replaces the fake `1.0 - rank*0.01` linear decay)
- Feature 0 (`qdrant_cosine_score`) now receives actual cosine similarities
A2 - Deployment verification
- `curl /healthz/reranker` → `model_loaded=true`, `n_trees=141`, `fallback_active=false`
- Verification timestamp added to `PHASE6-Reranker-Framing.md`
B1 - query_id linkage
- Generate `query_id` (UUID) once per feed request in `get_recommendations()`
- Thread through all 4 tiers: trending, Tier 1, Tier 2, Tier 3
- Generate `query_id` in `search.py` per search request
- Add `query_id` + `position` to `action_buttons.html` `hx-vals`
B2 - Propensity logging
- Add `propensity REAL` + `policy_id TEXT` migration to `interactions` table
- Extend `db.log_interaction()` with propensity + policy_id params
- Compute propensity: 1.0 (deterministic) vs `n_explore/pool_size` (exploration)
- Thread through templates + `events.py` Form params
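Here is a sketch of the propensity rule above. Papers placed by the deterministic ranker are logged with propensity 1.0. An exploration paper is one of `n_explore` uniform picks without replacement from `pool_size` candidates, so it is included with probability `n_explore / pool_size`. The dataclass, the `"ranked"` source label, and `ips_weight` are hypothetical. Inverse-propensity weighting is shown as one common way Phase 7 could consume these values.

```python
from dataclasses import dataclass


@dataclass
class LoggedImpression:
    arxiv_id: str
    position: int
    candidate_source: str   # "ranked" is a hypothetical label for deterministic slots
    propensity: float
    policy_id: str


def log_feed(ranked: list, explored: list, pool_size: int, policy_id: str) -> list:
    """Attach a propensity to every shown paper, in display order."""
    n_explore = len(explored)
    p_explore = n_explore / pool_size if pool_size else 1.0   # inclusion probability of each pick
    rows = [LoggedImpression(pid, i, "ranked", 1.0, policy_id) for i, pid in enumerate(ranked)]
    rows += [LoggedImpression(pid, len(ranked) + j, "exploration", p_explore, policy_id)
             for j, pid in enumerate(explored)]
    return rows


def ips_weight(imp: LoggedImpression) -> float:
    """Inverse-propensity weight used by counterfactual estimators."""
    return 1.0 / imp.propensity
```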
B3 - Cluster snapshot versioning
- Add `cluster_snapshots` table (append-only, content-addressed via `paper_ids_hash`)
- `save_cluster_snapshot()` called after each `save_clusters_to_db()`
- `prune_old_snapshots(30)` on startup in `main.py` lifespan
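Here is a sketch of content-addressed, append-only snapshots with SQLite. Hashing the sorted paper IDs makes the address independent of order. A `UNIQUE` constraint plus `INSERT OR IGNORE` means an unchanged cluster does not add a row. The table layout, the newline separator, and the function bodies are assumptions. Only the table name, `paper_ids_hash`, and the two function names come from the tracker.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS cluster_snapshots (
    user_id TEXT NOT NULL,
    cluster_id INTEGER NOT NULL,
    paper_ids_hash TEXT NOT NULL,
    paper_ids TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, cluster_id, paper_ids_hash)
)
"""


def paper_ids_hash(paper_ids: list) -> str:
    """Order-independent content address; the separator keeps ["ab", "c"] distinct from ["a", "bc"]."""
    return hashlib.sha256("\n".join(sorted(paper_ids)).encode("utf-8")).hexdigest()


def save_cluster_snapshot(conn: sqlite3.Connection, user_id: str, cluster_id: int,
                          paper_ids: list) -> bool:
    """Append a snapshot; returns False when this exact membership is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO cluster_snapshots (user_id, cluster_id, paper_ids_hash, paper_ids) "
        "VALUES (?, ?, ?, ?)",
        (user_id, cluster_id, paper_ids_hash(paper_ids), ",".join(sorted(paper_ids))),
    )
    conn.commit()
    return cur.rowcount == 1


def prune_old_snapshots(conn: sqlite3.Connection, days: int = 30) -> int:
    """Delete snapshots older than `days`; returns the number of rows removed."""
    cur = conn.execute("DELETE FROM cluster_snapshots WHERE created_at < datetime('now', ?)",
                       (f"-{days} days",))
    conn.commit()
    return cur.rowcount
```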
B4 - S2 author import - REMOVED
S2 author import was implemented and then removed because it is not the onboarding direction we want.
`app/s2_svc.py`, the `/api/onboarding/import-author` endpoint, and the quick-import UI have all been deleted. Onboarding uses category selection + manual seed search only.
Documentation
- `CLAUDE.md`: Rule 3.11, interaction instrumentation invariants
- `_RANKER_VERSION` bumped to `v6.5_lightgbm_real_cosines`
- Phase status updated to 6.5 COMPLETE
- Tests: 203+ passing
Test suite
- `tests/test_reranker_integration.py`: 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
- `tests/test_phase6_feature_wiring.py`: 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)
- `tests/demo_reranker.py`: interactive demo with 20 realistic papers
Phase 7: Evaluation Framework - NOT STARTED
Build offline and online evaluation before scaling users.
Estimated effort: ~1 week
- Offline metrics: nDCG@10, Recall@50, HR@10, ILS, category entropy
- Time-split evaluation on unarXive 2022 + S2ORC
- Online metrics (once users exist): CTR, save rate, dwell time, return rate
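Here is a sketch of the offline metrics listed above. It uses common definitions that may differ in detail from what Phase 7 adopts:
- nDCG@10 with exponential gain `2^rel - 1`, matching the graded 0/1/2 labels used in Phase 6.
- Recall@50 and HR@10 against a set of held-out relevant papers.
- Intra-list similarity (ILS) as the mean pairwise cosine similarity.
- Category entropy as Shannon entropy in bits.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def ndcg_at_k(ranked: list, relevance: dict, k: int = 10) -> float:
    """nDCG@k with exponential gain (2^rel - 1) and log2 position discount."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
    actual = dcg([relevance.get(p, 0) for p in ranked[:k]])
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def recall_at_k(ranked: list, relevant: set, k: int = 50) -> float:
    """Fraction of relevant papers that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def hit_rate_at_k(ranked: list, relevant: set, k: int = 10) -> float:
    """1.0 if any relevant paper appears in the top k, else 0.0."""
    return 1.0 if set(ranked[:k]) & relevant else 0.0


def intra_list_similarity(emb: np.ndarray) -> float:
    """Mean pairwise cosine similarity of the feed (lower means more diverse)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    pairs = list(combinations(range(len(unit)), 2))
    return float(np.mean([unit[i] @ unit[j] for i, j in pairs])) if pairs else 0.0


def category_entropy(categories: list) -> float:
    """Shannon entropy (bits) of the category distribution in the feed."""
    n = len(categories)
    return -sum(c / n * math.log2(c / n) for c in Counter(categories).values())
```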
Phase 8: LLM Interest Summaries + Distilled Re-ranker - NOT STARTED
Estimated effort: ~10-12 weeks (Doc 07)
Detailed research plan: `docs/research/07-LLM-Summaries-Reranker-and-Scaling-Research.md`
Entry criteria: Phase 7 eval producing stable nDCG@10; cluster stability Jaccard ≥0.7 over 7 days
8a - Claude-generated per-cluster interest summaries (Doc 07 §A)
- Cluster snapshot versioning (ADR A1)
- Content-addressed caching: `sha256(sorted(paper_ids) + prompt_version + model)`
- Shared summaries (not per-user): Haiku 4.5 + Batch API (~$50-80/month @ 1K users)
- Nightly regeneration job with 7-day TTL + event-triggered refresh
- "You're reading about X" UI framing with sub-theme bullets
- Anthropic Citations API for hallucination prevention
8b - Distilled cross-encoder reranker (Doc 07 §B)
- Deploy `cross-encoder/ms-marco-TinyBERT-L-2-v2` INT8 ONNX as MVP
- 6ms budget for 20 pairs on CPU (AVX-512 VNNI)
- TinyBERT score as LightGBM feature (Option C architecture)
- Custom distillation from BGE-reranker-v2-m3 only if held-out gap >3 nDCG
- MarginMSE loss + SciNCL citation-graph hard negatives
8c - Use-cases and information-gain design doc (Doc 07 §C)
- 8 user personas (P1 cold-start through P8 stay-current)
- Information-gain table (save = 3-5×, dismiss-as-label ≈ 3-4×, passive skip ≈ 0.1×)
- Mode-switching UI: "Stay Current" vs "Lit Review" toggle
- Failure mode detection rules (feed collapse, stale profile, filter bubble)
Phase 9: Exploration + Collaborative Filtering - NOT STARTED
Blocked by: ≥500 users
- Epsilon-greedy exploration (ε=0.25 new users, ε=0.05 established)
- LightFM hybrid CF model with switching strategy
- Category-level negative suppression (primary-category version already shipped in Phase 4.4)
- Retrain LightGBM with dismissals as negative labels
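Here is a sketch of slot-level ε-greedy exploration with the two ε values above. It assumes:
- A user counts as "established" after a hypothetical 10 saves.
- An exploring slot replaces the ranked paper with a uniform pick from an exploration pool.
- The logged propensity is the probability of the choice actually made at that slot, which pairs with the `propensity` column added in Phase 6.5.

```python
import random

EPS_NEW, EPS_ESTABLISHED = 0.25, 0.05
ESTABLISHED_AFTER_SAVES = 10  # hypothetical cut-over point


def epsilon_for(user_saves: int) -> float:
    """Explore more while a user's profile is still thin."""
    return EPS_NEW if user_saves < ESTABLISHED_AFTER_SAVES else EPS_ESTABLISHED


def epsilon_greedy_feed(ranked: list, explore_pool: list, user_saves: int,
                        rng: random.Random) -> list:
    """Per slot: keep the ranked paper with prob 1 - eps, else show a uniform pick from the pool.

    Returns (arxiv_id, explored, propensity_of_choice) tuples.
    """
    eps = epsilon_for(user_saves)
    pool = [p for p in explore_pool if p not in ranked]
    feed = []
    for pid in ranked:
        if pool and rng.random() < eps:
            pick = pool.pop(rng.randrange(len(pool)))
            feed.append((pick, True, eps / (len(pool) + 1)))   # pool size before the pop
        else:
            # With an empty pool, exploitation is certain.
            feed.append((pid, False, 1.0 - eps if pool else 1.0))
    return feed
```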
Appendix: Infrastructure Status
| Component | Status | Details |
|---|---|---|
| Qdrant Cloud | ✅ Live | 1.6M papers, BGE-M3 1024-dim, BQ enabled, HNSW m=32 |
| Zilliz Cloud | ✅ Live | 1.6M papers, BGE-M3 sparse vectors, collection arxiv_bgem3_sparse |
| Turso (libSQL) | ✅ Live | 1.23 GB arXiv metadata + citations, arxiv-data DB, papers table, unique index on arxiv_id |
| SQLite | ✅ Live | interactions, paper_metadata (local cache), user_profiles, user_clusters |
| HF Spaces | ✅ Deployed | Docker SDK, free tier, port 7860 → https://siddhm11-researchit.hf.space |
| Render | ⚠️ Previous target (512MB RAM too small for BGE-M3) | May still be used for non-ML services |
| arXiv API | ✅ Fallback only | Keyword search + metadata for papers not in Turso |
| BGE-M3 Model | ✅ Live | Pre-baked in Docker image, warm-up at startup |
| Groq API | ✅ Live + HF Secret | app/groq_svc.py: 2s timeout, academic heuristic skip |
| Notebooks | ✅ Organized | notebooks/: 01-upload, 02-test, 03-search-benchmark |
Credentials Status
| Credential | Status | Env Var | Notes |
|---|---|---|---|
| Qdrant Cloud | ✅ In .env | QDRANT_URL, QDRANT_API_KEY | Already wired |
| Zilliz Cloud | ✅ In .env | ZILLIZ_URI, ZILLIZ_TOKEN | Phase 3, wired |
| Turso (libSQL) | ✅ In .env + HF | TURSO_URL, TURSO_DB_TOKEN | Phase 3.5, wired + deployed |
| Groq | ✅ In .env + HF | GROQ_API_KEY | Phase 3, wired + deployed |
| HF Spaces | ✅ Deployed | Secrets panel | All env vars set ✅ |
Appendix: Test Suite
| Test File | Count | Status |
|---|---|---|
| tests/test_profiles.py | 11 | ✅ Passing |
| tests/test_clustering.py | 21 | ✅ Passing |
| tests/test_reranker_diversity.py | 13 | ✅ Passing |
| tests/test_reranker_integration.py | 7 | ✅ Passing |
| tests/test_phase6_feature_wiring.py | 9 | ✅ Passing |
| tests/test_fusion.py | 20 | ✅ Passing |
| tests/test_db.py | 19 | ✅ Passing |
| tests/test_qdrant_svc.py | - | ✅ Passing |
| tests/test_arxiv_svc.py | - | ✅ Passing |
| tests/test_integration.py | - | ✅ Passing |
| tests/test_user_state.py | - | ✅ Passing |
| tests/test_saved.py | - | ✅ Passing |
| tests/test_hybrid_search.py | 21 | ✅ Passing |
| tests/test_search_router.py | 6 | ✅ Passing |
| tests/test_live_search.py | 8 | ✅ Passing |
| Total | 203+ | ✅ |
| test_e2e_recs.py (standalone) | 1 | ✅ E2E simulation |
Appendix: Doc 06 Corrections β Tracking
| Correction | Status | Where |
|---|---|---|
| α_long 0.10 → 0.03 | ✅ Applied | app/recommend/profiles.py:30 |
| L2-normalize before Ward clustering | ✅ Applied | app/recommend/clustering.py |
| Medoid not centroid | ✅ Applied | app/recommend/clustering.py → _find_medoid() |
| Negative EWMA wired into reranking | ✅ Applied | app/recommend/reranker.py → Feature 5 |
| RRF → quota fusion for recommendations | ✅ Applied | app/recommend/fusion.py (Phase 4.1) |
| Hungarian cluster matching | ✅ Applied | app/recommend/clustering.py → stabilize_cluster_ids() (Phase 4.3) |
| Per-item short-term negative decay | [!] Backlog | Phase 6 (LightGBM feature) |
| Category-level suppression | ✅ Applied | app/db.py → get_suppressed_categories() (Phase 4.4) |
| BGE-reranker NEVER in hot path | ✅ Followed | Heuristic scorer used instead (replaced by LightGBM in Phase 6) |