ResearchIT / docs /TASK-TRACKER.md

siddhm11

Phase 6.5: Pipeline telemetry, search UX fixes, latency profiling

ec67b2f 20 days ago

ResearchIT — Master Task Tracker

Purpose: Single source of truth for all completed, in-progress, and upcoming work.
Last updated: 2026-05-05
Current phase: Phase 6.5 (Instrumentation) — COMPLETE ✔ | Phase 7 next

Legend

[x] — Done
[/] — In progress
[ ] — Not started
[~] — Intentionally deferred (blocked by data/users/scale)
[!] — Backlog item (documented, not yet coded)

Phase 1: Zero-ML Recommender ✅ COMPLETE

Built the foundation: Qdrant connection, arXiv search, save/dismiss, cookie identity, HTMX frontend.

Qdrant Cloud connection (1.6M BGE-M3 papers, BQ, HNSW m=32)
- Collection: arxiv_bgem3_dense, 1024-dim dense vectors
- File: app/qdrant_svc.py → _get_client()
BEST_SCORE Recommend API (raw paper IDs → Qdrant)
- File: app/qdrant_svc.py → recommend()
arXiv keyword API search (placeholder — replaced in Phase 3)
- File: app/arxiv_svc.py → search()
arXiv metadata fetching + SQLite cache
- File: app/arxiv_svc.py → fetch_metadata_batch()
SQLite database schema (interactions, paper_metadata)
- File: app/db.py → init_db()
- WAL mode, async via aiosqlite
Cookie-based user identity
- File: app/config.py → COOKIE_NAME
User state management (positive/negative deques)
- File: app/user_state.py → UserState
Save/Dismiss event logging
- File: app/routers/events.py
HTMX + Jinja2 frontend (search, recs, save, dismiss)
- Files: app/templates/ (base.html, index.html, search.html, saved.html, partials/)
Test suite — 55 tests passing

Gaps: None.

Phase 2a: EWMA Profile Embeddings ✅ COMPLETE

Replaced raw ID-list approach with temporal decay vectors so recent interests outweigh old ones.

Create app/recommend/ module with __init__.py
Create app/recommend/profiles.py — EWMA computation + storage
- Long-term: α=0.03 ✅ (corrected from 0.10 per Doc 06)
- Short-term: α=0.40
- Negative: α=0.15
- All embeddings L2-normalized
Modify app/db.py — add user_profiles table + user_clusters table
Modify app/qdrant_svc.py — add get_paper_vectors() and search_by_vector()
Modify app/routers/events.py — trigger EWMA updates on save/dismiss
Modify app/routers/recommendations.py — EWMA vector search with Tier 2 fallback
Add numpy + scipy to requirements.txt
Tests for profiles module — 11 passed
Full test suite — no regressions

Doc 06 correction applied: α_long 0.10 → 0.03 (PinnerSage rejected 0.10 as too recency-biased).
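To make the effect of the three decay rates concrete, here is a minimal sketch of an EWMA profile update followed by L2 normalization. The α values come from the list above; the update rule is the standard EWMA form and is an assumption about what app/recommend/profiles.py does, and `ewma_update` is a hypothetical name.

```python
import math

# Decay rates from the tracker; the update rule itself is an assumed standard EWMA.
ALPHA_LONG, ALPHA_SHORT, ALPHA_NEG = 0.03, 0.40, 0.15


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def ewma_update(profile, paper_vec, alpha):
    """Blend one paper embedding into a profile, then re-normalize.

    `profile` is None before the first interaction (hypothetical convention).
    """
    paper_vec = l2_normalize(paper_vec)
    if profile is None:
        return paper_vec
    blended = [(1.0 - alpha) * p + alpha * v for p, v in zip(profile, paper_vec)]
    return l2_normalize(blended)


# Toy 2-D example: a profile on the x-axis sees one save on the y-axis.
long_term = ewma_update([1.0, 0.0], [0.0, 1.0], ALPHA_LONG)
short_term = ewma_update([1.0, 0.0], [0.0, 1.0], ALPHA_SHORT)
print(long_term, short_term)
```

With α=0.03 the long-term profile barely moves after a single save, while the α=0.40 short-term profile swings noticeably toward the new paper.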

Gaps: None.

Phase 2b: Ward Clustering + Multi-Interest Retrieval ✅ COMPLETE

Detects distinct user interests via hierarchical clustering and retrieves candidates per interest.

Create app/recommend/clustering.py — Ward clustering + medoid extraction
- L2-normalize embeddings before Ward ✅ (Doc 06 correction)
- Adaptive gap-based threshold (no fixed K)
- Medoid representation (real papers, not centroids) ✅
- Dynamic K (1–7 clusters, auto-determined)
- Recency-weighted importance scores
Modify app/qdrant_svc.py — add multi_interest_search() with prefetch+RRF
Modify app/routers/recommendations.py — 3-tier cascading pipeline
- Tier 1 (≥5 saves): Multi-interest clustering → prefetch + RRF
- Tier 2 (≥3 saves): EWMA long-term vector → single ANN search
- Tier 3 (≥1 save): Qdrant BEST_SCORE Recommend API
Tests for clustering module — 10 passed
Full test suite — no regressions

Doc 06 corrections applied: L2-normalization before Ward, medoid not centroid.
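The medoid correction can be illustrated with a small sketch: after L2 normalization, the medoid is the cluster member with the highest total cosine similarity to the other members, so it is always a real paper. This is an assumed formulation, not the project's `_find_medoid()`; `medoid_index` is a hypothetical name and the Ward step itself is not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def medoid_index(embeddings):
    """Return the index of the member most similar, in total, to all members."""
    unit = [l2_normalize(v) for v in embeddings]
    best_idx, best_total = 0, float("-inf")
    for i, a in enumerate(unit):
        # On unit vectors the dot product equals cosine similarity.
        total = sum(sum(x * y for x, y in zip(a, b)) for b in unit)
        if total > best_total:
            best_idx, best_total = i, total
    return best_idx


# Three toy papers: the diagonal one sits between the other two.
cluster = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(medoid_index(cluster))
```

A centroid of the same cluster would be a synthetic point; the medoid index always maps back to a paper ID that can be used as a query vector.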

Gaps (deferred to Phase 4):

[!] RRF → quota fusion (dominant clusters can swamp minority interests)
[!] Hungarian matching for cluster ID stability across reclusterings

Phase 2c: Heuristic Re-ranking + MMR Diversity ✅ COMPLETE

Added scoring and diversity layers on top of retrieval to produce the final feed.

Create app/recommend/reranker.py — 5-feature heuristic scorer
- Feature 1: cosine_sim_longterm (weight 0.40)
- Feature 2: cosine_sim_shortterm (weight 0.25)
- Feature 3: paper_age_days / recency (weight 0.15)
- Feature 4: rrf_position (weight 0.10)
- Feature 5: cosine_sim_negative (weight -0.15) ✅ (Doc 06 addition)
Create app/recommend/diversity.py — MMR + exploration injection
- MMR with λ=0.6
- 2 serendipitous exploration papers per feed
Modify app/routers/recommendations.py — full 5-step pipeline
- Step 1: Clustering → Step 2: Retrieval → Step 3: Rerank → Step 4: MMR → Step 5: Exploration
Tests for reranker + diversity — 13 passed
Full test suite — 88 passed (86 + 2 pre-existing live Qdrant failures resolved)

Doc 06 correction applied: Negative EWMA profile wired as Feature 5 with 0.15 penalty.
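For reference, the MMR step with λ=0.6 can be sketched as a greedy loop that trades relevance against similarity to already-selected papers. This is the textbook MMR formulation under assumed inputs (a relevance score and an embedding per candidate); `mmr_select` is a hypothetical name and not the function in app/recommend/diversity.py.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(vectors, relevance, k, lam=0.6):
    """Greedily pick k candidate indices by lam * relevance - (1 - lam) * redundancy."""
    selected = []
    remaining = list(range(len(vectors)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidate 1 is a near-duplicate of candidate 0; candidate 2 is a different topic.
vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
relevance = [0.9, 0.85, 0.5]
print(mmr_select(vectors, relevance, k=2))
```

In this toy input the near-duplicate is skipped in favour of the less relevant but dissimilar paper, which is the behaviour the diversity layer is meant to provide.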

Gaps: None. LightGBM model now integrated (Phase 6 ✅).

Phase 2d: Advanced Models ❌ DEFERRED (Blocked by data/users)

These logically belong to the recommendation engine but cannot be built without real user data or scale.

[~] LightGBM lambdarank model — requires ≥500 labeled save/dismiss interactions → Phase 6 (since completed there using citation-graph pseudo-labels instead)
[~] Collaborative filtering features — requires ≥500 users → Phase 9
[~] DPP diversity — explicitly ruled out for v1 by Doc 06 → Phase 9+
[~] Two-Tower model — requires GPU + large dataset → Phase 9+

Phase 3: Hybrid Semantic Search ✅ COMPLETE

Replaced the arXiv keyword API placeholder with real vector-based semantic search using Qdrant dense + Zilliz sparse + RRF.
Detailed plan: docs/phases/PHASE3-Hybrid-Semantic-Search.md
Prototype reference: docs/phases/PHASE2-Hybrid-Search-Plan.md
Deployment target: Hugging Face Spaces (Docker SDK, 16GB RAM, 2 vCPUs)

New files created

app/embed_svc.py — BGE-M3 model singleton (load BAAI/bge-m3 once at startup, ~570MB, ~15s cold)
- encode_query(text) → (dense: np.ndarray[1024], sparse: dict)
- LRU cache for repeat queries
- Thread-safe, lazy loading with double-checked locking
app/zilliz_svc.py — Zilliz Cloud sparse search client
- Collection: arxiv_bgem3_sparse
- Schema: id (INT64 auto PK), arxiv_id (VARCHAR), sparse_vector (SPARSE_FLOAT_VECTOR)
- Index: SPARSE_INVERTED_INDEX, metric_type=IP
- Sparse format: {int_token_id: float_weight} (BGE-M3 lexical weights, NOT string words)
- search_sparse(sparse_dict, limit) → list[dict] with arxiv_id + score
- gRPC reconnect handling
app/groq_svc.py — LLM query rewriter (Groq / llama-3.3-70b)
- rewrite(user_query) → academic query string
- Graceful fallback to original query on error
- Academic-detection heuristic to skip unnecessary rewrites
- 2s hard timeout
app/hybrid_search_svc.py — search orchestrator
- Rewrite → Encode → Parallel (Qdrant dense + Zilliz sparse) → RRF → Rerank
- Each step has independent failure handling
- Recency reranking: 0.80 RRF + 0.20 recency
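The fusion and recency steps of the orchestrator can be sketched as follows. The RRF constant k=60 is the commonly used default and is an assumption here, as are the normalization of RRF scores by the top score and the `recency` inputs in [0, 1]; only the 0.80/0.20 weights come from the tracker. Function names are hypothetical.

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every list an ID appears in."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def blend_with_recency(rrf_scores, recency, w_rrf=0.80, w_recency=0.20):
    """Combine normalized RRF scores with a recency score in [0, 1] per ID."""
    # Assumption: scale RRF by its top score so both terms share a [0, 1] range.
    top = max(rrf_scores.values(), default=0.0) or 1.0
    return {
        doc_id: w_rrf * (score / top) + w_recency * recency.get(doc_id, 0.0)
        for doc_id, score in rrf_scores.items()
    }


dense_hits = ["a", "b", "c"]    # e.g. Qdrant dense results, best first
sparse_hits = ["b", "d", "a"]   # e.g. Zilliz sparse results, best first
fused = rrf_fuse([dense_hits, sparse_hits])
final = blend_with_recency(fused, {"a": 1.0, "b": 0.0, "c": 0.5, "d": 0.5})
print(sorted(final, key=final.get, reverse=True))
```

Because RRF uses only ranks, the dense cosine scores and sparse inner-product scores never need to be put on a common scale before fusion.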

Files modified

app/config.py — added ZILLIZ_URI, ZILLIZ_TOKEN, ZILLIZ_COLLECTION, GROQ_API_KEY, BGE_M3_MODEL, BGE_M3_DEVICE, ENCODE_CACHE_SIZE, search weights, APP_PORT
app/qdrant_svc.py — added search_dense(dense_vec, limit) for raw vector search returning scores
app/routers/search.py — swapped arxiv_svc.search() → hybrid_search_svc.search() with arXiv fallback
app/main.py — added graceful BGE-M3 warm-up to lifespan
requirements.txt — added FlagEmbedding, pymilvus, groq
run.py — configurable port (7860 default for HF Spaces)

Deployment files created

Dockerfile — HF Spaces Docker SDK, CPU-only PyTorch, pre-baked BGE-M3 model
.dockerignore — excludes notebooks, PDFs, databases, caches

Implementation steps completed

Step 1: BGE-M3 model service (embed_svc.py) + unit tests
Step 2: Zilliz client (zilliz_svc.py)
Step 3: Dense search in Qdrant service
Step 4: Groq rewriter (groq_svc.py)
Step 5: Hybrid search orchestrator (hybrid_search_svc.py)
Step 6: Swap search router
Step 7: Model warm-up + deployment config
Step 8: Tests — 21 new tests passing (RRF, recency, Groq heuristics, embed edge cases, orchestrator mocks)

Test results

88 original tests: ✅ All pass (zero regressions)
21 Phase 3 unit tests: ✅ All pass (RRF, recency, Groq, embed, orchestrator mocks)
6 search router tests: ✅ All pass (ranking, fallback, HTMX, saved state)
8 live service tests: ✅ All pass (Qdrant dense, Zilliz sparse, Groq rewrite, parallel)
Total: 123 tests passing

Latency budget

| Stage | Time |
| --- | --- |
| LLM rewrite (Groq) | ~300ms (skippable) |
| BGE-M3 encode (CPU) | ~300ms first, ~0ms cached |
| Qdrant + Zilliz (parallel) | ~300ms |
| RRF + rerank | <5ms |
| Total (warm) | ~600ms |

Phase 3.5: Turso ArXiv Metadata DB ✅ COMPLETE

Bulk-loaded 1.23 GB of arXiv paper metadata + citation data to Turso (libSQL) cloud DB.
Eliminates the unstable arXiv API dependency for metadata fetching (Phase 4.2 solved early).
Integrated into codebase and deployed to HF Spaces.

Infrastructure

Turso cloud DB created: arxiv-data on aws-ap-south-1
- URL: https://arxiv-data-siddhm11.aws-ap-south-1.turso.io
- Auth: Platform token + DB auth token (minted via CLI)
Table: papers with columns:
- arxiv_id (TEXT, UNIQUE INDEX idx_papers_arxiv_id)
- title (TEXT)
- authors (TEXT)
- categories (TEXT)
- primary_topic (TEXT)
- update_date (TEXT)
- abstract_preview (TEXT, truncated to 500 chars)
- citation_count (INTEGER, default 0)
- influential_citations (INTEGER, default 0)
Data sources:
- arxiv_comprehensive_papers.csv (Kaggle: siddhm11/arxivdata)
- arxiv_citations_summary.csv (Kaggle: siddhm11/citation-data-letsgoo)
- Joined on id = arxiv_id_clean, deduplicated
Row count verified: local ↔ remote match
Unique index on arxiv_id for fast lookups

Integration (DONE)

Added TURSO_URL and TURSO_DB_TOKEN to config.py / .env / HF Secrets
Created app/turso_svc.py — metadata lookup service
- fetch_metadata_batch(arxiv_ids) → {arxiv_id: paper_dict}
- Uses Turso HTTP pipeline API (zero new Python deps — just httpx)
- Includes citation_count + influential_citations
app/routers/search.py — Turso primary, arXiv API fallback (only for IDs not in Turso)
Created tests/test_turso_timing.py — timing benchmark
Verified: 10/10 title match, 6.1x end-to-end speedup on HF Spaces
Impact: Avg search time dropped from ~10.7s to ~1.75s on HF Spaces
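The batch lookup can be pictured as one parameterized `IN (...)` query wrapped in a pipeline request. The sketch below only builds the JSON body. The `execute`/`close` request shape and the `/v2/pipeline` path mentioned in the comment are assumed to match Turso's HTTP pipeline format and should be checked against the current Turso docs; `build_pipeline_payload` is a hypothetical helper, not the code in app/turso_svc.py.

```python
import json


def build_pipeline_payload(arxiv_ids):
    """Build a request body that fetches metadata for a batch of arXiv IDs."""
    if not arxiv_ids:
        raise ValueError("arxiv_ids must not be empty")
    placeholders = ", ".join("?" for _ in arxiv_ids)
    sql = (
        "SELECT arxiv_id, title, authors, categories, "
        "citation_count, influential_citations "
        f"FROM papers WHERE arxiv_id IN ({placeholders})"
    )
    return {
        "requests": [
            {
                "type": "execute",
                "stmt": {
                    "sql": sql,
                    # Assumed argument encoding: one typed value per placeholder.
                    "args": [{"type": "text", "value": a} for a in arxiv_ids],
                },
            },
            {"type": "close"},
        ]
    }


payload = build_pipeline_payload(["1706.03762", "1810.04805"])
print(json.dumps(payload, indent=2))
# Sending it would be a single POST, e.g. with httpx (assumed endpoint path):
#   httpx.post(f"{TURSO_URL}/v2/pipeline",
#              headers={"Authorization": f"Bearer {TURSO_DB_TOKEN}"}, json=payload)
```

One round trip per batch, rather than one per paper, is consistent with the reported drop in search time.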

Phase 4: Recommendation Pipeline Fixes ✅ COMPLETE

Fixed the known architectural debt in the recommendation pipeline.
Detailed plan: docs/phases/PHASE4-Recommendation-Pipeline-Fixes.md

4.1 — Replace RRF with Importance-Weighted Quota Fusion

Create app/recommend/fusion.py — quota allocation logic
- w_k = importance_k / sum(importance_k)
- slot_k = max(floor(F × w_k), F_min), with F_min = 3, so every cluster gets at least 3 slots
- Distribute remainder by largest fractional part
Create tests/test_fusion.py — 20 unit tests for quota allocation
- Proportionality, floor enforcement, total invariant, edge cases, Doc 06 worked examples
Refactor _multi_interest_recommend() in recommendations.py
- Replace multi_interest_search() with per-cluster separate ANN queries
- Use asyncio.gather() for concurrent searches (~15ms wall-clock)
- Allocate feed slots proportionally via allocate_quotas()
- Deduplicate across clusters (first-occurrence = highest-ranked cluster wins)
- MMR over merged union (unchanged)
Keep qdrant_svc.multi_interest_search() in codebase (no deletion)
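The quota rule can be worked through in a short sketch. The proportional floor, the F_min floor and the largest-fractional-part remainder follow the bullets above. The tracker does not say what happens when the F_min floor pushes the total above F, so the trimming step below (take slots back from the largest cluster) is an assumption, and `allocate_quotas_sketch` is a hypothetical stand-in for `allocate_quotas()`.

```python
import math


def allocate_quotas_sketch(importances, feed_size, f_min=3):
    """Split feed_size slots across clusters in proportion to importance."""
    total = sum(importances)
    raw = [feed_size * imp / total for imp in importances]  # F * w_k
    slots = [max(math.floor(r), f_min) for r in raw]

    # Hand out any leftover slots by largest fractional part.
    leftover = feed_size - sum(slots)
    by_fraction = sorted(
        range(len(raw)), key=lambda i: raw[i] - math.floor(raw[i]), reverse=True
    )
    for i in by_fraction[: max(leftover, 0)]:
        slots[i] += 1

    # Assumption: if the F_min floor overshoots, trim the largest cluster.
    while sum(slots) > feed_size and any(s > f_min for s in slots):
        biggest = max(range(len(slots)), key=lambda i: slots[i])
        slots[biggest] -= 1
    return slots


# Importance 6:3:1 over a 20-paper feed: raw shares are 12, 6 and 2.
print(allocate_quotas_sketch([6, 3, 1], feed_size=20))
```

In the example the minority cluster is lifted from 2 to 3 slots and the dominant cluster gives one up, so the feed still holds 20 papers. If K × F_min exceeds F, this sketch returns more than F slots rather than violating the floor.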

4.2 — Pre-populate Metadata Store ✅ DONE (via Turso)

Bulk-loaded arXiv metadata from Kaggle to Turso cloud DB (Phase 3.5)
1.23 GB, includes citation counts from Semantic Scholar
Wired Turso service into search.py (Turso primary, arXiv API fallback)
arXiv API is now fallback only for genuinely new papers
Impact: Search time dropped from ~10.7s to ~1.75s on HF Spaces

4.3 — Hungarian Matching for Cluster Stability

Add stabilize_cluster_ids() function to clustering.py
- Uses scipy.optimize.linear_sum_assignment (already a dependency)
- Cost matrix: 1 - cosine_sim(new_medoid, old_medoid) — trivial at K≤7
- Matched clusters keep old indices; new clusters get next available
- Min cosine threshold (0.5) rejects unrelated matches
Call between compute_clusters() and save_clusters_to_db() in recommendations.py
10 tests in test_clustering.py — perturbed clusters preserve indices, unrelated match rejection, K growth/shrink, custom thresholds
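The matching logic can be sketched without SciPy. The version below swaps `scipy.optimize.linear_sum_assignment` for a brute-force search over pairings, which finds the same minimum-cost assignment and is feasible only because K≤7. The cost (1 − cosine) and the 0.5 threshold follow the bullets above; giving unmatched clusters indices after the largest old index is an assumption about "next available", and `stabilize_ids_sketch` is a hypothetical name.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def stabilize_ids_sketch(new_medoids, old_medoids, min_cosine=0.5):
    """Return one stable cluster index per new medoid.

    new_medoids: list of vectors; old_medoids: {old_index: vector}.
    """
    old_ids = list(old_medoids)
    n_new, n_old = len(new_medoids), len(old_ids)
    sim = [[cosine(nv, old_medoids[o]) for o in old_ids] for nv in new_medoids]

    # Enumerate every one-to-one pairing (new i, old j) of the smaller side.
    if n_new <= n_old:
        options = (list(enumerate(p)) for p in permutations(range(n_old), n_new))
    else:
        options = ([(i, j) for j, i in enumerate(p)] for p in permutations(range(n_new), n_old))

    best_pairs, best_cost = [], float("inf")
    for pairs in options:
        cost = sum(1.0 - sim[i][j] for i, j in pairs)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost

    assigned = [None] * n_new
    for i, j in best_pairs:
        if sim[i][j] >= min_cosine:  # reject unrelated matches
            assigned[i] = old_ids[j]

    next_id = max(old_ids, default=-1) + 1  # assumed meaning of "next available"
    for i in range(n_new):
        if assigned[i] is None:
            assigned[i] = next_id
            next_id += 1
    return assigned


old = {0: [1.0, 0.0], 1: [0.0, 1.0]}
new = [[0.1, 1.0], [1.0, 0.1], [-1.0, -1.0]]  # reordered, slightly perturbed, plus one new
print(stabilize_ids_sketch(new, old))
```

The two perturbed clusters recover their old indices even though their order changed, and the unrelated third cluster receives a fresh index.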

4.4 — Category-Level Negative Suppression

Add get_suppressed_categories() to db.py
- Joins interactions + paper_metadata to find categories with ≥3 dismissals
- Primary category only (decision: avoid over-suppression)
- 14-day window (standard default, τ_neg = 14 days)
Add suppression filter in _multi_interest_recommend() after reranking
Cache Turso metadata to paper_metadata via cache_turso_metadata_batch()
8 tests in test_db.py — threshold, partitioning, user isolation, custom threshold
[~] Per-item short-term decay → deferred to Phase 6 (LightGBM feature)
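A suppression query along these lines can be sketched against an in-memory SQLite database. The table names come from the tracker, but every column name below (`user_id`, `action`, `created_at`, `primary_category`) is a hypothetical stand-in for the real schema, and the sketch uses the synchronous `sqlite3` module rather than aiosqlite.

```python
import sqlite3

# Hypothetical minimal schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE interactions (user_id TEXT, arxiv_id TEXT, action TEXT, created_at TEXT);
CREATE TABLE paper_metadata (arxiv_id TEXT PRIMARY KEY, primary_category TEXT);
"""

QUERY = """
SELECT m.primary_category
FROM interactions AS i
JOIN paper_metadata AS m ON m.arxiv_id = i.arxiv_id
WHERE i.user_id = ?
  AND i.action = 'dismiss'
  AND i.created_at >= datetime('now', ?)
GROUP BY m.primary_category
HAVING COUNT(*) >= ?
"""


def suppressed_categories(conn, user_id, min_dismissals=3, window_days=14):
    """Return primary categories the user dismissed often enough, recently enough."""
    params = (user_id, f"-{window_days} days", min_dismissals)
    return {row[0] for row in conn.execute(QUERY, params)}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO paper_metadata VALUES (?, ?)",
    [("p1", "cs.CV"), ("p2", "cs.CV"), ("p3", "cs.CV"), ("p4", "cs.CL")],
)
conn.executemany(
    "INSERT INTO interactions VALUES ('u1', ?, 'dismiss', datetime('now', ?))",
    [("p1", "-1 days"), ("p2", "-2 days"), ("p3", "-3 days"), ("p4", "-1 days")],
)
result = suppressed_categories(conn, "u1")
print(result)
```

Grouping on the primary category only means a single cross-listed paper cannot suppress several categories at once, which matches the over-suppression concern noted above.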

Gaps: None.

Phase 4.5: Instrumentation Foundation ✅ COMPLETE

Added telemetry columns to the interactions table so every saved/dismissed paper can be attributed to its pipeline tier, cluster origin, and ranker version. Doc 07 (ADR A4) identified this as the single most valuable early investment — retrofitting these fields after real user data exists is painful and blocks all later counterfactual evaluation.

Schema changes

Add ranker_version TEXT to interactions table — pipeline version tag
Add candidate_source TEXT to interactions — e.g. cluster_0, exploration, ewma_longterm, qdrant_recommend, short_term_supplement
Add cluster_id INTEGER to interactions — interest cluster index (NULL if N/A)
ALTER TABLE migration for existing DBs (safe try/except, idempotent)
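An idempotent migration of this kind might look like the sketch below. The three column names and types come from the list above; the base `interactions` table is a hypothetical minimal version, and the sketch uses synchronous `sqlite3` instead of the project's aiosqlite wrapper.

```python
import sqlite3

NEW_COLUMNS = [
    ("ranker_version", "TEXT"),
    ("candidate_source", "TEXT"),
    ("cluster_id", "INTEGER"),
]


def migrate_interactions(conn):
    """Add the telemetry columns, tolerating databases that already have them."""
    for name, sql_type in NEW_COLUMNS:
        try:
            conn.execute(f"ALTER TABLE interactions ADD COLUMN {name} {sql_type}")
        except sqlite3.OperationalError as exc:
            # SQLite raises "duplicate column name: ..." if the column exists.
            if "duplicate column name" not in str(exc):
                raise
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical pre-migration table.
conn.execute("CREATE TABLE interactions (user_id TEXT, arxiv_id TEXT, action TEXT)")
migrate_interactions(conn)
migrate_interactions(conn)  # second run is a no-op
columns = [row[1] for row in conn.execute("PRAGMA table_info(interactions)")]
print(columns)
```

Re-raising anything other than the duplicate-column error keeps genuine failures, such as a missing table, from being silently swallowed.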

Pipeline tagging

Add _RANKER_VERSION constant to recommendations.py
Tag Tier 1 papers with cluster origin, exploration status, short-term supplement
Tag Tier 2 papers as ewma_longterm
Tag Tier 3 papers as qdrant_recommend
Build paper_cluster_map before quota merge (first-occurrence = cluster attribution)
Exploration papers tagged as candidate_source='exploration'

End-to-end flow

recommendations.py embeds tags in paper dicts
action_buttons.html includes tags in hx-vals JSON
events.py accepts ranker_version, candidate_source, cluster_id Form fields
db.log_interaction() stores all three new columns

Files modified: app/db.py, app/routers/events.py, app/routers/recommendations.py, app/templates/partials/action_buttons.html

Gaps: None. The propensity and policy_id fields were deferred here until ε-greedy exploration (Phase 9), but were added ahead of schedule in Phase 6.5 (B2).

Phase 5: Cold-Start Onboarding ✅ COMPLETE

Onboarding wizard for new users — category selection + seed paper search + trending fallback.
Reference: Doc 06 — "4-37% lift even once behavioral data exists"

5.1 — arXiv Category Multi-Select ✅

UI screen on first visit: select 1-8 arXiv category groups
Store selections in SQLite (user_onboarding table)
Use as pool filter for recommendations (via get_user_category_filter())
Preserve as LightGBM feature permanently (Feature 26: onboarding_category_match)
Does NOT create "subject vectors" — just filters

5.2 — Seed Paper Import ✅

Let users search for and save seed papers during onboarding
Immediately create EWMA profiles + Ward clusters on next feed request
Uses hybrid search (Phase 3) for discovery

5.3 — ORCID / Semantic Scholar Import ❌ REMOVED

S2 author import was implemented but removed — not the onboarding direction we want. Onboarding focuses on category selection + manual seed paper search.

5.4 — Popularity Fallback ✅

Category-filtered trending papers served via turso_svc.fetch_trending_by_categories()
1-hour TTL trending cache for performance

Phase 6: LightGBM Re-ranker ✅ COMPLETE

Replaced heuristic scorer with a trained LightGBM lambdarank model.
Unblocked via citation-graph pseudo-labels from Semantic Scholar.
Handoff doc: docs/PHASE6-HANDOFF.md
Model repo: siddhm11/researchit-reranker-phase6

6.1 — ML Intern: Data Pipeline + Model Training ✅

Export 1.6M arXiv IDs from Turso → arxiv_ids.txt (scripts/export_arxiv_ids.py)
Fetch 242K citation edges from Semantic Scholar Batch API (01_fetch_citation_edges.py)
Generate 98K training triples with pseudo-labels: cited=2, co-cited=1, negative=0 (02_generate_training_triples.py)
37-feature schema (20 content, 11 user behavior, 6 cross-features)
Train LightGBM LambdaRank model: 141 trees, 63 leaves, lr=0.05 (03_train_lightgbm.py)
nDCG@10 = 0.879 (+233% vs heuristic baseline)
All artifacts pushed to HuggingFace

6.2 — Opus: Integration into ResearchIT ✅

Rewrite app/recommend/reranker.py — 5 features → 37 features
LightGBM model loading at import time with heuristic fallback
Multi-path model file search (env var → relative → absolute)
Backward-compatible rerank_candidates() signature (old callers unaffected)
Add lightgbm>=4.0,<5.0 to requirements.txt
Fix CRLF→LF line endings in model file (Windows Git issue)
7 integration tests — all passing (tests/test_reranker_integration.py)
Latency verified: 0.223ms per 100 candidates (target: <1ms) ✅

6.3 — Antigravity: Feature Wiring + Deployment Verification ✅

Wire all 37 features into recommendations.py caller (was legacy 6-arg signature)
Per-candidate cluster_importance (N,) from paper_cluster_map
Per-candidate cluster_medoid (N, 1024) per source cluster
Pre-computed is_suppressed_category and onboarding_category_match arrays
Pass qdrant_scores, user_total_saves, user_total_dismissals
reranker.py supports both scalar broadcast and per-candidate arrays
Add model accessors: is_model_loaded(), get_num_trees(), get_loaded_model_path()
Add per-request feature activation logging
Create GET /healthz/reranker endpoint (app/routers/health.py)
Bug B fix: persist medoid_embedding_blob BLOB in user_clusters table
Bug B fix: fall back to persisted blob instead of zero vector in Hungarian matching
DB migration: ALTER TABLE user_clusters ADD COLUMN medoid_embedding_blob BLOB
9 new tests — all passing (tests/test_phase6_feature_wiring.py)
Full suite: 203+ tests passing, 0 failures
Updated CLAUDE.md, PHASE6-HANDOFF.md, README.md

6.4 — Retraining [~] DEFERRED

Phase 6.4 retraining is deferred. The published model siddhm11/researchit-reranker-phase6 was trained on citation pseudo-labels with features 23–30 set to zero. Retraining is gated on either (a) the synthetic-user simulator (Phase 6.4b, ~30 days out) or (b) crossing 100 real users with ≥10 saves each. Until then, Phase 6.1+6.2+6.3 plumbing is the unit of deliverable work.

[~] Synthetic user simulator (scripts/simulate_users.py) — target: +30d
[~] Real-user retrain at 100-user threshold — target: +90d or threshold
[~] HF model card backfill (library_name, pipeline_tag, metrics, schema)

Phase 6.5: Instrumentation ✅ COMPLETE

Purpose: Stabilize the recommendation pipeline and prepare telemetry substrate for Phase 7 evaluation.

A1 — Real Qdrant cosine scores

Switch search_by_vector() → search_by_vector_with_scores() in per-cluster + short-term searches
Build qdrant_score_map from real cosines (replaces fake 1.0 - rank*0.01 linear decay)
Feature 0 (qdrant_cosine_score) now receives actual cosine similarities

A2 — Deployment verification

curl /healthz/reranker → model_loaded=true, n_trees=141, fallback_active=false
Verification timestamp added to PHASE6-Reranker-Framing.md

B1 — query_id linkage

Generate query_id (UUID) once per feed request in get_recommendations()
Thread through all 4 feed paths: trending, Tier 1, Tier 2, Tier 3
Generate query_id in search.py per search request
Add query_id + position to action_buttons.html hx-vals

B2 — Propensity logging

Add propensity REAL + policy_id TEXT migration to interactions table
Extend db.log_interaction() with propensity + policy_id params
Compute propensity: 1.0 (deterministic) vs n_explore/pool_size (exploration)
Thread through templates + events.py Form params
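The propensity rule is small enough to show in full. The two cases (1.0 for deterministic slots, n_explore/pool_size for exploration slots) come from the bullet above; the function names, the pool size of 50 and the example record layout are hypothetical.

```python
import uuid


def new_query_id():
    """One UUID per feed or search request, shared by every paper in the response."""
    return str(uuid.uuid4())


def compute_propensity(is_exploration, n_explore=0, pool_size=0):
    """Probability that this paper was shown under the logging policy."""
    if not is_exploration:
        return 1.0  # deterministic ranking: always shown
    if pool_size <= 0:
        raise ValueError("pool_size must be positive for exploration papers")
    return n_explore / pool_size


query_id = new_query_id()
# 2 exploration slots drawn from a hypothetical pool of 50 candidates.
record = {
    "query_id": query_id,
    "position": 7,
    "propensity": compute_propensity(True, n_explore=2, pool_size=50),
    "policy_id": "example_policy",  # hypothetical value
}
print(record)
```

Logging the propensity at serve time is what later allows inverse-propensity weighting of saves on exploration papers.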

B3 — Cluster snapshot versioning

Add cluster_snapshots table (append-only, content-addressed via paper_ids_hash)
save_cluster_snapshot() called after each save_clusters_to_db()
prune_old_snapshots(30) on startup in main.py lifespan
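Content addressing here means the snapshot key depends only on cluster membership, not on the order papers were saved. A minimal sketch follows, assuming SHA-256 over the sorted, de-duplicated paper IDs; the tracker does not state which hash or serialization `paper_ids_hash` uses, so both are assumptions.

```python
import hashlib


def paper_ids_hash(paper_ids):
    """Order-independent digest of a cluster's membership."""
    canonical = "\n".join(sorted(set(paper_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = paper_ids_hash(["2301.00001", "1706.03762", "1810.04805"])
b = paper_ids_hash(["1810.04805", "2301.00001", "1706.03762"])  # same set, new order
c = paper_ids_hash(["1810.04805", "2301.00001"])                # one paper fewer
print(a == b, a == c)
```

Identical membership yields an identical hash, so an unchanged cluster can be detected without comparing ID lists.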

B4 — S2 author import ❌ REMOVED

S2 author import was implemented and then removed — not the onboarding direction we want. app/s2_svc.py, the /api/onboarding/import-author endpoint, and the quick-import UI have all been deleted. Onboarding uses category selection + manual seed search only.

Documentation

CLAUDE.md: Rule 3.11 — interaction instrumentation invariants
_RANKER_VERSION bumped to v6.5_lightgbm_real_cosines
Phase status updated to 6.5 COMPLETE
Tests: 203+ passing

Test suite

tests/test_reranker_integration.py — 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
tests/test_phase6_feature_wiring.py — 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)
tests/demo_reranker.py — interactive demo with 20 realistic papers

Phase 7: Evaluation Framework 📋 NOT STARTED

Build offline and online evaluation before scaling users.
Estimated effort: ~1 week

Offline metrics: nDCG@10, Recall@50, HR@10, ILS, category entropy
Time-split evaluation on unarXive 2022 + S2ORC
Online metrics (once users exist): CTR, save rate, dwell time, return rate
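As a reference point for the offline metrics, nDCG@k over graded labels can be computed as below. The sketch uses the exponential-gain form of DCG (2^label − 1), which is one common convention and an assumption about what the evaluation will use; the graded labels mirror the 2/1/0 pseudo-label scheme from Phase 6.1.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain of the first k labels, in ranked order."""
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(labels[:k]))


def ndcg_at_k(labels, k=10):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Labels listed in the order the ranker returned the papers.
perfect = ndcg_at_k([2, 1, 0, 0])
inverted = ndcg_at_k([0, 0, 1, 2])
print(perfect, round(inverted, 3))
```

A ranking that places the cited paper first scores 1.0, and pushing relevant papers down the list lowers the score even though the same papers are returned.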

Phase 8: LLM Interest Summaries + Distilled Re-ranker 📋 NOT STARTED

Estimated effort: ~10-12 weeks (Doc 07)
Detailed research plan: docs/research/07-LLM-Summaries-Reranker-and-Scaling-Research.md
Entry criteria: Phase 7 eval producing stable nDCG@10; cluster stability Jaccard ≥0.7 over 7 days

8a — Claude-generated per-cluster interest summaries (Doc 07 §A)

Cluster snapshot versioning (ADR A1)
Content-addressed caching: sha256(sorted(paper_ids) + prompt_version + model)
Shared summaries (not per-user) — Haiku 4.5 + Batch API (~$50-80/month @ 1K users)
Nightly regeneration job with 7-day TTL + event-triggered refresh
"You're reading about X" UI framing with sub-theme bullets
Anthropic Citations API for hallucination prevention

8b — Distilled cross-encoder reranker (Doc 07 §B)

Deploy cross-encoder/ms-marco-TinyBERT-L-2-v2 INT8 ONNX as MVP
6ms budget for 20 pairs on CPU (AVX-512 VNNI)
TinyBERT score as LightGBM feature (Option C architecture)
Custom distillation from BGE-reranker-v2-m3 only if held-out gap >3 nDCG
MarginMSE loss + SciNCL citation-graph hard negatives

8c — Use-cases and information-gain design doc (Doc 07 §C)

8 user personas (P1 cold-start through P8 stay-current)
Information-gain table (save=3-5×, dismiss-as-label=−3-4×, passive skip=−0.1×)
Mode-switching UI: "Stay Current" vs "Lit Review" toggle
Failure mode detection rules (feed collapse, stale profile, filter bubble)

Phase 9: Exploration + Collaborative Filtering 📋 NOT STARTED

Blocked by: ≥500 users

Epsilon-greedy exploration (ε=0.25 new users, ε=0.05 established)
LightFM hybrid CF model with switching strategy
Category-level negative suppression
Retrain LightGBM with dismissals as negative labels

Appendix: Infrastructure Status

| Component | Status | Details |
| --- | --- | --- |
| Qdrant Cloud | ✅ Live | 1.6M papers, BGE-M3 1024-dim, BQ enabled, HNSW m=32 |
| Zilliz Cloud | ✅ Live | 1.6M papers, BGE-M3 sparse vectors, collection `arxiv_bgem3_sparse` |
| Turso (libSQL) | ✅ Live | 1.23 GB arXiv metadata + citations, `arxiv-data` DB, `papers` table, unique index on `arxiv_id` |
| SQLite | ✅ Live | interactions, paper_metadata (local cache), user_profiles, user_clusters |
| HF Spaces | ✅ Deployed | Docker SDK, free tier, port 7860 — https://siddhm11-researchit.hf.space |
| Render | ⚠️ Previous target (512MB RAM too small for BGE-M3) | May still be used for non-ML services |
| arXiv API | ✅ Fallback only | Keyword search + metadata for papers not in Turso |
| BGE-M3 Model | ✅ Live | Pre-baked in Docker image, warm-up at startup |
| Groq API | ✅ Live + HF Secret | `app/groq_svc.py` — 2s timeout, academic heuristic skip |
| Notebooks | ✅ Organized | `notebooks/` — 01-upload, 02-test, 03-search-benchmark |

Credentials Status

| Credential | Status | Env Var | Notes |
| --- | --- | --- | --- |
| Qdrant Cloud | ✅ In `.env` | `QDRANT_URL`, `QDRANT_API_KEY` | Already wired |
| Zilliz Cloud | ✅ In `.env` | `ZILLIZ_URI`, `ZILLIZ_TOKEN` | Phase 3, wired |
| Turso (libSQL) | ✅ In `.env` + HF | `TURSO_URL`, `TURSO_DB_TOKEN` | Phase 3.5, wired + deployed |
| Groq | ✅ In `.env` + HF | `GROQ_API_KEY` | Phase 3, wired + deployed |
| HF Spaces | ✅ Deployed | Secrets panel | All env vars set ✔ |

Appendix: Test Suite

| Test File | Count | Status |
| --- | --- | --- |
| `tests/test_profiles.py` | 11 | ✅ Passing |
| `tests/test_clustering.py` | 21 | ✅ Passing |
| `tests/test_reranker_diversity.py` | 13 | ✅ Passing |
| `tests/test_reranker_integration.py` | 7 | ✅ Passing |
| `tests/test_phase6_feature_wiring.py` | 9 | ✅ Passing |
| `tests/test_fusion.py` | 20 | ✅ Passing |
| `tests/test_db.py` | 19 | ✅ Passing |
| `tests/test_qdrant_svc.py` | — | ✅ Passing |
| `tests/test_arxiv_svc.py` | — | ✅ Passing |
| `tests/test_integration.py` | — | ✅ Passing |
| `tests/test_user_state.py` | — | ✅ Passing |
| `tests/test_saved.py` | — | ✅ Passing |
| `tests/test_hybrid_search.py` | 21 | ✅ Passing |
| `tests/test_search_router.py` | 6 | ✅ Passing |
| `tests/test_live_search.py` | 8 | ✅ Passing |
| Total | 203+ | ✅ |
| `test_e2e_recs.py` (standalone) | 1 | ✅ E2E simulation |

Appendix: Doc 06 Corrections — Tracking

| Correction | Status | Where |
| --- | --- | --- |
| α_long 0.10 → 0.03 | ✅ Applied | `app/recommend/profiles.py:30` |
| L2-normalize before Ward clustering | ✅ Applied | `app/recommend/clustering.py` |
| Medoid not centroid | ✅ Applied | `app/recommend/clustering.py` → `_find_medoid()` |
| Negative EWMA wired into reranking | ✅ Applied | `app/recommend/reranker.py` → Feature 5 |
| RRF → quota fusion for recommendations | ✅ Applied | `app/recommend/fusion.py` (Phase 4.1) |
| Hungarian cluster matching | ✅ Applied | `app/recommend/clustering.py` → `stabilize_cluster_ids()` (Phase 4.3) |
| Per-item short-term negative decay | [!] Backlog | Phase 6 (LightGBM feature) |
| Category-level suppression | ✅ Applied | `app/db.py` → `get_suppressed_categories()` (Phase 4.4) |
| BGE-reranker NEVER in hot path | ✅ Followed | Heuristic scorer used instead |