Spaces: siddhm11/ResearchIT
siddhm11 committed 26 days ago

Commit d2f0bed

1 Parent(s): 239539e

Phase 6.5: Documentation finalization

- CLAUDE.md: Add Rule 3.11 (interaction instrumentation invariants),
update phase status to 6.5 COMPLETE, bump last-updated date
- TASK-TRACKER.md: Add full Phase 6.5 section with all completed items
- recommendations.py: Bump _RANKER_VERSION to v6.5_lightgbm_real_cosines

Tests: 203 passed, 0 failures

Files changed (3)

CLAUDE.md +11 -2
app/routers/recommendations.py +1 -1
docs/TASK-TRACKER.md +43 -2

CLAUDE.md CHANGED

@@ -160,11 +160,20 @@ ArXiv IDs can have leading zeros (e.g., `0704.0001`). **Treat all arXiv IDs as s
 The per-cluster origin of each retrieved candidate is preserved end-to-end via `paper_cluster_map: dict[str, int]` (built in `recommendations.py` before `merge_quota_results()`). This mapping flows through to the reranker as per-candidate `cluster_importance` (N,) and `cluster_medoid` (N, 1024) arrays. **Do not re-introduce dominant-cluster shortcuts as "simplifications"** — LightGBM feature slot 24 (`cluster_distance_to_medoid`) depends on per-candidate medoids to correctly score papers from minority-interest clusters.
 ---
 ## 4. What is in scope vs out of scope right now
-**Current phase: Phase 6 COMPLETE; Phase 7 (Evaluation Framework) next.** Phase 2 (a, b, c) is complete with Doc 06 corrections applied. Phase 3 (Hybrid Semantic Search) and Phase 3.5 (Turso metadata DB) are implemented and tested.
 **What has been built (Phases 1-2c):**
 - Qdrant BEST_SCORE recommend API (Tier 3 fallback)
@@ -459,4 +468,4 @@ If a topic is too large for a 06 changelog entry, create `docs/research/07-[topi
 ---
-*Last updated: 2026-05-03. Update this date when CLAUDE.md changes.*

 The per-cluster origin of each retrieved candidate is preserved end-to-end via `paper_cluster_map: dict[str, int]` (built in `recommendations.py` before `merge_quota_results()`). This mapping flows through to the reranker as per-candidate `cluster_importance` (N,) and `cluster_medoid` (N, 1024) arrays. **Do not re-introduce dominant-cluster shortcuts as "simplifications"** — LightGBM feature slot 24 (`cluster_distance_to_medoid`) depends on per-candidate medoids to correctly score papers from minority-interest clusters.
+### 3.11 Interaction instrumentation invariants (Phase 6.5)
+Every interaction logged via `db.log_interaction()` must carry **`query_id`**, **`propensity`**, and **`policy_id`**. These are required for Phase 7 evaluation:
+- `query_id` (UUID): links all papers in a single feed request for per-feed CTR.
+- `propensity` (float): probability the serving policy chose to show this paper (1.0 for deterministic, `n_explore/pool_size` for exploration).
+- `policy_id` (string): identifies the pipeline version (`_RANKER_VERSION`).
+**When adding a new recommendation tier or call path**, always include these three fields in the `paper_tags` dict. The round-trip is: `recommendations.py` → paper dict → `action_buttons.html` `hx-vals` → `events.py` Form params → `db.log_interaction()`.
 ---
 ## 4. What is in scope vs out of scope right now
+**Current phase: Phase 6.5 COMPLETE; Phase 7 (Evaluation Framework) next.** Phase 2 (a, b, c) is complete with Doc 06 corrections applied. Phase 3 (Hybrid Semantic Search) and Phase 3.5 (Turso metadata DB) are implemented and tested.
 **What has been built (Phases 1-2c):**
 - Qdrant BEST_SCORE recommend API (Tier 3 fallback)
 ---
+*Last updated: 2026-05-05. Update this date when CLAUDE.md changes.*
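
Rule 3.11 above can be illustrated with a minimal sketch of how a feed request might attach the three required fields to each paper dict. The helper name `tag_feed_papers`, the `arxiv_id` key, and the idea of passing exploration slots as a set are assumptions made for illustration; only `query_id`, `position`, `propensity`, `policy_id`, `_RANKER_VERSION`, and the `n_explore/pool_size` formula come from the commit.

```python
import uuid

# Value taken from this commit; in the app it lives in recommendations.py.
_RANKER_VERSION = "v6.5_lightgbm_real_cosines"


def tag_feed_papers(papers, explore_ids, pool_size):
    """Attach query_id, position, propensity and policy_id to each paper dict.

    Hypothetical helper: `papers` is the ordered feed, `explore_ids` the set of
    IDs that were picked by exploration, `pool_size` the size of the pool the
    exploration slots were drawn from.
    """
    n_explore = len(explore_ids)
    if n_explore and pool_size <= 0:
        raise ValueError("pool_size must be positive when exploration is used")

    # One UUID per feed request, shared by every paper in that feed.
    query_id = str(uuid.uuid4())

    tagged = []
    for position, paper in enumerate(papers):
        if paper["arxiv_id"] in explore_ids:
            propensity = n_explore / pool_size  # exploration slot
        else:
            propensity = 1.0  # deterministic slot
        tagged.append(
            {
                **paper,
                "query_id": query_id,
                "position": position,
                "propensity": propensity,
                "policy_id": _RANKER_VERSION,
            }
        )
    return tagged
```

Calling `tag_feed_papers([{"arxiv_id": "0704.0001"}, {"arxiv_id": "2301.00002"}], {"2301.00002"}, 20)` would give both papers the same `query_id`, with propensity `1.0` for the first and `1/20` for the second. arXiv IDs stay strings throughout, in line with the leading-zero rule earlier in CLAUDE.md.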

app/routers/recommendations.py CHANGED

@@ -39,7 +39,7 @@ router = APIRouter(prefix="/api")
 # Phase 4.5: Pipeline version tag for instrumentation.  Bump this on any
 # change to the ranking logic so A/B attribution is possible.
-_RANKER_VERSION = "v4.1_quota_hungarian_suppression"
 # Minimum EWMA interactions before switching from ID-based to vector-based recs
 _MIN_EWMA_INTERACTIONS = 3

 # Phase 4.5: Pipeline version tag for instrumentation.  Bump this on any
 # change to the ranking logic so A/B attribution is possible.
+_RANKER_VERSION = "v6.5_lightgbm_real_cosines"
 # Minimum EWMA interactions before switching from ID-based to vector-based recs
 _MIN_EWMA_INTERACTIONS = 3

docs/TASK-TRACKER.md CHANGED

@@ -1,8 +1,8 @@
 # ResearchIT — Master Task Tracker
 > **Purpose**: Single source of truth for all completed, in-progress, and upcoming work.
-> **Last updated**: 2026-05-03
-> **Current phase**: Phase 6 (LightGBM Reranker) — COMPLETE ✔ | Phase 7 next
 ---
@@ -402,6 +402,47 @@
 - [~] Real-user retrain at 100-user threshold — target: +90d or threshold
 - [~] HF model card backfill (library_name, pipeline_tag, metrics, schema)
 ### Test suite
 - `tests/test_reranker_integration.py` — 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
 - `tests/test_phase6_feature_wiring.py` — 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)

 # ResearchIT — Master Task Tracker
 > **Purpose**: Single source of truth for all completed, in-progress, and upcoming work.
+> **Last updated**: 2026-05-05
+> **Current phase**: Phase 6.5 (Instrumentation) — COMPLETE ✔ | Phase 7 next
 ---
 - [~] Real-user retrain at 100-user threshold — target: +90d or threshold
 - [~] HF model card backfill (library_name, pipeline_tag, metrics, schema)
+## Phase 6.5: Instrumentation ✅ COMPLETE
+> **Purpose**: Stabilize the recommendation pipeline and prepare telemetry substrate for Phase 7 evaluation.
+### A1 — Real Qdrant cosine scores
+- [x] Switch `search_by_vector()` → `search_by_vector_with_scores()` in per-cluster + short-term searches
+- [x] Build `qdrant_score_map` from real cosines (replaces fake `1.0 - rank*0.01` linear decay)
+- [x] Feature 0 (`qdrant_cosine_score`) now receives actual cosine similarities
+### A2 — Deployment verification
+- [x] `curl /healthz/reranker` → `model_loaded=true, n_trees=141, fallback_active=false`
+- [x] Verification timestamp added to `PHASE6-Reranker-Framing.md`
+### B1 — query_id linkage
+- [x] Generate `query_id` (UUID) once per feed request in `get_recommendations()`
+- [x] Thread through all 4 tiers: trending, Tier 1, Tier 2, Tier 3
+- [x] Generate `query_id` in `search.py` per search request
+- [x] Add `query_id` + `position` to `action_buttons.html` hx-vals
+### B2 — Propensity logging
+- [x] Add `propensity REAL` + `policy_id TEXT` migration to `interactions` table
+- [x] Extend `db.log_interaction()` with propensity + policy_id params
+- [x] Compute propensity: 1.0 (deterministic) vs `n_explore/pool_size` (exploration)
+- [x] Thread through templates + `events.py` Form params
+### B3 — Cluster snapshot versioning
+- [x] Add `cluster_snapshots` table (append-only, content-addressed via `paper_ids_hash`)
+- [x] `save_cluster_snapshot()` called after each `save_clusters_to_db()`
+- [x] `prune_old_snapshots(30)` on startup in `main.py` lifespan
+### B4 — S2 author import (Phase 5.1)
+- [x] `app/s2_svc.py`: parse S2 URL / raw ID / ORCID, fetch author papers from S2 API
+- [x] `POST /api/onboarding/import-author` endpoint in `onboarding.py`
+- [x] Quick-import form added to `seed_search.html` template
+### Documentation
+- [x] `CLAUDE.md`: Rule 3.11 — interaction instrumentation invariants
+- [x] `_RANKER_VERSION` bumped to `v6.5_lightgbm_real_cosines`
+- [x] Phase status updated to 6.5 COMPLETE
+- [x] Tests: 203+ passing
 ### Test suite
 - `tests/test_reranker_integration.py` — 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
 - `tests/test_phase6_feature_wiring.py` — 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)
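
To make item A1 concrete, here is a rough sketch contrasting the old rank-based placeholder with a score map built from real cosines. It assumes `search_by_vector_with_scores()` yields `(paper_id, cosine)` pairs, which the commit does not show; the function names below and the choice to keep the highest cosine when a paper appears in several searches are also assumptions, not the project's confirmed behaviour.

```python
def rank_decay_scores(ranked_ids):
    """Old placeholder: a fake score of 1.0 - rank * 0.01 per position."""
    return {pid: 1.0 - rank * 0.01 for rank, pid in enumerate(ranked_ids)}


def build_qdrant_score_map(hit_lists):
    """Merge several lists of (paper_id, cosine) hits into one score map.

    Assumption: when a paper is returned by more than one search
    (per-cluster or short-term), the highest cosine is kept.
    """
    score_map = {}
    for hits in hit_lists:
        for paper_id, cosine in hits:
            if paper_id not in score_map or cosine > score_map[paper_id]:
                score_map[paper_id] = cosine
    return score_map


# Illustrative data only; IDs and cosines are made up.
cluster_hits = [("0704.0001", 0.91), ("2301.00002", 0.62)]
short_term_hits = [("2301.00002", 0.75)]
qdrant_score_map = build_qdrant_score_map([cluster_hits, short_term_hits])
```

With the placeholder, the second-ranked paper would always score about 0.99 no matter how far it sits from the query vector; with the map above, feature 0 (`qdrant_cosine_score`) sees 0.75 for `2301.00002`, so the reranker can tell close matches from marginal ones.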