siddhm11 committed on
Commit d2f0bed · 1 Parent(s): 239539e

Phase 6.5: Documentation finalization

- CLAUDE.md: Add Rule 3.11 (interaction instrumentation invariants),
update phase status to 6.5 COMPLETE, bump last-updated date
- TASK-TRACKER.md: Add full Phase 6.5 section with all completed items
- recommendations.py: Bump _RANKER_VERSION to v6.5_lightgbm_real_cosines

Tests: 203 passed, 0 failures

CLAUDE.md CHANGED
@@ -160,11 +160,20 @@ ArXiv IDs can have leading zeros (e.g., `0704.0001`). **Treat all arXiv IDs as s

  The per-cluster origin of each retrieved candidate is preserved end-to-end via `paper_cluster_map: dict[str, int]` (built in `recommendations.py` before `merge_quota_results()`). This mapping flows through to the reranker as per-candidate `cluster_importance` (N,) and `cluster_medoid` (N, 1024) arrays. **Do not re-introduce dominant-cluster shortcuts as "simplifications"** - LightGBM feature slot 24 (`cluster_distance_to_medoid`) depends on per-candidate medoids to correctly score papers from minority-interest clusters.

+ ### 3.11 Interaction instrumentation invariants (Phase 6.5)
+
+ Every interaction logged via `db.log_interaction()` must carry **`query_id`**, **`propensity`**, and **`policy_id`**. These are required for Phase 7 evaluation:
+ - `query_id` (UUID): links all papers in a single feed request for per-feed CTR.
+ - `propensity` (float): probability the serving policy chose to show this paper (1.0 for deterministic, `n_explore/pool_size` for exploration).
+ - `policy_id` (string): identifies the pipeline version (`_RANKER_VERSION`).
+
+ **When adding a new recommendation tier or call path**, always include these three fields in the `paper_tags` dict. The round-trip is: `recommendations.py` → paper dict → `action_buttons.html` `hx-vals` → `events.py` Form params → `db.log_interaction()`.
+
  ---

  ## 4. What is in scope vs out of scope right now

- **Current phase: Phase 6 COMPLETE; Phase 7 (Evaluation Framework) next.** Phase 2 (a, b, c) is complete with Doc 06 corrections applied. Phase 3 (Hybrid Semantic Search) and Phase 3.5 (Turso metadata DB) are implemented and tested.
+ **Current phase: Phase 6.5 COMPLETE; Phase 7 (Evaluation Framework) next.** Phase 2 (a, b, c) is complete with Doc 06 corrections applied. Phase 3 (Hybrid Semantic Search) and Phase 3.5 (Turso metadata DB) are implemented and tested.

  **What has been built (Phases 1-2c):**
  - Qdrant BEST_SCORE recommend API (Tier 3 fallback)
@@ -459,4 +468,4 @@ If a topic is too large for a 06 changelog entry, create `docs/research/07-[topi

  ---

- *Last updated: 2026-05-03. Update this date when CLAUDE.md changes.*
+ *Last updated: 2026-05-05. Update this date when CLAUDE.md changes.*
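Rule 3.11's round-trip ends with values posted from `hx-vals`. A small check on the `events.py` side can therefore catch a tier that forgot one of the three fields before anything reaches `db.log_interaction()`.

The sketch below rests on a few assumptions:
- `parse_instrumentation_fields()` is a hypothetical helper, not part of the codebase shown here.
- The fields arrive as raw strings. A FastAPI `Form()` parameter typed as `float` would already coerce `propensity`.

```python
import uuid

# The three fields Rule 3.11 requires on every logged interaction.
REQUIRED_FIELDS = ("query_id", "propensity", "policy_id")


def parse_instrumentation_fields(form: dict[str, str]) -> dict:
    """Validate and coerce the Rule 3.11 fields from a posted form (hypothetical helper)."""
    missing = [name for name in REQUIRED_FIELDS if not form.get(name)]
    if missing:
        raise ValueError(f"missing instrumentation fields: {missing}")
    query_id = str(uuid.UUID(form["query_id"]))  # ValueError if not a well-formed UUID
    propensity = float(form["propensity"])       # ValueError if not numeric
    if not 0.0 < propensity <= 1.0:              # also rejects NaN
        raise ValueError(f"propensity must be in (0, 1], got {propensity}")
    return {"query_id": query_id, "propensity": propensity, "policy_id": form["policy_id"]}
```

With this check in place, a request that lacks `policy_id` or carries a propensity outside (0, 1] is rejected. It is never logged as a row that Phase 7 cannot use.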
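A toy example makes the `paper_cluster_map` paragraph concrete. This sketch makes three simplifications:
- Plain lists stand in for the `(N,)` and `(N, 1024)` arrays.
- The embeddings are 2-dimensional.
- `cluster_distance_to_medoid` is taken to be a cosine distance (1 minus cosine similarity). The commit does not specify its definition.

The helper names are hypothetical.

```python
import math


def per_candidate_cluster_arrays(candidate_ids, paper_cluster_map, importance_by_cluster, medoid_by_cluster):
    """Return one importance value and one medoid row per candidate, in candidate order.

    Candidate IDs are arXiv IDs kept as strings (e.g. "0704.0001"). A KeyError means a
    candidate lost its cluster origin, which is what the invariant guards against.
    """
    clusters = [paper_cluster_map[pid] for pid in candidate_ids]
    importance = [importance_by_cluster[c] for c in clusters]  # shape (N,)
    medoids = [medoid_by_cluster[c] for c in clusters]         # shape (N, dim)
    return importance, medoids


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


# Two interest clusters: cluster 0 dominates, cluster 1 is a minority interest.
paper_cluster_map = {"2101.00001": 0, "0704.0001": 1}
importance_by_cluster = {0: 0.8, 1: 0.2}
medoid_by_cluster = {0: [1.0, 0.0], 1: [0.0, 1.0]}
embeddings = {"2101.00001": [1.0, 0.0], "0704.0001": [0.0, 1.0]}

ids = ["2101.00001", "0704.0001"]
importance, medoids = per_candidate_cluster_arrays(ids, paper_cluster_map, importance_by_cluster, medoid_by_cluster)
per_candidate = [cosine_distance(embeddings[p], m) for p, m in zip(ids, medoids)]
dominant_only = [cosine_distance(embeddings[p], medoid_by_cluster[0]) for p in ids]
print(per_candidate)  # [0.0, 0.0]
print(dominant_only)  # [0.0, 1.0]: the minority-cluster paper looks maximally far away
```

The dominant-cluster shortcut gives the minority-interest paper the worst possible distance. The per-candidate medoid correctly places it at the center of its own cluster.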
app/routers/recommendations.py CHANGED
@@ -39,7 +39,7 @@ router = APIRouter(prefix="/api")

  # Phase 4.5: Pipeline version tag for instrumentation. Bump this on any
  # change to the ranking logic so A/B attribution is possible.
- _RANKER_VERSION = "v4.1_quota_hungarian_suppression"
+ _RANKER_VERSION = "v6.5_lightgbm_real_cosines"

  # Minimum EWMA interactions before switching from ID-based to vector-based recs
  _MIN_EWMA_INTERACTIONS = 3
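Under Rule 3.11, `_RANKER_VERSION` is the value logged as `policy_id`. This bump is what lets v6.5 interactions be told apart from v4.1 ones.

The sketch below tags one feed response with all three fields plus `position`. It assumes:
- `tag_feed()` and the `explored` flag are hypothetical.
- Positions are 0-based.
- Exploration draws `n_explore` papers uniformly from `pool_size` candidates, which is what makes `n_explore/pool_size` the right propensity.

```python
import uuid

_RANKER_VERSION = "v6.5_lightgbm_real_cosines"


def tag_feed(papers: list[dict], n_explore: int = 0, pool_size: int = 0) -> list[dict]:
    """Return copies of papers tagged with query_id, position, propensity and policy_id."""
    query_id = str(uuid.uuid4())  # generated once, shared by every paper in this feed
    tagged = []
    for position, paper in enumerate(papers):
        if paper.get("explored"):
            if not 0 < n_explore <= pool_size:
                raise ValueError("exploration slot needs 0 < n_explore <= pool_size")
            propensity = n_explore / pool_size  # uniform draw without replacement
        else:
            propensity = 1.0  # deterministic ranking: always shown
        tagged.append({
            **paper,
            "query_id": query_id,
            "position": position,
            "propensity": propensity,
            "policy_id": _RANKER_VERSION,
        })
    return tagged
```

For example, `tag_feed([{"arxiv_id": "0704.0001"}, {"arxiv_id": "2101.00001", "explored": True}], n_explore=2, pool_size=40)` gives both papers the same `query_id`. The second paper gets a propensity of 0.05.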
docs/TASK-TRACKER.md CHANGED
@@ -1,8 +1,8 @@
  # ResearchIT - Master Task Tracker

  > **Purpose**: Single source of truth for all completed, in-progress, and upcoming work.
- > **Last updated**: 2026-05-03
- > **Current phase**: Phase 6 (LightGBM Reranker) - COMPLETE ✔ | Phase 7 next
+ > **Last updated**: 2026-05-05
+ > **Current phase**: Phase 6.5 (Instrumentation) - COMPLETE ✔ | Phase 7 next

  ---

@@ -402,6 +402,47 @@
  - [~] Real-user retrain at 100-user threshold - target: +90d or threshold
  - [~] HF model card backfill (library_name, pipeline_tag, metrics, schema)

+ ## Phase 6.5: Instrumentation ✅ COMPLETE
+
+ > **Purpose**: Stabilize the recommendation pipeline and prepare telemetry substrate for Phase 7 evaluation.
+
+ ### A1 - Real Qdrant cosine scores
+ - [x] Switch `search_by_vector()` → `search_by_vector_with_scores()` in per-cluster + short-term searches
+ - [x] Build `qdrant_score_map` from real cosines (replaces fake `1.0 - rank*0.01` linear decay)
+ - [x] Feature 0 (`qdrant_cosine_score`) now receives actual cosine similarities
+
+ ### A2 - Deployment verification
+ - [x] `curl /healthz/reranker` → `model_loaded=true, n_trees=141, fallback_active=false`
+ - [x] Verification timestamp added to `PHASE6-Reranker-Framing.md`
+
+ ### B1 - query_id linkage
+ - [x] Generate `query_id` (UUID) once per feed request in `get_recommendations()`
+ - [x] Thread through all 4 tiers: trending, Tier 1, Tier 2, Tier 3
+ - [x] Generate `query_id` in `search.py` per search request
+ - [x] Add `query_id` + `position` to `action_buttons.html` hx-vals
+
+ ### B2 - Propensity logging
+ - [x] Add `propensity REAL` + `policy_id TEXT` migration to `interactions` table
+ - [x] Extend `db.log_interaction()` with propensity + policy_id params
+ - [x] Compute propensity: 1.0 (deterministic) vs `n_explore/pool_size` (exploration)
+ - [x] Thread through templates + `events.py` Form params
+
+ ### B3 - Cluster snapshot versioning
+ - [x] Add `cluster_snapshots` table (append-only, content-addressed via `paper_ids_hash`)
+ - [x] `save_cluster_snapshot()` called after each `save_clusters_to_db()`
+ - [x] `prune_old_snapshots(30)` on startup in `main.py` lifespan
+
+ ### B4 - S2 author import (Phase 5.1)
+ - [x] `app/s2_svc.py`: parse S2 URL / raw ID / ORCID, fetch author papers from S2 API
+ - [x] `POST /api/onboarding/import-author` endpoint in `onboarding.py`
+ - [x] Quick-import form added to `seed_search.html` template
+
+ ### Documentation
+ - [x] `CLAUDE.md`: Rule 3.11 - interaction instrumentation invariants
+ - [x] `_RANKER_VERSION` bumped to `v6.5_lightgbm_real_cosines`
+ - [x] Phase status updated to 6.5 COMPLETE
+ - [x] Tests: 203+ passing
+
  ### Test suite
  - `tests/test_reranker_integration.py` - 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
  - `tests/test_phase6_feature_wiring.py` - 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)
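A1 replaces a rank-derived placeholder with the similarity Qdrant actually returns. Feature 0 is only informative if two papers at adjacent ranks can carry very different cosines.

In this sketch both helpers are hypothetical, and it assumes:
- `search_by_vector_with_scores()` yields `(paper_id, score)` pairs. The commit does not show its return type.
- A paper found by both a per-cluster and a short-term search keeps its best score.

```python
def fake_score_map(ranked_ids: list[str]) -> dict[str, float]:
    """The old placeholder: linear decay by rank, unrelated to actual similarity."""
    return {pid: 1.0 - rank * 0.01 for rank, pid in enumerate(ranked_ids)}


def real_score_map(result_lists: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Merge (paper_id, cosine) hits from several searches, keeping each paper's best score."""
    scores: dict[str, float] = {}
    for hits in result_lists:
        for pid, score in hits:
            if score > scores.get(pid, float("-inf")):
                scores[pid] = score
    return scores


cluster_hits = [("0704.0001", 0.82), ("2101.00001", 0.41)]
short_term_hits = [("2101.00001", 0.77)]
print(real_score_map([cluster_hits, short_term_hits]))  # {'0704.0001': 0.82, '2101.00001': 0.77}
```

With the placeholder, `2101.00001` would score about 0.99 simply for being ranked second. With real scores it carries the 0.77 cosine from the short-term search.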
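The A2 check was run by hand with `curl`. Here is a sketch of the same check as a reusable function. It assumes `/healthz/reranker` returns a JSON object containing the three fields named in the tracker entry. The base URL is whatever host the app is served on.

```python
import json
import urllib.request


def reranker_is_healthy(payload: dict, min_trees: int = 1) -> bool:
    """True if the model is loaded, has at least min_trees trees, and the fallback is off."""
    return (
        payload.get("model_loaded") is True
        and payload.get("fallback_active") is False
        and int(payload.get("n_trees", 0)) >= min_trees
    )


def fetch_reranker_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/healthz/reranker and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/healthz/reranker"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Passing `min_trees=141` would also catch a deploy that silently loaded a different model file than the one verified in A2.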
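B2 adds two nullable columns to an existing table. Turso runs libSQL, which is SQLite-compatible. The migration can therefore be made safe to re-run by checking `PRAGMA table_info` before each `ALTER TABLE ... ADD COLUMN`.

This sketch uses the stdlib `sqlite3` module with an in-memory database. The `interactions` schema and the `log_interaction()` stand-in are hypothetical, because the real schema and the `db.log_interaction()` signature are not shown in this commit.

```python
import sqlite3

# Hypothetical pre-migration schema, for illustration only.
_DEMO_SCHEMA = """
CREATE TABLE interactions (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    paper_id TEXT NOT NULL,  -- arXiv IDs stored as TEXT to keep leading zeros
    action TEXT NOT NULL,
    query_id TEXT
)
"""


def create_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(_DEMO_SCHEMA)
    return conn


def migrate_interactions(conn: sqlite3.Connection) -> None:
    """Add propensity REAL and policy_id TEXT if missing; safe to run on every startup."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(interactions)")}  # row[1] = column name
    for column, sql_type in (("propensity", "REAL"), ("policy_id", "TEXT")):
        if column not in existing:
            conn.execute(f"ALTER TABLE interactions ADD COLUMN {column} {sql_type}")
    conn.commit()


def log_interaction(conn, user_id, paper_id, action, query_id, propensity, policy_id):
    """Hypothetical stand-in for db.log_interaction() with the two new parameters."""
    conn.execute(
        "INSERT INTO interactions (user_id, paper_id, action, query_id, propensity, policy_id)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (user_id, paper_id, action, query_id, propensity, policy_id),
    )
    conn.commit()
```

Rows written before the migration keep `NULL` in both columns. Evaluation code should probably exclude those rows rather than treat a missing propensity as 1.0.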
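Content addressing means two saves with the same cluster membership produce the same `paper_ids_hash`, whatever order the clusters or papers come back in.

This sketch works under stated assumptions:
- SHA-256 over sorted, JSON-encoded membership is one reasonable canonical form. The commit does not say which hash is used.
- The table schema is hypothetical.
- `append_snapshot()` skips only consecutive duplicates. A user whose clusters change and later revert still gets a new row.

```python
import hashlib
import json
import sqlite3

# Hypothetical append-only table; the real cluster_snapshots schema is not shown here.
SNAPSHOT_SCHEMA = """
CREATE TABLE IF NOT EXISTS cluster_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    paper_ids_hash TEXT NOT NULL,
    clusters_json TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def paper_ids_hash(clusters: dict[int, list[str]]) -> str:
    """SHA-256 of cluster membership in a canonical form, so ordering does not matter."""
    canonical = {str(cid): sorted(ids) for cid, ids in clusters.items()}
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def append_snapshot(conn: sqlite3.Connection, user_id: str, clusters: dict[int, list[str]]) -> bool:
    """Insert a snapshot row unless membership matches the user's latest one; True if inserted."""
    digest = paper_ids_hash(clusters)
    latest = conn.execute(
        "SELECT paper_ids_hash FROM cluster_snapshots WHERE user_id = ? ORDER BY id DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    if latest is not None and latest[0] == digest:
        return False  # unchanged since the last snapshot: nothing to append
    conn.execute(
        "INSERT INTO cluster_snapshots (user_id, paper_ids_hash, clusters_json) VALUES (?, ?, ?)",
        (user_id, digest, json.dumps(clusters, sort_keys=True)),
    )
    conn.commit()
    return True
```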
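The B4 parser has to tell three kinds of input apart. Here is a sketch using regular expressions.

It rests on these assumptions:
- Semantic Scholar author URLs look like `https://www.semanticscholar.org/author/<name>/<numeric id>`.
- Raw S2 author IDs are purely numeric.

The ORCID check digit uses ISO 7064 MOD 11-2, which is how ORCID iDs are defined. Fetching papers from the S2 API is out of scope here. `parse_author_ref()` is a hypothetical name, not the function in `app/s2_svc.py`.

```python
import re

_ORCID_RE = re.compile(r"(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])", re.IGNORECASE)
_S2_URL_RE = re.compile(r"https?://(?:www\.)?semanticscholar\.org/author/(?:[^/?#]+/)?(\d+)/?(?:[?#].*)?")
_S2_ID_RE = re.compile(r"\d+")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the ISO 7064 MOD 11-2 check digit of an ORCID iD like 0000-0002-1825-0097."""
    digits = orcid.replace("-", "").upper()
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return digits[-1] == ("X" if result == 10 else str(result))


def parse_author_ref(raw: str) -> tuple[str, str]:
    """Classify input as ("orcid", iD) or ("s2", author_id); raise ValueError otherwise."""
    text = raw.strip()
    if m := _ORCID_RE.fullmatch(text):
        orcid = m.group(1).upper()
        if not orcid_checksum_ok(orcid):
            raise ValueError(f"ORCID check digit mismatch: {orcid}")
        return "orcid", orcid
    if m := _S2_URL_RE.fullmatch(text):
        return "s2", m.group(1)
    if _S2_ID_RE.fullmatch(text):
        return "s2", text
    raise ValueError(f"unrecognised author reference: {raw!r}")
```

For example, ORCID's published sample iD `https://orcid.org/0000-0002-1825-0097` parses to `("orcid", "0000-0002-1825-0097")`. A mistyped final digit is rejected before any API call is made.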