siddhm11 committed · Commit d2f0bed · Parent(s): 239539e
Phase 6.5: Documentation finalization
- CLAUDE.md: Add Rule 3.11 (interaction instrumentation invariants),
update phase status to 6.5 COMPLETE, bump last-updated date
- TASK-TRACKER.md: Add full Phase 6.5 section with all completed items
- recommendations.py: Bump _RANKER_VERSION to v6.5_lightgbm_real_cosines
Tests: 203 passed, 0 failures
- CLAUDE.md +11 -2
- app/routers/recommendations.py +1 -1
- docs/TASK-TRACKER.md +43 -2
CLAUDE.md
CHANGED
|
@@ -160,11 +160,20 @@ ArXiv IDs can have leading zeros (e.g., `0704.0001`). **Treat all arXiv IDs as s
|
|
| 160 |
|
| 161 |
The per-cluster origin of each retrieved candidate is preserved end-to-end via `paper_cluster_map: dict[str, int]` (built in `recommendations.py` before `merge_quota_results()`). This mapping flows through to the reranker as per-candidate `cluster_importance` (N,) and `cluster_medoid` (N, 1024) arrays. **Do not re-introduce dominant-cluster shortcuts as "simplifications"**: LightGBM feature slot 24 (`cluster_distance_to_medoid`) depends on per-candidate medoids to correctly score papers from minority-interest clusters.
|
| 162 |
|
| 163 |
---
|
| 164 |
|
| 165 |
## 4. What is in scope vs out of scope right now
|
| 166 |
|
| 167 |
-
**Current phase: Phase 6 COMPLETE; Phase 7 (Evaluation Framework) next.** Phase 2 (a, b, c) is complete with Doc 06 corrections applied. Phase 3 (Hybrid Semantic Search) and Phase 3.5 (Turso metadata DB) are implemented and tested.
|
| 168 |
|
| 169 |
**What has been built (Phases 1-2c):**
|
| 170 |
- Qdrant BEST_SCORE recommend API (Tier 3 fallback)
|
|
@@ -459,4 +468,4 @@ If a topic is too large for a 06 changelog entry, create `docs/research/07-[topi
|
|
| 459 |
|
| 460 |
---
|
| 461 |
|
| 462 |
-
*Last updated: 2026-05-
|
|
|
|
| 160 |
|
| 161 |
The per-cluster origin of each retrieved candidate is preserved end-to-end via `paper_cluster_map: dict[str, int]` (built in `recommendations.py` before `merge_quota_results()`). This mapping flows through to the reranker as per-candidate `cluster_importance` (N,) and `cluster_medoid` (N, 1024) arrays. **Do not re-introduce dominant-cluster shortcuts as "simplifications"**: LightGBM feature slot 24 (`cluster_distance_to_medoid`) depends on per-candidate medoids to correctly score papers from minority-interest clusters.
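A minimal sketch of the per-candidate lookup this rule protects, using plain lists in place of the real arrays. The function name and the two `*_by_cluster` dictionaries are hypothetical; only `paper_cluster_map` and the idea of one importance value and one medoid per candidate come from the rule text.

```python
def per_candidate_cluster_features(
    candidate_ids: list[str],
    paper_cluster_map: dict[str, int],
    importance_by_cluster: dict[int, float],
    medoid_by_cluster: dict[int, list[float]],
) -> tuple[list[float], list[list[float]]]:
    """Look up each candidate's own cluster instead of broadcasting one cluster."""
    importance = []
    medoids = []
    for pid in candidate_ids:
        cluster = paper_cluster_map[pid]  # origin cluster of this candidate
        importance.append(importance_by_cluster[cluster])
        medoids.append(medoid_by_cluster[cluster])
    return importance, medoids


# Toy data: 2-dimensional medoids stand in for the real 1024-dimensional ones.
imp, med = per_candidate_cluster_features(
    ["0704.0001", "2301.00001"],          # arXiv IDs stay strings
    {"0704.0001": 0, "2301.00001": 1},
    importance_by_cluster={0: 0.8, 1: 0.2},
    medoid_by_cluster={0: [1.0, 0.0], 1: [0.0, 1.0]},
)
```

A dominant-cluster shortcut would hand every candidate the medoid of cluster 0, so the second paper would be measured against the wrong centre.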
|
| 162 |
|
| 163 |
+
### 3.11 Interaction instrumentation invariants (Phase 6.5)
|
| 164 |
+
|
| 165 |
+
Every interaction logged via `db.log_interaction()` must carry **`query_id`**, **`propensity`**, and **`policy_id`**. These are required for Phase 7 evaluation:
|
| 166 |
+
- `query_id` (UUID): links all papers in a single feed request for per-feed CTR.
|
| 167 |
+
- `propensity` (float): probability the serving policy chose to show this paper (1.0 for deterministic, `n_explore/pool_size` for exploration).
|
| 168 |
+
- `policy_id` (string): identifies the pipeline version (`_RANKER_VERSION`).
|
| 169 |
+
|
| 170 |
+
**When adding a new recommendation tier or call path**, always include these three fields in the `paper_tags` dict. The round-trip is: `recommendations.py` → paper dict → `action_buttons.html` `hx-vals` → `events.py` Form params → `db.log_interaction()`.
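One way to guard this invariant is a small check run before a paper dict leaves the router. The sketch below is illustrative only: `REQUIRED_TAGS` and `missing_instrumentation()` are hypothetical helpers that do not exist in the repository; the three field names are the ones Rule 3.11 lists.

```python
import uuid

# Field names come from Rule 3.11; the helper itself is an assumption.
REQUIRED_TAGS = ("query_id", "propensity", "policy_id")


def missing_instrumentation(paper_tags: dict) -> list[str]:
    """Return the Rule 3.11 fields that are absent or None in paper_tags."""
    return [key for key in REQUIRED_TAGS if paper_tags.get(key) is None]


# Hypothetical tags for one paper served by a deterministic policy.
tags = {
    "query_id": str(uuid.uuid4()),               # shared by the whole feed
    "propensity": 1.0,                           # deterministic serving
    "policy_id": "v6.5_lightgbm_real_cosines",   # value of _RANKER_VERSION
}
problems = missing_instrumentation(tags)  # empty list when all fields are set
```

A new tier could call such a check in its tests so that a dropped field fails fast instead of surfacing as a gap in the logged interactions.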
|
| 171 |
+
|
| 172 |
---
|
| 173 |
|
| 174 |
## 4. What is in scope vs out of scope right now
|
| 175 |
|
| 176 |
+
**Current phase: Phase 6.5 COMPLETE; Phase 7 (Evaluation Framework) next.** Phase 2 (a, b, c) is complete with Doc 06 corrections applied. Phase 3 (Hybrid Semantic Search) and Phase 3.5 (Turso metadata DB) are implemented and tested.
|
| 177 |
|
| 178 |
**What has been built (Phases 1-2c):**
|
| 179 |
- Qdrant BEST_SCORE recommend API (Tier 3 fallback)
|
|
|
|
| 468 |
|
| 469 |
---
|
| 470 |
|
| 471 |
+
*Last updated: 2026-05-05. Update this date when CLAUDE.md changes.*
|
app/routers/recommendations.py
CHANGED
|
@@ -39,7 +39,7 @@ router = APIRouter(prefix="/api")
|
|
| 39 |
|
| 40 |
# Phase 4.5: Pipeline version tag for instrumentation. Bump this on any
|
| 41 |
# change to the ranking logic so A/B attribution is possible.
|
| 42 |
-
_RANKER_VERSION = "
|
| 43 |
|
| 44 |
# Minimum EWMA interactions before switching from ID-based to vector-based recs
|
| 45 |
_MIN_EWMA_INTERACTIONS = 3
|
|
|
|
| 39 |
|
| 40 |
# Phase 4.5: Pipeline version tag for instrumentation. Bump this on any
|
| 41 |
# change to the ranking logic so A/B attribution is possible.
|
| 42 |
+
_RANKER_VERSION = "v6.5_lightgbm_real_cosines"
|
| 43 |
|
| 44 |
# Minimum EWMA interactions before switching from ID-based to vector-based recs
|
| 45 |
_MIN_EWMA_INTERACTIONS = 3
|
docs/TASK-TRACKER.md
CHANGED
|
@@ -1,8 +1,8 @@
|
|
| 1 |
# ResearchIT - Master Task Tracker
|
| 2 |
|
| 3 |
> **Purpose**: Single source of truth for all completed, in-progress, and upcoming work.
|
| 4 |
-
> **Last updated**: 2026-05-
|
| 5 |
-
> **Current phase**: Phase 6 (
|
| 6 |
|
| 7 |
---
|
| 8 |
|
|
@@ -402,6 +402,47 @@
|
|
| 402 |
- [~] Real-user retrain at 100-user threshold - target: +90d or threshold
|
| 403 |
- [~] HF model card backfill (library_name, pipeline_tag, metrics, schema)
|
| 404 |
|
| 405 |
### Test suite
|
| 406 |
- `tests/test_reranker_integration.py` - 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
|
| 407 |
- `tests/test_phase6_feature_wiring.py` - 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)
|
|
|
|
| 1 |
# ResearchIT - Master Task Tracker
|
| 2 |
|
| 3 |
> **Purpose**: Single source of truth for all completed, in-progress, and upcoming work.
|
| 4 |
+
> **Last updated**: 2026-05-05
|
| 5 |
+
> **Current phase**: Phase 6.5 (Instrumentation): COMPLETE | Phase 7 next
|
| 6 |
|
| 7 |
---
|
| 8 |
|
|
|
|
| 402 |
- [~] Real-user retrain at 100-user threshold - target: +90d or threshold
|
| 403 |
- [~] HF model card backfill (library_name, pipeline_tag, metrics, schema)
|
| 404 |
|
| 405 |
+
## Phase 6.5: Instrumentation - COMPLETE
|
| 406 |
+
|
| 407 |
+
> **Purpose**: Stabilize the recommendation pipeline and prepare telemetry substrate for Phase 7 evaluation.
|
| 408 |
+
|
| 409 |
+
### A1 - Real Qdrant cosine scores
|
| 410 |
+
- [x] Switch `search_by_vector()` → `search_by_vector_with_scores()` in per-cluster + short-term searches
|
| 411 |
+
- [x] Build `qdrant_score_map` from real cosines (replaces fake `1.0 - rank*0.01` linear decay)
|
| 412 |
+
- [x] Feature 0 (`qdrant_cosine_score`) now receives actual cosine similarities
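The change in A1 can be sketched as two ways of building the score map. Both helper names are hypothetical, and the assumed result shape (a list of `(paper_id, cosine)` pairs) is a guess, since the return type of `search_by_vector_with_scores()` is not shown in this commit. Keeping the highest score for a paper retrieved more than once is also an assumption.

```python
def fake_score_map(paper_ids: list[str]) -> dict[str, float]:
    """Old behaviour: linear decay by rank, 1.0 - rank * 0.01."""
    return {pid: 1.0 - rank * 0.01 for rank, pid in enumerate(paper_ids)}


def real_score_map(hits: list[tuple[str, float]]) -> dict[str, float]:
    """New behaviour (sketch): keep the cosine returned by the search.

    A paper found by several cluster searches keeps its highest cosine;
    that merge rule is an assumption, not taken from the repository.
    """
    scores: dict[str, float] = {}
    for pid, cosine in hits:
        scores[pid] = max(cosine, scores.get(pid, float("-inf")))
    return scores


# Made-up hits from two searches; "0704.0001" appears in both.
hits = [("0704.0001", 0.83), ("2301.00001", 0.41), ("0704.0001", 0.79)]
qdrant_score_map = real_score_map(hits)
```

With rank decay, the first and second results always differ by exactly 0.01 regardless of how similar they really are, which is the information feature 0 was missing.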
|
| 413 |
+
|
| 414 |
+
### A2 - Deployment verification
|
| 415 |
+
- [x] `curl /healthz/reranker` → `model_loaded=true, n_trees=141, fallback_active=false`
|
| 416 |
+
- [x] Verification timestamp added to `PHASE6-Reranker-Framing.md`
|
| 417 |
+
|
| 418 |
+
### B1 - query_id linkage
|
| 419 |
+
- [x] Generate `query_id` (UUID) once per feed request in `get_recommendations()`
|
| 420 |
+
- [x] Thread through all 4 tiers: trending, Tier 1, Tier 2, Tier 3
|
| 421 |
+
- [x] Generate `query_id` in `search.py` per search request
|
| 422 |
+
- [x] Add `query_id` + `position` to `action_buttons.html` hx-vals
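The linkage in B1 amounts to generating one identifier per request and stamping it on every paper together with its rank. A rough sketch, assuming a hypothetical `tag_feed()` helper and zero-based positions (the repository's indexing convention is not stated here):

```python
import uuid


def tag_feed(papers: list[dict], policy_id: str) -> list[dict]:
    """Stamp one shared query_id and a per-paper position onto a feed."""
    query_id = str(uuid.uuid4())  # generated once per request, not per paper
    return [
        {**paper, "query_id": query_id, "position": i, "policy_id": policy_id}
        for i, paper in enumerate(papers)
    ]


feed = tag_feed(
    [{"arxiv_id": "0704.0001"}, {"arxiv_id": "2301.00001"}],
    policy_id="v6.5_lightgbm_real_cosines",
)
```

Because every paper in `feed` shares the same `query_id`, clicks can later be grouped per feed request rather than per user or per day.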
|
| 423 |
+
|
| 424 |
+
### B2 - Propensity logging
|
| 425 |
+
- [x] Add `propensity REAL` + `policy_id TEXT` migration to `interactions` table
|
| 426 |
+
- [x] Extend `db.log_interaction()` with propensity + policy_id params
|
| 427 |
+
- [x] Compute propensity: 1.0 (deterministic) vs `n_explore/pool_size` (exploration)
|
| 428 |
+
- [x] Thread through templates + `events.py` Form params
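The propensity rule above can be written as a single function. This is a sketch under the assumption that exploration slots are drawn uniformly without replacement from the pool, in which case each pool member is shown with probability `n_explore / pool_size`; the function name and the `is_exploration` flag are hypothetical.

```python
def propensity(is_exploration: bool, n_explore: int = 0, pool_size: int = 0) -> float:
    """Probability that the serving policy showed this paper.

    Deterministic slots are always shown (1.0). Exploration slots are
    assumed to be a uniform sample of n_explore papers from pool_size.
    """
    if not is_exploration:
        return 1.0
    if not 0 < n_explore <= pool_size:
        raise ValueError("need 0 < n_explore <= pool_size")
    return n_explore / pool_size


p_ranked = propensity(is_exploration=False)
p_explore = propensity(is_exploration=True, n_explore=2, pool_size=8)
```

Rejecting a zero or oversized `n_explore` keeps the logged value inside (0, 1], which matters for any later estimator that divides by it.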
|
| 429 |
+
|
| 430 |
+
### B3 - Cluster snapshot versioning
|
| 431 |
+
- [x] Add `cluster_snapshots` table (append-only, content-addressed via `paper_ids_hash`)
|
| 432 |
+
- [x] `save_cluster_snapshot()` called after each `save_clusters_to_db()`
|
| 433 |
+
- [x] `prune_old_snapshots(30)` on startup in `main.py` lifespan
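Content addressing here means the snapshot key is derived from the cluster's membership. A minimal sketch of one way to compute such a key; the choice of SHA-256, the sorting, and the newline separator are all assumptions, and only the name `paper_ids_hash` comes from the tracker entry.

```python
import hashlib


def paper_ids_hash(paper_ids: list[str]) -> str:
    """Order-independent SHA-256 digest of a cluster's paper IDs.

    IDs stay strings (arXiv IDs can have leading zeros) and are
    de-duplicated and sorted so the same set always hashes the same.
    """
    canonical = "\n".join(sorted(set(paper_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = paper_ids_hash(["2301.00001", "0704.0001"])
key_b = paper_ids_hash(["0704.0001", "2301.00001"])  # same set, same key
```

With a key like this, an append-only table can skip writing a new row when a re-clustering produces an unchanged cluster.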
|
| 434 |
+
|
| 435 |
+
### B4 - S2 author import (Phase 5.1)
|
| 436 |
+
- [x] `app/s2_svc.py`: parse S2 URL / raw ID / ORCID, fetch author papers from S2 API
|
| 437 |
+
- [x] `POST /api/onboarding/import-author` endpoint in `onboarding.py`
|
| 438 |
+
- [x] Quick-import form added to `seed_search.html` template
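The input handling for the quick-import form might look roughly like the following. This is not the code in `app/s2_svc.py`; `parse_author_ref()` and its return shape are hypothetical, and the sketch covers only classification of the user's input, not the S2 API call.

```python
import re

# Author page URLs end in a numeric ID, optionally preceded by a name slug.
_S2_URL = re.compile(r"semanticscholar\.org/author/(?:[^/]+/)?(\d+)")
# ORCID iDs are four hyphen-separated groups of four; the last char may be X.
_ORCID = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def parse_author_ref(text: str) -> tuple[str, str]:
    """Classify input as ("s2", id) or ("orcid", id); raise ValueError otherwise."""
    text = text.strip()
    match = _S2_URL.search(text)
    if match:
        return ("s2", match.group(1))
    if _ORCID.match(text):
        return ("orcid", text)
    if text.isascii() and text.isdigit():
        return ("s2", text)  # bare numeric ID
    raise ValueError(f"unrecognised author reference: {text!r}")


kind, author_id = parse_author_ref(
    "https://www.semanticscholar.org/author/Jane-Doe/1234567"
)
```

The ID is returned as a string throughout, matching the tracker's general rule of never coercing identifiers to integers.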
|
| 439 |
+
|
| 440 |
+
### Documentation
|
| 441 |
+
- [x] `CLAUDE.md`: Rule 3.11 (interaction instrumentation invariants)
|
| 442 |
+
- [x] `_RANKER_VERSION` bumped to `v6.5_lightgbm_real_cosines`
|
| 443 |
+
- [x] Phase status updated to 6.5 COMPLETE
|
| 444 |
+
- [x] Tests: 203+ passing
|
| 445 |
+
|
| 446 |
### Test suite
|
| 447 |
- `tests/test_reranker_integration.py` - 7 tests (smoke, features, heuristic, E2E, latency, backward compat, comparison)
|
| 448 |
- `tests/test_phase6_feature_wiring.py` - 9 tests (per-candidate arrays, broadcast medoid, model accessors, aggregate activation)
|