Upload folder using huggingface_hub

- .gitattributes +1 -0
- QUALITY_SCORE_ARCHITECTURE.md +164 -0
- data/quality_scores.jsonl +3 -0
- log.log +2 -2
- scripts/compute_quality_score.py +545 -0
.gitattributes
CHANGED

@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 log.log filter=lfs diff=lfs merge=lfs -text
 store/74c/74c70007-cccd-4669-bfd4-e25f8348ad8c/all_1_35_2/primary.cidx filter=lfs diff=lfs merge=lfs -text
+data/quality_scores.jsonl filter=lfs diff=lfs merge=lfs -text
QUALITY_SCORE_ARCHITECTURE.md
ADDED

@@ -0,0 +1,164 @@
# Token Quality / Health Score (q) - Architecture

This document defines the "quality/health" scalar `q` used by Apollo.

## 1) What problem this solves

We want a single number that captures **how healthy / organic vs. controlled** a token looks, so that a downstream trading policy (e.g., an RL agent) can treat it as a **risk/health input**.

Key points:
- This is **not model confidence**.
- `q` is computed **offline** from a token's **full lifetime** (for labels / training targets).
- At **inference**, the model predicts `q` from **partial observations**.
- We avoid hard thresholds and raw-scale features (USD, SOL, counts) by using **within-regime distributions**.

## 2) Core idea (distribution-first, not rules-first)

Raw totals (fees, volume, holders) are mostly **scale** and are extremely heavy-tailed. Using them directly:
- makes the signal unstable across regimes,
- makes it sensitive to market-wide shifts,
- and invites hand-tuned weights ("human bias").

Instead, we map each metric to a **percentile** within a comparable peer group, then aggregate.
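As a toy sketch of this mapping (the volumes are invented; the real pipeline lives in `scripts/compute_quality_score.py` below), a 2,000x spread in raw volume collapses into evenly spaced percentiles:

```python
import math

# Hypothetical lifetime volumes (USD) for five peer tokens: heavy-tailed.
volumes = [1_200, 3_500, 9_000, 55_000, 2_400_000]

def percentile(v, ref):
    """Rank-based percentile p = (rank - 0.5) / n within the peer group."""
    rank = sorted(ref).index(v) + 1  # 1-based rank; values here are unique
    return (rank - 0.5) / len(ref)

logs = [math.log1p(v) for v in volumes]
for v, lv in zip(volumes, logs):
    print(f"{v:>10,} -> p = {percentile(lv, logs):.2f}")
# p runs 0.10, 0.30, 0.50, 0.70, 0.90 regardless of the raw scale.
```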
## 3) Return bucketing (why it is required)

The dataset is highly imbalanced: most tokens die early (<2-3x), while a tiny tail produces 10x-1000x outcomes.

If you compute percentiles globally:
- 100x tokens will always dominate the "good" percentiles for scale metrics,
- and "quality" will collapse into "return magnitude".

So we compute distributions **within return regimes**.

### 3.1 Bucket definition (example)

Let `R_max` be the token's lifetime max return multiple (e.g., ATH / launch).

Use coarse buckets for the bulk and finer buckets for the tail, e.g.:
- B0: `R_max < 3`
- B1: `3 <= R_max < 10`
- B2: `10 <= R_max < 20`
- B3: `20 <= R_max < 100`
- B4: `100 <= R_max < 10_000`

Notes:
- If a bucket has too few samples, merge it with a neighbor.
- For the extreme tail you can also replace fixed buckets with **quantile buckets** on `log(R_max)` to keep sample counts stable.
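The bucketing rule above can be sketched as follows (the edges are the example values from 3.1; the lower edge of B0 is assumed to be 0, since returns are positive multiples):

```python
# Example bucket edges for R_max from section 3.1 (upper edge exclusive).
EDGES = [0, 3, 10, 20, 100, 10_000]  # B0 .. B4

def bucket_id(r_max: float) -> int:
    """Return the bucket index for a lifetime max-return multiple, or -1 if out of range."""
    for i in range(len(EDGES) - 1):
        if EDGES[i] <= r_max < EDGES[i + 1]:
            return i
    return -1

print(bucket_id(1.4))     # -> 0  (B0: died early)
print(bucket_id(42.0))    # -> 3  (B3: 20x-100x)
print(bucket_id(50_000))  # -> -1 (outside the modeled range)
```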
Interpretation (important):
- `q` is **relative within the bucket**.
- The "best garbage" can have high `q` in B0.
- A 100x token can have low `q` in B4 if it looks worst vs. other 100x+ tokens.

This is intentional: return and quality are different axes.

## 4) Feature set and sign conventions

We want `q` to increase for "healthy/organic" structure and decrease for "controlled/manipulated" structure.

All features below are evaluated **within the token's return bucket**.

### 4.1 Scale / activity (high is usually better within-bucket)

Use log transforms for stability before percentiles:
- `log1p(total_volume_usd)`
- `log1p(total_fees_sol)`
- `log1p(unique_holders)`
- `log1p(time_to_ath_sec)` (optional; see note below)

Ratio features (less pure scale):
- `fees_per_volume = total_fees_sol / (total_volume_usd + eps)`
- `fees_per_trade = total_fees_sol / (n_trades + eps)` (if `n_trades` exists)
- `holders_per_trade = unique_holders / (n_trades + eps)` (if `n_trades` exists)
- `holders_per_volume = unique_holders / (total_volume_usd + eps)`

Rationale:
- Fees and fee-per-* help separate "real urgency / competition" from "cheap wash".
- Holders and holders-per-* help separate broad participation from concentrated looping.
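A sketch of the ratio features (argument names follow the bullets above; the input values are invented, and `EPS` guards the denominators as described in section 7):

```python
EPS = 1e-9

def ratio_features(total_fees_sol, total_volume_usd, unique_holders, n_trades):
    """Ratio features from section 4.1; EPS avoids divide-by-zero on dead tokens."""
    return {
        "fees_per_volume": total_fees_sol / (total_volume_usd + EPS),
        "fees_per_trade": total_fees_sol / (n_trades + EPS),
        "holders_per_trade": unique_holders / (n_trades + EPS),
        "holders_per_volume": unique_holders / (total_volume_usd + EPS),
    }

feats = ratio_features(total_fees_sol=12.5, total_volume_usd=50_000,
                       unique_holders=800, n_trades=4_000)
print(feats["holders_per_trade"])  # ~0.2: one distinct holder per five trades
```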
### 4.2 Manipulation / control (high is worse; flip sign)

These are typically "the higher, the less healthy":
- `snipers_pct_supply_top70`
- `bundled_pct_supply`
- `dev_hold_pct_supply`
- `insiders_pct_supply`

We treat exceptions as rare; the model can learn edge cases from context, but the label should reflect the dominant interpretation.

### 4.3 Time-to-ATH note

`time_to_ath_sec` can behave differently across return buckets.
- In high-return buckets, very short times can look like a single spike / control.
- In low-return buckets, many tokens have near-zero times because they never move.

Include it only if it improves downstream behavior; keep it **bucket-relative** either way.

## 5) Turning raw metrics into a signed scalar

We want a single `q` in `[-1, +1]` with direction:
- `+1` = looks healthiest vs. peers in the same return bucket
- `-1` = looks most unhealthy vs. peers in the same return bucket

### 5.1 Within-bucket percentile (ECDF)

For each feature value `x_i`:
- compute percentile `p_i = ECDF_b(x_i)` using only tokens in bucket `b`
- `p_i` is in `[0, 1]`

Implementation detail:
- Use a rank-based ECDF with a small offset to avoid exact 0/1 if desired:
  - `p_i = (rank(x_i) - 0.5) / n`
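A self-contained sketch of this midrank ECDF (ties share the average of their ranks; the repo script applies the same rule per feature and per bucket):

```python
def midrank_percentiles(values):
    """p = (midrank - 0.5) / n; tied values share the average of their 1-based ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    p = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1                              # extend over the tie run
        midrank = 0.5 * ((i + 1) + (j + 1))     # average of 1-based ranks i..j
        for k in range(i, j + 1):
            p[order[k]] = (midrank - 0.5) / n
        i = j + 1
    return p

print(midrank_percentiles([5.0, 1.0, 5.0, 9.0]))  # -> [0.5, 0.125, 0.5, 0.875]
```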
### 5.2 Signed percentile

Convert to a signed value:
- `s_i = 2 * p_i - 1` (now `s_i` is in `[-1, +1]`)

If "high is bad" for that feature, flip it:
- `s_i := -s_i`

This gives direction and magnitude in a single number.

### 5.3 Aggregate without hand weights

To avoid hand-tuned weights, use a symmetric aggregator:
- `q_raw = mean_i(s_i)`

Optional robustness:
- clip each `s_i` to `[-0.99, 0.99]` before averaging (limits extreme leverage)
- use a trimmed mean (drop the top/bottom k% of `s_i`) if a single metric can be noisy
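Steps 5.2-5.3 as one small function (a sketch; the percentiles and the good/bad flags come from the earlier steps):

```python
def q_raw(percentiles, higher_is_better, clip=0.99):
    """Average of signed, optionally flipped, clipped percentiles (sections 5.2-5.3)."""
    s_vals = []
    for p, good in zip(percentiles, higher_is_better):
        s = 2.0 * p - 1.0                      # [0, 1] -> [-1, +1]
        if not good:
            s = -s                             # "high is bad" features get flipped
        s_vals.append(max(-clip, min(clip, s)))
    return sum(s_vals) / len(s_vals)

# Token in the 80th percentile on volume (good) and 90th on sniper share (bad):
print(q_raw([0.8, 0.9], [True, False]))  # ~ (0.6 - 0.8) / 2 = -0.1
```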
### 5.4 Optional: re-rank the aggregate (final calibration)

If you want the final `q` to be strictly comparable across time / retrains and more uniform within each bucket:
- `q = 2 * ECDF_b(q_raw) - 1`

This keeps the "relative within bucket" meaning while stabilizing scale.
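The calibration step as a sketch (ties are ignored here for brevity; with ties you would reuse the midrank rule from 5.1):

```python
def rerank(q_raw_vals):
    """q = 2 * ECDF_b(q_raw) - 1, using p = (rank - 0.5) / n within the bucket."""
    n = len(q_raw_vals)
    order = sorted(range(n), key=lambda i: q_raw_vals[i])
    q = [0.0] * n
    for rank0, i in enumerate(order):
        p = (rank0 + 0.5) / n        # (rank - 0.5) / n with 1-based rank
        q[i] = 2.0 * p - 1.0
    return q

# Whatever scale q_raw had, the output is uniform in (-1, +1) within the bucket:
print(rerank([-0.42, 0.03, 0.31, 0.77]))  # -> [-0.75, -0.25, 0.25, 0.75]
```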
## 6) Training vs inference (how it is used)

Offline labeling (training target):
1) Compute `R_max` from the full lifetime.
2) Assign return bucket `b`.
3) Compute all chosen metrics from the full lifetime.
4) Convert metrics -> signed percentiles -> `q`.

Inference (model output):
- The model only sees partial history and must predict the *final* `q` (computed above).
- The trading policy uses predicted return signals + predicted `q` to decide position sizing / risk.

## 7) Practical notes

- Use `eps` (e.g., `1e-9`) in denominators to avoid divide-by-zero.
- If a metric is missing for a token, drop it from the mean for that token (or impute with the bucket median).
- When bucket sample counts drift, prefer merging buckets rather than letting the ECDF get noisy.
- Recompute distributions on the same "source-of-truth" dataset used for training (not ad-hoc caches).

## 8) Summary

`q` is a **return-regime-relative**, **distribution-normalized**, **signed** health score:
- It is not a threshold classifier.
- It avoids raw-scale dependence and hand weighting.
- It cleanly separates "made money" (return) from "looks healthy" (quality).
data/quality_scores.jsonl
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:484a4722dcc9d5bf4928e0926be256df7abecfe74cfbf7f75b04aeab91c2ca23
+size 11849315
log.log
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a7e8559fa0dfc6a9356d4078d582a479a5a3cbf8a3348183b3baf336ef73db25
+size 2302
scripts/compute_quality_score.py
ADDED

@@ -0,0 +1,545 @@
import os
import sys
import json
import math
import argparse
from typing import Dict, List, Tuple

from clickhouse_driver import Client as ClickHouseClient

# Add parent directory to path so `models` is importable when run as a script
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from models.vocabulary import RETURN_THRESHOLDS

CLICKHOUSE_HOST = os.getenv("CLICKHOUSE_HOST", "localhost")
CLICKHOUSE_PORT = int(os.getenv("CLICKHOUSE_PORT", 9000))
CLICKHOUSE_USER = os.getenv("CLICKHOUSE_USER", "default")
CLICKHOUSE_PASSWORD = os.getenv("CLICKHOUSE_PASSWORD", "")
CLICKHOUSE_DATABASE = os.getenv("CLICKHOUSE_DATABASE", "default")

LAUNCH_PRICE_USD = 0.000004
EPS = 1e-9


def get_client():
    return ClickHouseClient(
        host=CLICKHOUSE_HOST,
        port=CLICKHOUSE_PORT,
        user=CLICKHOUSE_USER,
        password=CLICKHOUSE_PASSWORD,
        database=CLICKHOUSE_DATABASE,
    )


def _midrank_percentiles(items: List[Tuple[str, float]]) -> Dict[str, float]:
    """
    Compute midrank percentiles for a list of (token, value).
    Returns p in (0, 1) via (rank - 0.5) / n. Ties get the same midrank.
    """
    if not items:
        return {}
    items_sorted = sorted(items, key=lambda x: x[1])
    n = len(items_sorted)
    out = {}
    i = 0
    while i < n:
        j = i
        v = items_sorted[i][1]
        while j + 1 < n and items_sorted[j + 1][1] == v:
            j += 1
        # midrank is the average of 1-based ranks i..j
        rank_lo = i + 1
        rank_hi = j + 1
        midrank = 0.5 * (rank_lo + rank_hi)
        p = (midrank - 0.5) / n
        for k in range(i, j + 1):
            out[items_sorted[k][0]] = p
        i = j + 1
    return out


def _bucket_id(ret_val: float) -> int:
    for i in range(len(RETURN_THRESHOLDS) - 1):
        lower = RETURN_THRESHOLDS[i]
        upper = RETURN_THRESHOLDS[i + 1]
        if lower <= ret_val < upper:
            return i
    return -1


def fetch_token_metrics(client) -> List[dict]:
    """
    Fetch the lifetime metrics needed for quality scoring.
    Returns a list of dicts keyed by token_address.
    """
    query = f"""
    WITH
    trade_agg AS (
        SELECT
            base_address,
            sum(priority_fee + coin_creator_fee) AS fees_sol,
            sum(total_usd) AS volume_usd,
            count() AS n_trades,
            min(timestamp) AS t0,
            argMax(timestamp, price_usd) AS t_ath
        FROM trades
        GROUP BY base_address
    ),
    ret_agg AS (
        SELECT
            token_address,
            (argMax(ath_price_usd, updated_at) / {LAUNCH_PRICE_USD}) AS ret,
            argMax(unique_holders, updated_at) AS unique_holders
        FROM token_metrics
        GROUP BY token_address
    ),
    snipers AS (
        SELECT
            m.base_address AS token_address,
            (m.val / t.total_supply * 100) AS snipers_pct
        FROM (
            SELECT
                base_address,
                sumIf(base_amount, buyer_rank <= 70) AS val
            FROM (
                SELECT
                    base_address,
                    base_amount,
                    dense_rank() OVER (PARTITION BY base_address ORDER BY min_slot, min_idx) AS buyer_rank
                FROM (
                    SELECT
                        base_address,
                        maker,
                        min(slot) AS min_slot,
                        min(transaction_index) AS min_idx,
                        sum(base_amount) AS base_amount
                    FROM trades
                    WHERE trade_type = 0
                    GROUP BY base_address, maker
                )
            )
            GROUP BY base_address
        ) m
        JOIN (
            SELECT token_address, argMax(total_supply, updated_at) AS total_supply
            FROM tokens
            GROUP BY token_address
        ) t ON m.base_address = t.token_address
        WHERE t.total_supply > 0
    ),
    bundled AS (
        SELECT
            m.base_address AS token_address,
            (m.val / t.total_supply * 100) AS bundled_pct
        FROM (
            SELECT
                t.base_address,
                sum(t.base_amount) AS val
            FROM trades t
            JOIN (
                SELECT base_address, min(slot) AS min_slot
                FROM trades
                GROUP BY base_address
            ) m ON t.base_address = m.base_address AND t.slot = m.min_slot
            WHERE t.trade_type = 0
            GROUP BY t.base_address
        ) m
        JOIN (
            SELECT token_address, argMax(total_supply, updated_at) AS total_supply
            FROM tokens
            GROUP BY token_address
        ) t ON m.base_address = t.token_address
        WHERE t.total_supply > 0
    ),
    dev_hold AS (
        SELECT
            t.token_address AS token_address,
            (wh.current_balance / (t.total_supply / pow(10, t.decimals)) * 100) AS dev_hold_pct
        FROM (
            SELECT
                token_address,
                argMax(creator_address, updated_at) AS creator_address,
                argMax(total_supply, updated_at) AS total_supply,
                argMax(decimals, updated_at) AS decimals
            FROM tokens
            GROUP BY token_address
        ) t
        JOIN (
            SELECT mint_address, wallet_address, argMax(current_balance, updated_at) AS current_balance
            FROM wallet_holdings
            GROUP BY mint_address, wallet_address
        ) wh ON t.token_address = wh.mint_address AND t.creator_address = wh.wallet_address
        WHERE t.total_supply > 0
    ),
    insiders AS (
        SELECT
            wh.mint_address AS token_address,
            (sum(wh.current_balance) / (t.total_supply / pow(10, t.decimals)) * 100) AS insiders_pct
        FROM (
            SELECT mint_address, wallet_address, argMax(current_balance, updated_at) AS current_balance
            FROM wallet_holdings
            GROUP BY mint_address, wallet_address
        ) wh
        JOIN (
            SELECT
                wallet_address,
                argMax(total_buys_count, updated_at) AS buys,
                argMax(transfers_in_count, updated_at) AS transfers,
                argMax(spl_transfers_in_count, updated_at) AS spl_transfers
            FROM wallet_profile_metrics
            GROUP BY wallet_address
        ) wpm ON wh.wallet_address = wpm.wallet_address
        JOIN (
            SELECT token_address, argMax(total_supply, updated_at) AS total_supply, argMax(decimals, updated_at) AS decimals
            FROM tokens
            GROUP BY token_address
        ) t ON wh.mint_address = t.token_address
        WHERE wpm.buys = 0 AND (wpm.transfers > 0 OR wpm.spl_transfers > 0) AND t.total_supply > 0
        GROUP BY wh.mint_address, t.total_supply, t.decimals
    )
    SELECT
        r.token_address,
        r.ret,
        r.unique_holders,
        f.fees_sol,
        f.volume_usd,
        f.n_trades,
        (f.t_ath - f.t0) AS time_to_ath_sec,
        s.snipers_pct,
        b.bundled_pct,
        d.dev_hold_pct,
        i.insiders_pct
    FROM ret_agg r
    LEFT JOIN trade_agg f ON r.token_address = f.base_address
    LEFT JOIN snipers s ON r.token_address = s.token_address
    LEFT JOIN bundled b ON r.token_address = b.token_address
    LEFT JOIN dev_hold d ON r.token_address = d.token_address
    LEFT JOIN insiders i ON r.token_address = i.token_address
    """
    rows = client.execute(query)
    cols = [
        "token_address",
        "ret",
        "unique_holders",
        "fees_sol",
        "volume_usd",
        "n_trades",
        "time_to_ath_sec",
        "snipers_pct",
        "bundled_pct",
        "dev_hold_pct",
        "insiders_pct",
    ]
    return [dict(zip(cols, r)) for r in rows]


def _compute_quality_scores(
    client,
    max_ret: float = 10000.0,
    rerank: bool = True,
    with_debug: bool = False,
):
    data = fetch_token_metrics(client)

    # Feature spec: (name, getter, positive_when_high)
    feature_defs = [
        ("fees_log", lambda d: math.log1p(d["fees_sol"]) if d["fees_sol"] is not None else None, True),
        ("volume_log", lambda d: math.log1p(d["volume_usd"]) if d["volume_usd"] is not None else None, True),
        ("holders_log", lambda d: math.log1p(d["unique_holders"]) if d["unique_holders"] is not None else None, True),
        ("time_to_ath_log", lambda d: math.log1p(d["time_to_ath_sec"]) if d["time_to_ath_sec"] is not None else None, True),
        ("fees_per_volume", lambda d: (d["fees_sol"] / (d["volume_usd"] + EPS)) if d["fees_sol"] is not None and d["volume_usd"] is not None else None, True),
        ("fees_per_trade", lambda d: (d["fees_sol"] / (d["n_trades"] + EPS)) if d["fees_sol"] is not None and d["n_trades"] is not None else None, True),
        ("holders_per_trade", lambda d: (d["unique_holders"] / (d["n_trades"] + EPS)) if d["unique_holders"] is not None and d["n_trades"] is not None else None, True),
        ("holders_per_volume", lambda d: (d["unique_holders"] / (d["volume_usd"] + EPS)) if d["unique_holders"] is not None and d["volume_usd"] is not None else None, True),
        ("snipers_pct", lambda d: d["snipers_pct"], False),
        ("bundled_pct", lambda d: d["bundled_pct"], False),
        ("dev_hold_pct", lambda d: d["dev_hold_pct"], False),
        ("insiders_pct", lambda d: d["insiders_pct"], False),
    ]

    raw_metrics = ["snipers_pct", "bundled_pct", "dev_hold_pct", "insiders_pct"]

    debug = None
    if with_debug:
        debug = {
            "q_raw": [],
            "feature_pairs": {f[0]: [] for f in feature_defs},
            "raw_pairs": {m: [] for m in raw_metrics},
        }

    # Group tokens into return buckets
    buckets: Dict[int, List[dict]] = {}
    for d in data:
        ret_val = d.get("ret")
        if ret_val is None or ret_val <= 0 or ret_val > max_ret:
            continue
        b = _bucket_id(ret_val)
        if b == -1:
            continue
        d["bucket_id"] = b
        buckets.setdefault(b, []).append(d)

    # Compute percentiles per bucket + feature
    token_scores = []
    for b, items in buckets.items():
        # Precompute within-bucket percentiles per feature
        feature_percentiles: Dict[str, Dict[str, float]] = {}
        for fname, fget, _pos in feature_defs:
            vals = []
            for d in items:
                v = fget(d)
                if v is None or (isinstance(v, float) and (math.isnan(v) or math.isinf(v))):
                    continue
                vals.append((d["token_address"], v))
            feature_percentiles[fname] = _midrank_percentiles(vals)

        # Compute q_raw for each token
        q_raw_map = {}
        for d in items:
            s_vals = []
            s_map = {}
            for fname, _fget, pos in feature_defs:
                p = feature_percentiles[fname].get(d["token_address"])
                if p is None:
                    continue
                s = 2.0 * p - 1.0
                if not pos:
                    s = -s
                # Clip to limit the leverage of any single feature
                s = max(-0.99, min(0.99, s))
                s_vals.append(s)
                s_map[fname] = s
            if not s_vals:
                continue
            q_raw = sum(s_vals) / len(s_vals)
            q_raw_map[d["token_address"]] = q_raw
            if with_debug:
                debug["q_raw"].append(q_raw)
                for fname, s in s_map.items():
                    debug["feature_pairs"][fname].append((q_raw, s))
                for metric in raw_metrics:
                    raw_val = d.get(metric)
                    if raw_val is None:
                        continue
                    debug["raw_pairs"][metric].append((q_raw, raw_val))

        # Optional re-rank within the bucket (section 5.4 of the architecture doc)
        q_p = _midrank_percentiles(list(q_raw_map.items())) if rerank else None
        for d in items:
            t = d["token_address"]
            if t not in q_raw_map:
                continue
            token_scores.append(
                {
                    "token_address": t,
                    "bucket_id": b,
                    "ret": d["ret"],
                    "q_raw": q_raw_map[t],
                    "q": (2.0 * q_p[t] - 1.0) if rerank else q_raw_map[t],
                }
            )

    if with_debug:
        return token_scores, debug
    return token_scores


def compute_quality_scores(
    client,
    max_ret: float = 10000.0,
    rerank: bool = True,
) -> List[dict]:
    return _compute_quality_scores(client, max_ret=max_ret, rerank=rerank, with_debug=False)


def write_jsonl(path: str, rows: List[dict]) -> None:
    parent = os.path.dirname(path)
    if parent:  # guard: makedirs("") raises for bare filenames
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for r in rows:
            f.write(json.dumps(r) + "\n")


def _percentile(sorted_vals: List[float], p: float) -> float:
    """Linear-interpolation percentile over an already-sorted list."""
    if not sorted_vals:
        return float("nan")
    n = len(sorted_vals)
    if n == 1:
        return sorted_vals[0]
    pos = p * (n - 1)
    lo = int(math.floor(pos))
    hi = int(math.ceil(pos))
    if lo == hi:
        return sorted_vals[lo]
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def _summary_stats(vals: List[float]) -> Dict[str, float]:
    if not vals:
        return {}
    vals_sorted = sorted(vals)
    return {
        "mean": sum(vals_sorted) / len(vals_sorted),
        "min": vals_sorted[0],
        "max": vals_sorted[-1],
        "p10": _percentile(vals_sorted, 0.10),
        "p50": _percentile(vals_sorted, 0.50),
        "p90": _percentile(vals_sorted, 0.90),
        "p99": _percentile(vals_sorted, 0.99),
    }


def _pearson_corr(xs: List[float], ys: List[float]) -> float:
    if not xs or not ys or len(xs) != len(ys) or len(xs) < 2:
        return float("nan")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = 0.0
    den_x = 0.0
    den_y = 0.0
    for i in range(n):
        dx = xs[i] - mean_x
        dy = ys[i] - mean_y
        num += dx * dy
        den_x += dx * dx
        den_y += dy * dy
    denom = math.sqrt(den_x * den_y)
    if denom == 0.0:
        return float("nan")
    return num / denom


def _bucket_label(b: int) -> str:
    lower = RETURN_THRESHOLDS[b]
    upper = RETURN_THRESHOLDS[b + 1] if b + 1 < len(RETURN_THRESHOLDS) else None
    if upper is None:
        return f">= {lower}x"
    return f"{lower}x - {upper}x"
| 443 |
+
|
| 444 |
+
|
| 445 |
+
def print_summary(scores: List[dict]) -> None:
|
| 446 |
+
print("=== QUALITY SCORE SUMMARY ===")
|
| 447 |
+
print(f"Total tokens scored: {len(scores)}")
|
| 448 |
+
if not scores:
|
| 449 |
+
return
|
| 450 |
+
|
| 451 |
+
overall_q = [s["q"] for s in scores if "q" in s]
|
| 452 |
+
overall_q_raw = [s["q_raw"] for s in scores if "q_raw" in s]
|
| 453 |
+
for name, series in [("q", overall_q), ("q_raw", overall_q_raw)]:
|
| 454 |
+
stats = _summary_stats(series)
|
| 455 |
+
if not stats:
|
| 456 |
+
continue
|
| 457 |
+
print(f"\nOverall {name}:")
|
| 458 |
+
print(f" Mean: {stats['mean']:.4f} | Min: {stats['min']:.4f} | Max: {stats['max']:.4f}")
|
| 459 |
+
print(f" Q: p10={stats['p10']:.2f} p50={stats['p50']:.2f} p90={stats['p90']:.2f} p99={stats['p99']:.2f}")
|
| 460 |
+
|
| 461 |
+
# Per-bucket summaries
|
| 462 |
+
buckets: Dict[int, List[dict]] = {}
|
| 463 |
+
for s in scores:
|
| 464 |
+
buckets.setdefault(s["bucket_id"], []).append(s)
|
| 465 |
+
|
| 466 |
+
for b in sorted(buckets.keys()):
|
| 467 |
+
items = buckets[b]
|
| 468 |
+
q_vals = [i["q"] for i in items if "q" in i]
|
| 469 |
+
q_raw_vals = [i["q_raw"] for i in items if "q_raw" in i]
|
| 470 |
+
print(f"\nSEGMENT: {b}. {_bucket_label(b)}")
|
| 471 |
+
print(f"Tokens in segment: {len(items)}")
|
| 472 |
+
stats_q = _summary_stats(q_vals)
|
| 473 |
+
stats_q_raw = _summary_stats(q_raw_vals)
|
| 474 |
+
if stats_q:
|
| 475 |
+
print(" q:")
|
| 476 |
+
print(f" Mean: {stats_q['mean']:.4f} | Min: {stats_q['min']:.4f} | Max: {stats_q['max']:.4f}")
|
| 477 |
+
print(f" Q: p10={stats_q['p10']:.2f} p50={stats_q['p50']:.2f} p90={stats_q['p90']:.2f} p99={stats_q['p99']:.2f}")
|
| 478 |
+
if stats_q_raw:
|
| 479 |
+
print(" q_raw:")
|
| 480 |
+
print(f" Mean: {stats_q_raw['mean']:.4f} | Min: {stats_q_raw['min']:.4f} | Max: {stats_q_raw['max']:.4f}")
|
| 481 |
+
print(f" Q: p10={stats_q_raw['p10']:.2f} p50={stats_q_raw['p50']:.2f} p90={stats_q_raw['p90']:.2f} p99={stats_q_raw['p99']:.2f}")
|
| 482 |
+
|
| 483 |
+
|
| 484 |
+
def print_diagnostics(debug: dict) -> None:
|
| 485 |
+
if not debug:
|
| 486 |
+
return
|
| 487 |
+
q_raw_vals = debug.get("q_raw", [])
|
| 488 |
+
if not q_raw_vals:
|
| 489 |
+
return
|
| 490 |
+
print("\n=== QUALITY SCORE DIAGNOSTICS ===")
|
| 491 |
+
|
| 492 |
+
feature_pairs = debug.get("feature_pairs", {})
|
| 493 |
+
if feature_pairs:
|
| 494 |
+
print("Correlation with q_raw (signed features):")
|
| 495 |
+
for fname in sorted(feature_pairs.keys()):
|
| 496 |
+
pairs = feature_pairs[fname]
|
| 497 |
+
xs = [p[0] for p in pairs]
|
| 498 |
+
ys = [p[1] for p in pairs]
|
| 499 |
+
corr = _pearson_corr(xs, ys)
|
| 500 |
+
print(f" {fname}: {corr:.4f} (n={len(pairs)})")
|
| 501 |
+
|
| 502 |
+
raw_pairs = debug.get("raw_pairs", {})
|
| 503 |
+
if raw_pairs:
|
| 504 |
+
q_sorted = sorted(q_raw_vals)
|
| 505 |
+
p10 = _percentile(q_sorted, 0.10)
|
| 506 |
+
p90 = _percentile(q_sorted, 0.90)
|
| 507 |
+
print("\nTop/bottom decile raw means (by q_raw):")
|
| 508 |
+
for metric in sorted(raw_pairs.keys()):
|
| 509 |
+
pairs = raw_pairs[metric]
|
| 510 |
+
lows = [v for q, v in pairs if q <= p10]
|
| 511 |
+
highs = [v for q, v in pairs if q >= p90]
|
| 512 |
+
if not lows or not highs:
|
| 513 |
+
continue
|
| 514 |
+
low_mean = sum(lows) / len(lows)
|
| 515 |
+
high_mean = sum(highs) / len(highs)
|
| 516 |
+
print(f" {metric}: bottom_mean={low_mean:.4f} top_mean={high_mean:.4f} (n_low={len(lows)}, n_high={len(highs)})")
|
| 517 |
+
|
| 518 |
+
|
| 519 |
+
def main():
|
| 520 |
+
parser = argparse.ArgumentParser(description="Compute token quality/health score.")
|
| 521 |
+
parser.add_argument("--max-ret", type=float, default=10000.0, help="Max return to include")
|
| 522 |
+
parser.add_argument("--no-rerank", action="store_true", help="Disable final rerank within bucket")
|
| 523 |
+
parser.add_argument("--no-summary", action="store_true", help="Disable summary logging")
|
| 524 |
+
parser.add_argument("--no-diagnostics", action="store_true", help="Disable diagnostics logging")
|
| 525 |
+
args = parser.parse_args()
|
| 526 |
+
|
| 527 |
+
client = get_client()
|
| 528 |
+
if args.no_diagnostics:
|
| 529 |
+
scores = compute_quality_scores(client, max_ret=args.max_ret, rerank=not args.no_rerank)
|
| 530 |
+
debug = None
|
| 531 |
+
else:
|
| 532 |
+
scores, debug = _compute_quality_scores(
|
| 533 |
+
client,
|
| 534 |
+
max_ret=args.max_ret,
|
| 535 |
+
rerank=not args.no_rerank,
|
| 536 |
+
with_debug=True,
|
| 537 |
+
)
|
| 538 |
+
if not args.no_summary:
|
| 539 |
+
print_summary(scores)
|
| 540 |
+
if not args.no_diagnostics:
|
| 541 |
+
print_diagnostics(debug)
|
| 542 |
+
|
| 543 |
+
|
| 544 |
+
if __name__ == "__main__":
|
| 545 |
+
main()
|