Commit 7064310 (verified) by zirobtc · Parent: 2c39730

Upload folder using huggingface_hub
QUALITY_SCORE_ARCHITECTURE.md CHANGED
@@ -10,4 +10,128 @@ T_cutoff at trade 700 → returns are -90%
 T_cutoff at trade 900 → returns are -95%
 So even for class 5 tokens, 80%+ of the cached training samples have negative Ground Truth labels. The model is correctly learning that at any random moment, even a "good" token is most likely going down. The class balancing doesn't change the fact that the actual Y labels are overwhelmingly negative across all classes.
 
-The model isn't broken — it learned exactly what the data showed it. The issue is that the training setup doesn't teach it to recognize the pre-pump moment specifically.
+The model isn't broken — it learned exactly what the data showed it. The issue is that the training setup doesn't teach it to recognize the pre-pump moment specifically.
+
+
+**Main Issue**
+Your main problem was never just “bad checkpoint choice.” The core issue is training/data misalignment:
+
+- token `class_id` is token-level
+- the prediction target is context-level from a random `T_cutoff`
+- even a good token produces many bad windows
+- so balanced token classes do not mean balanced future-return labels
+- the model then learns an over-negative prior
+
+A second major issue was cache construction:
+- the cache was wasting disk and time on overwhelming numbers of garbage-token samples
+- later training weights cannot fix that upstream waste
+
+**What We Figured Out**
+- The model is not useless.
+- Wallet signal is real: ablations showed wallet removal hurts predictions materially.
+- OHLC matters, but mostly as a coarse summary, not real chart-pattern intelligence.
+- No obvious future leakage was found in OHLC construction.
+- Social looked basically unused.
+- Graph looked weaker than expected.
+- The movement head idea is valid, but only if labels are placed correctly in the pipeline.
+- Movement labels should come from the data loader, not be derived later in the collator or training loop.
+- Cache balancing should not depend on fragile movement thresholds.
+- A single “movement class” for cache weighting was wrong because:
+  - thresholds were unresolved
+  - movement differs across horizons inside the same sample
+
+**Where You Corrected the Direction**
+You pushed back on several important bad assumptions:
+
+- `return > 0` is too noisy as a label
+- movement class names should be threshold-agnostic
+- threshold-based movement balancing was premature
+- SQL/global-distribution threshold inference was conceptually wrong because labels depend on the sampled `T_cutoff`
+- the cache should not be filtered by class map in a destructive way
+- cache balancing must happen at cache generation time, not be delegated to training weights
+- positive balancing should not be forced on garbage classes
+- exact class sample counts matter more than approximate expected weighting
+- `T_cutoff` does not need to be deterministic or pre-fixed
+- if cache balancing uses movement-like signals, use threshold-free binary polarity first
+
+Those corrections materially improved the design.
+
+**Proposed Methods Over the Chat**
+These were the main methods proposed, in order of evolution:
+
+1. Forward-time validation and token-grouped splits
+   - to reduce misleading val results and leakage risk
+
+2. Auxiliary head ideas
+   - first fixed pump heads
+   - then an all-horizon direction head
+   - then a movement-type multiclass head
+   - final stable view: one multi-horizon movement head is reasonable, but labels must be created correctly
+
+3. Runtime/loader-side label derivation
+   - final agreed direction:
+     - labels belong in the data loader
+     - the collator should only stack them
+     - the model should just consume them
+
+4. Cache-time balancing instead of train-time rescue
+   - because disk/time waste happens before training starts
+   - so train weights alone are too late
+
+5. Class-id-based cache expansion
+   - proposed because class `0` dominates raw token counts
+   - later refined because exact quotas matter more than soft weighting
+
+6. Movement-class-based cache balancing
+   - proposed, then correctly rejected
+   - because it depended on unresolved thresholds and collapsed multi-horizon information incorrectly
+
+7. Binary polarity cache balancing
+   - final better version:
+     - use `positive` if the max valid horizon return in a sample is `> 0`
+     - else `negative`
+   - this is threshold-free and less brittle
+
+8. Exact class quotas + class-conditional polarity quotas
+   - final strongest cache proposal:
+     - exact equal sample budget per class
+     - positive/negative balancing only for classes that can realistically produce positive contexts
+     - keep class `0` mostly negative
+     - keep `T_cutoff` random
+
+**Current Best Design**
+The strongest design we converged toward is:
+
+- keep `T_cutoff` random
+- make cache generation deterministic at the planning level
+- assign exact sample budgets per token class
+- for higher classes, request positive/negative context ratios
+- for low garbage classes, do not force positive quotas
+- build labels in the data loader
+- keep the model’s main task as future return prediction
+- use the movement head only as auxiliary supervision
+
+So:
+- token-type balance is controlled explicitly
+- context-type bias is controlled explicitly
+- disk usage is controlled at cache creation time
+- training does not need to rescue a broken corpus
+
+**What We Learned About the Model**
+- the backbone has useful signal
+- wallet features matter
+- OHLC is being used, but not in the rich pattern-detection way you hoped
+- the current main failure mode is much more about data construction and sampling than about the backbone learning nothing
+
+**High-Level Conclusions**
+1. The main issue is data/sample construction, not just optimizer/checkpoint behavior.
+2. Cache balancing is the right place to solve the waste and prior-bias problem.
+3. Threshold-dependent movement balancing was too early and too brittle.
+4. Binary short-horizon max-return polarity is a better first cache-balancing signal.
+5. Exact per-class sample quotas plus class-conditional polarity balancing is the most coherent cache design we arrived at.
+
+If you want, I can next turn this into a short final blueprint with only:
+- `Main issue`
+- `Accepted design`
+- `Rejected ideas`
+- `Next implementation order`
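The binary-polarity rule (item 7) and the exact per-class budgets (item 8) are simple to state in code. A minimal sketch, assuming per-sample lists of horizon returns plus a validity mask; `context_polarity` and `plan_class_quotas` are hypothetical helper names, not part of this commit:

```python
def context_polarity(horizon_returns, horizon_mask):
    """Threshold-free polarity: positive iff any valid horizon return is > 0."""
    valid = [r for r, m in zip(horizon_returns, horizon_mask) if m > 0]
    if not valid:
        return "negative"  # fully masked contexts default to negative
    return "positive" if max(valid) > 0.0 else "negative"


def plan_class_quotas(token_counts_by_class, target_total):
    """Exact equal sample budget per class; remainder goes to the lowest class ids."""
    class_ids = sorted(token_counts_by_class)
    base = target_total // len(class_ids)
    remainder = target_total % len(class_ids)
    return {cid: base + (1 if i < remainder else 0)
            for i, cid in enumerate(class_ids)}
```

Because the quotas are exact counts rather than sampling weights, the cache's class mix is known before any worker runs.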
data/context_targets.py ADDED
@@ -0,0 +1,93 @@
+from __future__ import annotations
+
+from typing import Dict, List, Sequence
+
+
+MOVEMENT_STRONG_DOWN_THRESHOLD = -0.40
+MOVEMENT_DOWN_THRESHOLD = -0.30
+MOVEMENT_PUMP_50_THRESHOLD = 0.50
+MOVEMENT_PUMP_100_THRESHOLD = 1.00
+MOVEMENT_PUMP_300_THRESHOLD = 3.00
+
+MOVEMENT_CLASS_NAMES = [
+    "strong_down",
+    "down",
+    "flat",
+    "up",
+    "strong_up",
+    "extreme_up",
+]
+MOVEMENT_CLASS_TO_ID = {name: idx for idx, name in enumerate(MOVEMENT_CLASS_NAMES)}
+MOVEMENT_ID_TO_CLASS = {idx: name for name, idx in MOVEMENT_CLASS_TO_ID.items()}
+
+DEFAULT_MOVEMENT_LABEL_CONFIG = {
+    "strong_down_threshold": MOVEMENT_STRONG_DOWN_THRESHOLD,
+    "down_threshold": MOVEMENT_DOWN_THRESHOLD,
+    "pump_50_threshold": MOVEMENT_PUMP_50_THRESHOLD,
+    "pump_100_threshold": MOVEMENT_PUMP_100_THRESHOLD,
+    "pump_300_threshold": MOVEMENT_PUMP_300_THRESHOLD,
+}
+
+
+def classify_movement_return(
+    return_value: float,
+    movement_label_config: Dict[str, float] | None = None,
+) -> int:
+    cfg = dict(DEFAULT_MOVEMENT_LABEL_CONFIG)
+    if movement_label_config:
+        cfg.update({k: float(v) for k, v in movement_label_config.items() if k in cfg})
+
+    strong_down_threshold = min(cfg["strong_down_threshold"], cfg["down_threshold"])
+    down_threshold = cfg["down_threshold"]
+    pump_50_threshold = cfg["pump_50_threshold"]
+    pump_100_threshold = cfg["pump_100_threshold"]
+    pump_300_threshold = cfg["pump_300_threshold"]
+
+    if return_value <= strong_down_threshold:
+        return MOVEMENT_CLASS_TO_ID["strong_down"]
+    if return_value < down_threshold:
+        return MOVEMENT_CLASS_TO_ID["down"]
+    if return_value < pump_50_threshold:
+        return MOVEMENT_CLASS_TO_ID["flat"]
+    if return_value < pump_100_threshold:
+        return MOVEMENT_CLASS_TO_ID["up"]
+    if return_value < pump_300_threshold:
+        return MOVEMENT_CLASS_TO_ID["strong_up"]
+    return MOVEMENT_CLASS_TO_ID["extreme_up"]
+
+
+def derive_movement_targets(
+    horizon_returns: Sequence[float],
+    horizon_mask: Sequence[float],
+    movement_label_config: Dict[str, float] | None = None,
+) -> Dict[str, List]:
+    class_targets: List[int] = []
+    class_mask: List[int] = []
+    class_names: List[str] = []
+
+    usable = min(len(horizon_returns), len(horizon_mask))
+    for idx in range(usable):
+        if float(horizon_mask[idx]) <= 0:
+            class_targets.append(MOVEMENT_CLASS_TO_ID["flat"])
+            class_mask.append(0)
+            class_names.append("masked")
+            continue
+
+        class_id = classify_movement_return(
+            float(horizon_returns[idx]),
+            movement_label_config=movement_label_config,
+        )
+        class_targets.append(class_id)
+        class_mask.append(1)
+        class_names.append(MOVEMENT_ID_TO_CLASS[class_id])
+
+    return {
+        "movement_class_targets": class_targets,
+        "movement_class_mask": class_mask,
+        "movement_class_names": class_names,
+    }
+
+
+def compute_movement_label_config(valid_returns: Sequence[float]) -> Dict[str, float]:
+    del valid_returns
+    return dict(DEFAULT_MOVEMENT_LABEL_CONFIG)
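For quick orientation, here is a self-contained sketch of how the default thresholds above bucket a single return value; `movement_class` is a hypothetical name that mirrors `classify_movement_return`, with the boundaries reproduced from `DEFAULT_MOVEMENT_LABEL_CONFIG`:

```python
# Default thresholds reproduced from data/context_targets.py
THRESHOLDS = [
    (-0.40, "strong_down"),  # return <= -40%
    (-0.30, "down"),         # return < -30%
    (0.50, "flat"),          # return < +50%
    (1.00, "up"),            # return < +100%
    (3.00, "strong_up"),     # return < +300%
]

def movement_class(return_value: float) -> str:
    """Bucket a horizon return into one of the six movement classes."""
    if return_value <= THRESHOLDS[0][0]:
        return "strong_down"
    for bound, name in THRESHOLDS[1:]:
        if return_value < bound:
            return name
    return "extreme_up"  # anything at or above +300%
```

Note the lower boundary is inclusive (`<=`) while all others are exclusive (`<`), matching the original function.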
data/data_collator.py CHANGED
@@ -719,6 +719,8 @@ class MemecoinCollator:
             # Labels
             'labels': torch.stack([item['labels'] for item in batch]) if batch and 'labels' in batch[0] else None,
             'labels_mask': torch.stack([item['labels_mask'] for item in batch]) if batch and 'labels_mask' in batch[0] else None,
+            'movement_class_targets': torch.stack([item['movement_class_targets'] for item in batch]) if batch and 'movement_class_targets' in batch[0] else None,
+            'movement_class_mask': torch.stack([item['movement_class_mask'] for item in batch]) if batch and 'movement_class_mask' in batch[0] else None,
             'quality_score': torch.stack([item['quality_score'] if isinstance(item['quality_score'], torch.Tensor) else torch.tensor(item['quality_score'], dtype=torch.float32) for item in batch]) if batch and 'quality_score' in batch[0] else None,
             'class_id': torch.tensor([item.get('class_id', 0) for item in batch], dtype=torch.long),
             # Debug info
data/data_loader.py CHANGED
@@ -17,6 +17,7 @@ import json
 import models.vocabulary as vocab
 from models.multi_modal_processor import MultiModalEncoder
 from data.data_fetcher import DataFetcher  # NEW: Import the DataFetcher
+from data.context_targets import derive_movement_targets
 from requests.adapters import HTTPAdapter
 from urllib3.util.retry import Retry
 
@@ -128,7 +129,8 @@ class OracleDataset(Dataset):
                  start_date: Optional[datetime.datetime] = None,
                  min_trade_usd: float = 0.0,
                  max_seq_len: int = 8192,
-                 p99_clamps: Optional[Dict[str, float]] = None):
+                 p99_clamps: Optional[Dict[str, float]] = None,
+                 movement_label_config: Optional[Dict[str, float]] = None):
 
         self.max_seq_len = max_seq_len
 
@@ -315,6 +317,7 @@ class OracleDataset(Dataset):
 
         self.min_trade_usd = min_trade_usd
         self._uri_fail_counts: Dict[str, int] = {}
+        self.movement_label_config = movement_label_config
 
     def _init_http_session(self) -> None:
         # Configure robust HTTP session
@@ -1199,6 +1202,23 @@ class OracleDataset(Dataset):
         # This is fully deterministic - no runtime sampling or processing
         _timings['total'] = _time.perf_counter() - _total_start
 
+        if 'movement_class_targets' not in cached_data and 'labels' in cached_data and 'labels_mask' in cached_data:
+            labels = cached_data['labels']
+            labels_mask = cached_data['labels_mask']
+            movement_targets = derive_movement_targets(
+                labels.tolist() if isinstance(labels, torch.Tensor) else labels,
+                labels_mask.tolist() if isinstance(labels_mask, torch.Tensor) else labels_mask,
+                movement_label_config=self.movement_label_config,
+            )
+            cached_data['movement_class_targets'] = torch.tensor(
+                movement_targets['movement_class_targets'],
+                dtype=torch.long,
+            )
+            cached_data['movement_class_mask'] = torch.tensor(
+                movement_targets['movement_class_mask'],
+                dtype=torch.long,
+            )
+
         if idx % 100 == 0:
             print(f"[Sample {idx}] CONTEXT mode | cache_load: {_timings['cache_load']*1000:.1f}ms | "
                   f"total: {_timings['total']*1000:.1f}ms | events: {len(cached_data.get('event_sequence', []))}")
@@ -2449,6 +2469,11 @@ class OracleDataset(Dataset):
 
         if not all_trades:
             # No valid trades for label computation
+            movement_targets = derive_movement_targets(
+                [0.0] * len(self.horizons_seconds),
+                [0.0] * len(self.horizons_seconds),
+                movement_label_config=self.movement_label_config,
+            )
             return {
                 'event_sequence': event_sequence,
                 'wallets': wallet_data,
@@ -2457,7 +2482,9 @@
                 'embedding_pooler': pooler,
                 'labels': torch.zeros(len(self.horizons_seconds), dtype=torch.float32),
                 'labels_mask': torch.zeros(len(self.horizons_seconds), dtype=torch.float32),
-                'quality_score': torch.tensor(quality_score if quality_score is not None else 0.0, dtype=torch.float32)
+                'quality_score': torch.tensor(quality_score if quality_score is not None else 0.0, dtype=torch.float32),
+                'movement_class_targets': torch.tensor(movement_targets['movement_class_targets'], dtype=torch.long),
+                'movement_class_mask': torch.tensor(movement_targets['movement_class_mask'], dtype=torch.long),
             }
 
         # Ensure sorted
@@ -2537,6 +2564,12 @@
 
         # DEBUG: Mask summaries removed after validation
 
+        movement_targets = derive_movement_targets(
+            label_values,
+            mask_values,
+            movement_label_config=self.movement_label_config,
+        )
+
        return {
            'sample_idx': sample_idx if sample_idx is not None else -1,  # Debug trace
            'token_address': token_address,  # For debugging
@@ -2548,7 +2581,9 @@
            'embedding_pooler': pooler,
            'labels': torch.tensor(label_values, dtype=torch.float32),
            'labels_mask': torch.tensor(mask_values, dtype=torch.float32),
-           'quality_score': torch.tensor(quality_score if quality_score is not None else 0.0, dtype=torch.float32)
+           'quality_score': torch.tensor(quality_score if quality_score is not None else 0.0, dtype=torch.float32),
+           'movement_class_targets': torch.tensor(movement_targets['movement_class_targets'], dtype=torch.long),
+           'movement_class_mask': torch.tensor(movement_targets['movement_class_mask'], dtype=torch.long),
        }
 
    def _embed_context(self, context: Dict[str, Any], encoder: Any) -> None:
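The division of labor this diff enforces (the loader derives movement labels once, the collator only stacks them) can be sketched with plain lists; `make_sample` and `collate` are hypothetical stand-ins for the dataset's sample construction and `MemecoinCollator`, and the `classify` argument stands in for `classify_movement_return`:

```python
FLAT = 2  # placeholder class id used for masked horizons (mirrors "flat")

def make_sample(horizon_returns, horizon_mask, classify):
    """Loader-side: derive per-horizon movement targets once, at load time."""
    targets, mask = [], []
    for r, m in zip(horizon_returns, horizon_mask):
        if m <= 0:
            targets.append(FLAT)  # placeholder id, excluded via the mask
            mask.append(0)
        else:
            targets.append(classify(r))
            mask.append(1)
    return {"movement_class_targets": targets, "movement_class_mask": mask}

def collate(batch):
    """Collator-side: only stack what the loader produced, never re-derive."""
    return {
        "movement_class_targets": [s["movement_class_targets"] for s in batch],
        "movement_class_mask": [s["movement_class_mask"] for s in batch],
    }
```

Keeping derivation out of the collator means cached samples, freshly built samples, and any future label scheme all flow through the same stacking code unchanged.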
log.log CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:11ac013f8e91ad65475b8106a5a072dc42f67e0773ddc4a50825e316c578e0d4
-size 3472
+oid sha256:935233e4d7669b2a25173d7ae164317e85f1a5e8b0fc1d8d1832ab0893fca471
+size 19258
models/model.py CHANGED
@@ -17,6 +17,7 @@ from models.ohlc_embedder import OHLCEmbedder
 from models.HoldersEncoder import HolderDistributionEncoder  # NEW
 from models.SocialEncoders import SocialEncoder  # NEW
 import models.vocabulary as vocab  # For vocab sizes
+from data.context_targets import MOVEMENT_CLASS_NAMES
 
 class Oracle(nn.Module):
     """
@@ -51,6 +52,7 @@ class Oracle(nn.Module):
         self.quantiles = quantiles
         self.horizons_seconds = horizons_seconds
         self.num_outputs = len(quantiles) * len(horizons_seconds)
+        self.num_movement_classes = len(MOVEMENT_CLASS_NAMES)
         self.dtype = dtype
 
         # --- 2. Backbone: Llama-style decoder, RANDOM INIT (no pretrained weights) ---
@@ -103,6 +105,11 @@ class Oracle(nn.Module):
             nn.GELU(),
             nn.Linear(self.d_model, 1)
         )
+        self.movement_head = nn.Sequential(
+            nn.Linear(self.d_model, self.d_model),
+            nn.GELU(),
+            nn.Linear(self.d_model, len(self.horizons_seconds) * self.num_movement_classes)
+        )
 
         self.event_type_to_id = event_type_to_id
 
@@ -1008,9 +1015,11 @@ class Oracle(nn.Module):
         empty_mask = torch.empty(0, L, device=device, dtype=torch.long)
         empty_quantiles = torch.empty(0, self.num_outputs, device=device, dtype=self.dtype)
         empty_quality = torch.empty(0, device=device, dtype=self.dtype)
+        empty_movement = torch.empty(0, len(self.horizons_seconds), self.num_movement_classes, device=device, dtype=self.dtype)
         return {
             'quantile_logits': empty_quantiles,
             'quality_logits': empty_quality,
+            'movement_logits': empty_movement,
             'pooled_states': torch.empty(0, self.d_model, device=device, dtype=self.dtype),
             'hidden_states': empty_hidden,
             'attention_mask': empty_mask
@@ -1131,10 +1140,16 @@ class Oracle(nn.Module):
         pooled_states = self._pool_hidden_states(sequence_hidden, hf_attention_mask)
         quantile_logits = self.quantile_head(pooled_states)
         quality_logits = self.quality_head(pooled_states).squeeze(-1)
+        movement_logits = self.movement_head(pooled_states).view(
+            pooled_states.shape[0],
+            len(self.horizons_seconds),
+            self.num_movement_classes,
+        )
 
         return {
             'quantile_logits': quantile_logits,
             'quality_logits': quality_logits,
+            'movement_logits': movement_logits,
             'pooled_states': pooled_states,
             'hidden_states': sequence_hidden,
             'attention_mask': hf_attention_mask
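The commit adds `movement_logits` of shape `[batch, horizons, classes]` but does not include the auxiliary loss that would consume them together with `movement_class_targets` and `movement_class_mask`. A plausible sketch of that masked cross-entropy, written in plain Python for clarity (real training code would more likely use `torch.nn.functional.cross_entropy` with the mask applied); `masked_movement_loss` is a hypothetical name, not part of this commit:

```python
import math

def masked_movement_loss(logits, targets, mask):
    """Mean cross-entropy over the (batch, horizon) cells where mask == 1.

    logits:  nested lists [B][H][C] of raw scores
    targets: [B][H] class ids, mask: [B][H] with 0/1 entries
    """
    total, count = 0.0, 0
    for b in range(len(logits)):
        for h in range(len(logits[b])):
            if mask[b][h] <= 0:
                continue  # masked horizons contribute nothing to the loss
            row = logits[b][h]
            m = max(row)  # stabilized log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[targets[b][h]]
            count += 1
    return total / count if count else 0.0
```

Averaging only over unmasked cells keeps short-lived tokens (with many masked horizons) from diluting the gradient.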
pre_cache.sh CHANGED
@@ -39,7 +39,7 @@ python3 scripts/cache_dataset.py \
     --num_workers "$NUM_WORKERS" \
     --horizons_seconds "${HORIZONS_SECONDS[@]}" \
     --quantiles "${QUANTILES[@]}" \
-    --max_samples 300000 \
+    --max_samples 10 \
     "$@"
 
 echo "Done!"
  echo "Done!"
scripts/cache_dataset.py CHANGED
@@ -27,13 +27,154 @@ from scripts.compute_quality_score import get_token_quality_scores, fetch_token_
27
 
28
  from clickhouse_driver import Client as ClickHouseClient
29
  from neo4j import GraphDatabase
30
-
31
  _worker_dataset = None
32
  _worker_return_class_map = None
33
  _worker_quality_scores_map = None
34
  _worker_encoder = None
35
 
36
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
  def _init_worker(db_config, dataset_config, return_class_map, quality_scores_map):
38
  global _worker_dataset, _worker_return_class_map, _worker_quality_scores_map
39
  from data.data_loader import OracleDataset
@@ -73,7 +214,7 @@ def _init_worker(db_config, dataset_config, return_class_map, quality_scores_map
73
 
74
 
75
  def _process_single_token_context(args):
76
- idx, mint_addr, samples_per_token, output_dir = args
77
  global _worker_dataset, _worker_return_class_map, _worker_quality_scores_map
78
  try:
79
  class_id = _worker_return_class_map.get(mint_addr)
@@ -87,7 +228,17 @@ def _process_single_token_context(args):
87
  if encoder is None:
88
  print(f"ERROR: Worker encoder is None for mint {mint_addr}!", flush=True)
89
 
90
- contexts = _worker_dataset.__cacheitem_context__(idx, num_samples_per_token=samples_per_token, encoder=encoder)
 
 
 
 
 
 
 
 
 
 
91
  if not contexts:
92
  return {'status': 'skipped', 'reason': 'no valid contexts', 'mint': mint_addr}
93
  q_score = _worker_quality_scores_map.get(mint_addr)
@@ -102,7 +253,16 @@ def _process_single_token_context(args):
102
 
103
  torch.save(ctx, output_path)
104
  saved_files.append(filename)
105
- return {'status': 'success', 'mint': mint_addr, 'class_id': class_id, 'q_score': q_score, 'n_contexts': len(contexts), 'n_events': len(contexts[0].get('event_sequence', [])) if contexts else 0, 'files': saved_files}
 
 
 
 
 
 
 
 
 
106
  except Exception as e:
107
  import traceback
108
  return {'status': 'error', 'mint': mint_addr, 'error': str(e), 'traceback': traceback.format_exc()}
@@ -132,6 +292,10 @@ def main():
132
  parser.add_argument("--context_length", type=int, default=8192)
133
  parser.add_argument("--min_trades", type=int, default=10)
134
  parser.add_argument("--samples_per_token", type=int, default=1)
 
 
 
 
135
  parser.add_argument("--horizons_seconds", type=int, nargs="+", default=[30, 60, 120, 240, 420])
136
  parser.add_argument("--quantiles", type=float, nargs="+", default=[0.1, 0.5, 0.9])
137
  parser.add_argument("--num_workers", type=int, default=1)
@@ -223,7 +387,6 @@ def main():
223
 
224
  print(f"INFO: Eligible tokens per class: {dict(sorted(eligible_class_counts.items()))}")
225
 
226
- # Compute balanced samples_per_token for each class
227
  num_classes = len(eligible_class_counts)
228
  if args.max_samples:
229
  target_total = args.max_samples
@@ -231,44 +394,46 @@ def main():
231
  target_total = 15000 # Default target: 15k balanced files
232
  target_per_class = target_total // max(num_classes, 1)
233
 
234
- class_multipliers = {}
235
- class_token_caps = {}
236
- for cid, count in eligible_class_counts.items():
237
- if count >= target_per_class:
238
- # Enough tokens — 1 sample each, cap token count
239
- class_multipliers[cid] = 1
240
- class_token_caps[cid] = target_per_class
241
- else:
242
- # Not enough tokens — multi-sample, use all tokens
243
- class_multipliers[cid] = min(10, max(1, math.ceil(target_per_class / max(count, 1))))
244
- class_token_caps[cid] = count
245
 
246
  print(f"INFO: Target total: {target_total}, Target per class: {target_per_class}")
247
- print(f"INFO: Class multipliers: {dict(sorted(class_multipliers.items()))}")
248
- print(f"INFO: Class token caps: {dict(sorted(class_token_caps.items()))}")
249
 
250
  # Build balanced task list
251
  tasks = []
252
- for cid, mint_list in mints_by_class.items():
253
- random.shuffle(mint_list)
254
- cap = class_token_caps.get(cid, len(mint_list))
255
- spt = class_multipliers.get(cid, 1)
256
- # Override with CLI --samples_per_token if explicitly set > 1
257
- if args.samples_per_token > 1:
258
- spt = args.samples_per_token
259
- for i, m in mint_list[:cap]:
260
- mint_addr = m['mint_address']
261
- tasks.append((i, mint_addr, spt, str(output_dir)))
 
 
 
 
 
 
 
 
 
262
 
263
  random.shuffle(tasks) # Shuffle tasks for even load distribution across workers
264
- expected_files = sum(
265
- class_multipliers.get(cid, 1) * min(class_token_caps.get(cid, len(ml)), len(ml))
266
- for cid, ml in mints_by_class.items()
267
- )
268
  print(f"INFO: Total tasks: {len(tasks)} (expected ~{expected_files} output files, target ~{target_total})")
269
 
270
  success_count, skipped_count, error_count = 0, 0, 0
271
  class_distribution = {}
 
272
 
273
  # --- Resume support: skip tokens that already have cached files ---
274
  existing_files = set(f.name for f in output_dir.glob("sample_*.pt"))
@@ -334,6 +499,8 @@ def main():
334
  if result['status'] == 'success':
335
  success_count += 1
336
  class_distribution[result['class_id']] = class_distribution.get(result['class_id'], 0) + 1
 
 
337
  elif result['status'] == 'skipped':
338
  skipped_count += 1
339
  else:
@@ -360,6 +527,8 @@ def main():
360
  if result['status'] == 'success':
361
  success_count += 1
362
  class_distribution[result['class_id']] = class_distribution.get(result['class_id'], 0) + 1
 
 
363
  elif result['status'] == 'skipped':
364
  skipped_count += 1
365
  else:
@@ -398,10 +567,14 @@ def main():
398
  'num_workers': args.num_workers,
399
  'horizons_seconds': args.horizons_seconds,
400
  'quantiles': args.quantiles,
401
- 'class_multipliers': {str(k): v for k, v in class_multipliers.items()},
402
- 'class_token_caps': {str(k): v for k, v in class_token_caps.items()},
403
  'target_total': target_total,
404
  'target_per_class': target_per_class,
 
 
 
 
 
 
405
  }, f, indent=2)
406
 
407
  print(f"\n--- Done ---\nSuccess: {success_count}, Skipped: {skipped_count}, Errors: {error_count}\nFiles: {len(file_class_map)}\nLocation: {output_dir.resolve()}")
 
  from clickhouse_driver import Client as ClickHouseClient
  from neo4j import GraphDatabase

  _worker_dataset = None
  _worker_return_class_map = None
  _worker_quality_scores_map = None
  _worker_encoder = None


+ def _to_int_list(values):
+     if values is None:
+         return []
+     if isinstance(values, torch.Tensor):
+         return [int(v) for v in values.tolist()]
+     return [int(v) for v in values]
+
+
+ def _to_float_list(values):
+     if values is None:
+         return []
+     if isinstance(values, torch.Tensor):
+         return [float(v) for v in values.tolist()]
+     return [float(v) for v in values]
+
+
+ def _representative_context_polarity(context):
+     labels = _to_float_list(context.get("labels"))
+     mask = _to_int_list(context.get("labels_mask"))
+     valid_returns = [label for label, keep in zip(labels, mask) if keep > 0]
+     if not valid_returns:
+         return "negative"
+     return "positive" if max(valid_returns) > 0.0 else "negative"
+
+
+ def _select_contexts_by_polarity(contexts, max_keep, desired_positive=None, desired_negative=None):
+     if len(contexts) <= max_keep:
+         polarity_counts = {}
+         for context in contexts:
+             polarity = _representative_context_polarity(context)
+             polarity_counts[polarity] = polarity_counts.get(polarity, 0) + 1
+             context["representative_context_polarity"] = polarity
+         return contexts, polarity_counts
+
+     positive_bucket = []
+     negative_bucket = []
+     for context in contexts:
+         polarity = _representative_context_polarity(context)
+         context["representative_context_polarity"] = polarity
+         if polarity == "positive":
+             positive_bucket.append(context)
+         else:
+             negative_bucket.append(context)
+
+     selected = []
+     polarity_counts = {"positive": 0, "negative": 0}
+     desired_positive = max(0, int(desired_positive)) if desired_positive is not None else None
+     desired_negative = max(0, int(desired_negative)) if desired_negative is not None else None
+
+     if desired_positive is not None or desired_negative is not None:
+         target_positive = min(desired_positive or 0, max_keep, len(positive_bucket))
+         target_negative = min(desired_negative or 0, max_keep - target_positive, len(negative_bucket))
+
+         while polarity_counts["positive"] < target_positive and positive_bucket:
+             selected.append(positive_bucket.pop())
+             polarity_counts["positive"] += 1
+         while polarity_counts["negative"] < target_negative and negative_bucket:
+             selected.append(negative_bucket.pop())
+             polarity_counts["negative"] += 1
+
+     prefer_positive = len(positive_bucket) >= len(negative_bucket)
+
+     while len(selected) < max_keep and (positive_bucket or negative_bucket):
+         if prefer_positive and positive_bucket:
+             selected.append(positive_bucket.pop())
+             polarity_counts["positive"] += 1
+         elif not prefer_positive and negative_bucket:
+             selected.append(negative_bucket.pop())
+             polarity_counts["negative"] += 1
+         elif positive_bucket:
+             selected.append(positive_bucket.pop())
+             polarity_counts["positive"] += 1
+         elif negative_bucket:
+             selected.append(negative_bucket.pop())
+             polarity_counts["negative"] += 1
+         prefer_positive = not prefer_positive
+
+     return selected[:max_keep], polarity_counts
+
+
+ def _allocate_class_targets(mints_by_class, target_total, positive_balance_min_class, positive_ratio):
+     from collections import defaultdict
+     import random
+
+     class_ids = sorted(mints_by_class.keys())
+     if not class_ids:
+         return {}, {}, {}
+
+     target_per_class = target_total // len(class_ids)
+     remainder = target_total % len(class_ids)
+
+     token_plans = {}
+     class_targets = {}
+     class_polarity_targets = {}
+
+     for pos, class_id in enumerate(class_ids):
+         class_target = target_per_class + (1 if pos < remainder else 0)
+         class_targets[class_id] = class_target
+
+         token_list = list(mints_by_class[class_id])
+         random.shuffle(token_list)
+         if not token_list or class_target <= 0:
+             class_polarity_targets[class_id] = {"positive": 0, "negative": 0}
+             continue
+
+         if class_id >= positive_balance_min_class:
+             positive_target = int(round(class_target * positive_ratio))
+             positive_target = min(max(positive_target, 0), class_target)
+         else:
+             positive_target = 0
+         negative_target = class_target - positive_target
+         class_polarity_targets[class_id] = {
+             "positive": positive_target,
+             "negative": negative_target,
+         }
+
+         assigned_positive = 0
+         assigned_negative = 0
+         token_count = len(token_list)
+         for sample_num in range(class_target):
+             token_idx, mint_record = token_list[sample_num % token_count]
+             mint_addr = mint_record["mint_address"]
+             plan_key = (token_idx, mint_addr)
+             if plan_key not in token_plans:
+                 token_plans[plan_key] = {
+                     "samples_to_keep": 0,
+                     "desired_positive": 0,
+                     "desired_negative": 0,
+                     "class_id": class_id,
+                 }
+             token_plans[plan_key]["samples_to_keep"] += 1
+
+             if assigned_positive < positive_target:
+                 token_plans[plan_key]["desired_positive"] += 1
+                 assigned_positive += 1
+             else:
+                 token_plans[plan_key]["desired_negative"] += 1
+                 assigned_negative += 1
+
+     return token_plans, class_targets, class_polarity_targets
+
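The positive/negative quota logic above can be illustrated with a small self-contained sketch. This is a simplified illustration, not the script's code: it applies the polarity quotas but omits the round-robin top-up that `_select_contexts_by_polarity` performs, and `context_polarity` / `select_balanced` are hypothetical names.

```python
def context_polarity(context):
    # a context counts as "positive" when any unmasked horizon return is > 0
    valid = [r for r, keep in zip(context["labels"], context["labels_mask"]) if keep]
    return "positive" if valid and max(valid) > 0.0 else "negative"

def select_balanced(contexts, max_keep, want_pos, want_neg):
    pos = [c for c in contexts if context_polarity(c) == "positive"]
    neg = [c for c in contexts if context_polarity(c) == "negative"]
    # fill the positive quota first, then the negative quota from what is left
    picked = pos[:min(want_pos, max_keep)]
    picked += neg[:min(want_neg, max_keep - len(picked))]
    return picked[:max_keep]

contexts = [
    {"labels": [0.4, -0.2], "labels_mask": [1, 1]},   # positive
    {"labels": [-0.6, -0.1], "labels_mask": [1, 1]},  # negative
    {"labels": [0.9, 0.1], "labels_mask": [0, 0]},    # all masked -> treated as negative
    {"labels": [-0.3, 0.2], "labels_mask": [1, 1]},   # positive
]
picked = select_balanced(contexts, max_keep=2, want_pos=1, want_neg=1)
print([context_polarity(c) for c in picked])  # -> ['positive', 'negative']
```

Note that a context whose labels are fully masked falls into the negative bucket, matching `_representative_context_polarity`'s fallback.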
  def _init_worker(db_config, dataset_config, return_class_map, quality_scores_map):
      global _worker_dataset, _worker_return_class_map, _worker_quality_scores_map
      from data.data_loader import OracleDataset

  def _process_single_token_context(args):
+     idx, mint_addr, samples_per_token, output_dir, oversample_factor, desired_positive, desired_negative = args
      global _worker_dataset, _worker_return_class_map, _worker_quality_scores_map
      try:
          class_id = _worker_return_class_map.get(mint_addr)

          if encoder is None:
              print(f"ERROR: Worker encoder is None for mint {mint_addr}!", flush=True)

+         candidate_contexts = _worker_dataset.__cacheitem_context__(
+             idx,
+             num_samples_per_token=max(samples_per_token, samples_per_token * max(1, oversample_factor)),
+             encoder=encoder,
+         )
+         contexts, polarity_counts = _select_contexts_by_polarity(
+             candidate_contexts,
+             samples_per_token,
+             desired_positive=desired_positive,
+             desired_negative=desired_negative,
+         )
          if not contexts:
              return {'status': 'skipped', 'reason': 'no valid contexts', 'mint': mint_addr}
          q_score = _worker_quality_scores_map.get(mint_addr)

          torch.save(ctx, output_path)
          saved_files.append(filename)
+         return {
+             'status': 'success',
+             'mint': mint_addr,
+             'class_id': class_id,
+             'q_score': q_score,
+             'n_contexts': len(contexts),
+             'n_events': len(contexts[0].get('event_sequence', [])) if contexts else 0,
+             'files': saved_files,
+             'polarity_counts': polarity_counts,
+         }
      except Exception as e:
          import traceback
          return {'status': 'error', 'mint': mint_addr, 'error': str(e), 'traceback': traceback.format_exc()}
 
      parser.add_argument("--context_length", type=int, default=8192)
      parser.add_argument("--min_trades", type=int, default=10)
      parser.add_argument("--samples_per_token", type=int, default=1)
+     parser.add_argument("--context_oversample_factor", type=int, default=4)
+     parser.add_argument("--cache_balance_mode", type=str, default="hybrid", choices=["class", "uniform", "hybrid"])
+     parser.add_argument("--positive_balance_min_class", type=int, default=2)
+     parser.add_argument("--positive_context_ratio", type=float, default=0.5)
      parser.add_argument("--horizons_seconds", type=int, nargs="+", default=[30, 60, 120, 240, 420])
      parser.add_argument("--quantiles", type=float, nargs="+", default=[0.1, 0.5, 0.9])
      parser.add_argument("--num_workers", type=int, default=1)
 
  print(f"INFO: Eligible tokens per class: {dict(sorted(eligible_class_counts.items()))}")

  num_classes = len(eligible_class_counts)
  if args.max_samples:
      target_total = args.max_samples

      target_total = 15000  # Default target: 15k balanced files
  target_per_class = target_total // max(num_classes, 1)

+ token_plans, class_targets, class_polarity_targets = _allocate_class_targets(
+     mints_by_class=mints_by_class,
+     target_total=target_total,
+     positive_balance_min_class=args.positive_balance_min_class,
+     positive_ratio=args.positive_context_ratio,
+ )

  print(f"INFO: Target total: {target_total}, Target per class: {target_per_class}")
+ print(f"INFO: Exact class targets: {dict(sorted(class_targets.items()))}")
+ print(f"INFO: Class polarity targets: {dict(sorted(class_polarity_targets.items()))}")
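As a sanity check on the allocation arithmetic: `_allocate_class_targets` splits `target_total` evenly across classes and hands the remainder, one file each, to the lowest class IDs. A minimal sketch of just that split (the function name `split_targets` is illustrative, not part of the codebase):

```python
def split_targets(class_ids, target_total):
    # even split; the first `remainder` classes (by sorted ID) get one extra sample
    base, remainder = divmod(target_total, len(class_ids))
    return {cid: base + (1 if i < remainder else 0)
            for i, cid in enumerate(sorted(class_ids))}

print(split_targets([0, 1, 2, 3, 4, 5], 15000))  # divides evenly: 2500 per class
print(split_targets([0, 1, 2], 100))             # -> {0: 34, 1: 33, 2: 33}
```

The per-class totals always sum back to `target_total`, so the cache size stays exact regardless of how many classes survive the eligibility filter.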
  # Build balanced task list
  tasks = []
+ if args.cache_balance_mode == "uniform":
+     target_tokens = len(filtered_mints)
+     if args.max_samples:
+         target_tokens = min(len(filtered_mints), max(1, math.ceil(args.max_samples / max(args.samples_per_token, 1))))
+     mint_pool = list(enumerate(filtered_mints))
+     random.shuffle(mint_pool)
+     for i, m in mint_pool[:target_tokens]:
+         tasks.append((i, m['mint_address'], args.samples_per_token, str(output_dir), args.context_oversample_factor, 0, args.samples_per_token))
+ else:
+     for (token_idx, mint_addr), plan in token_plans.items():
+         tasks.append((
+             token_idx,
+             mint_addr,
+             plan["samples_to_keep"],
+             str(output_dir),
+             args.context_oversample_factor,
+             plan["desired_positive"],
+             plan["desired_negative"],
+         ))

  random.shuffle(tasks)  # Shuffle tasks for even load distribution across workers
+ expected_files = sum(task[2] for task in tasks)

  print(f"INFO: Total tasks: {len(tasks)} (expected ~{expected_files} output files, target ~{target_total})")

  success_count, skipped_count, error_count = 0, 0, 0
  class_distribution = {}
+ polarity_distribution = {}

  # --- Resume support: skip tokens that already have cached files ---
  existing_files = set(f.name for f in output_dir.glob("sample_*.pt"))
 
  if result['status'] == 'success':
      success_count += 1
      class_distribution[result['class_id']] = class_distribution.get(result['class_id'], 0) + 1
+     for polarity, count in result.get('polarity_counts', {}).items():
+         polarity_distribution[polarity] = polarity_distribution.get(polarity, 0) + count
  elif result['status'] == 'skipped':
      skipped_count += 1
  else:

  if result['status'] == 'success':
      success_count += 1
      class_distribution[result['class_id']] = class_distribution.get(result['class_id'], 0) + 1
+     for polarity, count in result.get('polarity_counts', {}).items():
+         polarity_distribution[polarity] = polarity_distribution.get(polarity, 0) + count
  elif result['status'] == 'skipped':
      skipped_count += 1
  else:
 
  'num_workers': args.num_workers,
  'horizons_seconds': args.horizons_seconds,
  'quantiles': args.quantiles,

  'target_total': target_total,
  'target_per_class': target_per_class,
+ 'cache_balance_mode': args.cache_balance_mode,
+ 'context_polarity_distribution': polarity_distribution,
+ 'class_targets': {str(k): v for k, v in class_targets.items()},
+ 'class_polarity_targets': {str(k): v for k, v in class_polarity_targets.items()},
+ 'positive_balance_min_class': args.positive_balance_min_class,
+ 'positive_context_ratio': args.positive_context_ratio,
  }, f, indent=2)

  print(f"\n--- Done ---\nSuccess: {success_count}, Skipped: {skipped_count}, Errors: {error_count}\nFiles: {len(file_class_map)}\nLocation: {output_dir.resolve()}")
scripts/evaluate_sample.py CHANGED
@@ -2,6 +2,8 @@ import os
  import sys
  import argparse
  import random

  import torch
  from pathlib import Path

@@ -14,6 +16,7 @@ from torch.utils.data import DataLoader, Subset

  from data.data_loader import OracleDataset
  from data.data_collator import MemecoinCollator

  from models.multi_modal_processor import MultiModalEncoder
  from models.helper_encoders import ContextualTimeEncoder
  from models.token_encoder import TokenEncoder
@@ -29,6 +32,25 @@ from neo4j import GraphDatabase
  from data.data_fetcher import DataFetcher
  from scripts.analyze_distribution import get_return_class_map

  def unlog_transform(tensor):
      """Invert the log1p transform applied during training."""
      # During training: labels = torch.sign(labels) * torch.log1p(torch.abs(labels))
@@ -42,10 +64,418 @@ def parse_args():
      parser.add_argument("--horizons_seconds", type=int, nargs="+", default=[300, 900, 1800, 3600, 7200])
      parser.add_argument("--quantiles", type=float, nargs="+", default=[0.1, 0.5, 0.9])
      parser.add_argument("--seed", type=int, default=None)
-     parser.add_argument("--min_class", type=int, default=5, help="Filter out tokens with return class beneath this ID (e.g., 1 for >= 3x returns)")
-     parser.add_argument("--cutoff_trade_idx", type=int, default=600, help="Force the T_cutoff at this exact trade index (e.g., 10 = right after the 10th trade)")
      return parser.parse_args()
 
  def get_latest_checkpoint(checkpoint_dir):
      ckpt_dir = Path(checkpoint_dir)
      if ckpt_dir.exists():
@@ -59,6 +489,10 @@ def get_latest_checkpoint(checkpoint_dir):
  def main():
      load_dotenv()
      args = parse_args()

      accelerator = Accelerator(mixed_precision=args.mixed_precision)
      device = accelerator.device
@@ -186,134 +620,186 @@ def main():

      model.eval()

-     # Find a valid sample
-     valid_context_found = False
-     max_retries = 20
      retries = 0
-     raw_sample = None
-     sample_mint_addr = None
-
-     while not valid_context_found and retries < max_retries:
-         if args.sample_idx is not None:
-             if isinstance(args.sample_idx, str) and not args.sample_idx.isdigit():
-                 found_idx = next((i for i, m in enumerate(dataset.sampled_mints) if m['mint_address'] == args.sample_idx), None)
-                 if found_idx is None:
-                     import datetime
-                     dataset.sampled_mints.append({'mint_address': args.sample_idx, 'creator_address': '', 'timestamp': datetime.datetime.now(datetime.timezone.utc)})
-                     sample_idx = len(dataset.sampled_mints) - 1
-                 else:
-                     sample_idx = found_idx
-             else:
-                 sample_idx = int(args.sample_idx)
-                 if sample_idx >= len(dataset):
-                     raise ValueError(f"Sample index {sample_idx} out of range")
-         else:
-             sample_idx = random.randint(0, len(dataset.sampled_mints) - 1)
-
          sample_mint_addr = dataset.sampled_mints[sample_idx]['mint_address']
          print(f"Trying Token Address: {sample_mint_addr}")
-
-         contexts = dataset.__cacheitem_context__(sample_idx, num_samples_per_token=1, encoder=multi_modal_encoder, forced_cutoff_trade_idx=args.cutoff_trade_idx)
-
-         if not contexts or len(contexts) == 0 or contexts[0] is None:
-             print("  [Failed to generate valid context pattern, skipping...]")
-             retries += 1
-             if args.sample_idx is not None:
-                 print("Specific sample requested but failed to generate context. Exiting.")
-                 return
-             continue
-
          raw_sample = contexts[0]
-         valid_context_found = True

-     if not valid_context_found:
-         print(f"Could not find a valid context after {max_retries} attempts.")
-         return
-
-     print(f"\nEvaluating precisely on Token Address: {sample_mint_addr}")

-     batch = collator([raw_sample])
-
-     # Move batch to device
-     for k, v in batch.items():
-         if isinstance(v, torch.Tensor):
-             batch[k] = v.to(device)
-         elif isinstance(v, list) and len(v) > 0 and isinstance(v[0], torch.Tensor):
-             batch[k] = [t.to(device) for t in v]

-     # Add missing keys needed by model safety checks
-     if 'textual_event_indices' not in batch:
-         B, L = batch['event_type_ids'].shape
-         batch['textual_event_indices'] = torch.zeros((B, L), dtype=torch.long, device=device)
-     if 'textual_event_data' not in batch:
-         batch['textual_event_data'] = []

-     print("\n--- Running Inference ---")
-     with torch.no_grad():
-         outputs = model(batch)
-
-     preds = outputs["quantile_logits"][0].cpu()  # shape [Horizons * Quantiles]
-     quality_preds = outputs["quality_logits"][0].cpu() if "quality_logits" in outputs else None
-
-     # Raw labels from dataset (these are NOT log-transformed yet)
-     gt_labels = batch["labels"][0].cpu()
-     gt_mask = batch["labels_mask"][0].cpu().bool()
-
-     # Quality target if available
-     gt_quality = batch["quality_score"][0].item() if "quality_score" in batch else None

-     # Un-log the predictions since model was trained on log-transformed returns
-     # But wait, did the user train with log transformed returns?
-     # Yes, train.py does: labels = torch.sign(labels) * torch.log1p(torch.abs(labels))
-     real_preds = unlog_transform(preds)

-     print("\n================== Results ==================")
-     print(f"Token Address: {batch.get('token_addresses', ['Unknown'])[0]}")
-     if gt_quality is not None:
-         print(f"Quality Score: GT = {gt_quality:.4f} | Pred = {quality_preds.item() if quality_preds is not None else 'N/A'}")
-
-     print("\nReturns per Horizon:")
-     num_quantiles = len(args.quantiles)
-     # The models outputs all defined horizons, but the dataset labels might be truncated
-     # if it was generated with fewer horizons.
-     num_gt_horizons = len(gt_mask)  # Shape is [H]
-
-     for h_idx, horizon in enumerate(args.horizons_seconds):
-         horizon_min = horizon // 60
-         print(f"\n--- Horizon: {horizon}s ({horizon_min}m) ---")
-
-         if h_idx >= num_gt_horizons:
-             print("  [No Ground Truth Available for this Horizon - Not in Dataset]")
-             valid = False
-         else:
-             # Mask format is [H]
-             valid = gt_mask[h_idx].item()
-
-         if not valid:
-             print("  [No Ground Truth Available for this Horizon - Masked]")
-             # We still print predictions even if GT is masked/missing
-             print("  Predictions:")
-             for q_idx, q in enumerate(args.quantiles):
-                 flat_idx = h_idx * num_quantiles + q_idx
-                 pred_ret = real_preds[flat_idx].item()
-                 log_pred = preds[flat_idx].item()
-                 print(f"    - p{int(q*100):02d}: {pred_ret * 100:>8.2f}% (raw log-val: {log_pred:7.4f})")
-             continue
-
-         # Ground truth (raw)
-         gt_ret = gt_labels[h_idx].item()
-         print(f"  Ground Truth: {gt_ret * 100:.2f}%")
-
-         # Predictions
-         print("  Predictions:")
-         for q_idx, q in enumerate(args.quantiles):
-             flat_idx = h_idx * num_quantiles + q_idx
-             pred_ret = real_preds[flat_idx].item()
-             log_pred = preds[flat_idx].item()
-
-             print(f"    - p{int(q*100):02d}: {pred_ret * 100:>8.2f}% (raw log-val: {log_pred:7.4f})")
-
-     print("=============================================\n")

  if __name__ == "__main__":
      main()
 
  import sys
  import argparse
  import random
+ import copy
+ import math
  import torch
  from pathlib import Path

  from data.data_loader import OracleDataset
  from data.data_collator import MemecoinCollator
+ from data.context_targets import MOVEMENT_ID_TO_CLASS
  from models.multi_modal_processor import MultiModalEncoder
  from models.helper_encoders import ContextualTimeEncoder
  from models.token_encoder import TokenEncoder

  from data.data_fetcher import DataFetcher
  from scripts.analyze_distribution import get_return_class_map

+ ABLATION_SWEEP_MODES = [
+     "wallet",
+     "graph",
+     "social",
+     "token",
+     "holder",
+     "ohlc",
+     "ohlc_wallet",
+     "trade",
+     "onchain",
+     "wallet_graph",
+ ]
+
+ OHLC_PROBE_MODES = [
+     "ohlc_reverse",
+     "ohlc_shuffle_chunks",
+     "ohlc_mask_recent",
+ ]
+
  def unlog_transform(tensor):
      """Invert the log1p transform applied during training."""
      # During training: labels = torch.sign(labels) * torch.log1p(torch.abs(labels))
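The round trip behind `unlog_transform` can be checked numerically. A minimal sketch, assuming only what the docstring above states (training applies `sign(x) * log1p(|x|)` to the return labels); the scalar helpers `log1p_signed` and `unlog_signed` are illustrative, not part of the codebase:

```python
import math

def log1p_signed(x):
    # forward transform applied to labels at training time
    return math.copysign(math.log1p(abs(x)), x)

def unlog_signed(y):
    # inverse: sign(y) * (exp(|y|) - 1), i.e. a signed expm1
    return math.copysign(math.expm1(abs(y)), y)

# returns like -95% or +900% survive the round trip exactly
for r in (-0.95, -0.5, 0.0, 0.3, 9.0):
    assert abs(unlog_signed(log1p_signed(r)) - r) < 1e-12
print("round trip ok")
```

The sign-preserving form matters because the transform must stay monotonic through zero for the quantile outputs to remain ordered after un-logging.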
 
      parser.add_argument("--horizons_seconds", type=int, nargs="+", default=[300, 900, 1800, 3600, 7200])
      parser.add_argument("--quantiles", type=float, nargs="+", default=[0.1, 0.5, 0.9])
      parser.add_argument("--seed", type=int, default=None)
+     parser.add_argument("--min_class", type=int, default=3, help="Filter out tokens with return class beneath this ID (e.g., 1 for >= 3x returns)")
+     parser.add_argument("--cutoff_trade_idx", type=int, default=200, help="Force the T_cutoff at this exact trade index (e.g., 10 = right after the 10th trade)")
+     parser.add_argument("--num_samples", type=int, default=1, help="Number of valid samples to evaluate and aggregate.")
+     parser.add_argument("--max_retries", type=int, default=100, help="Maximum attempts to find valid contexts across samples.")
+     parser.add_argument("--show_each", action="store_true", help="Print per-sample details for every evaluated sample.")
+     parser.add_argument(
+         "--ablation",
+         type=str,
+         default="none",
+         choices=["none", "wallet", "graph", "wallet_graph", "social", "token", "holder", "ohlc", "ohlc_wallet", "trade", "onchain", "all", "sweep", "ohlc_probe"],
+         help="Run inference with selected signal families removed, or use 'sweep' to rank multiple families.",
+     )
      return parser.parse_args()

+ def clone_batch(batch):
+     cloned = {}
+     for key, value in batch.items():
+         if isinstance(value, torch.Tensor):
+             cloned[key] = value.clone()
+         else:
+             cloned[key] = copy.deepcopy(value)
+     return cloned
+
+
+ def _empty_wallet_encoder_inputs(device):
+     return {
+         'username_embed_indices': torch.tensor([], device=device, dtype=torch.long),
+         'profile_rows': [],
+         'social_rows': [],
+         'holdings_batch': [],
+     }
+
+
+ def _empty_token_encoder_inputs(device):
+     return {
+         'name_embed_indices': torch.tensor([], device=device, dtype=torch.long),
+         'symbol_embed_indices': torch.tensor([], device=device, dtype=torch.long),
+         'image_embed_indices': torch.tensor([], device=device, dtype=torch.long),
+         'protocol_ids': torch.tensor([], device=device, dtype=torch.long),
+         'is_vanity_flags': torch.tensor([], device=device, dtype=torch.bool),
+         '_addresses_for_lookup': [],
+     }


+ def apply_ablation(batch, mode, device):
+     if mode == "none":
+         return batch
+
+     ablated = clone_batch(batch)
+
+     if mode in {"wallet", "wallet_graph", "ohlc_wallet", "all"}:
+         for key in (
+             "wallet_indices",
+             "dest_wallet_indices",
+             "original_author_indices",
+             "holder_snapshot_indices",
+         ):
+             if key in ablated:
+                 ablated[key].zero_()
+         ablated["wallet_encoder_inputs"] = _empty_wallet_encoder_inputs(device)
+         ablated["wallet_addr_to_batch_idx"] = {}
+         ablated["holder_snapshot_raw_data"] = []
+         ablated["graph_updater_links"] = {}
+
+     if mode in {"graph", "wallet_graph", "all"}:
+         ablated["graph_updater_links"] = {}
+
+     if mode in {"social", "all"}:
+         if "textual_event_indices" in ablated:
+             ablated["textual_event_indices"].zero_()
+         ablated["textual_event_data"] = []
+
+     if mode in {"token", "all"}:
+         for key in (
+             "token_indices",
+             "quote_token_indices",
+             "trending_token_indices",
+             "boosted_token_indices",
+         ):
+             if key in ablated:
+                 ablated[key].zero_()
+         ablated["token_encoder_inputs"] = _empty_token_encoder_inputs(device)
+
+     if mode in {"holder", "all"}:
+         if "holder_snapshot_indices" in ablated:
+             ablated["holder_snapshot_indices"].zero_()
+         ablated["holder_snapshot_raw_data"] = []
+
+     if mode in {"ohlc", "ohlc_wallet", "all"}:
+         if "ohlc_indices" in ablated:
+             ablated["ohlc_indices"].zero_()
+         if "ohlc_price_tensors" in ablated:
+             ablated["ohlc_price_tensors"] = torch.zeros_like(ablated["ohlc_price_tensors"])
+         if "ohlc_interval_ids" in ablated:
+             ablated["ohlc_interval_ids"] = torch.zeros_like(ablated["ohlc_interval_ids"])
+
+     if mode in {"trade", "all"}:
+         for key in (
+             "trade_numerical_features",
+             "deployer_trade_numerical_features",
+             "smart_wallet_trade_numerical_features",
+             "transfer_numerical_features",
+             "pool_created_numerical_features",
+             "liquidity_change_numerical_features",
+             "fee_collected_numerical_features",
+             "token_burn_numerical_features",
+             "supply_lock_numerical_features",
+             "boosted_token_numerical_features",
+             "trending_token_numerical_features",
+             "dexboost_paid_numerical_features",
+             "global_trending_numerical_features",
+             "chainsnapshot_numerical_features",
+             "lighthousesnapshot_numerical_features",
+             "dexprofile_updated_flags",
+         ):
+             if key in ablated:
+                 ablated[key] = torch.zeros_like(ablated[key])
+         for key in (
+             "trade_dex_ids",
+             "trade_direction_ids",
+             "trade_mev_protection_ids",
+             "trade_is_bundle_ids",
+             "pool_created_protocol_ids",
+             "liquidity_change_type_ids",
+             "trending_token_source_ids",
+             "trending_token_timeframe_ids",
+             "lighthousesnapshot_protocol_ids",
+             "lighthousesnapshot_timeframe_ids",
+             "migrated_protocol_ids",
+             "alpha_group_ids",
+             "channel_ids",
+             "exchange_ids",
+         ):
+             if key in ablated:
+                 ablated[key] = torch.zeros_like(ablated[key])
+
+     if mode == "onchain":
+         if "onchain_snapshot_numerical_features" in ablated:
+             ablated["onchain_snapshot_numerical_features"] = torch.zeros_like(ablated["onchain_snapshot_numerical_features"])
+
+     return ablated
+

+ def _chunk_permutation_indices(length, chunk_size):
+     if length <= 0:
+         return []
+     chunks = [list(range(i, min(i + chunk_size, length))) for i in range(0, length, chunk_size)]
+     if len(chunks) <= 1:
+         return list(range(length))
+     permuted = list(reversed(chunks))
+     out = []
+     for chunk in permuted:
+         out.extend(chunk)
+     return out
+
+
+ def apply_ohlc_probe(batch, mode):
+     probed = clone_batch(batch)
+     if "ohlc_price_tensors" not in probed or probed["ohlc_price_tensors"].numel() == 0:
+         return probed
+
+     ohlc = probed["ohlc_price_tensors"].clone()
+     seq_len = ohlc.shape[-1]
+
+     if mode == "ohlc_reverse":
+         probed["ohlc_price_tensors"] = torch.flip(ohlc, dims=[-1])
+     elif mode == "ohlc_shuffle_chunks":
+         perm = _chunk_permutation_indices(seq_len, chunk_size=30)
+         idx = torch.tensor(perm, device=ohlc.device, dtype=torch.long)
+         probed["ohlc_price_tensors"] = ohlc.index_select(-1, idx)
+     elif mode == "ohlc_mask_recent":
+         keep = max(seq_len - 60, 0)
+         if keep < seq_len and keep > 0:
+             fill = ohlc[..., keep - 1:keep].expand_as(ohlc[..., keep:])
+             ohlc[..., keep:] = fill
+         elif keep == 0:
+             ohlc.zero_()
+         probed["ohlc_price_tensors"] = ohlc
+
+     return probed
+
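The index permutation behind the `ohlc_shuffle_chunks` probe keeps each chunk internally ordered but reverses the chunk order, so local candle structure survives while the global timeline is scrambled. A standalone sketch of the same index construction (`chunk_reverse_indices` is an illustrative name, not the script's function):

```python
def chunk_reverse_indices(length, chunk_size):
    # split [0, length) into consecutive chunks, reverse the chunk ORDER,
    # but keep the order of indices inside each chunk
    chunks = [list(range(i, min(i + chunk_size, length)))
              for i in range(0, length, chunk_size)]
    if len(chunks) <= 1:
        return list(range(length))
    out = []
    for chunk in reversed(chunks):
        out.extend(chunk)
    return out

print(chunk_reverse_indices(7, 3))  # -> [6, 3, 4, 5, 0, 1, 2]
```

If the model's predictions barely move under this permutation, it is likely reading only local (within-chunk) price shape rather than the long-range trend.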
+ def run_inference(model, batch):
+     with torch.no_grad():
+         outputs = model(batch)
+     preds = outputs["quantile_logits"][0].detach().cpu()
+     quality_pred = outputs["quality_logits"][0].detach().cpu() if "quality_logits" in outputs else None
+     movement_pred = outputs["movement_logits"][0].detach().cpu() if "movement_logits" in outputs else None
+     return preds, quality_pred, movement_pred
+

+ def print_results(title, batch, preds, quality_pred, movement_pred, gt_labels, gt_mask, gt_quality, horizons_seconds, quantiles, reference_preds=None, reference_quality=None):
+     real_preds = unlog_transform(preds)
+     num_quantiles = len(quantiles)
+     num_gt_horizons = len(gt_mask)
+
+     print(f"\n================== {title} ==================")
+     print(f"Token Address: {batch.get('token_addresses', ['Unknown'])[0]}")
+     if gt_quality is not None:
+         quality_line = f"Quality Score: GT = {gt_quality:.4f} | Pred = {quality_pred.item() if quality_pred is not None else 'N/A'}"
+         if reference_quality is not None and quality_pred is not None:
+             quality_delta = quality_pred.item() - reference_quality.item()
+             quality_line += f" | Delta vs Full = {quality_delta:+.6f}"
+         print(quality_line)
+     if movement_pred is not None:
+         movement_targets = batch.get("movement_class_targets")
+         movement_mask = batch.get("movement_class_mask")
+         print("Movement Classes:")
+         for h_idx, horizon in enumerate(horizons_seconds):
+             if h_idx >= movement_pred.shape[0]:
+                 break
+             target_txt = "N/A"
+             if movement_targets is not None and movement_mask is not None and bool(movement_mask[0, h_idx].item()):
+                 target_txt = MOVEMENT_ID_TO_CLASS.get(int(movement_targets[0, h_idx].item()), "unknown")
+             pred_class = int(movement_pred[h_idx].argmax().item())
+             pred_name = MOVEMENT_ID_TO_CLASS.get(pred_class, "unknown")
+             pred_prob = float(torch.softmax(movement_pred[h_idx], dim=-1)[pred_class].item())
+             print(
+                 f"  {horizon:>4}s GT = {target_txt:<12} | "
+                 f"Pred = {pred_name:<12} | "
+                 f"Conf = {pred_prob:.4f}"
+             )
+     if "context_class_name" in batch:
+         print(f"Context Class: {batch['context_class_name'][0]}")
+
+     print("\nReturns per Horizon:")
+     for h_idx, horizon in enumerate(horizons_seconds):
+         horizon_min = horizon // 60
+         print(f"\n--- Horizon: {horizon}s ({horizon_min}m) ---")
+
+         if h_idx >= num_gt_horizons:
+             print("  [No Ground Truth Available for this Horizon - Not in Dataset]")
+             valid = False
+         else:
+             valid = gt_mask[h_idx].item()
+
+         if not valid:
+             print("  [No Ground Truth Available for this Horizon - Masked]")
+         else:
+             gt_ret = gt_labels[h_idx].item()
+             print(f"  Ground Truth: {gt_ret * 100:.2f}%")
+
+         print("  Predictions:")
+         for q_idx, q in enumerate(quantiles):
+             flat_idx = h_idx * num_quantiles + q_idx
+             pred_ret = real_preds[flat_idx].item()
+             log_pred = preds[flat_idx].item()
+             line = f"    - p{int(q*100):02d}: {pred_ret * 100:>8.2f}% (raw log-val: {log_pred:7.4f})"
+             if reference_preds is not None:
+                 ref_ret = unlog_transform(reference_preds)[flat_idx].item()
+                 line += f" | Delta vs Full: {(pred_ret - ref_ret) * 100:+7.2f}%"
+             print(line)
+
+     print("=============================================\n")
+

+ def resolve_sample_index(dataset, sample_idx_arg, rng):
+     if sample_idx_arg is not None:
+         if isinstance(sample_idx_arg, str) and not sample_idx_arg.isdigit():
+             found_idx = next((i for i, m in enumerate(dataset.sampled_mints) if m['mint_address'] == sample_idx_arg), None)
+             if found_idx is None:
+                 raise ValueError(f"Mint address {sample_idx_arg} not found in filtered dataset")
+             return found_idx
+         resolved = int(sample_idx_arg)
+         if resolved >= len(dataset):
+             raise ValueError(f"Sample index {resolved} out of range")
+         return resolved
+     return rng.randint(0, len(dataset.sampled_mints) - 1)
+

+ def move_batch_to_device(batch, device):
+     for k, v in batch.items():
+         if isinstance(v, torch.Tensor):
+             batch[k] = v.to(device)
+         elif isinstance(v, list) and len(v) > 0 and isinstance(v[0], torch.Tensor):
+             batch[k] = [t.to(device) for t in v]
+     if 'textual_event_indices' not in batch:
+         B, L = batch['event_type_ids'].shape
+         batch['textual_event_indices'] = torch.zeros((B, L), dtype=torch.long, device=device)
+     if 'textual_event_data' not in batch:
+         batch['textual_event_data'] = []
+     return batch
+

+ def init_aggregate(horizons_seconds, quantiles):
+     return {
+         "count": 0,
+         "quality_full_sum": 0.0,
+         "quality_abl_sum": 0.0,
+         "quality_delta_sum": 0.0,
+         "gt_quality_sum": 0.0,
+         "per_hq": {
+             (h, q): {
+                 "full_sum": 0.0,
+                 "abl_sum": 0.0,
+                 "delta_sum": 0.0,
+                 "abs_delta_sum": 0.0,
+                 "gt_sum": 0.0,
+                 "valid_count": 0,
+             }
+             for h in horizons_seconds for q in quantiles
+         },
+     }
+

+ def update_aggregate(stats, full_preds, gt_labels, gt_mask, gt_quality, horizons_seconds, quantiles, ablated_preds=None, full_quality=None, ablated_quality=None):
+     stats["count"] += 1
+     if gt_quality is not None:
+         stats["gt_quality_sum"] += float(gt_quality)
+     if full_quality is not None:
+         stats["quality_full_sum"] += float(full_quality.item())
+     if ablated_quality is not None:
+         stats["quality_abl_sum"] += float(ablated_quality.item())
+     if full_quality is not None and ablated_quality is not None:
+         stats["quality_delta_sum"] += float(ablated_quality.item() - full_quality.item())
+
+     full_real = unlog_transform(full_preds)
+     ablated_real = unlog_transform(ablated_preds) if ablated_preds is not None else None
+     num_quantiles = len(quantiles)
+
+     for h_idx, horizon in enumerate(horizons_seconds):
+         valid = h_idx < len(gt_mask) and bool(gt_mask[h_idx].item())
+         gt_ret = float(gt_labels[h_idx].item()) if valid else math.nan
+         for q_idx, q in enumerate(quantiles):
+             flat_idx = h_idx * num_quantiles + q_idx
+             bucket = stats["per_hq"][(horizon, q)]
+             full_val = float(full_real[flat_idx].item())
+             bucket["full_sum"] += full_val
+             if ablated_real is not None:
+                 abl_val = float(ablated_real[flat_idx].item())
+                 delta = abl_val - full_val
+                 bucket["abl_sum"] += abl_val
+                 bucket["delta_sum"] += delta
+                 bucket["abs_delta_sum"] += abs(delta)
+             if valid:
+                 bucket["gt_sum"] += gt_ret
+                 bucket["valid_count"] += 1
+
407
+ def print_aggregate_summary(stats, horizons_seconds, quantiles, ablation_mode):
408
+ n = stats["count"]
409
+ print("\n================== Aggregate Summary ==================")
410
+ print(f"Evaluated Samples: {n}")
411
+ if n == 0:
412
+ print("No valid samples collected.")
413
+ print("=======================================================\n")
414
+ return
415
+
416
+ if ablation_mode != "none":
417
+ print(
418
+ f"Quality Mean: full={stats['quality_full_sum'] / n:.6f} | "
419
+ f"ablated={stats['quality_abl_sum'] / n:.6f} | "
420
+ f"delta={stats['quality_delta_sum'] / n:+.6f}"
421
+ )
422
+
423
+ for horizon in horizons_seconds:
424
+ horizon_min = horizon // 60
425
+ print(f"\n--- Horizon: {horizon}s ({horizon_min}m) ---")
426
+ valid_counts = [stats["per_hq"][(horizon, q)]["valid_count"] for q in quantiles]
427
+ valid_count = max(valid_counts) if valid_counts else 0
428
+ if valid_count > 0:
429
+ gt_mean = stats["per_hq"][(horizon, quantiles[0])]["gt_sum"] / valid_count
430
+ print(f" Mean Ground Truth over valid labels: {gt_mean * 100:.2f}% (n={valid_count})")
431
+ else:
432
+ print(" Mean Ground Truth over valid labels: N/A")
433
+
434
+ for q in quantiles:
435
+ bucket = stats["per_hq"][(horizon, q)]
436
+ full_mean = bucket["full_sum"] / n
437
+ line = f" p{int(q*100):02d} mean full: {full_mean * 100:>8.2f}%"
438
+ if ablation_mode != "none":
439
+ abl_mean = bucket["abl_sum"] / n
440
+ delta_mean = bucket["delta_sum"] / n
441
+ abs_delta_mean = bucket["abs_delta_sum"] / n
442
+ line += (
443
+ f" | ablated: {abl_mean * 100:>8.2f}%"
444
+ f" | delta: {delta_mean * 100:+8.2f}%"
445
+ f" | mean|delta|: {abs_delta_mean * 100:>8.2f}%"
446
+ )
447
+ print(line)
448
+ print("=======================================================\n")
449
+
450
+
451
+ def summarize_influence_score(stats, horizons_seconds, quantiles):
452
+ n = stats["count"]
453
+ if n == 0:
454
+ return 0.0
455
+ total = 0.0
456
+ denom = 0
457
+ for horizon in horizons_seconds:
458
+ for q in quantiles:
459
+ total += stats["per_hq"][(horizon, q)]["abs_delta_sum"] / n
460
+ denom += 1
461
+ return total / max(denom, 1)
462
+
463
+
464
+ def print_probe_summary(mode_to_stats, horizons_seconds, quantiles):
465
+ rankings = []
466
+ for mode in OHLC_PROBE_MODES:
467
+ score = summarize_influence_score(mode_to_stats[mode], horizons_seconds, quantiles)
468
+ rankings.append((mode, score))
469
+ rankings.sort(key=lambda x: x[1], reverse=True)
470
+
471
+ print("\n================== OHLC Probe Ranking ==================")
472
+ for rank, (mode, score) in enumerate(rankings, start=1):
473
+ print(f"{rank:>2}. {mode:<20} mean|delta| = {score * 100:8.2f}%")
474
+ print("========================================================\n")
475
+
476
+ for mode, _ in rankings:
477
+ print_aggregate_summary(mode_to_stats[mode], horizons_seconds, quantiles, mode)
478
+
479
  def get_latest_checkpoint(checkpoint_dir):
480
  ckpt_dir = Path(checkpoint_dir)
481
  if ckpt_dir.exists():
 
489
  def main():
490
  load_dotenv()
491
  args = parse_args()
492
+ rng = random.Random(args.seed)
493
+ if args.seed is not None:
494
+ random.seed(args.seed)
495
+ torch.manual_seed(args.seed)
496
 
497
  accelerator = Accelerator(mixed_precision=args.mixed_precision)
498
  device = accelerator.device
 
620
 
621
  model.eval()
622
 
623
+ stats = init_aggregate(args.horizons_seconds, args.quantiles)
624
+ selected_modes = [] if args.ablation == "none" else (ABLATION_SWEEP_MODES if args.ablation == "sweep" else ([] if args.ablation == "ohlc_probe" else [args.ablation]))
625
+ mode_to_stats = {mode: init_aggregate(args.horizons_seconds, args.quantiles) for mode in selected_modes}
626
+ probe_to_stats = {mode: init_aggregate(args.horizons_seconds, args.quantiles) for mode in OHLC_PROBE_MODES} if args.ablation == "ohlc_probe" else {}
627
+ max_target_samples = max(1, args.num_samples)
628
  retries = 0
629
+ collected = 0
630
+ seen_indices = set()
631
+
632
+ while collected < max_target_samples and retries < args.max_retries:
633
+ sample_idx = resolve_sample_index(dataset, args.sample_idx, rng)
634
+ if args.sample_idx is None and sample_idx in seen_indices and len(seen_indices) < len(dataset.sampled_mints):
635
+ retries += 1
636
+ continue
637
+ seen_indices.add(sample_idx)
638
+
 
 
 
 
 
 
 
 
 
 
639
  sample_mint_addr = dataset.sampled_mints[sample_idx]['mint_address']
640
  print(f"Trying Token Address: {sample_mint_addr}")
641
+
642
+ contexts = dataset.__cacheitem_context__(
643
+ sample_idx,
644
+ num_samples_per_token=1,
645
+ encoder=multi_modal_encoder,
646
+ forced_cutoff_trade_idx=args.cutoff_trade_idx,
647
+ )
648
+
649
+ if not contexts or contexts[0] is None:
650
+ print(" [Failed to generate valid context pattern, skipping...]")
651
+ retries += 1
652
+ if args.sample_idx is not None:
653
+ print("Specific sample requested but failed to generate context. Exiting.")
654
+ return
655
+ continue
656
+
657
  raw_sample = contexts[0]
658
+ batch = move_batch_to_device(collator([raw_sample]), device)
659
+ gt_labels = batch["labels"][0].cpu()
660
+ gt_mask = batch["labels_mask"][0].cpu().bool()
661
+ gt_quality = batch["quality_score"][0].item() if "quality_score" in batch else None
662
 
663
+ if collected == 0 or args.show_each:
664
+ print(f"\nEvaluating sample {collected + 1}/{max_target_samples} on Token Address: {sample_mint_addr}")
665
+ print("\n--- Running Inference ---")
 
 
666
 
667
+ full_preds, full_quality, full_direction = run_inference(model, batch)
668
+ ablation_outputs = {}
669
+ for mode in selected_modes:
670
+ ablated_batch = apply_ablation(batch, mode, device)
671
+ ablated_preds, ablated_quality, ablated_direction = run_inference(model, ablated_batch)
672
+ ablation_outputs[mode] = (ablated_batch, ablated_preds, ablated_quality, ablated_direction)
673
+ probe_outputs = {}
674
+ if args.ablation == "ohlc_probe":
675
+ for mode in OHLC_PROBE_MODES:
676
+ probe_batch = apply_ohlc_probe(batch, mode)
677
+ probe_preds, probe_quality, probe_direction = run_inference(model, probe_batch)
678
+ probe_outputs[mode] = (probe_batch, probe_preds, probe_quality, probe_direction)
679
 
680
+ if collected == 0 or args.show_each:
681
+ print_results(
682
+ title="Full Results",
683
+ batch=batch,
684
+ preds=full_preds,
685
+ quality_pred=full_quality,
686
+ direction_pred=full_direction,
687
+ gt_labels=gt_labels,
688
+ gt_mask=gt_mask,
689
+ gt_quality=gt_quality,
690
+ horizons_seconds=args.horizons_seconds,
691
+ quantiles=args.quantiles,
692
+ )
693
+ if args.ablation != "none":
694
+ if args.ablation == "sweep":
695
+ print(f"Collected full predictions for {len(selected_modes)} ablation families on this sample. Aggregate ranking will be printed at the end.")
696
+ elif args.ablation == "ohlc_probe":
697
+ for mode in OHLC_PROBE_MODES:
698
+ probe_batch, probe_preds, probe_quality, probe_direction = probe_outputs[mode]
699
+ print_results(
700
+ title=f"OHLC Probe ({mode})",
701
+ batch=probe_batch,
702
+ preds=probe_preds,
703
+ quality_pred=probe_quality,
704
+ direction_pred=probe_direction,
705
+ gt_labels=gt_labels,
706
+ gt_mask=gt_mask,
707
+ gt_quality=gt_quality,
708
+ horizons_seconds=args.horizons_seconds,
709
+ quantiles=args.quantiles,
710
+ reference_preds=full_preds,
711
+ reference_quality=full_quality,
712
+ )
713
+ else:
714
+ ablated_batch, ablated_preds, ablated_quality, ablated_direction = ablation_outputs[args.ablation]
715
+ print_results(
716
+ title=f"Ablation Results ({args.ablation})",
717
+ batch=ablated_batch,
718
+ preds=ablated_preds,
719
+ quality_pred=ablated_quality,
720
+ direction_pred=ablated_direction,
721
+ gt_labels=gt_labels,
722
+ gt_mask=gt_mask,
723
+ gt_quality=gt_quality,
724
+ horizons_seconds=args.horizons_seconds,
725
+ quantiles=args.quantiles,
726
+ reference_preds=full_preds,
727
+ reference_quality=full_quality,
728
+ )
729
 
730
+ update_aggregate(
731
+ stats=stats,
732
+ full_preds=full_preds,
733
+ gt_labels=gt_labels,
734
+ gt_mask=gt_mask,
735
+ gt_quality=gt_quality,
736
+ horizons_seconds=args.horizons_seconds,
737
+ quantiles=args.quantiles,
738
+ full_quality=full_quality,
739
+ )
740
+ for mode, (_, ablated_preds, ablated_quality, _) in ablation_outputs.items():
741
+ update_aggregate(
742
+ stats=mode_to_stats[mode],
743
+ full_preds=full_preds,
744
+ gt_labels=gt_labels,
745
+ gt_mask=gt_mask,
746
+ gt_quality=gt_quality,
747
+ horizons_seconds=args.horizons_seconds,
748
+ quantiles=args.quantiles,
749
+ ablated_preds=ablated_preds,
750
+ full_quality=full_quality,
751
+ ablated_quality=ablated_quality,
752
+ )
753
+ for mode, (_, probe_preds, probe_quality, _) in probe_outputs.items():
754
+ update_aggregate(
755
+ stats=probe_to_stats[mode],
756
+ full_preds=full_preds,
757
+ gt_labels=gt_labels,
758
+ gt_mask=gt_mask,
759
+ gt_quality=gt_quality,
760
+ horizons_seconds=args.horizons_seconds,
761
+ quantiles=args.quantiles,
762
+ ablated_preds=probe_preds,
763
+ full_quality=full_quality,
764
+ ablated_quality=probe_quality,
765
+ )
766
+ collected += 1
767
+ retries += 1
768
 
769
+ if args.sample_idx is not None:
770
+ break
 
 
771
 
772
+ if collected == 0:
773
+ print(f"Could not find a valid context after {args.max_retries} attempts.")
774
+ return
775
+
776
+ if collected < max_target_samples:
777
+ print(f"WARNING: Requested {max_target_samples} samples but only evaluated {collected}.")
778
+
779
+ if args.ablation == "none":
780
+ print_aggregate_summary(stats, args.horizons_seconds, args.quantiles, args.ablation)
781
+ return
782
+
783
+ if args.ablation == "ohlc_probe":
784
+ print_probe_summary(probe_to_stats, args.horizons_seconds, args.quantiles)
785
+ return
786
+
787
+ if args.ablation == "sweep":
788
+ rankings = []
789
+ for mode in selected_modes:
790
+ score = summarize_influence_score(mode_to_stats[mode], args.horizons_seconds, args.quantiles)
791
+ rankings.append((mode, score))
792
+ rankings.sort(key=lambda x: x[1], reverse=True)
793
+
794
+ print("\n================== Influence Ranking ==================")
795
+ for rank, (mode, score) in enumerate(rankings, start=1):
796
+ print(f"{rank:>2}. {mode:<12} mean|delta| = {score * 100:8.2f}%")
797
+ print("=======================================================\n")
798
+
799
+ for mode, _ in rankings:
800
+ print_aggregate_summary(mode_to_stats[mode], args.horizons_seconds, args.quantiles, mode)
801
+ else:
802
+ print_aggregate_summary(mode_to_stats[args.ablation], args.horizons_seconds, args.quantiles, args.ablation)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
803
 
804
  if __name__ == "__main__":
805
  main()
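The aggregate bookkeeping in the eval script above (`init_aggregate` / `update_aggregate` / `summarize_influence_score`) boils down to: accumulate |ablated − full| per (horizon, quantile) bucket, average over samples, then average over buckets. A minimal pure-Python sketch of just that reduction, with hypothetical prediction vectors (no torch, no `unlog_transform`):

```python
# Sketch of the influence-score aggregation: each (horizon, quantile)
# bucket accumulates |ablated - full| per sample; the final score is the
# mean over samples, then the mean over buckets. All values hypothetical.

def init_aggregate(horizons, quantiles):
    return {"count": 0,
            "per_hq": {(h, q): {"abs_delta_sum": 0.0}
                       for h in horizons for q in quantiles}}

def update_aggregate(stats, full, ablated, horizons, quantiles):
    stats["count"] += 1
    nq = len(quantiles)
    for h_idx, h in enumerate(horizons):
        for q_idx, q in enumerate(quantiles):
            flat_idx = h_idx * nq + q_idx  # same flattening as the eval script
            stats["per_hq"][(h, q)]["abs_delta_sum"] += abs(ablated[flat_idx] - full[flat_idx])

def summarize_influence_score(stats, horizons, quantiles):
    n = stats["count"]
    if n == 0:
        return 0.0
    buckets = [b["abs_delta_sum"] / n for b in stats["per_hq"].values()]
    return sum(buckets) / max(len(buckets), 1)

horizons, quantiles = [60, 300], [0.1, 0.5, 0.9]
stats = init_aggregate(horizons, quantiles)
update_aggregate(stats, [0.0] * 6, [0.1] * 6, horizons, quantiles)
update_aggregate(stats, [0.0] * 6, [0.3] * 6, horizons, quantiles)
print(round(summarize_influence_score(stats, horizons, quantiles), 6))  # → 0.2
```

Because every bucket is averaged identically, an ablation that shifts all quantiles uniformly and one that only distorts a single horizon can score the same; the per-bucket printout in `print_aggregate_summary` is what disambiguates them.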
train.py CHANGED
@@ -52,6 +52,7 @@ from neo4j import GraphDatabase
 from data.data_fetcher import DataFetcher
 from data.data_loader import OracleDataset
 from data.data_collator import MemecoinCollator
+from data.context_targets import MOVEMENT_CLASS_NAMES
 from models.multi_modal_processor import MultiModalEncoder
 from models.helper_encoders import ContextualTimeEncoder
 from models.token_encoder import TokenEncoder
@@ -148,6 +149,89 @@ def quantile_pinball_loss_per_sample(
     return per_sample_num / per_sample_den
 
 
+def masked_movement_cross_entropy(
+    logits: torch.Tensor,
+    targets: torch.Tensor,
+    mask: torch.Tensor,
+    class_weights: Optional[torch.Tensor] = None,
+) -> torch.Tensor:
+    if mask.sum() == 0:
+        return torch.tensor(0.0, device=logits.device, dtype=logits.dtype)
+
+    flat_logits = logits.reshape(-1, logits.shape[-1])
+    flat_targets = targets.reshape(-1)
+    flat_mask = mask.reshape(-1).bool()
+
+    valid_logits = flat_logits[flat_mask]
+    valid_targets = flat_targets[flat_mask]
+    if valid_logits.numel() == 0:
+        return torch.tensor(0.0, device=logits.device, dtype=logits.dtype)
+
+    return nn.functional.cross_entropy(valid_logits, valid_targets, weight=class_weights)
+
+
+def movement_accuracy(
+    logits: torch.Tensor,
+    targets: torch.Tensor,
+    mask: torch.Tensor,
+) -> float:
+    if mask.sum().item() == 0:
+        return 0.0
+    preds = logits.argmax(dim=-1)
+    valid = mask.bool()
+    correct = (preds[valid] == targets[valid]).float()
+    if correct.numel() == 0:
+        return 0.0
+    return float(correct.mean().item())
+
+
+def estimate_movement_class_weights(
+    dataset,
+    indices,
+    movement_label_config: Optional[Dict[str, float]] = None,
+    sample_cap: int = 4096,
+) -> torch.Tensor:
+    del movement_label_config
+    counts = torch.ones(len(MOVEMENT_CLASS_NAMES), dtype=torch.float32)
+    if not indices:
+        return counts
+
+    for idx in indices[: min(len(indices), sample_cap)]:
+        try:
+            item = dataset[idx]
+        except Exception:
+            continue
+        if not item:
+            continue
+        labels = item.get("labels")
+        labels_mask = item.get("labels_mask")
+        movement_targets = item.get("movement_class_targets")
+        movement_mask = item.get("movement_class_mask")
+        if movement_targets is None or movement_mask is None:
+            if labels is None or labels_mask is None:
+                continue
+            batch_targets = collator_like_targets(labels, labels_mask)
+            targets = batch_targets["movement_class_targets"]
+            mask = batch_targets["movement_class_mask"]
+        else:
+            targets = movement_targets.tolist() if isinstance(movement_targets, torch.Tensor) else movement_targets
+            mask = movement_mask.tolist() if isinstance(movement_mask, torch.Tensor) else movement_mask
+        for target, target_mask in zip(targets, mask):
+            if int(target_mask) > 0:
+                counts[int(target)] += 1.0
+
+    weights = counts.sum() / counts.clamp_min(1.0)
+    return weights / weights.mean().clamp_min(1e-6)
+
+
+def collator_like_targets(labels, labels_mask, movement_label_config: Optional[Dict[str, float]] = None):
+    from data.context_targets import derive_movement_targets
+
+    labels_list = labels.tolist() if isinstance(labels, torch.Tensor) else labels
+    mask_list = labels_mask.tolist() if isinstance(labels_mask, torch.Tensor) else labels_mask
+    return derive_movement_targets(labels_list, mask_list, movement_label_config=movement_label_config)
+
+
 def create_balanced_split(dataset, n_val_per_class: int = 1, seed: int = 42):
     """
     Create train/val split with balanced classes in validation set.
@@ -207,6 +291,8 @@ def run_validation(model, val_dataloader, accelerator, quantiles, quality_loss_f
     total_loss = 0.0
     total_return_loss = 0.0
     total_quality_loss = 0.0
+    total_movement_loss = 0.0
+    total_movement_acc = 0.0
     n_batches = 0
 
     # Per-class metrics
@@ -228,9 +314,12 @@ def run_validation(model, val_dataloader, accelerator, quantiles, quality_loss_f
 
         preds = outputs["quantile_logits"]
        quality_preds = outputs["quality_logits"]
+        movement_logits = outputs.get("movement_logits")
         labels = batch["labels"]
         labels_mask = batch["labels_mask"]
         quality_targets = batch["quality_score"].to(accelerator.device, dtype=quality_preds.dtype)
+        movement_targets = batch.get("movement_class_targets")
+        movement_mask = batch.get("movement_class_mask")
 
         if labels_mask.sum() == 0:
             return_loss = torch.tensor(0.0, device=accelerator.device)
@@ -240,11 +329,22 @@ def run_validation(model, val_dataloader, accelerator, quantiles, quality_loss_f
             return_loss = quantile_pinball_loss(preds, labels, labels_mask, quantiles)
 
         quality_loss = quality_loss_fn(quality_preds, quality_targets)
-        loss = return_loss + quality_loss
+        movement_loss = torch.tensor(0.0, device=accelerator.device)
+        movement_acc = 0.0
+        if movement_logits is not None and movement_targets is not None and movement_mask is not None:
+            movement_targets = movement_targets.to(accelerator.device)
+            movement_mask = movement_mask.to(accelerator.device)
+            movement_loss = masked_movement_cross_entropy(
+                movement_logits, movement_targets, movement_mask
+            )
+            movement_acc = movement_accuracy(movement_logits, movement_targets, movement_mask)
+        loss = return_loss + quality_loss + movement_loss
 
         total_loss += loss.item()
         total_return_loss += return_loss.item()
         total_quality_loss += quality_loss.item()
+        total_movement_loss += movement_loss.item()
+        total_movement_acc += movement_acc
         n_batches += 1
 
         # Track per-class losses
@@ -264,6 +364,8 @@ def run_validation(model, val_dataloader, accelerator, quantiles, quality_loss_f
         'val/loss': total_loss / n_batches,
         'val/return_loss': total_return_loss / n_batches,
         'val/quality_loss': total_quality_loss / n_batches,
+        'val/movement_loss': total_movement_loss / n_batches,
+        'val/movement_acc': total_movement_acc / n_batches,
         'val/n_batches': n_batches,
         'class_losses': {k: v['loss'] / max(v['count'], 1) for k, v in class_losses.items()}
     }
@@ -446,6 +548,7 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument("--resume_from_checkpoint", type=str, default=None, help="Path to checkpoint or 'latest'")
     parser.add_argument("--val_samples_per_class", type=int, default=1, help="Number of validation samples per class (default 1)")
     parser.add_argument("--val_every", type=int, default=1000, help="Run validation every N steps (default 1000)")
+    parser.add_argument("--movement_loss_weight", type=float, default=1.0, help="Auxiliary loss weight for movement classification head.")
    return parser.parse_args()
 
 
@@ -815,6 +918,10 @@ def main() -> None:
 
     # --- 7. Training Loop ---
     quality_loss_fn = nn.MSELoss()
+    movement_class_weights = estimate_movement_class_weights(
+        dataset,
+        train_indices,
+    ).to(accelerator.device)
 
     logger.info("***** Running training *****")
     logger.info(f"  Num examples = {len(dataset)}")
@@ -865,6 +972,7 @@ def main() -> None:
 
             preds = outputs["quantile_logits"]
             quality_preds = outputs["quality_logits"]
+            movement_logits = outputs.get("movement_logits")
             labels = batch["labels"]
             labels_mask = batch["labels_mask"]
             if "quality_score" not in batch:
@@ -920,6 +1028,21 @@ def main() -> None:
             per_sample_return = quantile_pinball_loss_per_sample(preds, labels, labels_mask, quantiles)
 
             quality_loss = quality_loss_fn(quality_preds, quality_targets)
+            movement_targets = batch.get("movement_class_targets")
+            movement_mask = batch.get("movement_class_mask")
+            if movement_logits is not None and movement_targets is not None and movement_mask is not None:
+                movement_targets = movement_targets.to(accelerator.device)
+                movement_mask = movement_mask.to(accelerator.device)
+                movement_loss = masked_movement_cross_entropy(
+                    movement_logits,
+                    movement_targets,
+                    movement_mask,
+                    class_weights=movement_class_weights,
+                )
+                current_movement_acc = movement_accuracy(movement_logits, movement_targets, movement_mask)
+            else:
+                movement_loss = torch.tensor(0.0, device=accelerator.device, dtype=quality_loss.dtype)
+                current_movement_acc = 0.0
             per_sample_quality = (quality_preds - quality_targets).pow(2)
 
             # Apply per-sample class weighting to return loss
@@ -928,10 +1051,10 @@ def main() -> None:
                 sample_weights = class_loss_weights[batch_class_ids]  # [B]
                 # Scale return loss by mean class weight for this batch
                 class_weight_factor = sample_weights.mean()
-                loss = return_loss * class_weight_factor + quality_loss
+                loss = return_loss * class_weight_factor + quality_loss + (args.movement_loss_weight * movement_loss)
             else:
                 class_weight_factor = torch.tensor(1.0, device=accelerator.device, dtype=return_loss.dtype)
-                loss = return_loss + quality_loss
+                loss = return_loss + quality_loss + (args.movement_loss_weight * movement_loss)
             per_sample_total = per_sample_return * class_weight_factor + per_sample_quality
 
             if not torch.isfinite(loss).all().item():
@@ -957,8 +1080,10 @@ def main() -> None:
                 "loss": loss.unsqueeze(0),
                 "return_loss": return_loss.unsqueeze(0),
                 "quality_loss": quality_loss.unsqueeze(0),
+                "movement_loss": movement_loss.unsqueeze(0),
                 "preds": preds,
                 "quality_preds": quality_preds,
+                "movement_logits": movement_logits,
                 "labels_raw": batch.get("labels"),
                 "labels_log": labels,
                 "labels_mask": labels_mask,
@@ -1065,6 +1190,7 @@ def main() -> None:
             current_loss = loss.item()
             current_return_loss = return_loss.item()
             current_quality_loss = quality_loss.item()
+            current_movement_loss = movement_loss.item()
             current_class_weight_factor = float(class_weight_factor.item()) if isinstance(class_weight_factor, torch.Tensor) else float(class_weight_factor)
             current_mask_coverage = float(labels_mask.float().mean().item()) if labels_mask is not None else 0.0
             epoch_loss += current_loss
@@ -1080,6 +1206,8 @@ def main() -> None:
                 "train/loss": current_loss,
                 "train/return_loss": current_return_loss,
                 "train/quality_loss": current_quality_loss,
+                "train/movement_loss": current_movement_loss,
+                "train/movement_acc": current_movement_acc,
                 "train/class_weight_factor": current_class_weight_factor,
                 "train/mask_coverage": current_mask_coverage,
                 "train/loss_ema": loss_ema if loss_ema is not None else current_loss,
@@ -1149,7 +1277,9 @@ def main() -> None:
                 logger.info(
                     f"Validation - Loss: {val_metrics['val/loss']:.4f} | "
                     f"Return: {val_metrics['val/return_loss']:.4f} | "
-                    f"Quality: {val_metrics['val/quality_loss']:.4f}"
+                    f"Quality: {val_metrics['val/quality_loss']:.4f} | "
+                    f"Movement: {val_metrics['val/movement_loss']:.4f} | "
+                    f"MoveAcc: {val_metrics['val/movement_acc']:.4f}"
                 )
                 # Log per-class losses
                 class_loss_str = " | ".join(
@@ -1162,6 +1292,8 @@ def main() -> None:
                     "val/loss": val_metrics['val/loss'],
                    "val/return_loss": val_metrics['val/return_loss'],
                     "val/quality_loss": val_metrics['val/quality_loss'],
+                    "val/movement_loss": val_metrics['val/movement_loss'],
+                    "val/movement_acc": val_metrics['val/movement_acc'],
                 }, step=total_steps)
                 # Log per-class losses to tensorboard
                 for class_id, class_loss in val_metrics['class_losses'].items():
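The `estimate_movement_class_weights` helper in the diff above reduces to inverse-frequency weighting normalized to unit mean: `weight_i = total / count_i`, then divide by the mean weight. A pure-Python sketch of just that formula, with hypothetical class counts (no torch):

```python
# Inverse-frequency class weights normalized to unit mean, mirroring the
# final two lines of estimate_movement_class_weights. Counts hypothetical.

def inverse_frequency_weights(counts):
    total = sum(counts)
    raw = [total / max(c, 1.0) for c in counts]   # rare classes get large weights
    mean = sum(raw) / len(raw)
    return [w / max(mean, 1e-6) for w in raw]     # normalize so weights average 1.0

# Two rare classes and one dominant class: the dominant class is down-weighted.
weights = inverse_frequency_weights([1.0, 1.0, 8.0])
print([round(w, 4) for w in weights])  # → [1.4118, 1.4118, 0.1765]
```

Normalizing to unit mean keeps the movement cross-entropy on the same scale regardless of how skewed the class counts are, so `--movement_loss_weight` stays meaningful across datasets.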