Upload folder using huggingface_hub
- QUALITY_SCORE_ARCHITECTURE.md +9 -161
- access/masking_policies.list +0 -0
- access/quotas.list +0 -0
- access/roles.list +0 -0
- access/row_policies.list +0 -0
- access/settings_profiles.list +0 -0
- access/users.list +0 -0
- data/data_loader.py +10 -7
- ingest.sh +4 -4
- log.log +2 -2
- models/multi_modal_processor.py +2 -1
- scripts/evaluate_sample.py +90 -75
- status +3 -3
- wget-log.1 +11 -2
QUALITY_SCORE_ARCHITECTURE.md
CHANGED
|
@@ -1,165 +1,13 @@
(removed: previous contents of the file)

- This is **not model confidence**.
- `q` is computed **offline** using a token's **full lifetime** (for labels / training targets).
- At **inference**, the model predicts `q` from **partial observations**.
- We avoid hard thresholds and raw-scale features (USD, SOL, counts) by using **within-regime distributions**.
## 2) Core idea (distribution-first, not rules-first)

Raw totals (fees, volume, holders) are mostly **scale** and are extremely heavy-tailed. Using them directly:

- makes the signal unstable across regimes,
- makes it sensitive to market-wide shifts,
- and invites hand-tuned weights ("human bias").

Instead we map each metric to a **percentile** within a comparable peer group, then aggregate.

## 3) Return bucketing (why it is required)

The dataset is highly imbalanced: most tokens die early (<2-3x), while a tiny tail produces 10x-1000x outcomes.

If you compute percentiles globally:

- 100x tokens will always dominate "good" percentiles for scale metrics,
- and "quality" will collapse into "return magnitude".

So we compute distributions **within return regimes**.
### 3.1 Bucket definition (example)

Let `R_max` be the token's lifetime max return multiple (e.g., ATH / launch).

Use coarse buckets for the bulk and finer buckets for the tail, e.g.:

- B0: `R_max < 3`
- B1: `3 <= R_max < 5`
- B2: `5 <= R_max < 10`
- B3: `10 <= R_max < 20`
- B4: `20 <= R_max < 100`
- B5: `100 <= R_max < 10_000`

Notes:

- If a bucket has too few samples, merge with a neighbor.
- For the extreme tail you can also replace fixed buckets with **quantile buckets** on `log(R_max)` to keep sample counts stable.

Interpretation (important):

- `q` is **relative within the bucket**.
- The "best garbage" can have high `q` in B0.
- A 100x token can have low `q` in B4 if it looks worst vs other 100x+ tokens.

This is intentional: return and quality are different axes.
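A minimal sketch of the bucket assignment above. The edges follow the example B0..B5 definition; clamping `R_max` beyond the last edge into the top bucket is an assumption, since the doc leaves that case open.

```python
import bisect

# Example bucket edges from the B0..B5 definition above (upper bounds).
BUCKET_EDGES = [3, 5, 10, 20, 100, 10_000]

def assign_bucket(r_max: float) -> int:
    """Return the bucket index B0..B5 for a lifetime max-return multiple."""
    # bisect_right counts edges <= r_max, which is exactly the bucket index;
    # values beyond the last edge are clamped into the top bucket.
    return min(bisect.bisect_right(BUCKET_EDGES, r_max), len(BUCKET_EDGES) - 1)
```

For example, `assign_bucket(2.5)` lands in B0 and `assign_bucket(150)` in B5.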
-
## 4) Feature set and sign conventions
|
| 58 |
-
|
| 59 |
-
We want `q` to increase for "healthy/organic" structure and decrease for "controlled/manipulated" structure.
|
| 60 |
-
|
| 61 |
-
All features below are evaluated **within the token's return bucket**.
|
| 62 |
-
|
| 63 |
-
### 4.1 Scale / activity (high is usually better within-bucket)
|
| 64 |
-
|
| 65 |
-
Use log transforms for stability before percentiles:
|
| 66 |
-
- `log1p(total_volume_usd)`
|
| 67 |
-
- `log1p(total_fees_sol)`
|
| 68 |
-
- `log1p(unique_holders)`
|
| 69 |
-
- `log1p(time_to_ath_sec)` (optional; see note below)
|
| 70 |
-
|
| 71 |
-
Ratio features (less pure scale):
|
| 72 |
-
- `fees_per_volume = total_fees_sol / (total_volume_usd + eps)`
|
| 73 |
-
- `fees_per_trade = total_fees_sol / (n_trades + eps)` (if `n_trades` exists)
|
| 74 |
-
- `holders_per_trade = unique_holders / (n_trades + eps)` (if `n_trades` exists)
|
| 75 |
-
- `holders_per_volume = unique_holders / (total_volume_usd + eps)`
|
| 76 |
-
|
| 77 |
-
Rationale:
|
| 78 |
-
- Fees and fee-per-* help separate "real urgency / competition" from "cheap wash".
|
| 79 |
-
- Holders and holders-per-* help separate broad participation from concentrated looping.
|
| 80 |
-
|
| 81 |
-
### 4.2 Manipulation / control (high is worse; flip sign)
|
| 82 |
-
|
| 83 |
-
These are typically "the higher, the less healthy":
|
| 84 |
-
- `snipers_pct_supply_top70`
|
| 85 |
-
- `bundled_pct_supply`
|
| 86 |
-
- `dev_hold_pct_supply`
|
| 87 |
-
- `insiders_pct_supply`
|
| 88 |
-
|
| 89 |
-
We treat exceptions as rare; the model can learn edge cases from context, but the label should reflect the dominant interpretation.
|
| 90 |
-
|
| 91 |
-
### 4.3 Time-to-ATH note
|
| 92 |
-
|
| 93 |
-
`time_to_ath_sec` can behave differently across return buckets.
|
| 94 |
-
- In high-return buckets, very short times can look like a single spike / control.
|
| 95 |
-
- In low-return buckets, many tokens have near-zero times because they never move.
|
| 96 |
-
|
| 97 |
-
Include it only if it improves downstream behavior; keep it **bucket-relative** either way.
|
| 98 |
-
|
| 99 |
-
## 5) Turning raw metrics into a signed scalar

We want a single `q` in `[-1, +1]` with direction:

- `+1` = looks healthiest vs peers in the same return bucket
- `-1` = looks most unhealthy vs peers in the same return bucket

### 5.1 Within-bucket percentile (ECDF)

For each feature value `x_i`:

- compute percentile `p_i = ECDF_b(x_i)` using only tokens in bucket `b`
- `p_i` is in `[0, 1]`

Implementation detail:

- Use a rank-based ECDF with a small offset to avoid exact 0/1 if desired:
  - `p_i = (rank(x_i) - 0.5) / n`

### 5.2 Signed percentile

Convert to signed value:

- `s_i = 2 * p_i - 1` (now `s_i` is in `[-1, +1]`)

If "high is bad" for that feature, flip it:

- `s_i := -s_i`

This gives direction + magnitude in a single number.

### 5.3 Aggregate without hand weights

To avoid hand-tuned weights, use a symmetric aggregator:

- `q_raw = mean_i(s_i)`

Optional robustness:

- clip each `s_i` to `[-0.99, 0.99]` before averaging (limits extreme leverage)
- use a trimmed mean (drop top/bottom k% of `s_i`) if a single metric can be noisy

### 5.4 Optional: re-rank the aggregate (final calibration)

If you want the final `q` to be strictly comparable across time / retrains and more uniform within bucket:

- `q = 2 * ECDF_b(q_raw) - 1`

This keeps the "relative within bucket" meaning while stabilizing scale.
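The steps in section 5 can be sketched end-to-end for a single bucket. This is a sketch under assumptions: `features` is a `(n_tokens, n_features)` matrix for one return bucket, `high_is_bad` flags the section 4.2 metrics, and ties are broken arbitrarily rather than rank-averaged.

```python
import numpy as np

def quality_scores(features: np.ndarray, high_is_bad: np.ndarray) -> np.ndarray:
    """Compute q in [-1, +1] for every token in one return bucket."""
    n = features.shape[0]
    # 5.1: rank-based ECDF per feature, offset to avoid exact 0/1
    ranks = features.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..n
    p = (ranks - 0.5) / n
    # 5.2: signed percentile, flipped where high values are unhealthy
    s = 2.0 * p - 1.0
    s[:, high_is_bad] *= -1.0
    # 5.3: unweighted symmetric aggregate (clipped for robustness)
    q_raw = np.clip(s, -0.99, 0.99).mean(axis=1)
    # 5.4: re-rank the aggregate so q is uniform within the bucket
    q = 2.0 * ((q_raw.argsort().argsort() + 1 - 0.5) / n) - 1.0
    return q
```

The double `argsort` is a compact way to get within-bucket ranks; a production version would use an averaged-tie ranking instead.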
-
## 6) Training vs inference (how it is used)
|
| 142 |
-
|
| 143 |
-
Offline labeling (training target):
|
| 144 |
-
1) Compute `R_max` from full lifetime.
|
| 145 |
-
2) Assign return bucket `b`.
|
| 146 |
-
3) Compute all chosen metrics from full lifetime.
|
| 147 |
-
4) Convert metrics -> signed percentiles -> `q`.
|
| 148 |
-
|
| 149 |
-
Inference (model output):
|
| 150 |
-
- The model only sees partial history and must predict the *final* `q` (computed above).
|
| 151 |
-
- The trading policy uses predicted return signals + predicted `q` to decide position sizing / risk.
|
| 152 |
-
|
| 153 |
-
## 7) Practical notes

- Use `eps` (e.g., `1e-9`) in denominators to avoid divide-by-zero.
- If a metric is missing for a token, drop it from the mean for that token (or impute with bucket median).
- When bucket sample counts drift, prefer merging buckets rather than letting ECDF be noisy.
- Recompute distributions on the same "source-of-truth" dataset used for training (not ad-hoc caches).
## 8) Summary

`q` is a **return-regime-relative**, **distribution-normalized**, **signed** health score:

- It is not a threshold classifier.
- It avoids raw-scale dependence and hand weighting.
- It cleanly separates "made money" (return) from "looks healthy" (quality).
(added: new contents of the file)

OK I think I see the real issue now.

The weighted sampling balances how often the model sees each token. But the labels (the actual return values) are determined by the random T_cutoff within each token, not by the token's class.

Even a class 5 token (100x return) only pumps in a tiny window of its lifetime. If it has 1000 trades and the pump happens between trades 200-400, then:

- T_cutoff at trade 50 → returns might be +500%
- T_cutoff at trade 500 → returns are -80% (post-pump bleed)
- T_cutoff at trade 700 → returns are -90%
- T_cutoff at trade 900 → returns are -95%

So even for class 5 tokens, 80%+ of the cached training samples have negative ground-truth labels. The model is correctly learning that, at any random moment, even a "good" token is most likely going down. The class balancing doesn't change the fact that the actual Y labels are overwhelmingly negative across all classes.

The model isn't broken; it learned exactly what the data showed it. The issue is that the training setup doesn't teach it to recognize the pre-pump moment specifically.
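The arithmetic above can be sanity-checked with a toy simulation. This is a sketch under simplifying assumptions: the pump window and the rule that only cutoffs before the pump see a positive forward return are both illustrative, not the real labeling code.

```python
import random

def label_sign(cutoff_trade: int, pump_start: int = 200) -> int:
    # Simplification: only cutoffs before the pump window see a positive
    # forward return; every later cutoff sits in the post-pump bleed.
    return +1 if cutoff_trade < pump_start else -1

random.seed(0)
N_TRADES, N_SAMPLES = 1000, 10_000
cutoffs = [random.randrange(N_TRADES) for _ in range(N_SAMPLES)]
neg_frac = sum(label_sign(c) < 0 for c in cutoffs) / N_SAMPLES
# neg_frac comes out near 0.8: the "80%+ negative labels" effect, even for
# a token whose lifetime return is 100x.
```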
access/masking_policies.list (ADDED, binary, 1 byte)
access/quotas.list (ADDED, binary, 1 byte)
access/roles.list (ADDED, binary, 1 byte)
access/row_policies.list (ADDED, binary, 1 byte)
access/settings_profiles.list (ADDED, binary, 1 byte)
access/users.list (ADDED, binary, 1 byte)
data/data_loader.py
CHANGED
@@ -124,7 +124,6 @@ class OracleDataset(Dataset):
     max_samples: Optional[int] = None,
     token_allowlist: Optional[List[str]] = None,
-    t_cutoff_seconds: int = 60,
     cache_dir: Optional[Union[str, Path]] = None,
     start_date: Optional[datetime.datetime] = None,
     min_trade_usd: float = 0.0,

@@ -161,10 +160,7 @@ class OracleDataset(Dataset):
-        # Otherwise, we are likely in a test mode where __len__ might not be called
-        # or is used with a mock length.
-        self.t_cutoff_seconds = max(0, int(t_cutoff_seconds or 0))
         self.token_allowlist = set(token_allowlist) if token_allowlist else None

         if self.cache_dir:

@@ -2606,7 +2602,7 @@ class OracleDataset(Dataset):
         pooler.pool_map[key] = {'item': emb.cpu().clone(), 'idx': old_entry['idx']}

-    def __cacheitem_context__(self, idx: int, num_samples_per_token: int = 1, encoder: Optional[Any] = None) -> List[Optional[Dict[str, Any]]]:
+    def __cacheitem_context__(self, idx: int, num_samples_per_token: int = 1, encoder: Optional[Any] = None, forced_cutoff_trade_idx: Optional[int] = None) -> List[Optional[Dict[str, Any]]]:
         """
         Generates fully processed training contexts for caching.

@@ -2957,7 +2953,14 @@ class OracleDataset(Dataset):
         results = []

         # Sample indices (with replacement if needed)
-        if num_samples_per_token >= len(eligible_indices):
+        if forced_cutoff_trade_idx is not None:
+            # Forced mode: use the exact trade index provided (for evaluation)
+            if forced_cutoff_trade_idx >= len(all_trades_sorted):
+                print(f"  WARN: forced_cutoff_trade_idx={forced_cutoff_trade_idx} >= total trades {len(all_trades_sorted)}, clamping.")
+                forced_cutoff_trade_idx = len(all_trades_sorted) - 2
+            sampled_indices = [forced_cutoff_trade_idx]
+            print(f"  Using forced T_cutoff at trade index {forced_cutoff_trade_idx}")
+        elif num_samples_per_token >= len(eligible_indices):
             sampled_indices = eligible_indices.copy()
         else:
             sampled_indices = random.sample(eligible_indices, num_samples_per_token)
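The new sampling branch can be isolated as a small standalone sketch. Function and parameter names here are hypothetical, not the dataset's actual API; the clamp to `n_trades - 2` mirrors the diff's behavior.

```python
import random
from typing import List, Optional

def choose_cutoff_indices(eligible: List[int], n_trades: int,
                          num_samples: int = 1,
                          forced: Optional[int] = None) -> List[int]:
    """Mirror of the sampling branch: a forced index wins, clamped into range."""
    if forced is not None:
        if forced >= n_trades:
            forced = n_trades - 2  # clamp out-of-range index, as in the diff
        return [forced]
    if num_samples >= len(eligible):
        return list(eligible)
    return random.sample(eligible, num_samples)
```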
ingest.sh
CHANGED
@@ -20,7 +20,7 @@ error() { echo -e "${RED}[ERROR]${NC} $1"; exit 1; }
 #===============================================================================
 header "Step 5-6/7: Processing Epochs (Download → Ingest → Delete)"

-EPOCHS=(
+EPOCHS=(852 853)

 log "Processing epochs one at a time to minimize disk usage..."

@@ -39,9 +39,9 @@ for epoch in "${EPOCHS[@]}"; do
         error "Failed to download epoch ${epoch}. Cannot continue."
     }

-    # Step 2: Ingest (
-    log "  [2/3] Ingesting epoch ${epoch} into
-    python scripts/ingest_epoch.py --epoch "$epoch" --
+    # Step 2: Ingest (ClickHouse only)
+    log "  [2/3] Ingesting epoch ${epoch} into ClickHouse database..."
+    python scripts/ingest_epoch.py --epoch "$epoch" --skip-neo4j || {
         error "Ingestion failed for epoch ${epoch}. Cannot continue."
     }
log.log
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:11ac013f8e91ad65475b8106a5a072dc42f67e0773ddc4a50825e316c578e0d4
+size 3472
models/multi_modal_processor.py
CHANGED
@@ -81,8 +81,9 @@ class MultiModalEncoder:
         is_text = isinstance(x[0], str)

         autocast_dtype = self.dtype if self.dtype in [torch.float16, torch.bfloat16] else None
+        device_str = self.device.type if isinstance(self.device, torch.device) else self.device

-        with torch.autocast(device_type=
+        with torch.autocast(device_type=device_str, dtype=autocast_dtype, enabled=(autocast_dtype is not None)):
             try:
                 if is_text:
                     inputs = self.processor(text=x, return_tensors="pt", padding=True, truncation=True).to(self.device)
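The reason for the `device_str` line above: `torch.autocast` takes `device_type` as a plain string such as `"cuda"` or `"cpu"`, while `self.device` may be a `torch.device` object. A minimal sketch of the normalization on its own:

```python
import torch

def autocast_device_type(device) -> str:
    # torch.autocast(device_type=...) expects a string like "cuda" or "cpu";
    # reduce a torch.device such as torch.device("cuda:0") to its .type.
    return device.type if isinstance(device, torch.device) else device
```

For example, `autocast_device_type(torch.device("cuda:0"))` yields `"cuda"`, while a plain string passes through unchanged.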
scripts/evaluate_sample.py
CHANGED
@@ -23,6 +23,11 @@ from models.ohlc_embedder import OHLCEmbedder
 from models.model import Oracle
 import models.vocabulary as vocab
 from train import create_balanced_split
+from dotenv import load_dotenv
+from clickhouse_driver import Client as ClickHouseClient
+from neo4j import GraphDatabase
+from data.data_fetcher import DataFetcher
+from scripts.analyze_distribution import get_return_class_map

 def unlog_transform(tensor):
     """Invert the log1p transform applied during training."""

@@ -32,13 +37,13 @@ def unlog_transform(tensor):
 def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument("--checkpoint", type=str, default="checkpoints/checkpoint-90000", help="Path to checkpoint dir")
-    parser.add_argument("--
-    parser.add_argument("--sample_idx", type=int, default=None, help="Specific sample index to evaluate")
+    parser.add_argument("--sample_idx", type=str, default=None, help="Specific sample index or Mint Address to evaluate")
     parser.add_argument("--mixed_precision", type=str, default="bf16")
     parser.add_argument("--horizons_seconds", type=int, nargs="+", default=[300, 900, 1800, 3600, 7200])
     parser.add_argument("--quantiles", type=float, nargs="+", default=[0.1, 0.5, 0.9])
-    parser.add_argument("--seed", type=int, default=
-    parser.add_argument("--
+    parser.add_argument("--seed", type=int, default=None)
+    parser.add_argument("--min_class", type=int, default=5, help="Filter out tokens with return class beneath this ID (e.g., 1 for >= 3x returns)")
+    parser.add_argument("--cutoff_trade_idx", type=int, default=600, help="Force the T_cutoff at this exact trade index (e.g., 10 = right after the 10th trade)")
     return parser.parse_args()

@@ -52,6 +57,7 @@ def get_latest_checkpoint(checkpoint_dir):
     return None

 def main():
+    load_dotenv()
     args = parse_args()

     accelerator = Accelerator(mixed_precision=args.mixed_precision)

@@ -63,76 +69,49 @@ def main():
     elif accelerator.mixed_precision == 'fp16':
         init_dtype = torch.float16

-    print(
+    print("INFO: Initializing DB Connections for LIVE evaluation...")
+    clickhouse_host = os.getenv("CLICKHOUSE_HOST", "localhost")
+    clickhouse_port = int(os.getenv("CLICKHOUSE_PORT", 9000))
+    neo4j_uri = os.getenv("NEO4J_URI", "bolt://localhost:7687")
+    neo4j_user = os.getenv("NEO4J_USER", "neo4j")
+    neo4j_password = os.getenv("NEO4J_PASSWORD", "password")
+
+    clickhouse_client = ClickHouseClient(host=clickhouse_host, port=clickhouse_port)
+    neo4j_driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
+    data_fetcher = DataFetcher(clickhouse_client=clickhouse_client, neo4j_driver=neo4j_driver)
+
+    print(f"Loading live dataset generator...")
+
+    # We inject the data fetcher directly. No cache directories are used.
     dataset = OracleDataset(
-        data_fetcher=
+        data_fetcher=data_fetcher,
         fetcher_config=None,
         horizons_seconds=args.horizons_seconds,
         quantiles=args.quantiles,
-        t_cutoff_seconds=60,
-        cache_dir=args.cache_dir
+        cache_dir=None
     )

-    valid_indices = []
-    # We search through a shuffled subset to avoid checking the whole dataset
-    search_pool = val_indices.copy()
-    random.shuffle(search_pool)
-    if not search_pool:
-        search_pool = list(range(len(dataset)))
-        random.shuffle(search_pool)
-
-    for idx in search_pool:
-        sample = dataset[idx]
-        if sample is None:
-            continue
-        mask = sample.get('labels_mask')
-        if mask is not None:
-            # Based on raw file inspection, mask is shape [H], so we index by h_idx directly
-            if h_idx < len(mask) and mask[h_idx] > 0.0:
-                valid_indices.append(idx)
-                # Once we find a handful of valid ones, we can stop searching
-                if len(valid_indices) >= 10:
-                    break
-
-    if valid_indices:
-        print(f"Found {len(valid_indices)} candidate samples with >= {args.min_horizon}s horizon.")
-        val_indices = valid_indices
-    else:
-        print(f"WARNING: No samples found with ground truth for horizon {args.min_horizon}s. Reverting to random pick.")

-    if
-        raise ValueError(f"Sample index {args.sample_idx} out of range [0, {len(dataset)-1}]")
-        sample_idx = args.sample_idx
-    else:
-        # Pick a random sample from validation set
-        if len(val_indices) > 0:
-            sample_idx = random.choice(val_indices)
-        else:
-            print("No validation indices found. Picking random sample from entire set.")
-            sample_idx = random.randint(0, len(dataset) - 1)
-
-    print(f"\nEvaluating on sample index: {sample_idx}")

+    # Filter out manipulated/broken tokens and optionally enforce min_class
+    from models.vocabulary import MANIPULATED_CLASS_ID
+    print("INFO: Fetching Return Classification Map...")
+    return_class_map, _ = get_return_class_map(clickhouse_client)
+
+    min_class_thresh = args.min_class if args.min_class is not None else 0
+
+    original_len = len(dataset.sampled_mints)
+    dataset.sampled_mints = [
+        m for m in dataset.sampled_mints
+        if return_class_map.get(m['mint_address']) is not None
+        and return_class_map.get(m['mint_address']) != MANIPULATED_CLASS_ID
+        and return_class_map.get(m['mint_address']) >= min_class_thresh
+    ]
+    dataset.num_samples = len(dataset.sampled_mints)
+    print(f"INFO: Filtered tokens. {original_len} -> {len(dataset.sampled_mints)} valid tokens (class >= {min_class_thresh}).")

+    if len(dataset) == 0:
+        raise ValueError("Dataset is empty. Are ClickHouse data and trade pipelines populated? (Check if min_return filtered everything out)")

-    # Initialize encoders and model
+    # Initialize encoders and model FIRST because we need multi_modal_encoder to compile context
     print("Initializing encoders...")
     multi_modal_encoder = MultiModalEncoder(dtype=init_dtype, device=device)
     time_encoder = ContextualTimeEncoder(dtype=init_dtype)

@@ -181,9 +160,6 @@ def main():
         model = accelerator.prepare(model)
     else:
         print(f"Loading checkpoint from {ckpt_path}...")
-        # Since we use accelerate, the state dict is usually split or in pytorch_model.bin/model.safetensors
-        # Using accelerate to load:
-        # We need to wrap it if we want to use `accelerator.load_state`
         model = accelerator.prepare(model)
         try:
             accelerator.load_state(ckpt_path)

@@ -202,7 +178,6 @@ def main():
         else:
             state_dict = torch.load(model_file, map_location="cpu")

-        # Unwrap model to load state
         uw_model = accelerator.unwrap_model(model)
         uw_model.load_state_dict(state_dict, strict=False)
         print("Successfully loaded weights directly.")

@@ -211,11 +186,51 @@ def main():
     model.eval()

-    #
+    # Find a valid sample
+    valid_context_found = False
+    max_retries = 20
+    retries = 0
+    raw_sample = None
+    sample_mint_addr = None
+
+    while not valid_context_found and retries < max_retries:
+        if args.sample_idx is not None:
+            if isinstance(args.sample_idx, str) and not args.sample_idx.isdigit():
+                found_idx = next((i for i, m in enumerate(dataset.sampled_mints) if m['mint_address'] == args.sample_idx), None)
+                if found_idx is None:
+                    import datetime
+                    dataset.sampled_mints.append({'mint_address': args.sample_idx, 'creator_address': '', 'timestamp': datetime.datetime.now(datetime.timezone.utc)})
+                    sample_idx = len(dataset.sampled_mints) - 1
+                else:
+                    sample_idx = found_idx
+            else:
+                sample_idx = int(args.sample_idx)
+                if sample_idx >= len(dataset):
+                    raise ValueError(f"Sample index {sample_idx} out of range")
+        else:
+            sample_idx = random.randint(0, len(dataset.sampled_mints) - 1)
+
+        sample_mint_addr = dataset.sampled_mints[sample_idx]['mint_address']
+        print(f"Trying Token Address: {sample_mint_addr}")
+
+        contexts = dataset.__cacheitem_context__(sample_idx, num_samples_per_token=1, encoder=multi_modal_encoder, forced_cutoff_trade_idx=args.cutoff_trade_idx)
+
+        if not contexts or len(contexts) == 0 or contexts[0] is None:
+            print("  [Failed to generate valid context pattern, skipping...]")
+            retries += 1
+            if args.sample_idx is not None:
+                print("Specific sample requested but failed to generate context. Exiting.")
+                return
+            continue
+
+        raw_sample = contexts[0]
+        valid_context_found = True
+
+    if not valid_context_found:
+        print(f"Could not find a valid context after {max_retries} attempts.")
+        return
+
+    print(f"\nEvaluating precisely on Token Address: {sample_mint_addr}")

     batch = collator([raw_sample])
status
CHANGED
@@ -1,3 +1,3 @@
-PID:
-Started at: 2026-
-Revision:
+PID: 5825
+Started at: 2026-03-06 06:22:33
+Revision: 54510
wget-log.1
ADDED
@@ -0,0 +1,11 @@
+--2026-03-06 06:21:59--  https://debian.neo4j.com/neotechnology.gpg.key
+Resolving debian.neo4j.com (debian.neo4j.com)... 3.169.221.47, 3.169.221.34, 3.169.221.16, ...
+Connecting to debian.neo4j.com (debian.neo4j.com)|3.169.221.47|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 3905 (3.8K) [application/pgp-keys]
+Saving to: ‘STDOUT’
+
+
+
+2026-03-06 06:21:59 (9.00 MB/s) - written to stdout [3905/3905]
+