| # Apollo: Oracle Model |
|
|
| ## Project Status |
| **Phase:** Hyperparameter Optimization & Dataset Preparation. |
|
|
| ### Recent Updates (Jan 2026) |
| * **Hyperparameter Tuning**: Analyzed token trade distribution to determine optimal model parameters. |
| * **Max Sequence Length**: Set to **8192**. This covers >2 hours of high-frequency trading activity for high-volume tokens (verified against `HWVY...`) and the full lifecycle for 99% of tokens. |
| * **Prediction Horizons**: Set to **60s, 3m, 5m, 10m, 30m, 1h, 2h**. |
| * **Min Horizon (60s)**: Chosen to accommodate ~20s inference latency while capturing the "meat" of aggressive breakout movers. |
| * **Max Horizon (2h)**: Covers the timeframe where 99% of tokens hit their All-Time High. |
| * **Infrastructure**: |
| * Updated `train.sh` to use these new hyperparameters. |
| * Updated `scripts/cache_dataset.py` to ensure cached datasets are labeled with these horizons. |
| * Verified `DataFetcher` retrieves full trade histories (no hidden limits). |
|
|
| ## Configuration Summary |
|
|
| | Parameter | Value | Rationale | |
| | :--- | :--- | :--- | |
| | **Max Seq Len** | `8192` | Captures >2h of intense pump activity or full rug lifecycle. | |
| | **Horizons** | `60, 180, 300, 600, 1800, 3600, 7200` | From "Scalp/Breakout" (1m) to "Runner/ATH" (2h). | |
| | **Inference Latency** | ~20s | Dictates the 60s minimum horizon. | |
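The table above can be restated as a small config fragment; this is a sketch only, and the constant names are hypothetical (the real values are passed via `train.sh`):

```python
# Hypothetical constant names; train.sh carries the actual hyperparameters.
MAX_SEQ_LEN = 8192
# Prediction horizons in seconds: 60s, 3m, 5m, 10m, 30m, 1h, 2h
HORIZONS_S = [60, 180, 300, 600, 1800, 3600, 7200]
INFERENCE_LATENCY_S = 20  # ~20s; this dictates the 60s minimum horizon

# Sanity check: the shortest horizon must comfortably outlast inference.
assert min(HORIZONS_S) > INFERENCE_LATENCY_S
```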
|
|
| ## Usage |
|
|
| ### 1. Cache Dataset |
| Pre-process data into `.pt` files with correct labels. |
| ```bash |
| ./pre_cache.sh |
| ``` |
|
|
| ### 2. Train Model |
| Launch training with updated hyperparameters. |
| ```bash |
| ./train.sh |
| ``` |
|
|
| ## TODO: Future Enhancements |
|
|
| ### Multi-Task Quality Prediction Head |
| Add a secondary head (Head B) that predicts **token quality percentiles** alongside price returns: |
| - **Fees Percentile** — Predicted future fees relative to class median |
| - **Volume Percentile** — Predicted future volume relative to class median |
| - **Holders Percentile** — Predicted future holder count relative to class median |
|
|
| **Rationale:** The `analyze_distribution.py` script currently uses hard thresholds on future metrics to classify tokens as "Manipulated". This head would let the model **learn to predict** those quality metrics from current features, enabling scam detection at inference time without access to future data. |
|
|
| **Approach Options:** |
| 1. Single composite quality score (simpler) |
| 2. Three separate percentile predictions (more interpretable) |
| 3. Three binary classifications (fees_ok, volume_ok, holders_ok) |
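All three approach options boil down to label construction; a minimal sketch in plain Python, where the field names (`fees`, `volume`, `holders`) and the class-median rule are assumptions drawn from the rationale above, not the project's actual schema:

```python
from statistics import median

def percentile_rank(value, population):
    """Fraction of the population at or below `value` (0.0 - 1.0)."""
    if not population:
        return 0.0
    return sum(1 for v in population if v <= value) / len(population)

def quality_labels(token, class_tokens):
    """Build hypothetical Head B training labels for one token.

    `token` and each entry of `class_tokens` are dicts with assumed
    keys 'fees', 'volume', 'holders' holding *future* metrics.
    """
    labels = {}
    for key in ("fees", "volume", "holders"):
        pop = [t[key] for t in class_tokens]
        # Option 2: three separate percentile predictions
        labels[f"{key}_pct"] = percentile_rank(token[key], pop)
        # Option 3: binary classification vs. the class median
        labels[f"{key}_ok"] = token[key] >= median(pop)
    # Option 1: single composite quality score (mean of the percentiles)
    labels["quality"] = sum(
        labels[f"{k}_pct"] for k in ("fees", "volume", "holders")
    ) / 3
    return labels
```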
| |
### Data Sampling (Context Optimization)
Replace hardcoded H/B/H limits with a dynamic sampling strategy that maximizes the model's context window usage.

**The Problem:** The system currently triggers H/B/H logic at a fixed 30k trade count and uses hardcoded limits (10k early, 15k recent). This mismatch with the model's `max_seq_len` (e.g., 8192) leads to inefficient data usage: either valuable data is truncated arbitrarily, or too little is fed when more could fit.

**The Solution: Dynamic Context Filling.** Implementation moves to `data_loader.py`, since the cache contains the full history.
|
|
**Algorithm:**
1. **Input**: the full sorted list of events (trades, chart segments, etc.) up to `T_cutoff`.
2. **Check**: if `len(events) <= max_seq_len`, use ALL events.
3. **Split**: if `len(events) > max_seq_len`:
   - Reserve space for special tokens (start/end/pad).
   - Calculate the budget: `budget = max_seq_len - reserve` (e.g., 8100).
   - Dynamic split: **Head (early)** = first `budget / 2` events; **Tail (recent)** = last `budget / 2` events.
   - Construct: `[HEAD] ... [GAP_TOKEN] ... [TAIL]`.
**Implementation Changes** (modify `data_loader.py`):
- Remove constants: delete `HBH_EARLY_EVENT_LIMIT` and `HBH_RECENT_EVENT_LIMIT`.
- Update `_generate_dataset_item`:
  - Accept `max_seq_len`.
  - Apply the split logic defined above before returning `event_sequence`.
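The head/tail split described above can be sketched as follows; `GAP_TOKEN` and the `reserve` count are placeholders, not the project's actual identifiers:

```python
GAP_TOKEN = "<GAP>"  # placeholder; the real vocabulary's gap token may differ

def fill_context(events, max_seq_len, reserve=3):
    """Dynamic context filling: use all events when they fit; otherwise
    keep the earliest and most recent halves of the budget with a gap
    marker in between. `reserve` (start/end/pad slots) is an assumption."""
    if len(events) <= max_seq_len:
        return list(events)                      # everything fits: use ALL events
    budget = max_seq_len - reserve               # e.g., 8192 - reserve
    head = events[: budget // 2]                 # earliest events
    tail = events[-(budget - budget // 2):]      # most recent events
    return head + [GAP_TOKEN] + tail
```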
| |
| |
| |
| |
**Explained simply:**

1. Check whether the final event list exceeds the total context available.
2. Filter out the trade events and count the non-aggregable events (e.g., a burn, a deployer trade). These IMPORTANT events are always kept.
3. From the context remaining after those important events, work out how many snapshots will fit (chart segments, holder snapshots, chain stats, etc.).
4. Whatever remains after the snapshots and the important non-aggregable events is used for the H (high-definition) segments; the middle (Blurry) segment keeps only the snapshots.

This works because ~90% of the context is taken up by trades and transfers, so they are the only events that need compressing to free context.
| |
No new tokens are needed, because special tokens already exist for this: `MIDDLE` and `RECENT`. Emit `<MIDDLE>` when switching to the blurry segment, and `<RECENT>` when switching back to high definition.
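The prioritized budget described above can be sketched as follows; the event-type tags are hypothetical stand-ins for the project's real event schema:

```python
# Hypothetical event-type tags; the actual event schema may differ.
IMPORTANT = {"burn", "deployer_trade"}                    # non-aggregable, always kept
SNAPSHOT = {"chart_segment", "holders_snapshot", "chain_stats"}

def allocate_budget(events, max_seq_len):
    """Sketch of the prioritized H/B/H budget:
    1) important non-aggregable events are always kept,
    2) snapshots fill next (they alone populate the blurry middle),
    3) the remainder goes to the high-definition head/tail segments,
       which is where trades and transfers get compressed."""
    n_important = sum(1 for e in events if e["type"] in IMPORTANT)
    n_snapshots = sum(1 for e in events if e["type"] in SNAPSHOT)
    remaining = max_seq_len - n_important
    n_snapshots = min(n_snapshots, remaining)
    hd_budget = max(remaining - n_snapshots, 0)  # split across head and tail
    return {"important": n_important, "snapshots": n_snapshots, "hd": hd_budget}
```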