# Trenches: TODO

## Reward System

- [ ] **Event-prediction RL rewards**: when a real-world event occurs and an agent's prior prediction or action aligns with it, grant a positive reward signal. This closes the loop between live data ingestion and agent learning.
  - Track agent predictions per turn (e.g., "Iran will retaliate within 2 turns")
  - Compare predictions against actual events that fire from RSS/OSINT feeds
  - Reward = f(prediction accuracy, lead time, specificity)
  - Only **real events** (from live feeds or env-generated stochastic events) impact the reward signal
- [ ] **Chat-injected fake events**: allow manual event injection via the chat panel that influences agent behavior but does **not** affect reward calculations.
  - Tag chat-injected events with `source: "manual"` vs real events with `source: "live"` or `source: "env"`
  - Agents still react to fake events (observe and act), but the reward function filters them out
  - Useful for demos, testing edge cases, and probing agent behavior without polluting the training signal
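
The two reward items above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Event` and `Prediction` classes, the `topic` label used for matching, the `specificity` score, the `max_lead` cap, and the 0.5/0.25/0.25 weighting are all hypothetical and not part of the existing codebase. Only the `source` tags (`"live"`, `"env"`, `"manual"`) come from the list above.

```python
from dataclasses import dataclass

# Sources that count toward the reward; "manual" (chat-injected) is excluded.
REWARD_SOURCES = {"live", "env"}


@dataclass(frozen=True)
class Event:
    turn: int
    topic: str   # hypothetical normalized label, e.g. "iran_retaliation"
    source: str  # "live", "env", or "manual"


@dataclass(frozen=True)
class Prediction:
    agent: str
    made_at_turn: int
    topic: str
    deadline_turn: int  # last turn at which the prediction can still come true
    specificity: float  # hypothetical score in [0, 1]; 1.0 means very specific


def prediction_reward(pred: Prediction, events: list[Event], max_lead: int = 10) -> float:
    """Return a reward in [0, 1]; 0.0 when no real event confirms the prediction."""
    matches = [
        e for e in events
        if e.source in REWARD_SOURCES          # filter out chat-injected events
        and e.topic == pred.topic
        and pred.made_at_turn < e.turn <= pred.deadline_turn
    ]
    if not matches:
        return 0.0
    first = min(matches, key=lambda e: e.turn)
    # Lead time is capped at max_lead turns and normalized to [0, 1].
    lead = min(first.turn - pred.made_at_turn, max_lead) / max_lead
    # Assumed weighting: a confirmed prediction earns 0.5, and lead time and
    # specificity each add up to 0.25 on top.
    return 0.5 + 0.25 * lead + 0.25 * pred.specificity


pred = Prediction("us", made_at_turn=3, topic="iran_retaliation",
                  deadline_turn=5, specificity=0.8)
fake = Event(turn=4, topic="iran_retaliation", source="manual")
real = Event(turn=5, topic="iran_retaliation", source="live")

print(prediction_reward(pred, [fake]))        # manual event only: no reward
print(prediction_reward(pred, [fake, real]))  # live event confirms: about 0.75
```

Keeping the source filter inside the reward function, rather than hiding manual events from agents, matches the intent above: agents still observe and react to injected events, but those events never reach the training signal.
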
## UI / Frontend

- [ ] **Event timeline with time control**: scrubber bar (like a video editor) for navigating, rewinding, and branching the simulation
  - **Scrubber bar** at the bottom: drag to jump to any turn/timestamp, play/pause, rewind, fast-forward
  - Two event types on the timeline: **predictions** (agent forecasts) and **actuals** (confirmed real events)
  - Predictions that matched actual outcomes are visually linked; incorrect ones are shown faded
  - **Branching**: when a fake scenario is injected via chat, the timeline forks; you can scrub back to before the injection and compare the "what if" branch with the real timeline
  - Playback controls: step-by-step (turn by turn), continuous playback at adjustable speed
  - Markers on the scrubber for key events (escalations, interventions, injected scenarios)
  - Filterable by agent, event type, and time range
  - Feeds into the reward system: correct predictions on the timeline yield a positive RL signal
- [x] Merge tension/stats pills into top bar
- [x] Disable text selection on floating panels
- [x] Remove Mapbox logo
- [x] Clean up README
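
The branching behavior described in the timeline item above could be modeled along these lines. This is a minimal data-model sketch, not the frontend implementation: `TimelineEvent`, `Branch`, `visible_events`, and `inject` are hypothetical names, and it assumes a fork inherits only the parent's events from before the injection turn.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimelineEvent:
    turn: int
    kind: str    # "prediction" or "actual"
    source: str  # "live", "env", or "manual"
    text: str


@dataclass
class Branch:
    name: str
    fork_turn: Optional[int] = None    # turn of the injection that started this branch
    parent: Optional["Branch"] = None
    events: list[TimelineEvent] = field(default_factory=list)

    def visible_events(self, up_to_turn: int) -> list[TimelineEvent]:
        """Events the scrubber would show on this branch at `up_to_turn`."""
        inherited: list[TimelineEvent] = []
        if self.parent is not None and self.fork_turn is not None:
            # Inherit only what happened on the parent before the fork point.
            inherited = self.parent.visible_events(min(up_to_turn, self.fork_turn - 1))
        own = [e for e in self.events if e.turn <= up_to_turn]
        return inherited + own


def inject(parent: Branch, event: TimelineEvent) -> Branch:
    """Fork a "what if" branch at the injected event's turn; the parent is untouched."""
    return Branch(name=f"what-if@{event.turn}", fork_turn=event.turn,
                  parent=parent, events=[event])


real = Branch(name="real")
real.events += [
    TimelineEvent(1, "actual", "live", "A"),
    TimelineEvent(3, "actual", "live", "B"),
]
what_if = inject(real, TimelineEvent(2, "actual", "manual", "X"))

print([e.text for e in what_if.visible_events(3)])  # ['A', 'X']
print([e.text for e in real.visible_events(3)])     # ['A', 'B']
print([e.text for e in what_if.visible_events(1)])  # ['A'] (before the injection)
```

Storing each fork as a child that references its parent means the real timeline is never mutated by an injected scenario, so scrubbing back before the fork shows the same events on both branches.
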
## Infrastructure

- [x] Push to HF Space (`AlazarM/trenches`)
- [ ] Add `NEXT_PUBLIC_MAPBOX_TOKEN` as HF Space secret
## Post-Training

- [x] 6 synthetic seed replay datasets (in `synthetic_historical_replays/`)
- [x] Training CLI with GRPO, hyperparameter args, checkpointing
- [x] Local smoke test (tiny-gpt2, US + Israel)
- [x] HF GPU smoke test on T4 ([trenches-training-smoke](https://huggingface.co/spaces/AlazarM/trenches-training-smoke))
- [x] All 6 entity models → `Qwen/Qwen3-8B` (no quantization)
- [x] Historical data collection pipeline (GDELT → replay JSON)
- [ ] Run historical collector for all 6 entities
- [ ] Curator-review collected replay data
- [ ] Spin up 6 HF A100 Spaces for production training
- [ ] Evaluation/baseline reporting