# Trenches — TODO
## Reward System
- [ ] **Event-prediction RL rewards** — when a real-world event occurs and an agent's prior prediction/action aligns with it, grant a positive reward signal. This closes the loop between live data ingestion and agent learning.
- Track agent predictions per turn (e.g., "Iran will retaliate within 2 turns")
- Compare predictions against actual events that fire from RSS/OSINT feeds
- Reward = f(prediction accuracy, lead time, specificity)
- Only **real events** (from live feeds or env-generated stochastic events) impact the reward signal
- [ ] **Chat-injected fake events** — allow manual event injection via the chat panel that influences agent behavior but does **not** affect reward calculations.
- Tag chat-injected events with `source: "manual"` vs real events with `source: "live"` or `source: "env"`
- Agents still react to fake events (observe and act), but the reward function filters them out
- Useful for demos, testing edge cases, and probing agent behavior without polluting the training signal
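A minimal sketch of how the two items above could fit together. The `Prediction`/`Event` shapes, the `matched` flag, and the weighting constants are illustrative assumptions, not the project's actual API; only the reward shape `f(accuracy, lead time, specificity)` and the `source` filter come from the spec above.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    agent: str
    claim: str          # e.g. "Iran will retaliate within 2 turns"
    made_at_turn: int
    specificity: float  # 0..1, how narrow the claim is (assumed scoring)

@dataclass
class Event:
    description: str
    turn: int
    source: str  # "live", "env", or "manual"

def prediction_reward(pred: Prediction, event: Event, matched: bool) -> float:
    """Hypothetical reward: accuracy-gated, scaled by lead time and specificity.

    Only real events ("live"/"env") contribute; chat-injected ("manual")
    events are filtered out so they never touch the training signal.
    """
    if event.source == "manual":
        return 0.0
    if not matched:
        return 0.0
    lead_time = max(event.turn - pred.made_at_turn, 0)
    # Earlier, more specific correct calls earn more (weights are placeholders).
    return 1.0 + 0.1 * lead_time + pred.specificity
```

Keeping the `source` check inside the reward function means agents upstream can still observe and react to manual events, while the training signal stays clean.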
## UI / Frontend
- [ ] **Event timeline with time control** — scrubber bar (like a video editor) for navigating, rewinding, and branching the simulation
- **Scrubber bar** at the bottom: drag to jump to any turn/timestamp, play/pause, rewind, fast-forward
- Two event types on the timeline: **predictions** (agent forecasts) and **actuals** (confirmed real events)
- Predictions that matched actual outcomes are visually linked; incorrect ones shown faded
- **Branching**: when a fake scenario is injected via chat, the timeline forks — you can scrub back to before the injection and see the "what if" branch vs the real timeline
- Playback controls: step-by-step (turn by turn), continuous playback at adjustable speed
- Markers on the scrubber for key events (escalations, interventions, injected scenarios)
- Filterable by agent, event type, and time range
- Feeds into the reward system — correct predictions on the timeline = positive RL signal
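The branching behavior above could work roughly as follows; `fork_timeline` and the per-turn dict layout are hypothetical, but the key properties match the spec: the real timeline is never mutated, and the injected scenario is tagged `source: "manual"` so the reward pass ignores it.

```python
from copy import deepcopy

def fork_timeline(timeline: list, at_turn: int, injected: dict) -> list:
    """Create a "what if" branch at at_turn without mutating the real timeline.

    timeline is assumed to be a list of per-turn dicts like
    {"turn": n, "events": [...]}. The branch shares history up to the fork,
    then diverges with the chat-injected scenario.
    """
    branch = deepcopy(timeline[:at_turn])  # shared history up to the fork point
    branch.append({"turn": at_turn,
                   "events": [{**injected, "source": "manual"}]})
    return branch
```

Scrubbing back past `at_turn` would then show identical turns on both branches, with the fork marker only on the injected one.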
- [x] Merge tension/stats pills into top bar
- [x] Disable text selection on floating panels
- [x] Remove Mapbox logo
- [x] Clean up README
## Infrastructure
- [x] Push to HF Space (`AlazarM/trenches`)
- [ ] Add `NEXT_PUBLIC_MAPBOX_TOKEN` as HF Space secret
## Post-Training
- [x] 6 synthetic seed replay datasets (in `synthetic_historical_replays/`)
- [x] Training CLI with GRPO, hyperparameter args, checkpointing
- [x] Local smoke test (tiny-gpt2, US + Israel)
- [x] HF GPU smoke test on T4 ([trenches-training-smoke](https://huggingface.co/spaces/AlazarM/trenches-training-smoke))
- [x] All 6 entity models → `Qwen/Qwen3-8B` (no quantization)
- [x] Historical data collection pipeline (GDELT → replay JSON)
- [ ] Run historical collector for all 6 entities
- [ ] Curator review of the collected replay data
- [ ] Spin up 6 HF A100 Spaces for production training
- [ ] Evaluation/baseline reporting