trenches

Paused

App Files Files Community

trenches / TODO.md

Codex

sync main snapshot for HF Space

1794757 4 days ago

preview code

raw

history blame contribute delete

2.95 kB

	# Trenches — TODO

	## Reward System

	- [ ] Event-prediction RL rewards — when a real-world event occurs and an agent's prior prediction/action aligns with it, grant a positive reward signal. This closes the loop between live data ingestion and agent learning.
	- Track agent predictions per turn (e.g., "Iran will retaliate within 2 turns")
	- Compare predictions against actual events that fire from RSS/OSINT feeds
	- Reward = f(prediction accuracy, lead time, specificity)
	- Only real events (from live feeds or env-generated stochastic events) impact the reward signal

	- [ ] Chat-injected fake events — allow manual event injection via the chat panel that influences agent behavior but does not affect reward calculations.
	- Tag chat-injected events with `source: "manual"` vs real events with `source: "live"` or `source: "env"`
	- Agents still react to fake events (observe and act), but the reward function filters them out
	- Useful for demos, testing edge cases, and probing agent behavior without polluting the training signal

	## UI / Frontend

	- [ ] Event timeline with time control — scrubber bar (like a video editor) for navigating, rewinding, and branching the simulation
	- Scrubber bar at the bottom: drag to jump to any turn/timestamp, play/pause, rewind, fast-forward
	- Two event types on the timeline: predictions (agent forecasts) and actuals (confirmed real events)
	- Predictions that matched actual outcomes are visually linked; incorrect ones shown faded
	- Branching: when a fake scenario is injected via chat, the timeline forks — you can scrub back to before the injection and see the "what if" branch vs the real timeline
	- Playback controls: step-by-step (turn by turn), continuous playback at adjustable speed
	- Markers on the scrubber for key events (escalations, interventions, injected scenarios)
	- Filterable by agent, event type, and time range
	- Feeds into the reward system — correct predictions on the timeline = positive RL signal

	- [x] Merge tension/stats pills into top bar
	- [x] Disable text selection on floating panels
	- [x] Remove Mapbox logo
	- [x] Clean up README

	## Infrastructure

	- [x] Push to HF Space (`AlazarM/trenches`)
	- [ ] Add `NEXT_PUBLIC_MAPBOX_TOKEN` as HF Space secret

	## Post-Training

	- [x] 6 synthetic seed replay datasets (in `synthetic_historical_replays/`)
	- [x] Training CLI with GRPO, hyperparameter args, checkpointing
	- [x] Local smoke test (tiny-gpt2, US + Israel)
	- [x] HF GPU smoke test on T4 ([trenches-training-smoke](https://huggingface.co/spaces/AlazarM/trenches-training-smoke))
	- [x] All 6 entity models → `Qwen/Qwen3-8B` (no quantization)
	- [x] Historical data collection pipeline (GDELT → replay JSON)
	- [ ] Run historical collector for all 6 entities
	- [ ] Curator-review collected replay data
	- [ ] Spin up 6 HF A100 Spaces for production training
	- [ ] Evaluation/baseline reporting