title: CausalStream
emoji: 🌊
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
CausalStream: High-Fidelity SRE Logic Environment
CausalStream is a specialized Reinforcement Learning (RL) environment designed to benchmark and train agents in temporal causal reasoning and resource allocation under uncertainty. It simulates a complex streaming data infrastructure (Kafka/Flink pattern) where agents must diagnose production incidents.
Core Engine Architecture
1. The Stochastic World Clock (Ticked Execution)
Unlike traditional "static" debugging environments, CausalStream operates on a discrete Tick system.
- Temporal Drift: Every action that consumes compute or network resources (e.g., sampling a raw stream) advances the world clock.
- Event Physics: Events are generated with a stochastic latency model: arrival_time = event_time + base_latency + jitter.
- Incident Injection: Incidents are not just boolean flags; they modify the distribution of the stream. For example, a LATENCY_SPIKE incident increases the base_latency variance, causing "late arrival" data loss in aggregation windows.
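The latency model and the effect of a LATENCY_SPIKE can be sketched in a few lines. This is a minimal illustration, not the environment's actual code: the function name `late_fraction`, the Gaussian base latency, the uniform jitter, and the `allowed_lateness` parameter are all assumptions chosen for the example.

```python
import random

def late_fraction(base_latency_std, n_events=10_000, window_size=10.0,
                  allowed_lateness=2.0, base_latency_mean=1.0, seed=0):
    """Fraction of events that arrive after the aggregation window has closed.

    Hypothetical model: base latency is Gaussian (clamped at zero) and
    jitter is a small uniform term. Neither choice comes from the source.
    """
    rng = random.Random(seed)
    deadline = window_size + allowed_lateness
    late = 0
    for _ in range(n_events):
        event_time = rng.uniform(0.0, window_size)
        base_latency = max(0.0, rng.gauss(base_latency_mean, base_latency_std))
        jitter = rng.uniform(0.0, 0.1)
        arrival_time = event_time + base_latency + jitter
        if arrival_time > deadline:
            late += 1  # the event misses its window and is dropped
    return late / n_events

# A spike is modeled here as a wider base_latency distribution.
normal = late_fraction(base_latency_std=0.2)
spike = late_fraction(base_latency_std=3.0)
print(f"late events: normal={normal:.2%}, spike={spike:.2%}")
```

Widening the latency distribution leaves the mean untouched but pushes more events past the window deadline, which is why the incident shows up as missing data rather than as an explicit error.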
2. Tiered Observation Model
Information is not free. Agents interact with a hierarchical observation space:
- Dashboard (L0): Free to read. Provides aggregated metrics (Revenue, Error Rate). High signal but low granularity.
- Stream Samples (L1): Costs 1 Tick. Provides raw JSON event snippets. Necessary for detecting jitter and schema mismatches.
- Lineage Graph (L2): Costs 1 Tick. Provides the SQL logic and dependency chain of the data pipeline.
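The cost structure above can be sketched as a lookup table tied to the world clock. The names `Tier`, `TICK_COST`, and `WorldClock` are hypothetical and only mirror the tick costs listed for each tier.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    DASHBOARD = 0       # L0: aggregated metrics
    STREAM_SAMPLES = 1  # L1: raw JSON event snippets
    LINEAGE_GRAPH = 2   # L2: SQL logic and dependency chain

# Tick cost per tier, taken from the list above.
TICK_COST = {
    Tier.DASHBOARD: 0,
    Tier.STREAM_SAMPLES: 1,
    Tier.LINEAGE_GRAPH: 1,
}

@dataclass
class WorldClock:
    tick: int = 0

    def observe(self, tier: Tier) -> int:
        """Advance the clock by the tier's cost and return the new tick."""
        self.tick += TICK_COST[tier]
        return self.tick

clock = WorldClock()
clock.observe(Tier.DASHBOARD)       # free, clock stays at 0
clock.observe(Tier.STREAM_SAMPLES)  # clock advances to 1
clock.observe(Tier.LINEAGE_GRAPH)   # clock advances to 2
```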
3. Counterfactual Query Engine (Causal Discovery)
To prove a hypothesis, agents can execute Counterfactual Actions.
- Action: ask_counterfactual(window_offset=X)
- Mechanism: The engine forks the internal state and simulates what the metrics would have been if a variable (like the late-arrival window) was modified.
- Significance: This forces agents to move beyond pattern matching and perform true scientific experimentation.
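A fork-and-replay query of this kind might look like the sketch below. It assumes a toy `PipelineState` holding (arrival_time, revenue) pairs for one window and treats `window_offset` as an adjustment to an `allowed_lateness` field. Both are illustrative guesses, and the real `ask_counterfactual` takes only `window_offset`.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Hypothetical state for a single aggregation window."""
    window_end: float
    allowed_lateness: float
    events: list = field(default_factory=list)  # (arrival_time, revenue) pairs

    def revenue(self) -> float:
        # Only events arriving before the deadline are aggregated.
        deadline = self.window_end + self.allowed_lateness
        return sum(amount for arrival, amount in self.events if arrival <= deadline)

def ask_counterfactual(state: PipelineState, window_offset: float) -> float:
    """Fork the state, widen the late-arrival window, and recompute the metric."""
    fork = copy.deepcopy(state)  # the live state is never mutated
    fork.allowed_lateness += window_offset
    return fork.revenue()

state = PipelineState(window_end=10.0, allowed_lateness=1.0,
                      events=[(9.0, 5.0), (10.5, 3.0), (12.5, 7.0)])
observed = state.revenue()                         # 8.0: the last event is dropped
what_if = ask_counterfactual(state, window_offset=2.0)  # 15.0: all events counted
```

If the counterfactual metric recovers the missing revenue, that supports a late-arrival hypothesis; if it does not, the agent can rule that cause out without having altered the running pipeline.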
⚖️ Deterministic Grading & F1 Evidence Scoring
CausalStream solves the "Subjective Reasoning" problem by requiring agents to submit a structured Theory. The environment enforces strict, deterministic grading based on the F1 score of evidence retrieval.
- The Theory: The agent submits a RootCauseEnum (e.g., OUT_OF_ORDER) and a List[Evidence] (specific JSON tracking keys or IDs).
- The Formula:
  Precision = True_Positives / (True_Positives + False_Positives)
  Recall = True_Positives / (True_Positives + False_Negatives)
  F1_Score = 2 * (Precision * Recall) / (Precision + Recall)
- Scoring Bounds:
  - Incorrect cause: 0.0.
  - Correct cause: The score equals the evidence F1 score exactly, in the range (0.0, 1.0].
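The grading rule can be written as a short function. This is a sketch under stated assumptions: the function name `grade`, the use of plain strings for causes and evidence IDs, and the handling of a correct cause with zero matching evidence (returned as 0.0 here) are not specified by the environment.

```python
def grade(submitted_cause, submitted_evidence, true_cause, true_evidence):
    """Return 0.0 for a wrong cause, otherwise the F1 score of the evidence set."""
    if submitted_cause != true_cause:
        return 0.0

    submitted, truth = set(submitted_evidence), set(true_evidence)
    true_positives = len(submitted & truth)
    false_positives = len(submitted - truth)
    false_negatives = len(truth - submitted)

    if true_positives == 0:
        return 0.0  # assumption: no overlap scores zero and avoids division by zero

    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * (precision * recall) / (precision + recall)

# Two of three submitted IDs are correct, and one true ID is missed:
# precision = recall = 2/3, so F1 = 2/3.
score = grade("OUT_OF_ORDER", ["e1", "e2", "e3"],
              "OUT_OF_ORDER", ["e1", "e2", "e4"])
```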
🛡️ Anti-Reward Hacking (Exploit Guardrails)
To prevent agents from "loop farming" API endpoints (reward hacking), every tool call is recorded in a Stateful Tracking Set, and each semantic action type is rewarded only once per episode. If an agent calls sample_stream in a loop, every repeated call earns +0.00.
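A once-per-episode guardrail of this kind can be sketched with a plain set. The class name `RewardTracker` and the reward value of 0.1 are hypothetical. Only the "rewarded once per episode" behavior comes from the description above.

```python
class RewardTracker:
    """Pays out for each action type at most once per episode."""

    def __init__(self, reward_per_action: float = 0.1):
        self.reward_per_action = reward_per_action  # illustrative value
        self.seen = set()

    def reward(self, action_type: str) -> float:
        if action_type in self.seen:
            return 0.0  # a repeat of the same action type earns nothing
        self.seen.add(action_type)
        return self.reward_per_action

    def reset(self) -> None:
        """Start a new episode with an empty tracking set."""
        self.seen.clear()

tracker = RewardTracker()
first = tracker.reward("sample_stream")   # 0.1
repeat = tracker.reward("sample_stream")  # 0.0
```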
🚀 Technical Setup
Local Development
pip install -r requirements.txt
python server.py --port 7860
Docker Deployment
The environment is optimized for 2 vCPU / 8GB RAM constraints and exposes a REST API via FastAPI.
docker build -t causal-stream .
docker run -p 7860:7860 causal-stream
Developed for the Meta PyTorch OpenEnv Hackathon 2026. Focus: Causal Intelligence.