


title: CausalStream
emoji: 🌊
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false

CausalStream: High-Fidelity SRE Logic Environment

CausalStream is a specialized Reinforcement Learning (RL) environment designed to benchmark and train agents in temporal causal reasoning and resource allocation under uncertainty. It simulates a complex streaming data infrastructure (Kafka/Flink pattern) where agents must diagnose production incidents.

Core Engine Architecture

1. The Stochastic World Clock (Ticked Execution)

Unlike traditional "static" debugging environments, CausalStream operates on a discrete Tick system.

Temporal Drift: Every action that consumes compute or network resources (e.g., sampling a raw stream) advances the world clock.
Event Physics: Events are generated with a stochastic latency model: arrival_time = event_time + base_latency + jitter.
Incident Injection: Incidents are not just boolean flags; they modify the distribution of the stream. For example, a LATENCY_SPIKE incident increases the variance of base_latency, so more events arrive after their aggregation window has closed and are lost as "late arrival" data.
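The latency model above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual code: the function names, the exponential jitter distribution, the `allowed_lateness` parameter, and all numeric values are assumptions chosen for the example. A LATENCY_SPIKE is approximated here by widening the jitter spread.

```python
import random


def arrival_time(event_time, base_latency, jitter_scale, rng):
    """Return event_time + base_latency + jitter for one event."""
    # Assumption: jitter is exponential with mean jitter_scale (never negative).
    jitter = rng.expovariate(1.0 / jitter_scale) if jitter_scale > 0 else 0.0
    return event_time + base_latency + jitter


def late_fraction(jitter_scale, n_events=10_000, window_end=60.0,
                  allowed_lateness=2.0, base_latency=0.5, seed=0):
    """Fraction of events that arrive after the window stops accepting data."""
    rng = random.Random(seed)
    cutoff = window_end + allowed_lateness  # hypothetical late-arrival cutoff
    late = 0
    for _ in range(n_events):
        event_time = rng.uniform(0.0, window_end)
        if arrival_time(event_time, base_latency, jitter_scale, rng) > cutoff:
            late += 1
    return late / n_events


if __name__ == "__main__":
    print("nominal:      ", late_fraction(jitter_scale=0.2))
    print("LATENCY_SPIKE:", late_fraction(jitter_scale=5.0))
```

With the same seed, a larger jitter spread can only push more events past the cutoff, which is the "late arrival" data loss an agent would see as a drop in the aggregated metrics.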

2. Tiered Observation Model

Information is not free. Agents interact with a hierarchical observation space:

Dashboard (L0): Free to read. Provides aggregated metrics (Revenue, Error Rate). High signal but low granularity.
Stream Samples (L1): Costs 1 Tick. Provides raw JSON event snippets. Necessary for detecting jitter and schema mismatches.
Lineage Graph (L2): Costs 1 Tick. Provides the SQL logic and dependency chain of the data pipeline.
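One way to picture the cost structure is a lookup table from tier to tick cost. The sketch below is illustrative only; `Tier`, `TICK_COST`, and `WorldClock` are hypothetical names, and only the costs themselves (0, 1, 1) come from the list above.

```python
from enum import Enum


class Tier(Enum):
    DASHBOARD = "L0"
    STREAM_SAMPLES = "L1"
    LINEAGE_GRAPH = "L2"


# Tick cost per observation tier, as listed above.
TICK_COST = {
    Tier.DASHBOARD: 0,
    Tier.STREAM_SAMPLES: 1,
    Tier.LINEAGE_GRAPH: 1,
}


class WorldClock:
    """Hypothetical clock that advances whenever a paid observation is made."""

    def __init__(self):
        self.tick = 0

    def observe(self, tier):
        # Reading the dashboard leaves the clock unchanged; L1 and L2 advance it.
        self.tick += TICK_COST[tier]
        return self.tick


if __name__ == "__main__":
    clock = WorldClock()
    clock.observe(Tier.DASHBOARD)
    clock.observe(Tier.STREAM_SAMPLES)
    clock.observe(Tier.LINEAGE_GRAPH)
    print(clock.tick)
```

An agent that reads the dashboard first and samples the stream only when the aggregates look wrong spends fewer ticks than one that samples on every step.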

3. Counterfactual Query Engine (Causal Discovery)

To prove a hypothesis, agents can execute Counterfactual Actions.

Action: ask_counterfactual(window_offset=X)
Mechanism: The engine forks its internal state and simulates what the metrics would have been if a variable (such as the late-arrival window) had been set differently. The live state is left untouched.
Significance: This forces agents to move beyond pattern matching and perform true scientific experimentation.
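The fork-and-simulate idea can be illustrated with a deep copy of a toy pipeline state. This is a sketch under assumptions: `PipelineState`, its fields, and the extra `state` argument are hypothetical, and the real `ask_counterfactual` may work differently internally.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Toy state: one aggregation window plus the events that reached it."""
    window_end: float
    allowed_lateness: float
    arrivals: list = field(default_factory=list)  # (arrival_time, revenue) pairs

    def revenue(self):
        # Only events that arrive before the cutoff are counted.
        cutoff = self.window_end + self.allowed_lateness
        return sum(amount for t, amount in self.arrivals if t <= cutoff)


def ask_counterfactual(state, window_offset):
    """Return the revenue metric under a shifted late-arrival window."""
    forked = copy.deepcopy(state)  # fork: the live state is never mutated
    forked.allowed_lateness += window_offset
    return forked.revenue()


if __name__ == "__main__":
    live = PipelineState(window_end=60.0, allowed_lateness=2.0,
                         arrivals=[(59.0, 10.0), (63.5, 20.0), (70.0, 30.0)])
    print("observed:      ", live.revenue())
    print("counterfactual:", ask_counterfactual(live, window_offset=5.0))
```

If widening the window recovers the missing revenue, that supports a late-arrival hypothesis; if the metric does not move, the agent can rule it out.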

⚖️ Deterministic Grading & F1 Evidence Scoring

CausalStream solves the "Subjective Reasoning" problem by requiring agents to submit a structured Theory. The environment enforces strict, deterministic grading based on the F1 score of evidence retrieval.

The Theory: The agent submits a RootCauseEnum (e.g., OUT_OF_ORDER) and a List[Evidence] (specific JSON tracking keys or IDs).
The Formula:
- Precision = True_Positives / (True_Positives + False_Positives)
- Recall = True_Positives / (True_Positives + False_Negatives)
- F1_Score = 2 * (Precision * Recall) / (Precision + Recall)
Scoring Bounds:
- Incorrect cause: 0.0.
- Correct cause: the score is exactly the F1_Score of the submitted evidence, in the range (0.0, 1.0].
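As a worked illustration of the formula, the grading rule might look like the function below. The function name, the use of sets of evidence IDs, and the handling of the zero-overlap case (returning 0.0 to avoid a division by zero) are assumptions, not the environment's confirmed behavior.

```python
def grade(submitted_cause, true_cause, submitted_evidence, true_evidence):
    """Score a Theory: 0.0 for a wrong cause, else F1 over evidence IDs."""
    if submitted_cause != true_cause:
        return 0.0

    submitted, truth = set(submitted_evidence), set(true_evidence)
    true_positives = len(submitted & truth)
    if true_positives == 0:
        # Assumption: no overlapping evidence scores 0.0 (F1 is undefined here).
        return 0.0

    precision = true_positives / len(submitted)  # TP / (TP + FP)
    recall = true_positives / len(truth)         # TP / (TP + FN)
    return 2 * (precision * recall) / (precision + recall)


if __name__ == "__main__":
    # 2 of 3 submitted IDs are correct, and 2 of 4 true IDs were found.
    score = grade("OUT_OF_ORDER", "OUT_OF_ORDER",
                  ["evt-1", "evt-2", "evt-9"],
                  ["evt-1", "evt-2", "evt-3", "evt-4"])
    print(round(score, 4))
```

In this example Precision = 2/3 and Recall = 1/2, so F1_Score = 4/7. Submitting every available ID would raise recall but lower precision, which is why padding the evidence list does not pay off.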

🛡️ Anti-Reward Hacking (Exploit Guardrails)

To prevent agents from "loop farming" API endpoints (Reward Hacking), every tool call is recorded in a Stateful Tracking Set. Each semantic action type is rewarded only once per episode, so an agent that loops sample_stream indefinitely earns +0.00 for every call after the first.
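A once-per-episode reward rule can be sketched with a plain Python set. The class name, the reward value of 0.1, and the use of the action name as the "semantic action type" are assumptions made for this example.

```python
class RewardTracker:
    """Pays a shaping reward only the first time each action type is used."""

    def __init__(self, reward_per_action=0.1):
        self.reward_per_action = reward_per_action  # hypothetical value
        self._seen = set()  # the stateful tracking set for this episode

    def reward_for(self, action_type):
        if action_type in self._seen:
            return 0.0  # repeated action type: nothing left to farm
        self._seen.add(action_type)
        return self.reward_per_action

    def reset(self):
        """Clear the tracking set at the start of a new episode."""
        self._seen.clear()


if __name__ == "__main__":
    tracker = RewardTracker()
    rewards = [tracker.reward_for("sample_stream") for _ in range(5)]
    print(rewards)
```

Because the set is reset only between episodes, the total shaping reward is capped by the number of distinct action types, no matter how many calls the agent makes.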

🚀 Technical Setup

Local Development

pip install -r requirements.txt
python server.py --port 7860

Docker Deployment

The environment is optimized to run within 2 vCPU / 8 GB RAM and exposes a REST API via FastAPI.

docker build -t causal-stream .
docker run -p 7860:7860 causal-stream

Developed for the Meta PyTorch OpenEnv Hackathon 2026. Focus: Causal Intelligence.