docs: define explicit F1 evidence scoring math and reward hacking guardrails
README.md
CHANGED
|
@@ -31,12 +31,21 @@ To prove a hypothesis, agents can execute **Counterfactual Actions**.
|
|
| 31 |
- **Mechanism**: The engine forks the internal state and simulates what the metrics *would have been* if a variable (like the late-arrival window) was modified.
|
| 32 |
- **Significance**: This forces agents to move beyond pattern matching and perform true scientific experimentation.
|
| 33 |
|
| 34 |
-
## ⚖️ Deterministic Grading & F1 Evidence
|
| 35 |
|
| 36 |
-
CausalStream solves the "Subjective Reasoning" problem by requiring agents to submit a structured **Theory**.
|
| 37 |
-
|
| 38 |
-
- **
|
| 39 |
-
| 40 |
|
| 41 |
## 🚀 Technical Setup
|
| 42 |
|
|
|
|
| 31 |
- **Mechanism**: The engine forks the internal state and simulates what the metrics *would have been* if a variable (like the late-arrival window) was modified.
|
| 32 |
- **Significance**: This forces agents to move beyond pattern matching and perform true scientific experimentation.
|
| 33 |
|
| 34 |
+
## ⚖️ Deterministic Grading & F1 Evidence Scoring
|
| 35 |
|
| 36 |
+
CausalStream solves the "Subjective Reasoning" problem by requiring agents to submit a structured **Theory**. The environment enforces strict, deterministic grading based on the F1 score of evidence retrieval.
|
| 37 |
+
|
| 38 |
+
- **The Theory**: The agent submits a `RootCauseEnum` (e.g., `OUT_OF_ORDER`) and a `List[Evidence]` (specific JSON tracking keys or IDs).
|
| 39 |
+
- **The Formula**:
|
| 40 |
+
- `Precision = True_Positives / (True_Positives + False_Positives)`
|
| 41 |
+
- `Recall = True_Positives / (True_Positives + False_Negatives)`
|
| 42 |
+
- `F1_Score = 2 * (Precision * Recall) / (Precision + Recall)`
|
| 43 |
+
- **Scoring Bounds**:
|
| 44 |
+
- Incorrect cause: `0.0`.
|
| 45 |
+
- Correct cause: the score equals the evidence F1 score, which lies in `(0.0, 1.0]`.
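
The formula and scoring bounds above can be sketched in Python as follows. This is a minimal illustration, not the project's grader: the function name `f1_evidence_score`, the use of plain strings for causes and evidence IDs, and the set-based matching are all assumptions. The README does not say what happens when the cause is correct but no submitted evidence matches; this sketch returns `0.0` in that case to avoid a division by zero.

```python
from typing import Iterable, Set


def f1_evidence_score(
    submitted_cause: str,
    true_cause: str,
    submitted_evidence: Iterable[str],
    true_evidence: Iterable[str],
) -> float:
    """Return 0.0 for a wrong cause, otherwise the F1 of the evidence sets."""
    # Incorrect cause: the score is 0.0 regardless of the evidence.
    if submitted_cause != true_cause:
        return 0.0

    # Assumption: evidence items are unique IDs, so sets are sufficient.
    submitted: Set[str] = set(submitted_evidence)
    expected: Set[str] = set(true_evidence)

    true_positives = len(submitted & expected)
    false_positives = len(submitted - expected)
    false_negatives = len(expected - submitted)

    # Assumption: with no matching evidence, precision and recall are both
    # treated as 0.0, so the score is 0.0 instead of a division by zero.
    if true_positives == 0:
        return 0.0

    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical evidence IDs: 2 true positives, 1 false positive, 2 false negatives.
score = f1_evidence_score(
    "OUT_OF_ORDER",
    "OUT_OF_ORDER",
    ["evt-1", "evt-2", "evt-9"],
    ["evt-1", "evt-2", "evt-3", "evt-4"],
)
```

In this example, precision is 2/3 and recall is 1/2, so the F1 score is 4/7 (about 0.571).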
|
| 46 |
+
|
| 47 |
+
### 🛡️ Anti-Reward Hacking (Exploit Guardrails)
|
| 48 |
+
To prevent agents from "loop farming" API endpoints (reward hacking), every tool call is checked against a **Stateful Tracking Set**. Each semantic action type is rewarded only *once* per episode, so if an agent calls `sample_stream` in a loop, every repeat call earns +0.00.
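
One way to picture the once-per-episode rule is a set of already-rewarded action types that is cleared at the start of each episode. The sketch below is hypothetical: the class name `EpisodeRewardTracker`, the `step_reward` value, and keying on the tool name are assumptions, not the environment's actual implementation.

```python
from typing import Set


class EpisodeRewardTracker:
    """Hypothetical sketch: reward each action type at most once per episode."""

    def __init__(self, step_reward: float = 0.05) -> None:
        # Assumed reward size; the README does not state the real value.
        self.step_reward = step_reward
        self._rewarded: Set[str] = set()

    def reward_for(self, action_type: str) -> float:
        """Return the step reward the first time, and 0.0 on every repeat."""
        if action_type in self._rewarded:
            return 0.0
        self._rewarded.add(action_type)
        return self.step_reward

    def reset(self) -> None:
        """Clear the tracking set at the start of a new episode."""
        self._rewarded.clear()


tracker = EpisodeRewardTracker(step_reward=0.1)
first = tracker.reward_for("sample_stream")   # first call is rewarded
repeat = tracker.reward_for("sample_stream")  # repeat call earns nothing
```

Because the reward is keyed on the action type and not on the call count, looping on one tool adds nothing after the first call.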
|
| 49 |
|
| 50 |
## 🚀 Technical Setup
|
| 51 |
|