skyruh committed on
Commit
eff8e56
·
1 Parent(s): 957335e

docs: define explicit F1 evidence scoring math and reward hacking guardrails

  1. README.md +14 -5
README.md CHANGED
@@ -31,12 +31,21 @@ To prove a hypothesis, agents can execute **Counterfactual Actions**.
  - **Mechanism**: The engine forks the internal state and simulates what the metrics *would have been* if a variable (like the late-arrival window) was modified.
  - **Significance**: This forces agents to move beyond pattern matching and perform true scientific experimentation.
 
- ## ⚖️ Deterministic Grading & F1 Evidence
 
- CausalStream solves the "Subjective Reasoning" problem by requiring agents to submit a structured **Theory**.
- - **The Theory**: Consists of a `RootCauseEnum` (e.g., `OUT_OF_ORDER`) and a `List[Evidence]`.
- - **F1 Scoring**: The grader calculates the **Precision and Recall** of the submitted evidence tokens.
- - To get a 1.0 (Full Credit), an agent must identify the correct cause AND provide exactly the evidence tokens (specific IDs or timestamps) that prove that cause without including "noise" tokens.
 
  ## 🚀 Technical Setup
  - **Mechanism**: The engine forks the internal state and simulates what the metrics *would have been* if a variable (like the late-arrival window) was modified.
  - **Significance**: This forces agents to move beyond pattern matching and perform true scientific experimentation.
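
The fork-and-simulate mechanism described above could look roughly like the following. This is a minimal sketch, not the engine's actual code: `StreamState`, `dropped_event_count`, and `counterfactual` are hypothetical names, and the "metric" is assumed to be a simple count of events that arrive later than the late-arrival window allows.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StreamState:
    # Hypothetical state layout: (event_time, arrival_time) pairs in seconds.
    events: List[Tuple[float, float]] = field(default_factory=list)
    late_arrival_window: float = 5.0


def dropped_event_count(state: StreamState) -> int:
    """Count events whose arrival lag exceeds the late-arrival window."""
    return sum(
        1
        for event_time, arrival_time in state.events
        if arrival_time - event_time > state.late_arrival_window
    )


def counterfactual(state: StreamState, **overrides) -> int:
    """Fork the state, apply overrides to the fork only, and recompute the metric."""
    fork = copy.deepcopy(state)  # the live state is never mutated
    for name, value in overrides.items():
        setattr(fork, name, value)
    return dropped_event_count(fork)


state = StreamState(events=[(0.0, 2.0), (1.0, 9.0), (2.0, 20.0)])
print(dropped_event_count(state))                        # 2 (lags of 8s and 18s exceed 5s)
print(counterfactual(state, late_arrival_window=10.0))   # 1 (only the 18s lag exceeds 10s)
print(state.late_arrival_window)                         # 5.0, the original is untouched
```

Deep-copying before applying the override is what makes the experiment safe to repeat: each counterfactual starts from the same unmodified state.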
 
+ ## ⚖️ Deterministic Grading & F1 Evidence Scoring
 
+ CausalStream addresses the "Subjective Reasoning" problem by requiring agents to submit a structured **Theory**. The environment grades each Theory deterministically, using the F1 score of the submitted evidence.
+
+ - **The Theory**: The agent submits a `RootCauseEnum` (e.g., `OUT_OF_ORDER`) and a `List[Evidence]` (specific JSON tracking keys or IDs).
+ - **The Formula**:
+   - `Precision = True_Positives / (True_Positives + False_Positives)`
+   - `Recall = True_Positives / (True_Positives + False_Negatives)`
+   - `F1_Score = 2 * (Precision * Recall) / (Precision + Recall)`
+ - **Scoring Bounds**:
+   - Incorrect cause: `0.0`.
+   - Correct cause: the score is exactly the evidence F1 score, within the bound `(0.0, 1.0]`.
+
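
The scoring rules above can be sketched in a few lines of Python. This is an illustrative reading of the README, not the grader's source: `evidence_f1` and `grade` are hypothetical names, evidence tokens are assumed to be plain strings compared as sets, and returning `0.0` when no submitted token matches is an assumption made to avoid a division by zero (the README itself only states the `(0.0, 1.0]` bound).

```python
from typing import Iterable


def evidence_f1(submitted: Iterable[str], expected: Iterable[str]) -> float:
    """F1 over evidence tokens, treating both sides as sets."""
    submitted_set, expected_set = set(submitted), set(expected)
    true_positives = len(submitted_set & expected_set)
    if true_positives == 0:
        return 0.0  # assumption: no overlap scores zero instead of raising on 0/0
    precision = true_positives / len(submitted_set)  # TP / (TP + FP)
    recall = true_positives / len(expected_set)      # TP / (TP + FN)
    return 2 * (precision * recall) / (precision + recall)


def grade(submitted_cause: str, true_cause: str,
          submitted_evidence: Iterable[str], expected_evidence: Iterable[str]) -> float:
    """Incorrect cause scores 0.0; a correct cause scores the evidence F1."""
    if submitted_cause != true_cause:
        return 0.0
    return evidence_f1(submitted_evidence, expected_evidence)


# One correct token plus one "noise" token, with one expected token missed:
# precision = 0.5, recall = 0.5, so F1 = 0.5.
print(grade("OUT_OF_ORDER", "OUT_OF_ORDER", ["evt-1", "evt-9"], ["evt-1", "evt-2"]))  # 0.5
```

Submitting exactly the expected tokens gives precision and recall of 1.0 and therefore full credit, while padding the list with extra tokens lowers precision and drags the score down.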
+ ### 🛡️ Anti-Reward Hacking (Exploit Guardrails)
+ To prevent agents from "loop farming" API endpoints (reward hacking), every tool call is checked against a **Stateful Tracking Set**. Each semantic action type is rewarded only *once* per episode, so an agent that calls `sample_stream` in a loop earns +0.00 for every repeat.
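
One way to picture the once-per-episode rule is a set of already-rewarded action types that is cleared at episode reset. The sketch below is an assumption-laden illustration: `ActionRewardTracker` is a hypothetical class, the action type is assumed to be keyed by tool name, and the `0.05` reward value is a placeholder rather than a number taken from the environment.

```python
class ActionRewardTracker:
    """Rewards each action type at most once per episode (hypothetical helper)."""

    def __init__(self, reward_per_new_action: float = 0.05) -> None:
        self.reward_per_new_action = reward_per_new_action  # placeholder value
        self._seen = set()  # action types already rewarded in this episode

    def reward_for(self, action_type: str) -> float:
        """Return the exploration reward for this call, or 0.0 on a repeat."""
        if action_type in self._seen:
            return 0.0
        self._seen.add(action_type)
        return self.reward_per_new_action

    def reset(self) -> None:
        """Clear the tracking set at the start of a new episode."""
        self._seen.clear()


tracker = ActionRewardTracker()
print(tracker.reward_for("sample_stream"))  # 0.05 on the first call
print(tracker.reward_for("sample_stream"))  # 0.0 on every repeat
```

Because the set only grows within an episode, looping on a single tool can never accumulate more than its one-time reward.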
 
  ## 🚀 Technical Setup