---
dimension: groundedness
scale: binary
reference_based: true
abstain_allowed: true
---
# Groundedness (binary)
Score whether every claim in the answer is supported by the gold source snippets.
## Score 0
Answer contains at least one claim not supported by the snippets.
### Example A: answer cites unsupported fact
Question: "What's the default port?"
Snippets: ["The default is 8080."]
Answer: "The default is 8080 and supports TLS."
Score=0 because the TLS claim has no support in the snippet. The
unsupported claim is sufficient to fail groundedness regardless of
how many other claims are correctly grounded; this is the binary
rubric's strict-conjunction definition.
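The strict-conjunction rule can be sketched in a few lines of Python. This is only an illustration, not the judge's actual implementation: the function name `binary_groundedness` and the idea of passing one boolean per claim are assumptions made for the example, and how each claim is judged as supported is left out.

```python
from typing import Sequence


def binary_groundedness(claim_supported: Sequence[bool]) -> int:
    """Return 1 only if every claim is supported, otherwise 0.

    `claim_supported` holds one hypothetical support label per claim
    in the answer (True = supported by at least one snippet).
    Note: an empty sequence yields 1, because all([]) is True; a real
    judge may want to treat a claim-free answer differently.
    """
    return int(all(claim_supported))


# Example A: "the default is 8080" is supported, "supports TLS" is not.
example_a_score = binary_groundedness([True, False])

# A single unsupported claim fails the answer no matter how many
# other claims are grounded.
mostly_grounded_score = binary_groundedness([True, True, True, False])
```

Under these assumptions both calls return 0, which matches the reasoning given for Example A.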
## Score 1
Every claim in the answer is supported by at least one snippet.
### Example B: fully grounded one-sentence answer
Question: "What's the default port?"
Snippets: ["The default is 8080."]
Answer: "The default port is 8080."
Score=1 because the only claim ("default port is 8080") is directly
supported by the snippet. Paraphrase is allowed; what matters is
factual entailment.
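For readers who want to inspect the front matter at the top of this rubric programmatically, the following is a minimal sketch using only the Python standard library. It is not the project's loader: `parse_front_matter` is a hypothetical helper, it handles only flat `key: value` pairs, and it performs no validation of allowed values.

```python
def parse_front_matter(text: str) -> dict:
    """Parse a flat `key: value` block delimited by '---' lines.

    Hypothetical helper for illustration; 'true'/'false' values become
    booleans and everything else stays a string.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' delimiter")
    try:
        # Index of the closing delimiter, searching after the first line.
        end = next(
            i for i, line in enumerate(lines[1:], start=1)
            if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("missing closing '---' delimiter") from None

    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        value = value.strip()
        meta[key.strip()] = (value == "true") if value in ("true", "false") else value
    return meta


RUBRIC_HEADER = """---
dimension: groundedness
scale: binary
reference_based: true
abstain_allowed: true
---
"""

meta = parse_front_matter(RUBRIC_HEADER)
```

Here `meta` maps `dimension` and `scale` to strings and the two flags to Python booleans; a stricter loader would also reject unknown keys or unsupported `scale` values.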