Nomearod Claude Opus 4.6 (1M context) committed on
Commit
2293da9
1 Parent(s): 503f5c4

docs: sharpen zero-hallucination claim, explain Mistral-7B row


Reframe the headline metric to "on all API provider configurations"
and add a sentence explaining the self-hosted Mistral-7B benchmark
as a deliberate model-size floor for agentic retrieval.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -2,7 +2,7 @@
 
 ![CI](https://github.com/tyy0811/agent-bench/actions/workflows/ci.yaml/badge.svg)
 
-Agentic knowledge retrieval system with evaluation benchmark. Custom orchestration pipeline + LangChain baseline, evaluated on the same 27-question golden dataset across 3 providers (OpenAI, Anthropic, self-hosted vLLM on Modal). Zero hallucinated citations in all API configurations.
+Agentic knowledge retrieval system with evaluation benchmark. Custom orchestration pipeline + LangChain baseline, evaluated on the same 27-question golden dataset across 3 providers (OpenAI, Anthropic, self-hosted vLLM on Modal). Zero hallucinated citations on all API provider configurations. The separate self-hosted Mistral-7B benchmark is included to show the practical model-size floor where agentic retrieval starts to break down.
 
 `288 tests` · `3 providers` · `LangChain comparison` · `K8s + Terraform` · `CI`
 