AgentSight: Step-Level Hallucination Classifier
AgentSight is the first gradient-trained step-level hallucination classifier
on the AgentHallu benchmark.
It identifies which step in an autonomous agent trajectory is the root cause of a hallucination.
Results (locked test set, one-shot evaluation)
| Metric | Value | 95% CI | Note |
|---|---|---|---|
| Step Localisation Acc | 47.8% (32/67) | [36.3%, 59.5%] | vs Gemini 41.1%; p=0.163, not statistically significant |
| Judgment Macro-F1 | 54.7% | - | n=105 trajectories |
| Judgment Precision | 58.5% | - | threshold=0.40 |
| Judgment Recall | 55.7% | - | |
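The +6.7 point margin over Gemini-2.5-Pro (47.8% vs 41.1%) is practically meaningful but not statistically significant at n=67 (p=0.163). The 95% CI on AgentSight's accuracy spans about 23 points (36.3% to 59.5%) and contains Gemini's 41.1%, so we report the margin as suggestive rather than conclusive.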
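The source does not say how the 95% CI was computed. As a rough check, a Wilson score interval (an assumption here, chosen because it is a common choice for small-sample proportions) gives bounds close to the reported ones for 32 correct out of 67. The function name `wilson_interval` is illustrative and not part of the AgentSight code.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 32 correctly localised steps out of 67 test trajectories (from the table above)
low, high = wilson_interval(32, 67)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

The width of the interval is driven almost entirely by n=67, which is why the Limitations section flags the test-set size.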
Key Finding
Removing the [ACTION]+[OBS] tool-execution channel from the input drops step localisation accuracy by 31.8 percentage points (48.5% → 16.7% on the validation set). Tool output content is the dominant localisation signal.
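To make the ablation concrete, here is a minimal sketch of what "removing the tool-execution channel" could look like at the input level. It assumes the step fields shown in the Input Format section and that `content` maps to the [THOUGHT] segment; `serialise_step` and its `include_tools` flag are hypothetical names, not the repository's actual preprocessing.

```python
import json


def serialise_step(question, step, include_tools=True):
    """Flatten one trajectory step into the tagged text format.

    With include_tools=False the [ACTION] and [OBS] segments are dropped,
    which mimics the ablated input described above.
    """
    parts = [f"[QUERY] {question}", f"[THOUGHT] {step.get('content', '')}"]
    if include_tools:
        calls = json.dumps(step.get("tool_calls", []), sort_keys=True)
        responses = " ".join(str(r) for r in step.get("tool_responses", []))
        parts.append(f"[ACTION] {calls}")
        parts.append(f"[OBS] {responses}")
    return " ".join(parts)


example_step = {
    "step": 1,
    "content": "I will search for this.",
    "tool_calls": [{"name": "web_search", "arguments": {"query": "capital France"}}],
    "tool_responses": ["Paris is the capital of France."],
}

print(serialise_step("What is the capital of France?", example_step))
print(serialise_step("What is the capital of France?", example_step, include_tools=False))
```

In the ablated string the classifier sees only the query and the agent's reasoning, so it has no tool output to compare the reasoning against.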
Architecture
- Backbone: DeBERTa-v3-base with LoRA adapters (r=16, α=64); 1.42% of parameters trained
- Context encoder: 3-layer Pre-LN Transformer (8 heads) over the step sequence
- Head: 2-layer MLP → binary hallucination logit per step
- Input: [QUERY] task [THOUGHT] reasoning [ACTION] tool_calls [OBS] responses (max 512 tokens)
- Decision threshold: 0.40 (tuned on validation set)
Quick Start
from agentsight_sdk import AgentMonitor
# Auto-downloads weights from this repo
monitor = AgentMonitor("YOUR_HF_USERNAME/agentsight")
result = monitor.evaluate_trajectory(raw_trajectory_dict)
print(result["is_hallucinated"]) # True / False
print(result["predicted_root_cause_step"]) # e.g. 3
print(result["step_probabilities"]) # [0.12, 0.08, 0.91, 0.04]
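The three output fields can be related to each other with a simple rule. The sketch below is an assumption about how such a rule might work, not the SDK's documented behaviour: it flags a trajectory when the highest per-step probability reaches the 0.40 threshold and reports that step using 1-based numbering (matching the `step` field in the input). The function name `decide` is hypothetical.

```python
def decide(step_probabilities, threshold=0.40):
    """Turn per-step hallucination probabilities into a trajectory-level verdict."""
    if not step_probabilities:
        return {"is_hallucinated": False, "predicted_root_cause_step": None}
    # Index of the most suspicious step (first one wins on ties)
    best_index = max(range(len(step_probabilities)), key=step_probabilities.__getitem__)
    flagged = step_probabilities[best_index] >= threshold
    return {
        "is_hallucinated": flagged,
        # Steps are numbered from 1, so shift the 0-based index
        "predicted_root_cause_step": best_index + 1 if flagged else None,
    }


print(decide([0.12, 0.08, 0.91, 0.04]))
# {'is_hallucinated': True, 'predicted_root_cause_step': 3}
```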
Input Format
{
  "question": "What is the capital of France?",
  "history": [
    {
      "step": 1,
      "content": "I will search for this.",
      "tool_calls": [{"name": "web_search", "arguments": {"query": "capital France"}}],
      "tool_responses": ["Paris is the capital of France."]
    }
  ]
}
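Before passing a trajectory to the monitor, it can help to check that it has the shape shown above. The following is a minimal sketch: the set of required fields is inferred from the single example in this section, so the SDK may accept or require other fields. `validate_trajectory` is a hypothetical helper, not part of `agentsight_sdk`.

```python
def validate_trajectory(trajectory):
    """Return a list of problems; an empty list means the shape matches the example."""
    if not isinstance(trajectory, dict):
        return ["trajectory must be a dict"]
    problems = []
    if not isinstance(trajectory.get("question"), str):
        problems.append("question must be a string")
    history = trajectory.get("history")
    if not isinstance(history, list) or not history:
        problems.append("history must be a non-empty list")
        return problems
    # Field names and types taken from the example step above (assumed required)
    expected = {"step": int, "content": str, "tool_calls": list, "tool_responses": list}
    for position, step in enumerate(history, start=1):
        if not isinstance(step, dict):
            problems.append(f"step {position}: must be a dict")
            continue
        for key, kind in expected.items():
            if not isinstance(step.get(key), kind):
                problems.append(f"step {position}: {key} must be {kind.__name__}")
    return problems


trajectory = {
    "question": "What is the capital of France?",
    "history": [
        {
            "step": 1,
            "content": "I will search for this.",
            "tool_calls": [{"name": "web_search", "arguments": {"query": "capital France"}}],
            "tool_responses": ["Paris is the capital of France."],
        }
    ],
}
print(validate_trajectory(trajectory))  # []
```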
Training
git clone https://github.com/Minato-sudo/agentsight
cd agentsight
pip install -r requirements.txt
./run_full_pipeline.sh  # verify → train → ablate → test (one-shot)
Citation
@misc{agentsight2026,
title={AgentSight: Step-Level Hallucination Localisation in Autonomous Agent Trajectories via Tool-Execution Context},
author={Namikaze, Minato},
year={2026},
note={Paper + code: https://github.com/Minato-sudo/agentsight}
}
Files
| File | Description |
|---|---|
| best_agentsight.pth | PyTorch state dict; load with AgentSightModel() |
| best_agentsight_meta.json | Threshold (0.40), val metrics, best epoch |
| tokenizer_config/ | DeBERTa-v3-base tokeniser config (max_len=512) |
Limitations
- Evaluated on AgentHallu only; generalisation to other benchmarks unknown
- n=67 test set means wide confidence intervals
- Judgment F1 (54.7%) is below frontier models (Gemini 64.6%, GPT-5 70.2%)