---
language:
- en
license: mit
base_model:
- KRLabsOrg/lettucedect-base-modernbert-en-v1
- Qwen/Qwen2.5-0.5B
datasets:
- jameVee/ToolACE-Hallucination
tags:
- hallucination-detection
- tool-calling
- text-classification
- span-detection
- sklearn
- ensemble
metrics:
- f1
- roc_auc
pipeline_tag: text-classification
pretty_name: ToolACE Hallucination Detector
---

# ToolACE Hallucination Detector

A lightweight ensemble classifier for detecting hallucinations in LLM responses that follow tool calls. It is trained on three hallucination types (**missing tool reference**, **unsupported overgeneration**, and **tool-output contradiction**) and returns both a sample-level binary label and character-level span predictions.

---

## Background

When an LLM answers a user query using tool outputs, it can hallucinate in distinct ways:

| Type | Description |
|---|---|
| `missing_tool` | The assistant suggests using a tool that was not available |
| `overgeneration` | The answer contains plausible but unsupported extra facts |
| `tool_output_contradiction` | The answer contradicts specific facts returned by the tool |

This model was trained on **[jameVee/ToolACE-Hallucination](https://huggingface.co/datasets/jameVee/ToolACE-Hallucination)**, a benchmark of ~4 100 examples derived from [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE). Each of the three hallucination datasets was generated by corrupting clean ToolACE examples with one specific hallucination type: about 50 % of examples for `missing_tool` and `overgeneration`, and every example attempted for `tool_output_contradiction` (see the dataset card for details).

---

## Architecture

The detector is a **three-branch soft-voting ensemble**. Each branch independently produces a hallucination probability, and the three scores are averaged to give the final prediction.

```
                          ┌──────────────────────────────────┐
query + context ────────► │ Branch A: Lexical Verifier       │ ──► P_A
+ tool output             │ (token overlap span detector     │
                          │  → StandardScaler + LogReg)      │
                          └──────────────────────────────────┘
                          ┌──────────────────────────────────┐
                 ───────► │ Branch B: LettuceDetect          │ ──► P_B
                          │ (ModernBERT span model           │
                          │  → StandardScaler + LogReg)      │
                          └──────────────────────────────────┘
                          ┌──────────────────────────────────┐
                 ───────► │ Branch C: LookBack-style         │ ──► P_C
                          │ (Qwen2.5-0.5B attention ratios   │
                          │  → StandardScaler + LogReg)      │
                          └──────────────────────────────────┘
                                           │
                                avg(P_A, P_B, P_C) ≥ 0.5
                                           │
                                     hallucinated?
```
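
The soft-voting step in the diagram can be sketched in a few lines. This is a minimal illustration, not code from the repository: the function name `soft_vote` and the example probabilities are made up, and only the averaging rule and the 0.5 threshold come from the diagram.

```python
from statistics import fmean


def soft_vote(branch_probs, threshold=0.5):
    """Average per-branch hallucination probabilities and threshold the mean."""
    if not branch_probs:
        raise ValueError("need at least one branch probability")
    score = fmean(branch_probs)
    return {"score": score, "hallucinated": score >= threshold}


# Hypothetical branch outputs P_A, P_B, P_C for one sample.
result = soft_vote([0.35, 0.80, 0.55])
print(result)
```

Because the vote is a plain mean, a branch can be dropped by leaving its probability out of the list, which is what the usage example in this card does for Branch C.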

### Branch A: Lexical Span Verifier

Checks whether tokens in the assistant's answer are grounded in the (normalized) tool output using lexical overlap. Runs of unsupported tokens become candidate hallucination spans. Span-level features (count, coverage, max score) are fed into a logistic regression.
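
To make the idea concrete, here is a rough sketch of a lexical overlap check. It is not the repository's `lexical_hallucination_spans`; the function name, the tiny stopword list, and the `min_tokens` parameter are assumptions chosen for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not taken from the repository).
STOPWORDS = {"a", "an", "the", "of", "is", "it", "to", "and", "in", "at"}


def lexical_unsupported_spans(answer, tool_output, min_tokens=2):
    """Return character spans covering runs of answer tokens absent from the tool output."""
    support = {t.lower() for t in re.findall(r"\w+", tool_output)}
    spans, run = [], []

    def flush():
        # Keep a run only if it has enough unsupported content tokens.
        if len(run) >= min_tokens:
            start, end = run[0].start(), run[-1].end()
            spans.append({"start": start, "end": end, "text": answer[start:end]})
        run.clear()

    for match in re.finditer(r"\w+", answer):
        token = match.group().lower()
        if token in STOPWORDS:
            continue  # stopwords neither extend nor break a run
        if token in support:
            flush()  # a grounded token ends the current run
        else:
            run.append(match)
    flush()
    return spans


answer = "The price of AAPL is 189.50. It hit a record high last Tuesday."
tool_output = '{"ticker": "AAPL", "price": 189.50}'
spans = lexical_unsupported_spans(answer, tool_output)
print(spans)
```

Exact token matching is brittle (paraphrases and reformatted numbers count as unsupported), which is consistent with this branch being the weakest span detector among the two span-producing branches that score above zero.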

### Branch B: LettuceDetect (supervised span model)

Uses [KRLabsOrg/lettucedect-base-modernbert-en-v1](https://huggingface.co/KRLabsOrg/lettucedect-base-modernbert-en-v1), a ModernBERT-based transformer fine-tuned for grounded hallucination detection. It produces character-level spans scored by confidence. A logistic regression trained on span features converts those into a sample-level probability. **Best single branch** (AUROC 0.721, Span-F1 0.208).
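
The span-to-feature step can be pictured as follows. This sketch only mirrors the three features named in this card (count, coverage, max score); the function name and the dictionary keys are hypothetical, and the real column names are the ones stored in `model_meta.json`.

```python
def span_features(spans, answer_length):
    """Summarise spans as count, covered fraction of the answer, and max score."""
    covered = set()
    for span in spans:
        # Collect character positions so overlapping spans are not counted twice.
        covered.update(range(span["start"], span["end"]))
    return {
        "span_count": len(spans),
        "coverage": len(covered) / answer_length if answer_length else 0.0,
        "max_score": max((s["score"] for s in spans), default=0.0),
    }


example_spans = [
    {"start": 0, "end": 10, "score": 0.9},
    {"start": 5, "end": 20, "score": 0.4},
]
features = span_features(example_spans, answer_length=40)
print(features)
```

A fixed-length feature vector like this is what lets a simple logistic regression sit on top of a variable number of detected spans.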

### Branch C: LookBack-style Detector

Passes the concatenation of context and answer through [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) and computes, for each answer token, the ratio of attention it pays to the context versus to previously generated tokens. Low-grounding tokens (ratio < 0.22) are merged into hallucination spans. Span and ratio features are fed to a logistic regression.
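
A toy version of the ratio computation is sketched below. It assumes the ratio is defined as context attention divided by total attention (context plus previously generated tokens) and that attention has already been averaged over heads and layers; neither detail is stated in this card, and the function names and the attention values are invented for illustration.

```python
def lookback_ratios(attention, n_context):
    """attention[i][j] is the attention mass from answer token i to position j.

    Positions below n_context belong to the context; the rest are answer tokens.
    """
    ratios = []
    for row in attention:
        ctx = sum(row[:n_context])
        new = sum(row[n_context:])
        total = ctx + new
        ratios.append(ctx / total if total > 0 else 0.0)
    return ratios


def low_ratio_token_runs(ratios, threshold=0.22):
    """Merge consecutive answer tokens with ratio < threshold into (first, last) runs."""
    runs, start = [], None
    for i, ratio in enumerate(ratios):
        if ratio < threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(ratios) - 1))
    return runs


# Made-up attention rows for 4 answer tokens over a 3-token context.
attention = [
    [0.6, 0.2, 0.1, 0.1],
    [0.05, 0.05, 0.05, 0.5, 0.35],
    [0.1, 0.05, 0.05, 0.3, 0.3, 0.2],
    [0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05],
]
ratios = lookback_ratios(attention, n_context=3)
runs = low_ratio_token_runs(ratios)
print(ratios, runs)
```

The token-index runs would still need to be mapped back to character offsets (for example through the tokenizer's offset mapping) before they can be compared with character-level labels.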

---

## Files in this repository

| File | Description |
|---|---|
| `lexical_clf.joblib` | Branch A sklearn Pipeline (StandardScaler + LogisticRegression) |
| `lettuce_clf.joblib` | Branch B sklearn Pipeline (StandardScaler + LogisticRegression) |
| `lookback_clf.joblib` | Branch C sklearn Pipeline (StandardScaler + LogisticRegression) |
| `model_meta.json` | Feature column names, backbone IDs, train/test split sizes, test metrics |
| `evaluation_baselines_span_utils.py` | Span extraction helpers required at inference time |
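
If you do not have a local clone, the files in the table can be fetched one by one. The sketch below assumes they sit at the root of the `jameVee/ToolACE-Hallucination-Detector` repository and uses `hf_hub_download` from `huggingface_hub`; the helper name `fetch_detector_files` and its `fetch` parameter are hypothetical.

```python
from pathlib import Path

REPO_ID = "jameVee/ToolACE-Hallucination-Detector"
REPO_FILES = [
    "lexical_clf.joblib",
    "lettuce_clf.joblib",
    "lookback_clf.joblib",
    "model_meta.json",
    "evaluation_baselines_span_utils.py",
]


def fetch_detector_files(dest="hallucination_detector", fetch=None):
    """Download every file listed above into `dest` and return the local paths."""
    if fetch is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import hf_hub_download
        fetch = hf_hub_download
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    return [
        Path(fetch(repo_id=REPO_ID, filename=name, local_dir=dest))
        for name in REPO_FILES
    ]


# Needs network access, so it is left commented out here:
# paths = fetch_detector_files()
```

The `.joblib` files are pickles, so only load them from sources you trust.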

---

## Performance (test set, 20 % held-out, grouped split by `example_id`)

| Method | Accuracy | F1 | AUROC | Span-F1 |
|---|---|---|---|---|
| Lexical span verifier (Branch A) | 0.566 | 0.523 | 0.603 | 0.053 |
| LettuceDetect (Branch B) | **0.678** | 0.632 | **0.721** | **0.208** |
| LookBack (Branch C) | 0.489 | 0.561 | 0.511 | 0.000 |
| **Soft-vote ensemble** | 0.676 | **0.659** | 0.721 | 0.179 |

Per-type F1 (ensemble):

| Corruption type | F1 |
|---|---|
| `missing_tool` | 0.617 |
| `overgeneration` | 0.693 |
| `tool_output_contradiction` | 0.779 |
| `clean` | n/a |
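
This card does not spell out how Span-F1 is computed. One common character-level definition is sketched below as an assumption: predicted and gold spans are expanded to sets of character positions and F1 is taken over their overlap. The function name and the convention for empty inputs are illustrative choices, not the evaluation code behind the table.

```python
def char_span_f1(pred_spans, gold_spans):
    """Character-level F1 between predicted and gold spans (half-open offsets)."""
    pred = {i for s in pred_spans for i in range(s["start"], s["end"])}
    gold = {i for s in gold_spans for i in range(s["start"], s["end"])}
    if not pred and not gold:
        return 1.0  # assumed convention: nothing to find and nothing predicted
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


pred = [{"start": 50, "end": 90}]
gold = [{"start": 60, "end": 100}]
span_f1 = char_span_f1(pred, gold)
print(span_f1)
```

Under a definition like this, a detector that never emits a span on hallucinated samples scores 0 regardless of its sample-level accuracy.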

---

## How to use

### 1. Install dependencies

```bash
pip install joblib scikit-learn transformers lettucedetect torch pandas numpy
```

### 2. Load the classifiers

```python
import json
from pathlib import Path

import joblib

repo = Path("hallucination_detector")  # or your local clone path

# Each .joblib file is a scikit-learn Pipeline (StandardScaler + LogisticRegression).
lex_clf = joblib.load(repo / "lexical_clf.joblib")
lettuce_clf = joblib.load(repo / "lettuce_clf.joblib")
lbl_clf = joblib.load(repo / "lookback_clf.joblib")

# Feature column names, backbone IDs, split sizes, and test metrics.
with open(repo / "model_meta.json") as f:
    meta = json.load(f)
```

### 3. Load the backbone models

```python
import torch
from lettucedetect.models.inference import HallucinationDetector
from transformers import AutoModelForCausalLM, AutoTokenizer

# Branch B backbone: ModernBERT-based span detector.
lettuce_detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1",
)

# Branch C backbone: eager attention so that attention weights can be returned.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
lbl_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B", use_fast=True)
lbl_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16, attn_implementation="eager"
).to(DEVICE).eval()
```

### 4. Run inference

```python
import sys

import numpy as np
import pandas as pd

sys.path.insert(0, str(repo))  # make span utils importable
from evaluation_baselines_span_utils import (
    add_normalized_context_columns,
    aggregate_lookback_features,
    aggregate_span_features,
    lexical_hallucination_spans,
    merge_spans,
    spans_from_lookback_ratios,
)


def predict(query: str, context: str, output: str) -> dict:
    row = pd.Series({"query": query, "context": context, "output": output})
    # Normalize context (converts raw tool JSON to readable text).
    row_df = add_normalized_context_columns(pd.DataFrame([row]))
    row = row_df.iloc[0]

    # Branch A: lexical
    lex_spans = lexical_hallucination_spans(row)
    lex_feats = aggregate_span_features(lex_spans, len(output))
    p_lex = lex_clf.predict_proba(pd.DataFrame([lex_feats]))[0, 1]

    # Branch B: LettuceDetect
    raw_spans = lettuce_detector.predict(
        context=[row["normalized_context"]],
        question=query, answer=output, output_format="spans",
    )
    # "score" is the key this card reads; confirm it against the span dicts
    # your lettucedetect version returns.
    lettuce_spans = merge_spans([
        {"start": int(s["start"]), "end": int(s["end"]),
         "text": output[int(s["start"]):int(s["end"])],
         "type": "hallucination", "score": float(s.get("score", 0.0))}
        for s in raw_spans if int(s.get("end", 0)) > int(s.get("start", 0))
    ])
    lettuce_feats = aggregate_span_features(lettuce_spans, len(output))
    p_lettuce = lettuce_clf.predict_proba(pd.DataFrame([lettuce_feats]))[0, 1]

    # Branch C: LookBack
    # (reuse compute_lookback_ratios from the training notebook, together with
    # spans_from_lookback_ratios and aggregate_lookback_features imported above)
    # p_lbl = lbl_clf.predict_proba(pd.DataFrame([lbl_feats]))[0, 1]

    # For a self-contained example we skip Branch C and average A+B only:
    p_ensemble = np.mean([p_lex, p_lettuce])
    return {
        "hallucinated": bool(p_ensemble >= 0.5),
        "score": float(p_ensemble),
        "lex_score": float(p_lex),
        "lettuce_score": float(p_lettuce),
        "lettuce_spans": lettuce_spans,
    }


result = predict(
    query="What is the current price of AAPL?",
    context='Stock API: {"ticker": "AAPL", "price": 189.50, "change": "+1.2%"}',
    output="The current price of AAPL is $189.50, up 1.2%. It also hit an all-time high last Tuesday.",
)
print(result)
```
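
Span dictionaries such as those in `result["lettuce_spans"]` are easier to inspect when rendered inline. The helper below is a small sketch for that purpose; `highlight_spans` and its bracket markers are not part of the repository, and the example span is hand-written rather than model output.

```python
def highlight_spans(text, spans, left="[[", right="]]"):
    """Wrap each span of `text` in marker strings, skipping overlapped parts."""
    out, cursor = [], 0
    for span in sorted(spans, key=lambda s: s["start"]):
        start, end = max(span["start"], cursor), span["end"]
        if end <= start:
            continue  # span is fully covered by an earlier one
        out.append(text[cursor:start])
        out.append(left + text[start:end] + right)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Price is $189.50. It hit a record high."
# Hand-written span covering the unsupported second sentence.
example = [{"start": text.index("It hit"), "end": len(text)}]
print(highlight_spans(text, example))
# prints: Price is $189.50. [[It hit a record high.]]
```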

---

## Training details

- **Source dataset**: [jameVee/ToolACE-Hallucination](https://huggingface.co/datasets/jameVee/ToolACE-Hallucination) (1 034 base examples × 4 variants = 4 136 rows total)
- **Split**: 80 / 20 grouped by `example_id` (no leakage between clean and corrupted variants of the same query)
- **Classifiers**: scikit-learn `LogisticRegression(max_iter=1000)` preceded by a `StandardScaler` in a single pipeline
- **Random seed**: 1241
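
The grouped split can be illustrated with the standard library alone. This is a sketch of the idea, not necessarily the author's procedure (scikit-learn's `GroupShuffleSplit` is a common alternative); the function name and the toy group ids are assumptions, while the 20 % test fraction and seed 1241 come from the list above.

```python
import random


def grouped_split(group_ids, test_fraction=0.2, seed=1241):
    """Split row indices so that all rows sharing a group id land on the same side."""
    groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train_idx = [i for i, g in enumerate(group_ids) if g not in test_groups]
    test_idx = [i for i, g in enumerate(group_ids) if g in test_groups]
    return train_idx, test_idx


# Toy data: 10 base examples, each with 4 variants sharing one example_id.
example_ids = [i // 4 for i in range(40)]
train_idx, test_idx = grouped_split(example_ids)
print(len(train_idx), len(test_idx))
```

Splitting on `example_id` rather than on rows keeps a clean answer and its corrupted variants from appearing on both sides of the split.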

### About the training dataset

`jameVee/ToolACE-Hallucination` contains three JSONL files, each derived from [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE):

- `missing_tool_dataset.jsonl`: generated with `unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit` (50 % corruption rate); a sentence is appended that refers to a non-existent tool.
- `overgeneration_dataset.jsonl`: same generator (50 % corruption rate); a plausible but unsupported sentence is appended.
- `tool_output_contradiction_dataset.jsonl`: generated with `openai/gpt-4o-mini` via OpenRouter (all entries attempted, strength 0.9); the answer is rewritten to contradict grounded facts from the tool output.

Each entry carries character-level `hallucination_labels` marking the corrupted span(s).

---

## Limitations

- The classifiers are trained on a relatively small dataset (~3 300 training rows). Performance may degrade on domains or tool schemas not represented in ToolACE.
- Branch C (LookBack) is close to chance at the sample level (AUROC 0.511) and reaches a Span-F1 of 0.000 at the default ratio threshold (0.22); tuning this threshold on a validation split is recommended.
- The ensemble does not produce type-specific labels: it only predicts binary hallucinated / clean at the sample level.

---

## Citation

If you use this model or the associated datasets, please cite:

```bibtex
@misc{toolace_hallucination_detector,
  author    = {jameVee},
  title     = {ToolACE Hallucination Detector},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jameVee/ToolACE-Hallucination-Detector}
}

@dataset{toolace_hallucination,
  author    = {jameVee},
  title     = {ToolACE-Hallucination},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/jameVee/ToolACE-Hallucination}
}

@dataset{toolace,
  author    = {Team-ACE},
  title     = {ToolACE},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/Team-ACE/ToolACE}
}
```