DeBERTa Tool Hallucination Span Detector

This is the final span-level model for the tool hallucination coursework. It predicts RAGTruth-style hallucinated character spans in assistant answers grounded on tool outputs.

Task

Input record fields:

  • query: user question
  • context: tool response or tool evidence
  • output: assistant answer
  • available_tool_names: optional list of available tools

Output:

  • row-level hallucination decision
  • hallucination type
  • predicted character spans in output

Method

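The model is a token classifier fine-tuned from microsoft/deberta-v3-small. Because some answers are too long to fit in a single input sequence, inference runs over overlapping windows of the answer. Each window is placed first in the sequence, followed by the question (query), the tool responses (context), and the available tool names. During training, only the answer-window tokens are supervised; the remaining tokens serve as context and contribute no loss.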
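The windowing and label masking described above can be sketched as follows. This is a minimal illustration, not the training code from this repo: the function names, the window and stride values, and the single-separator layout are all assumptions. The only fixed convention used is the label value -100, which PyTorch's cross-entropy loss ignores by default.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def answer_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) token ranges covering n_tokens with overlap (hypothetical helper)."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    if n_tokens <= window:
        return [(0, n_tokens)]
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans


def build_example(
    answer_ids: List[int],
    answer_labels: List[int],
    context_ids: List[int],
    span: Tuple[int, int],
    cls_id: int,
    sep_id: int,
) -> Tuple[List[int], List[int]]:
    """Place the answer window first, then the context; supervise only the window.

    context_ids stands in for the already-tokenized question, tool responses,
    and tool names (assumed layout, not confirmed by the model card).
    """
    start, end = span
    input_ids = [cls_id] + answer_ids[start:end] + [sep_id] + context_ids + [sep_id]
    # Special tokens and context tokens get IGNORE_INDEX so they add no loss.
    labels = (
        [IGNORE_INDEX]
        + answer_labels[start:end]
        + [IGNORE_INDEX] * (len(context_ids) + 2)
    )
    return input_ids, labels
```

With a window of 4 tokens and a stride of 2, a 10-token answer yields the ranges (0, 4), (2, 6), (4, 8), and (6, 10), so every token away from the two ends of the answer is seen in two windows.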

Decoding configuration

The validation-selected decoding configuration is:

{
  "threshold": 0.5,
  "min_span_chars": 1,
  "min_span_tokens": 1,
  "merge_gap_chars": 1,
  "strip_predicted_span_whitespace": true,
  "drop_spans_without_alnum": true,
  "score_name": "sum_non_O_probability"
}
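To make the role of each setting concrete, here is a hedged sketch of how such a configuration could turn per-token scores into character spans. It is not the span_inference implementation: the function names, the choice of `>=` for the threshold comparison, the assumption that the O label sits at index 0, and the order of the filtering steps are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

CONFIG = {
    "threshold": 0.5,
    "min_span_chars": 1,
    "min_span_tokens": 1,
    "merge_gap_chars": 1,
    "strip_predicted_span_whitespace": True,
    "drop_spans_without_alnum": True,
}


def non_o_score(label_probs: Sequence[float], o_index: int = 0) -> float:
    """One reading of "sum_non_O_probability": total probability of all non-O labels."""
    return sum(p for i, p in enumerate(label_probs) if i != o_index)


def decode_spans(
    text: str,
    offsets: Sequence[Tuple[int, int]],
    scores: Sequence[float],
    cfg: Dict = CONFIG,
) -> List[Tuple[int, int]]:
    """Convert per-token scores into [start, end) character spans over text."""
    runs: List[List[int]] = []  # each run is [char_start, char_end, n_tokens]
    for (start, end), score in zip(offsets, scores):
        if score < cfg["threshold"]:
            continue
        # Merge with the previous run when the character gap is small enough.
        if runs and start - runs[-1][1] <= cfg["merge_gap_chars"]:
            runs[-1][1] = max(runs[-1][1], end)
            runs[-1][2] += 1
        else:
            runs.append([start, end, 1])

    spans = []
    for start, end, n_tokens in runs:
        if n_tokens < cfg["min_span_tokens"]:
            continue
        if cfg["strip_predicted_span_whitespace"]:
            while start < end and text[start].isspace():
                start += 1
            while end > start and text[end - 1].isspace():
                end -= 1
        if end - start < cfg["min_span_chars"]:
            continue
        if cfg["drop_spans_without_alnum"] and not any(c.isalnum() for c in text[start:end]):
            continue
        spans.append((start, end))
    return spans
```

Under these assumptions, a lone flagged comma is discarded because it contains no alphanumeric character, while two flagged tokens separated by a single space are merged into one span because the gap does not exceed merge_gap_chars.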

Held-out test results

{
  "row_accuracy": 0.9516908212560387,
  "row_macro_f1": 0.9553091397849462,
  "binary_f1": 0.9561403508771931,
  "exact_span_f1": 0.8207171314741037,
  "overlap_span_f1_iou_0_01": 0.8685258964143425,
  "overlap_span_f1_iou_0_50": 0.8366533864541832,
  "char_span_precision": 0.971356003950896,
  "char_span_recall": 0.9955169920462762,
  "char_span_f1": 0.9832881016997572
}
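For readers unfamiliar with span metrics, the sketch below shows one common way to compute character-level precision, recall, and F1 for a single answer, plus the intersection-over-union (IoU) of two spans that overlap-based matching relies on. The evaluation script behind the numbers above is not shown in this card, so the function names and the per-answer scope here are assumptions; the reported figures may aggregate across rows differently.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets


def span_chars(spans: Iterable[Span]) -> Set[int]:
    """Expand spans into the set of character positions they cover."""
    chars: Set[int] = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Character-level precision, recall, and F1 for one answer."""
    g, p = span_chars(gold), span_chars(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def span_iou(a: Span, b: Span) -> float:
    """Intersection over union of two character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0
```

Read this way, a metric name such as overlap_span_f1_iou_0_50 would count a predicted span as matching a gold span when their IoU reaches 0.50, whereas exact_span_f1 would require identical boundaries. That reading is inferred from the metric names, not from the evaluation code.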

Loading

from transformers import AutoTokenizer, AutoModelForTokenClassification

repo_id = "Ali-Bhai/deberta-tool-hallucination-span-detector"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(repo_id)

Inference

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
from span_inference import predict_record  # span_inference must be importable (e.g. on PYTHONPATH)

repo_id = "Ali-Bhai/deberta-tool-hallucination-span-detector"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(repo_id)

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# One input record; field names follow the Task section.
record = {
    "query": "What did the tool return?",
    "context": "... tool output ...",
    "output": "... assistant answer ...",
    "available_tool_names": [],
}

prediction = predict_record(record, model, tokenizer, device=device)
print(prediction["predicted_spans"])

Related repo

The earlier row-level classifier is here: https://huggingface.co/Ali-Bhai/deberta-tool-hallucination-detector.
