DeBERTa Tool Hallucination Span Detector

This is the final span-level model for the tool hallucination coursework. It predicts RAGTruth-style hallucinated character spans in assistant answers grounded on tool outputs.

Task

Input record fields:

query: user question
context: tool response or tool evidence
output: assistant answer
available_tool_names: optional list of available tools
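
As a minimal sketch of the record shape listed above, the helper below checks the three text fields and fills in an empty tool list when available_tool_names is absent. The function name normalize_record and the choice to treat query, context, and output as required strings are assumptions made for illustration; they are not taken from the repository code.

```python
from typing import Any

# Assumption: these three fields are required strings; only
# available_tool_names is described as optional in the field list.
REQUIRED_FIELDS = ("query", "context", "output")


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with all four fields present."""
    missing = [k for k in REQUIRED_FIELDS if not isinstance(record.get(k), str)]
    if missing:
        raise ValueError(f"missing or non-string fields: {missing}")
    # Treat a missing or None tool list as an empty list.
    tools = record.get("available_tool_names") or []
    normalized = {k: record[k] for k in REQUIRED_FIELDS}
    normalized["available_tool_names"] = list(tools)
    return normalized
```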

Output:

row-level hallucination decision
hallucination type
predicted character spans in output

Method

The model is a microsoft/deberta-v3-small token classifier. Because some answers are long, inference runs over overlapping windows of the answer. Each window is placed first in the input sequence, followed by the question, the tool responses, and the available tool names. During training, only the answer-window tokens are supervised.
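
The windowing described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names answer_windows and build_pair, the window and stride defaults, and the newline and comma separators are all hypothetical.

```python
def answer_windows(num_tokens: int, window: int = 256, stride: int = 128) -> list[tuple[int, int]]:
    """Return half-open (start, end) token ranges covering [0, num_tokens) with overlap.

    window and stride are illustrative defaults, not the trained configuration.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    if num_tokens <= 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + window, num_tokens)
        spans.append((start, end))
        if end == num_tokens:
            break
        start += stride  # stride <= window, so consecutive windows touch or overlap
    return spans


def build_pair(answer_window: str, query: str, context: str, tool_names: list[str]) -> tuple[str, str]:
    """Put the answer window first, then the question, tool output, and tool names."""
    # Assumption: segments are joined with newlines; the real separator may differ.
    evidence = "\n".join([query, context, ", ".join(tool_names)])
    return answer_window, evidence
```

The two strings returned by build_pair could be passed to a Hugging Face tokenizer as the text and text_pair arguments, so that the answer window always occupies the first segment.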

Decoding configuration

The validation-selected decoding configuration is:

{
  "threshold": 0.5,
  "min_span_chars": 1,
  "min_span_tokens": 1,
  "merge_gap_chars": 1,
  "strip_predicted_span_whitespace": true,
  "drop_spans_without_alnum": true,
  "score_name": "sum_non_O_probability"
}
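
One plausible reading of these settings is sketched below: tokens are kept when their score (assumed here to be the summed probability of all non-O labels) reaches the threshold, nearby flagged tokens are merged, and the resulting spans are trimmed and filtered. The function decode_spans, the use of >= for the threshold comparison, and the order of the steps are assumptions; the repository's span_inference module may differ.

```python
def decode_spans(text, offsets, scores, threshold=0.5, min_span_chars=1,
                 min_span_tokens=1, merge_gap_chars=1,
                 strip_whitespace=True, drop_without_alnum=True):
    """Turn per-token scores into character spans over `text`.

    offsets: list of (start, end) character offsets, one per answer token.
    scores:  one score per token, e.g. the sum of non-O label probabilities.
    """
    # 1. Keep non-empty tokens whose score reaches the threshold.
    flagged = [(s, e) for (s, e), p in zip(offsets, scores) if p >= threshold and e > s]

    # 2. Merge flagged tokens separated by at most merge_gap_chars characters.
    merged = []  # each item is [start, end, token_count]
    for s, e in sorted(flagged):
        if merged and s - merged[-1][1] <= merge_gap_chars:
            merged[-1][1] = max(merged[-1][1], e)
            merged[-1][2] += 1
        else:
            merged.append([s, e, 1])

    spans = []
    for s, e, n in merged:
        # 3. Optionally trim leading and trailing whitespace.
        if strip_whitespace:
            while s < e and text[s].isspace():
                s += 1
            while e > s and text[e - 1].isspace():
                e -= 1
        # 4. Apply the minimum-length filters.
        if e - s < min_span_chars or n < min_span_tokens:
            continue
        # 5. Optionally drop spans that are punctuation only.
        if drop_without_alnum and not any(c.isalnum() for c in text[s:e]):
            continue
        spans.append((s, e))
    return spans
```

With merge_gap_chars set to 1, two flagged tokens separated by a single space end up in the same span, which is why word-level predictions come out as contiguous phrases rather than isolated tokens.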

Held-out test results

{
  "row_accuracy": 0.9516908212560387,
  "row_macro_f1": 0.9553091397849462,
  "binary_f1": 0.9561403508771931,
  "exact_span_f1": 0.8207171314741037,
  "overlap_span_f1_iou_0_01": 0.8685258964143425,
  "overlap_span_f1_iou_0_50": 0.8366533864541832,
  "char_span_precision": 0.971356003950896,
  "char_span_recall": 0.9955169920462762,
  "char_span_f1": 0.9832881016997572
}
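
For readers unfamiliar with the span metrics, the sketch below shows the usual character-overlap definitions of precision, recall, and F1, plus a span-level IoU of the kind the overlap_span_f1_iou_* names suggest. The evaluation script is not part of this card, so the function names and the exact matching rules are assumptions; in particular, the reported numbers are presumably aggregated over the whole test set rather than computed per record.

```python
def covered_chars(spans):
    """Return the set of character indices covered by half-open (start, end) spans."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_prf(pred_spans, gold_spans):
    """Character-level precision, recall, and F1 for one answer."""
    pred, gold = covered_chars(pred_spans), covered_chars(gold_spans)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def span_iou(a, b):
    """Intersection over union of two half-open character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0
```

As a consistency check, the harmonic mean of the reported char_span_precision and char_span_recall, 2PR / (P + R), agrees with the reported char_span_f1 to several decimal places.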

Loading

from transformers import AutoTokenizer, AutoModelForTokenClassification

repo_id = "Ali-Bhai/deberta-tool-hallucination-span-detector"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(repo_id)

Inference

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# predict_record comes from span_inference.py, which must be on the Python path.
from span_inference import predict_record

repo_id = "Ali-Bhai/deberta-tool-hallucination-span-detector"
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(repo_id)

record = {
    "query": "What did the tool return?",
    "context": "... tool output ...",
    "output": "... assistant answer ...",
    "available_tool_names": [],
}

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
prediction = predict_record(record, model, tokenizer, device=device)
print(prediction["predicted_spans"])

Related repo

The earlier row-level classifier is here: https://huggingface.co/Ali-Bhai/deberta-tool-hallucination-detector.
