toolace-halu-modernbert-large

ModernBERT-large token classifier fine-tuned for span-level hallucination detection in tool-calling dialogues. It targets the "Hallucination Detection in Tool Calling" task built on top of ToolACE.

  • Base model: KRLabsOrg/lettucedect-large-modernbert-en-v1 (RAGTruth-pretrained ModernBERT, used as warm-start)
  • Training data: Ivan1008/toolace-hallucination-spans, config combined, split train (1955 records)
  • Head: 7-class BIO token classification — O, B-/I-contradiction, B-/I-missing_tool, B-/I-overgeneration
  • Context: 2048 tokens, bf16
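
The 7 classes in the head listed above are one O tag plus a B-/I- pair for each of the three hallucination types. A minimal sketch of that label set follows; the index order shown here is an assumption for illustration, and the authoritative mapping is whatever the checkpoint stores in model.config.id2label.

```python
# Hypothetical reconstruction of the label set. The real index order lives in
# the checkpoint config (model.config.id2label) and may differ from this one.
HALLUCINATION_TYPES = ["contradiction", "missing_tool", "overgeneration"]


def build_bio_labels(types):
    """Return ["O", "B-x", "I-x", ...] for the given span types."""
    labels = ["O"]
    for span_type in types:
        labels.append(f"B-{span_type}")  # first token of a span
        labels.append(f"I-{span_type}")  # continuation token of a span
    return labels


LABELS = build_bio_labels(HALLUCINATION_TYPES)
LABEL2ID = {label: index for index, label in enumerate(LABELS)}
print(len(LABELS))  # 1 + 2 * 3 classes
```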

Input template

[CTX] <tool output JSON> [TOOLS] <comma-separated tool names> [Q] <user query> [A] <assistant answer>

Loss is computed only on tokens of the [A] <assistant answer> segment.
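
One way to restrict the loss to the answer, sketched under assumptions: build the input string, record the character offset where the answer begins, and set the label of every token outside the answer to -100, the value that PyTorch's cross-entropy loss ignores by default. The helper names build_input and mask_labels are hypothetical, and whether the [A] marker itself contributes to the loss in the original training code is not stated here; this sketch excludes it.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_input(tool_context_json, tool_names, user_query, assistant_answer):
    """Assemble the template; return (text, character offset of the answer)."""
    prefix = (
        "[CTX] " + tool_context_json
        + " [TOOLS] " + ", ".join(tool_names)
        + " [Q] " + user_query
        + " [A] "
    )
    return prefix + assistant_answer, len(prefix)


def mask_labels(labels, offsets, answer_start):
    """Keep labels only for tokens that overlap the answer segment.

    offsets holds per-token (start, end) character offsets into the text,
    as returned by a fast tokenizer with return_offsets_mapping=True.
    Special tokens have empty offsets (0, 0) and are therefore masked too.
    """
    masked = []
    for label, (start, end) in zip(labels, offsets):
        masked.append(label if end > answer_start else IGNORE_INDEX)
    return masked


# Toy example with hand-written offsets (not real tokenizer output).
text, answer_start = build_input(
    '{"temp": 21}', ["get_weather"], "Weather?", "It is 21 degrees."
)
toy_offsets = [(0, 0), (0, 5), (answer_start, answer_start + 2)]
print(mask_labels([0, 0, 1], toy_offsets, answer_start))
```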

Training

  • 5 epochs, lr 2e-5, weight-decay 0.01, batch 8, warmup 6%, bf16
  • Best checkpoint selected by token-level macro F1 on combined/validation
  • Single H200, ~15 minutes

epoch | val loss | val token F1 | val precision | val recall
1     | 0.194    | 0.708        | 0.879         | 0.593
2     | 0.184    | 0.722        | 0.926         | 0.591
3     | 0.178    | 0.763        | 0.851         | 0.692
4     | 0.215    | 0.776        | 0.890         | 0.688
5     | 0.233    | 0.764        | 0.820         | 0.714
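
Applying the stated selection rule to the validation numbers above shows why F1 and loss can disagree about the best epoch. The sketch below assumes that the "val token F1" column is the token-level macro F1 used for selection; under that assumption the rule picks epoch 4, even though validation loss is lowest at epoch 3.

```python
# (epoch, val_loss, val_token_f1, val_precision, val_recall), copied from the
# per-epoch validation results in this section.
history = [
    (1, 0.194, 0.708, 0.879, 0.593),
    (2, 0.184, 0.722, 0.926, 0.591),
    (3, 0.178, 0.763, 0.851, 0.692),
    (4, 0.215, 0.776, 0.890, 0.688),
    (5, 0.233, 0.764, 0.820, 0.714),
]

best_by_f1 = max(history, key=lambda row: row[2])    # highest validation F1
best_by_loss = min(history, key=lambda row: row[1])  # lowest validation loss

print("best epoch by F1:", best_by_f1[0])
print("best epoch by loss:", best_by_loss[0])
```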

Test-set results (sentence-level F1 — the leaderboard metric)

Config         | Lexical floor | LettuceDetect-large (zero-shot) | LookBackLens (in-domain) | This model | + ensemble
combined       | 0.302         | 0.361                           | 0.489                    | 0.798      | 0.871
contradiction  | 0.231         | 0.315                           | 0.377                    | 0.763      | 0.877
missing_tool   | 0.218         | 0.330                           | 0.406                    | 0.966      | 0.993
overgeneration | 0.319         | 0.335                           | 0.508                    | 0.697      | 0.824

Sentence-level F1 for all 9 evaluated methods, plus the full code, plots, and the ensemble, is available in the project repo (notebooks/improve_baselines.ipynb). That notebook is the only training source for this model.

Usage

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "ArsenyIvanov/toolace-halu-modernbert-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Inputs you supply: tool_context_json (str), tool_names (list[str]),
# user_query (str), assistant_answer (str).
text = (
    "[CTX] " + tool_context_json
    + " [TOOLS] " + ", ".join(tool_names)
    + " [Q] " + user_query
    + " [A] " + assistant_answer
)
enc = tokenizer(text, return_offsets_mapping=True, truncation=True, max_length=2048, return_tensors="pt")
with torch.no_grad():  # no gradients are needed at inference time
    logits = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
pred_ids = logits.argmax(dim=-1)[0].tolist()
labels = [model.config.id2label[i] for i in pred_ids]
offsets = enc["offset_mapping"][0].tolist()  # per-token (start, end) character offsets into `text`
# pair labels with offsets → spans inside `assistant_answer`

Decoding: collapse contiguous runs of the same non-O label, map back to the character offsets of assistant_answer. See improve_baselines.ipynb §3 for the reference predict_to_spans() function.
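
The decoding step can be sketched as follows. This is not the reference predict_to_spans() from the notebook; labels_to_spans is a hypothetical stand-in written from the description above. It assumes that B- and I- tags of the same type belong to one run, that tokens outside the answer are ignored, and that offsets are (start, end) character positions into the full input string.

```python
def labels_to_spans(labels, offsets, answer_start, answer_len):
    """Collapse runs of same-type non-O labels into answer-relative spans.

    labels  : per-token BIO strings such as "B-contradiction" or "O"
    offsets : per-token (start, end) character offsets into the full input
    Returns a list of {"start", "end", "label"} dicts relative to the answer.
    """
    spans = []
    current = None  # span currently being extended, if any
    for label, (start, end) in zip(labels, offsets):
        in_answer = start != end and end > answer_start
        # Strip the B-/I- prefix so both tags of one type share a run.
        kind = label.split("-", 1)[1] if (label != "O" and in_answer) else None
        if kind is None:
            if current is not None:
                spans.append(current)
                current = None
            continue
        rel_start = max(start - answer_start, 0)
        rel_end = min(end - answer_start, answer_len)
        if current is not None and current["label"] == kind:
            current["end"] = rel_end  # extend the running span
        else:
            if current is not None:
                spans.append(current)
            current = {"start": rel_start, "end": rel_end, "label": kind}
    if current is not None:
        spans.append(current)
    return spans


# Toy example with hand-written labels and offsets (not real model output).
toy_labels = ["O", "O", "B-contradiction", "I-contradiction", "O"]
toy_offsets = [(0, 0), (0, 5), (10, 12), (13, 15), (16, 20)]
print(labels_to_spans(toy_labels, toy_offsets, answer_start=10, answer_len=10))
```

With the Usage snippet, labels would come from the model's predictions and offsets from enc["offset_mapping"][0].tolist(). If the input was not truncated, answer_start can be taken as len(text) - len(assistant_answer).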

Hallucination types

Label          | What it captures
contradiction  | grounded value replaced by a plausible-but-wrong alternative
missing_tool   | offers an action that requires a tool not in the available list
overgeneration | inserted sentence with claims not supported by the tool output

Limitations

  • Synthetic corruptions only (dataset was constructed by injecting hallucinations into clean ToolACE responses), no naturally occurring cascading errors.
  • Context format is JSON-serialized tool output, not natural language — predictions on prose-style RAG contexts may degrade.
  • Single corruption type per record (RAGTruth strict schema); records with multiple co-occurring hallucinations are partially under-annotated.

License

Apache 2.0 (matches the base model and the training dataset).
