	---
	license: mit
	base_model: lytang/MiniCheck-RoBERTa-Large
	tags:
	- coreml
	- text-classification
	- fact-checking
	- grounding
	language:
	- en
	---

	# MiniCheck-RoBERTa-Large — Core ML (Apple Neural Engine)

	Core ML conversion of [lytang/MiniCheck-RoBERTa-Large](https://huggingface.co/lytang/MiniCheck-RoBERTa-Large)
	(MIT license), a specialized grounding / fact-verification model, for in-app use on the Apple Neural Engine.
	Marvel Mirror AI uses it as a claim-by-claim faithfulness judge: given a source and a claim, does the
	source support the claim?

	## Contents
	- `MiniCheckRoBERTa.mlpackage` — the Core ML model (fp16 weights). Inputs: `input_ids`, `attention_mask`
	(int32, length 512). Output: `support_prob` = probability the claim is supported (class 1).
	- RoBERTa fast-tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`, `tokenizer_config.json`,
	`special_tokens_map.json`).
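
The input/output contract above can be exercised from Python on macOS roughly as follows. This is a minimal sketch, not the app's actual integration. It assumes `coremltools` and `numpy` are installed, that the model expects a leading batch dimension (shape `(1, 512)`), and that padding uses RoBERTa's default pad token id of 1. The helper names `pad_to_fixed_length` and `predict_support_prob` are hypothetical.

```python
from typing import List, Tuple

MAX_LENGTH = 512  # fixed sequence length stated in this card
PAD_ID = 1        # RoBERTa's default <pad> id (assumes an unmodified tokenizer)


def pad_to_fixed_length(token_ids: List[int],
                        max_length: int = MAX_LENGTH,
                        pad_id: int = PAD_ID) -> Tuple[List[int], List[int]]:
    """Right-pad token ids and build the matching attention mask."""
    if len(token_ids) > max_length:
        # Truncation is left to the tokenizer so special tokens stay intact.
        raise ValueError(f"{len(token_ids)} tokens exceed max_length={max_length}")
    n_pad = max_length - len(token_ids)
    input_ids = list(token_ids) + [pad_id] * n_pad
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return input_ids, attention_mask


def predict_support_prob(model_path: str, token_ids: List[int]) -> float:
    """Run one check through the Core ML model (macOS only)."""
    # Imported lazily so the padding helper works without these packages.
    import numpy as np
    import coremltools as ct

    input_ids, attention_mask = pad_to_fixed_length(token_ids)
    model = ct.models.MLModel(model_path, compute_units=ct.ComputeUnit.CPU_AND_NE)
    outputs = model.predict({
        # Assumed shape (1, 512); adjust if the model has no batch dimension.
        "input_ids": np.array([input_ids], dtype=np.int32),
        "attention_mask": np.array([attention_mask], dtype=np.int32),
    })
    return float(np.asarray(outputs["support_prob"]).reshape(-1)[0])
```

A call such as `predict_support_prob("MiniCheckRoBERTa.mlpackage", token_ids)` would then return the class-1 probability for already tokenized input.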

	## Input format
	Concatenate the source document and the claim as `doc + </s> + claim`, then tokenize with the bundled
	RoBERTa tokenizer (`max_length` 512, padded). A result of `support_prob > 0.5` means the claim is supported.
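
The input construction and the decision rule can be sketched in Python as below. This is an illustration under stated assumptions rather than the app's code: it assumes the `transformers` package is installed, that the tokenizer files from this repo sit in a local directory, that no whitespace is added around the `</s>` separator, and that over-long inputs are truncated by the tokenizer. All function names are hypothetical.

```python
from typing import Dict, List

SEPARATOR = "</s>"  # RoBERTa end-of-sequence token placed between doc and claim
THRESHOLD = 0.5     # decision threshold stated in this card


def build_pair_text(doc: str, claim: str, separator: str = SEPARATOR) -> str:
    """Join document and claim as doc + </s> + claim (no added whitespace: an assumption)."""
    return doc + separator + claim


def encode_pair(doc: str, claim: str, tokenizer_dir: str,
                max_length: int = 512) -> Dict[str, List[int]]:
    """Tokenize the joined text to a fixed length with the RoBERTa tokenizer."""
    # Imported lazily so the pure helpers work without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    encoded = tokenizer(
        build_pair_text(doc, claim),
        max_length=max_length,
        padding="max_length",
        truncation=True,  # assumption: the card only states max_length and padding
    )
    return {"input_ids": encoded["input_ids"],
            "attention_mask": encoded["attention_mask"]}


def is_supported(support_prob: float, threshold: float = THRESHOLD) -> bool:
    """Strict comparison, so a probability of exactly 0.5 counts as not supported."""
    return support_prob > threshold
```

Keeping the string construction separate from tokenization makes it easy to confirm that the on-device code builds the same text as the Python reference.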

	## Provenance
	Converted with coremltools 9.0 (torch 2.7.0 / transformers 4.46.3), targeting CPU + Neural Engine.
	About 97% of compute-bearing ops run on the ANE. Verdicts match the PyTorch source (maximum probability
	difference < 0.007), and the conversion reproduces the source's full-set accuracy (21/21 fabrications
	caught, including all meaning and numeric inversions). Latency is roughly 60 ms per check on the ANE.
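
A parity check of this kind can be sketched as follows. It is not the script used for this conversion. The function name `parity_report` is hypothetical, and the probabilities in the usage lines are illustrative values, not measurements.

```python
from typing import Sequence, Tuple


def parity_report(reference: Sequence[float],
                  converted: Sequence[float],
                  threshold: float = 0.5) -> Tuple[float, bool]:
    """Return (max absolute probability difference, whether all verdicts agree)."""
    if len(reference) != len(converted):
        raise ValueError("both models must be scored on the same examples")
    if not reference:
        raise ValueError("at least one example is required")
    pairs = list(zip(reference, converted))
    max_diff = max(abs(r - c) for r, c in pairs)
    # A verdict is "supported" when the probability exceeds the threshold.
    same_verdicts = all((r > threshold) == (c > threshold) for r, c in pairs)
    return max_diff, same_verdicts


# Illustrative values only, not results from this conversion.
pytorch_probs = [0.98, 0.03, 0.61]
coreml_probs = [0.975, 0.034, 0.606]
max_diff, same_verdicts = parity_report(pytorch_probs, coreml_probs)
assert same_verdicts and max_diff < 0.007
```

Checking verdict agreement separately from the maximum difference matters because a small drift can still flip a verdict when a probability sits near 0.5.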