---
license: mit
library_name: coreml
base_model: FacebookAI/xlm-roberta-base
tags:
- text-classification
- sentiment-analysis
- named-entity-recognition
- multi-label-classification
- multi-task
- coreml
- russian
- xlm-roberta
language:
- ru
- en
pipeline_tag: text-classification
datasets:
- RuSentiment
- google/goEmotions
- wikiann
---
# XLM-RoBERTa Multi-Head Classifier (Fine-Tuned) — CoreML
Fine-tuned CoreML version of XLM-RoBERTa-base with three classification heads for on-device multilingual text analysis on Apple Silicon. Performs sentiment analysis, multi-label tagging, and named entity recognition in a single forward pass.
## Model Details
- **Architecture:** XLM-RoBERTa-base (12 layers, 768 hidden, 12 heads) + 3 task-specific heads
- **Format:** CoreML `.mlpackage` (mlprogram)
- **Variants:** FP16 (529 MB), INT8 (266 MB)
- **Sequence length:** 128 tokens
- **Input:** Tokenized text (`input_ids` + `attention_mask`, int32)
- **Output:** Three tensors — sentiment logits, tag logits, NER logits
## Heads
### Sentiment (4 classes)
Single-label classification: `positive`, `neutral`, `risk`, `toxic`
### Tags (20 multi-label)
Multi-label phrase tagging: `stress_signal`, `confidence`, `emotional_state`, `trust_indicator`, `defensiveness`, `active_listening`, `rapport_building`, `conflict_signal`, `cooperation`, `clarification`, `pressure_tactic`, `concession`, `information_sharing`, `commitment`, `deadline_mention`, `deception_signal`, `manipulation`, `power_dynamic`, `agreement`, `problem_solving`
### NER (9 BIO labels, per-token)
Named entity recognition: `O`, `B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-MONEY`, `I-MONEY`, `B-DATE`, `I-DATE`
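The NER head emits one label per token, so consumers need to collapse BIO labels into entity spans. The model card does not ship a decoder; below is a minimal, self-contained Python sketch (function name and span convention are illustrative) that turns per-token label ids — e.g. the argmax over `ner_logits` — into `(entity_type, start, end_exclusive)` spans using the label order listed above.

```python
# BIO span decoder: collapses per-token label ids into (type, start, end_exclusive) spans.
# Label order matches the NER head's 9 BIO labels listed above.
BIO_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-MONEY", "I-MONEY", "B-DATE", "I-DATE"]

def decode_bio(label_ids):
    """Turn a sequence of BIO label ids into (type, start, end_exclusive) spans."""
    spans, current = [], None  # current open span as [type, start, end]
    for i, lid in enumerate(label_ids):
        label = BIO_LABELS[lid]
        if label.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [label[2:], i, i + 1]
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[2] = i + 1  # extend the open span
        else:  # "O", or a stray I- tag that doesn't continue the open span
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans
```

For example, `decode_bio([1, 2, 0, 3, 4, 4])` yields `[("PER", 0, 2), ("ORG", 3, 6)]`. Note this sketch treats a dangling `I-` tag (no matching `B-`) as `O`; other conventions (e.g. starting a new span) are equally valid.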
## Training
- **Backbone LR:** 2e-5 with cosine decay + 10% linear warmup
- **Head LR:** 2e-4 (10x backbone)
- **Epochs:** 3 (best checkpoint at epoch 2)
- **Batch size:** 32
- **Device:** Apple M3 Max (MPS)
- **Training data:** RuSentiment (50K), GoEmotions mapped to 4-class (50K), WikiAnn-ru NER (50K), synthetic negotiation tags (1.7K)
- **Multi-task loss:** weighted CE (sentiment) + BCE (tags) + CE with ignore_index (NER)
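To make the loss composition above concrete, here is a schematic NumPy sketch of the three terms for a single example — this is not the training code, and the relative weighting of the three heads plus the sentiment class weights are unspecified in this card (the sketch simply sums them and takes optional per-class weights as an argument):

```python
import numpy as np

def cross_entropy(logits, target, class_weights=None):
    """Softmax cross-entropy for one example, with optional per-class weights."""
    z = logits - logits.max()                      # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    w = 1.0 if class_weights is None else class_weights[target]
    return -w * log_probs[target]

def bce_with_logits(logits, targets):
    """Stable element-wise binary cross-entropy over multi-label logits, averaged."""
    # max(x, 0) - x*y + log(1 + exp(-|x|)) is the standard stable formulation
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def multitask_loss(sent_logits, sent_y, tag_logits, tag_y, ner_logits, ner_y,
                   class_weights=None, ignore_index=-100):
    sent_loss = cross_entropy(sent_logits, sent_y, class_weights)
    tag_loss = bce_with_logits(tag_logits, tag_y)
    # NER: average CE over tokens whose label is not ignore_index
    # (e.g. padding and non-initial subword positions)
    kept = [cross_entropy(ner_logits[t], y)
            for t, y in enumerate(ner_y) if y != ignore_index]
    ner_loss = np.mean(kept) if kept else 0.0
    return sent_loss + tag_loss + ner_loss
```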
## Metrics (Validation)
| Head | Metric | Value |
|------|--------|-------|
| Sentiment | Accuracy | **76.6%** |
| Tags | F1 (macro) | **57.0%** |
| NER | Accuracy | **94.8%** |
| Combined | Val Loss | 0.492 |
## Model Files
| File | Size | Description |
|------|------|-------------|
| `XLMRobertaMultiHead.mlpackage/` | 529 MB | FP16 model |
| `XLMRobertaMultiHead_INT8.mlpackage/` | 266 MB | INT8 quantized (recommended) |
| `label_definitions.json` | 1 KB | Label mappings for all heads |
| `config.json` | — | Model configuration |
## Usage
```swift
import CoreML

// .mlpackage files must be compiled before loading. Xcode does this automatically
// for models bundled in an app target; otherwise compile once at runtime:
let compiledURL = try MLModel.compileModel(at: packageURL)
let model = try MLModel(contentsOf: compiledURL)

// Prepare inputs (use the XLM-RoBERTa tokenizer; pad/truncate to 128 tokens)
let inputArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
let maskArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
// ... fill with token ids and the matching attention mask ...

let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputArray),
    "attention_mask": MLFeatureValue(multiArray: maskArray)
])
let output = try model.prediction(from: input)
let sentimentLogits = output.featureValue(for: "sentiment_logits")!.multiArrayValue!
let tagLogits = output.featureValue(for: "tag_logits")!.multiArrayValue!
let nerLogits = output.featureValue(for: "ner_logits")!.multiArrayValue!
```
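The three output tensors are raw logits and need different post-processing per head: softmax + argmax for single-label sentiment, per-logit sigmoid with a threshold for the multi-label tags, and per-token argmax for NER. The logic is language-agnostic; here is a Python sketch (the 0.5 tag threshold is an assumed default, not specified by this card):

```python
import numpy as np

SENTIMENT = ["positive", "neutral", "risk", "toxic"]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def decode_outputs(sent_logits, tag_logits, tag_names, tag_threshold=0.5):
    """Turn raw head logits into readable predictions."""
    # Sentiment: single-label -> softmax + argmax
    sentiment = SENTIMENT[int(softmax(np.asarray(sent_logits)).argmax())]
    # Tags: multi-label -> independent sigmoid per logit, keep those above threshold
    tag_probs = 1.0 / (1.0 + np.exp(-np.asarray(tag_logits)))
    tags = [name for name, p in zip(tag_names, tag_probs) if p >= tag_threshold]
    return sentiment, tags
```

NER logits are handled separately: take the argmax over the 9 BIO labels at each token position, then merge B-/I- runs into entity spans.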
## Tokenizer
This model uses the standard XLM-RoBERTa tokenizer from `FacebookAI/xlm-roberta-base`. The CoreML package does not include the tokenizer — use the Hugging Face `tokenizers` library, or bundle `tokenizer.json` with your app and load it separately.
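Whichever tokenizer binding you use, the raw token ids must then be wrapped with XLM-RoBERTa's special tokens and padded/truncated to the model's fixed length of 128 with a matching attention mask. A small Python sketch of that step (the special-token ids `<s>`=0, `</s>`=2, `<pad>`=1 are the standard XLM-RoBERTa values):

```python
# Fixed-length input preparation for the CoreML model's [1, 128] int32 tensors.
MAX_LEN, BOS, EOS, PAD = 128, 0, 2, 1  # XLM-RoBERTa: <s>=0, </s>=2, <pad>=1

def prepare_inputs(token_ids):
    """Wrap token ids with special tokens, pad/truncate to MAX_LEN, build the mask."""
    ids = [BOS] + list(token_ids)[: MAX_LEN - 2] + [EOS]   # reserve 2 slots for <s>/</s>
    mask = [1] * len(ids)                                  # 1 = real token
    pad = MAX_LEN - len(ids)
    return ids + [PAD] * pad, mask + [0] * pad             # 0 = padding
```

Both returned lists have length 128 and can be copied directly into the `input_ids` and `attention_mask` multi-arrays.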
## Attribution
Base model [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-base) by Facebook AI. Fine-tuning on Russian/English datasets and CoreML conversion by [@smkrv](https://huggingface.co/smkrv).