XLM-RoBERTa Multi-Head Classifier (Fine-Tuned) — CoreML

Fine-tuned CoreML version of XLM-RoBERTa-base with three classification heads for on-device multilingual text analysis on Apple Silicon. Performs sentiment analysis, multi-label tagging, and named entity recognition in a single forward pass.

Model Details

  • Architecture: XLM-RoBERTa-base (12 layers, 768 hidden, 12 heads) + 3 task-specific heads
  • Format: CoreML .mlpackage (mlprogram)
  • Variants: FP16 (529 MB), INT8 (266 MB)
  • Sequence length: 128 tokens
  • Input: Tokenized text (input_ids + attention_mask, int32)
  • Output: Three tensors — sentiment logits, tag logits, NER logits

Heads

Sentiment (4 classes)

Single-label classification: positive, neutral, risk, toxic
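Decoding the sentiment head is a standard softmax + argmax over the four logits. A minimal pure-Python sketch (illustrative; assumes you have already copied the four floats out of the `sentiment_logits` MLMultiArray):

```python
import math

# Class order as listed in label_definitions.json (per this card)
SENTIMENT_LABELS = ["positive", "neutral", "risk", "toxic"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_sentiment(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_LABELS[best], probs[best]
```

For example, `decode_sentiment([2.1, 0.3, -1.0, -2.5])` picks `positive`, the class with the largest logit.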

Tags (20 multi-label)

Multi-label phrase tagging: stress_signal, confidence, emotional_state, trust_indicator, defensiveness, active_listening, rapport_building, conflict_signal, cooperation, clarification, pressure_tactic, concession, information_sharing, commitment, deadline_mention, deception_signal, manipulation, power_dynamic, agreement, problem_solving
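Because this head is multi-label, each tag gets an independent sigmoid rather than a shared softmax. A pure-Python sketch (the 0.5 threshold is an assumption — tune it per tag against your own validation data):

```python
import math

# The 20 tags in the order listed in this card
TAG_LABELS = [
    "stress_signal", "confidence", "emotional_state", "trust_indicator",
    "defensiveness", "active_listening", "rapport_building", "conflict_signal",
    "cooperation", "clarification", "pressure_tactic", "concession",
    "information_sharing", "commitment", "deadline_mention", "deception_signal",
    "manipulation", "power_dynamic", "agreement", "problem_solving",
]

def decode_tags(logits, threshold=0.5):
    """Sigmoid each tag logit independently; keep tags at/above threshold."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [(TAG_LABELS[i], p) for i, p in enumerate(probs) if p >= threshold]
```

Unlike the sentiment head, zero, one, or many tags can fire for the same input.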

NER (9 BIO labels, per-token)

Named entity recognition: O, B-PER, I-PER, B-ORG, I-ORG, B-MONEY, I-MONEY, B-DATE, I-DATE
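Turning the per-token NER logits into entity spans is a two-step process: argmax over the 9 labels at each token position, then grouping of B-/I- runs. A minimal sketch (pure Python; token indices here refer to XLM-RoBERTa subword positions, not words):

```python
# BIO label order as listed in this card
NER_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-MONEY", "I-MONEY", "B-DATE", "I-DATE"]

def decode_ner(token_logits):
    """Per-token argmax, then group BIO runs into (type, start, end) spans."""
    ids = [max(range(len(row)), key=row.__getitem__) for row in token_logits]
    return bio_spans(ids)

def bio_spans(label_ids):
    """Group BIO label ids into half-open (entity_type, start, end) spans."""
    spans, current = [], None
    for i, lid in enumerate(label_ids):
        tag = NER_LABELS[lid]
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], i, i + 1)       # new entity starts
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current = (current[0], current[1], i + 1)  # extend current entity
        else:                                    # O, or I- without matching B-
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans
```

An `I-` tag that doesn't continue the current entity type is treated as closing the span; stricter or more lenient BIO repair policies are possible.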

Training

  • Backbone LR: 2e-5 with cosine decay + 10% linear warmup
  • Head LR: 2e-4 (10x backbone)
  • Epochs: 3 (best checkpoint at epoch 2)
  • Batch size: 32
  • Device: Apple M3 Max (MPS)
  • Training data: RuSentiment (50K), GoEmotions mapped to 4-class (50K), WikiAnn-ru NER (50K), synthetic negotiation tags (1.7K)
  • Multi-task loss: weighted CE (sentiment) + BCE (tags) + CE with ignore_index (NER)
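The combined objective above can be sketched in pure Python for a single example (scalar reference version; the per-task weights default to 1.0 and the class-weight vector is an assumption — the card states only the loss types):

```python
import math

def ce(logits, target, class_weights=None):
    """(Optionally weighted) cross-entropy: -w[t] * log softmax(logits)[t]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    w = class_weights[target] if class_weights else 1.0
    return w * (log_z - logits[target])

def bce(logits, targets):
    """Mean binary cross-entropy with logits (stable log-sigmoid form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log(1.0 + math.exp(-abs(x)))
    return total / len(logits)

def ner_ce(token_logits, token_targets, ignore_index=-100):
    """Token-level CE, skipping positions marked with ignore_index."""
    losses = [ce(l, t) for l, t in zip(token_logits, token_targets)
              if t != ignore_index]
    return sum(losses) / max(len(losses), 1)

def multitask_loss(sent, sent_y, tags, tags_y, ner, ner_y,
                   task_weights=(1.0, 1.0, 1.0)):
    """Weighted CE (sentiment) + BCE (tags) + CE w/ ignore_index (NER)."""
    return (task_weights[0] * ce(sent, sent_y)
            + task_weights[1] * bce(tags, tags_y)
            + task_weights[2] * ner_ce(ner, ner_y))
```

In training this would run batched on tensors (e.g. PyTorch's `CrossEntropyLoss` and `BCEWithLogitsLoss`); the scalar form just makes the three terms explicit.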

Metrics (Validation)

Head        Metric      Value
Sentiment   Accuracy    76.6%
Tags        F1 (macro)  57.0%
NER         Accuracy    94.8%
Combined    Val Loss    0.492

Model Files

File                                  Size     Description
XLMRobertaMultiHead.mlpackage/        529 MB   FP16 model
XLMRobertaMultiHead_INT8.mlpackage/   266 MB   INT8 quantized (recommended)
label_definitions.json                1 KB     Label mappings for all heads
config.json                                    Model configuration
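The exact schema of label_definitions.json is not shown in this card; a plausible layout, keyed per head with the label orders listed above (the key names are assumptions):

```json
{
  "sentiment": ["positive", "neutral", "risk", "toxic"],
  "tags": ["stress_signal", "confidence", "emotional_state", "trust_indicator",
           "defensiveness", "active_listening", "rapport_building", "conflict_signal",
           "cooperation", "clarification", "pressure_tactic", "concession",
           "information_sharing", "commitment", "deadline_mention", "deception_signal",
           "manipulation", "power_dynamic", "agreement", "problem_solving"],
  "ner": ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-MONEY", "I-MONEY", "B-DATE", "I-DATE"]
}
```

Check the bundled file for the authoritative index-to-label mappings before decoding logits.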

Usage

import CoreML

// modelURL points at the compiled model; Xcode compiles the .mlpackage
// into a .mlmodelc at build time
let model = try MLModel(contentsOf: modelURL)

// Prepare inputs (use XLM-RoBERTa tokenizer)
let inputArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
let maskArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
// ... fill with tokenized text ...

let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputArray),
    "attention_mask": MLFeatureValue(multiArray: maskArray)
])

let output = try model.prediction(from: input)
let sentimentLogits = output.featureValue(for: "sentiment_logits")!.multiArrayValue!
let tagLogits = output.featureValue(for: "tag_logits")!.multiArrayValue!
let nerLogits = output.featureValue(for: "ner_logits")!.multiArrayValue!

Tokenizer

This model uses the standard XLM-RoBERTa tokenizer from FacebookAI/xlm-roberta-base (SentencePiece-based). The CoreML package does not include the tokenizer — tokenize on-device with a Hugging Face tokenizers implementation (e.g. the swift-transformers package) or bundle tokenizer.json separately, then pad/truncate to 128 tokens to match the model's fixed sequence length.

Attribution

Base model XLM-RoBERTa by Facebook AI. Fine-tuning on Russian/English datasets and CoreML conversion by @smkrv.
