XLM-RoBERTa Multi-Head Classifier (Fine-Tuned) — CoreML

Fine-tuned CoreML version of XLM-RoBERTa-base with three classification heads for on-device multilingual text analysis on Apple Silicon. Performs sentiment analysis, multi-label tagging, and named entity recognition in a single forward pass.

Model Details

  • Architecture: XLM-RoBERTa-base (12 layers, 768 hidden, 12 heads) + 3 task-specific heads
  • Format: CoreML .mlpackage (mlprogram)
  • Variants: FP16 (529 MB), INT8 (266 MB)
  • Sequence length: 128 tokens
  • Input: Tokenized text (input_ids + attention_mask, int32)
  • Output: Three tensors — sentiment logits, tag logits, NER logits

Heads

Sentiment (4 classes)

Single-label classification: positive, neutral, risk, toxic
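Decoding the sentiment head is a standard softmax + argmax over the four logits. A minimal pure-Python sketch (illustrative; assumes you have already copied the four floats out of the `sentiment_logits` MLMultiArray):

```python
import math

# Class order as listed in label_definitions.json (per this card)
SENTIMENT_LABELS = ["positive", "neutral", "risk", "toxic"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_sentiment(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_LABELS[best], probs[best]
```

For example, `decode_sentiment([2.1, 0.3, -1.0, -2.5])` picks `positive`, the class with the largest logit.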

Tags (20 multi-label)

Multi-label phrase tagging: stress_signal, confidence, emotional_state, trust_indicator, defensiveness, active_listening, rapport_building, conflict_signal, cooperation, clarification, pressure_tactic, concession, information_sharing, commitment, deadline_mention, deception_signal, manipulation, power_dynamic, agreement, problem_solving
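Because this head is multi-label, each tag gets an independent sigmoid rather than a shared softmax. A pure-Python sketch (the 0.5 threshold is an assumption — tune it per tag against your own validation data):

```python
import math

# The 20 tags in the order listed in this card
TAG_LABELS = [
    "stress_signal", "confidence", "emotional_state", "trust_indicator",
    "defensiveness", "active_listening", "rapport_building", "conflict_signal",
    "cooperation", "clarification", "pressure_tactic", "concession",
    "information_sharing", "commitment", "deadline_mention", "deception_signal",
    "manipulation", "power_dynamic", "agreement", "problem_solving",
]

def decode_tags(logits, threshold=0.5):
    """Sigmoid each tag logit independently; keep tags at/above threshold."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [(TAG_LABELS[i], p) for i, p in enumerate(probs) if p >= threshold]
```

Unlike the sentiment head, zero, one, or many tags can fire for the same input.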

NER (9 BIO labels, per-token)

Named entity recognition: O, B-PER, I-PER, B-ORG, I-ORG, B-MONEY, I-MONEY, B-DATE, I-DATE
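Turning the per-token NER logits into entity spans is a two-step process: argmax over the 9 labels at each token position, then grouping of B-/I- runs. A minimal sketch (pure Python; token indices here refer to XLM-RoBERTa subword positions, not words):

```python
# BIO label order as listed in this card
NER_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-MONEY", "I-MONEY", "B-DATE", "I-DATE"]

def decode_ner(token_logits):
    """Per-token argmax, then group BIO runs into (type, start, end) spans."""
    ids = [max(range(len(row)), key=row.__getitem__) for row in token_logits]
    return bio_spans(ids)

def bio_spans(label_ids):
    """Group BIO label ids into half-open (entity_type, start, end) spans."""
    spans, current = [], None
    for i, lid in enumerate(label_ids):
        tag = NER_LABELS[lid]
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], i, i + 1)       # new entity starts
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current = (current[0], current[1], i + 1)  # extend current entity
        else:                                    # O, or I- without matching B-
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans
```

An `I-` tag that doesn't continue the current entity type is treated as closing the span; stricter or more lenient BIO repair policies are possible.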

Training

  • Backbone LR: 2e-5 with cosine decay + 10% linear warmup
  • Head LR: 2e-4 (10x backbone)
  • Epochs: 3 (best checkpoint at epoch 2)
  • Batch size: 32
  • Device: Apple M3 Max (MPS)
  • Training data: RuSentiment (50K), GoEmotions mapped to 4-class (50K), WikiAnn-ru NER (50K), synthetic negotiation tags (1.7K)
  • Multi-task loss: weighted CE (sentiment) + BCE (tags) + CE with ignore_index (NER)
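The combined objective above can be sketched in pure Python for a single example (scalar reference version; the per-task weights default to 1.0 and the class-weight vector is an assumption — the card states only the loss types):

```python
import math

def ce(logits, target, class_weights=None):
    """(Optionally weighted) cross-entropy: -w[t] * log softmax(logits)[t]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    w = class_weights[target] if class_weights else 1.0
    return w * (log_z - logits[target])

def bce(logits, targets):
    """Mean binary cross-entropy with logits (stable log-sigmoid form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log(1.0 + math.exp(-abs(x)))
    return total / len(logits)

def ner_ce(token_logits, token_targets, ignore_index=-100):
    """Token-level CE, skipping positions marked with ignore_index."""
    losses = [ce(l, t) for l, t in zip(token_logits, token_targets)
              if t != ignore_index]
    return sum(losses) / max(len(losses), 1)

def multitask_loss(sent, sent_y, tags, tags_y, ner, ner_y,
                   task_weights=(1.0, 1.0, 1.0)):
    """Weighted CE (sentiment) + BCE (tags) + CE w/ ignore_index (NER)."""
    return (task_weights[0] * ce(sent, sent_y)
            + task_weights[1] * bce(tags, tags_y)
            + task_weights[2] * ner_ce(ner, ner_y))
```

In training this would run batched on tensors (e.g. PyTorch's `CrossEntropyLoss` and `BCEWithLogitsLoss`); the scalar form just makes the three terms explicit.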

Metrics (Validation)

Head        Metric      Value
Sentiment   Accuracy    76.6%
Tags        F1 (macro)  57.0%
NER         Accuracy    94.8%
Combined    Val Loss    0.492

Model Files

File                                  Size     Description
XLMRobertaMultiHead.mlpackage/        529 MB   FP16 model
XLMRobertaMultiHead_INT8.mlpackage/   266 MB   INT8 quantized (recommended)
label_definitions.json                1 KB     Label mappings for all heads
config.json                                    Model configuration
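The exact schema of label_definitions.json is not shown in this card; a plausible layout, keyed per head with the label orders listed above (the key names are assumptions):

```json
{
  "sentiment": ["positive", "neutral", "risk", "toxic"],
  "tags": ["stress_signal", "confidence", "emotional_state", "trust_indicator",
           "defensiveness", "active_listening", "rapport_building", "conflict_signal",
           "cooperation", "clarification", "pressure_tactic", "concession",
           "information_sharing", "commitment", "deadline_mention", "deception_signal",
           "manipulation", "power_dynamic", "agreement", "problem_solving"],
  "ner": ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-MONEY", "I-MONEY", "B-DATE", "I-DATE"]
}
```

Check the bundled file for the authoritative index-to-label mappings before decoding logits.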

Usage

import CoreML

// modelURL points at the compiled model; Xcode compiles the .mlpackage
// into a .mlmodelc at build time
let model = try MLModel(contentsOf: modelURL)

// Prepare inputs (use XLM-RoBERTa tokenizer)
let inputArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
let maskArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
// ... fill with tokenized text ...

let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputArray),
    "attention_mask": MLFeatureValue(multiArray: maskArray)
])

let output = try model.prediction(from: input)
let sentimentLogits = output.featureValue(for: "sentiment_logits")!.multiArrayValue!
let tagLogits = output.featureValue(for: "tag_logits")!.multiArrayValue!
let nerLogits = output.featureValue(for: "ner_logits")!.multiArrayValue!

Tokenizer

This model uses the standard XLM-RoBERTa tokenizer from FacebookAI/xlm-roberta-base (SentencePiece-based). The CoreML package does not include the tokenizer — tokenize on-device with a Hugging Face tokenizers implementation (e.g. the swift-transformers package) or bundle tokenizer.json separately, then pad/truncate to 128 tokens to match the model's fixed sequence length.

Attribution

Base model XLM-RoBERTa by Facebook AI. Fine-tuning on Russian/English datasets and CoreML conversion by @smkrv.
