---
license: mit
library_name: coreml
base_model: FacebookAI/xlm-roberta-base
tags:
- text-classification
- sentiment-analysis
- named-entity-recognition
- multi-label-classification
- multi-task
- coreml
- russian
- xlm-roberta
language:
- ru
- en
pipeline_tag: text-classification
datasets:
- RuSentiment
- google/goEmotions
- wikiann
---
# XLM-RoBERTa Multi-Head Classifier (Fine-Tuned) — CoreML
Fine-tuned CoreML version of XLM-RoBERTa-base with three classification heads for on-device multilingual text analysis on Apple Silicon. Performs sentiment analysis, multi-label tagging, and named entity recognition in a single forward pass.
## Model Details
- **Architecture:** XLM-RoBERTa-base (12 layers, 768 hidden, 12 heads) + 3 task-specific heads
- **Format:** CoreML `.mlpackage` (mlprogram)
- **Variants:** FP16 (529 MB), INT8 (266 MB)
- **Sequence length:** 128 tokens
- **Input:** Tokenized text (`input_ids` + `attention_mask`, int32)
- **Output:** Three tensors — sentiment logits, tag logits, NER logits
## Heads
### Sentiment (4 classes)
Single-label classification: `positive`, `neutral`, `risk`, `toxic`
### Tags (20 multi-label)
Multi-label phrase tagging: `stress_signal`, `confidence`, `emotional_state`, `trust_indicator`, `defensiveness`, `active_listening`, `rapport_building`, `conflict_signal`, `cooperation`, `clarification`, `pressure_tactic`, `concession`, `information_sharing`, `commitment`, `deadline_mention`, `deception_signal`, `manipulation`, `power_dynamic`, `agreement`, `problem_solving`
### NER (9 BIO labels, per-token)
Named entity recognition: `O`, `B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-MONEY`, `I-MONEY`, `B-DATE`, `I-DATE`
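The NER head emits one label per token, so consumers need to collapse BIO labels into entity spans. The model card does not ship a decoder; below is a minimal, self-contained Python sketch (function name and span convention are illustrative) that turns per-token label ids — e.g. the argmax over `ner_logits` — into `(entity_type, start, end_exclusive)` spans using the label order listed above.

```python
# BIO span decoder: collapses per-token label ids into (type, start, end_exclusive) spans.
# Label order matches the NER head's 9 BIO labels listed above.
BIO_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-MONEY", "I-MONEY", "B-DATE", "I-DATE"]

def decode_bio(label_ids):
    """Turn a sequence of BIO label ids into (type, start, end_exclusive) spans."""
    spans, current = [], None  # current open span as [type, start, end]
    for i, lid in enumerate(label_ids):
        label = BIO_LABELS[lid]
        if label.startswith("B-"):
            if current:
                spans.append(tuple(current))
            current = [label[2:], i, i + 1]
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[2] = i + 1  # extend the open span
        else:  # "O", or a stray I- tag that doesn't continue the open span
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans
```

For example, `decode_bio([1, 2, 0, 3, 4, 4])` yields `[("PER", 0, 2), ("ORG", 3, 6)]`. Note this sketch treats a dangling `I-` tag (no matching `B-`) as `O`; other conventions (e.g. starting a new span) are equally valid.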
## Training
- **Backbone LR:** 2e-5 with cosine decay + 10% linear warmup
- **Head LR:** 2e-4 (10x backbone)
- **Epochs:** 3 (best checkpoint at epoch 2)
- **Batch size:** 32
- **Device:** Apple M3 Max (MPS)
- **Training data:** RuSentiment (50K), GoEmotions mapped to 4-class (50K), WikiAnn-ru NER (50K), synthetic negotiation tags (1.7K)
- **Multi-task loss:** weighted CE (sentiment) + BCE (tags) + CE with ignore_index (NER)
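To make the loss composition above concrete, here is a schematic NumPy sketch of the three terms for a single example — this is not the training code, and the relative weighting of the three heads plus the sentiment class weights are unspecified in this card (the sketch simply sums them and takes optional per-class weights as an argument):

```python
import numpy as np

def cross_entropy(logits, target, class_weights=None):
    """Softmax cross-entropy for one example, with optional per-class weights."""
    z = logits - logits.max()                      # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    w = 1.0 if class_weights is None else class_weights[target]
    return -w * log_probs[target]

def bce_with_logits(logits, targets):
    """Stable element-wise binary cross-entropy over multi-label logits, averaged."""
    # max(x, 0) - x*y + log(1 + exp(-|x|)) is the standard stable formulation
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def multitask_loss(sent_logits, sent_y, tag_logits, tag_y, ner_logits, ner_y,
                   class_weights=None, ignore_index=-100):
    sent_loss = cross_entropy(sent_logits, sent_y, class_weights)
    tag_loss = bce_with_logits(tag_logits, tag_y)
    # NER: average CE over tokens whose label is not ignore_index
    # (e.g. padding and non-initial subword positions)
    kept = [cross_entropy(ner_logits[t], y)
            for t, y in enumerate(ner_y) if y != ignore_index]
    ner_loss = np.mean(kept) if kept else 0.0
    return sent_loss + tag_loss + ner_loss
```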
## Metrics (Validation)
| Head | Metric | Value |
|------|--------|-------|
| Sentiment | Accuracy | **76.6%** |
| Tags | F1 (macro) | **57.0%** |
| NER | Accuracy | **94.8%** |
| Combined | Val Loss | 0.492 |
## Model Files
| File | Size | Description |
|------|------|-------------|
| `XLMRobertaMultiHead.mlpackage/` | 529 MB | FP16 model |
| `XLMRobertaMultiHead_INT8.mlpackage/` | 266 MB | INT8 quantized (recommended) |
| `label_definitions.json` | 1 KB | Label mappings for all heads |
| `config.json` | — | Model configuration |
## Usage
```swift
import CoreML

// .mlpackage files must be compiled before loading. Xcode does this automatically
// for models bundled in an app target; otherwise compile once at runtime:
let compiledURL = try MLModel.compileModel(at: packageURL)
let model = try MLModel(contentsOf: compiledURL)

// Prepare inputs (use the XLM-RoBERTa tokenizer; pad/truncate to 128 tokens)
let inputArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
let maskArray = try MLMultiArray(shape: [1, 128], dataType: .int32)
// ... fill with token ids and the matching attention mask ...

let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputArray),
    "attention_mask": MLFeatureValue(multiArray: maskArray)
])
let output = try model.prediction(from: input)
let sentimentLogits = output.featureValue(for: "sentiment_logits")!.multiArrayValue!
let tagLogits = output.featureValue(for: "tag_logits")!.multiArrayValue!
let nerLogits = output.featureValue(for: "ner_logits")!.multiArrayValue!
```
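The three output tensors are raw logits and need different post-processing per head: softmax + argmax for single-label sentiment, per-logit sigmoid with a threshold for the multi-label tags, and per-token argmax for NER. The logic is language-agnostic; here is a Python sketch (the 0.5 tag threshold is an assumed default, not specified by this card):

```python
import numpy as np

SENTIMENT = ["positive", "neutral", "risk", "toxic"]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def decode_outputs(sent_logits, tag_logits, tag_names, tag_threshold=0.5):
    """Turn raw head logits into readable predictions."""
    # Sentiment: single-label -> softmax + argmax
    sentiment = SENTIMENT[int(softmax(np.asarray(sent_logits)).argmax())]
    # Tags: multi-label -> independent sigmoid per logit, keep those above threshold
    tag_probs = 1.0 / (1.0 + np.exp(-np.asarray(tag_logits)))
    tags = [name for name, p in zip(tag_names, tag_probs) if p >= tag_threshold]
    return sentiment, tags
```

NER logits are handled separately: take the argmax over the 9 BIO labels at each token position, then merge B-/I- runs into entity spans.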
## Tokenizer
This model uses the standard XLM-RoBERTa tokenizer from `FacebookAI/xlm-roberta-base`. The CoreML package does not include the tokenizer — use the Hugging Face `tokenizers` library, or bundle `tokenizer.json` with your app and load it separately.
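Whichever tokenizer binding you use, the raw token ids must then be wrapped with XLM-RoBERTa's special tokens and padded/truncated to the model's fixed length of 128 with a matching attention mask. A small Python sketch of that step (the special-token ids `<s>`=0, `</s>`=2, `<pad>`=1 are the standard XLM-RoBERTa values):

```python
# Fixed-length input preparation for the CoreML model's [1, 128] int32 tensors.
MAX_LEN, BOS, EOS, PAD = 128, 0, 2, 1  # XLM-RoBERTa: <s>=0, </s>=2, <pad>=1

def prepare_inputs(token_ids):
    """Wrap token ids with special tokens, pad/truncate to MAX_LEN, build the mask."""
    ids = [BOS] + list(token_ids)[: MAX_LEN - 2] + [EOS]   # reserve 2 slots for <s>/</s>
    mask = [1] * len(ids)                                  # 1 = real token
    pad = MAX_LEN - len(ids)
    return ids + [PAD] * pad, mask + [0] * pad             # 0 = padding
```

Both returned lists have length 128 and can be copied directly into the `input_ids` and `attention_mask` multi-arrays.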
## Attribution
Base model [XLM-RoBERTa](https://huggingface.co/FacebookAI/xlm-roberta-base) by Facebook AI. Fine-tuning on Russian/English datasets and CoreML conversion by [@smkrv](https://huggingface.co/smkrv).