---
library_name: transformers
license: mit
base_model: jhu-clsp/mmBERT-small
tags:
- sentiment
- text-classification
- multilingual
- modernbert
- sentiment-analysis
- product-reviews
- place-reviews
- mmbert
metrics:
- f1
- precision
- recall
model-index:
- name: mmBERT-small-multilingual-sentiment
  results: []
datasets:
- clapAI/MultiLingualSentiment
language:
- en
- zh
- vi
- ko
- ja
- ar
- de
- es
- fr
- hi
- id
- it
- ms
- pt
- ru
- tr
pipeline_tag: text-classification
---

# clapAI/mmBERT-small-multilingual-sentiment

## Introduction

**mmBERT-small-multilingual-sentiment** is a multilingual sentiment classification model, part of
the [Multilingual-Sentiment](https://huggingface.co/collections/clapAI/multilingual-sentiment-677416a6b23e03f52cb6cc3f)
collection.

The model is fine-tuned from [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small) using the
multilingual sentiment
dataset [clapAI/MultiLingualSentiment](https://huggingface.co/datasets/clapAI/MultiLingualSentiment).

The model supports multilingual sentiment classification across 16+ languages, including English, Vietnamese, Chinese,
French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Arabic, and more.

## Key Highlights
> 📈 **Improved accuracy**: Achieves **F1 = 82.2** on the MultiLingualSentiment test split, the best score in the comparison below.  
> 📜 **Long context support**: Handles sequences up to **8192 tokens**.  
> 🪶 **Efficient size**: Only **140M parameters**, roughly half the size of XLM-RoBERTa-base (278M) while scoring higher.  
> ⚡ **FlashAttention-2 support**: Enables much faster inference on modern GPUs.

## Evaluation & Performance

Results on the test split of [clapAI/MultiLingualSentiment](https://huggingface.co/datasets/clapAI/MultiLingualSentiment):

|                                                      Model                                                      |                            Pretrained Model                            | Parameters | Context-length | F1-score |
|:---------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------:|:----------:|----------------|:--------:|
| [clapAI/mmBERT-small-multilingual-sentiment](https://huggingface.co/clapAI/mmBERT-small-multilingual-sentiment) | [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small)  |    140M    | 8192           | **82.2** |
| [modernBERT-base-multilingual-sentiment](https://huggingface.co/clapAI/modernBERT-base-multilingual-sentiment)  | [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)  |    150M    | 8192           |  80.16   |
|    [roberta-base-multilingual-sentiment](https://huggingface.co/clapAI/roberta-base-multilingual-sentiment)     | [XLM-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) |    278M    | 512            |   81.8   | 
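For reference, scores like the ones above can be computed with a plain loop over the test split. The sketch below is illustrative, not the exact evaluation script: it assumes the dataset exposes `text` and integer `label` columns aligned with the model's `id2label` mapping, and it reports macro F1; adjust the column names and averaging to the actual dataset schema.

```python
import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "clapAI/mmBERT-small-multilingual-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# Assumed schema: "text" (string) and "label" (int) columns.
ds = load_dataset("clapAI/MultiLingualSentiment", split="test")

preds, refs = [], []
for batch in ds.iter(batch_size=32):
    inputs = tokenizer(batch["text"], padding=True, truncation=True,
                       max_length=8192, return_tensors="pt")
    with torch.inference_mode():
        logits = model(**inputs).logits
    preds.extend(logits.argmax(dim=-1).tolist())
    refs.extend(batch["label"])

print("macro F1:", f1_score(refs, preds, average="macro"))
```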

## How to use

### Installation

```bash
pip install torch==2.8
pip install transformers==4.55.0
```

Optional: accelerate inference with FlashAttention-2 (requires a compatible NVIDIA GPU, e.g. Ampere or newer):

```bash
pip install packaging==25.0 ninja==1.13.0
MAX_JOBS=4 pip install flash-attn==2.8.3 --no-build-isolation
```
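Once installed, FlashAttention-2 is enabled at load time through the `attn_implementation` argument of `from_pretrained`; it requires the model to run in half precision. A minimal sketch:

```python
import torch
from transformers import AutoModelForSequenceClassification

# FlashAttention-2 only runs in fp16/bf16, hence the explicit bf16 dtype.
model = AutoModelForSequenceClassification.from_pretrained(
    "clapAI/mmBERT-small-multilingual-sentiment",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```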

### Example Usage

Try it on [Google Colab](https://colab.research.google.com/drive/1nsh22sEz0znV3OedE8RqA0dNoPOJjghJ?usp=sharing)

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "clapAI/mmBERT-small-multilingual-sentiment"
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Use half precision on GPU; fall back to full precision on CPU,
# where fp16/bf16 kernels are poorly supported.
if device.type == "cuda":
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=dtype,
    # Uncomment if device supports FA2
    # attn_implementation="flash_attention_2" 
)

model.to(device)
model.eval()

# Retrieve labels from the model's configuration
id2label = model.config.id2label

texts = [
    "I absolutely love the new design of this app!",  # English
    "الخدمة كانت سيئة للغاية.",
    "Ich bin sehr zufrieden mit dem Kauf.",  # German
    "El producto llegó roto y no funciona.",  # Spanish
    "J'adore ce restaurant, la nourriture est délicieuse!",  # French
    "Makanannya benar-benar tidak enak.",  # Indonesian
    "この製品は本当に素晴らしいです!",  # Japanese
    "고객 서비스가 정말 실망스러웠어요.",  # Korean
    "Этот фильм просто потрясающий!",  # Russian
    "Tôi thực sự yêu thích sản phẩm này!",  # Vietnamese
    "质量真的很差。"  # Chinese
]

for text in texts:
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model(**inputs)
        prediction = id2label[outputs.logits.argmax(dim=-1).item()]
    print(f"Text: {text} | Prediction: {prediction}")
```
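The loop above classifies one sentence at a time. For higher throughput, the tokenizer can pad the whole list and score it in a single forward pass; this batched variant reuses the `model`, `tokenizer`, `texts`, and `id2label` objects defined above:

```python
# Pad and batch every text into one forward pass instead of looping.
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=8192, return_tensors="pt").to(device)
with torch.inference_mode():
    logits = model(**inputs).logits
for text, pred_id in zip(texts, logits.argmax(dim=-1).tolist()):
    print(f"Text: {text} | Prediction: {id2label[pred_id]}")
```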

## Citation

If you use this model, please consider citing:

```bibtex
@misc{clapAI_mmbert_small_multilingual_sentiment,
      title={mmBERT-small-multilingual-sentiment: A Multilingual Sentiment Classification Model},
      author={clapAI},
      howpublished={\url{https://huggingface.co/clapAI/mmBERT-small-multilingual-sentiment}},
      year={2025},
}
```