---
license: apache-2.0
language:
  - en
  - zh
  - ja
  - de
  - fr
  - es
tags:
  - finance
  - sentiment-analysis
  - multilingual
  - xlm-roberta
  - finbert
datasets:
  - Kenpache/multilingual-financial-sentiment
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
model-index:
  - name: FinBERT-Multilingual
    results:
      - task:
          type: text-classification
          name: Financial Sentiment Analysis
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.8103
          - name: F1 (weighted)
            type: f1
            value: 0.8102
---

# FinBERT-Multilingual

A multilingual extension of the FinBERT paradigm: a domain-adapted transformer for financial sentiment classification across six languages (EN, ZH, JA, DE, FR, ES).

While the original [FinBERT](https://arxiv.org/abs/1908.10063) demonstrated the effectiveness of domain-specific pre-training for English financial NLP, this model extends that approach to a multilingual setting using XLM-RoBERTa-base as the backbone, enabling cross-lingual financial sentiment analysis without language-specific models.

## Model Architecture

- **Base model:** `xlm-roberta-base` (278M parameters)
- **Task:** 3-class sequence classification (Negative / Neutral / Positive)
- **Domain adaptation:** Task-Adaptive Pre-Training (TAPT) via Masked Language Modeling on 35K+ financial texts
- **Languages:** English, Chinese, Japanese, German, French, Spanish

## Training Pipeline

### Stage 1: Task-Adaptive Pre-Training (TAPT)

Following [Gururangan et al. (2020)](https://arxiv.org/abs/2004.10964), we perform continued MLM pre-training on the unlabeled financial corpus to adapt the model's representations to the financial domain. This stage exposes the model to domain-specific vocabulary and discourse patterns across all six target languages using approximately 35,000 financial text samples.
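
The MLM objective at the heart of TAPT can be illustrated with the standard BERT-style dynamic masking rule (15% of tokens become prediction targets; of those, 80% are replaced by the mask token, 10% by a random token, and 10% left unchanged). The exact masking recipe used for this model is not documented, so this is a minimal sketch in plain PyTorch:

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style dynamic masking: select ~15% of positions as MLM targets;
    of those, 80% -> [MASK], 10% -> random token, 10% -> unchanged."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    # Choose which positions the model must predict
    selected = torch.bernoulli(torch.full(labels.shape, mlm_prob)).bool()
    labels[~selected] = -100  # -100 is ignored by cross-entropy loss

    # 80% of selected positions are replaced with the mask token
    to_mask = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    input_ids[to_mask] = mask_token_id

    # Half of the remaining 20% get a random token; the rest stay unchanged
    to_random = (torch.bernoulli(torch.full(labels.shape, 0.5)).bool()
                 & selected & ~to_mask)
    input_ids[to_random] = torch.randint(vocab_size, labels.shape)[to_random]
    return input_ids, labels

# Toy batch; real usage would tokenize financial text with the model tokenizer
ids = torch.randint(5, 100, (2, 16))
corrupted, labels = mask_tokens(ids, mask_token_id=4, vocab_size=100)
```

In practice this is what `transformers`' `DataCollatorForLanguageModeling` does per batch, so each epoch sees a different masking of the same corpus.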

### Stage 2: Supervised Fine-Tuning

The domain-adapted model is then fine-tuned on the labeled sentiment classification task.

**Hyperparameters:**

| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| LR scheduler | Cosine annealing |
| Label smoothing | 0.1 |
| Checkpoint selection | SWA (top-3 checkpoints) |
| Base model | xlm-roberta-base |

**Stochastic Weight Averaging (SWA):** Rather than selecting a single best checkpoint, we average the weights of the top-3 performing checkpoints. This produces a flatter loss minimum and more robust generalization, particularly beneficial for multilingual settings where overfitting to dominant languages is a risk.
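
Averaging the top-3 checkpoints reduces to an element-wise mean over their state dicts. A minimal sketch (the checkpoint paths and the metric used to rank them are assumptions, not documented details of this model's training run):

```python
import torch

def average_checkpoints(state_dicts):
    """Element-wise mean of several state dicts sharing identical keys/shapes."""
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg

# Toy demonstration: three "checkpoints" of a tiny linear layer stand in for
# the top-3 fine-tuning checkpoints ranked by dev-set F1
ckpts = [torch.nn.Linear(4, 3).state_dict() for _ in range(3)]
averaged = average_checkpoints(ckpts)
# model.load_state_dict(averaged)  # then re-evaluate the averaged model
```

The averaged weights should always be re-validated: averaging is only beneficial when the checkpoints sit in the same loss basin, which is typically the case for checkpoints from one fine-tuning run.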

**Label smoothing (0.1):** Prevents overconfident predictions and improves calibration, which is important for financial applications where prediction confidence informs downstream decisions.
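
With smoothing of 0.1, each target redistributes a tenth of its probability mass uniformly across the classes instead of placing everything on the gold label. PyTorch supports this directly; the snippet below illustrates the effect rather than reproducing the actual training code:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[4.0, 0.5, -1.0]])  # confident prediction for class 0
target = torch.tensor([0])

hard = F.cross_entropy(logits, target)                         # standard CE
smooth = F.cross_entropy(logits, target, label_smoothing=0.1)  # smoothed CE

# Smoothing penalizes overconfidence: for a confidently correct prediction,
# the smoothed loss is strictly higher than the hard-label loss.
```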

## Evaluation Results

### Overall Metrics

| Metric | Score |
|---|---|
| Accuracy | 0.8103 |
| F1 (weighted) | 0.8102 |
| Precision (weighted) | 0.8111 |
| Recall (weighted) | 0.8103 |

### Per-Class Performance

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Negative | 0.78 | 0.83 | 0.81 |
| Neutral | 0.83 | 0.79 | 0.81 |
| Positive | 0.80 | 0.82 | 0.81 |

The balanced per-class performance (all F1 scores at 0.81) indicates that the model does not exhibit significant class bias, despite the imbalanced training distribution (Neutral: 45.5%, Positive: 30.8%, Negative: 23.7%).
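
As a sanity check on what "weighted" means here: the weighted F1 is the support-weighted mean of the per-class F1 scores, using each class's share of the evaluation set as its weight. With the shares above it reproduces the reported overall figure to rounding:

```python
# Per-class F1 from the table and class shares from the label distribution
f1 = {"negative": 0.81, "neutral": 0.81, "positive": 0.81}
share = {"negative": 0.237, "neutral": 0.455, "positive": 0.308}

weighted_f1 = sum(f1[c] * share[c] for c in f1)
print(round(weighted_f1, 2))  # 0.81, consistent with the reported 0.8102
```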

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Kenpache/finbert-multilingual")

# English
classifier("The company reported record quarterly earnings, driven by strong demand.")
# [{'label': 'positive', 'score': 0.95}]

# German
classifier("Die Aktie verlor nach der Gewinnwarnung deutlich an Wert.")
# [{'label': 'negative', 'score': 0.92}]

# Japanese
classifier("同社の売上高は前年同期比で横ばいとなった。")
# [{'label': 'neutral', 'score': 0.88}]

# Chinese
classifier("该公司宣布大规模裁员计划,股价应声下跌。")
# [{'label': 'negative', 'score': 0.91}]
```

### Direct Model Loading

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Kenpache/finbert-multilingual")
model = AutoModelForSequenceClassification.from_pretrained("Kenpache/finbert-multilingual")

text = "Les bénéfices du groupe ont augmenté de 15% au premier trimestre."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs, dim=-1).item()

labels = {0: "negative", 1: "neutral", 2: "positive"}
print(f"Prediction: {labels[pred]} ({probs[0][pred]:.4f})")
```

## Training Data

The model was trained on [Kenpache/multilingual-financial-sentiment](https://huggingface.co/datasets/Kenpache/multilingual-financial-sentiment), a curated dataset of ~39K financial news sentences from 80+ sources across six languages.

| Language | Samples | Sources |
|---|---|---|
| Japanese | 8,287 | Nikkei, Nikkan Kogyo, Reuters JP, Minkabu, etc. |
| Chinese | 7,930 | Sina Finance, EastMoney, 10jqka, etc. |
| Spanish | 7,125 | Expansión, Cinco Días, Bloomberg Línea, etc. |
| English | 6,887 | CNBC, Yahoo Finance, Fortune, Benzinga, etc. |
| German | 5,023 | Börse.de, FAZ, NTV Börse, Handelsblatt, etc. |
| French | 3,935 | Boursorama, Tradingsat, BFM Business, etc. |

## Comparison with FinBERT

| Feature | FinBERT | FinBERT-Multilingual |
|---|---|---|
| Base model | BERT-base | XLM-RoBERTa-base |
| Languages | English only | 6 languages |
| Domain adaptation | Financial corpus pre-training | TAPT on multilingual financial texts |
| Classes | 3 (Pos/Neg/Neu) | 3 (Pos/Neg/Neu) |
| Checkpoint selection | Single best | SWA (top-3) |

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{finbert-multilingual-2025,
  title={FinBERT-Multilingual: Cross-Lingual Financial Sentiment Analysis with Domain-Adapted XLM-RoBERTa},
  author={Kenpache},
  year={2025},
  url={https://huggingface.co/Kenpache/finbert-multilingual}
}
```

## License

Apache 2.0