---
license: apache-2.0
language:
- en
- zh
- ja
- de
- fr
- es
tags:
- finance
- sentiment-analysis
- multilingual
- xlm-roberta
- finbert
datasets:
- Kenpache/multilingual-financial-sentiment
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: FinBERT-Multilingual
  results:
  - task:
      type: text-classification
      name: Financial Sentiment Analysis
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8103
    - name: F1 (weighted)
      type: f1
      value: 0.8102
---

# FinBERT-Multilingual

A multilingual extension of the FinBERT paradigm: a domain-adapted transformer for financial sentiment classification across six languages (EN, ZH, JA, DE, FR, ES).

While the original [FinBERT](https://arxiv.org/abs/1908.10063) demonstrated the effectiveness of domain-specific pre-training for English financial NLP, this model extends that approach to a multilingual setting using XLM-RoBERTa-base as the backbone, enabling cross-lingual financial sentiment analysis without language-specific models.

## Model Architecture

- **Base model:** `xlm-roberta-base` (278M parameters)
- **Task:** 3-class sequence classification (Negative / Neutral / Positive)
- **Domain adaptation:** Task-Adaptive Pre-Training (TAPT) via masked language modeling on 35K+ financial texts
- **Languages:** English, Chinese, Japanese, German, French, Spanish

## Training Pipeline

### Stage 1: Task-Adaptive Pre-Training (TAPT)

Following [Gururangan et al. (2020)](https://arxiv.org/abs/2004.10964), we perform continued MLM pre-training on the unlabeled financial corpus to adapt the model's representations to the financial domain. This stage exposes the model to domain-specific vocabulary and discourse patterns across all six target languages, using approximately 35,000 financial text samples.
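
The MLM objective behind TAPT selects a fraction of input tokens (typically 15%) and trains the model to recover them; of the selected tokens, 80% are replaced by the mask token, 10% by a random token, and 10% left unchanged. A minimal, framework-free sketch of this standard masking scheme (the `MASK_ID` value and function name are illustrative, not taken from the XLM-R vocabulary):

```python
import random

MASK_ID = 250001  # illustrative mask-token id; the real value is tokenizer-specific

def mlm_mask(token_ids, vocab_size, mlm_probability=0.15, seed=None):
    """Return (inputs, labels) pairs for masked language modeling.

    Each token is selected with probability `mlm_probability`; of the
    selected tokens, 80% become the mask token, 10% a random token,
    and 10% stay unchanged. Unselected positions get label -100,
    which the cross-entropy loss ignores.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mlm_probability:
            labels.append(tok)  # the model must predict the original token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)       # 80%: mask
            elif r < 0.9:
                inputs.append(rng.randrange(vocab_size))  # 10%: random token
            else:
                inputs.append(tok)           # 10%: unchanged
        else:
            labels.append(-100)  # not selected: ignored by the loss
            inputs.append(tok)
    return inputs, labels
```

In practice this logic is handled by `transformers.DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)` during the continued pre-training run.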
55
+
56
+ ### Stage 2: Supervised Fine-Tuning
57
+
58
+ The domain-adapted model is then fine-tuned on the labeled sentiment classification task.
59
+
60
+ **Hyperparameters:**
61
+
62
+ | Parameter | Value |
63
+ |---|---|
64
+ | Learning rate | 2e-5 |
65
+ | LR scheduler | Cosine annealing |
66
+ | Label smoothing | 0.1 |
67
+ | Checkpoint selection | SWA (top-3 checkpoints) |
68
+ | Base model | xlm-roberta-base |
69
+
70
+ **Stochastic Weight Averaging (SWA):** Rather than selecting a single best checkpoint, we average the weights of the top-3 performing checkpoints. This produces a flatter loss minimum and more robust generalization, particularly beneficial for multilingual settings where overfitting to dominant languages is a risk.
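
Averaging the top-3 checkpoints reduces to a per-parameter mean over their saved state dicts; a minimal sketch (the function name is illustrative, and it assumes all checkpoints share the same architecture):

```python
import torch

def average_checkpoints(state_dicts):
    """Element-wise average of model state dicts (SWA-style).

    All state dicts must come from the same architecture, so every
    key maps to tensors of identical shape across checkpoints.
    """
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged
```

Classic SWA also re-estimates batch-norm statistics after averaging; with LayerNorm-based transformers such as XLM-R there are no such running statistics, so the plain average can be loaded directly.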

**Label smoothing (0.1):** Prevents overconfident predictions and improves calibration, which matters for financial applications where prediction confidence informs downstream decisions.
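
With ε = 0.1 over K = 3 classes, the smoothed target places 1 − ε + ε/K ≈ 0.933 on the true class and ε/K ≈ 0.033 on each other class (mixing the one-hot target with a uniform distribution, as in PyTorch's `CrossEntropyLoss(label_smoothing=0.1)`). A minimal sketch of the smoothed cross-entropy:

```python
import math

def label_smoothed_ce(logits, true_class, epsilon=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    k = len(logits)
    # Numerically stable softmax log-probabilities.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    # Smoothed target: (1 - eps) * one_hot + eps / k.
    loss = 0.0
    for i, lp in enumerate(log_probs):
        target = (1 - epsilon) * (1.0 if i == true_class else 0.0) + epsilon / k
        loss -= target * lp
    return loss
```

Compared with the plain cross-entropy, the smoothed loss never reaches zero, so the model is discouraged from driving the true-class probability to 1.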
73
+
74
+ ## Evaluation Results
75
+
76
+ ### Overall Metrics
77
+
78
+ | Metric | Score |
79
+ |---|---|
80
+ | Accuracy | 0.8103 |
81
+ | F1 (weighted) | 0.8102 |
82
+ | Precision (weighted) | 0.8111 |
83
+ | Recall (weighted) | 0.8103 |
84
+
85
+ ### Per-Class Performance
86
+
87
+ | Class | Precision | Recall | F1-Score |
88
+ |---|---|---|---|
89
+ | Negative | 0.78 | 0.83 | 0.81 |
90
+ | Neutral | 0.83 | 0.79 | 0.81 |
91
+ | Positive | 0.80 | 0.82 | 0.81 |
92
+
93
+ The balanced per-class performance (all F1 scores at 0.81) indicates that the model does not exhibit significant class bias, despite the imbalanced training distribution (Neutral: 45.5%, Positive: 30.8%, Negative: 23.7%).
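
"Weighted" here means each class's score is weighted by its share of examples. Using the training distribution above as illustrative weights (the reported 0.8102 is computed over the evaluation split, whose exact class shares are not listed here):

```python
# Per-class F1 from the table, weighted by class share.
shares = {"neutral": 0.455, "positive": 0.308, "negative": 0.237}
f1 = {"neutral": 0.81, "positive": 0.81, "negative": 0.81}

weighted_f1 = sum(shares[c] * f1[c] for c in shares)
# With all per-class F1 equal, the weighted average is simply 0.81,
# regardless of how imbalanced the class shares are.
```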

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Kenpache/finbert-multilingual")

# English
classifier("The company reported record quarterly earnings, driven by strong demand.")
# [{'label': 'positive', 'score': 0.95}]

# German: "The stock lost significant value after the profit warning."
classifier("Die Aktie verlor nach der Gewinnwarnung deutlich an Wert.")
# [{'label': 'negative', 'score': 0.92}]

# Japanese: "The company's sales were flat year over year."
classifier("同社の売上高は前年同期比で横ばいとなった。")
# [{'label': 'neutral', 'score': 0.88}]

# Chinese: "The company announced large-scale layoffs, and its stock price fell in response."
classifier("该公司宣布大规模裁员计划,股价应声下跌。")
# [{'label': 'negative', 'score': 0.91}]
```

### Direct Model Loading

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Kenpache/finbert-multilingual")
model = AutoModelForSequenceClassification.from_pretrained("Kenpache/finbert-multilingual")

# French: "The group's profits rose 15% in the first quarter."
text = "Les bénéfices du groupe ont augmenté de 15% au premier trimestre."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs, dim=-1).item()

labels = {0: "negative", 1: "neutral", 2: "positive"}
print(f"Prediction: {labels[pred]} ({probs[0][pred]:.4f})")
```

## Training Data

The model was trained on [Kenpache/multilingual-financial-sentiment](https://huggingface.co/datasets/Kenpache/multilingual-financial-sentiment), a curated dataset of ~39K financial news sentences from 80+ sources across six languages.

| Language | Samples | Sources |
|---|---|---|
| Japanese | 8,287 | Nikkei, Nikkan Kogyo, Reuters JP, Minkabu, etc. |
| Chinese | 7,930 | Sina Finance, EastMoney, 10jqka, etc. |
| Spanish | 7,125 | Expansión, Cinco Días, Bloomberg Línea, etc. |
| English | 6,887 | CNBC, Yahoo Finance, Fortune, Benzinga, etc. |
| German | 5,023 | Börse.de, FAZ, NTV Börse, Handelsblatt, etc. |
| French | 3,935 | Boursorama, Tradingsat, BFM Business, etc. |
153
+ ## Comparison with FinBERT
154
+
155
+ | Feature | FinBERT | FinBERT-Multilingual |
156
+ |---|---|---|
157
+ | Base model | BERT-base | XLM-RoBERTa-base |
158
+ | Languages | English only | 6 languages |
159
+ | Domain adaptation | Financial corpus pre-training | TAPT on multilingual financial texts |
160
+ | Classes | 3 (Pos/Neg/Neu) | 3 (Pos/Neg/Neu) |
161
+ | Checkpoint selection | Single best | SWA (top-3) |
162
+
163
+ ## Citation
164
+
165
+ If you use this model in your research, please cite:
166
+
167
+ ```bibtex
168
+ @misc{finbert-multilingual-2025,
169
+ title={FinBERT-Multilingual: Cross-Lingual Financial Sentiment Analysis with Domain-Adapted XLM-RoBERTa},
170
+ author={Kenpache},
171
+ year={2025},
172
+ url={https://huggingface.co/Kenpache/finbert-multilingual}
173
+ }
174
+ ```

## License

Apache 2.0