Kenpache committed on
Commit af07ccd · verified · 1 Parent(s): a6d8bbf

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +113 -20
README.md CHANGED
@@ -12,6 +12,9 @@ tags:
12
  - sentiment-analysis
13
  - multilingual
14
  - xlm-roberta
15
  datasets:
16
  - Kenpache/multilingual-financial-sentiment
17
  metrics:
@@ -33,13 +36,13 @@ model-index:
33
  value: 0.8102
34
  ---
35
 
36
- # FLAME - Financial Language Analysis for Multilingual Economics
37
 
38
  **One model. Six languages. Real financial sentiment.**
39
 
40
- FLAME classifies financial text as **Negative**, **Neutral**, or **Positive** across English, Chinese, Japanese, German, French, and Spanish.
41
 
42
- Built on XLM-RoBERTa-base, domain-adapted on 35K+ financial texts, fine-tuned on ~39K multilingual financial news samples.
43
 
44
  ## Quick Start
45
 
@@ -48,39 +51,129 @@ from transformers import pipeline
48
 
49
  classifier = pipeline("text-classification", model="Kenpache/flame")
50
 
51
- classifier("Revenue surged 40% year-over-year, beating analyst expectations.")
52
- # [{'label': 'positive', 'score': 0.96}]
 
53
 
54
- classifier("La empresa reportó pérdidas significativas este trimestre.")
55
- # [{'label': 'negative', 'score': 0.93}]
56
  ```
57
 
58
  ## Results
59
 
60
  | Metric | Score |
61
  |---|---|
62
- | Accuracy | **0.8103** |
63
- | F1 (weighted) | **0.8102** |
64
- | Precision | **0.8111** |
65
- | Recall | **0.8103** |
66
 
67
- | Class | Precision | Recall | F1 |
68
  |---|---|---|---|
69
- | Negative | 0.78 | 0.83 | 0.81 |
70
- | Neutral | 0.83 | 0.79 | 0.81 |
71
- | Positive | 0.80 | 0.82 | 0.81 |
72
 
73
- ## Languages
74
 
75
- EN | ZH | JA | DE | FR | ES
76
 
77
- ## Training
78
 
79
- XLM-RoBERTa-base + Task-Adaptive Pre-Training (MLM) + fine-tuning with label smoothing, cosine LR schedule, and SWA checkpoint averaging.
80
 
81
  ## Dataset
82
 
83
- [Kenpache/multilingual-financial-sentiment](https://huggingface.co/datasets/Kenpache/multilingual-financial-sentiment) -- ~39K samples from CNBC, Yahoo Finance, Reuters, Nikkei, Sina Finance, and 80+ other financial news sources.
84
 
85
  ## License
86
 
 
12
  - sentiment-analysis
13
  - multilingual
14
  - xlm-roberta
15
+ - financial-nlp
16
+ - stock-market
17
+ - trading
18
  datasets:
19
  - Kenpache/multilingual-financial-sentiment
20
  metrics:
 
36
  value: 0.8102
37
  ---
38
 
39
+ # FLAME: Financial Language Analysis for Multilingual Economics
40
 
41
  **One model. Six languages. Real financial sentiment.**
42
 
43
+ FLAME classifies financial text as **Negative**, **Neutral**, or **Positive** across English, Chinese, Japanese, German, French, and Spanish — in a single model, no language detection needed.
44
 
45
+ Built on XLM-RoBERTa with domain-adaptive pretraining on 35K+ financial texts, then fine-tuned on ~39K real financial news samples from 80+ sources worldwide.
46
 
47
  ## Quick Start
48
 
 
51
 
52
  classifier = pipeline("text-classification", model="Kenpache/flame")
53
 
54
+ # English
55
+ classifier("Apple reported record quarterly revenue of $124 billion, up 11% year over year.")
56
+ # [{'label': 'Positive', 'score': 0.96}]
57
 
58
+ # Chinese
59
+ classifier("该公司季度亏损扩大至5亿美元,远超市场预期。")
60
+ # [{'label': 'Negative', 'score': 0.94}]
61
+
62
+ # Japanese
63
+ classifier("トヨタ自動車の営業利益は前年同期比30%増の1兆円を突破した。")
64
+ # [{'label': 'Positive', 'score': 0.95}]
65
+
66
+ # German
67
+ classifier("Die Aktie verlor nach der Gewinnwarnung deutlich an Wert.")
68
+ # [{'label': 'Negative', 'score': 0.92}]
69
+
70
+ # French
71
+ classifier("Le chiffre d'affaires du groupe a progressé de 8% au premier semestre.")
72
+ # [{'label': 'Positive', 'score': 0.93}]
73
+
74
+ # Spanish
75
+ classifier("Las acciones de la empresa se mantuvieron estables tras la publicación de resultados.")
76
+ # [{'label': 'Neutral', 'score': 0.89}]
77
+ ```
78
+
79
+ ## Batch Processing
80
+
81
+ ```python
82
+ from transformers import pipeline
83
+
84
+ classifier = pipeline("text-classification", model="Kenpache/flame", device=0)
85
+
86
+ texts = [
87
+ "Stocks rallied after the Fed signaled a pause in rate hikes.",
88
+ "The company filed for Chapter 11 bankruptcy protection.",
89
+ "Q3 earnings were in line with analyst expectations.",
90
+ "日経平均株価が3万円台を回復した。",
91
+ "Les marchés européens ont clôturé en forte baisse.",
92
+ "El beneficio neto de la compañía creció un 25% interanual.",
93
+ ]
94
+
95
+ results = classifier(texts, batch_size=32)
96
+ for text, result in zip(texts, results):
97
+ print(f"{result['label']:>8} ({result['score']:.2f}) {text[:70]}")
98
  ```
99
 
100
  ## Results
101
 
102
  | Metric | Score |
103
  |---|---|
104
+ | **Accuracy** | **0.8103** |
105
+ | **F1 (weighted)** | **0.8102** |
106
+ | **Precision (weighted)** | **0.8111** |
107
+ | **Recall (weighted)** | **0.8103** |
108
 
109
+ ### Per-Class Performance
110
+
111
+ | Class | Precision | Recall | F1 | Support |
112
+ |---|---|---|---|---|
113
+ | Negative | 0.78 | 0.83 | 0.81 | 917 |
114
+ | Neutral | 0.83 | 0.79 | 0.81 | 1,779 |
115
+ | Positive | 0.80 | 0.82 | 0.81 | 1,225 |
116
+
117
+ All three classes reach F1 = 0.81 despite imbalanced training data (Neutral 45%, Positive 31%, Negative 24%).
118
+
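As a rough cross-check, the weighted scores in the summary table can be approximated from the per-class rows by weighting each class by its support. This is only a sketch: the per-class values are rounded to two decimals, so the result is expected to land close to the reported four-decimal figures rather than match them exactly. The variable and function names are illustrative.

```python
# Per-class rows copied from the table above: (precision, recall, f1, support).
per_class = {
    "Negative": (0.78, 0.83, 0.81, 917),
    "Neutral": (0.83, 0.79, 0.81, 1779),
    "Positive": (0.80, 0.82, 0.81, 1225),
}

# Total number of evaluation samples across the three classes.
total = sum(row[3] for row in per_class.values())


def weighted(column: int) -> float:
    """Support-weighted mean of one metric column (0=precision, 1=recall, 2=F1)."""
    return sum(row[column] * row[3] for row in per_class.values()) / total


weighted_precision = weighted(0)
weighted_recall = weighted(1)
weighted_f1 = weighted(2)

print(f"support total: {total}")
print(
    f"precision={weighted_precision:.4f} "
    f"recall={weighted_recall:.4f} "
    f"f1={weighted_f1:.4f}"
)
```

Weighted recall computed this way is the same quantity as accuracy, which is why the two rows in the summary table report the same value.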
119
+ ## Labels
120
+
121
+ | Label | ID | What it captures |
122
+ |---|---|---|
123
+ | **Negative** | 0 | Losses, decline, bearish signals, layoffs, bankruptcy |
124
+ | **Neutral** | 1 | Factual statements, announcements, no clear sentiment |
125
+ | **Positive** | 2 | Growth, gains, bullish signals, record earnings, upgrades |
126
+
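When working with raw model outputs instead of the pipeline, the ID column above is what maps a logit position to a label. The sketch below uses only the standard library and assumes the logits are ordered by class ID as in the table; the example logits are made up, and whether the published config uses exactly this `id2label` mapping should be checked against the model's `config.json`.

```python
import math

# ID-to-label mapping copied from the table above.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one input, ordered by class ID (0, 1, 2).
example_logits = [-1.2, 0.3, 2.5]
label, prob = decode(example_logits)
print(label, round(prob, 2))
```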
127
+ ## Supported Languages
128
+
129
+ | Language | Code | Training Samples | Key Sources |
130
  |---|---|---|---|
131
+ | Japanese | JA | 8,287 | Nikkei, Nikkan Kogyo, Reuters JP |
132
+ | Chinese | ZH | 7,930 | Sina Finance, EastMoney, 10jqka |
133
+ | Spanish | ES | 7,125 | Expansión, Cinco Días, Bloomberg Línea |
134
+ | English | EN | 6,887 | CNBC, Yahoo Finance, Fortune, Reuters |
135
+ | German | DE | 5,023 | Börse.de, FAZ, NTV Börse |
136
+ | French | FR | 3,935 | Boursorama, Tradingsat, BFM Business |
137
 
138
+ ## Use Cases
139
 
140
+ - **News Monitoring**: classify sentiment of financial headlines across global markets in real time
141
+ - **Trading Signals**: feed sentiment scores into quantitative trading strategies
142
+ - **Portfolio Risk**: monitor sentiment shifts across international holdings
143
+ - **Earnings Analysis**: analyze tone of corporate press releases and earnings calls
144
+ - **Social Media**: track financial discussions on multilingual platforms
145
+ - **Research**: cross-language sentiment studies in financial NLP
146
 
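For the Trading Signals use case, one simple way to turn classifier output into a numeric feature is a signed score: positive confidence counts up, negative confidence counts down, and neutral contributes zero. This convention, the `SIGN` table, and both helper functions are assumptions for illustration and are not part of the model; the sample results are hypothetical dictionaries in the shape the pipeline returns.

```python
from statistics import mean

# Sign applied per label; treating Neutral as zero is an assumption.
SIGN = {"Positive": 1.0, "Neutral": 0.0, "Negative": -1.0}


def signed_score(result: dict) -> float:
    """Map one pipeline result ({'label': ..., 'score': ...}) to [-1, 1]."""
    return SIGN[result["label"]] * result["score"]


def aggregate(results: list) -> float:
    """Mean signed score over a batch of results; 0.0 for an empty batch."""
    if not results:
        return 0.0
    return mean(signed_score(r) for r in results)


# Hypothetical pipeline outputs for three headlines.
sample = [
    {"label": "Positive", "score": 0.96},
    {"label": "Negative", "score": 0.94},
    {"label": "Neutral", "score": 0.89},
]
print(aggregate(sample))
```

A plain mean treats every headline equally; weighting by source, recency, or ticker relevance would be a separate design decision.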
147
+ ## How It Was Built
148
 
149
+ 1. **Domain Adaptation (TAPT):** Masked Language Modeling on 35K+ financial texts across 6 languages, so the model learns financial vocabulary and patterns before seeing any labels.
150
+
151
+ 2. **Fine-Tuning:** Supervised classification with label smoothing (0.1), a cosine learning-rate schedule (learning rate 2e-5), and Stochastic Weight Averaging over the top-3 checkpoints for more robust generalization.
152
+
153
+ | Parameter | Value |
154
+ |---|---|
155
+ | Base model | xlm-roberta-base (278M params) |
156
+ | Learning rate | 2e-5 |
157
+ | Scheduler | Cosine |
158
+ | Label smoothing | 0.1 |
159
+ | Effective batch size | 64 |
160
+ | Precision | FP16 |
161
+ | Post-processing | SWA (top-3 checkpoints) |
162
 
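The three fine-tuning ingredients in the table can be sketched in plain Python to show what each one does. This is not the training script: it assumes a cosine schedule with no warmup that decays to zero, one common label-smoothing formulation (one-hot mixed with a uniform distribution), and a simple element-wise mean for checkpoint averaging. The README does not state the warmup, the step count, or how the top-3 checkpoints were selected, and all function names here are illustrative.

```python
import math


def cosine_lr(step, total_steps, base_lr=2e-5):
    """Cosine decay from base_lr to 0 (no warmup; an assumption)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def smoothed_targets(true_id, num_classes=3, epsilon=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) * (1.0 if i == true_id else 0.0) + uniform
        for i in range(num_classes)
    ]


def smoothed_cross_entropy(logits, true_id, epsilon=0.1):
    """Cross-entropy between smoothed targets and softmax(logits)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_z for x in logits]
    targets = smoothed_targets(true_id, len(logits), epsilon)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


def average_checkpoints(checkpoints):
    """Element-wise mean of parameter dicts with identical keys and shapes."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


# Toy "checkpoints": one parameter vector each, standing in for real weights.
toy = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
print(cosine_lr(50, 100), smoothed_targets(2), average_checkpoints(toy))
```

With real models the same averaging would run over tensors in each checkpoint's state dict rather than Python lists.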
163
  ## Dataset
164
 
165
+ Trained on [Kenpache/multilingual-financial-sentiment](https://huggingface.co/datasets/Kenpache/multilingual-financial-sentiment): ~39K curated financial news samples from 80+ real sources worldwide.
166
+
167
+ ## Citation
168
+
169
+ ```bibtex
170
+ @misc{flame2025,
171
+ title={FLAME: Financial Language Analysis for Multilingual Economics},
172
+ author={Kenpache},
173
+ year={2025},
174
+ url={https://huggingface.co/Kenpache/flame}
175
+ }
176
+ ```
177
 
178
  ## License
179