exploring

#1 opened by HermannS11

- README.md +7 -68
- config.json +2 -1
- model.safetensors +1 -1
README.md CHANGED

````diff
@@ -23,7 +23,6 @@ language:
 - tl
 - nl
 - gsw
-- sw
 library_name: transformers
 license: cc-by-nc-4.0
 pipeline_tag: text-classification
@@ -40,50 +39,26 @@ tags:
 - multilingual
 - 🇪🇺
 - region:eu
-
-datasets:
-- tabularisai/swahili_sentiment_dataset
+
 ---
 
 
-# 🚀 Multilingual Sentiment Classification Model
+# 🚀 distilbert-based Multilingual Sentiment Classification Model
 
 <!-- TRY IT HERE: `coming soon`
 -->
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/sznxwdqBXj)
 
 
-# NEWS!
-- 2025/8: Major model update +1 new language: **Swahili**! Also, general improvements accross all languages.
-
-- 2025/8: Free DEMO API for our model! Please see below!
-
-- 2025/7: We’ve just released ModernFinBERT, a model we’ve been working on for a while. It’s built on the ModernBERT architecture and trained on a mix of real and synthetic data, with LLM-based label correction applied to public datasets to fix human annotation errors.
-It’s performing well across a range of benchmarks — in some cases improving accuracy by up to 48% over existing models like FinBERT.
-You can check it out here on Hugging Face:
-👉 https://huggingface.co/tabularisai/ModernFinBERT
+# NEWS!
 
 - 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.
 
-
-## 🔌 Hosted DEMO API
-
-We provide a hosted inference API:
-
-**Example request body:**
-
-```json
-curl -X POST https://api.tabularis.ai/ \
--H "Content-Type: application/json" \
--d '{"text":"I love the design","return_all_scores":false}'
-
-```
-
 ## Model Details
 - `Model Name:` tabularisai/multilingual-sentiment-analysis
 - `Base Model:` distilbert/distilbert-base-multilingual-cased
 - `Task:` Text Classification (Sentiment Analysis)
-- `Languages:` Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch)
+- `Languages:` Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
 - `Number of Classes:` 5 (*Very Negative, Negative, Neutral, Positive, Very Positive*)
 - `Usage:`
 - Social media analysis
@@ -94,9 +69,6 @@ curl -X POST https://api.tabularis.ai/ \
 - Customer service optimization
 - Competitive intelligence
 
-> If you wish to use this model for commercial purposes, please obtain a license by contacting: info@tabularis.ai
-
-
 ## Model Description
 
 This model is a fine-tuned version of `distilbert/distilbert-base-multilingual-cased` for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.
@@ -206,45 +178,12 @@ for text, sentiment in zip(texts, predict_sentiment(texts)):
 Synthetic data reduces bias, but validation in real-world scenarios is advised.
 
 ## Citation
-```
-
-author = { tabularisai and Samuel Gyamfi and Vadim Borisov and Richard H. Schreiber },
-title = { multilingual-sentiment-analysis (Revision 69afb83) },
-year = 2025,
-url = { https://huggingface.co/tabularisai/multilingual-sentiment-analysis },
-doi = { 10.57967/hf/5968 },
-publisher = { Hugging Face }
-}
+```
+Will be included.
 ```
 
 ## Contact
 
 For inquiries, data, private APIs, better models, contact info@tabularis.ai
 
-tabularis.ai
-
-
-<table align="center">
-<tr>
-<td align="center">
-<a href="https://www.linkedin.com/company/tabularis-ai/">
-<img src="https://cdn.jsdelivr.net/gh/simple-icons/simple-icons/icons/linkedin.svg" alt="LinkedIn" width="30" height="30">
-</a>
-</td>
-<td align="center">
-<a href="https://x.com/tabularis_ai">
-<img src="https://cdn.jsdelivr.net/gh/simple-icons/simple-icons/icons/x.svg" alt="X" width="30" height="30">
-</a>
-</td>
-<td align="center">
-<a href="https://github.com/tabularis-ai">
-<img src="https://cdn.jsdelivr.net/gh/simple-icons/simple-icons/icons/github.svg" alt="GitHub" width="30" height="30">
-</a>
-</td>
-<td align="center">
-<a href="https://tabularis.ai">
-<img src="https://cdn.jsdelivr.net/gh/simple-icons/simple-icons/icons/internetarchive.svg" alt="Website" width="30" height="30">
-</a>
-</td>
-</tr>
-</table>
+tabularis.ai
````
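The model card in this diff lists five output classes (*Very Negative, Negative, Neutral, Positive, Very Positive*). Below is a minimal stdlib sketch of turning one row of raw classifier logits into one of those labels. It assumes the class indices follow the order in which the card lists them (index 0 is *Very Negative*, index 4 is *Very Positive*); that order is an assumption and should be checked against the `id2label` entry of the full `config.json`. `top_label` and `softmax` here are illustrative helpers, not the model's published code.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: index order of the five classes, taken from the order the card
# lists them. Confirm against id2label in the model's config.json.
LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability for one text."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    # On ties, max() keeps the first (lowest-index) maximum.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

For example, `top_label([-1.2, 0.3, 2.1, 0.4, -0.5])` picks `"Neutral"`. Because softmax is monotonic, the argmax over probabilities equals the argmax over logits; the softmax is only there to report a confidence score alongside the label.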
config.json CHANGED

```diff
@@ -1,4 +1,5 @@
 {
+  "_name_or_path": "results/checkpoint-1400_best",
   "activation": "gelu",
   "architectures": [
     "DistilBertForSequenceClassification"
@@ -34,6 +35,6 @@
   "sinusoidal_pos_embds": false,
   "tie_weights_": true,
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.46.3",
   "vocab_size": 119547
 }
```
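The `config.json` hunks add a `_name_or_path` key and rewrite the `transformers_version` line. As a hedged illustration, here is a small stdlib sketch of a key-level comparison between two config dicts. The dicts hold only the keys fully visible in this diff; `transformers_version` is left out of both because its old value is truncated in the diff view. `diff_config` is a hypothetical helper, not a `transformers` API.

```python
import json
from typing import Any, Dict

# Only keys fully visible in the diff; transformers_version is omitted because
# its old value is truncated in the diff view.
OLD_EXCERPT = json.loads("""
{
  "activation": "gelu",
  "architectures": ["DistilBertForSequenceClassification"],
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "vocab_size": 119547
}
""")

NEW_EXCERPT = json.loads("""
{
  "_name_or_path": "results/checkpoint-1400_best",
  "activation": "gelu",
  "architectures": ["DistilBertForSequenceClassification"],
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "vocab_size": 119547
}
""")


def diff_config(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: report added, removed and changed top-level keys."""
    added = {k: new[k] for k in sorted(new.keys() - old.keys())}
    removed = {k: old[k] for k in sorted(old.keys() - new.keys())}
    changed = {
        k: (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

`print(json.dumps(diff_config(OLD_EXCERPT, NEW_EXCERPT), indent=2))` reports `_name_or_path` as the only added key for these excerpts. Sorting the keys keeps the report stable between runs.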
model.safetensors CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3bb33a58e6056036c2b396c6971d3c7ebe916c7f2d7fb5bb46aa319ed3288ff8
 size 541326604
```
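The `model.safetensors` change only swaps the Git LFS pointer: each pointer line is `key value`, `oid` is `sha256:` followed by the hex SHA-256 digest of the real file, and `size` is its length in bytes (unchanged here at 541326604). A minimal stdlib sketch for checking downloaded weights against the new pointer, assuming you already have the actual file locally; the helper names are illustrative and not part of any Git LFS or Hugging Face tool.

```python
import hashlib
from typing import Dict, Iterable

LFS_SPEC = "https://git-lfs.github.com/spec/v1"

# New pointer as shown in this diff.
NEW_POINTER = f"""version {LFS_SPEC}
oid sha256:3bb33a58e6056036c2b396c6971d3c7ebe916c7f2d7fb5bb46aa319ed3288ff8
size 541326604
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def pointer_for_bytes(data: bytes) -> str:
    """Build pointer text in the same format for in-memory data (useful for testing)."""
    return f"version {LFS_SPEC}\noid sha256:{hashlib.sha256(data).hexdigest()}\nsize {len(data)}\n"


def matches_pointer(chunks: Iterable[bytes], pointer_text: str) -> bool:
    """Hash a stream of byte chunks and compare digest and size with the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == int(fields["size"]) and digest.hexdigest() == expected_hex


def verify_file(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Stream a local file in 1 MiB chunks so the weights never sit in memory at once."""
    with open(path, "rb") as fh:
        return matches_pointer(iter(lambda: fh.read(chunk_size), b""), pointer_text)
```

Usage: `verify_file("model.safetensors", NEW_POINTER)`, where the path is a hypothetical local copy of the downloaded weights. Streaming in chunks matters here because the file is about 541 MB.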