---
language:
- multilingual
license: mit
tags:
- text-classification
- multilingual
- xlm-roberta
- topic-classification
datasets:
- Davlan/sib200
metrics:
- accuracy
- f1
---
# 🌍 Multilingual Topic Classifier
An `xlm-roberta-base` model fine-tuned on the SIB-200 dataset to classify text into one of 7 topics across **205 languages**.
## Model Details
- **Base model:** xlm-roberta-base
- **Task:** Text Classification (Topic)
- **Languages:** 205
- **Developed by:** Keshav0308
## Topics
| Label | Description |
|-------|-------------|
| 🌍 geography | Geographic content |
| 🔬 science/technology | Science and tech content |
| 🎬 entertainment | Entertainment content |
| 🏛️ politics | Political content |
| 🏥 health | Health and medical content |
| ✈️ travel | Travel content |
| ⚽ sports | Sports content |
## Performance
| Metric | Score |
|--------|-------|
| Test Accuracy | 69.17% |
| Test F1 Macro | 67.62% |
| Languages | 205 |
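
For readers unfamiliar with the two metrics, here is a minimal sketch of the standard definitions of accuracy and macro-averaged F1. The card does not say which tool produced the reported scores, so this is an illustration rather than the evaluation script; the helper names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true positives contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not model outputs.
y_true = ["health", "sports", "travel", "health"]
y_pred = ["health", "sports", "health", "health"]

print(accuracy(y_true, y_pred))  # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.6
```

Macro F1 weights every topic equally, so a topic the model rarely gets right pulls the score down even when overall accuracy looks reasonable.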
## Usage
```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="Keshav0308/multilingual-topic-classifier",
)

# The same call works for input in any supported language.
# The pipeline returns a list with one dict per input text.
classifier("The patient was diagnosed with pneumonia.")
# [{'label': 'health', 'score': 0.999}]

classifier("El equipo ganó el campeonato mundial de fútbol.")
# [{'label': 'sports', 'score': 0.999}]
```
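
With test accuracy near 69%, it may be worth treating low-confidence predictions as undecided instead of trusting every label. The sketch below assumes the pipeline returns one dict with `label` and `score` keys per input, which is the default for `text-classification` pipelines. The helper `filter_confident` and the 0.5 threshold are hypothetical and not part of the model.

```python
def filter_confident(predictions, threshold=0.5):
    """Return each predicted label, or None when its score is below threshold."""
    labels = []
    for pred in predictions:
        if pred["score"] >= threshold:
            labels.append(pred["label"])
        else:
            labels.append(None)
    return labels


# Hand-written stand-ins for pipeline output; real scores will differ.
example_predictions = [
    {"label": "health", "score": 0.93},
    {"label": "travel", "score": 0.41},
]

print(filter_confident(example_predictions))  # ['health', None]
```

With the real model, the same helper could be applied to a batch, for example `filter_confident(classifier(["first text", "second text"]))`. A suitable threshold would need to be chosen on validation data.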
## Training Data
Fine-tuned on [SIB-200](https://huggingface.co/datasets/Davlan/sib200), a massively multilingual topic classification dataset covering 205 languages.
- Train samples: 143,705
- Validation samples: 20,295
- Test samples: 41,820
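
The split totals above divide evenly by the 205 languages. A quick check of the arithmetic, assuming every language contributes the same number of examples to each split (the card does not state this explicitly):

```python
NUM_LANGUAGES = 205

# Split totals as listed on this card.
split_totals = {"train": 143_705, "validation": 20_295, "test": 41_820}

per_language = {}
for name, total in split_totals.items():
    count, remainder = divmod(total, NUM_LANGUAGES)
    per_language[name] = count
    print(f"{name}: {count} per language (remainder {remainder})")

# train: 701 per language (remainder 0)
# validation: 99 per language (remainder 0)
# test: 204 per language (remainder 0)
```

Under that assumption, each language has 701 training, 99 validation, and 204 test examples.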