---
language:
- multilingual
license: mit
tags:
- text-classification
- multilingual
- xlm-roberta
- topic-classification
datasets:
- Davlan/sib200
metrics:
- accuracy
- f1
---

# 🌍 Multilingual Topic Classifier

A multilingual text classification model fine-tuned on the SIB-200 dataset. It assigns a piece of text to one of 7 topics and was trained on data in 205 languages.

## Model Details
- Base model: `xlm-roberta-base`
- Task: Text classification (topic)
- Languages: 205
- Developed by: Keshav0308

## Topics
| Label | Description |
|-------|-------------|
| 🌍 geography | Geographic content |
| 🔬 science/technology | Science and technology content |
| 🎬 entertainment | Entertainment content |
| 🏛️ politics | Political content |
| 🏥 health | Health and medical content |
| ✈️ travel | Travel content |
| ⚽ sports | Sports content |

## Performance
| Metric | Score |
|--------|-------|
| Test accuracy | 69.17% |
| Test F1 (macro) | 67.62% |
| Languages in test set | 205 |
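The card does not state which library produced these scores. Assuming the standard definitions, accuracy is the share of exact matches, while macro F1 is the unweighted mean of per-topic F1, where F1 = 2TP / (2TP + FP + FN). Because every topic counts equally, a topic the model handles poorly pulls macro F1 down even if it is rare, which is one way macro F1 can land below accuracy. The function names below are illustrative, not part of this model's code.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over labels seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for label t
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true label t
    labels = set(y_true) | set(y_pred)
    # Per-label F1 = 2TP / (2TP + FP + FN); the denominator is never 0 here
    # because every label in the union occurs at least once.
    scores = [2 * tp[l] / (2 * tp[l] + fp[l] + fn[l]) for l in labels]
    return sum(scores) / len(scores)


# Toy example: 2 of 4 correct, but "travel" is never predicted correctly.
gold = ["health", "health", "sports", "travel"]
pred = ["health", "sports", "sports", "sports"]
print(accuracy(gold, pred))            # 0.5
print(round(macro_f1(gold, pred), 3))  # 0.389
```

On the same inputs, these values should agree with scikit-learn's `accuracy_score` and `f1_score(..., average="macro")`.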

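## Usage

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="Keshav0308/multilingual-topic-classifier",
)

# Inputs can be in any of the languages covered by SIB-200.
# The pipeline returns a list with one {'label', 'score'} dict per input.
print(classifier("The patient was diagnosed with pneumonia."))
# [{'label': 'health', 'score': 0.999}]

# Spanish: "The team won the world football championship."
print(classifier("El equipo ganó el campeonato mundial de fútbol."))
# [{'label': 'sports', 'score': 0.999}]
```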
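Given the roughly 69% test accuracy reported above, it can help to flag low-confidence predictions instead of trusting every top label. A minimal sketch, assuming the pipeline's default output of one `{"label": ..., "score": ...}` dict per input when it is given a list of strings; `classify_with_threshold` and the 0.5 cut-off are hypothetical choices, not part of this model.

```python
from typing import Callable, Dict, List, Optional, Sequence

# One prediction as returned by a text-classification pipeline with default
# settings: {"label": str, "score": float}.
Prediction = Dict[str, object]


def classify_with_threshold(
    classifier: Callable[[List[str]], List[Prediction]],
    texts: Sequence[str],
    min_score: float = 0.5,  # hypothetical cut-off; tune on your own validation data
) -> List[Optional[str]]:
    """Return the top topic for each text, or None if its score is below min_score."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0 and 1")
    predictions = classifier(list(texts))  # one dict per input text
    return [
        str(pred["label"]) if float(pred["score"]) >= min_score else None
        for pred in predictions
    ]
```

With the real model, pass the `classifier` object from the usage snippet, e.g. `classify_with_threshold(classifier, ["...", "..."])`. Taking the classifier as a plain callable keeps the helper testable without downloading the model.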

## Training Data
Fine-tuned on [SIB-200](https://huggingface.co/datasets/Davlan/sib200), a topic classification dataset covering 205 languages and dialects.

- Training samples: 143,705
- Validation samples: 20,295
- Test samples: 41,820
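These totals are consistent with SIB-200 annotating the same 1,004 Flores-200 sentences in every language, split identically per language into 701 training, 99 validation and 204 test sentences, multiplied by 205 languages. The arithmetic as a quick check; `split_totals` is an illustrative helper, not part of the dataset tooling.

```python
# Per-language split sizes in SIB-200 (identical for every language).
PER_LANGUAGE = {"train": 701, "validation": 99, "test": 204}
NUM_LANGUAGES = 205


def split_totals(per_language: dict, num_languages: int) -> dict:
    """Multiply each per-language split size by the number of languages."""
    return {split: size * num_languages for split, size in per_language.items()}


totals = split_totals(PER_LANGUAGE, NUM_LANGUAGES)
for split, size in totals.items():
    print(f"{split}: {size:,}")
# train: 143,705
# validation: 20,295
# test: 41,820
```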