---
model_name: FrenchTextCategorizer
language: French
tags:
- text-classification
- fine-tuned
- french
license: mit
dataset: "French News Dataset"
---
# 📝 Usage
This model is a fine-tuned version of **FlauBERT** that classifies French texts into one of the following categories:
> **CULTURE**, **DEBATS_ET_OPINIONS**, **ECONOMIE**, **EDUCATION**, **FAIT_DIVERS**, **INTERNATIONAL**, **LIFESTYLE**, **NUMERIQUE**, **POLITIQUE**, **RELIGION**, **SANTE**, **SCIENCE_ET_ENVIRONNEMENT**, **SOCIETE**, **SPORT**, **INDEFINI**
---
## 🚀 Quick Start
```python
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()
```
---
## 🔎 Full Example (with Tokenizer, Prediction and Probabilities)
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F
# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()
# Input text
text = "Ce film est un chef-d'œuvre incroyable, tout était parfait."
# Tokenize
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
inputs.pop("token_type_ids", None)
# Predict
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
probs = F.softmax(logits, dim=-1)
predicted_class_idx = torch.argmax(probs, dim=-1).item()
# Decode predicted class from config (id2label keys are ints once the config is loaded)
predicted_class = model.config.id2label[predicted_class_idx]
# Per-class probabilities as percentages, in class-index order
prob_percentages = [round(p.item() * 100, 2) for p in probs[0]]
# Output
print(f"Text: {text}")
print(f"Predicted class: {predicted_class}")
print(f"Probabilities (%): {prob_percentages}")
```
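The probability list printed by the example above is ordered by class index, which is hard to read with 15 categories. One possible way to make it readable is to pair each value with its label and keep only the top few. The sketch below is an illustration, not part of the model's API: `rank_labels` is a hypothetical helper, and the 3-class mapping and percentages are made-up placeholders standing in for `model.config.id2label` and `prob_percentages`.

```python
def rank_labels(prob_percentages, id2label, top_k=3):
    """Pair each probability with its label and return the top_k, highest first."""
    pairs = [(id2label[i], p) for i, p in enumerate(prob_percentages)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:top_k]


# Illustrative values only: a reduced 3-class mapping and made-up percentages,
# NOT real output from FrenchTextCategorizer.
example_id2label = {0: "CULTURE", 1: "SPORT", 2: "POLITIQUE"}
example_probs = [71.5, 3.25, 25.25]

for label, pct in rank_labels(example_probs, example_id2label, top_k=2):
    print(f"{label}: {pct}%")
```

With the real model, the same helper could be called as `rank_labels(prob_percentages, model.config.id2label)`, assuming `id2label` maps integer class indices to category names.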
---
# 📋 Notes
- `model.config.id2label` is automatically loaded from the model's configuration (`config.json`).
- If you want to process multiple texts at once, simply pass a list of texts to the tokenizer.
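
As a rough sketch of the batch case mentioned in the notes, the helper below tokenizes a list of texts in one call and returns the top label for each. `classify_batch` is a hypothetical name introduced here for illustration; it assumes `model` and `tokenizer` have been loaded as in the full example and that `model.config.id2label` is keyed by integer class index.

```python
import torch
import torch.nn.functional as F


def classify_batch(texts, model, tokenizer):
    """Return one (label, probability in %) pair per input text."""
    # padding=True pads every text to the longest one in the batch;
    # truncation=True cuts texts that exceed the model's maximum length.
    inputs = tokenizer(
        list(texts),
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    inputs.pop("token_type_ids", None)  # mirrors the single-text example

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch_size, num_labels)

    probs = F.softmax(logits, dim=-1)
    best_probs, best_ids = probs.max(dim=-1)

    id2label = model.config.id2label
    return [
        (id2label[idx], round(prob * 100, 2))
        for idx, prob in zip(best_ids.tolist(), best_probs.tolist())
    ]
```

Calling `classify_batch(["Premier texte.", "Deuxième texte."], model, tokenizer)` would then return a list with one `(label, percentage)` tuple per input, in the same order as the inputs.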
---
# ✅ Ready for Inference!