---
model_name: FrenchTextCategorizer
language: fr
tags:
- text-classification
- fine-tuned
- french
license: mit
datasets:
- "French News Dataset"
---
# 📝 Usage
This model is a fine-tuned **FlauBERT** model that categorizes French texts into the following categories:
> **CULTURE**, **DEBATS_ET_OPINIONS**, **ECONOMIE**, **EDUCATION**, **FAIT_DIVERS**, **INTERNATIONAL**, **LIFESTYLE**, **NUMERIQUE**, **POLITIQUE**, **RELIGION**, **SANTE**, **SCIENCE_ET_ENVIRONNEMENT**, **SOCIETE**, **SPORT**, **INDEFINI**
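The category names above correspond to the model's output indices. To see the exact index-to-label mapping the model ships with, you can read it from the configuration (a sketch; it assumes the repo's `config.json` defines `id2label`, as the Notes below state):

```python
from transformers import AutoConfig

# Load only the configuration (no weights needed to inspect labels)
config = AutoConfig.from_pretrained("juenp/FrenchTextCategorizer")

# Print the labels in index order
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```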
---
## 🚀 Quick Start
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()
```
---
## 🔎 Full Example (with Tokenizer, Prediction and Probabilities)
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F
# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()
# Input text
text = "Ce film est un chef-d'œuvre incroyable, tout était parfait."
# Tokenize
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# FlauBERT does not use token_type_ids; drop them if the tokenizer returns them
inputs.pop("token_type_ids", None)
# Predict
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
probs = F.softmax(logits, dim=-1)
predicted_class_idx = torch.argmax(probs, dim=-1).item()
# Decode predicted class from config (id2label uses integer keys)
predicted_class = model.config.id2label[predicted_class_idx]
prob_percentages = [round(p.item() * 100, 2) for p in probs[0]]
# Output
print(f"Texte : {text}")
print(f"Classe prédite : {predicted_class}")
print(f"Probabilités (%) : {prob_percentages}")
```
---
# 📋 Notes
- `model.config.id2label` is automatically loaded from the model's configuration (`config.json`).
- If you want to process multiple texts at once, simply pass a list of texts to the tokenizer.
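
The batch case mentioned above can be sketched as follows, reusing the same repo name (the example headlines are invented for illustration):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()

# Pass a list of texts; padding aligns them to the longest sequence in the batch
texts = [
    "Le PSG s'impose en finale de la Coupe de France.",
    "La banque centrale relève ses taux directeurs.",
]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
inputs.pop("token_type_ids", None)

with torch.no_grad():
    logits = model(**inputs).logits

# One prediction per row of the batch
probs = F.softmax(logits, dim=-1)
for text, idx in zip(texts, probs.argmax(dim=-1).tolist()):
    print(f"{text} -> {model.config.id2label[idx]}")
```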
---
# ✅ Ready for Inference!