---
model_name: FrenchTextCategorizer
language: fr
tags:
  - text-classification
  - fine-tuned
  - french
license: mit
datasets:
  - "French News Dataset"
---


# 📝 Usage

This model is a fine-tuned version of **FlauBERT** that categorizes French texts into the following categories:

> **CULTURE**, **DEBATS_ET_OPINIONS**, **ECONOMIE**, **EDUCATION**, **FAIT_DIVERS**, **INTERNATIONAL**, **LIFESTYLE**, **NUMERIQUE**, **POLITIQUE**, **RELIGION**, **SANTE**, **SCIENCE_ET_ENVIRONNEMENT**, **SOCIETE**, **SPORT**, **INDEFINI**

---

## 🚀 Quick Start

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()
```

---

## 🔎 Full Example (with Tokenizer, Prediction and Probabilities)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()

# Input text
text = "Ce film est un chef-d'œuvre incroyable, tout était parfait."

# Tokenize (FlauBERT does not use token_type_ids, so drop them if present)
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
inputs.pop("token_type_ids", None)

# Predict
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
probs = F.softmax(logits, dim=-1)
predicted_class_idx = torch.argmax(probs, dim=-1).item()

# Decode predicted class from the model config
# (from_pretrained normalizes id2label keys to ints, so no str() conversion)
predicted_class = model.config.id2label[predicted_class_idx]
prob_percentages = [round(p.item() * 100, 2) for p in probs[0]]

# Output
print(f"Text: {text}")
print(f"Predicted class: {predicted_class}")
print(f"Probabilities (%): {prob_percentages}")
```

---

# 📋 Notes

- `model.config.id2label` is loaded automatically from the model's configuration (`config.json`); after `from_pretrained`, its keys are integers.
- If you want to process multiple texts at once, simply pass a list of texts to the tokenizer.
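The batch post-processing generalizes the single-text example directly: `softmax` and `argmax` along the last dimension yield one distribution and one predicted index per text. A minimal sketch, using randomly generated dummy logits of shape `(batch, num_labels)` in place of a real forward pass over a tokenized list of texts:

```python
import torch
import torch.nn.functional as F

# Dummy logits standing in for model(**inputs).logits on a batch of 2 texts
# across the 15 categories; a real run would tokenize a list of texts first.
logits = torch.randn(2, 15)

probs = F.softmax(logits, dim=-1)       # one probability distribution per text
pred_ids = torch.argmax(probs, dim=-1)  # one predicted class index per text

for i, idx in enumerate(pred_ids.tolist()):
    print(f"Text {i}: class {idx} ({probs[i, idx].item() * 100:.2f}%)")
```

Each predicted index can then be decoded with `model.config.id2label` exactly as in the single-text example.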

---

# ✅ Ready for Inference!