juenp / FrenchTextCategorizer

juenp committed on Apr 30, 2025

Commit b123b3f (1 parent: 3e7ab78)

initial commit: add model

Files changed (3)

README.md +68 -0
config.json +40 -0
pytorch_model.bin +3 -0

README.md ADDED

	@@ -0,0 +1,68 @@

+# 📝 Usage
+This model is a fine-tuned version of **FlauBERT** that classifies French texts into the following 15 categories:
+> **CULTURE**, **DEBATS_ET_OPINIONS**, **ECONOMIE**, **EDUCATION**, **FAIT_DIVERS**, **INTERNATIONAL**, **LIFESTYLE**, **NUMERIQUE**, **POLITIQUE**, **RELIGION**, **SANTE**, **SCIENCE_ET_ENVIRONNEMENT**, **SOCIETE**, **SPORT**, **INDEFINI**
+---
+## 🚀 Quick Start
+```python
+from transformers import AutoModelForSequenceClassification
+model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
+model.eval()
+```
+---
+## 🔎 Full Example (with Tokenizer, Prediction and Probabilities)
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+import torch.nn.functional as F
+# Load model and tokenizer
+model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
+tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
+model.eval()
+# Input text
+text = "Ce film est un chef-d'œuvre incroyable, tout était parfait."
+# Tokenize
+inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
+inputs.pop("token_type_ids", None)
+# Predict
+with torch.no_grad():
+    outputs = model(**inputs)
+logits = outputs.logits
+probs = F.softmax(logits, dim=-1)
+predicted_class_idx = torch.argmax(probs, dim=-1).item()
+# Decode the predicted class (transformers converts id2label keys to int when loading config.json)
+predicted_class = model.config.id2label[predicted_class_idx]
+prob_percentages = [round(p.item() * 100, 2) for p in probs[0]]
+# Output
+print(f"Text: {text}")
+print(f"Predicted class: {predicted_class}")
+print(f"Probabilities (%): {prob_percentages}")
+```
+---
+# 📋 Notes
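+- `model.config.id2label` is loaded automatically from the model's configuration (`config.json`). The transformers library converts its keys from strings to integers on load, so index it with an integer class id, not a string.
+- To classify several texts at once, pass a list of strings to the tokenizer (`padding=True` pads them to a common length). `probs` then has one row per text, so use `probs.argmax(dim=-1).tolist()` rather than `.item()`, which only works for a single text.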
+---
+# ✅ Ready for Inference!
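The batching note in the README can be sketched as follows. This is a minimal sketch, not part of the original model card: `decode_batch` and `classify_texts` are hypothetical helper names, and the decoding step works on plain Python lists (for example `logits.tolist()`) so it can be checked without downloading the model.

```python
import math
from typing import Dict, List, Tuple


def softmax(row: List[float]) -> List[float]:
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_batch(logits: List[List[float]], id2label: Dict[int, str]) -> List[Tuple[str, float]]:
    """Hypothetical helper: return (label, probability in %) for each row of logits."""
    results = []
    for row in logits:
        probs = softmax(row)
        # Index of the highest probability in this row
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((id2label[idx], round(probs[idx] * 100, 2)))
    return results


def classify_texts(texts: List[str], model_id: str = "juenp/FrenchTextCategorizer"):
    """Hypothetical helper: batched inference with the model from this commit."""
    # Imported lazily so decode_batch can be used without torch/transformers installed
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # padding=True pads every text to the longest one in the batch
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    inputs.pop("token_type_ids", None)  # same as in the Full Example
    with torch.no_grad():
        logits = model(**inputs).logits
    # id2label keys are ints after loading, matching the int indices used in decode_batch
    return decode_batch(logits.tolist(), model.config.id2label)


if __name__ == "__main__":
    # Toy logits for two texts over three labels (illustrative values, not model output)
    toy_id2label = {0: "CULTURE", 1: "DEBATS_ET_OPINIONS", 2: "ECONOMIE"}
    print(decode_batch([[2.0, 0.5, 0.1], [0.0, 0.0, 3.0]], toy_id2label))
```

Keeping the decoding in pure Python is a design choice for testability; with torch available, `F.softmax(logits, dim=-1)` and `argmax(dim=-1)` as in the Full Example do the same work on the whole batch at once.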

config.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "architectures": ["FlaubertForSequenceClassification"],
+  "model_type": "flaubert",
+  "num_labels": 15,
+  "id2label": {
+    "0": "CULTURE",
+    "1": "DEBATS_ET_OPINIONS",
+    "2": "ECONOMIE",
+    "3": "EDUCATION",
+    "4": "FAIT_DIVERS",
+    "5": "INTERNATIONAL",
+    "6": "LIFESTYLE",
+    "7": "NUMERIQUE",
+    "8": "POLITIQUE",
+    "9": "RELIGION",
+    "10": "SANTE",
+    "11": "SCIENCE_ET_ENVIRONNEMENT",
+    "12": "SOCIETE",
+    "13": "SPORT",
+    "14": "INDEFINI"
+  },
+  "label2id": {
+    "CULTURE": 0,
+    "DEBATS_ET_OPINIONS": 1,
+    "ECONOMIE": 2,
+    "EDUCATION": 3,
+    "FAIT_DIVERS": 4,
+    "INTERNATIONAL": 5,
+    "LIFESTYLE": 6,
+    "NUMERIQUE": 7,
+    "POLITIQUE": 8,
+    "RELIGION": 9,
+    "SANTE": 10,
+    "SCIENCE_ET_ENVIRONNEMENT": 11,
+    "SOCIETE": 12,
+    "SPORT": 13,
+    "INDEFINI": 14
+  }
+}
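A quick sanity check for the label maps in this config. This is a minimal sketch, assuming the file is read with Python's `json` module; `load_label_maps` is a hypothetical helper. Because JSON object keys are always strings, the `id2label` keys arrive as `"0"` to `"14"` and are converted to `int` here, mirroring what transformers does when it loads a config.

```python
import json
from typing import Any, Dict, Tuple


def load_label_maps(config: Dict[str, Any]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Hypothetical helper: normalize and cross-check id2label / label2id."""
    # JSON keys are strings; convert them to int so they match logits column indices
    id2label = {int(k): v for k, v in config["id2label"].items()}
    label2id = {k: int(v) for k, v in config["label2id"].items()}
    # The two maps must be exact inverses of each other
    if {v: k for k, v in id2label.items()} != label2id:
        raise ValueError("id2label and label2id are not inverses")
    # Ids should be contiguous 0..num_labels-1 so each one names a logits column
    n = config.get("num_labels", len(id2label))
    if sorted(id2label) != list(range(n)):
        raise ValueError(f"expected ids 0..{n - 1}, got {sorted(id2label)}")
    return id2label, label2id


if __name__ == "__main__":
    # Abbreviated stand-in for config.json (first three labels only)
    sample = json.loads(
        '{"num_labels": 3,'
        ' "id2label": {"0": "CULTURE", "1": "DEBATS_ET_OPINIONS", "2": "ECONOMIE"},'
        ' "label2id": {"CULTURE": 0, "DEBATS_ET_OPINIONS": 1, "ECONOMIE": 2}}'
    )
    id2label, label2id = load_label_maps(sample)
    print(id2label[2], label2id["CULTURE"])
```

Applied to the full `config.json` in this commit, both maps have 15 entries and the check passes: ids 0 to 14 are contiguous and the two maps are exact inverses.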

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d02f23076b19327716955de2f53867228cbb967551fa0005eb5d365a87284af9
+size 553806078
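`pytorch_model.bin` is stored as a Git LFS pointer, so the diff shows only its `oid` and `size`. The sketch below checks a downloaded copy against those two fields. `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helper names, and the local path `pytorch_model.bin` is an assumption about where the file was saved.

```python
import hashlib
import os
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Hypothetical helper: split each '<key> <value>' line of an LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": fields["size"]}


def verify_against_pointer(path: str, pointer: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Hypothetical helper: True if the file at `path` matches the pointer's size and sha256."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    # Cheap size check first, so a truncated download is rejected without hashing
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the whole file is never held in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer text as committed in this diff
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d02f23076b19327716955de2f53867228cbb967551fa0005eb5d365a87284af9
size 553806078
"""

if __name__ == "__main__":
    local_path = "pytorch_model.bin"  # assumed download location
    if os.path.exists(local_path):
        ok = verify_against_pointer(local_path, parse_lfs_pointer(POINTER))
        print("match" if ok else "MISMATCH")
    else:
        print(f"{local_path} not found; download it first")
```

Checking the size before hashing means a partial download of this roughly 554 MB file fails immediately instead of after a full sha256 pass.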