---
model_name: FrenchTextCategorizer
language: French
tags:
- text-classification
- fine-tuned
- french
license: mit
dataset: "French News Dataset"
---
# 📝 Usage

This model is a fine-tuned version of **FlauBERT** that classifies French texts into the following categories:

> **CULTURE**, **DEBATS_ET_OPINIONS**, **ECONOMIE**, **EDUCATION**, **FAIT_DIVERS**, **INTERNATIONAL**, **LIFESTYLE**, **NUMERIQUE**, **POLITIQUE**, **RELIGION**, **SANTE**, **SCIENCE_ET_ENVIRONNEMENT**, **SOCIETE**, **SPORT**, **INDEFINI**

---
## 🚀 Quick Start

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()
```

---
## 🔎 Full Example (with Tokenizer, Prediction and Probabilities)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
model.eval()

# Input text
text = "Ce film est un chef-d'œuvre incroyable, tout était parfait."

# Tokenize
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
# Drop token_type_ids if the tokenizer returned them
inputs.pop("token_type_ids", None)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = F.softmax(logits, dim=-1)
    predicted_class_idx = torch.argmax(probs, dim=-1).item()

# Decode predicted class from config (transformers loads id2label with integer keys)
predicted_class = model.config.id2label[predicted_class_idx]
prob_percentages = [round(p.item() * 100, 2) for p in probs[0]]

# Output
print(f"Text: {text}")
print(f"Predicted class: {predicted_class}")
print(f"Probabilities (%): {prob_percentages}")
```
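
The percentage list printed by the example is easier to read when each value is paired with its label. Here is a minimal sketch in plain Python, assuming `id2label` maps integer indices to category names as in the model config. The helper name `rank_labels`, the three-label mapping, and the sample percentages are hypothetical and chosen only for illustration.

```python
def rank_labels(prob_percentages, id2label, top_k=3):
    """Pair each probability with its label and return the top_k, highest first."""
    pairs = [(id2label[i], p) for i, p in enumerate(prob_percentages)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:top_k]


# Hypothetical three-class mapping and made-up percentages, not real model output.
example_id2label = {0: "CULTURE", 1: "SPORT", 2: "POLITIQUE"}
example_percentages = [71.5, 3.25, 25.25]

ranked = rank_labels(example_percentages, example_id2label, top_k=2)
print(ranked)
```

With the real model, calling `rank_labels(prob_percentages, model.config.id2label)` should list the three most likely categories for the input text.
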

---

# 📋 Notes

- `model.config.id2label` is loaded automatically from the model's configuration (`config.json`).
- To process several texts at once, pass a list of strings to the tokenizer. The logits then contain one row per text, so take the argmax per row rather than calling `.item()` on the whole result.
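
When a list of texts is passed to the tokenizer, the softmax output has one row per text. The following is a minimal sketch of decoding such rows in plain Python, assuming the probabilities have already been converted to nested lists (for example with `probs.tolist()`). The helper name `decode_batch` and the sample values are hypothetical.

```python
def decode_batch(prob_rows, id2label):
    """Return (label, probability) for the most likely class of each row."""
    results = []
    for row in prob_rows:
        # Index of the largest probability in this row (per-text argmax).
        best_idx = max(range(len(row)), key=row.__getitem__)
        results.append((id2label[best_idx], row[best_idx]))
    return results


# Hypothetical mapping and made-up probabilities for two texts, not real model output.
example_id2label = {0: "CULTURE", 1: "SPORT", 2: "POLITIQUE"}
example_rows = [
    [0.7, 0.1, 0.2],
    [0.05, 0.9, 0.05],
]

decoded = decode_batch(example_rows, example_id2label)
print(decoded)
```
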

---

# ✅ Ready for Inference!