	model_name: FrenchTextCategorizer
	language: French
	tags:
	- text-classification
	- fine-tuned
	- french
	license: mit
	dataset: "French News Dataset"


	# 📝 Usage

	This model is a fine-tuned version of FlauBERT that classifies French texts into one of the following categories:

	> CULTURE, DEBATS_ET_OPINIONS, ECONOMIE, EDUCATION, FAIT_DIVERS, INTERNATIONAL, LIFESTYLE, NUMERIQUE, POLITIQUE, RELIGION, SANTE, SCIENCE_ET_ENVIRONNEMENT, SOCIETE, SPORT, INDEFINI

	---

	## 🚀 Quick Start

	```python
	from transformers import AutoModelForSequenceClassification

	model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
	model.eval()
	```

	---

	## 🔎 Full Example (with Tokenizer, Prediction and Probabilities)

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch
	import torch.nn.functional as F

	# Load model and tokenizer
	model = AutoModelForSequenceClassification.from_pretrained("juenp/FrenchTextCategorizer")
	tokenizer = AutoTokenizer.from_pretrained("juenp/FrenchTextCategorizer")
	model.eval()

	# Input text
	text = "Ce film est un chef-d'œuvre incroyable, tout était parfait."

	# Tokenize
	inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
	inputs.pop("token_type_ids", None)

	# Predict
	with torch.no_grad():
	    outputs = model(**inputs)

	logits = outputs.logits
	probs = F.softmax(logits, dim=-1)
	predicted_class_idx = torch.argmax(probs, dim=-1).item()

	# Decode predicted class from config (id2label keys are integers once the config is loaded)
	predicted_class = model.config.id2label[predicted_class_idx]
	prob_percentages = [round(p.item() * 100, 2) for p in probs[0]]

	# Output
	print(f"Texte : {text}")
	print(f"Classe prédite : {predicted_class}")
	print(f"Probabilités (%) : {prob_percentages}")
	```
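
The example above prints the probabilities as a bare list in class-index order, which is hard to read with 15 categories. Below is a minimal sketch of pairing each probability with its label and keeping only the most likely classes. `top_k_labels` is a hypothetical helper (not part of `transformers`), and the 4-class mapping and probabilities are made-up illustrative values, not real model output.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_labels(
    probs: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Pair each probability with its label and return the k most likely classes."""
    labelled = [(id2label[i], round(p * 100, 2)) for i, p in enumerate(probs)]
    # Highest percentage first; sorted() is stable, so ties keep class-index order.
    return sorted(labelled, key=lambda pair: pair[1], reverse=True)[:k]


# Illustrative values only: a toy mapping, not the model's real config.
toy_id2label = {0: "CULTURE", 1: "ECONOMIE", 2: "POLITIQUE", 3: "SPORT"}
toy_probs = [0.70, 0.05, 0.20, 0.05]

for label, pct in top_k_labels(toy_probs, toy_id2label, k=2):
    print(f"{label}: {pct}%")
```

With the objects from the full example, the call would presumably look like `top_k_labels(probs[0].tolist(), model.config.id2label)`.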

	---

	# 📋 Notes

	- `model.config.id2label` is automatically loaded from the model's configuration (`config.json`).
	- If you want to process multiple texts at once, simply pass a list of texts to the tokenizer.
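
Passing a list to the tokenizer works well for a handful of texts; for larger collections it may be safer to feed the model in fixed-size chunks so memory use stays bounded. The sketch below is one way to do that, under stated assumptions: `chunks`, `make_predictor`, and `classify_all` are hypothetical helper names, the default `batch_size` of 16 is arbitrary, and `model` / `tokenizer` are expected to be the objects loaded in the full example.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def make_predictor(model, tokenizer) -> Callable[[List[str]], List[str]]:
    """Build a function that maps a list of texts to predicted label names."""
    import torch  # imported lazily so chunks() stays usable without torch

    def predict_batch(batch: List[str]) -> List[str]:
        # padding=True pads every text to the longest one in this batch
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        inputs.pop("token_type_ids", None)
        with torch.no_grad():
            logits = model(**inputs).logits
        # One argmax per row; id2label keys are integers once the config is loaded
        return [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]

    return predict_batch


def classify_all(
    texts: Sequence[str],
    predict_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Run predict_batch over texts chunk by chunk and keep the input order."""
    labels: List[str] = []
    for batch in chunks(texts, batch_size):
        labels.extend(predict_batch(batch))
    return labels
```

A typical call would be `classify_all(texts, make_predictor(model, tokenizer))`. Keeping the prediction step behind a plain callable also makes the chunking logic easy to check without downloading the model.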

	---

	# ✅ Ready for Inference!