---
language:
- fr
- en
license: apache-2.0
library_name: transformers
tags:
- text-classification
- toxicity
- moderation
- multilingual
- twitch
- chat-moderation
datasets:
- custom
pipeline_tag: text-classification
metrics:
- f1
widget:
- text: "GG bien joue le stream !"
  example_title: "Normal (FR)"
- text: "Tu es un connard"
  example_title: "Insult (FR)"
- text: "ntm fdp"
  example_title: "Obfuscated insult (FR)"
- text: "Ca tue ce jeu"
  example_title: "Figurative gaming (FR)"
- text: "Les arabes dehors"
  example_title: "Hate (FR)"
- text: "Great stream, keep it up!"
  example_title: "Normal (EN)"
- text: "Kill yourself"
  example_title: "Threat (EN)"
- text: "k y s noob"
  example_title: "Obfuscated threat (EN)"
model-index:
- name: egide-toxicity-model
  results:
  - task:
      type: text-classification
      name: Multi-label Toxicity Classification
    metrics:
    - name: F1 Micro
      type: f1
      value: 0.970
    - name: F1 Macro
      type: f1
      value: 0.969
---

# Egide Toxicity Model

Multilingual (French/English) toxicity detection model designed for Twitch chat moderation.

## Description

This model was trained for multi-label classification of toxic content. It was built specifically for [Egide](https://github.com/Loule95450/Egide), an AI-powered Twitch moderation bot.

The model detects **6 toxicity categories** with no hard-coded rules: everything relies on model inference.

## Categories

| Label | Description |
|---|---|
| `toxicity` | General toxic content |
| `insult` | Direct or indirect insults |
| `hate` | Hate speech (racism, xenophobia) |
| `sexual` | Sexist or sexually explicit content |
| `threat` | Threats of violence |
| `identity_attack` | Identity-based attacks (homophobia, transphobia, etc.) |

## Performance

Evaluated on a held-out test set of 243 examples (15% of the dataset):

| Category | F1 score |
|---|---|
| **toxicity** | 0.981 |
| **insult** | 0.974 |
| **hate** | 0.949 |
| **sexual** | 1.000 |
| **threat** | 0.966 |
| **identity_attack** | 0.945 |
| **F1 Micro** | **0.970** |
| **F1 Macro** | **0.969** |
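
The macro F1 above is simply the unweighted mean of the six per-label scores (micro F1 cannot be recomputed from this table alone, since it pools true/false positives across labels). A quick sanity check in plain Python:

```python
# Per-label F1 scores taken from the table above.
per_label_f1 = {
    "toxicity": 0.981,
    "insult": 0.974,
    "hate": 0.949,
    "sexual": 1.000,
    "threat": 0.966,
    "identity_attack": 0.945,
}

# Macro F1: unweighted mean over labels.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)
print(round(macro_f1, 3))  # -> 0.969
```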

## Strengths

- **Multilingual**: understands French and English natively
- **Obfuscated text**: detects disguised insults such as "ntm", "n t m", "fdp", "f.d.p", "c0nn4rd", "k y s", etc.
- **Gaming context**: does NOT flag figurative expressions common in gaming ("ca tue ce jeu", "je suis mort de rire", "this game is killing me")
- **Twitch slang**: trained on Twitch chat vocabulary (emotes, abbreviations, slang)
- **No hard-coded patterns**: no regexes, no banned-word lists, 100% model inference

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Loule/egide-toxicity-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["toxicity", "insult", "hate", "sexual", "threat", "identity_attack"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze().tolist()
    return {label: round(prob, 4) for label, prob in zip(LABELS, probs)}

# Examples
print(predict("Tu es un connard"))
# -> toxicity: 0.98, insult: 0.95, ...

print(predict("ntm fdp"))
# -> toxicity: 0.97, insult: 0.93, ...

print(predict("GG bien joue le stream !"))
# -> toxicity: 0.01, insult: 0.01, ... (NOT toxic)

print(predict("Ca tue ce jeu"))
# -> toxicity: 0.03, insult: 0.02, ... (not flagged as toxic)
```
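
To turn the per-label scores into a moderation decision, apply a threshold. The sketch below uses 0.5; that value is an assumption for illustration, not one documented by this model card, so tune it to your own moderation tolerance:

```python
# Hypothetical helper: list the labels whose score crosses a threshold.
# The 0.5 default is an assumption, not a value shipped with the model.
def triggered_labels(scores, threshold=0.5):
    return [label for label, prob in scores.items() if prob >= threshold]

# Example scores shaped like the predict() output above.
scores = {"toxicity": 0.98, "insult": 0.95, "hate": 0.02,
          "sexual": 0.01, "threat": 0.03, "identity_attack": 0.02}
print(triggered_labels(scores))  # -> ['toxicity', 'insult']
```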

## Usage with FastAPI (Egide AI Service)

```bash
cd apps/ai-service
pip install -r requirements.txt
python main.py  # Starts the service on port 8000
```

```bash
curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "ntm sale race"}'
```
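
The request/response contract implied by the curl example can be sketched end to end with only the standard library. Everything here is an assumption inferred from that example, not the real Egide service code: the response shape, the handler names, and the `dummy_predict()` stand-in (which returns zeros instead of calling the model):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LABELS = ["toxicity", "insult", "hate", "sexual", "threat", "identity_attack"]

def dummy_predict(text):
    # Stand-in for the real model call; always returns zero scores.
    return {label: 0.0 for label in LABELS}

class AnalyzeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept {"text": ...} and answer with per-label scores as JSON.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        text = json.loads(body)["text"]
        payload = json.dumps(dummy_predict(text)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Serve on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), AnalyzeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Same call the curl example makes, from Python.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/analyze",
    data=json.dumps({"text": "GG bien joue le stream !"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    scores = json.loads(resp.read())
print(sorted(scores) == sorted(LABELS))  # -> True
server.shutdown()
```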

## Training

- **Type**: multi-label classification
- **Loss**: BCEWithLogitsLoss
- **Epochs**: 10 (best model at epoch 7)
- **Batch size**: 8
- **Learning rate**: 2e-5
- **Warmup**: 10%
- **Dataset**: 539 curated examples x3 augmentation = 1617 examples
  - French insults (standard + obfuscated)
  - Hate speech (racism, xenophobia, antisemitism)
  - Sexism
  - Homophobia / transphobia
  - Threats (standard + obfuscated)
  - English insults (standard + obfuscated)
  - ~150 non-toxic examples (Twitch chat, figurative expressions, emotes)
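
BCEWithLogitsLoss treats each of the 6 labels as an independent binary problem: it applies a sigmoid to each logit and averages the binary cross-entropies. A pure-Python illustration (the logits and targets below are made up for the example, not training data):

```python
import math

# Not the actual training code: a numeric illustration of BCEWithLogitsLoss.
def bce_with_logits(logits, targets):
    losses = []
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)  # mean over labels

# Made-up logits for a message that is toxic and insulting but nothing else,
# ordered toxicity..identity_attack.
logits  = [3.0, 2.5, -4.0, -5.0, -3.5, -4.5]
targets = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(round(bce_with_logits(logits, targets), 4))  # -> 0.0322
```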

## Egide Project Architecture

```
Twitch Chat -> Node.js Bot (tmi.js) -> HTTP -> Python AI Service (FastAPI) -> Moderation
```

The Node.js bot sends each message to the Python service over HTTP. The service loads this model and returns the toxicity scores. No pattern is hard-coded.

## Limitations

- Trained primarily on French and English. Other languages may work, but with lower accuracy.
- The training dataset is relatively small (539 unique examples); adding more data could improve results.
- Novel obfuscation patterns unseen during training may evade detection.

## License

Apache 2.0

## Citation

```bibtex
@misc{egide-toxicity-model,
  author = {Loule},
  title = {Egide Toxicity Model - Multilingual Toxicity Detection for Twitch Chat},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Loule/egide-toxicity-model}
}
```