---
license: mit
task_categories:
  - text-classification
language:
  - fr
tags:
  - toxicity
  - safety
  - chain-of-thought
  - nlp
  - french-dataset
  - qlora
  - curriculum-learning
pretty_name: ToxiFrench
datasets:
  - Naela00/ToxiFrench
base_model:
  - Qwen/Qwen3-4B
---

# ToxiFrench: French Toxicity Detection

[![arXiv](https://img.shields.io/badge/arXiv-2508.11281-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2508.11281)
[![GitHub Pages](https://img.shields.io/badge/GitHub%20Pages-Deployed-brightgreen?style=flat-square&logo=github)](https://axeldlv00.github.io/ToxiFrench/)
[![Hugging Face Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-blue?style=flat-square&logo=huggingface)](https://huggingface.co/datasets/AxelDlv00/ToxiFrench)
[![GitHub Repository](https://img.shields.io/badge/GitHub-Repo-181717?style=flat-square&logo=github)](https://github.com/AxelDlv00/ToxiFrench)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./LICENSE)

**Author:** Axel Delaval
**Affiliations:** École Polytechnique & Shanghai Jiao Tong University (SJTU)
**Email:** [name].[surname]@gmail.com

---

> ⚠️ **Content Warning**: This model is trained on toxic data. It generates reasoning steps explaining why a text is toxic, which may include offensive language.

---

## Key Contributions

* **ToxiFrench Dataset**: A benchmark of 53,622 French comments with CoT annotations.
* **Dynamic Weighted Loss (DWL)**: A novel fine-tuning strategy that synchronizes reasoning steps with the final classification.
* **Optimizer Efficiency**: Use of the **SOAP** optimizer to improve convergence over standard AdamW.
* **Preference Alignment**: DPO-tuned versions for enhanced reasoning stability.

---

## Model Architecture & Adapters

This repository contains multiple **QLoRA adapters** based on the `Qwen/Qwen3-4B` architecture. Each folder corresponds to a specific training configuration.

### Available Adapters (Subfolders)

| Adapter Name | Type | Optimizer | Methodology |
| :--- | :--- | :--- | :--- |
| `Standard-SFT` | SFT | AdamW | Standard CoT Fine-Tuning |
| `SOAP-SFT` | SFT | **SOAP** | Advanced convergence training |
| `SOAP-Oversampled` | SFT | SOAP | Oversampled for class balance |
| `SOAP-DWL` | SFT | SOAP | **DWL** for reasoning faithfulness |
| `SOAP-DWL-DPO` | SFT + **DPO** | SOAP | Aligned for preference & safety |

---

## How to Use

### 1. Requirements

```bash
conda env create -f environment.yml
conda activate ToxiFrench
```
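Before loading a model, you can check programmatically which adapter subfolders are currently published. Below is a minimal sketch using the standard `huggingface_hub` helper `list_repo_files`; the repo id is the one used in the loading example that follows:

```python
from huggingface_hub import list_repo_files

# Each top-level directory of the adapter repository holds one QLoRA
# adapter (see the table above); collect the directory names.
files = list_repo_files("AxelDlv00/ToxiFrench")
adapters = sorted({path.split("/")[0] for path in files if "/" in path})
print(adapters)
```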
### 2. Loading the Model (Inference)

To use one of the models, load the base `Qwen3-4B` model in 4-bit and then apply the adapter by specifying the desired `subfolder`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
target_adapter = "SOAP-DWL-DPO"  # any adapter subfolder from the table above

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# The two delimiter strings used during fine-tuning are not reproduced on
# this card; substitute the adapter's actual special tokens here.
tokens = ["", ""]
tokenizer.add_special_tokens({"additional_special_tokens": tokens})

# 4-bit NF4 quantization (QLoRA setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Resize the embedding matrix if the added special tokens grew the vocabulary.
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)
if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Apply the QLoRA adapter from the chosen subfolder.
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

---

## Citation

```bibtex
@misc{delaval2025toxifrench,
  title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
  author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
  year={2025},
  eprint={2508.11281},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
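---

## Appendix: Parsing the Generated Analysis

The generated text interleaves the chain-of-thought with the final verdict. Below is a minimal, heuristic sketch for pulling a binary label out of the decoded output, continuing from the `outputs` and `tokenizer` of the loading example above. The French keywords `toxique` / `non toxique` are an assumption about the output wording, not a documented format; adapt them to what your adapter actually emits.

```python
import re

def extract_verdict(decoded: str) -> str:
    """Heuristically extract the final label from the generated analysis.

    ASSUMPTION: the verdict is stated with the keywords matched below;
    this is not the documented output format of the adapters.
    """
    # Keep only the text generated after the "Analyse:" header of the prompt.
    analysis = decoded.split("Analyse:")[-1]
    # Match "non toxique" before "toxique" so the negated form wins.
    match = re.search(r"non[- ]?toxique|toxique", analysis, re.IGNORECASE)
    return match.group(0).lower() if match else "unknown"

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Predicted label: {extract_verdict(decoded)}")
```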