---
license: mit
task_categories:
  - text-classification
language:
  - fr
tags:
  - toxicity
  - safety
  - chain-of-thought
  - nlp
  - french-dataset
  - qlora
  - curriculum-learning
pretty_name: ToxiFrench
datasets:
  - Naela00/ToxiFrench
base_model:
  - Qwen/Qwen3-4B
---

# ToxiFrench: French Toxicity Detection

[![arXiv](https://img.shields.io/badge/arXiv-2508.11281-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2508.11281)
[![GitHub Pages](https://img.shields.io/badge/GitHub%20Pages-Deployed-brightgreen?style=flat-square&logo=github)](https://axeldlv00.github.io/ToxiFrench/)
[![Hugging Face Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-blue?style=flat-square&logo=huggingface)](https://huggingface.co/datasets/AxelDlv00/ToxiFrench)
[![GitHub Repository](https://img.shields.io/badge/GitHub-Repo-181717?style=flat-square&logo=github)](https://github.com/AxelDlv00/ToxiFrench)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./LICENSE)

**Author:** Axel Delaval
**Affiliations:** École Polytechnique & Shanghai Jiao Tong University (SJTU)
**Email:** [name].[surname]@gmail.com

---

> ⚠️ **Content Warning**: This model is trained on toxic data. It generates reasoning steps explaining why a text is toxic, which may include offensive language.

---

## Key Contributions

* **ToxiFrench Dataset**: A benchmark of 53,622 French comments with CoT annotations.
* **Dynamic Weighted Loss (DWL)**: A novel fine-tuning strategy that synchronizes reasoning steps with the final classification.
* **Optimizer Efficiency**: Use of the **SOAP** optimizer to improve convergence over standard AdamW.
* **Preference Alignment**: DPO-tuned versions for enhanced reasoning stability.

---

## Model Architecture & Adapters

This repository contains multiple **QLoRA adapters** based on the `Qwen/Qwen3-4B` architecture. Each folder corresponds to a specific training configuration.

### Available Adapters (Subfolders)

| Adapter Name | Type | Optimizer | Methodology |
| :--- | :--- | :--- | :--- |
| `Standard-SFT` | SFT | AdamW | Standard CoT Fine-Tuning |
| `SOAP-SFT` | SFT | **SOAP** | Advanced convergence training |
| `SOAP-Oversampled` | SFT | SOAP | Oversampled for class balance |
| `SOAP-DWL` | SFT | SOAP | **DWL** for reasoning faithfulness |
| `SOAP-DWL-DPO` | SFT + **DPO** | SOAP | Aligned for preference & safety |

---

## How to Use

### 1. Requirements

```bash
conda env create -f environment.yml
conda activate ToxiFrench
```
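Before loading a model, you can check programmatically which adapter subfolders are currently published. Below is a minimal sketch using the standard `huggingface_hub` helper `list_repo_files`; the repo id is the one used in the loading example that follows:

```python
from huggingface_hub import list_repo_files

# Each top-level directory of the adapter repository holds one QLoRA
# adapter (see the table above); collect the directory names.
files = list_repo_files("AxelDlv00/ToxiFrench")
adapters = sorted({path.split("/")[0] for path in files if "/" in path})
print(adapters)
```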
### 2. Loading the Model (Inference)

To use one of the models, load the base `Qwen3-4B` model in 4-bit and then apply the adapter by specifying the desired `subfolder`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
target_adapter = "SOAP-DWL-DPO"  # any adapter subfolder from the table above

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# The two delimiter strings used during fine-tuning are not reproduced on
# this card; substitute the adapter's actual special tokens here.
tokens = ["", ""]
tokenizer.add_special_tokens({"additional_special_tokens": tokens})

# 4-bit NF4 quantization (QLoRA setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Resize the embedding matrix if the added special tokens grew the vocabulary.
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)
if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Apply the QLoRA adapter from the chosen subfolder.
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

---

## Citation

```bibtex
@misc{delaval2025toxifrench,
  title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
  author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
  year={2025},
  eprint={2508.11281},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
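---

## Appendix: Parsing the Generated Analysis

The generated text interleaves the chain-of-thought with the final verdict. Below is a minimal, heuristic sketch for pulling a binary label out of the decoded output, continuing from the `outputs` and `tokenizer` of the loading example above. The French keywords `toxique` / `non toxique` are an assumption about the output wording, not a documented format; adapt them to what your adapter actually emits.

```python
import re

def extract_verdict(decoded: str) -> str:
    """Heuristically extract the final label from the generated analysis.

    ASSUMPTION: the verdict is stated with the keywords matched below;
    this is not the documented output format of the adapters.
    """
    # Keep only the text generated after the "Analyse:" header of the prompt.
    analysis = decoded.split("Analyse:")[-1]
    # Match "non toxique" before "toxique" so the negated form wins.
    match = re.search(r"non[- ]?toxique|toxique", analysis, re.IGNORECASE)
    return match.group(0).lower() if match else "unknown"

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Predicted label: {extract_verdict(decoded)}")
```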