---
license: mit
task_categories:
- text-classification
language:
- fr
tags:
- toxicity
- safety
- chain-of-thought
- nlp
- french-dataset
- qlora
- curriculum-learning
pretty_name: ToxiFrench
datasets:
- Naela00/ToxiFrench
base_model:
- Qwen/Qwen3-4B
---
# ToxiFrench: French Toxicity Detection
[![arXiv](https://img.shields.io/badge/arXiv-2508.11281-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2508.11281)
[![GitHub Pages](https://img.shields.io/badge/GitHub%20Pages-Deployed-brightgreen?style=flat-square&logo=github)](https://axeldlv00.github.io/ToxiFrench/)
[![Hugging Face Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-blue?style=flat-square&logo=huggingface)](https://huggingface.co/datasets/AxelDlv00/ToxiFrench)
[![GitHub Repository](https://img.shields.io/badge/GitHub-Repo-181717?style=flat-square&logo=github)](https://github.com/AxelDlv00/ToxiFrench)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./LICENSE)
**Author:** Axel Delaval
**Affiliations:** École Polytechnique & Shanghai Jiao Tong University (SJTU)
**Email:** [name].[surname]@gmail.com
---
> ⚠️ **Content Warning**: This model is trained on toxic data. It will generate reasoning steps explaining why a text is toxic, which may include offensive language.
---
## Key Contributions
* **ToxiFrench Dataset**: A benchmark of 53,622 French comments with CoT annotations.
* **Dynamic Weighted Loss (DWL)**: A novel fine-tuning strategy that synchronizes reasoning steps with the final classification.
* **Optimizer Efficiency**: Utilization of the **SOAP** optimizer to improve convergence over standard AdamW.
* **Preference Alignment**: DPO-tuned versions for enhanced reasoning stability.
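The core idea of DWL is to combine the loss on the intermediate reasoning tokens with the loss on the final classification under a coefficient that evolves during training. The sketch below is a hypothetical illustration of that idea (a simple convex blend with a linear schedule), not the exact formulation from the paper:

```python
def dynamic_weighted_loss(reasoning_losses, classification_loss, step, total_steps):
    """Blend per-step CoT losses with the final classification loss.

    The coefficient alpha shifts emphasis from the reasoning steps early in
    training toward the final label late in training. The linear schedule
    here is an illustrative assumption, not the paper's exact rule.
    """
    alpha = 1.0 - step / total_steps          # weight on the reasoning term
    mean_reasoning = sum(reasoning_losses) / len(reasoning_losses)
    return alpha * mean_reasoning + (1.0 - alpha) * classification_loss

# Early in training the reasoning term dominates; at the end, the label term does.
early = dynamic_weighted_loss([1.0, 3.0], 0.5, step=0, total_steps=100)    # 2.0
late = dynamic_weighted_loss([1.0, 3.0], 0.5, step=100, total_steps=100)   # 0.5
```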
---
## Model Architecture & Adapters
This repository contains multiple **QLoRA adapters** based on the `Qwen/Qwen3-4B` architecture. Each folder corresponds to a specific training configuration.
### Available Adapters (Subfolders)
| Adapter Name | Type | Optimizer | Methodology |
| :--- | :--- | :--- | :--- |
| `Standard-SFT` | SFT | AdamW | Standard CoT Fine-Tuning |
| `SOAP-SFT` | SFT | **SOAP** | Advanced convergence training |
| `SOAP-Oversampled` | SFT | SOAP | Oversampled for class balance |
| `SOAP-DWL` | SFT | SOAP | **DWL** for reasoning faithfulness |
| `SOAP-DWL-DPO` | SFT + **DPO** | SOAP | Aligned for preference & safety |
---
## How to Use
### 1. Requirements
```bash
conda env create -f environment.yml
conda activate ToxiFrench
```
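If conda is not available, the core runtime dependencies can likely be installed with pip. The package list below is inferred from the imports in the inference snippet, not from `environment.yml`, so pinned versions there take precedence:

```shell
# Packages inferred from the inference code; environment.yml remains the reference
pip install torch transformers peft bitsandbytes accelerate
```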
### 2. Loading the Model (Inference)
To use one of the fine-tuned variants, load the base `Qwen3-4B` model in 4-bit and apply the desired QLoRA adapter by specifying its `subfolder`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
target_adapter = "SOAP-DWL-DPO"

# Load the tokenizer and register the CoT delimiter tokens
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens": ["<think>", "</think>"]})

# 4-bit NF4 quantization for QLoRA inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Resize the embedding matrix if the added tokens grew the vocabulary
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)
if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Apply the chosen QLoRA adapter from its subfolder
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
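Because the decode above keeps special tokens, the chain-of-thought and the final verdict can be separated afterwards. A minimal parsing sketch, assuming the reasoning is wrapped in `<think>…</think>` and the conclusion follows it (the exact output format may vary by adapter):

```python
import re

def parse_analysis(generated: str):
    """Split a generation into (reasoning, verdict).

    Assumes the CoT is enclosed in <think>...</think> tags; anything after
    the closing tag is treated as the final verdict.
    """
    m = re.search(r"<think>(.*?)</think>", generated, re.DOTALL)
    if m is None:
        return "", generated.strip()
    return m.group(1).strip(), generated[m.end():].strip()

example = "<think>Insulte directe ('idiot').</think>\nConclusion : toxique"
reasoning, verdict = parse_analysis(example)
```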
---
## Citation
```bibtex
@misc{delaval2025toxifrench,
title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
year={2025},
eprint={2508.11281},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```