---
license: mit
task_categories:
- text-classification
language:
- fr
tags:
- toxicity
- safety
- chain-of-thought
- nlp
- french-dataset
- qlora
- curriculum-learning
pretty_name: ToxiFrench
datasets:
- Naela00/ToxiFrench
base_model:
- Qwen/Qwen3-4B
---

# ToxiFrench: French Toxicity Detection

[Paper (arXiv:2508.11281)](https://arxiv.org/abs/2508.11281) ·
[Project Page](https://axeldlv00.github.io/ToxiFrench/) ·
[Dataset](https://huggingface.co/datasets/AxelDlv00/ToxiFrench) ·
[Code](https://github.com/AxelDlv00/ToxiFrench) ·
[License](./LICENSE)

**Author:** Axel Delaval

**Affiliations:** École Polytechnique & Shanghai Jiao Tong University (SJTU)

**Email:** [name].[surname]@gmail.com

---

> ⚠️ **Content Warning**: This model is trained on toxic data. It generates reasoning steps explaining why a text is toxic, and these explanations may themselves contain offensive language.

---

## Key Contributions

* **ToxiFrench Dataset**: A benchmark of 53,622 French comments with chain-of-thought (CoT) annotations.
* **Dynamic Weighted Loss (DWL)**: A novel fine-tuning strategy that synchronizes the reasoning steps with the final classification (see the sketch after this list).
* **Optimizer Efficiency**: Use of the **SOAP** optimizer, which improves convergence over standard AdamW.
* **Preference Alignment**: DPO-tuned variants for more stable reasoning.
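
The paper defines the exact DWL formulation; the sketch below only illustrates the general idea and is not the paper's implementation. It assumes hypothetical `reasoning_mask` / `conclusion_mask` tensors marking which target tokens belong to the CoT versus the final verdict, and a schedule value `alpha` that shifts weight toward the classification as training progresses:

```python
import torch.nn.functional as F

def dynamic_weighted_loss(logits, labels, reasoning_mask, conclusion_mask, alpha):
    """Illustrative sketch only, not the paper's DWL. Blends the CoT-token
    loss with the verdict-token loss; `alpha` in [0, 1] grows over training
    so the final classification dominates the objective late in training."""
    # Per-token next-token cross-entropy (shift logits/labels by one).
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    per_token = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
    ).view(shift_labels.shape)

    # Average the loss separately over reasoning and conclusion positions.
    r_mask = reasoning_mask[:, 1:].float()
    c_mask = conclusion_mask[:, 1:].float()
    loss_reasoning = (per_token * r_mask).sum() / r_mask.sum().clamp(min=1.0)
    loss_conclusion = (per_token * c_mask).sum() / c_mask.sum().clamp(min=1.0)
    return (1.0 - alpha) * loss_reasoning + alpha * loss_conclusion
```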

---

## Model Architecture & Adapters

This repository contains multiple **QLoRA adapters** trained on top of the `Qwen/Qwen3-4B` base model. Each subfolder corresponds to a specific training configuration.

### Available Adapters (Subfolders)

| Adapter Name | Type | Optimizer | Methodology |
| :--- | :--- | :--- | :--- |
| `Standard-SFT` | SFT | AdamW | Standard CoT fine-tuning |
| `SOAP-SFT` | SFT | **SOAP** | SFT with SOAP for faster convergence |
| `SOAP-Oversampled` | SFT | SOAP | Oversampling for class balance |
| `SOAP-DWL` | SFT | SOAP | **DWL** for reasoning faithfulness |
| `SOAP-DWL-DPO` | SFT + **DPO** | SOAP | Aligned for preference & safety |

---

## How to Use

### 1. Requirements

```bash
conda env create -f environment.yml
conda activate ToxiFrench
```
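
If you are not using conda, the inference script below should only need the standard Hugging Face stack (this is an assumption; `environment.yml` remains the authoritative dependency list):

```bash
pip install torch transformers peft bitsandbytes accelerate
```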

### 2. Loading the Model (Inference)

To use one of the adapters, load the base `Qwen3-4B` model, then apply the chosen adapter by passing its folder name as `subfolder`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
target_adapter = "SOAP-DWL-DPO"  # any subfolder from the table above

# Load the tokenizer and make sure a padding token is defined.
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# The adapters were trained with explicit reasoning delimiters.
tokens = ["<think>", "</think>"]
tokenizer.add_special_tokens({"additional_special_tokens": tokens})

# 4-bit NF4 quantization (QLoRA-style) to fit the model on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Resize the embedding matrix if the added special tokens grew the vocabulary.
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)
if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Apply the chosen QLoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

# Example: a mildly insulting French comment ("I can't stand your behaviour
# any more, you really are an idiot!").
text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )

# Keep special tokens so the <think> ... </think> reasoning span stays visible.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
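
Since the adapters were trained with the `<think>` delimiters, the decoded output is expected to contain the chain of thought between `<think>` and `</think>`, followed by the verdict. The split below is a convenience sketch under that assumption; adjust it to the actual output format:

```python
completion = tokenizer.decode(outputs[0], skip_special_tokens=False)
if "</think>" in completion:
    reasoning, verdict = completion.split("</think>", maxsplit=1)
    print("Verdict:", verdict.strip())
```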

---

## Citation

```bibtex
@misc{delaval2025toxifrench,
      title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
      author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
      year={2025},
      eprint={2508.11281},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```