---
license: mit
task_categories:
- text-classification
language:
- fr
tags:
- toxicity
- safety
- chain-of-thought
- nlp
- french-dataset
- qlora
- curriculum-learning
pretty_name: ToxiFrench
datasets:
- Naela00/ToxiFrench
base_model:
- Qwen/Qwen3-4B
---
# ToxiFrench: French Toxicity Detection
[![arXiv](https://img.shields.io/badge/arXiv-2508.11281-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2508.11281)
[![GitHub Pages](https://img.shields.io/badge/GitHub%20Pages-Deployed-brightgreen?style=flat-square&logo=github)](https://axeldlv00.github.io/ToxiFrench/)
[![Hugging Face Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-blue?style=flat-square&logo=huggingface)](https://huggingface.co/datasets/AxelDlv00/ToxiFrench)
[![GitHub Repository](https://img.shields.io/badge/GitHub-Repo-181717?style=flat-square&logo=github)](https://github.com/AxelDlv00/ToxiFrench)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](./LICENSE)
**Author:** Axel Delaval
**Affiliations:** École Polytechnique & Shanghai Jiao Tong University (SJTU)
**Email:** [name].[surname]@gmail.com
---
> ⚠️ **Content Warning**: This model is trained on toxic data. It will generate reasoning steps explaining why a text is toxic, which may include offensive language.
---
## Key Contributions
* **ToxiFrench Dataset**: A benchmark of 53,622 French comments with CoT annotations.
* **Dynamic Weighted Loss (DWL)**: A novel fine-tuning strategy that synchronizes reasoning steps with the final classification.
* **Optimizer Efficiency**: Utilization of the **SOAP** optimizer to improve convergence over standard AdamW.
* **Preference Alignment**: DPO-tuned versions for enhanced reasoning stability.
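The core idea of DWL is to combine the loss on the intermediate reasoning tokens with the loss on the final classification under a coefficient that evolves during training. The sketch below is a hypothetical illustration of that idea (a simple convex blend with a linear schedule), not the exact formulation from the paper:

```python
def dynamic_weighted_loss(reasoning_losses, classification_loss, step, total_steps):
    """Blend per-step CoT losses with the final classification loss.

    The coefficient alpha shifts emphasis from the reasoning steps early in
    training toward the final label late in training. The linear schedule
    here is an illustrative assumption, not the paper's exact rule.
    """
    alpha = 1.0 - step / total_steps          # weight on the reasoning term
    mean_reasoning = sum(reasoning_losses) / len(reasoning_losses)
    return alpha * mean_reasoning + (1.0 - alpha) * classification_loss

# Early in training the reasoning term dominates; at the end, the label term does.
early = dynamic_weighted_loss([1.0, 3.0], 0.5, step=0, total_steps=100)    # 2.0
late = dynamic_weighted_loss([1.0, 3.0], 0.5, step=100, total_steps=100)   # 0.5
```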
---
## Model Architecture & Adapters
This repository contains multiple **QLoRA adapters** based on the `Qwen/Qwen3-4B` architecture. Each folder corresponds to a specific training configuration.
### Available Adapters (Subfolders)
| Adapter Name | Type | Optimizer | Methodology |
| :--- | :--- | :--- | :--- |
| `Standard-SFT` | SFT | AdamW | Standard CoT Fine-Tuning |
| `SOAP-SFT` | SFT | **SOAP** | Advanced convergence training |
| `SOAP-Oversampled` | SFT | SOAP | Oversampled for class balance |
| `SOAP-DWL` | SFT | SOAP | **DWL** for reasoning faithfulness |
| `SOAP-DWL-DPO` | SFT + **DPO** | SOAP | Aligned for preference & safety |
---
## How to Use
### 1. Requirements
```bash
conda env create -f environment.yml
conda activate ToxiFrench
```
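If conda is not available, the core runtime dependencies can likely be installed with pip. The package list below is inferred from the imports in the inference snippet, not from `environment.yml`, so pinned versions there take precedence:

```shell
# Packages inferred from the inference code; environment.yml remains the reference
pip install torch transformers peft bitsandbytes accelerate
```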
### 2. Loading the Model (Inference)
To use one of the fine-tuned variants, load the base `Qwen3-4B` model in 4-bit and apply the desired QLoRA adapter by specifying its `subfolder`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "Qwen/Qwen3-4B"
adapter_repo_id = "AxelDlv00/ToxiFrench"
target_adapter = "SOAP-DWL-DPO"

# Load the tokenizer and register the CoT delimiter tokens
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens": ["<think>", "</think>"]})

# 4-bit NF4 quantization for QLoRA inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)

# Resize the embedding matrix if the added tokens grew the vocabulary
tokenizer_vocab_size = len(tokenizer)
model_embedding_size = model.get_input_embeddings().weight.size(0)
if model_embedding_size != tokenizer_vocab_size:
    print(f"Syncing vocab: {model_embedding_size} -> {tokenizer_vocab_size}")
    model.resize_token_embeddings(tokenizer_vocab_size)

# Apply the chosen QLoRA adapter from its subfolder
model = PeftModel.from_pretrained(model, adapter_repo_id, subfolder=target_adapter)
model.eval()

text = "Je ne supporte plus ton comportement, tu es vraiment un idiot !"
prompt = f"Message:\n{text}\n\nAnalyse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
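Because the decode above keeps special tokens, the chain-of-thought and the final verdict can be separated afterwards. A minimal parsing sketch, assuming the reasoning is wrapped in `<think>…</think>` and the conclusion follows it (the exact output format may vary by adapter):

```python
import re

def parse_analysis(generated: str):
    """Split a generation into (reasoning, verdict).

    Assumes the CoT is enclosed in <think>...</think> tags; anything after
    the closing tag is treated as the final verdict.
    """
    m = re.search(r"<think>(.*?)</think>", generated, re.DOTALL)
    if m is None:
        return "", generated.strip()
    return m.group(1).strip(), generated[m.end():].strip()

example = "<think>Insulte directe ('idiot').</think>\nConclusion : toxique"
reasoning, verdict = parse_analysis(example)
```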
---
## Citation
```bibtex
@misc{delaval2025toxifrench,
title={ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection},
author={Axel Delaval and Shujian Yang and Haicheng Wang and Han Qiu and Jialiang Lu},
year={2025},
eprint={2508.11281},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```