# QwenTox

## Model Summary
QwenTox is a parameter-efficient multi-label toxic comment classification
model built upon Qwen/Qwen3-0.6B-Base.
By integrating LoRA adapters with a lightweight multi-label classification head,
the model is specifically designed to address the severe class imbalance problem
commonly observed in toxic comment detection tasks.
The model supports six toxicity categories and emphasizes reproducibility, computational efficiency, and multilingual generalization.
## Model Details

### Task Description
- Task type: Multi-label text classification
- Domain: Toxic / abusive language detection
- Input: User-generated text (comments)
- Output: A 6-dimensional binary label vector
Each comment may belong to multiple toxicity categories or none.
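For illustration, the 6-dimensional output can be thresholded into a binary label vector like this (the 0.5 threshold and the probability values are illustrative assumptions, not tuned settings):

```python
import torch

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical per-label sigmoid probabilities for a single comment
probs = torch.tensor([0.92, 0.08, 0.71, 0.02, 0.64, 0.05])

# Threshold at 0.5 (assumed; per-label thresholds can be tuned)
binary = (probs >= 0.5).int().tolist()
predicted = [label for label, on in zip(LABELS, binary) if on]

print(binary)     # [1, 0, 1, 0, 1, 0]
print(predicted)  # ['toxic', 'obscene', 'insult']
```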
### Supported Labels

| Label | Description |
|---|---|
| `toxic` | Toxic |
| `severe_toxic` | Severely toxic |
| `obscene` | Obscene |
| `threat` | Threat |
| `insult` | Insult |
| `identity_hate` | Identity hate |
### Model Architecture
- Backbone: Qwen/Qwen3-0.6B-Base (Decoder-only Transformer)
- Adaptation: LoRA (Low-Rank Adaptation)
- Classifier: Lightweight linear multi-label classification head
- Activation: Sigmoid (per-label probability)
Only the LoRA adapters and the classification head are trainable; all backbone parameters remain frozen.
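A minimal sketch of such a head is shown below; the mean-pooling strategy and the 1024-dimensional hidden size are assumptions for illustration, not details stated by the card:

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Lightweight linear head producing per-label probabilities (sketch)."""

    def __init__(self, hidden_size: int, num_labels: int = 6):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool over tokens, then project to independent label logits.
        pooled = hidden_states.mean(dim=1)         # (batch, hidden)
        return torch.sigmoid(self.linear(pooled))  # (batch, num_labels)

head = MultiLabelHead(hidden_size=1024)
dummy = torch.randn(2, 16, 1024)  # (batch, seq_len, hidden)
probs = head(dummy)
print(probs.shape)  # torch.Size([2, 6])
```

Because each label gets its own sigmoid rather than a shared softmax, the labels are predicted independently, which is what allows a comment to carry several toxicity categories at once.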
### Model Sources

- Base model: https://huggingface.co/Qwen/Qwen3-0.6B-Base
- Training dataset: Jigsaw Toxic Comment Classification
## Getting Started

```python
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Load the frozen backbone
base_model = AutoModel.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    trust_remote_code=True,
)

# Load the LoRA adapters
model = PeftModel.from_pretrained(base_model, "yingfeng64/QwenTox")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    trust_remote_code=True,
)

# Attach the multi-label classification head and load its weights.
# AutoModel does not create a classifier on its own, so a linear head
# (hidden_size -> 6 labels) is attached first; adjust this if the saved
# head in classifier_head.pt has a different shape.
model.classifier = torch.nn.Linear(model.config.hidden_size, 6)
state_dict = torch.load("classifier_head.pt", map_location="cpu")
model.classifier.load_state_dict(state_dict)
model.eval()
```
## Training Details

### Training Data

- Dataset: Jigsaw Toxic Comment Classification
- Label setting: multi-label
- Class distribution: highly imbalanced
  - Non-toxic : toxic ≈ 9 : 1
  - Rare categories include `threat`, `severe_toxic`, and `identity_hate`

To mitigate data scarcity, translation-based data augmentation was applied to the low-frequency categories.
### Training Strategy

- Trainable: LoRA adapters + classification head
- Frozen: Qwen3 backbone parameters
- Loss function: Focal Loss
- Training framework: PEFT (Parameter-Efficient Fine-Tuning)
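The card does not give the focal-loss settings, so the sketch below uses illustrative `gamma` and `alpha` defaults; it applies binary focal loss independently per label, which is the standard multi-label formulation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Per-label binary focal loss (sketch; gamma/alpha are assumed values)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true class
    # Down-weight easy examples via (1 - p_t)^gamma; alpha balances classes.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([[2.0, -1.0, 0.5, -3.0, 1.0, -2.0]])
targets = torch.tensor([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0]])
loss = focal_loss(logits, targets)
print(float(loss))
```

The `(1 - p_t) ** gamma` factor shrinks the contribution of confidently-correct (mostly non-toxic) examples, which is why focal loss is a common choice under the 9 : 1 imbalance described above.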
### Hyperparameters
| Hyperparameter | Value |
|---|---|
| Precision | FP16 (mixed precision) |
| Optimizer | AdamW |
| Learning rate | 5e-5 |
| Epochs | 3 |
| Max sequence length | 384 |
| LoRA rank (r) | 128 |
| LoRA alpha | 256 |
| LoRA dropout | 0.3 |
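The LoRA rows of the table map onto a PEFT `LoraConfig` roughly as follows; `target_modules` is an assumption (typical attention projections for Qwen-style models), since the card does not list the adapted modules:

```python
from peft import LoraConfig

# r, lora_alpha, and lora_dropout come from the hyperparameter table;
# target_modules is an assumed, typical choice for Qwen-style backbones.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.3,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
)
print(lora_config.r, lora_config.lora_alpha)
```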
## Evaluation

### Evaluation Datasets
- In-domain: Jigsaw Toxic Comment Test Set
- Out-of-domain: Jigsaw Multilingual Toxic Comment Test Set (binary classification)
### Evaluation Metrics
- Subset Accuracy
- Hamming Loss
- Macro-F1
- Macro-AUC
These metrics jointly evaluate both overall prediction consistency and performance on minority toxicity categories.
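As an illustration, all four metrics can be computed with scikit-learn on a toy batch; the labels and probabilities below are made up for the example:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, f1_score, roc_auc_score

# Toy ground truth and predicted probabilities for 3 comments x 6 labels
y_true = np.array([[1, 0, 1, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0],
                   [1, 1, 1, 1, 1, 1]])
y_prob = np.array([[0.9, 0.2, 0.8, 0.1, 0.7, 0.3],
                   [0.1, 0.1, 0.2, 0.1, 0.1, 0.1],
                   [0.8, 0.6, 0.9, 0.4, 0.8, 0.7]])
y_pred = (y_prob >= 0.5).astype(int)

subset_acc = accuracy_score(y_true, y_pred)  # exact-match ratio over rows
ham = hamming_loss(y_true, y_pred)           # fraction of wrong label cells
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
macro_auc = roc_auc_score(y_true, y_prob, average="macro")

print(subset_acc, ham, macro_f1, macro_auc)
```

Subset accuracy is the strictest of the four (all six labels must match), while Macro-F1 and Macro-AUC weight every category equally, which is what surfaces performance on the rare labels.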
## Intended Use
- Academic research on toxic / abusive language detection
- Experiments on parameter-efficient fine-tuning (LoRA, PEFT)
- Multilingual and cross-domain generalization analysis
## Limitations
- Trained primarily on English data; multilingual performance depends on semantic transfer from the backbone model.
- Rare toxicity categories remain challenging despite data augmentation.
- Not designed for real-time moderation without further calibration.
## License
This model is released under the GNU General Public License v3.0 (GPL-3.0).
Under this license:
- You are free to use, modify, and redistribute this model.
- Any derivative work or fine-tuned version based on this model must also be released under GPL-3.0.
- Source code and model modifications must be made publicly available.
This ensures that improvements and downstream adaptations of the model remain open and accessible to the research community.