# QwenTox

## Model Summary
QwenTox is a parameter-efficient multi-label toxic comment classification
model built upon Qwen/Qwen3-0.6B-Base.
By integrating LoRA adapters with a lightweight multi-label classification head,
the model is specifically designed to address the severe class imbalance problem
commonly observed in toxic comment detection tasks.
The model supports six toxicity categories and emphasizes reproducibility, computational efficiency, and multilingual generalization.
## Model Details

### Task Description
- Task type: Multi-label text classification
- Domain: Toxic / abusive language detection
- Input: User-generated text (comments)
- Output: A 6-dimensional binary label vector
Each comment may belong to multiple toxicity categories or none.
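For illustration, the 6-dimensional output can be thresholded into a binary label vector like this (the 0.5 threshold and the probability values are illustrative assumptions, not tuned settings):

```python
import torch

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical per-label sigmoid probabilities for a single comment
probs = torch.tensor([0.92, 0.08, 0.71, 0.02, 0.64, 0.05])

# Threshold at 0.5 (assumed; per-label thresholds can be tuned)
binary = (probs >= 0.5).int().tolist()
predicted = [label for label, on in zip(LABELS, binary) if on]

print(binary)     # [1, 0, 1, 0, 1, 0]
print(predicted)  # ['toxic', 'obscene', 'insult']
```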
### Supported Labels

| Label | Description |
|---|---|
| `toxic` | Toxic |
| `severe_toxic` | Severely toxic |
| `obscene` | Obscene |
| `threat` | Threat |
| `insult` | Insult |
| `identity_hate` | Identity hate |
### Model Architecture
- Backbone: Qwen/Qwen3-0.6B-Base (Decoder-only Transformer)
- Adaptation: LoRA (Low-Rank Adaptation)
- Classifier: Lightweight linear multi-label classification head
- Activation: Sigmoid (per-label probability)
Only the LoRA adapters and the classification head are trainable; all backbone parameters remain frozen.
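A minimal sketch of such a head is shown below; the mean-pooling strategy and the 1024-dimensional hidden size are assumptions for illustration, not details stated by the card:

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Lightweight linear head producing per-label probabilities (sketch)."""

    def __init__(self, hidden_size: int, num_labels: int = 6):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool over tokens, then project to independent label logits.
        pooled = hidden_states.mean(dim=1)         # (batch, hidden)
        return torch.sigmoid(self.linear(pooled))  # (batch, num_labels)

head = MultiLabelHead(hidden_size=1024)
dummy = torch.randn(2, 16, 1024)  # (batch, seq_len, hidden)
probs = head(dummy)
print(probs.shape)  # torch.Size([2, 6])
```

Because each label gets its own sigmoid rather than a shared softmax, the labels are predicted independently, which is what allows a comment to carry several toxicity categories at once.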
### Model Sources

- Base model: https://huggingface.co/Qwen/Qwen3-0.6B-Base
- Training dataset: Jigsaw Toxic Comment Classification
## Getting Started

```python
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Load the frozen backbone
base_model = AutoModel.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    trust_remote_code=True,
)

# Load the LoRA adapters
model = PeftModel.from_pretrained(base_model, "yingfeng64/QwenTox")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    trust_remote_code=True,
)

# Attach the multi-label classification head and load its weights.
# AutoModel does not create a classifier on its own, so a linear head
# (hidden_size -> 6 labels) is attached first; adjust this if the saved
# head in classifier_head.pt has a different shape.
model.classifier = torch.nn.Linear(model.config.hidden_size, 6)
state_dict = torch.load("classifier_head.pt", map_location="cpu")
model.classifier.load_state_dict(state_dict)
model.eval()
```
## Training Details

### Training Data

- Dataset: Jigsaw Toxic Comment Classification
- Label setting: multi-label
- Class distribution: highly imbalanced
  - Non-toxic : toxic ≈ 9 : 1
  - Rare categories include `threat`, `severe_toxic`, and `identity_hate`

To mitigate data scarcity, translation-based data augmentation was applied to the low-frequency categories.
### Training Strategy

- Trainable: LoRA adapters + classification head
- Frozen: Qwen3 backbone parameters
- Loss function: Focal Loss
- Training framework: PEFT (Parameter-Efficient Fine-Tuning)
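The card does not give the focal-loss settings, so the sketch below uses illustrative `gamma` and `alpha` defaults; it applies binary focal loss independently per label, which is the standard multi-label formulation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Per-label binary focal loss (sketch; gamma/alpha are assumed values)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true class
    # Down-weight easy examples via (1 - p_t)^gamma; alpha balances classes.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([[2.0, -1.0, 0.5, -3.0, 1.0, -2.0]])
targets = torch.tensor([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0]])
loss = focal_loss(logits, targets)
print(float(loss))
```

The `(1 - p_t) ** gamma` factor shrinks the contribution of confidently-correct (mostly non-toxic) examples, which is why focal loss is a common choice under the 9 : 1 imbalance described above.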
### Hyperparameters
| Hyperparameter | Value |
|---|---|
| Precision | FP16 (mixed precision) |
| Optimizer | AdamW |
| Learning rate | 5e-5 |
| Epochs | 3 |
| Max sequence length | 384 |
| LoRA rank (r) | 128 |
| LoRA alpha | 256 |
| LoRA dropout | 0.3 |
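The LoRA rows of the table map onto a PEFT `LoraConfig` roughly as follows; `target_modules` is an assumption (typical attention projections for Qwen-style models), since the card does not list the adapted modules:

```python
from peft import LoraConfig

# r, lora_alpha, and lora_dropout come from the hyperparameter table;
# target_modules is an assumed, typical choice for Qwen-style backbones.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.3,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
)
print(lora_config.r, lora_config.lora_alpha)
```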
## Evaluation

### Evaluation Datasets
- In-domain: Jigsaw Toxic Comment Test Set
- Out-of-domain: Jigsaw Multilingual Toxic Comment Test Set (binary classification)
### Evaluation Metrics
- Subset Accuracy
- Hamming Loss
- Macro-F1
- Macro-AUC
These metrics jointly evaluate both overall prediction consistency and performance on minority toxicity categories.
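As an illustration, all four metrics can be computed with scikit-learn on a toy batch; the labels and probabilities below are made up for the example:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, f1_score, roc_auc_score

# Toy ground truth and predicted probabilities for 3 comments x 6 labels
y_true = np.array([[1, 0, 1, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0],
                   [1, 1, 1, 1, 1, 1]])
y_prob = np.array([[0.9, 0.2, 0.8, 0.1, 0.7, 0.3],
                   [0.1, 0.1, 0.2, 0.1, 0.1, 0.1],
                   [0.8, 0.6, 0.9, 0.4, 0.8, 0.7]])
y_pred = (y_prob >= 0.5).astype(int)

subset_acc = accuracy_score(y_true, y_pred)  # exact-match ratio over rows
ham = hamming_loss(y_true, y_pred)           # fraction of wrong label cells
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
macro_auc = roc_auc_score(y_true, y_prob, average="macro")

print(subset_acc, ham, macro_f1, macro_auc)
```

Subset accuracy is the strictest of the four (all six labels must match), while Macro-F1 and Macro-AUC weight every category equally, which is what surfaces performance on the rare labels.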
## Intended Use
- Academic research on toxic / abusive language detection
- Experiments on parameter-efficient fine-tuning (LoRA, PEFT)
- Multilingual and cross-domain generalization analysis
## Limitations
- Trained primarily on English data; multilingual performance depends on semantic transfer from the backbone model.
- Rare toxicity categories remain challenging despite data augmentation.
- Not designed for real-time moderation without further calibration.
## License
This model is released under the GNU General Public License v3.0 (GPL-3.0).
Under this license:
- You are free to use, modify, and redistribute this model.
- Any derivative work or fine-tuned version based on this model must also be released under GPL-3.0.
- Source code and model modifications must be made publicly available.
This ensures that improvements and downstream adaptations of the model remain open and accessible to the research community.