QwenTox

Model Summary

QwenTox is a parameter-efficient multi-label toxic comment classification model built on Qwen/Qwen3-0.6B-Base.
It combines LoRA adapters with a lightweight multi-label classification head and is designed to address the severe class imbalance common in toxic comment detection.

The model supports six toxicity categories and emphasizes reproducibility, computational efficiency, and multilingual generalization.


Model Details

Task Description

  • Task type: Multi-label text classification
  • Domain: Toxic / abusive language detection
  • Input: User-generated text (comments)
  • Output: A 6-dimensional binary label vector

Each comment may belong to multiple toxicity categories or none.


Supported Labels

Label Description
toxic General toxicity
severe_toxic Severe toxicity
obscene Obscenity
threat Threats
insult Insults
identity_hate Identity-based hate
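To make the output format concrete, a minimal sketch of decoding a 6-dimensional binary label vector into the category names from the table above (the helper `decode_labels` is illustrative, not part of the released model):

```python
# The six label positions, in the order listed in the table above.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def decode_labels(vector):
    """Map a 6-dimensional binary vector to its active label names."""
    return [name for name, bit in zip(LABELS, vector) if bit == 1]

# A comment flagged as both toxic and insulting:
print(decode_labels([1, 0, 0, 0, 1, 0]))  # ['toxic', 'insult']
```

A vector of all zeros corresponds to a non-toxic comment, since each comment may match multiple categories or none.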

Model Architecture

  • Backbone: Qwen/Qwen3-0.6B-Base (Decoder-only Transformer)
  • Adaptation: LoRA (Low-Rank Adaptation)
  • Classifier: Lightweight linear multi-label classification head
  • Activation: Sigmoid (per-label probability)

Only the LoRA adapters and the classification head are trainable; all backbone parameters remain frozen.
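The head described above can be sketched as a single linear layer over the backbone's hidden states. The last-token pooling and the hidden size of 1024 are assumptions (the card does not document the forward pass), shown here only to illustrate the architecture:

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Lightweight head: hidden state -> 6 independent per-label probabilities."""

    def __init__(self, hidden_size: int, num_labels: int = 6):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_labels)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # Assumption: pool with the final token's hidden state,
        # a common choice for decoder-only backbones.
        pooled = last_hidden_state[:, -1, :]
        logits = self.linear(pooled)
        return torch.sigmoid(logits)  # sigmoid, not softmax: labels are independent

head = MultiLabelHead(hidden_size=1024)          # hidden size is an assumption
probs = head(torch.randn(2, 16, 1024))           # batch of 2, sequence length 16
```

Because each label gets its own sigmoid, the six probabilities need not sum to one, which is what allows a comment to carry several labels at once.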


Getting Started

import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Load base model
base_model = AutoModel.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    trust_remote_code=True
)

# Load LoRA adapters
model = PeftModel.from_pretrained(
    base_model,
    "yingfeng64/QwenTox"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",
    trust_remote_code=True
)

# Attach and load the classification head.
# Assumption: the head is a single linear layer mapping the backbone's
# hidden size to the six labels; it is stored separately from the PEFT
# checkpoint, so it must be created before its weights can be loaded.
model.classifier = torch.nn.Linear(base_model.config.hidden_size, 6)
state_dict = torch.load("classifier_head.pt", map_location="cpu")
model.classifier.load_state_dict(state_dict)

model.eval()
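Following the loading code above, inference can be sketched as below. The last-token pooling and the 0.5 threshold are assumptions, since the card does not document the prediction step; `decode` is a hypothetical helper:

```python
import torch

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def decode(probs, threshold=0.5):
    """Keep labels whose probability clears the threshold."""
    return {name: float(p) for name, p in zip(LABELS, probs) if p >= threshold}

def classify(text, model, tokenizer, threshold=0.5):
    """Run one comment through the backbone, head, and sigmoid."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # backbone features
        logits = model.classifier(hidden[:, -1, :])     # head on last token
        probs = torch.sigmoid(logits)[0]                # per-label probabilities
    return decode(probs, threshold)
```

Lowering the threshold trades precision for recall, which may be preferable for the rare categories such as threat.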

Training Details

Training Data

  • Dataset: Jigsaw Toxic Comment Classification

  • Label setting: Multi-label

  • Class distribution: Highly imbalanced

    • Non-toxic : toxic ≈ 9 : 1
    • Rare categories include threat, severe_toxic, and identity_hate

To mitigate data scarcity, translation-based data augmentation was applied to low-frequency categories.
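The card does not specify the augmentation pipeline; one common translation-based scheme is back-translation (English → pivot language → English). A minimal sketch, where `translate` is a hypothetical placeholder for any MT backend:

```python
def back_translate(text, translate, pivot="fr"):
    """Round-trip a comment through a pivot language to get a paraphrase.

    `translate(text, src, tgt)` is a placeholder, not a real API.
    """
    pivoted = translate(text, src="en", tgt=pivot)
    return translate(pivoted, src=pivot, tgt="en")

def augment_minority(examples, labels, rare_indices, translate):
    """Append back-translated copies of examples carrying a rare label."""
    extra_x, extra_y = [], []
    for x, y in zip(examples, labels):
        if any(y[i] for i in rare_indices):
            extra_x.append(back_translate(x, translate))
            extra_y.append(y)  # the label vector is preserved under paraphrase
    return examples + extra_x, labels + extra_y
```

Only the low-frequency categories (threat, severe_toxic, identity_hate) would be passed as `rare_indices`, so the majority classes are left untouched.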


Training Strategy

  • Fine-tuning scope:

    • Trainable: LoRA adapters + classification head
    • Frozen: Qwen3 backbone parameters
  • Loss function: Focal Loss

  • Training framework: PEFT (Parameter-Efficient Fine-Tuning)
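Focal loss is named but not written out. A standard multi-label formulation, binary focal loss applied independently per label, is sketched below; the γ and α values are common defaults, not the card's settings:

```python
import torch

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss over a (batch, 6) grid of label logits.

    Down-weights easy examples by (1 - p_t)^gamma, so the rare positive
    labels contribute relatively more to the gradient.
    """
    bce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```

With gamma=0 and alpha=0.5 this reduces to half the ordinary binary cross-entropy, which makes the modulating effect of the focal term easy to check.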


Hyperparameters

Hyperparameter Value
Precision FP16 (mixed precision)
Optimizer AdamW
Learning rate 5e-5
Epochs 3
Max sequence length 384
LoRA rank (r) 128
LoRA alpha 256
LoRA dropout 0.3
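The LoRA rows of the table map directly onto a PEFT configuration. The target modules below are an assumption (the card does not list them), shown as the attention projections commonly targeted in Qwen-style models:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,             # LoRA rank, from the table above
    lora_alpha=256,    # scaling alpha, from the table above
    lora_dropout=0.3,  # from the table above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
```

This config would be applied to the frozen backbone with `get_peft_model`, leaving only the adapter matrices and the classification head trainable.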

Evaluation

Evaluation Datasets

  • In-domain: Jigsaw Toxic Comment Test Set
  • Out-of-domain: Jigsaw Multilingual Toxic Comment Test Set (binary classification)

Evaluation Metrics

  • Subset Accuracy
  • Hamming Loss
  • Macro-F1
  • Macro-AUC

These metrics jointly evaluate both overall prediction consistency and performance on minority toxicity categories.
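Two of these metrics are simple enough to compute by hand; a NumPy sketch of subset accuracy (exact match over all six labels) and Hamming loss (fraction of individual label slots predicted wrongly):

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire 6-label vector is predicted exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots (samples x 6) that are wrong."""
    return float(np.mean(y_true != y_pred))

y_true = np.array([[1, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0, 0],   # misses the insult label
                   [0, 0, 0, 0, 0, 0]])
```

Subset accuracy is strict (one wrong label fails the whole sample), while Hamming loss credits partially correct predictions, which is why the two are reported together.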


Intended Use

  • Academic research on toxic / abusive language detection
  • Experiments on parameter-efficient fine-tuning (LoRA, PEFT)
  • Multilingual and cross-domain generalization analysis

Limitations

  • Trained primarily on English data; multilingual performance depends on semantic transfer from the backbone model.
  • Rare toxicity categories remain challenging despite data augmentation.
  • Not designed for real-time moderation without further calibration.

License

This model is released under the GNU General Public License v3.0 (GPL-3.0).

Under this license:

  • You are free to use, modify, and redistribute this model.
  • Any derivative work or fine-tuned version that you distribute must also be released under GPL-3.0.
  • When distributing the model or a derivative, the corresponding source code and modifications must be made available.

This ensures that improvements and downstream adaptations of the model remain open and accessible to the research community.
