LiteDetective🕵️

Lightweight and Accurate Chinese Toxic Text Detection

Disclaimer: The paper contains content that may be profane, vulgar, or offensive.
{zhaoq, humin}@kean.edu
* Equal Contribution
CPS 3320 2025
Kean University

Abstract

Harmful content detection is a critical task for any social media platform, as misinformation and age, gender, and racial discrimination can drive away active users. This paper introduces a novel approach that leverages large language models (LLMs) to analyze specific social media data and generate training data, combined with a BERT-based Dynamic TextCNN architecture. We first crawl potentially harmful comments from targeted communities (e.g., "ShunBa"). These comments are then randomly filtered and clustered with a smaller LLM to generate policy-guided seed examples. Next, we employ a larger LLM (Qwen-3) for context-aware and context-free data augmentation. Finally, we integrate BERT embeddings with a Dynamic TextCNN classifier trained on our custom dataset.

Online Demo

Model Architecture

BERT Encoder

We utilize hfl/chinese-roberta-wwm-ext as the base transformer, concatenating the hidden states of its last two layers along the feature dimension:

$H_{\text{concat}} = [H_{L-1}; H_L] \in \mathbb{R}^{L \times 1536}$
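The concatenation above can be sketched as follows. This is a minimal illustration with randomly generated hidden states of the right shapes; in practice they would come from `AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext", output_hidden_states=True)`, and the helper name is ours, not from the source.

```python
import torch

def concat_last_two_layers(hidden_states):
    """hidden_states: sequence of (B, L, 768) tensors, one per layer.

    Concatenates the last two layers along the feature dimension,
    producing (B, L, 1536) as in the architecture description.
    """
    h_prev, h_last = hidden_states[-2], hidden_states[-1]
    return torch.cat([h_prev, h_last], dim=-1)

# Simulated encoder output: 13 layers (embeddings + 12 transformer blocks),
# batch of 2 sequences of length 32, hidden size 768.
states = [torch.randn(2, 32, 768) for _ in range(13)]
h_concat = concat_last_two_layers(states)
print(h_concat.shape)  # torch.Size([2, 32, 1536])
```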

Dynamic Multi-Scale Convolution

  • Kernel Adaptation: Each DynamicConv1d layer contains $K=4$ parallel kernels with an attention mechanism:
    $\alpha = \text{Softmax}(W_2\,\sigma(W_1\,\text{AvgPool}(x)))$
    $\text{Output} = \sum_{k=1}^{K} \alpha^{(k)} \otimes \text{Conv}_k(x) + x$
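A sketch of such a layer is below. Assumptions on our part: the four parallel kernels use sizes (1, 3, 5, 7) to realize "multi-scale", σ is taken as ReLU, the attention bottleneck uses a reduction ratio of 4, and the demo channel count is 128 (the real model would use 1536). None of these specifics are stated in the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """K parallel convolutions mixed by input-dependent attention
    weights, plus a residual connection (a sketch, not the exact
    implementation from the paper)."""

    def __init__(self, channels, kernel_sizes=(1, 3, 5, 7), reduction=4):
        super().__init__()
        # K parallel kernels at different scales; odd sizes keep length.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes)
        # Attention: AvgPool -> W1 -> sigma -> W2 -> Softmax over K.
        self.w1 = nn.Linear(channels, channels // reduction)
        self.w2 = nn.Linear(channels // reduction, len(kernel_sizes))

    def forward(self, x):                  # x: (B, C, L)
        pooled = x.mean(dim=-1)            # AvgPool over length -> (B, C)
        alpha = F.softmax(self.w2(F.relu(self.w1(pooled))), dim=-1)  # (B, K)
        out = sum(alpha[:, k, None, None] * conv(x)
                  for k, conv in enumerate(self.convs))
        return out + x                     # residual term from the formula

x = torch.randn(2, 128, 32)
y = DynamicConv1d(128)(x)
print(y.shape)  # torch.Size([2, 128, 32])
```

Because the mixing weights α depend on the pooled input, each example effectively selects its own blend of kernel scales, which is what distinguishes this from a plain multi-branch convolution.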

Hierarchical Classification

The final prediction head implements dimension reduction with layer normalization:

$\hat{y} = W_3(\text{Dropout}(\sigma(W_2(\text{Dropout}(\sigma(W_1 F_{\text{pooled}}))))))$
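The head can be sketched as a small `nn.Sequential`. The intermediate widths (512, 128), the dropout rate, the binary output size, and the exact LayerNorm placement are our assumptions; the source only specifies two reduction stages with activation, dropout, and layer normalization.

```python
import torch
import torch.nn as nn

# Hypothetical instantiation of the prediction head:
# W1, W2 reduce 1536 -> 512 -> 128 with LayerNorm, sigma (ReLU here),
# and Dropout between stages; W3 maps to 2 logits (toxic / non-toxic).
head = nn.Sequential(
    nn.Linear(1536, 512), nn.LayerNorm(512), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(512, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(128, 2),
)

f_pooled = torch.randn(4, 1536)  # pooled features for a batch of 4
logits = head(f_pooled)
print(logits.shape)  # torch.Size([4, 2])
```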