---
language: en
license: cc-by-nc-4.0
library_name: transformers
tags:
- moderation
- toxicity
- content-moderation
- safety
- quark
- multi-label-classification
- jigsaw
- hate-speech
- italian-ai
pipeline_tag: text-classification
metrics:
- f1
- macro-f1
base_model: ThingAI/Quark-135m
model_name: Quark-Mod-v0.1
pretty_name: Quark-Mod-v0.1
size_categories: 135M
task_categories:
- text-classification
---

# Quark-Mod-v0.1

<div align="center">

**A 135M-parameter content moderation model fine-tuned from Quark-135M**

[![ThingsAI](https://img.shields.io/badge/🛡️%20ThingsAI-Research-blue)](https://things-ai.org)
[![License](https://img.shields.io/badge/License-CC_BY_NC_4.0-lightgrey)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://python.org)
[![Transformers](https://img.shields.io/badge/🤗%20Transformers-latest-orange)](https://huggingface.co/docs/transformers)

</div>

---

## 📋 Model Overview

| Property | Value |
|----------|-------|
| **Full name** | Quark-Mod-v0.1 |
| **Base model** | [Quark-135M](https://huggingface.co/ThingAI/Quark-135m) (pretrained from scratch) |
| **Architecture** | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
| **Parameters** | 135M |
| **Context length** | 2048 tokens |
| **Task** | Multi-label content moderation (9 classes) |
| **Language** | English (v0.1) |

---

## 🎯 Intended Use

This model is designed to **classify toxic and harmful content** across 9 categories. It is intended for:

- ✅ Social media moderation
- ✅ Comment filtering systems
- ✅ Content safety pipelines
- ✅ Research on efficient moderation models

### Limitations
- ⚠️ English only (v0.1)
- ⚠️ May struggle with subtle sarcasm or highly contextual toxicity
- ⚠️ Lower performance on rare classes due to dataset imbalance
- ⚠️ Not recommended for high-stakes decisions without human review
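
One way to act on the last two limitations is to auto-action only confident predictions on common classes and send everything borderline to a reviewer. The sketch below is a hypothetical routing rule: the `route` function, both thresholds, and the choice to never auto-flag the four least frequent classes are assumptions for illustration, not something this card specifies.

```python
# Hypothetical thresholds for illustration only; tune them on held-out data.
AUTO_FLAG = 0.90   # at or above this, common classes are flagged automatically
REVIEW = 0.30      # at or above this, a human reviewer looks at the text

# The four least frequent classes in the labels table (assumed to be less reliable).
RARE_CLASSES = {"severe_toxic", "threat", "identity_hate", "hate_speech"}

def route(probabilities):
    """Map per-label probabilities to 'flag', 'review' or 'allow'."""
    needs_review = False
    for label, p in probabilities.items():
        # Only common classes may be flagged without a human in the loop.
        if p >= AUTO_FLAG and label not in RARE_CLASSES:
            return "flag"
        if p >= REVIEW:
            needs_review = True
    return "review" if needs_review else "allow"

print(route({"toxic": 0.97, "insult": 0.40}))  # flag
print(route({"threat": 0.95}))                 # review
print(route({"toxic": 0.10, "insult": 0.05}))  # allow
```

The input is assumed to be a mapping from label name to a probability in [0, 1], for example the sigmoid of each logit.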

---

## 🏷️ Labels (9 classes)

| Label | Description | Training examples |
|-------|-------------|-------------------|
| `toxic` | General toxic content | 32,263 (19.4%) |
| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
| `obscene` | Obscene/profane language | 7,567 (4.6%) |
| `threat` | Direct threats | 445 (0.3%) |
| `insult` | Insulting content | 7,065 (4.3%) |
| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
| `offensive` | Offensive language | 17,274 (10.4%) |

**Note:** This is multi-label classification: multiple classes can be active simultaneously.
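
To make the multi-label behaviour concrete, the sketch below applies an independent sigmoid to each logit and thresholds each class separately, so any number of labels can fire for one input. The logit values are made up for illustration, and `decode` is a hypothetical helper, not part of the model's API.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

def sigmoid(x):
    """Squash a logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits, threshold=0.5):
    """Return every label whose independent probability exceeds the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) > threshold]

# Illustrative logits only; these are not real model outputs.
example_logits = [2.1, -3.0, 0.4, -4.2, 1.3, -2.5, -2.8, 0.9]
print(decode(example_logits))  # ['toxic', 'obscene', 'insult', 'offensive']
```

With a threshold of 0.5 this is the same test as `logit > 0`, because `sigmoid(0)` is exactly 0.5. Unlike a softmax, the per-class probabilities do not need to sum to 1.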

---

## 📊 Evaluation Results

**Validation set:** 18,436 examples

| Class | F1 Score |
|-------|----------|
| `toxic` | **0.909** |
| `offensive` | **0.938** |
| `obscene` | **0.796** |
| `insult` | **0.721** |
| `severe_toxic` | **0.498** |
| `identity_hate` | **0.415** |
| `hate_speech` | **0.319** |
| `threat` | **0.372** |

| Metric | Score |
|--------|-------|
| **Macro F1** | **0.552** |
| **Validation Loss** | **0.037** |
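
Macro F1 is the unweighted mean of the per-class F1 scores. The sketch below recomputes it from the per-class table as a sanity check. Averaging the eight listed scores gives about 0.621, while dividing the same sum by nine gives 0.552, which matches the reported value. One reading, not confirmed by this card, is that the reported average includes a ninth class with an F1 of 0.

```python
# Per-class F1 scores copied from the table above.
per_class_f1 = {
    "toxic": 0.909,
    "offensive": 0.938,
    "obscene": 0.796,
    "insult": 0.721,
    "severe_toxic": 0.498,
    "identity_hate": 0.415,
    "hate_speech": 0.319,
    "threat": 0.372,
}

total = sum(per_class_f1.values())

# Mean over the eight classes that are actually listed.
macro_over_listed = total / len(per_class_f1)

# Same sum divided by nine (assumption: one unlisted class contributes 0).
macro_over_nine = total / 9

print(round(macro_over_listed, 3))  # 0.621
print(round(macro_over_nine, 3))    # 0.552
```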

---

## 🚀 Usage Example

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ThingsAI/Quark-Mod-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Label names, in the same order as the labels table
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

def moderate(text):
    """Return the labels predicted for `text`, or ["clean"] if none fire."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        outputs = model(**inputs)
    # Multi-label head: logit > 0 is equivalent to sigmoid(logit) > 0.5
    predictions = (outputs.logits > 0).int()[0].tolist()
    # zip stops at the shorter sequence, so an output with no name in `labels` is skipped
    detected = [label for label, v in zip(labels, predictions) if v == 1]
    return detected if detected else ["clean"]

# Test
print(moderate("I love this community!"))          # ['clean']
print(moderate("You are an idiot and should die")) # ['toxic', 'insult']
print(moderate("Nice post, thanks for sharing"))   # ['clean']
```