---
language: en
license: cc-by-nc-4.0
library_name: transformers
tags:
- moderation
- toxicity
- content-moderation
- safety
- quark
- multi-label-classification
- jigsaw
- hate-speech
- italian-ai
pipeline_tag: text-classification
metrics:
- f1
- macro-f1
base_model: ThingAI/Quark-135m
model_name: Quark-Mod-v0.1
pretty_name: Quark-Mod-v0.1
size_categories: 135M
task_categories:
- text-classification
---

# Quark-Mod-v0.1

<div align="center">

**A 135M-parameter content moderation model fine-tuned from Quark-135M**

[![ThingsAI](https://img.shields.io/badge/🛡️%20ThingsAI-Research-blue)](https://things-ai.org)
[![License](https://img.shields.io/badge/License-CC_BY_NC_4.0-lightgrey)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://python.org)
[![Transformers](https://img.shields.io/badge/🤗%20Transformers-latest-orange)](https://huggingface.co/docs/transformers)

</div>

---

## 📋 Model Overview

| Property | Value |
|----------|-------|
| **Full name** | Quark-Mod-v0.1 |
| **Base model** | [Quark-135M](https://huggingface.co/ThingAI/Quark-135m) (pretrained from scratch) |
| **Architecture** | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
| **Parameters** | 135M |
| **Context length** | 2048 tokens |
| **Task** | Multi-label content moderation (9 classes) |
| **Language** | English (v0.1) |

---

## 🎯 Intended Use

This model is designed to **classify toxic and harmful content** across 9 categories. It is intended for:

- ✅ Social media moderation
- ✅ Comment filtering systems
- ✅ Content safety pipelines
- ✅ Research on efficient moderation models

### Limitations
- ⚠️ English only (v0.1)
- ⚠️ May struggle with subtle sarcasm or highly contextual toxicity
- ⚠️ Lower performance on rare classes due to dataset imbalance
- ⚠️ Not recommended for high-stakes decisions without human review
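
The last limitation can be made concrete with a simple routing rule that sends borderline scores to a human reviewer instead of acting on them automatically. This is a minimal sketch, assuming per-label sigmoid probabilities are already available. The `route` function and both thresholds (0.8 and 0.3) are hypothetical illustrations, not values derived from this model's evaluation.

```python
def route(probabilities, auto_flag=0.8, review=0.3):
    """Decide what to do with one text given its per-label probabilities.

    `probabilities` maps label name -> sigmoid probability.
    The thresholds are illustrative assumptions and should be tuned
    on held-out data before any real use.
    """
    top = max(probabilities.values())
    if top >= auto_flag:
        return "flag"      # confident enough to act automatically
    if top >= review:
        return "review"    # uncertain: hand over to a human moderator
    return "allow"         # no label scored high enough

print(route({"toxic": 0.95, "threat": 0.10}))  # flag
print(route({"toxic": 0.45, "insult": 0.20}))  # review
print(route({"toxic": 0.05, "insult": 0.02}))  # allow
```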

---

## 🏷️ Labels (9 classes)

| Label | Description | Training examples |
|-------|-------------|-------------------|
| `toxic` | General toxic content | 32,263 (19.4%) |
| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
| `obscene` | Obscene/profane language | 7,567 (4.6%) |
| `threat` | Direct threats | 445 (0.3%) |
| `insult` | Insulting content | 7,065 (4.3%) |
| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
| `offensive` | Offensive language | 17,274 (10.4%) |

**Note:** This is multi-label classification: several classes can be active for the same text.
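
In a multi-label setup each class is usually scored independently with a sigmoid, so any subset of labels can fire at once. The sketch below shows that decoding step in plain Python. The `decode` helper and the example logits are made up for illustration, and the label order is simply the order of the table above, which may not match the model's actual output order.

```python
import math

# Label names in the order of the table above (an assumption about output order)
LABELS = ["toxic", "severe_toxic", "obscene", "threat",
          "insult", "identity_hate", "hate_speech", "offensive"]

def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    return [label for label, logit in zip(LABELS, logits)
            if sigmoid(logit) >= threshold]

# Made-up logits for one text; three of them are positive
example_logits = [2.1, -3.0, 0.4, -4.2, 1.3, -2.5, -1.9, -0.2]
print(decode(example_logits))  # ['toxic', 'obscene', 'insult']
```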

---

## 📊 Evaluation Results

**Validation set:** 18,436 examples

| Class | F1 Score |
|-------|----------|
| `toxic` | **0.909** |
| `offensive` | **0.938** |
| `obscene` | **0.796** |
| `insult` | **0.721** |
| `severe_toxic` | **0.498** |
| `identity_hate` | **0.415** |
| `hate_speech` | **0.319** |
| `threat` | **0.372** |

| Metric | Score |
|--------|-------|
| **Macro F1** | **0.552** |
| **Validation Loss** | **0.037** |
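
Macro F1 is the unweighted mean of the per-class F1 scores. As a quick arithmetic check on the tables above: the eight listed scores average to 0.621, while the reported 0.552 equals their sum divided by nine. One possible reading, which is an assumption and not stated in this card, is that the average runs over nine classes and one unlisted class contributes an F1 of 0.

```python
# Per-class F1 scores copied from the table above
per_class_f1 = {
    "toxic": 0.909,
    "offensive": 0.938,
    "obscene": 0.796,
    "insult": 0.721,
    "severe_toxic": 0.498,
    "identity_hate": 0.415,
    "hate_speech": 0.319,
    "threat": 0.372,
}

total = sum(per_class_f1.values())

# Unweighted mean over the eight classes that are listed
mean_over_listed = total / len(per_class_f1)

# Same sum divided by nine, i.e. a ninth class with F1 = 0 (assumption)
mean_over_nine = total / 9

print(f"{mean_over_listed:.3f}")  # 0.621
print(f"{mean_over_nine:.3f}")    # 0.552
```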

---

## 🚀 Usage Example

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ThingsAI/Quark-Mod-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

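# Label names in the order given by this card (eight are listed)
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

# Predict
def moderate(text):
    """Return the labels whose logit is positive, or ["clean"] if none is."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        outputs = model(**inputs)
    # A logit > 0 is equivalent to a sigmoid probability > 0.5
    predictions = (outputs.logits > 0).int()[0].tolist()
    # zip() stops at the shorter sequence, so a logit without a listed name
    # is skipped instead of raising IndexError
    detected = [label for label, v in zip(labels, predictions) if v == 1]
    return detected if detected else ["clean"]

# Test
print(moderate("I love this community!"))          # ['clean']
print(moderate("You are an idiot and should die")) # ['toxic', 'insult']
print(moderate("Nice post, thanks for sharing"))   # ['clean']
```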