---
language: en
license: cc-by-nc-4.0
library_name: transformers
tags:
  - moderation
  - toxicity
  - content-moderation
  - safety
  - quark
  - multi-label-classification
  - jigsaw
  - hate-speech
  - italian-ai
pipeline_tag: text-classification
metrics:
  - f1
  - macro-f1
base_model: ThingAI/Quark-135m
model_name: Quark-Mod-v0.1
pretty_name: Quark-Mod-v0.1
size_categories: 135M
task_categories:
  - text-classification
---

Quark-Mod-v0.1

A 135M-parameter content moderation model, fine-tuned from Quark-135M.

📋 Model Overview

| Property | Value |
|---|---|
| Full name | Quark-Mod-v0.1 |
| Base model | Quark-135M (pretrained from scratch) |
| Architecture | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
| Parameters | 135M |
| Context length | 2048 tokens |
| Task | Multi-label content moderation (9 classes) |
| Language | English (v0.1) |

🎯 Intended Use

This model is designed to classify toxic and harmful content across 9 categories. It is intended for:

✅ Social media moderation
✅ Comment filtering systems
✅ Content safety pipelines
✅ Research on efficient moderation models

Limitations

⚠️ English only (v0.1)
⚠️ May struggle with subtle sarcasm or highly contextual toxicity
⚠️ Lower performance on rare classes due to dataset imbalance
⚠️ Not recommended for high-stakes decisions without human review
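
The last two limitations suggest combining per-label thresholds with a human-review band. The following is a minimal sketch of that idea, not part of the released model: the `route` function, the threshold values, and the `REVIEW_BAND` width are all hypothetical and would need tuning on held-out data.

```python
# Hypothetical per-label thresholds; the values are illustrative, not tuned.
THRESHOLDS = {
    "toxic": 0.5,
    "threat": 0.3,         # rare class: lower threshold to favor recall
    "identity_hate": 0.3,  # rare class
}
DEFAULT_THRESHOLD = 0.5
REVIEW_BAND = 0.15  # scores this close to a threshold go to a human reviewer


def route(scores):
    """Split per-label probabilities into flagged labels and labels needing review.

    `scores` maps a label name to a probability in [0, 1].
    """
    flagged, needs_review = [], []
    for label, p in scores.items():
        t = THRESHOLDS.get(label, DEFAULT_THRESHOLD)
        if abs(p - t) < REVIEW_BAND:
            needs_review.append(label)  # too close to call automatically
        elif p >= t:
            flagged.append(label)       # confidently above the threshold
    return flagged, needs_review


flagged, needs_review = route({"toxic": 0.92, "threat": 0.35, "insult": 0.10})
print(flagged)       # ['toxic']
print(needs_review)  # ['threat']
```

Routing borderline scores to a reviewer keeps automatic action limited to cases where the model is far from its decision boundary.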

🏷️ Labels (9 classes)

| Label | Description | Training examples |
|---|---|---|
| `toxic` | General toxic content | 32,263 (19.4%) |
| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
| `obscene` | Obscene/profane language | 7,567 (4.6%) |
| `threat` | Direct threats | 445 (0.3%) |
| `insult` | Insulting content | 7,065 (4.3%) |
| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
| `offensive` | Offensive language | 17,274 (10.4%) |
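
The counts above span roughly two orders of magnitude, which is the imbalance mentioned under Limitations. One common way to quantify it is inverse-frequency weighting, sketched below. This is an illustration only: the card does not say whether any class weighting was used in training, and `inverse_frequency_weights` is a hypothetical helper.

```python
# Positive-example counts copied from the label table.
counts = {
    "toxic": 32263,
    "severe_toxic": 1423,
    "obscene": 7567,
    "threat": 445,
    "insult": 7065,
    "identity_hate": 1263,
    "hate_speech": 1265,
    "offensive": 17274,
}


def inverse_frequency_weights(counts):
    """Weight each label by (largest count / its count); the most common label gets 1.0."""
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


weights = inverse_frequency_weights(counts)
# Print labels from most to least frequent (smallest to largest weight)
for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:15s} {w:6.1f}")
```

Under this scheme `threat` receives a weight of about 72.5 relative to `toxic`, which gives a sense of how rarely the model saw positive examples for it.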

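Note: This is multi-label classification, so several classes can be active for the same input. The table lists eight labels; the ninth class counted in the heading is not documented here.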
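
To make the multi-label behavior concrete, here is a small sketch of how independent per-label decisions are usually decoded from logits. It assumes the eight documented labels in the order used by the usage example; the logits are made up for illustration, and `decode` is a hypothetical helper.

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up logits, for illustration only
example_logits = [2.1, -3.0, 0.4, -4.2, 1.3, -2.5, -2.8, 0.9]
print(decode(example_logits))  # ['toxic', 'obscene', 'insult', 'offensive']
```

Unlike softmax, each label is scored on its own, so any number of labels (including none) can be returned. A threshold of 0.5 on the probability is the same as testing whether the logit is greater than 0.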

📊 Evaluation Results

Validation set: 18,436 examples

| Class | F1 Score |
|---|---|
| `toxic` | 0.909 |
| `offensive` | 0.938 |
| `obscene` | 0.796 |
| `insult` | 0.721 |
| `severe_toxic` | 0.498 |
| `identity_hate` | 0.415 |
| `hate_speech` | 0.319 |
| `threat` | 0.372 |

| Metric | Score |
|---|---|
| Macro F1 | 0.552 |
| Validation Loss | 0.037 |
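
Macro F1 is the unweighted mean of the per-class F1 scores. The short check below recomputes it from the eight per-class scores reported above. The eight listed scores average 0.621, while dividing their sum by 9 reproduces the reported 0.552. One possible reading, which the card does not confirm, is that a ninth class contributed an F1 of 0 to the average.

```python
# Per-class F1 scores copied from the evaluation results.
f1_scores = {
    "toxic": 0.909,
    "offensive": 0.938,
    "obscene": 0.796,
    "insult": 0.721,
    "severe_toxic": 0.498,
    "identity_hate": 0.415,
    "hate_speech": 0.319,
    "threat": 0.372,
}

total = sum(f1_scores.values())

# Unweighted mean over the eight listed classes
mean_of_listed = total / len(f1_scores)
print(round(mean_of_listed, 3))  # 0.621

# Assumption: nine classes, with the unlisted one scoring 0
macro_over_nine = total / 9
print(round(macro_over_nine, 3))  # 0.552
```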

🚀 Usage Example

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ThingsAI/Quark-Mod-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

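# Labels documented in this card. The card states 9 classes but names 8,
# so any extra output index is reported as "label_<i>" instead of raising an IndexError.
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

# Predict
def moderate(text):
    """Return the list of labels detected in `text`, or ["clean"] if none fire."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        outputs = model(**inputs)
    # logit > 0 is equivalent to sigmoid(logit) > 0.5, applied to each label independently
    predictions = (outputs.logits[0] > 0).tolist()
    detected = [labels[i] if i < len(labels) else f"label_{i}"
                for i, active in enumerate(predictions) if active]
    return detected if detected else ["clean"]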

# Test
print(moderate("I love this community!"))        # ['clean']
print(moderate("You are an idiot and should die")) # ['toxic', 'insult']
print(moderate("Nice post, thanks for sharing")) # ['clean']