---
language: en
license: cc-by-nc-4.0
library_name: transformers
tags:
  - moderation
  - toxicity
  - content-moderation
  - safety
  - quark
  - multi-label-classification
  - jigsaw
  - hate-speech
  - italian-ai
pipeline_tag: text-classification
metrics:
  - f1
  - macro-f1
base_model: ThingAI/Quark-135m
model_name: Quark-Mod-v0.1
pretty_name: Quark-Mod-v0.1
size_categories: 135M
task_categories:
  - text-classification
---

Quark-Mod-v0.1

A 135M-parameter content moderation model fine-tuned from Quark-135M.



📋 Model Overview

| Property | Value |
|---|---|
| Full name | Quark-Mod-v0.1 |
| Base model | Quark-135M (pretrained from scratch) |
| Architecture | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
| Parameters | 135M |
| Context length | 2048 tokens |
| Task | Multi-label content moderation (9 classes) |
| Language | English (v0.1) |
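The "GQA (9:3)" entry most likely means grouped-query attention with 9 query heads sharing 3 key/value heads; that reading is an assumption, since the card does not spell it out. Under it, each KV head serves a group of three query heads, so the per-layer KV cache holds 3 heads instead of the 9 a standard multi-head layer would store. The minimal sketch below shows that head mapping; the function name `kv_head_for_query_head` is hypothetical and not part of Quark's code.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int = 9, n_kv_heads: int = 3) -> int:
    """Return the KV head used by query head `q_head` under grouped-query attention.

    Assumes 'GQA (9:3)' means 9 query heads and 3 KV heads with contiguous
    grouping: query heads 0-2 -> KV head 0, 3-5 -> KV head 1, 6-8 -> KV head 2.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise ValueError("q_head out of range")
    group_size = n_q_heads // n_kv_heads  # query heads sharing one KV head
    return q_head // group_size


if __name__ == "__main__":
    print([kv_head_for_query_head(q) for q in range(9)])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Contiguous grouping is how common implementations expand KV heads (for example, the `repeat_kv` helper in the Hugging Face Llama modeling code); whether Quark uses the same layout is not stated here.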

🎯 Intended Use

This model is designed to classify toxic and harmful content across 9 categories. It is intended for:

  • ✅ Social media moderation
  • ✅ Comment filtering systems
  • ✅ Content safety pipelines
  • ✅ Research on efficient moderation models

Limitations

  • ⚠️ English only (v0.1)
  • ⚠️ May struggle with subtle sarcasm or highly contextual toxicity
  • ⚠️ Lower performance on rare classes due to dataset imbalance
  • ⚠️ Not recommended for high-stakes decisions without human review

🏷️ Labels (9 classes)

| Label | Description | Training examples (% of training set) |
|---|---|---|
| `toxic` | General toxic content | 32,263 (19.4%) |
| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
| `obscene` | Obscene/profane language | 7,567 (4.6%) |
| `threat` | Direct threats | 445 (0.3%) |
| `insult` | Insulting content | 7,065 (4.3%) |
| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
| `offensive` | Offensive language | 17,274 (10.4%) |

Note: This is multi-label classification, so multiple classes can be active simultaneously.
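In a multi-label setup, each example's target is usually a multi-hot vector with one independent 0/1 entry per label, rather than a single class index. The minimal sketch below encodes targets that way using the eight label names from the table above; the helper `to_multi_hot` is hypothetical and not part of the model's API.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]


def to_multi_hot(active: set[str], labels: list[str] = LABELS) -> list[int]:
    """Encode a set of active label names as a 0/1 vector aligned with `labels`."""
    unknown = active - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in labels]


if __name__ == "__main__":
    # A comment that is both toxic and insulting activates two classes at once.
    print(to_multi_hot({"toxic", "insult"}))  # [1, 0, 0, 0, 1, 0, 0, 0]
    # A comment with no active classes is all zeros.
    print(to_multi_hot(set()))  # [0, 0, 0, 0, 0, 0, 0, 0]
```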


📊 Evaluation Results

Validation set: 18,436 examples

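| Class | F1 score |
|---|---|
| `toxic` | 0.909 |
| `offensive` | 0.938 |
| `obscene` | 0.796 |
| `insult` | 0.721 |
| `severe_toxic` | 0.498 |
| `identity_hate` | 0.415 |
| `hate_speech` | 0.319 |
| `threat` | 0.372 |

| Metric | Score |
|---|---|
| Macro F1 | 0.552 |
| Validation loss | 0.037 |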
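Macro F1 is the unweighted mean of the per-class F1 scores, so a rare class such as `threat` weighs as much as `toxic` in the average, which is why it sits well below the F1 of the frequent classes. The sketch below computes per-class F1 from multi-hot targets and predictions and then averages; the helper names are hypothetical, and returning 0.0 when a class has no true positives is an assumed convention.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Binary F1 (harmonic mean of precision and recall); 0.0 if there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> list[float]:
    """Per-class F1 for multi-hot targets and predictions (one row per example)."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = fp = fn = 0
        for t, p in zip(y_true, y_pred):
            if t[c] and p[c]:
                tp += 1  # predicted and present
            elif p[c]:
                fp += 1  # predicted but absent
            elif t[c]:
                fn += 1  # present but missed
        scores.append(f1(tp, fp, fn))
    return scores


def macro_f1(scores: list[float]) -> float:
    """Unweighted mean over classes."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = [[1, 0], [1, 1], [0, 0]]
    y_pred = [[1, 0], [0, 1], [0, 1]]
    scores = per_class_f1(y_true, y_pred)  # each class scores 2/3
    print(scores, macro_f1(scores))
```

As a worked check on the table: the eight listed per-class scores sum to 4.968, whose mean over eight is about 0.621, while 4.968 / 9 = 0.552 matches the reported macro F1. The reported figure therefore appears to be averaged over nine classes, one of which is not listed above.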

🚀 Usage Example

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ThingsAI/Quark-Mod-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Label names, in the order of the model's output logits
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

# Predict
def moderate(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        outputs = model(**inputs)
    # Multi-label: each class is an independent binary decision.
    # logit > 0 is equivalent to sigmoid(logit) > 0.5.
    predictions = (outputs.logits > 0).int()[0]
    detected = [labels[i] for i, v in enumerate(predictions) if v == 1]
    return detected if detected else ["clean"]

# Test
print(moderate("I love this community!"))            # ['clean']
print(moderate("You are an idiot and should die"))   # ['toxic', 'insult']
print(moderate("Nice post, thanks for sharing"))     # ['clean']
```
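The `outputs.logits > 0` check in `moderate` treats each class as an independent yes/no decision: a logit above 0 is mathematically the same as a sigmoid probability above 0.5. The framework-free sketch below spells that rule out on plain Python floats. The function names and the `threshold` parameter are illustrative assumptions, not part of the model's API, and the example logits are made up rather than real model outputs.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so this cannot overflow
    return e / (1.0 + e)


def decide(logits: list[float], labels: list[str], threshold: float = 0.5) -> list[str]:
    """Return every label whose sigmoid probability exceeds `threshold`,
    or ["clean"] if none does (same rule as `moderate` at threshold 0.5)."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one label name per logit")
    detected = [name for name, z in zip(labels, logits) if sigmoid(z) > threshold]
    return detected if detected else ["clean"]


if __name__ == "__main__":
    names = ["toxic", "severe_toxic", "insult"]
    print(decide([2.0, -1.5, 0.3], names))    # ['toxic', 'insult']
    print(decide([-2.0, -3.0, -0.1], names))  # ['clean']
```

Checking that the number of logits matches the number of label names guards against silently dropping a class when the two lists disagree.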