---
language: en
license: cc-by-nc-4.0
library_name: transformers
tags:
- moderation
- toxicity
- content-moderation
- safety
- quark
- multi-label-classification
- jigsaw
- hate-speech
- italian-ai
pipeline_tag: text-classification
metrics:
- f1
- macro-f1
base_model: ThingAI/Quark-135m
model_name: Quark-Mod-v0.1
pretty_name: Quark-Mod-v0.1
size_categories: 135M
task_categories:
- text-classification
---

# Quark-Mod-v0.1

**A 135M parameter content moderation model fine-tuned from Quark-135M**

[![ThingsAI](https://img.shields.io/badge/🛡️%20ThingsAI-Research-blue)](https://things-ai.org)
[![License](https://img.shields.io/badge/License-CC_BY_NC_4.0-lightgrey)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://python.org)
[![Transformers](https://img.shields.io/badge/🤗%20Transformers-latest-orange)](https://huggingface.co/docs/transformers)

---

## 📋 Model Overview

| Property | Value |
|----------|-------|
| **Full name** | Quark-Mod-v0.1 |
| **Base model** | [Quark-135M](https://huggingface.co/ThingAI/Quark-135m) (pretrained from scratch) |
| **Architecture** | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
| **Parameters** | 135M |
| **Context length** | 2048 tokens |
| **Task** | Multi-label content moderation (9 classes) |
| **Language** | English (v0.1) |

---

## 🎯 Intended Use

This model is designed to **classify toxic and harmful content** across 9 categories. It is intended for:

- ✅ Social media moderation
- ✅ Comment filtering systems
- ✅ Content safety pipelines
- ✅ Research on efficient moderation models

### Limitations

- ⚠️ English only (v0.1)
- ⚠️ May struggle with subtle sarcasm or highly contextual toxicity
- ⚠️ Lower performance on rare classes due to dataset imbalance
- ⚠️ Not recommended for high-stakes decisions without human review

---

## 🏷️ Labels (9 classes)

| Label | Description | Training examples |
|-------|-------------|-------------------|
| `toxic` | General toxic content | 32,263 (19.4%) |
| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
| `obscene` | Obscene/profane language | 7,567 (4.6%) |
| `threat` | Direct threats | 445 (0.3%) |
| `insult` | Insulting content | 7,065 (4.3%) |
| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
| `offensive` | Offensive language | 17,274 (10.4%) |

**Note:** This is multi-label classification: multiple classes can be active for the same text. The table names eight of the nine classes; the ninth is not listed in this card.

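To make the multi-label note concrete, here is a minimal sketch of how independent per-label scores are usually turned into a set of active labels. It assumes the common setup of one sigmoid per logit with a fixed threshold; the helper names `sigmoid` and `decode_multilabel` and the example logits are hypothetical and are not taken from the model's code or outputs.

```python
import math

# The eight label names listed in the table above.
LABELS = ["toxic", "severe_toxic", "obscene", "threat",
          "insult", "identity_hate", "hate_speech", "offensive"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return every label whose own probability exceeds the threshold.

    Hypothetical helper for illustration: each logit is scored
    independently, so zero, one, or several labels can be returned.
    """
    return [name for name, logit in zip(labels, logits)
            if sigmoid(logit) > threshold]


# Made-up logits for a single comment (not real model output).
example_logits = [2.1, -3.0, 0.4, -4.2, 1.3, -2.5, -1.9, 0.8]
print(decode_multilabel(example_logits))
# ['toxic', 'obscene', 'insult', 'offensive']
```

A threshold of 0.5 on the sigmoid output is the same as checking whether the logit is greater than 0. For rare classes, a lower per-class threshold may trade precision for recall, but that would need to be tuned on validation data.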
---

## 📊 Evaluation Results

**Validation set:** 18,436 examples

| Class | F1 Score |
|-------|----------|
| `toxic` | **0.909** |
| `offensive` | **0.938** |
| `obscene` | **0.796** |
| `insult` | **0.721** |
| `severe_toxic` | **0.498** |
| `identity_hate` | **0.415** |
| `hate_speech` | **0.319** |
| `threat` | **0.372** |

| Metric | Score |
|--------|-------|
| **Macro F1** | **0.552** |
| **Validation Loss** | **0.037** |

---

## 🚀 Usage Example

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ThingsAI/Quark-Mod-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Labels (this card names only eight of its nine classes)
labels = ["toxic", "severe_toxic", "obscene", "threat",
          "insult", "identity_hate", "hate_speech", "offensive"]


# Predict
def moderate(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        outputs = model(**inputs)
    # A logit above 0 corresponds to a sigmoid probability above 0.5
    predictions = (outputs.logits[0] > 0).tolist()
    # Fall back to a generic name for any output index without a listed label,
    # so an active unnamed class does not raise an IndexError
    detected = [labels[i] if i < len(labels) else f"label_{i}"
                for i, active in enumerate(predictions) if active]
    return detected if detected else ["clean"]


# Test
print(moderate("I love this community!"))           # ['clean']
print(moderate("You are an idiot and should die"))  # ['toxic', 'insult']
print(moderate("Nice post, thanks for sharing"))    # ['clean']