How to use ThingAI/Quark-Mod with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="ThingAI/Quark-Mod")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-Mod")
model = AutoModelForSequenceClassification.from_pretrained("ThingAI/Quark-Mod")

| Property | Value |
|---|---|
| Full name | Quark-Mod-v0.1 |
| Base model | Quark-135M (pretrained from scratch) |
| Architecture | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
| Parameters | 135M |
| Context length | 2048 tokens |
| Task | Multi-label content moderation (9 classes) |
| Language | English (v0.1) |
This model is designed to classify toxic and harmful content across 9 categories. It is intended for:
| Label | Description | Training examples |
|---|---|---|
| `toxic` | General toxic content | 32,263 (19.4%) |
| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
| `obscene` | Obscene/profane language | 7,567 (4.6%) |
| `threat` | Direct threats | 445 (0.3%) |
| `insult` | Insulting content | 7,065 (4.3%) |
| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
| `offensive` | Offensive language | 17,274 (10.4%) |

Note: This is multi-label classification, so multiple classes can be active simultaneously.
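To make the note concrete, here is a minimal sketch of why several labels can fire at once. It assumes each class logit is scored independently with a sigmoid, which is what the `logits > 0` rule in the usage example suggests; the logit values are hypothetical and do not come from the model.

```python
import math

def sigmoid(x):
    """Map a single logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Single-label alternative: probabilities compete and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three of the labels (illustration only).
logits = {"toxic": 2.0, "insult": 1.5, "threat": -1.0}

# Multi-label: every class gets its own yes/no decision.
independent = {name: sigmoid(v) for name, v in logits.items()}
active = [name for name, p in independent.items() if p > 0.5]

# Single-label contrast: softmax forces the classes to share one unit of mass.
competing = softmax(list(logits.values()))

print(active)  # ['toxic', 'insult']
```

Because the sigmoid scores are independent, they need not sum to 1, so `toxic` and `insult` are both reported here.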
Validation set: 18,436 examples
| Class | F1 Score |
|---|---|
| `toxic` | 0.909 |
| `offensive` | 0.938 |
| `obscene` | 0.796 |
| `insult` | 0.721 |
| `severe_toxic` | 0.498 |
| `identity_hate` | 0.415 |
| `hate_speech` | 0.319 |
| `threat` | 0.372 |

| Metric | Score |
|---|---|
| Macro F1 | 0.552 |
| Validation Loss | 0.037 |
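Macro F1 is the unweighted mean of the per-class F1 scores. The sketch below redoes that arithmetic from the eight per-class scores listed above. It shows that the eight scores alone average 0.621, while their sum divided by 9 gives the reported 0.552. That would be consistent with a ninth class contributing 0 to the average, but the card does not state this, so treat it as an observation rather than an explanation.

```python
# Per-class F1 scores copied from the table above.
f1_by_class = {
    "toxic": 0.909,
    "offensive": 0.938,
    "obscene": 0.796,
    "insult": 0.721,
    "severe_toxic": 0.498,
    "identity_hate": 0.415,
    "hate_speech": 0.319,
    "threat": 0.372,
}

total = sum(f1_by_class.values())

# Mean over the eight classes that are actually listed.
mean_over_listed = total / len(f1_by_class)

# Same sum divided by the 9 classes named in the model summary
# (assumption: the unlisted class adds 0 to the sum).
mean_over_nine = total / 9

print(round(mean_over_listed, 3))  # 0.621
print(round(mean_over_nine, 3))    # 0.552
```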
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ThingAI/Quark-Mod"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Labels (index i corresponds to logit i)
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

# Predict
def moderate(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        outputs = model(**inputs)
    # A logit above 0 corresponds to a sigmoid probability above 0.5
    predictions = (outputs.logits > 0).int()[0]
    detected = [labels[i] for i, v in enumerate(predictions) if v == 1]
    return detected if detected else ["clean"]

# Test
print(moderate("I love this community!"))  # ['clean']
print(moderate("You are an idiot and should die"))  # ['toxic', 'insult']
print(moderate("Nice post, thanks for sharing"))  # ['clean']
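The `moderate` function above uses a fixed cutoff of logit 0 (probability 0.5) for every label. Since the rarer classes score lower F1, a deployment may want stricter or looser cutoffs per label. The following is a sketch of one way to do that in logit space. The helper names `prob_to_logit` and `decode`, the example logits, and the threshold values are all hypothetical and are not part of the model or of Transformers.

```python
import math

# Label order as given in the usage example above.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]

def prob_to_logit(p):
    """Inverse sigmoid: the logit at which sigmoid(logit) equals p."""
    return math.log(p / (1.0 - p))

def decode(logits, threshold=0.5, per_label=None):
    """Return labels whose logit exceeds a probability threshold.

    `per_label` optionally overrides the threshold for specific labels.
    zip() stops at the shorter sequence, so any extra logits are ignored.
    """
    per_label = per_label or {}
    detected = []
    for label, logit in zip(LABELS, logits):
        cutoff = prob_to_logit(per_label.get(label, threshold))
        if logit > cutoff:
            detected.append(label)
    return detected if detected else ["clean"]

# Hypothetical logits for one input (illustration only).
example = [1.2, -3.0, -0.5, -4.0, 0.3, -2.0, -2.5, 0.1]

print(decode(example))                 # ['toxic', 'insult', 'offensive']
print(decode(example, threshold=0.7))  # ['toxic']
```

With the real model, the input would be something like `outputs.logits[0].tolist()`. Suitable thresholds would need to be chosen on validation data; the values shown here are placeholders.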
Base model: ThingAI/Quark-135m