	---
	language: en
	license: cc-by-nc-4.0
	library_name: transformers
	tags:
	- moderation
	- toxicity
	- content-moderation
	- safety
	- quark
	- multi-label-classification
	- jigsaw
	- hate-speech
	- italian-ai
	pipeline_tag: text-classification
	metrics:
	- f1
	- macro-f1
	base_model: ThingAI/Quark-135m
	model_name: Quark-Mod-v0.1
	pretty_name: Quark-Mod-v0.1
	size_categories: 135M
	task_categories:
	- text-classification
	---

	# Quark-Mod-v0.1

	<div align="center">

	A 135M-parameter content moderation model fine-tuned from Quark-135M

	[![ThingsAI](https://img.shields.io/badge/🛡️%20ThingsAI-Research-blue)](https://things-ai.org)
	[![License](https://img.shields.io/badge/License-CC_BY_NC_4.0-lightgrey)](LICENSE)
	[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://python.org)
	[![Transformers](https://img.shields.io/badge/🤗%20Transformers-latest-orange)](https://huggingface.co/docs/transformers)

	</div>

	---

	## 📋 Model Overview

	| Property | Value |
	|----------|-------|
	| Full name | Quark-Mod-v0.1 |
	| Base model | [Quark-135M](https://huggingface.co/ThingAI/Quark-135m) (pretrained from scratch) |
	| Architecture | Decoder-only, GQA (9:3), SwiGLU, RoPE, RMSNorm |
	| Parameters | 135M |
	| Context length | 2048 tokens |
	| Task | Multi-label content moderation (9 classes) |
	| Language | English (v0.1) |
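
The architecture row lists grouped-query attention (GQA) with a 9:3 ratio, read here as 9 query heads sharing 3 key/value heads: each KV head serves a group of 3 query heads, so the per-token key/value cache is a third of what full multi-head attention with 9 heads would need. Below is a minimal PyTorch sketch of one causal GQA layer. It assumes toy dimensions (this card does not state the hidden size) and omits RoPE and the output projection; the function name and weight layout are hypothetical, not the model's actual code.

```python
import torch


def grouped_query_attention(x, w_q, w_k, w_v, n_q_heads=9, n_kv_heads=3):
    """Causal grouped-query attention (hypothetical sketch, no RoPE or output projection).

    x:        (batch, seq, d_model)
    w_q:      (d_model, n_q_heads * head_dim)
    w_k, w_v: (d_model, n_kv_heads * head_dim)
    """
    bsz, seq, _ = x.shape
    head_dim = w_q.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per shared KV head (3 for 9:3)

    # Project and split into heads: (batch, heads, seq, head_dim)
    q = (x @ w_q).view(bsz, seq, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ w_k).view(bsz, seq, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ w_v).view(bsz, seq, n_kv_heads, head_dim).transpose(1, 2)

    # Reuse each KV head for `group` consecutive query heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    # Scaled dot-product attention with a causal mask (no attending to future tokens)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    future = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    out = torch.softmax(scores, dim=-1) @ v

    # Merge heads back: (batch, seq, n_q_heads * head_dim)
    return out.transpose(1, 2).reshape(bsz, seq, n_q_heads * head_dim)


# Toy example: head_dim = 8, so queries use 9 * 8 = 72 dims and keys/values 3 * 8 = 24
torch.manual_seed(0)
x = torch.randn(2, 5, 72)
out = grouped_query_attention(x, torch.randn(72, 72), torch.randn(72, 24), torch.randn(72, 24))
print(out.shape)  # torch.Size([2, 5, 72])
```

Setting `n_kv_heads` equal to `n_q_heads` turns the same function into ordinary multi-head attention, which is a quick way to see that GQA only changes how many key/value projections are shared.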

	---

	## 🎯 Intended Use

	This model is designed to classify toxic and harmful content across 9 categories. It is intended for:

	- ✅ Social media moderation
	- ✅ Comment filtering systems
	- ✅ Content safety pipelines
	- ✅ Research on efficient moderation models

	### Limitations
	- ⚠️ English only (v0.1)
	- ⚠️ May struggle with subtle sarcasm or highly contextual toxicity
	- ⚠️ Lower performance on rare classes due to dataset imbalance
	- ⚠️ Not recommended for high-stakes decisions without human review
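
One way to act on the last point is to take automatic action only on confident predictions and send borderline scores to a person. The sketch below assumes you already have the raw logits for one text (as returned by the model in the usage example further down). `triage`, its two thresholds (0.9 and 0.5) and the route names are hypothetical; thresholds should be tuned on your own validation data, not taken from this card.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def triage(logits, labels, flag_at=0.9, review_at=0.5):
    """Hypothetical routing policy for one text.

    - any label with review_at <= p < flag_at -> "human_review"
    - otherwise any label with p >= flag_at   -> "auto_flag"
    - otherwise                               -> "allow"
    Returns (route, labels that triggered it).
    """
    probs = {name: sigmoid(z) for name, z in zip(labels, logits)}
    confident = sorted(n for n, p in probs.items() if p >= flag_at)
    borderline = sorted(n for n, p in probs.items() if review_at <= p < flag_at)
    if borderline:
        return "human_review", confident + borderline
    if confident:
        return "auto_flag", confident
    return "allow", []


labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]
# toxic is confident (p ~ 0.95) but obscene is borderline (p ~ 0.62)
print(triage([3.0, -5.0, 0.5, -5.0, -5.0, -5.0, -5.0, -5.0], labels))
# ('human_review', ['toxic', 'obscene'])
```

Rare classes such as `threat` and `hate_speech` have the lowest F1 in the evaluation below, so a wider review band for those labels may be worth considering.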

	---

	## 🏷️ Labels (9 classes)

	| Label | Description | Training examples |
	|-------|-------------|-------------------|
	| `toxic` | General toxic content | 32,263 (19.4%) |
	| `severe_toxic` | Severe toxicity | 1,423 (0.9%) |
	| `obscene` | Obscene/profane language | 7,567 (4.6%) |
	| `threat` | Direct threats | 445 (0.3%) |
	| `insult` | Insulting content | 7,065 (4.3%) |
	| `identity_hate` | Hate targeting identity | 1,263 (0.8%) |
	| `hate_speech` | Explicit hate speech | 1,265 (0.8%) |
	| `offensive` | Offensive language | 17,274 (10.4%) |

	Note: this is multi-label classification, so several classes can be active for the same text at once.
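
Because the labels are not mutually exclusive, a target for one text can be represented as a multi-hot vector with one 0/1 slot per label, and each output is judged on its own rather than competing in a softmax (the usage example below likewise thresholds every logit separately). A small sketch of that encoding, assuming the eight label names and order listed in the table above match the model's outputs; the helper names are hypothetical.

```python
# Label order as listed in this card (assumed to match the model's output order)
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate", "hate_speech", "offensive"]


def to_multi_hot(active, labels=LABELS):
    """Encode a set of active label names as a 0/1 vector in `labels` order."""
    unknown = set(active) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in labels]


def from_multi_hot(vector, labels=LABELS):
    """Decode a 0/1 vector back into the list of active label names."""
    return [name for name, flag in zip(labels, vector) if flag]


target = to_multi_hot({"toxic", "obscene", "insult"})
print(target)                  # [1, 0, 1, 0, 1, 0, 0, 0]
print(from_multi_hot(target))  # ['toxic', 'obscene', 'insult']
```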

	---

	## 📊 Evaluation Results

	Validation set: 18,436 examples

	| Class | F1 Score |
	|-------|----------|
	| `toxic` | 0.909 |
	| `offensive` | 0.938 |
	| `obscene` | 0.796 |
	| `insult` | 0.721 |
	| `severe_toxic` | 0.498 |
	| `identity_hate` | 0.415 |
	| `hate_speech` | 0.319 |
	| `threat` | 0.372 |

	| Metric | Score |
	|--------|-------|
	| Macro F1 | 0.552 |
	| Validation Loss | 0.037 |
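
Macro F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. As a consistency check on the published numbers: averaging the eight per-class scores listed above gives about 0.621, while the reported 0.552 matches an average over nine classes in which the ninth, unlisted class scores close to 0. The `0.0` used below is an assumption made only for this check.

```python
# Per-class F1 scores copied from the table above (eight named classes)
per_class_f1 = {
    "toxic": 0.909, "offensive": 0.938, "obscene": 0.796, "insult": 0.721,
    "severe_toxic": 0.498, "identity_hate": 0.415, "hate_speech": 0.319, "threat": 0.372,
}


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally, however rare."""
    scores = list(scores)
    return sum(scores) / len(scores)


listed = list(per_class_f1.values())
print(round(macro_f1(listed), 3))          # 0.621  (mean over the 8 listed classes)
print(round(macro_f1(listed + [0.0]), 3))  # 0.552  (mean over 9 classes, ninth assumed 0.0)
```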

	---

	## 🚀 Usage Example

	```python
	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	
	# Load model and tokenizer
	model_name = "ThingsAI/Quark-Mod-v0.1"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	model.eval()  # inference mode (disables dropout)
	
	# Label names in output order, as listed in this card
	labels = ["toxic", "severe_toxic", "obscene", "threat", "insult",
	          "identity_hate", "hate_speech", "offensive"]
	
	# Predict
	def moderate(text):
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
	    with torch.no_grad():
	        outputs = model(**inputs)
	    # Multi-label: each logit is thresholded on its own; logit > 0 is the same as sigmoid(logit) > 0.5
	    predictions = (outputs.logits > 0).int()[0].tolist()
	    # zip stops at the shorter sequence, so an output without a name in `labels` is skipped
	    detected = [name for name, flag in zip(labels, predictions) if flag == 1]
	    return detected if detected else ["clean"]
	
	# Test
	print(moderate("I love this community!"))  # ['clean']
	print(moderate("You are an idiot and should die"))  # ['toxic', 'insult']
	print(moderate("Nice post, thanks for sharing"))  # ['clean']
	```