---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
- modernbert
- text-classification
- spam-detection
- automation-detection
- long-context
- pytorch
- safetensors
language:
- en
metrics:
- f1
- precision
- recall
---
# raga - Mahoraga from anime (his ability is to adapt to nature itself)

A tiny, spicy ModernBERT classifier for text-risk signals - made by @PotatoOff

> Potato did not write a README, so this one appeared by magic!
## What does it classify?
Probably text / account-behavior risk labels, inferred from the eval table:

- `transactional_spam`: spammy transactional or promo-style content
- `extractive_presence`: likely copy/extraction/presence-pattern signal
- `engagement_automation`: botty engagement / automated interaction signal
- `account_farming`: account-growth or farming behavior signal

Exact label semantics depend on the training data.
## Model
- Base: `answerdotai/ModernBERT-base`
- Type: ModernBERT sequence classifier
- Context: up to 8,192 tokens
- Best for: classification, moderation-ish filters, long text scoring
## Eval snapshot
| Label | F1 | Precision | Recall | Notes |
|---|---:|---:|---:|---|
| `transactional_spam` | 0.94 | 0.89 | 0.99 | 🟢 Excellent |
| `extractive_presence` | 0.84 | 0.73 | 0.99 | 🟢 Great recall |
| `engagement_automation` | 0.65 | 0.53 | 0.85 | 🟡 Precision weak |
| `account_farming` | 0.62 | 0.61 | 0.63 | 🟡 Hardest label |
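
The F1 column is consistent with the precision and recall columns: F1 is the harmonic mean of the two, `F1 = 2PR / (P + R)`. A quick sanity check in plain Python:

```python
# Recompute F1 from the precision/recall pairs in the eval table above.
rows = {
    "transactional_spam": (0.89, 0.99),
    "extractive_presence": (0.73, 0.99),
    "engagement_automation": (0.53, 0.85),
    "account_farming": (0.61, 0.63),
}

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for label, (p, r) in rows.items():
    print(f"{label}: F1 = {f1(p, r):.2f}")
```

All four values round to the F1 column reported above.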
## Install
```bash
pip install -U "transformers>=4.48.0" torch
```
Optional GPU speedup:
```bash
pip install flash-attn
```
## Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_id = "WeReCooking/raga"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else None,
    device_map="auto" if torch.cuda.is_available() else None,
    # attn_implementation="flash_attention_2",  # optional, if installed
)

text = "paste text to classify here"
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=getattr(model.config, "max_position_embeddings", 8192),
)
# ModernBERT does not need token_type_ids
inputs.pop("token_type_ids", None)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits[0].float()

id2label = {int(k): v for k, v in model.config.id2label.items()}
multi = getattr(model.config, "problem_type", None) == "multi_label_classification"
scores = torch.sigmoid(logits) if multi else torch.softmax(logits, dim=-1)
for i, score in sorted(enumerate(scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{id2label.get(i, str(i))}: {score:.4f}")
```
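
For inputs longer than the 8,192-token window, one common approach (not shipped with this model, just a sketch) is to score overlapping chunks and max-pool per-label scores. Here `score_chunk` is a hypothetical stand-in for the model call above:

```python
from typing import Callable, Dict, List

def chunk_ids(ids: List[int], max_len: int = 8192, stride: int = 4096) -> List[List[int]]:
    """Split a token-id sequence into overlapping windows of max_len tokens."""
    if len(ids) <= max_len:
        return [ids]
    chunks = []
    for start in range(0, len(ids), stride):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break
    return chunks

def score_long(ids: List[int], score_chunk: Callable[[List[int]], Dict[str, float]]) -> Dict[str, float]:
    """Score each chunk, then keep the max score seen per label."""
    merged: Dict[str, float] = {}
    for chunk in chunk_ids(ids):
        for label, s in score_chunk(chunk).items():
            merged[label] = max(merged.get(label, 0.0), s)
    return merged
```

Max-pooling suits risk signals (one spammy chunk should flag the whole text); mean-pooling would dilute a localized signal.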
## Notes
- Use threshold `0.50` for multi-label as a starting point, then tune per label.
- `transactional_spam` looks strong.
- `engagement_automation` and `account_farming` probably need calibration before serious use.
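
Per-label tuning can be as simple as a threshold dict over the sigmoid scores from the inference snippet. The values below are illustrative guesses, not tuned thresholds:

```python
from typing import Dict

# Illustrative thresholds only -- tune on your own validation data.
THRESHOLDS: Dict[str, float] = {
    "transactional_spam": 0.50,
    "extractive_presence": 0.50,
    "engagement_automation": 0.70,  # weak precision: demand a higher score
    "account_farming": 0.65,
}

def flag_labels(scores: Dict[str, float]) -> Dict[str, bool]:
    """Apply a per-label threshold (default 0.50) to per-label sigmoid scores."""
    return {label: s >= THRESHOLDS.get(label, 0.50) for label, s in scores.items()}
```

Raising a threshold trades recall for precision, which is the direction the weaker labels above likely need.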