---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
- modernbert
- text-classification
- spam-detection
- automation-detection
- long-context
- pytorch
- safetensors
language:
- en
metrics:
- f1
- precision
- recall
---
# raga - Mahoraga from anime (his ability is to adapt to nature itself)
A tiny spicy ModernBERT classifier for text-risk signals - Made by @PotatoOff
> Potato did not write a README, so this appeared by magic!
## What does it classify?
Probably text and account-behavior risk labels, inferred from the eval table below:
- `transactional_spam` — spammy transactional or promo-style content
- `extractive_presence` — likely copy/extraction/presence-pattern signal
- `engagement_automation` — botty engagement / automated interaction signal
- `account_farming` — account-growth or farming behavior signal
Exact label semantics depend on the training data.
## Model
- Base: `answerdotai/ModernBERT-base`
- Type: ModernBERT sequence classifier
- Context: up to 8,192 tokens
- Best for: classification, moderation-ish filters, long text scoring
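For inputs past the 8,192-token window, one common option is to split the token ids into overlapping chunks, score each chunk, and pool the per-label scores (max-pooling is typical for risk signals). A minimal pure-Python sketch of the chunking step; the helper name, `max_len`, and `stride` are illustrative defaults, not settings shipped with this model:

```python
# Hedged sketch: overlapping-window chunking for over-length inputs.
def chunk_token_ids(ids, max_len=8192, stride=4096):
    """Split a token-id list into windows of max_len, advancing by stride."""
    chunks, start = [], 0
    while start < len(ids):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):  # last window already covers the tail
            break
        start += stride
    return chunks
```

Score each chunk independently, then take the per-label max across chunks as the document-level score.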
## Eval snapshot
| Label | F1 | Precision | Recall | Notes |
|---|---:|---:|---:|---|
| `transactional_spam` | 0.94 | 0.89 | 0.99 | 🟢 Excellent |
| `extractive_presence` | 0.84 | 0.73 | 0.99 | 🟢 Great recall |
| `engagement_automation` | 0.65 | 0.53 | 0.85 | 🟡 Precision weak |
| `account_farming` | 0.62 | 0.61 | 0.63 | 🟡 Hardest label |
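The F1 column is consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick arithmetic check against the table:

```python
# Sanity-check the eval table: F1 is the harmonic mean of precision and recall.
eval_table = {  # label: (precision, recall), copied from the table above
    "transactional_spam": (0.89, 0.99),
    "extractive_presence": (0.73, 0.99),
    "engagement_automation": (0.53, 0.85),
    "account_farming": (0.61, 0.63),
}
f1_scores = {
    label: round(2 * p * r / (p + r), 2) for label, (p, r) in eval_table.items()
}
print(f1_scores)
# {'transactional_spam': 0.94, 'extractive_presence': 0.84,
#  'engagement_automation': 0.65, 'account_farming': 0.62}
```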
## Install
```bash
pip install -U "transformers>=4.48.0" torch
```
Optional GPU speedup:
```bash
pip install flash-attn
```
## Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_id = "WeReCooking/raga"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
model_id,
torch_dtype=torch.bfloat16 if torch.cuda.is_available() else None,
device_map="auto" if torch.cuda.is_available() else None,
# attn_implementation="flash_attention_2", # optional, if installed
)
text = "paste text to classify here"
inputs = tokenizer(
text,
return_tensors="pt",
truncation=True,
max_length=getattr(model.config, "max_position_embeddings", 8192),
)
# ModernBERT does not need token_type_ids
inputs.pop("token_type_ids", None)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
logits = model(**inputs).logits[0].float()
id2label = {int(k): v for k, v in model.config.id2label.items()}
multi = getattr(model.config, "problem_type", None) == "multi_label_classification"
scores = torch.sigmoid(logits) if multi else torch.softmax(logits, dim=-1)
for i, score in sorted(enumerate(scores.tolist()), key=lambda x: x[1], reverse=True):
print(f"{id2label.get(i, str(i))}: {score:.4f}")
```
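The sigmoid-vs-softmax branch in the snippet above matters: softmax makes labels compete (the probabilities sum to 1, so exactly one label tends to dominate), while sigmoid scores each label independently, so several labels can fire on the same text. A standalone illustration with toy logits:

```python
import math

def softmax(logits):
    # single-label: scores compete and always sum to 1
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # multi-label: each label scored independently in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 1.5, -1.0]
probs = softmax(logits)                # sums to 1.0, one label dominates
scores = [sigmoid(x) for x in logits]  # first two both exceed 0.5
```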
## Notes
- A threshold of `0.50` per label is a reasonable multi-label starting point; tune each label separately.
- `transactional_spam` looks strong out of the box.
- `engagement_automation` and `account_farming` probably need calibration before serious use.
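Per-label calibration can be as simple as sweeping a threshold grid over held-out scores and keeping the F1-maximizing value for each label. A hedged sketch; the helper name and grid are illustrative, not part of this repo:

```python
# Hedged sketch: pick the per-label threshold that maximizes F1 on a
# held-out set of (score, 0/1 label) pairs.
def best_threshold(scores, labels, grid=None):
    grid = grid or [i / 100 for i in range(5, 96)]  # 0.05 .. 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # undefined precision/recall at this threshold
        p, r = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * p * r / (p + r)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Run this once per label on validation scores, then apply the per-label thresholds at inference time instead of a flat `0.50`.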