---
license: apache-2.0
base_model: answerdotai/ModernBERT-base
library_name: transformers
pipeline_tag: text-classification
tags:
- modernbert
- text-classification
- spam-detection
- automation-detection
- long-context
- pytorch
- safetensors
language:
- en
metrics:
- f1
- precision
- recall
---
# raga - Mahoraga from anime (his ability is to adapt to nature itself)

A tiny, spicy ModernBERT classifier for text-risk signals - made by @PotatoOff

> Potato did not write a README, so this appeared by magic!
## What does it classify?

Probably text / account-behavior risk labels, inferred from the eval table:

- `transactional_spam` → spammy transactional or promo-style content
- `extractive_presence` → likely copy/extraction/presence-pattern signal
- `engagement_automation` → botty engagement / automated interaction signal
- `account_farming` → account-growth or farming behavior signal

Exact label semantics depend on the training data.
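To see which labels this particular checkpoint actually ships with (and in what order), the mapping lives in the model config. A minimal sketch, assuming the `WeReCooking/raga` model id used in the inference section below:

```python
from transformers import AutoConfig

# Read the classifier head's label mapping straight from the checkpoint config.
# The model id here follows the inference example below; the exact id-to-label
# order depends on how the head was trained.
config = AutoConfig.from_pretrained("WeReCooking/raga")
for idx, label in sorted(config.id2label.items(), key=lambda kv: int(kv[0])):
    print(idx, label)
```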
## Model

- Base: `answerdotai/ModernBERT-base`
- Type: ModernBERT sequence classifier
- Context: up to 8,192 tokens
- Best for: classification, moderation-ish filters, long text scoring
## Eval snapshot

| Label | F1 | Precision | Recall | Notes |
|---|---:|---:|---:|---|
| `transactional_spam` | 0.94 | 0.89 | 0.99 | 🟢 Excellent |
| `extractive_presence` | 0.84 | 0.73 | 0.99 | 🟢 Great recall |
| `engagement_automation` | 0.65 | 0.53 | 0.85 | 🟡 Precision weak |
| `account_farming` | 0.62 | 0.61 | 0.63 | 🟡 Hardest label |
## Install

```bash
pip install -U "transformers>=4.48.0" torch
```

Optional GPU speedup:

```bash
pip install flash-attn
```
## Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "WeReCooking/raga"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else None,
    device_map="auto" if torch.cuda.is_available() else None,
    # attn_implementation="flash_attention_2",  # optional, if installed
)

text = "paste text to classify here"
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=getattr(model.config, "max_position_embeddings", 8192),
)

# ModernBERT does not need token_type_ids
inputs.pop("token_type_ids", None)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits[0].float()

id2label = {int(k): v for k, v in model.config.id2label.items()}
multi = getattr(model.config, "problem_type", None) == "multi_label_classification"
scores = torch.sigmoid(logits) if multi else torch.softmax(logits, dim=-1)

for i, score in sorted(enumerate(scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{id2label.get(i, str(i))}: {score:.4f}")
```
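For a quick smoke test without the manual tokenization above, the high-level `pipeline` helper also works. A minimal sketch, assuming the same model id; `top_k=None` asks for a score on every label instead of only the top one:

```python
from transformers import pipeline

# High-level alternative to the snippet above; top_k=None returns every label's score.
clf = pipeline("text-classification", model="WeReCooking/raga", top_k=None)

result = clf("limited time offer!! click the link to claim your reward")
print(result)  # a list of {label, score} entries, one per label
```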
## Notes

- Use threshold `0.50` for multi-label as a starting point, then tune per label.
- `transactional_spam` looks strong.
- `engagement_automation` and `account_farming` probably need calibration before serious use.
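If the head is multi-label (the `sigmoid` branch in the inference snippet), one simple way to act on these notes is a per-label threshold table. The cutoffs below are illustrative placeholders to tune on your own validation data, not values shipped with the model:

```python
# Placeholder cutoffs: 0.50 is the suggested starting point above, with stricter
# values for the weaker labels until you calibrate them on your own data.
thresholds = {
    "transactional_spam": 0.50,
    "extractive_presence": 0.50,
    "engagement_automation": 0.70,  # weak precision, so demand a higher score
    "account_farming": 0.70,        # hardest label per the eval snapshot
}

def flag_labels(scores_by_label):
    """Return the labels whose score clears the per-label threshold."""
    return [label for label, score in scores_by_label.items()
            if score >= thresholds.get(label, 0.50)]

# With the `id2label` and `scores` variables from the inference snippet:
# scores_by_label = {id2label[i]: s for i, s in enumerate(scores.tolist())}
# print(flag_labels(scores_by_label))
```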