---
library_name: transformers
pipeline_tag: text-classification
license: openrail++
tags:
- text-classification
- toxicity
- roberta
- llada
- distillation
language:
- en
datasets:
- thesofakillers/jigsaw-toxic-comment-classification-challenge
- google/civil_comments
- allenai/real-toxicity-prompts
metrics:
- accuracy
- f1
- precision
- recall
- roc_auc
- pr_auc
---

# roberta_toxicity_classifier_LLaDA

Binary toxicity classifier for LLaDA-tokenized text.

This model is a RoBERTa-style sequence classifier using the `GSAI-ML/LLaDA-8B-Base` tokenizer vocabulary. It predicts:

- `neutral`
- `toxic`

## Usage

This repo includes custom modeling code, so load with `trust_remote_code=True`.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "kl1/roberta_toxicity_classifier_LLaDA"

# trust_remote_code=True is needed because the repo ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_fast=True,
)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    trust_remote_code=True,
).eval()

texts = [
    "I hope you have a wonderful day.",
    "You are disgusting and should disappear.",
]
inputs = tokenizer(
    texts,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

# Score without tracking gradients, then read off the toxic-class probability.
with torch.inference_mode():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

toxic_id = model.config.label2id["toxic"]
print(probs[:, toxic_id].tolist())
```

The tokenizer prepends the required `[CLS]` token by default.

## Training

The student classifier was initialized from and distilled against `s-nlp/roberta_toxicity_classifier`.

Objective:

- supervised binary toxicity classification
- teacher KL distillation with `kl_weight=0.2`

Training configuration and run metadata are included in:

- `distill_config.yaml`
- `training_summary.json`

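
The two-part objective above can be sketched as a single loss function. This is a minimal illustration, not the training code from this repo: the function name `distillation_loss`, the additive combination with a cross-entropy weight of 1, the KL direction (teacher to student), and the absence of a softmax temperature are all assumptions. The authoritative settings are in `distill_config.yaml`.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, kl_weight=0.2):
    """Hypothetical sketch: supervised cross-entropy plus a weighted KL term."""
    # Supervised term: cross-entropy of the student against the gold labels.
    ce = F.cross_entropy(student_logits, labels)
    # Distillation term: KL(teacher || student) over the class distributions.
    # F.kl_div takes the student's log-probabilities as input; with
    # log_target=True the teacher is passed as log-probabilities too.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.log_softmax(teacher_logits, dim=-1),
        reduction="batchmean",
        log_target=True,
    )
    # Assumed combination: unweighted CE plus kl_weight times the KL term.
    return ce + kl_weight * kl
```

When the student matches the teacher exactly, the KL term is zero and the loss reduces to plain cross-entropy, which makes the function easy to sanity-check on a toy batch.
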
## Validation Metrics

Checkpoint: step 20000.

| metric | value |
| --- | ---: |
| accuracy | 0.9560 |
| F1 | 0.7445 |
| precision | 0.7127 |
| recall | 0.7794 |
| ROC-AUC | 0.9762 |
| PR-AUC | 0.8328 |

Best validation threshold from sweep: `0.5378`.

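
To turn toxic-class probabilities into labels with the swept threshold, something like the following could be used. It is a sketch under stated assumptions: the helper name `label_from_toxic_prob` is hypothetical, and treating a probability equal to the threshold as `toxic` (`>=` rather than `>`) is a choice made here, not something the card specifies.

```python
TOXIC_THRESHOLD = 0.5378  # best validation threshold reported in this card


def label_from_toxic_prob(toxic_probs, threshold=TOXIC_THRESHOLD):
    """Map toxic-class probabilities to "toxic" / "neutral" labels.

    Hypothetical helper; the >= comparison is an assumption.
    """
    return ["toxic" if p >= threshold else "neutral" for p in toxic_probs]
```

With the usage snippet's output, this would be called as `label_from_toxic_prob(probs[:, toxic_id].tolist())`. A threshold tuned on this validation set may not transfer to other data distributions, so it is worth re-sweeping on in-domain data.
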
## License

Model weights are released under OpenRAIL++.

Third-party notices are listed in `THIRD_PARTY_NOTICES.md`.

## Limitations

This model is intended as a toxicity scorer for research and evaluation workflows. It should not be used as a standalone moderation decision system without additional validation.