---
license: mit
tags:
- toxicity-detection
- roberta
- vicunaUC
- pytorch-lightning
datasets:
- simocorbo/toxicthesis-vicunaUC-dataset
language:
- en
---

# ToxicThesis: RoBERTa Model for vicunaUC

This model is part of the **ToxicThesis** framework for analyzing toxicity in text using multiple neural architectures.

## Model Details

- **Architecture:** RoBERTa
- **System Under Test (SUT):** vicunaUC
- **Task:** Classification (3 classes)
- **Loss Function:** Cross-Entropy
- **Framework:** PyTorch Lightning
- **Input:** Text strings
- **Output:** Class probabilities (3 classes)

## Training Data

This model was trained on the **vicunaUC** dataset, which consists of text samples labeled for toxicity. The training process involved:
- Preprocessing and tokenization appropriate for the architecture
- Data augmentation and balancing techniques
- Validation-based early stopping
- Hyperparameter tuning via grid/random search

## Usage

### Installation

```bash
pip install torch transformers huggingface_hub
```

### Download and Load

```python
from huggingface_hub import hf_hub_download
import torch
from transformers import RobertaTokenizer

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="simocorbo/toxicthesis-vicunauc-roberta-classification-3",
    filename="checkpoints/best.pt"
)

# Load tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Load checkpoint
checkpoint = torch.load(checkpoint_path, map_location='cpu')

# Note: Full model reconstruction requires the ToxicThesis repository
# The model uses RoBERTa with custom classification head
```
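
Before rebuilding the model, it can help to look at what the loaded checkpoint actually contains. The following is a minimal sketch, not the ToxicThesis loading code: it assumes the checkpoint is either a PyTorch Lightning-style dictionary with the weights nested under a `state_dict` key, or a bare mapping of parameter names to tensors. The helper names `extract_state_dict` and `summarize_prefixes`, and all parameter names in the stand-in dictionary, are hypothetical.

```python
from collections import Counter
from typing import Any, Mapping


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the parameter mapping from a loaded checkpoint.

    Assumption: Lightning-style checkpoints nest weights under "state_dict";
    anything else is treated as a bare parameter mapping.
    """
    if "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint


def summarize_prefixes(state_dict: Mapping[str, Any], depth: int = 1) -> Counter:
    """Count parameters per name prefix, e.g. 'encoder' or 'classifier'."""
    return Counter(".".join(name.split(".")[:depth]) for name in state_dict)


# Stand-in for `checkpoint = torch.load(checkpoint_path, map_location='cpu')`.
# The key names below are hypothetical and only illustrate the layout.
example_checkpoint = {
    "epoch": 3,
    "state_dict": {
        "encoder.embeddings.weight": None,
        "encoder.layer.0.weight": None,
        "classifier.weight": None,
        "classifier.bias": None,
    },
}

weights = extract_state_dict(example_checkpoint)
print(summarize_prefixes(weights))
```

Running the same two helpers on the real `checkpoint` shows which top-level modules the model class in the ToxicThesis repository must define before the weights can be loaded.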

### Predict

```python
# Tokenize input
text = "Your text here"
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Run inference. `model` is not defined in this snippet: it must first be
# rebuilt from the checkpoint using the ToxicThesis repository.
with torch.no_grad():
    output = model(**inputs)  # assumed to be raw logits of shape (1, 3)
    # Softmax (not sigmoid) turns 3-class logits into probabilities
    probabilities = torch.softmax(output, dim=-1).squeeze(0)

predicted_class = int(probabilities.argmax())
print(f"Class probabilities: {probabilities.tolist()}")
print(f"Predicted class: {predicted_class}")
```

## Output Interpretation

- **Classification output**: Probabilities for 3 toxicity classes; the class with the highest probability is the prediction
- For a binary decision, apply a threshold to the class probabilities and adjust it for your use case
- Consider the trade-off between precision and recall when setting thresholds
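
The thresholding described above can be sketched as follows. This is an illustration under stated assumptions, not the framework's own logic: it assumes class index 0 means non-toxic and indices 1 and 2 are the toxic classes, which this card does not confirm. The function name `is_toxic` and its default values are hypothetical.

```python
from typing import Sequence


def is_toxic(
    probs: Sequence[float],
    toxic_classes: Sequence[int] = (1, 2),  # assumed mapping, check your labels
    threshold: float = 0.5,
) -> bool:
    """Collapse 3-class probabilities into a binary toxic/non-toxic decision."""
    if len(probs) != 3:
        raise ValueError("expected probabilities for exactly 3 classes")
    # Sum the probability mass assigned to the classes treated as toxic
    toxic_mass = sum(probs[i] for i in toxic_classes)
    return toxic_mass >= threshold


probs = [0.40, 0.35, 0.25]
print(is_toxic(probs))                 # default threshold
print(is_toxic(probs, threshold=0.8))  # stricter threshold
```

Raising `threshold` flags fewer texts, which favors precision; lowering it flags more, which favors recall.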

## Limitations

- Model performance may degrade on out-of-distribution data
- Bias may exist based on the training data characteristics
- Context-dependent toxicity may not always be captured accurately
- Performance varies across different demographic groups and topics

## Ethical Considerations

This model is designed for toxicity detection research and should be used responsibly:
- Do not use for automated censorship without human oversight
- Be aware of potential biases in toxicity detection
- Consider the impact on free speech and expression
- Use in combination with human moderation for production systems

## Training Details

This model was trained as part of the ToxicThesis framework comparing multiple architectures:
- **RNTN** (Recursive Neural Tensor Networks): Compositional semantics via parse trees
- **TreeLSTM**: Tree-structured LSTM networks for hierarchical processing
- **Linear**: FastText embeddings + logistic regression baseline
- **RoBERTa**: Transformer-based pre-trained language model

### Hyperparameters

See `hparams.yaml` for complete training configuration including:
- Learning rate and optimizer settings
- Batch size and number of epochs
- Architecture-specific parameters
- Regularization and dropout rates

## Repository

Full code and training scripts: [ToxicThesis](https://github.com/simo-corbo/ToxicThesis)

For complete usage examples and model reconstruction code, please refer to the repository.

## Citation

```bibtex
@software{toxicthesis2025,
  title={ToxicThesis: Multi-Architecture Toxicity Analysis Framework},
  author={Simone Corbo},
  year={2025},
  url={https://github.com/simo-corbo/ToxicThesis}
}
```

## Files

- `checkpoints/best.pt` - Best model checkpoint (by validation loss)
- `hparams.yaml` - Complete hyperparameter configuration
- `train.csv` - Training metrics per epoch
- `val.csv` - Validation metrics per epoch
- `test.csv` - Final test set evaluation (if available)
- `patterns.json` - Mined syntactic patterns (decision tree structures)
- `README.md` - This documentation
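
Since the best checkpoint is selected by validation loss, the per-epoch metrics in `val.csv` can be used to see which epoch that was. The sketch below uses only the standard library. The column names `epoch` and `val_loss` and the sample values are assumptions made for illustration; check the header of the actual file and adjust them.

```python
import csv
import io
from typing import Tuple


def best_epoch(
    csv_text: str,
    metric: str = "val_loss",  # hypothetical column name
    epoch_col: str = "epoch",  # hypothetical column name
) -> Tuple[int, float]:
    """Return (epoch, value) for the row with the lowest metric value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no metric rows found")
    best = min(rows, key=lambda row: float(row[metric]))
    return int(best[epoch_col]), float(best[metric])


# Made-up sample data standing in for the contents of `val.csv`
sample = "epoch,val_loss\n0,0.91\n1,0.74\n2,0.79\n"
print(best_epoch(sample))
```

For the real file, read its text first, for example with `open("val.csv").read()`, and pass that string to `best_epoch`.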

### Analysis Files (if generated)

- `predictions.csv` - Model predictions on test set
- `word_scores.csv` - Word-level toxicity scores
- `word_toxicity_variance.csv` - Variance analysis per word
- `word_variance_rank.csv` - Ranked words by variance

## Contact

For questions, issues, or contributions, please open an issue on the [ToxicThesis repository](https://github.com/simo-corbo/ToxicThesis).