Metadata
license: mit
tags:
  - toxicity-detection
  - roberta
  - deepseek
  - pytorch-lightning
datasets:
  - simocorbo/toxicthesis-deepseek-dataset
language:
  - en

ToxicThesis: RoBERTa Model for Deepseek

This model is part of the ToxicThesis framework for analyzing toxicity in text using multiple neural architectures.

Model Details

  • Architecture: RoBERTa
  • System Under Test (SUT): deepseek
  • Task: Regression
  • Loss Function: MSE (Mean Squared Error)
  • Framework: PyTorch Lightning
  • Input: Text strings
  • Output: Toxicity score (0-1, continuous)

Training Data

This model was trained on the simocorbo/toxicthesis-deepseek-dataset, which consists of text samples labeled for toxicity. The training process involved:

  • Preprocessing and tokenization appropriate for the architecture
  • Data augmentation and balancing techniques
  • Validation-based early stopping
  • Hyperparameter tuning via grid/random search

Usage

Installation

pip install torch transformers huggingface_hub

Download and Load

from huggingface_hub import hf_hub_download
import torch
from transformers import RobertaTokenizer

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="simocorbo/toxicthesis-deepseek-roberta-regression",
    filename="checkpoints/best.pt"
)

# Load tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Load checkpoint (state dict plus training metadata)
checkpoint = torch.load(checkpoint_path, map_location='cpu')

# Note: full model reconstruction requires the ToxicThesis repository.
# The model uses RoBERTa with a custom regression head.
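The actual model class lives in the ToxicThesis repository. Purely as an illustration, a RoBERTa regressor with a linear head could be sketched as follows; the class name `RobertaRegressor`, the [CLS]-token pooling, and the head shape are assumptions here, not the repository's real code. The demo uses a tiny randomly initialized config so it runs without downloading weights; in practice you would wrap `RobertaModel.from_pretrained('roberta-base')` and load the checkpoint's state dict.

```python
import torch
import torch.nn as nn
from transformers import RobertaConfig, RobertaModel

class RobertaRegressor(nn.Module):
    """Hypothetical sketch: RoBERTa encoder + linear regression head."""

    def __init__(self, encoder: RobertaModel):
        super().__init__()
        self.encoder = encoder
        # Single scalar output; sigmoid maps it into [0, 1] at inference.
        self.head = nn.Linear(encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # representation at the [CLS] position
        return self.head(cls).squeeze(-1)

# Tiny config for a runnable demo (no network access needed).
config = RobertaConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                       num_attention_heads=2, intermediate_size=128)
model = RobertaRegressor(RobertaModel(config))
model.eval()

with torch.no_grad():
    dummy_ids = torch.randint(0, 1000, (1, 8))  # fake token ids
    score = torch.sigmoid(model(dummy_ids)).item()
print(f"Toxicity score: {score:.4f}")
```

The real checkpoint's state dict keys must match the wrapper's module names, which is exactly why the repository's own class is needed for faithful reconstruction.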

Predict

# Tokenize input
text = "Your text here"
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Run inference (requires full model from ToxicThesis repo)
with torch.no_grad():
    output = model(**inputs)
    prediction = torch.sigmoid(output).item()

print(f"Toxicity score: {prediction:.4f}")

Output Interpretation

  • Regression output: A continuous score between 0 and 1, where higher values indicate more toxic content
  • Threshold for binary decisions can be adjusted based on your use case
  • Consider the trade-off between precision and recall when setting thresholds
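For example, binarizing the continuous score is a one-liner; the 0.5 default below is illustrative, not a threshold recommended by the authors:

```python
def to_label(score: float, threshold: float = 0.5) -> str:
    """Binarize a continuous toxicity score.
    Raise the threshold for higher precision, lower it for higher recall."""
    return "toxic" if score >= threshold else "non-toxic"

print(to_label(0.82))                  # -> toxic
print(to_label(0.82, threshold=0.9))  # -> non-toxic (stricter threshold)
```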

Limitations

  • Model performance may degrade on out-of-distribution data
  • Bias may exist based on the training data characteristics
  • Context-dependent toxicity may not always be captured accurately
  • Performance varies across different demographic groups and topics

Ethical Considerations

This model is designed for toxicity detection research and should be used responsibly:

  • Do not use for automated censorship without human oversight
  • Be aware of potential biases in toxicity detection
  • Consider the impact on free speech and expression
  • Use in combination with human moderation for production systems

Training Details

This model was trained as part of the ToxicThesis framework comparing multiple architectures:

  • RNTN (Recursive Neural Tensor Networks): Compositional semantics via parse trees
  • TreeLSTM: Tree-structured LSTM networks for hierarchical processing
  • Linear: FastText embeddings + logistic regression baseline
  • RoBERTa: Transformer-based pre-trained language model

Hyperparameters

See hparams.yaml for complete training configuration including:

  • Learning rate and optimizer settings
  • Batch size and number of epochs
  • Architecture-specific parameters
  • Regularization and dropout rates
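The configuration can be read back with PyYAML. The keys in this snippet are placeholders for illustration only; the actual names come from the repository's hparams.yaml:

```python
import yaml

# Inline stand-in for hparams.yaml; real keys may differ.
example = """
learning_rate: 2.0e-5
batch_size: 16
max_epochs: 10
dropout: 0.1
"""

hparams = yaml.safe_load(example)
print(f"lr={hparams['learning_rate']}, batch_size={hparams['batch_size']}")
```

With the downloaded file, replace the inline string with `yaml.safe_load(open('hparams.yaml'))`.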

Repository

Full code and training scripts: ToxicThesis

For complete usage examples and model reconstruction code, please refer to the repository.

Citation

@software{toxicthesis2025,
  title={ToxicThesis: Multi-Architecture Toxicity Analysis Framework},
  author={Simone Corbo},
  year={2025},
  url={https://github.com/simo-corbo/ToxicThesis}
}

Files

  • checkpoints/best.pt - Best model checkpoint (by validation loss)
  • hparams.yaml - Complete hyperparameter configuration
  • train.csv - Training metrics per epoch
  • val.csv - Validation metrics per epoch
  • test.csv - Final test set evaluation (if available)
  • patterns.json - Mined syntactic patterns (decision tree structures)
  • README.md - This documentation

Analysis Files (if generated)

  • predictions.csv - Model predictions on test set
  • word_scores.csv - Word-level toxicity scores
  • word_toxicity_variance.csv - Variance analysis per word
  • word_variance_rank.csv - Ranked words by variance
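As a sketch of how the per-epoch CSV logs might be consumed, the snippet below picks the best epoch by validation loss. The column names (`epoch`, `val_loss`) and values are assumptions for the demo; check the actual headers in val.csv first.

```python
import csv
import io

# Inline stand-in for val.csv; real column names and values may differ.
val_csv = """epoch,val_loss
0,0.142
1,0.118
2,0.125
"""

rows = list(csv.DictReader(io.StringIO(val_csv)))
best = min(rows, key=lambda r: float(r["val_loss"]))
print(f"Best epoch: {best['epoch']} (val_loss={best['val_loss']})")
# -> Best epoch: 1 (val_loss=0.118)
```

With the downloaded file, pass `open('val.csv')` to `csv.DictReader` instead of the inline string.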

Contact

For questions, issues, or contributions, please open an issue on the ToxicThesis repository.