Metadata
license: mit
tags:
  - toxicity-detection
  - roberta
  - deepseek
  - pytorch-lightning
datasets:
  - simocorbo/toxicthesis-deepseek-dataset
language:
  - en

ToxicThesis: RoBERTa Model for Deepseek

This model is part of the ToxicThesis framework for analyzing toxicity in text using multiple neural architectures.

Model Details

  • Architecture: RoBERTa
  • System Under Test (SUT): deepseek
  • Task: Regression
  • Loss Function: MSE (Mean Squared Error)
  • Framework: PyTorch Lightning
  • Input: Text strings
  • Output: Toxicity score (0-1, continuous)

Training Data

This model was trained on the simocorbo/toxicthesis-deepseek-dataset, which consists of text samples labeled for toxicity. The training process involved:

  • Preprocessing and tokenization appropriate for the architecture
  • Data augmentation and balancing techniques
  • Validation-based early stopping
  • Hyperparameter tuning via grid/random search

Usage

Installation

pip install torch transformers huggingface_hub

Download and Load

from huggingface_hub import hf_hub_download
import torch
from transformers import RobertaTokenizer

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="simocorbo/toxicthesis-deepseek-roberta-regression",
    filename="checkpoints/best.pt"
)

# Load tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Load checkpoint (state dict plus training metadata)
checkpoint = torch.load(checkpoint_path, map_location='cpu')

# Note: full model reconstruction requires the ToxicThesis repository.
# The model uses RoBERTa with a custom regression head.
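The actual model class lives in the ToxicThesis repository. Purely as an illustration, a RoBERTa regressor with a linear head could be sketched as follows; the class name `RobertaRegressor`, the [CLS]-token pooling, and the head shape are assumptions here, not the repository's real code. The demo uses a tiny randomly initialized config so it runs without downloading weights; in practice you would wrap `RobertaModel.from_pretrained('roberta-base')` and load the checkpoint's state dict.

```python
import torch
import torch.nn as nn
from transformers import RobertaConfig, RobertaModel

class RobertaRegressor(nn.Module):
    """Hypothetical sketch: RoBERTa encoder + linear regression head."""

    def __init__(self, encoder: RobertaModel):
        super().__init__()
        self.encoder = encoder
        # Single scalar output; sigmoid maps it into [0, 1] at inference.
        self.head = nn.Linear(encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # representation at the [CLS] position
        return self.head(cls).squeeze(-1)

# Tiny config for a runnable demo (no network access needed).
config = RobertaConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                       num_attention_heads=2, intermediate_size=128)
model = RobertaRegressor(RobertaModel(config))
model.eval()

with torch.no_grad():
    dummy_ids = torch.randint(0, 1000, (1, 8))  # fake token ids
    score = torch.sigmoid(model(dummy_ids)).item()
print(f"Toxicity score: {score:.4f}")
```

The real checkpoint's state dict keys must match the wrapper's module names, which is exactly why the repository's own class is needed for faithful reconstruction.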

Predict

# Tokenize input
text = "Your text here"
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Run inference (requires full model from ToxicThesis repo)
with torch.no_grad():
    output = model(**inputs)
    prediction = torch.sigmoid(output).item()

print(f"Toxicity score: {prediction:.4f}")

Output Interpretation

  • Regression output: A continuous score between 0 and 1, where higher values indicate more toxic content
  • Threshold for binary decisions can be adjusted based on your use case
  • Consider the trade-off between precision and recall when setting thresholds
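For example, binarizing the continuous score is a one-liner; the 0.5 default below is illustrative, not a threshold recommended by the authors:

```python
def to_label(score: float, threshold: float = 0.5) -> str:
    """Binarize a continuous toxicity score.
    Raise the threshold for higher precision, lower it for higher recall."""
    return "toxic" if score >= threshold else "non-toxic"

print(to_label(0.82))                  # -> toxic
print(to_label(0.82, threshold=0.9))  # -> non-toxic (stricter threshold)
```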

Limitations

  • Model performance may degrade on out-of-distribution data
  • Bias may exist based on the training data characteristics
  • Context-dependent toxicity may not always be captured accurately
  • Performance varies across different demographic groups and topics

Ethical Considerations

This model is designed for toxicity detection research and should be used responsibly:

  • Do not use for automated censorship without human oversight
  • Be aware of potential biases in toxicity detection
  • Consider the impact on free speech and expression
  • Use in combination with human moderation for production systems

Training Details

This model was trained as part of the ToxicThesis framework comparing multiple architectures:

  • RNTN (Recursive Neural Tensor Networks): Compositional semantics via parse trees
  • TreeLSTM: Tree-structured LSTM networks for hierarchical processing
  • Linear: FastText embeddings + logistic regression baseline
  • RoBERTa: Transformer-based pre-trained language model

Hyperparameters

See hparams.yaml for complete training configuration including:

  • Learning rate and optimizer settings
  • Batch size and number of epochs
  • Architecture-specific parameters
  • Regularization and dropout rates
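The configuration can be read back with PyYAML. The keys in this snippet are placeholders for illustration only; the actual names come from the repository's hparams.yaml:

```python
import yaml

# Inline stand-in for hparams.yaml; real keys may differ.
example = """
learning_rate: 2.0e-5
batch_size: 16
max_epochs: 10
dropout: 0.1
"""

hparams = yaml.safe_load(example)
print(f"lr={hparams['learning_rate']}, batch_size={hparams['batch_size']}")
```

With the downloaded file, replace the inline string with `yaml.safe_load(open('hparams.yaml'))`.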

Repository

Full code and training scripts: ToxicThesis

For complete usage examples and model reconstruction code, please refer to the repository.

Citation

@software{toxicthesis2025,
  title={ToxicThesis: Multi-Architecture Toxicity Analysis Framework},
  author={Simone Corbo},
  year={2025},
  url={https://github.com/simo-corbo/ToxicThesis}
}

Files

  • checkpoints/best.pt - Best model checkpoint (by validation loss)
  • hparams.yaml - Complete hyperparameter configuration
  • train.csv - Training metrics per epoch
  • val.csv - Validation metrics per epoch
  • test.csv - Final test set evaluation (if available)
  • patterns.json - Mined syntactic patterns (decision tree structures)
  • README.md - This documentation

Analysis Files (if generated)

  • predictions.csv - Model predictions on test set
  • word_scores.csv - Word-level toxicity scores
  • word_toxicity_variance.csv - Variance analysis per word
  • word_variance_rank.csv - Ranked words by variance
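As a sketch of how the per-epoch CSV logs might be consumed, the snippet below picks the best epoch by validation loss. The column names (`epoch`, `val_loss`) and values are assumptions for the demo; check the actual headers in val.csv first.

```python
import csv
import io

# Inline stand-in for val.csv; real column names and values may differ.
val_csv = """epoch,val_loss
0,0.142
1,0.118
2,0.125
"""

rows = list(csv.DictReader(io.StringIO(val_csv)))
best = min(rows, key=lambda r: float(r["val_loss"]))
print(f"Best epoch: {best['epoch']} (val_loss={best['val_loss']})")
# -> Best epoch: 1 (val_loss=0.118)
```

With the downloaded file, pass `open('val.csv')` to `csv.DictReader` instead of the inline string.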

Contact

For questions, issues, or contributions, please open an issue on the ToxicThesis repository.