---
license: mit
tags:
- toxicity-detection
- roberta
- deepseek
- pytorch-lightning
datasets:
- simocorbo/toxicthesis-deepseek-dataset
language:
- en
---

# ToxicThesis: RoBERTa Model for Deepseek

This model is part of the **ToxicThesis** framework for analyzing toxicity in text using multiple neural architectures.

## Model Details

- **Architecture:** RoBERTa
- **System Under Test (SUT):** deepseek
- **Task:** Regression
- **Loss Function:** MSE (Mean Squared Error)
- **Framework:** PyTorch Lightning
- **Input:** Text strings
- **Output:** Toxicity score (0-1, continuous)
## Training Data

This model was trained on the **deepseek** dataset (`simocorbo/toxicthesis-deepseek-dataset`), which consists of text samples labeled for toxicity. The training process involved:

- Preprocessing and tokenization appropriate for the architecture
- Data augmentation and balancing techniques
- Validation-based early stopping
- Hyperparameter tuning via grid/random search

## Usage

### Installation

```bash
pip install torch transformers huggingface_hub
```

### Download and Load

```python
from huggingface_hub import hf_hub_download
import torch
from transformers import RobertaTokenizer

# Download the checkpoint from the Hub
checkpoint_path = hf_hub_download(
    repo_id="simocorbo/toxicthesis-deepseek-roberta-regression",
    filename="checkpoints/best.pt"
)

# Load the tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Load the checkpoint
checkpoint = torch.load(checkpoint_path, map_location='cpu')

# Note: full model reconstruction requires the ToxicThesis repository.
# The model uses RoBERTa with a custom regression head.
```
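The checkpoint holds weights for a RoBERTa encoder plus a regression head, but the actual model class lives in the ToxicThesis repository. The sketch below is a hypothetical reconstruction only: the class name `RobertaRegressor`, the single-linear head, and the use of the first-token representation are assumptions, not the repository's code. It uses a tiny randomly initialized config so it runs without downloading pretrained weights; for real use, build the config from `roberta-base` and load the checkpoint's state dict.

```python
import torch
import torch.nn as nn
from transformers import RobertaConfig, RobertaModel

class RobertaRegressor(nn.Module):
    """Hypothetical reconstruction: RoBERTa encoder + linear regression head."""
    def __init__(self, config: RobertaConfig):
        super().__init__()
        self.encoder = RobertaModel(config)
        self.head = nn.Linear(config.hidden_size, 1)  # one toxicity score per text

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        first_token = out.last_hidden_state[:, 0]  # <s> token representation
        return self.head(first_token).squeeze(-1)  # raw score per example

# Tiny random-init config for illustration (no download needed). For real use:
#   config = RobertaConfig.from_pretrained('roberta-base')
#   model.load_state_dict(checkpoint["state_dict"], strict=False)  # key names may differ
config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                       num_attention_heads=2, intermediate_size=64)
model = RobertaRegressor(config).eval()
with torch.no_grad():
    scores = model(torch.randint(0, 100, (2, 8)))
print(scores.shape)  # torch.Size([2]) — one score per input
```

The checkpoint's state-dict key names depend on the Lightning module that produced it, so a `strict=False` load (or a key-renaming pass) is usually needed when mapping it onto a reconstructed class.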

### Predict

```python
# Tokenize the input
text = "Your text here"
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Run inference (requires the full model from the ToxicThesis repo)
with torch.no_grad():
    output = model(**inputs)
    prediction = torch.sigmoid(output).item()

print(f"Toxicity score: {prediction:.4f}")
```

## Output Interpretation

- **Regression output**: A continuous score between 0 and 1, where higher values indicate more toxic content
- The threshold for binary decisions can be adjusted based on your use case
- Consider the trade-off between precision and recall when setting thresholds
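Turning continuous scores into binary flags is a simple thresholding step. The helper below is purely illustrative (the function name and the 0.5 default are not part of the framework): lowering the threshold favors recall, raising it favors precision.

```python
def classify(scores, threshold=0.5):
    """Map continuous toxicity scores to binary flags.

    The threshold is use-case dependent: lower it to catch more toxic
    text (higher recall), raise it to reduce false positives (higher
    precision).
    """
    return [score >= threshold for score in scores]

scores = [0.12, 0.48, 0.86]
print(classify(scores))                 # [False, False, True]
print(classify(scores, threshold=0.4))  # [False, True, True]
```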

## Limitations

- Model performance may degrade on out-of-distribution data
- Bias may exist based on the training data characteristics
- Context-dependent toxicity may not always be captured accurately
- Performance varies across different demographic groups and topics

## Ethical Considerations

This model is designed for toxicity detection research and should be used responsibly:

- Do not use for automated censorship without human oversight
- Be aware of potential biases in toxicity detection
- Consider the impact on free speech and expression
- Use in combination with human moderation for production systems

## Training Details

This model was trained as part of the ToxicThesis framework, which compares multiple architectures:

- **RNTN** (Recursive Neural Tensor Network): compositional semantics via parse trees
- **TreeLSTM**: tree-structured LSTM networks for hierarchical processing
- **Linear**: FastText embeddings + logistic regression baseline
- **RoBERTa**: transformer-based pre-trained language model

### Hyperparameters

See `hparams.yaml` for the complete training configuration, including:

- Learning rate and optimizer settings
- Batch size and number of epochs
- Architecture-specific parameters
- Regularization and dropout rates

## Repository

Full code and training scripts: [ToxicThesis](https://github.com/simo-corbo/ToxicThesis)

For complete usage examples and model reconstruction code, please refer to the repository.

## Citation

```bibtex
@software{toxicthesis2025,
  title={ToxicThesis: Multi-Architecture Toxicity Analysis Framework},
  author={Simone Corbo},
  year={2025},
  url={https://github.com/simo-corbo/ToxicThesis}
}
```

## Files

- `checkpoints/best.pt` - Best model checkpoint (by validation loss)
- `hparams.yaml` - Complete hyperparameter configuration
- `train.csv` - Training metrics per epoch
- `val.csv` - Validation metrics per epoch
- `test.csv` - Final test set evaluation (if available)
- `patterns.json` - Mined syntactic patterns (decision tree structures)
- `README.md` - This documentation
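The per-epoch metric files are plain CSVs and can be inspected with the standard library. The snippet below parses an inline sample standing in for `train.csv`; the column names `epoch` and `train_loss` are assumptions, so check the actual file header first.

```python
import csv
import io

# Inline sample standing in for train.csv; column names are assumed.
sample = "epoch,train_loss\n0,0.091\n1,0.064\n2,0.058\n"

rows = list(csv.DictReader(io.StringIO(sample)))
best = min(rows, key=lambda r: float(r["train_loss"]))
print(f"Best epoch: {best['epoch']} (loss {best['train_loss']})")
# Best epoch: 2 (loss 0.058)
```

To read the real file, replace `io.StringIO(sample)` with `open("train.csv", newline="")`.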

### Analysis Files (if generated)

- `predictions.csv` - Model predictions on the test set
- `word_scores.csv` - Word-level toxicity scores
- `word_toxicity_variance.csv` - Variance analysis per word
- `word_variance_rank.csv` - Words ranked by variance

## Contact

For questions, issues, or contributions, please open an issue on the [ToxicThesis repository](https://github.com/simo-corbo/ToxicThesis).