Toxic Thesis
Toxicity prediction model trained on the MISTRAL dataset.
| Property | Value |
|---|---|
| Model | MOELINEAR |
| Task | Classification (5 classes) |
| Dataset | MISTRAL |
| Framework | PyTorch / PyTorch Lightning |
MixtureOfExpertsLightning

```python
from src.models.mixture_of_experts import MixtureOfExpertsLightning

MixtureOfExpertsLightning(
    expert_configs: List[Dict],           # List of expert model configs
    num_classes: int,                     # 1 = regression, 2+ = classification
    sut: str = 'deepseek',                # System under test name
    weight_network_type: str = 'linear',  # 'linear', 'mlp', or 'attention'
    hidden_dim: int = 64,                 # Hidden dim for MLP/attention gating
    lr: float = 1e-3,
    weight_decay: float = 1e-4,
    combination_strategy: str = 'logits', # 'logits' or 'probabilities'
)
```
```python
expert_configs = [
    {
        'checkpoint_path': 'path/to/linear/best.pt',
        'model_type': 'linear',
        'val_loss': 0.15,
        'name': 'linear',
    },
    {
        'checkpoint_path': 'path/to/rntn/best.pt',
        'model_type': 'rntn',
        'val_loss': 0.12,
        'name': 'rntn',
    },
    # ... more experts
]
```
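A quick sanity check over such a config list can catch missing fields before training. This is an illustrative helper, not part of the ToxicThesis API; the required keys are assumed from the example above.

```python
# Keys assumed from the example expert_configs above (illustrative only)
REQUIRED_KEYS = {'checkpoint_path', 'model_type', 'val_loss', 'name'}

def validate_expert_configs(configs):
    """Raise if any expert config is missing an expected key."""
    for i, cfg in enumerate(configs):
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            raise ValueError(f"expert {i} missing keys: {sorted(missing)}")
    return True

configs = [
    {'checkpoint_path': 'path/to/linear/best.pt', 'model_type': 'linear',
     'val_loss': 0.15, 'name': 'linear'},
]
print(validate_expert_configs(configs))
```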
| Method | Description |
|---|---|
| `forward(expert_predictions)` | Combine expert predictions. Input: (batch, num_experts). |
| `preprocess_predictions()` | Cache expert predictions for all data splits. |
| `get_expert_weights(x)` | Get combination weights for input. |
The MoE model combines predictions from multiple expert models (Linear, LSTM, TreeLSTM, RNTN, RoBERTa).
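To illustrate the combination idea (a sketch, not the actual ToxicThesis implementation, which lives in `src/models/mixture_of_experts.py`): with a linear gate and `combination_strategy='logits'`, a mixture-of-experts forms a softmax-weighted sum of expert outputs. The shapes and gate values below are assumptions for the sketch; here each expert emits a 5-class logit vector.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def combine_experts(expert_logits, gate_scores):
    """Softmax-weighted combination of expert logits.

    expert_logits: (batch, num_experts, num_classes)
    gate_scores:   (batch, num_experts) raw gating scores
    """
    w = softmax(gate_scores, axis=-1)                   # normalize per sample
    return (w[..., None] * expert_logits).sum(axis=1)   # (batch, num_classes)

# Two experts, 5 classes, batch of 1
logits = np.array([[[2., 0., 0., 0., 0.],
                    [0., 2., 0., 0., 0.]]])
gate = np.array([[1.0, 1.0]])  # equal trust in both experts
combined = combine_experts(logits, gate)
print(combined)  # equal gate weights average the two experts
```

With equal gate scores the result is the plain average of the expert logits; an 'mlp' or 'attention' gate would compute the scores from the input instead of fixing them.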
```python
# 1. Clone ToxicThesis repository
# git clone https://github.com/simo-corbo/ToxicThesis
# cd ToxicThesis && pip install -r requirements.txt
import pickle

import torch
from huggingface_hub import snapshot_download

# 2. Download model files
model_dir = snapshot_download(
    repo_id="simocorbo/toxicthesis-mistral-moelinear-classification-5",
    allow_patterns=["checkpoints/*", "moe_cache/*", "*.yaml"],
)

# 3. Import and load model from ToxicThesis
from src.models.mixture_of_experts import MixtureOfExpertsLightning

checkpoint = torch.load(f"{model_dir}/checkpoints/best.pt", map_location='cpu')
hparams = checkpoint.get('hyper_parameters', {})

# 4. For cached predictions (fastest - uses pre-computed expert predictions)
with open(f"{model_dir}/moe_cache/test_predictions.pkl", 'rb') as f:
    cached = pickle.load(f)
expert_preds = cached['predictions']  # Shape: (num_samples, num_experts)
targets = cached['targets']           # Shape: (num_samples,)

# 5. Load the MoE gating network
model = MixtureOfExpertsLightning.load_from_checkpoint(
    f"{model_dir}/checkpoints/best.pt",
    expert_configs=[],  # Not needed when using cached predictions
    num_classes=hparams.get('num_classes', 5),
)
model.eval()

# 6. Get combined predictions from cached expert outputs
with torch.no_grad():
    expert_tensor = torch.tensor(expert_preds, dtype=torch.float32)
    combined_logits = model(expert_tensor)
    if model.num_classes == 1:
        scores = torch.sigmoid(combined_logits).squeeze().tolist()
        print(f"Scores: {scores[:5]}")  # First 5 predictions
    else:
        probs = torch.softmax(combined_logits, dim=-1)
        print(f"Probabilities: {probs[:5]}")

# 7. Get expert weights (how much each expert contributes)
weights = model.get_expert_weights(expert_tensor)
print(f"Expert weights: {weights.mean(dim=0)}")  # Average weights per expert
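The averaged expert weights from step 7 can be read as a rough importance ranking. A sketch of that interpretation with NumPy, using made-up weights and the expert names listed above (the real values come from `model.get_expert_weights(...)`):

```python
import numpy as np

# Hypothetical per-sample gating weights for 5 experts
# (rows = samples, columns = experts); real values come from
# model.get_expert_weights(expert_tensor)
weights = np.array([
    [0.10, 0.35, 0.20, 0.25, 0.10],
    [0.05, 0.40, 0.25, 0.20, 0.10],
    [0.15, 0.30, 0.20, 0.25, 0.10],
])
expert_names = ['linear', 'lstm', 'treelstm', 'rntn', 'roberta']

avg = weights.mean(axis=0)                  # average contribution per expert
dominant = expert_names[int(avg.argmax())]  # most influential expert overall
print(dict(zip(expert_names, avg.round(3))))
print(f"Dominant expert: {dominant}")
```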
For inference on new text (rather than cached predictions), you must first load each expert model from its checkpoint, run the text through every expert, and stack their predictions into a `(batch, num_experts)` tensor before calling the MoE model. See `src/models/mixture_of_experts.py` in ToxicThesis for the complete implementation.
| Output | Range | Meaning |
|---|---|---|
| `probabilities` | `List[float]` | Probability distribution over 5 classes. |
| `class` | 0 to 4 | Predicted class (argmax of probabilities). |
Classes: 5 toxicity levels, where higher class index = more toxic.
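As a quick illustration of the argmax step (the probability values here are hypothetical):

```python
# Hypothetical probability distribution over the 5 toxicity classes
probs = [0.05, 0.10, 0.15, 0.30, 0.40]

# Predicted class = index of the largest probability
pred_class = max(range(len(probs)), key=lambda i: probs[i])
print(pred_class)  # -> 4, the most toxic level
```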
| File | Description |
|---|---|
| `checkpoints/best.pt` | Model checkpoint (best validation loss) |
| `hparams.yaml` | Hyperparameters used for training |
| `train.csv` | Training metrics per epoch |
| `val.csv` | Validation metrics per epoch |
| `vocab_stanza_hybrid.pkl` | Vocabulary (for tree-based models) |
```bash
# Clone ToxicThesis for full model implementations
git clone https://github.com/simo-corbo/ToxicThesis
cd ToxicThesis
pip install -r requirements.txt

# Or install dependencies directly
pip install torch transformers huggingface_hub fasttext-wheel stanza
```
```bibtex
@software{toxicthesis2025,
  title  = {ToxicThesis},
  author = {Corbo, Simone},
  year   = {2025},
  url    = {https://github.com/simo-corbo/ToxicThesis}
}
```