================================================================================
DETOXIFY-MEDIUM MODEL BENCHMARK RESULTS
================================================================================

📊 EXECUTIVE SUMMARY
--------------------------------------------------
Benchmark Date: 2025-09-18 13:17:58
Model: Detoxify-Medium
Dataset: ParaDetox (https://github.com/s-nlp/paradetox)
Total Samples: 1011

🎯 OVERALL PERFORMANCE METRICS (unweighted mean of the two dataset subsets)
--------------------------------------------------
   Toxicity Reduction: 0.178
   Semantic Preservation: 0.561
   Fluency: 0.929
   Average Latency: 160.2ms
   Original Toxicity: 0.196
   Final Toxicity: 0.018
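
The reported Toxicity Reduction values match Original Toxicity minus Final Toxicity, i.e. an absolute difference. The sketch below recomputes that difference from the figures in this report and also derives a relative reduction, which the benchmark itself does not report. The helper name toxicity_reduction and the dictionary keys are hypothetical and used only for illustration.

```python
def toxicity_reduction(original, final):
    """Return (absolute, relative) toxicity reduction.

    absolute = original - final   (matches the values in this report)
    relative = absolute / original (derived here, not reported by the benchmark)
    """
    absolute = original - final
    relative = absolute / original if original else 0.0
    return absolute, relative


# (Original Toxicity, Final Toxicity) pairs copied from this report.
# The dictionary keys are illustrative labels, not benchmark identifiers.
reported = {
    "overall": (0.196, 0.018),
    "toxic_neutral": (0.051, 0.007),
    "high_toxicity": (0.342, 0.029),
}

for name, (original, final) in reported.items():
    absolute, relative = toxicity_reduction(original, final)
    print(f"{name}: absolute={absolute:.3f} relative={relative:.1%}")
```

Reading the relative figure next to the absolute one can help when subsets start from very different toxicity levels, as they do here.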

📈 DATASET BREAKDOWN
--------------------------------------------------

🔹 PARADETOX TOXIC NEUTRAL
   Samples: 1000
   Toxicity Reduction: 0.044
   Semantic Preservation: 0.645
   Fluency: 0.922
   Latency: 156.2ms
   Original Toxicity: 0.051
   Final Toxicity: 0.007

🔹 PARADETOX HIGH TOXICITY
   Samples: 11
   Toxicity Reduction: 0.313
   Semantic Preservation: 0.477
   Fluency: 0.936
   Latency: 164.3ms
   Original Toxicity: 0.342
   Final Toxicity: 0.029
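
The overall metrics in this report agree, to rounding, with a plain average of the two subsets above, so the 11-sample high-toxicity subset carries the same weight as the 1000-sample subset. The sketch below contrasts that unweighted (macro) average with a sample-weighted one, using only numbers from this report. The function names and dictionary layout are assumptions made for illustration, not part of the benchmark code.

```python
# Per-subset figures copied from the dataset breakdown in this report.
# The keys and structure are illustrative, not the benchmark's own format.
subsets = {
    "toxic_neutral": {
        "samples": 1000,
        "toxicity_reduction": 0.044,
        "semantic_preservation": 0.645,
        "fluency": 0.922,
    },
    "high_toxicity": {
        "samples": 11,
        "toxicity_reduction": 0.313,
        "semantic_preservation": 0.477,
        "fluency": 0.936,
    },
}


def macro_average(subsets, metric):
    """Unweighted mean: every subset counts equally."""
    return sum(s[metric] for s in subsets.values()) / len(subsets)


def weighted_average(subsets, metric):
    """Sample-weighted mean: every sample counts equally."""
    total = sum(s["samples"] for s in subsets.values())
    return sum(s[metric] * s["samples"] for s in subsets.values()) / total


for metric in ("toxicity_reduction", "semantic_preservation", "fluency"):
    macro = macro_average(subsets, metric)
    weighted = weighted_average(subsets, metric)
    print(f"{metric}: macro={macro:.3f} weighted={weighted:.3f}")
```

Under these assumptions the sample-weighted figures stay close to the 1000-sample subset, so the two averaging choices give noticeably different headline numbers for toxicity reduction and semantic preservation.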

================================================================================