| # DiffusionGemma Humanizer: SOTA Text Humanization |
|
|
| **Fine-tuning Google's DiffusionGemma 26B (MoE, 3.8B active, Apache 2.0) to humanize AI-generated text and evade multi-signal AI detectors.** |
|
|
|
|
| --- |
|
|
| ## Table of Contents |
|
|
| 1. [Key Findings](#key-findings) |
| 2. [Architecture](#architecture) |
| 3. [Installation](#installation) |
| 4. [Usage](#usage) |
| 5. [Training Pipeline](#training-pipeline) |
| 6. [Multi-Detector Scoring](#multi-detector-scoring) |
| 7. [Results](#results) |
| 8. [Research Background](#research-background) |
| 9. [Repository Structure](#repository-structure) |
| 10. [License](#license) |
|
|
| --- |
|
|
| ## Key Findings |
|
|
| ### 1. DiffusionGemma base model achieves ~0% AI detection |
|
|
| In our 5-prompt baseline, text generated by DiffusionGemma 26B was classified as **100% Human** by the Fast-DetectGPT + heuristic ensemble (7 signals, including perplexity, burstiness, cross-model PPL, character distribution, and stylometric features). This is consistent with the hypothesis from Tarım & Onan (2025) that diffusion-generated text naturally resists detectors trained on autoregressive output. |
|
|
| ### 2. Manual LoRA bypasses PEFT incompatibility |
|
|
| PEFT does not support `Gemma4ClippableLinear` (DiffusionGemma's custom linear wrapper). We implemented **Manual LoRA injection** via forward hooks that target the underlying `Linear4bit` modules, bypassing PEFT entirely. |
|
|
| ### 3. VRAM optimization strategy |
|
|
| DiffusionGemma 26B in 4-bit uses **50.8 GB** on an A100 80GB. Training requires: |
| - **Last 2 layers only:** injects LoRA into 30 modules (not 189 across all layers) |
| - **Gradient checkpointing:** trades compute for memory by recomputing activations during the backward pass |
| - **Loss only on masked positions:** skips padding tokens for memory efficiency |
| - **bf16 LoRA params:** halves activation memory vs float32 |
|
|
| ### 4. Multi-detector ensemble scoring |
|
|
| | Signal | Source | AI Pattern | Human Pattern | |
| |--------|--------|-----------|---------------| |
| | Perplexity (GPT-2) | GPTZero-style | < 18 (too predictable) | > 25 (natural variation) | |
| | Burstiness | GPTZero-style | < 0.15 (uniform) | > 0.3 (varied) | |
| | Fast-DetectGPT | Bao et al. (2023) | > 0.55 (negative curvature) | < 0.45 (positive curvature) | |
| | Cross-model PPL (GPT-Neo) | Binoculars-style | < 15 (both models agree) | > 25 (models disagree) | |
| | Character Distribution | LD-Score (Narayanasamy, 2026) | Global baseline | Domain-specialized | |
| | Stylometric (6 sub-signals) | Pangram-style | Formulaic, passive-heavy | Natural, varied | |
| | Weighted Ensemble | StealthRL-inspired | > 0.5 = AI | < 0.4 = Human | |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ### DiffusionGemma 26B |
| - **Total params:** 25.2B | **Active:** 3.8B (MoE: 8/128 experts + 1 shared) |
| - **Generation:** Block-autoregressive discrete diffusion |
| - **Canvas:** 256 tokens, bidirectional attention |
| - **Sampler:** Entropy-Bounded Denoising (1-48 steps, temperature 0.8 → 0.4) |
|
|
| ### Manual LoRA Injection |
| ``` |
| Gemma4ClippableLinear |
| └── linear: Linear4bit (torch.nn.Linear subclass) |
|     ├── forward: x @ W.T (frozen, 4-bit, no grad) |
|     └── LoRA hook: (x.detach() @ A @ B) * scale (trainable, bf16) |
|         ├── A: (in_features, rank=8), kaiming init |
|         └── B: (rank=8, out_features), zero init |
| ``` |
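To make the shapes in the diagram concrete, here is a minimal pure-Python sketch of the low-rank update arithmetic. It assumes row-vector inputs (so the update is `x @ A @ B`) and the common LoRA convention `scale = alpha / rank`. The helper names and toy dimensions are illustrative and are not taken from the repository; the actual pipeline applies the update through forward hooks on `Linear4bit` modules.

```python
import random

def matvec_row(x, m):
    """Row vector x (length n) times matrix m (n x k) -> row vector (length k)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def lora_delta(x, a, b, alpha, rank):
    """Low-rank update (x @ A @ B) * (alpha / rank), added to the frozen output."""
    scale = alpha / rank
    return [scale * v for v in matvec_row(matvec_row(x, a), b)]

def lora_param_count(in_features, out_features, rank):
    """Trainable parameters LoRA adds to one linear layer: A plus B."""
    return rank * (in_features + out_features)

random.seed(0)
in_features, out_features, rank, alpha = 6, 4, 2, 4  # toy sizes, not the real model
a = [[random.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(in_features)]
b_zero = [[0.0] * out_features for _ in range(rank)]  # zero init, as in the diagram
x = [1.0] * in_features

# With B initialised to zero the update vanishes, so training starts
# from the unmodified base model.
delta_at_init = lora_delta(x, a, b_zero, alpha, rank)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly; only once `B` receives gradient updates does the LoRA path contribute.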
|
|
| ### Training Loop |
| ``` |
| for each batch (prompt + target response): |
|     1. Forward: prompt → encoder → KV cache |
|                 decoder: canvas → bidirectional attention → logits |
|                 (gradient checkpointing: activations NOT stored) |
|     2. Mask 30-70% of target tokens randomly |
|     3. Compute loss ONLY on masked positions (memory efficient) |
|     4. Add entropy regularization (encourage human-like uncertainty) |
|     5. Backward: recompute activations via checkpoint |
|                  gradient only flows through LoRA params (detached hooks) |
|     6. Update LoRA weights (AdamW, lr=2e-4) |
| ``` |
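Steps 2 and 3 of the loop can be sketched in plain Python. This is a simplified illustration under stated assumptions: the mask ratio is drawn uniformly from [0.3, 0.7] per example, and the loss is the mean negative log-likelihood over masked positions. The function names and the example log-probabilities are hypothetical; the real pipeline operates on model logits and tensors.

```python
import math
import random

def sample_mask(num_tokens, rng, low=0.3, high=0.7):
    """Pick a mask ratio uniformly in [low, high] and mask that share of positions."""
    ratio = rng.uniform(low, high)
    num_masked = max(1, round(ratio * num_tokens))
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]

def masked_nll(target_log_probs, mask):
    """Mean negative log-likelihood over masked positions only."""
    losses = [-lp for lp, m in zip(target_log_probs, mask) if m]
    return sum(losses) / len(losses)

rng = random.Random(0)
# Hypothetical probabilities the model assigns to the correct target tokens.
target_probs = (0.9, 0.5, 0.25, 0.8, 0.1, 0.6, 0.7, 0.3, 0.95, 0.4)
target_log_probs = [math.log(p) for p in target_probs]

mask = sample_mask(len(target_log_probs), rng)
loss = masked_nll(target_log_probs, mask)
```

Unmasked positions contribute nothing to the loss, so no gradient signal is spent on tokens the model can already see.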
|
|
| --- |
|
|
| ## Installation |
|
|
| ### Prerequisites |
| ```bash |
| pip install modal |
| modal setup |
| modal secret create hf-secrets HF_TOKEN=hf_your_token |
| ``` |
|
|
| ### Clone & Deploy |
| ```bash |
| git clone https://huggingface.co/simonlesaumon/diffusiongemma-humanizer |
| cd diffusiongemma-humanizer |
| bash run.sh |
| ``` |
|
|
| --- |
|
|
| ## Usage |
|
|
| ### Basic: Humanize AI Text |
|
|
| ```python |
| from transformers import DiffusionGemmaForBlockDiffusion, AutoTokenizer, BitsAndBytesConfig |
| import torch |
|
| # Load the base model in 4-bit NF4 with bf16 compute |
| bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, |
|                          bnb_4bit_use_double_quant=True, bnb_4bit_quant_type="nf4") |
| model = DiffusionGemmaForBlockDiffusion.from_pretrained( |
|     "google/diffusiongemma-26B-A4B-it", |
|     quantization_config=bnb, device_map="auto") |
| tokenizer = AutoTokenizer.from_pretrained("google/diffusiongemma-26B-A4B-it") |
|
| # Load fine-tuned LoRA weights. |
| # PEFT cannot wrap Gemma4ClippableLinear (see Key Findings), so apply the |
| # weights with the manual LoRA loader (see lora/ folder for weights + config). |
|
| # Humanize |
| ai_text = "Your AI-generated text here..." |
| messages = [ |
|     {"role": "system", "content": "Rewrite to sound human-written."}, |
|     {"role": "user", "content": ai_text}, |
| ] |
| inputs = tokenizer.apply_chat_template(messages, tokenize=True, |
|     add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device) |
|
| # Tokenize the AI text to the 256-token canvas length and pass it as decoder input |
| ai_tokens = tokenizer(ai_text, max_length=256, truncation=True, |
|     padding="max_length", return_tensors="pt") |
| output = model.generate(**inputs, |
|     decoder_input_ids=ai_tokens["input_ids"].to(model.device), |
|     max_new_tokens=512, max_denoising_steps=24, t_max=0.8, t_min=0.4) |
| # Decode only the newly generated tokens (skip the prompt) |
| humanized = tokenizer.decode(output.sequences[0][inputs["input_ids"].shape[-1]:], |
|     skip_special_tokens=True) |
| ``` |
|
|
| --- |
|
|
| ## Training Pipeline |
|
|
| ### 6-Step Process (runs on Modal A100 80GB) |
|
|
| | Step | Description | Time | |
| |------|-------------|------| |
| | **1. Load Models** | DiffusionGemma 4-bit + GPT-2 + GPT-Neo detectors | ~5 min | |
| | **2. Baseline Evaluation** | 7-signal detector ensemble on 5 prompts | ~30 sec | |
| | **3. Build Dataset** | 10K+ synthetic pairs annotated with detector scores | ~10 min | |
| | **4. LoRA + Training** | Manual LoRA (last 2 layers, 30 modules) + 5-20 epochs | ~10h | |
| | **5. Post-Training Eval** | Compare ensemble scores before/after | ~30 sec | |
| | **6. Export to HF** | LoRA weights (5 MB) + results + model card | ~10 sec | |
|
|
| ### Training Hyperparameters |
|
|
| | Param | Value | Rationale | |
| |-------|-------|-----------| |
| | LoRA rank | 8 | Balance expressiveness vs memory | |
| | LoRA alpha | 16 | Scaling factor alpha/r = 2 | |
| | Learning rate | 2e-4 | Standard for LoRA fine-tuning | |
| | Optimizer | AdamW (paged_adamw_8bit) | VRAM efficient | |
| | Epochs | 5-20 | Dataset-size dependent | |
| | Batch size | 1 | VRAM constraint | |
| | Gradient accumulation | 16 | Effective batch = 16 | |
| | Mask ratio | 30-70% random | Diffusion training objective | |
| | Entropy target | 2.5 | Human-like token uncertainty | |
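The entropy target can be illustrated with a small sketch. The README does not state the exact form of the regularizer or its units, so the version below is an assumption: Shannon entropy in nats, averaged over positions, with a squared penalty on the gap to the target of 2.5. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_penalty(per_position_logits, target=2.5):
    """Squared gap between the mean predictive entropy and the target (assumed form)."""
    entropies = [entropy(softmax(row)) for row in per_position_logits]
    mean_entropy = sum(entropies) / len(entropies)
    return (mean_entropy - target) ** 2

uniform_16 = [0.0] * 16        # uniform over 16 tokens: entropy = ln(16) nats
peaked = [10.0] + [0.0] * 15   # near-deterministic prediction: entropy close to 0

penalty_uniform = entropy_penalty([uniform_16])
penalty_peaked = entropy_penalty([peaked])
```

Under this assumed form, a near-deterministic distribution is penalised far more than a spread-out one, which matches the stated goal of encouraging human-like token uncertainty.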
|
|
| ### Run the Pipeline |
| ```bash |
| # Quick run (5 epochs, small dataset) |
| bash run.sh |
| |
| # Full training (20 epochs, 10K+ dataset) |
| # Set num_epochs=20 in modal_project/app.py, then: |
| modal run modal_project/app.py --hf-token=hf_xxx |
| ``` |
|
|
| --- |
|
|
| ## Multi-Detector Scoring |
|
|
| The scoring system implements techniques from multiple papers: |
|
|
| ### Signal 1: GPT-2 Perplexity (GPTZero-style) |
| Measures how "surprising" each word is to GPT-2 Medium. AI text tends to be more predictable (lower perplexity). |
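As a minimal sketch, perplexity is the exponential of the mean negative log-likelihood per token. The example assumes the per-token log-probabilities (natural log) have already been obtained from a language model such as GPT-2 Medium; the values below are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical per-token log-probabilities.
predictable = [math.log(0.5)] * 8    # every token gets probability 0.5
surprising = [math.log(0.02)] * 8    # every token gets probability 0.02

ppl_predictable = perplexity(predictable)
ppl_surprising = perplexity(surprising)
```

A sequence whose tokens each receive probability `p` has perplexity `1 / p`, so the more predictable text scores lower.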
|
|
| ### Signal 2: Burstiness (GPTZero-style) |
| Coefficient of variation of per-sentence perplexity. Human text varies more in complexity. |
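A minimal sketch of the coefficient of variation, assuming per-sentence perplexities are already available. Whether the pipeline uses the population or sample standard deviation is not stated; the population form is used here, and the input values are invented.

```python
import statistics

def burstiness(sentence_perplexities):
    """Coefficient of variation: population standard deviation divided by the mean."""
    mean = statistics.fmean(sentence_perplexities)
    return statistics.pstdev(sentence_perplexities) / mean

# Hypothetical per-sentence perplexities.
uniform_text = [20.0, 21.0, 19.0, 20.0]   # little variation between sentences
varied_text = [8.0, 45.0, 15.0, 30.0]     # large swings between sentences

uniform_score = burstiness(uniform_text)
varied_score = burstiness(varied_text)
```

With these inputs the uniform text falls below the 0.15 "AI pattern" threshold from the table and the varied text lands above the 0.3 "human pattern" threshold.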
|
|
| ### Signal 3: Fast-DetectGPT (Bao et al., 2023) |
| Probability curvature analysis: AI text tends to sit near local maxima of the scoring model's log-probability landscape (regions of negative curvature), whereas human text does not. |
|
|
| ### Signal 4: Cross-Model Perplexity (Binoculars-style) |
| Compares the perplexity computed by GPT-Neo 125M with the perplexity computed by GPT-2 Medium. When the two models disagree, the text is more likely human. |
|
|
| ### Signal 5: Character Distribution (LD-Score, Narayanasamy 2026) |
| AI text approximates global character patterns; human text shows domain specialization. |
|
|
| ### Signal 6: Stylometric Ensemble (Pangram-style) |
| 6 sub-signals: sentence length σ, hapax legomena ratio, transition marker rate, passive voice rate, formulaic phrase rate, word length σ. |
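Three of the six sub-signals are simple enough to sketch with the standard library. The tokenization rules (regex sentence splitting, alphabetic words only) and the definition of the hapax ratio as hapax count over vocabulary size are assumptions; the repository may define them differently. The function assumes non-empty text.

```python
import re
import statistics
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")

def stylometric_features(text):
    """Sentence length std, hapax legomena ratio, and word length std."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD_RE.findall(text.lower())
    counts = Counter(words)
    sentence_lengths = [len(WORD_RE.findall(s)) for s in sentences]
    return {
        "sentence_length_std": statistics.pstdev(sentence_lengths),
        # Words that occur exactly once, as a share of distinct words.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / len(counts),
        "word_length_std": statistics.pstdev([len(w) for w in words]),
    }

sample = "I ran. Then, after a long and rainy afternoon, I ran again. Why?"
features = stylometric_features(sample)
```

Higher spread in sentence and word lengths corresponds to the "natural, varied" pattern in the signal table, while low spread matches the "formulaic" pattern.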
|
|
| ### Signal 7: Weighted Ensemble |
| Calibrated weights combine all signals, with the largest weights on the stylometric signal (1.5x) and Fast-DetectGPT (1.0x). |
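A weighted ensemble of this kind can be sketched as a weighted mean followed by thresholding. The 1.5 and 1.0 weights and the 0.5 / 0.4 thresholds come from this README; the remaining weights, the assumption that every signal is first normalised to an AI-likelihood in [0, 1], and the "Uncertain" label for the band in between are placeholders for illustration.

```python
# Stylometric and Fast-DetectGPT weights are from the README;
# the other four weights are hypothetical placeholders.
WEIGHTS = {
    "stylometric": 1.5,
    "fast_detectgpt": 1.0,
    "perplexity": 0.5,
    "burstiness": 0.5,
    "cross_model_ppl": 0.5,
    "char_distribution": 0.5,
}

def ensemble_score(signal_scores):
    """Weighted mean of per-signal AI-likelihood scores in [0, 1]."""
    total_weight = sum(WEIGHTS[name] for name in signal_scores)
    weighted = sum(WEIGHTS[name] * score for name, score in signal_scores.items())
    return weighted / total_weight

def label(score):
    """Apply the README thresholds: > 0.5 is AI, < 0.4 is Human."""
    if score > 0.5:
        return "AI"
    if score < 0.4:
        return "Human"
    return "Uncertain"

# Hypothetical normalised signal scores for one text.
scores = {"stylometric": 0.30, "fast_detectgpt": 0.0, "perplexity": 0.8,
          "burstiness": 0.2, "cross_model_ppl": 0.6, "char_distribution": 0.4}
result = ensemble_score(scores)
verdict = label(result)
```

Dividing by the sum of the weights keeps the ensemble score in [0, 1], so the same thresholds apply regardless of how many signals are present.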
|
|
| --- |
|
|
| ## Results |
|
|
| ### Baseline (untrained DiffusionGemma) |
| - **0/5 texts detected as AI** by weighted ensemble |
| - Mean ensemble score: **0.350** (threshold: < 0.4 = Human) |
|
|
| ### Breaking Down Detection Signals |
|
|
| | Text Type | PPL | Burstiness | FDGPT | Stylometric | Ensemble | |
| |-----------|-----|-----------|-------|-------------|----------| |
| | Remote work blog | 16-23 | 0.58-0.96 | 0.000 | 0.29-0.35 | 0.30-0.38 | |
| | Quantum computing | 14-20 | 0.57-0.70 | 0.000 | 0.23-0.33 | 0.30-0.41 | |
| | Email declining job | 7-9 | 0.48-0.91 | 0.001 | 0.27-0.33 | 0.44-0.56 | |
| | French Revolution | 16-18 | 0.53-0.74 | 0.000 | 0.25-0.25 | 0.29-0.50 | |
| | Headphones review | 14-22 | 0.37-1.25 | 0.000 | 0.22-0.25 | 0.33-0.47 | |
|
|
| ### Why DiffusionGemma Evades Detectors |
| 1. **Different statistical pathway:** block-autoregressive diffusion produces token distributions unlike standard AR models |
| 2. **Bidirectional attention:** considers full context when denoising, producing more natural text |
| 3. **Iterative refinement:** entropy-bounded denoising naturally introduces variation |
| 4. **No left-to-right bias:** avoids formulaic transition patterns common in AR text |
|
|
| --- |
|
|
| ## Research Background |
|
|
| This project synthesizes findings from 30+ papers (see `research/` folder): |
|
|
| - **Sadasivan et al. (2023):** Theoretical ceiling: perfect detectors impossible as LLMs improve |
| - **Tarım & Onan (2025):** Diffusion text naturally resists AR-trained detectors |
| - **Cheng et al. (2025):** Adversarial Paraphrasing: 87.88% TPR reduction via detector-guided feedback |
| - **Ranganath & Ramesh (2026):** StealthRL: 99.9% attack success with multi-detector GRPO |
| - **Pedrotti et al. (2025):** DPO style-shifting: few-shot fine-tuning fools detectors |
| - **Narayanasamy et al. (2026):** LD-Score: character distribution separates human/AI text |
| - **Xu et al. (2026):** HIP pipeline: base models look human to detectors |
|
|
| Full literature review: `research/technical-diffusion-text-humanization-2026-06-29.md` |
|
|
| --- |
|
|
| ## Repository Structure |
|
|
| ``` |
| diffusiongemma-humanizer/ |
| ├── README.md  # This file |
| ├── research_report.md  # Gemma + diffusion models + Modal costs |
| ├── research_datasets_training.md  # Training data survey |
| ├── commercial_ai_detectors_report.md  # Pangram, GPTZero, Originality.ai analysis |
| ├── research/ |
| │   ├── architecture-strategy.md  # Architecture decisions & cost breakdown |
| │   └── technical-diffusion-text-humanization-2026-06-29.md  # Full lit review (30+ papers) |
| ├── modal_project/ |
| │   ├── app.py  # Complete 6-step training pipeline |
| │   ├── humanize_french.py  # French text humanization (standalone) |
| │   └── upload_hf.py  # HF upload utilities |
| ├── scripts/ |
| │   ├── run.py  # Simple launcher |
| │   ├── launch.py  # Launcher with UTF-8 logging |
| │   ├── run_pipeline.ps1  # PowerShell launcher |
| │   └── run_pipeline.bat  # Batch launcher |
| ├── run.sh  # Bash launcher (primary) |
| ├── run_french.py  # French humanization launcher |
| ├── lora/  # Fine-tuned LoRA weights |
| │   ├── lora_weights.pt  # LoRA parameter state dict |
| │   └── lora_config.json  # LoRA configuration |
| ├── baseline_detector_results.json  # Pre-training evaluation |
| ├── post_training_eval.json  # Post-training evaluation |
| └── experiment_log.json  # Full experiment config & results |
| ``` |
|
|
| --- |
|
|
| ## License |
|
|
| Apache 2.0, matching the base model `google/diffusiongemma-26B-A4B-it`. |
|
|
| --- |
|
|
| *Pipeline last run: 2026-06-30 | GPU: Modal A100 80GB | Framework: PyTorch 2.12 + Transformers 5.12* |
|
|