ReFusion 3.0

AI Text Detection Model

100% Accuracy · 0% False Positives · 0% False Negatives (held-out test set)

Fine-tuned by Tusar Akon · Built on Qwen3-0.6B

Live Demo HuggingFace License


Overview

ReFusion 3.0 is a production-grade AI text detector fine-tuned from Qwen/Qwen3-0.6B using Parameter-Efficient Fine-Tuning (LoRA). It classifies text as either Human Written or AI Generated with sentence-level granularity, achieving perfect scores on a 3,000-sample held-out test set.

It powers the live API at ai-detector.tusarakon.com, serving real-time detection with per-sentence highlighting and tiered API access.


Performance

Held-Out Test Set (3,000 samples, never seen during training)

| Metric    | HUMAN  | AI     | Overall |
|-----------|--------|--------|---------|
| Precision | 1.0000 | 1.0000 | 1.0000  |
| Recall    | 1.0000 | 1.0000 | 1.0000  |
| F1 Score  | 1.0000 | 1.0000 | 1.0000  |
| Accuracy  |        |        | 100%    |

Confusion Matrix

                 Predicted AI    Predicted Human
True AI              1,515              0
True Human               0          1,485

False Positive Rate:  0.00%  (human text flagged as AI)
False Negative Rate:  0.00%  (AI text missed as human)
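These rates follow directly from the matrix; a quick plain-Python sanity check of how they are derived:

```python
# Deriving the quoted rates from the confusion matrix above.
tp, fn = 1515, 0    # true AI: correctly flagged / missed
tn, fp = 1485, 0    # true human: correctly passed / falsely flagged

precision_ai = tp / (tp + fp)
recall_ai = tp / (tp + fn)
fpr = fp / (fp + tn)   # false positive rate: human text flagged as AI
fnr = fn / (fn + tp)   # false negative rate: AI text missed as human

print(precision_ai, recall_ai, fpr, fnr)
# → 1.0 1.0 0.0 0.0
```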

Training Curve

| Epoch | Train Loss    | Val Loss      | Accuracy | F1     |
|-------|---------------|---------------|----------|--------|
| 1     | 0.2408        | 0.1182        | 99.97%   | 0.9997 |
| 2     | 0.2382        | 0.1191        | 99.87%   | 0.9987 |
| 3     | 0.2340        | 0.1181        | 99.93%   | 0.9993 |
| 4     | 0.2339        | 0.1170        | 100%     | 1.0000 |
| 5–10  | 0.2339→0.2338 | 0.1169→0.1169 | 100%     | 1.0000 |

Live Inference Results

✅ [Casual Reddit]    Expected: HUMAN → Got: HUMAN
   AI 2.6%  [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Human 97.4%

✅ [GPT-style formal] Expected: AI    → Got: AI
   AI 96.0% [████████████████████████████░░] Human 4.0%

✅ [Personal story]   Expected: HUMAN → Got: HUMAN
   AI 2.6%  [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Human 97.4%

✅ [Academic AI]      Expected: AI    → Got: AI
   AI 95.4% [████████████████████████████░░] Human 4.6%

Score: 6/6 (100%), 4 of 6 examples shown

Training Details

Architecture

| Component        | Value |
|------------------|-------|
| Base Model       | Qwen/Qwen3-0.6B |
| Method           | LoRA (PEFT) |
| LoRA Rank        | 64 |
| LoRA Alpha       | 128 |
| Target Modules   | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Params | 40,372,224 (6.34% of total) |
| Total Params     | 636,424,192 |
| Precision        | bf16 (native A100) |
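The adapter settings above map onto a `peft` `LoraConfig` sketch. The card lists only the rank, alpha, and target modules, so the dropout value here is an assumption, not the author's verified training script:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                     # LoRA rank from the table
    lora_alpha=128,           # LoRA alpha from the table
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,        # assumption: not stated in the card
    task_type="SEQ_CLS",      # sequence classification head
)
```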

Training Configuration

| Hyperparameter        | Value         |
|-----------------------|---------------|
| Epochs                | 10            |
| Batch Size            | 32 per device |
| Gradient Accumulation | 2 steps       |
| Effective Batch Size  | 64            |
| Learning Rate         | 3e-5          |
| LR Scheduler          | Cosine decay  |
| Warmup Ratio          | 5%            |
| Weight Decay          | 0.01          |
| Label Smoothing       | 0.05          |
| Max Sequence Length   | 512 tokens    |
| Hardware              | A100 80GB     |
| Training Time         | 4h 32m        |
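These hyperparameters can be sketched as a `transformers` `TrainingArguments`; field names follow the current `transformers` API, and anything not in the table (such as `output_dir`) is a placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="refusion-3",          # placeholder, not from the card
    num_train_epochs=10,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,    # effective batch size 64
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    label_smoothing_factor=0.05,
    bf16=True,                        # native bf16 on A100
)
```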

Dataset

50,000 balanced samples (25,000 human · 25,000 AI) across 6 diverse sources, collected concurrently via multi-threaded streaming:

| Source                 | Type  | Samples | Writing Style                     |
|------------------------|-------|---------|-----------------------------------|
| RAID Dataset           | Human | 6,250   | Formal — Wikipedia, news articles |
| ELI5 / Reddit          | Human | 6,250   | Casual — conversational Q&A       |
| WritingPrompts         | Human | 6,250   | Creative — storytelling, fiction  |
| ArXiv Abstracts        | Human | 6,250   | Academic — scientific writing     |
| artem9k Detection Pile | AI    | 12,500  | Multi-model AI outputs            |
| RAID AI Portion        | AI    | 12,500  | 11 different AI model outputs     |

Data split: 88% train (44,000) · 6% validation (3,000) · 6% test (3,000)
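As a quick check that the percentages and counts agree (plain Python, no dependencies):

```python
# The quoted split percentages reproduce the quoted sample counts.
total = 50_000
train, val, test = round(total * 0.88), round(total * 0.06), round(total * 0.06)
print(train, val, test)
# → 44000 3000 3000
assert train + val + test == total
```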

The diversity of human writing styles (formal, casual, creative, academic) is what enables the model to correctly classify both Reddit posts and news articles as human, without being tricked by writing style alone.


Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

# Load
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-0.6B",
    num_labels=2,
    id2label={0: "HUMAN", 1: "AI"},
    label2id={"HUMAN": 0, "AI": 1},
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "tusarway/refusion-3")
model.eval()

# Detect
def detect(text: str) -> dict:
    inputs = tokenizer(
        text, return_tensors="pt",
        truncation=True, max_length=512, padding=True
    ).to(model.device)  # keep inputs on the same device as the model
    with torch.inference_mode():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    label = "AI" if probs[1] > probs[0] else "HUMAN"
    return {
        "label":       label,
        "ai_score":    round(float(probs[1]), 4),
        "human_score": round(float(probs[0]), 4),
        "confidence":  f"{max(probs[0], probs[1]):.1%}",
    }

# Example
result = detect("This is the text you want to analyze...")
print(result)
# → {'label': 'AI', 'ai_score': 0.9604, 'human_score': 0.0396, 'confidence': '96.0%'}

Sentence-Level Detection

import re

def detect_sentences(text: str) -> list[dict]:
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [
        {"sentence": s, **detect(s)}
        for s in sentences if s.strip()
    ]

text = "Your multi-sentence text goes here. Each sentence is scored separately."
results = detect_sentences(text)
for r in results:
    icon = "🤖" if r["label"] == "AI" else "✍️"
    print(f"{icon} {r['confidence']}  {r['sentence'][:80]}")

Via REST API (Live)

# Free tier (500 words, 5 checks/day)
curl -X POST https://tusarway-tus-ai-detector-api.hf.space/detect \
  -H "Content-Type: application/json" \
  -H "X-API-Key: free-demo-key" \
  -d '{"text": "Your text to analyze goes here..."}'

Response:

{
  "verdict": "AI Generated",
  "ai_score": 0.9604,
  "human_score": 0.0396,
  "metrics": {
    "ai_percentage": 96.0,
    "human_percentage": 4.0,
    "total_words": 142,
    "total_chars": 891,
    "total_sentences": 6
  },
  "sentences": [
    {
      "text": "Learning guitar is a rewarding journey...",
      "is_ai": true,
      "ai_score": 0.9604,
      "human_score": 0.0396
    }
  ]
}
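The same request can be made from Python; this sketch mirrors the curl call above with the `requests` library (the endpoint, header names, and payload shape are taken from that example):

```python
import requests

API_URL = "https://tusarway-tus-ai-detector-api.hf.space/detect"

def check_text(text: str, api_key: str = "free-demo-key") -> dict:
    """POST text to the live detector endpoint and return the parsed JSON."""
    resp = requests.post(
        API_URL,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()  # surface quota or auth errors as exceptions
    return resp.json()
```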

API Tiers

| Tier         | Words / Check | Checks / Day | Access               |
|--------------|---------------|--------------|----------------------|
| Free         | 500           | 5            | free-demo-key        |
| Premium      | 10,000        | 100          | Contact via LinkedIn |
| Premium Plus | Unlimited     | Unlimited    | Contact via LinkedIn |

Get a key: linkedin.com/in/imtrt


Version History

| Version      | Model                   | Samples | Accuracy | Notes |
|--------------|-------------------------|---------|----------|-------|
| v1           | Qwen3-0.6B (generative) | 10,000  | ~50%     | Mode collapse — always predicted AI |
| v2           | Qwen3-0.6B (classifier) | 10,000  | ~85%     | Fixed architecture, dataset imbalance |
| v3           | Qwen2.5-0.5B            | 10,000  | ~91%     | Switched dataset to RAID |
| v4           | Qwen3-0.6B              | 10,000  | ~93%     | HC3 dataset attempt, reverted to RAID |
| v5           | Qwen3-0.6B              | 10,000  | ~99%     | Balanced dataset, proper regularization |
| ReFusion 3.0 | Qwen3-0.6B              | 50,000  | 100%     | 6-source diverse dataset, r=64 LoRA, A100 native bf16 |

Limitations

  • Works best on texts of 100+ words. Short texts (< 30 words) may be unreliable.
  • Trained primarily on English text. Other languages are unsupported.
  • May have reduced accuracy on very recent AI models released after the training data cutoff.
  • 100% eval accuracy reflects strong generalization on this dataset; real-world accuracy on adversarial or paraphrased AI text may vary.
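Given the first limitation, callers may want to gate inputs by length before scoring. A minimal sketch, assuming the `detect` function from Quick Start is passed in as `detect_fn` and using the 30-word threshold noted above:

```python
from typing import Callable

def detect_safe(text: str, detect_fn: Callable[[str], dict],
                min_words: int = 30) -> dict:
    """Skip scoring when the input is below the reliability threshold
    noted in Limitations; otherwise delegate to the real detector."""
    words = len(text.split())
    if words < min_words:
        return {"label": "UNRELIABLE",
                "reason": f"only {words} words (< {min_words})"}
    return detect_fn(text)
```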

Citation

@misc{refusion3_2026,
  author    = {Tusar Akon},
  title     = {ReFusion 3.0: A Fine-Tuned AI Text Detection Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tusarway/refusion-3}
}

About

Built by Tusar Akon as part of a fully open-source AI detection pipeline.

If this model helped you, consider starring the repo ⭐
