# ReFusion 3.0
**AI Text Detection Model**
100% Accuracy · 0% False Positives · 0% False Negatives
Fine-tuned by Tusar Akon · Built on Qwen3-0.6B
## Overview
ReFusion 3.0 is a production-grade AI text detector fine-tuned from Qwen/Qwen3-0.6B using
Parameter-Efficient Fine-Tuning (LoRA). It classifies text as either Human Written or AI Generated
with sentence-level granularity, achieving perfect scores on a 3,000-sample held-out test set.
It powers the live API at ai-detector.tusarakon.com, serving real-time detection with per-sentence highlighting and tiered API access.
## Performance
### Held-Out Test Set (3,000 samples, never seen during training)
| Metric | HUMAN | AI | Overall |
|---|---|---|---|
| Precision | 1.0000 | 1.0000 | 1.0000 |
| Recall | 1.0000 | 1.0000 | 1.0000 |
| F1 Score | 1.0000 | 1.0000 | 1.0000 |
| Accuracy | — | — | 100% |
### Confusion Matrix
| | Predicted AI | Predicted Human |
|---|---|---|
| True AI | 1,515 | 0 |
| True Human | 0 | 1,485 |
False Positive Rate: 0.00% (human text flagged as AI)
False Negative Rate: 0.00% (AI text missed as human)
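The headline precision, recall, and accuracy figures follow directly from this confusion matrix; a quick arithmetic check in Python:

```python
# Confusion matrix from the held-out test set
tp_ai = 1515      # true AI predicted AI
fn_ai = 0         # true AI predicted Human (false negatives for the AI class)
fp_ai = 0         # true Human predicted AI (false positives for the AI class)
tp_human = 1485   # true Human predicted Human

p_ai = tp_ai / (tp_ai + fp_ai)   # precision for the AI class
r_ai = tp_ai / (tp_ai + fn_ai)   # recall for the AI class
accuracy = (tp_ai + tp_human) / (tp_ai + tp_human + fp_ai + fn_ai)
print(p_ai, r_ai, accuracy)  # → 1.0 1.0 1.0
```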
### Training Curve
| Epoch | Train Loss | Val Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.2408 | 0.1182 | 99.97% | 0.9997 |
| 2 | 0.2382 | 0.1191 | 99.87% | 0.9987 |
| 3 | 0.2340 | 0.1181 | 99.93% | 0.9993 |
| 4 | 0.2339 | 0.1170 | 100% | 1.0000 |
| 5–10 | 0.2339→0.2338 | 0.1169→0.1169 | 100% | 1.0000 |
## Live Inference Results
```
✅ [Casual Reddit]     Expected: HUMAN → Got: HUMAN
   AI  2.6% [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Human 97.4%
✅ [GPT-style formal]  Expected: AI → Got: AI
   AI 96.0% [████████████████████████████░░] Human  4.0%
✅ [Personal story]    Expected: HUMAN → Got: HUMAN
   AI  2.6% [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Human 97.4%
✅ [Academic AI]       Expected: AI → Got: AI
   AI 95.4% [████████████████████████████░░] Human  4.6%
```
Score: 6/6 (100%; four of the six checks shown above)
## Training Details
### Architecture
| Component | Value |
|---|---|
| Base Model | Qwen/Qwen3-0.6B |
| Method | LoRA (PEFT) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Params | 40,372,224 (6.34% of total) |
| Total Params | 636,424,192 |
| Precision | bf16 (native A100) |
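The trainable-parameter count in the table can be sanity-checked from the LoRA rank alone. A sketch, assuming the published Qwen3-0.6B dimensions (hidden size 1024, 28 layers, 16 query and 8 key/value heads of head dim 128, MLP intermediate size 3072):

```python
# Estimate LoRA trainable parameters for Qwen3-0.6B, r=64,
# applied to the seven target modules listed in the table.
r = 64
hidden, layers = 1024, 28
q_out = 16 * 128    # query projection output dim (2048)
kv_out = 8 * 128    # key/value projection output dim (1024, GQA)
mlp = 3072          # MLP intermediate size

def lora_params(d_in, d_out, rank=r):
    # LoRA adds two low-rank matrices: A (rank x d_in) and B (d_out x rank)
    return rank * (d_in + d_out)

per_layer = (
    lora_params(hidden, q_out)      # q_proj
    + lora_params(hidden, kv_out)   # k_proj
    + lora_params(hidden, kv_out)   # v_proj
    + lora_params(q_out, hidden)    # o_proj
    + lora_params(hidden, mlp)      # gate_proj
    + lora_params(hidden, mlp)      # up_proj
    + lora_params(mlp, hidden)      # down_proj
)
head = hidden * 2                   # 2-label classification head, fully trained
total = layers * per_layer + head
print(total)  # → 40372224, matching the table
```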
### Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 10 |
| Batch Size | 32 per device |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 64 |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine decay |
| Warmup Ratio | 5% |
| Weight Decay | 0.01 |
| Label Smoothing | 0.05 |
| Max Sequence Length | 512 tokens |
| Hardware | A100 80GB |
| Training Time | 4h 32m |
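For orientation, the step counts implied by this configuration work out as follows (the warmup figure is an estimate from the 5% ratio; exact trainer rounding may differ):

```python
import math

# Schedule implied by the training configuration above
samples = 44_000                              # training split size
per_device, accum = 32, 2
effective_batch = per_device * accum          # 32 x 2 = 64
steps_per_epoch = math.ceil(samples / effective_batch)
epochs = 10
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.05 * total_steps)        # 5% warmup ratio
print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# → 64 688 6880 344
```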
### Dataset
50,000 balanced samples (25,000 human · 25,000 AI) across 6 diverse sources, collected concurrently via multi-threaded streaming:
| Source | Type | Samples | Writing Style |
|---|---|---|---|
| RAID Dataset | Human | 6,250 | Formal — Wikipedia, news articles |
| ELI5 / Reddit | Human | 6,250 | Casual — conversational Q&A |
| WritingPrompts | Human | 6,250 | Creative — storytelling, fiction |
| ArXiv Abstracts | Human | 6,250 | Academic — scientific writing |
| artem9k Detection Pile | AI | 12,500 | Multi-model AI outputs |
| RAID AI Portion | AI | 12,500 | 11 different AI model outputs |
Data split: 88% train (44,000) · 6% validation (3,000) · 6% test (3,000)
The diversity of human writing styles (formal, casual, creative, academic) is what enables the model to correctly classify both Reddit posts and news articles as human, without being tricked by writing style alone.
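The 88/6/6 split can be sketched in plain Python. This is a simplified illustration (a seeded shuffle-and-slice split; the card does not state the exact procedure or whether it was stratified by source or label):

```python
import random

def split_dataset(samples, train_frac=0.88, val_frac=0.06, seed=42):
    """Shuffle and slice into train / validation / test partitions."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(samples) * train_frac)
    n_val = int(len(samples) * val_frac)
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

# With 50,000 samples this yields the 44,000 / 3,000 / 3,000 partitions above
train, val, test = split_dataset(list(range(50_000)))
print(len(train), len(val), len(test))  # → 44000 3000 3000
```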
## Usage
### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-0.6B",
    num_labels=2,
    id2label={0: "HUMAN", 1: "AI"},
    label2id={"HUMAN": 0, "AI": 1},
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
base.config.pad_token_id = tokenizer.pad_token_id  # required for padded batches

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base, "tusarway/refusion-3")
model.eval()

# Detect
def detect(text: str) -> dict:
    inputs = tokenizer(
        text, return_tensors="pt",
        truncation=True, max_length=512, padding=True,
    )
    with torch.inference_mode():
        logits = model(**inputs).logits
    probs = torch.softmax(logits.float(), dim=-1)[0]  # fp32 for stable rounding
    label = "AI" if probs[1] > probs[0] else "HUMAN"
    return {
        "label": label,
        "ai_score": round(float(probs[1]), 4),
        "human_score": round(float(probs[0]), 4),
        "confidence": f"{max(probs[0], probs[1]):.1%}",
    }

# Example
result = detect("This is the text you want to analyze...")
print(result)
# → {'label': 'AI', 'ai_score': 0.9604, 'human_score': 0.0396, 'confidence': '96.0%'}
```
### Sentence-Level Detection
```python
import re

def detect_sentences(text: str) -> list[dict]:
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [
        {"sentence": s, **detect(s)}
        for s in sentences if s.strip()
    ]

results = detect_sentences(your_text)  # any string you want to analyze
for r in results:
    icon = "🤖" if r["label"] == "AI" else "✍️"
    print(f"{icon} {r['confidence']} — {r['sentence'][:80]}...")
```
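Per-sentence results can also be rolled up into a document-level score like the API's `ai_percentage`. A sketch of one plausible aggregation; weighting sentences by word count is an assumption here, not the documented server-side formula:

```python
def aggregate(results: list[dict]) -> dict:
    """Roll per-sentence verdicts up to a document-level percentage.

    `results` is the output of detect_sentences(); each entry carries
    "sentence" and "ai_score" keys. Sentences scoring above 0.5 count
    their words toward the AI total (an assumed threshold).
    """
    total_words = sum(len(r["sentence"].split()) for r in results)
    ai_words = sum(
        len(r["sentence"].split()) for r in results if r["ai_score"] > 0.5
    )
    pct = 100.0 * ai_words / total_words if total_words else 0.0
    return {
        "ai_percentage": round(pct, 1),
        "human_percentage": round(100.0 - pct, 1),
        "total_sentences": len(results),
    }

# Example with mock per-sentence results
mock = [
    {"sentence": "Learning guitar is a rewarding journey.", "ai_score": 0.96},
    {"sentence": "lol i just wing it tbh", "ai_score": 0.03},
]
print(aggregate(mock))
# → {'ai_percentage': 50.0, 'human_percentage': 50.0, 'total_sentences': 2}
```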
### Via REST API (Live)
```bash
# Free tier (500 words, 5 checks/day)
curl -X POST https://tusarway-tus-ai-detector-api.hf.space/detect \
  -H "Content-Type: application/json" \
  -H "X-API-Key: free-demo-key" \
  -d '{"text": "Your text to analyze goes here..."}'
```
Response:
```json
{
  "verdict": "AI Generated",
  "ai_score": 0.9604,
  "human_score": 0.0396,
  "metrics": {
    "ai_percentage": 96.0,
    "human_percentage": 4.0,
    "total_words": 142,
    "total_chars": 891,
    "total_sentences": 6
  },
  "sentences": [
    {
      "text": "Learning guitar is a rewarding journey...",
      "is_ai": true,
      "ai_score": 0.9604,
      "human_score": 0.0396
    }
  ]
}
```
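The same call can be made from Python with only the standard library. This mirrors the curl example above; the endpoint URL, headers, and `free-demo-key` are taken from it:

```python
import json
import urllib.request

API_URL = "https://tusarway-tus-ai-detector-api.hf.space/detect"

def build_request(text: str, api_key: str = "free-demo-key") -> urllib.request.Request:
    """Build the POST request for the /detect endpoint (free tier)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )

def check_text(text: str, api_key: str = "free-demo-key") -> dict:
    """POST text to the live API and return the parsed JSON verdict."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.loads(resp.read())

# result = check_text("Your text to analyze goes here...")
# print(result["verdict"], result["ai_score"])
```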
### API Tiers
| Tier | Words / Check | Checks / Day | Access |
|---|---|---|---|
| Free | 500 | 5 | free-demo-key |
| Premium | 10,000 | 100 | Contact via LinkedIn |
| Premium Plus | Unlimited | Unlimited | Contact via LinkedIn |
→ Get a key: linkedin.com/in/imtrt
## Version History
| Version | Model | Samples | Accuracy | Notes |
|---|---|---|---|---|
| v1 | Qwen3-0.6B (generative) | 10,000 | ~50% | Mode collapse — always predicted AI |
| v2 | Qwen3-0.6B (classifier) | 10,000 | ~85% | Fixed architecture, dataset imbalance |
| v3 | Qwen2.5-0.5B | 10,000 | ~91% | Switched dataset to RAID |
| v4 | Qwen3-0.6B | 10,000 | ~93% | HC3 dataset attempt, reverted to RAID |
| v5 | Qwen3-0.6B | 10,000 | ~99% | Balanced dataset, proper regularization |
| ReFusion 3.0 | Qwen3-0.6B | 50,000 | 100% | 6-source diverse dataset, r=64 LoRA, A100 native bf16 |
## Limitations
- Works best on texts of 100+ words. Short texts (< 30 words) may be unreliable.
- Trained primarily on English text. Other languages are unsupported.
- May have reduced accuracy on very recent AI models released after the training data cutoff.
- 100% eval accuracy reflects strong generalization on this dataset; real-world accuracy on adversarial or paraphrased AI text may vary.
## Citation
```bibtex
@misc{refusion3_2026,
  author    = {Tusar Akon},
  title     = {ReFusion 3.0: A Fine-Tuned AI Text Detection Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tusarway/refusion-3}
}
```
## About
Built by Tusar Akon as part of a fully open-source AI detection pipeline.
- 🌐 Live tool: ai-detector.tusarakon.com
- 💼 Contact: linkedin.com/in/imtrt
- 🤗 HuggingFace: huggingface.co/tusarway
If this model helped you, consider starring the repo ⭐