# ReFusion 3.0
**AI Text Detection Model**
100% Accuracy · 0% False Positives · 0% False Negatives
Fine-tuned by Tusar Akon · Built on Qwen3-0.6B
## Overview
ReFusion 3.0 is a production-grade AI text detector fine-tuned from Qwen/Qwen3-0.6B using
Parameter-Efficient Fine-Tuning (LoRA). It classifies text as either Human Written or AI Generated
with sentence-level granularity, achieving perfect scores on a 3,000-sample held-out test set.
It powers the live API at ai-detector.tusarakon.com, serving real-time detection with per-sentence highlighting and tiered API access.
## Performance
### Held-Out Test Set (3,000 samples, never seen during training)
| Metric | HUMAN | AI | Overall |
|---|---|---|---|
| Precision | 1.0000 | 1.0000 | 1.0000 |
| Recall | 1.0000 | 1.0000 | 1.0000 |
| F1 Score | 1.0000 | 1.0000 | 1.0000 |
| Accuracy | — | — | 100% |
### Confusion Matrix
| | Predicted AI | Predicted Human |
|---|---|---|
| True AI | 1,515 | 0 |
| True Human | 0 | 1,485 |
False Positive Rate: 0.00% (human text flagged as AI)
False Negative Rate: 0.00% (AI text missed as human)
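The headline precision, recall, and accuracy figures follow directly from this confusion matrix; a quick arithmetic check in Python:

```python
# Confusion matrix from the held-out test set
tp_ai = 1515      # true AI predicted AI
fn_ai = 0         # true AI predicted Human (false negatives for the AI class)
fp_ai = 0         # true Human predicted AI (false positives for the AI class)
tp_human = 1485   # true Human predicted Human

p_ai = tp_ai / (tp_ai + fp_ai)   # precision for the AI class
r_ai = tp_ai / (tp_ai + fn_ai)   # recall for the AI class
accuracy = (tp_ai + tp_human) / (tp_ai + tp_human + fp_ai + fn_ai)
print(p_ai, r_ai, accuracy)  # → 1.0 1.0 1.0
```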
### Training Curve
| Epoch | Train Loss | Val Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 0.2408 | 0.1182 | 99.97% | 0.9997 |
| 2 | 0.2382 | 0.1191 | 99.87% | 0.9987 |
| 3 | 0.2340 | 0.1181 | 99.93% | 0.9993 |
| 4 | 0.2339 | 0.1170 | 100% | 1.0000 |
| 5–10 | 0.2339→0.2338 | 0.1169→0.1169 | 100% | 1.0000 |
## Live Inference Results
```
✅ [Casual Reddit]     Expected: HUMAN → Got: HUMAN
   AI  2.6% [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Human 97.4%
✅ [GPT-style formal]  Expected: AI → Got: AI
   AI 96.0% [████████████████████████████░░] Human  4.0%
✅ [Personal story]    Expected: HUMAN → Got: HUMAN
   AI  2.6% [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Human 97.4%
✅ [Academic AI]       Expected: AI → Got: AI
   AI 95.4% [████████████████████████████░░] Human  4.6%
```
Score: 6/6 (100%; four of the six checks shown above)
## Training Details
### Architecture
| Component | Value |
|---|---|
| Base Model | Qwen/Qwen3-0.6B |
| Method | LoRA (PEFT) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Params | 40,372,224 (6.34% of total) |
| Total Params | 636,424,192 |
| Precision | bf16 (native A100) |
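The trainable-parameter count in the table can be sanity-checked from the LoRA rank alone. A sketch, assuming the published Qwen3-0.6B dimensions (hidden size 1024, 28 layers, 16 query and 8 key/value heads of head dim 128, MLP intermediate size 3072):

```python
# Estimate LoRA trainable parameters for Qwen3-0.6B, r=64,
# applied to the seven target modules listed in the table.
r = 64
hidden, layers = 1024, 28
q_out = 16 * 128    # query projection output dim (2048)
kv_out = 8 * 128    # key/value projection output dim (1024, GQA)
mlp = 3072          # MLP intermediate size

def lora_params(d_in, d_out, rank=r):
    # LoRA adds two low-rank matrices: A (rank x d_in) and B (d_out x rank)
    return rank * (d_in + d_out)

per_layer = (
    lora_params(hidden, q_out)      # q_proj
    + lora_params(hidden, kv_out)   # k_proj
    + lora_params(hidden, kv_out)   # v_proj
    + lora_params(q_out, hidden)    # o_proj
    + lora_params(hidden, mlp)      # gate_proj
    + lora_params(hidden, mlp)      # up_proj
    + lora_params(mlp, hidden)      # down_proj
)
head = hidden * 2                   # 2-label classification head, fully trained
total = layers * per_layer + head
print(total)  # → 40372224, matching the table
```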
### Training Configuration
| Hyperparameter | Value |
|---|---|
| Epochs | 10 |
| Batch Size | 32 per device |
| Gradient Accumulation | 2 steps |
| Effective Batch Size | 64 |
| Learning Rate | 3e-5 |
| LR Scheduler | Cosine decay |
| Warmup Ratio | 5% |
| Weight Decay | 0.01 |
| Label Smoothing | 0.05 |
| Max Sequence Length | 512 tokens |
| Hardware | A100 80GB |
| Training Time | 4h 32m |
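For orientation, the step counts implied by this configuration work out as follows (the warmup figure is an estimate from the 5% ratio; exact trainer rounding may differ):

```python
import math

# Schedule implied by the training configuration above
samples = 44_000                              # training split size
per_device, accum = 32, 2
effective_batch = per_device * accum          # 32 x 2 = 64
steps_per_epoch = math.ceil(samples / effective_batch)
epochs = 10
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.05 * total_steps)        # 5% warmup ratio
print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# → 64 688 6880 344
```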
### Dataset
50,000 balanced samples (25,000 human · 25,000 AI) across 6 diverse sources, collected concurrently via multi-threaded streaming:
| Source | Type | Samples | Writing Style |
|---|---|---|---|
| RAID Dataset | Human | 6,250 | Formal — Wikipedia, news articles |
| ELI5 / Reddit | Human | 6,250 | Casual — conversational Q&A |
| WritingPrompts | Human | 6,250 | Creative — storytelling, fiction |
| ArXiv Abstracts | Human | 6,250 | Academic — scientific writing |
| artem9k Detection Pile | AI | 12,500 | Multi-model AI outputs |
| RAID AI Portion | AI | 12,500 | 11 different AI model outputs |
Data split: 88% train (44,000) · 6% validation (3,000) · 6% test (3,000)
The diversity of human writing styles (formal, casual, creative, academic) is what enables the model to correctly classify both Reddit posts and news articles as human, without being tricked by writing style alone.
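The 88/6/6 split can be sketched in plain Python. This is a simplified illustration (a seeded shuffle-and-slice split; the card does not state the exact procedure or whether it was stratified by source or label):

```python
import random

def split_dataset(samples, train_frac=0.88, val_frac=0.06, seed=42):
    """Shuffle and slice into train / validation / test partitions."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(samples) * train_frac)
    n_val = int(len(samples) * val_frac)
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

# With 50,000 samples this yields the 44,000 / 3,000 / 3,000 partitions above
train, val, test = split_dataset(list(range(50_000)))
print(len(train), len(val), len(test))  # → 44000 3000 3000
```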
## Usage
### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen3-0.6B",
    num_labels=2,
    id2label={0: "HUMAN", 1: "AI"},
    label2id={"HUMAN": 0, "AI": 1},
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
base.config.pad_token_id = tokenizer.pad_token_id  # required for padded batches

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base, "tusarway/refusion-3")
model.eval()

# Detect
def detect(text: str) -> dict:
    inputs = tokenizer(
        text, return_tensors="pt",
        truncation=True, max_length=512, padding=True,
    )
    with torch.inference_mode():
        logits = model(**inputs).logits
    probs = torch.softmax(logits.float(), dim=-1)[0]  # fp32 for stable rounding
    label = "AI" if probs[1] > probs[0] else "HUMAN"
    return {
        "label": label,
        "ai_score": round(float(probs[1]), 4),
        "human_score": round(float(probs[0]), 4),
        "confidence": f"{max(probs[0], probs[1]):.1%}",
    }

# Example
result = detect("This is the text you want to analyze...")
print(result)
# → {'label': 'AI', 'ai_score': 0.9604, 'human_score': 0.0396, 'confidence': '96.0%'}
```
### Sentence-Level Detection
```python
import re

def detect_sentences(text: str) -> list[dict]:
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [
        {"sentence": s, **detect(s)}
        for s in sentences if s.strip()
    ]

results = detect_sentences(your_text)  # any string you want to analyze
for r in results:
    icon = "🤖" if r["label"] == "AI" else "✍️"
    print(f"{icon} {r['confidence']} — {r['sentence'][:80]}...")
```
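Per-sentence results can also be rolled up into a document-level score like the API's `ai_percentage`. A sketch of one plausible aggregation; weighting sentences by word count is an assumption here, not the documented server-side formula:

```python
def aggregate(results: list[dict]) -> dict:
    """Roll per-sentence verdicts up to a document-level percentage.

    `results` is the output of detect_sentences(); each entry carries
    "sentence" and "ai_score" keys. Sentences scoring above 0.5 count
    their words toward the AI total (an assumed threshold).
    """
    total_words = sum(len(r["sentence"].split()) for r in results)
    ai_words = sum(
        len(r["sentence"].split()) for r in results if r["ai_score"] > 0.5
    )
    pct = 100.0 * ai_words / total_words if total_words else 0.0
    return {
        "ai_percentage": round(pct, 1),
        "human_percentage": round(100.0 - pct, 1),
        "total_sentences": len(results),
    }

# Example with mock per-sentence results
mock = [
    {"sentence": "Learning guitar is a rewarding journey.", "ai_score": 0.96},
    {"sentence": "lol i just wing it tbh", "ai_score": 0.03},
]
print(aggregate(mock))
# → {'ai_percentage': 50.0, 'human_percentage': 50.0, 'total_sentences': 2}
```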
### Via REST API (Live)
```bash
# Free tier (500 words, 5 checks/day)
curl -X POST https://tusarway-tus-ai-detector-api.hf.space/detect \
  -H "Content-Type: application/json" \
  -H "X-API-Key: free-demo-key" \
  -d '{"text": "Your text to analyze goes here..."}'
```
Response:
```json
{
  "verdict": "AI Generated",
  "ai_score": 0.9604,
  "human_score": 0.0396,
  "metrics": {
    "ai_percentage": 96.0,
    "human_percentage": 4.0,
    "total_words": 142,
    "total_chars": 891,
    "total_sentences": 6
  },
  "sentences": [
    {
      "text": "Learning guitar is a rewarding journey...",
      "is_ai": true,
      "ai_score": 0.9604,
      "human_score": 0.0396
    }
  ]
}
```
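The same call can be made from Python with only the standard library. This mirrors the curl example above; the endpoint URL, headers, and `free-demo-key` are taken from it:

```python
import json
import urllib.request

API_URL = "https://tusarway-tus-ai-detector-api.hf.space/detect"

def build_request(text: str, api_key: str = "free-demo-key") -> urllib.request.Request:
    """Build the POST request for the /detect endpoint (free tier)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )

def check_text(text: str, api_key: str = "free-demo-key") -> dict:
    """POST text to the live API and return the parsed JSON verdict."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.loads(resp.read())

# result = check_text("Your text to analyze goes here...")
# print(result["verdict"], result["ai_score"])
```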
### API Tiers
| Tier | Words / Check | Checks / Day | Access |
|---|---|---|---|
| Free | 500 | 5 | free-demo-key |
| Premium | 10,000 | 100 | Contact via LinkedIn |
| Premium Plus | Unlimited | Unlimited | Contact via LinkedIn |
→ Get a key: linkedin.com/in/imtrt
## Version History
| Version | Model | Samples | Accuracy | Notes |
|---|---|---|---|---|
| v1 | Qwen3-0.6B (generative) | 10,000 | ~50% | Mode collapse — always predicted AI |
| v2 | Qwen3-0.6B (classifier) | 10,000 | ~85% | Fixed architecture, dataset imbalance |
| v3 | Qwen2.5-0.5B | 10,000 | ~91% | Switched dataset to RAID |
| v4 | Qwen3-0.6B | 10,000 | ~93% | HC3 dataset attempt, reverted to RAID |
| v5 | Qwen3-0.6B | 10,000 | ~99% | Balanced dataset, proper regularization |
| ReFusion 3.0 | Qwen3-0.6B | 50,000 | 100% | 6-source diverse dataset, r=64 LoRA, A100 native bf16 |
## Limitations
- Works best on texts of 100+ words. Short texts (< 30 words) may be unreliable.
- Trained primarily on English text. Other languages are unsupported.
- May have reduced accuracy on very recent AI models released after the training data cutoff.
- 100% eval accuracy reflects strong generalization on this dataset; real-world accuracy on adversarial or paraphrased AI text may vary.
## Citation
```bibtex
@misc{refusion3_2026,
  author    = {Tusar Akon},
  title     = {ReFusion 3.0: A Fine-Tuned AI Text Detection Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/tusarway/refusion-3}
}
```
## About
Built by Tusar Akon as part of a fully open-source AI detection pipeline.
- 🌐 Live tool: ai-detector.tusarakon.com
- 💼 Contact: linkedin.com/in/imtrt
- 🤗 HuggingFace: huggingface.co/tusarway
If this model helped you, consider starring the repo ⭐