	---
	language: "en"
	license: "apache-2.0"
	datasets:
	- "silentone0725/ai-human-text-detection-v1"
	metrics:
	- "accuracy"
	- "f1"
	model-index:
	- name: "Text Detector Model v2"
	  results:
	  - task:
	      type: "text-classification"
	      name: "Human vs AI Text Detection"
	    dataset:
	      name: "AI vs Human Combined Dataset"
	      type: "silentone0725/ai-human-text-detection-v1"
	    metrics:
	    - name: "Accuracy"
	      type: "accuracy"
	      value: 0.9967
	    - name: "F1"
	      type: "f1"
	      value: 0.9967
	tags:
	- "ai-detection"
	- "text-classification"
	- "distilbert"
	- "human-vs-ai"
	- "nlp"
	- "huggingface"
	---

	# 🧠 Text Detector Model v2 — Fine-Tuned AI vs Human Text Classifier

	This model (`silentone0725/text-detector-model-v2`) is a fine-tuned DistilBERT classifier that labels English text as human-written or AI-generated.
	It was trained on a large combined dataset covering diverse genres and writing styles, with the goal of generalizing to the output of modern large language models (LLMs).

	---

	## 🧩 Model Lineage

	| Stage | Model | Description |
	|--------|--------|-------------|
	| v2 | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and an expanded dataset. |
	| Base | `silentone0725/text-detector-model` | Earlier fine-tuned model, trained on a dataset of GPT-4 and human text. |
	| Backbone | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |

	---

	## 📊 Model Details

	| Property | Description |
	|-----------|-------------|
	| Task | Binary classification: Human (0) vs AI (1) |
	| Languages | English |
	| Dataset | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
	| Split Ratio | 70% Train / 15% Validation / 15% Test |
	| Regularization | Dropout = 0.3, Weight Decay = 0.2, Early Stopping Patience = 2 |
	| Numerical Precision | Mixed precision (FP16) |
	| Optimizer | AdamW |
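
The card gives the 70/15/15 split ratio but not how the split was produced. As a minimal sketch, assuming a plain shuffled split with a fixed seed (the function name `split_70_15_15` and the seed value are illustrative and not taken from the training code):

```python
import random

def split_70_15_15(examples, seed=42):
    """Shuffle and split examples into 70% train, 15% validation, 15% test.

    Assumption: a simple random split. The original pipeline may have
    stratified by label or used a different seed.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = len(items) * 70 // 100
    n_val = len(items) * 15 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no example is dropped
    return train, val, test

train, val, test = split_70_15_15(range(1000))
print(len(train), len(val), len(test))
```

Taking the test set as the remainder keeps every example in exactly one split, even when the dataset size is not divisible by 100.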

	---

	## 🧪 Evaluation Metrics

	| Metric | Validation | Test |
	|:--|:--:|:--:|
	| Accuracy | 99.67% | 99.67% |
	| F1-Score | 0.9967 | 0.9967 |
	| Eval Loss | 0.0156 | 0.0156 |
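
For reference, accuracy and F1 can be computed from true and predicted labels as sketched below. This assumes binary F1 with AI (label 1) as the positive class; the card does not state which F1 averaging was used, and the labels here are made-up toy values, not model outputs.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); reported as 0.0 when the denominator is zero
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy labels: 0 = Human, 1 = AI
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```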

	---

	## 🧠 Training Configuration

	| Hyperparameter | Value |
	|----------------|--------|
	| Learning Rate | 2e-5 |
	| Batch Size | 8 |
	| Epochs | 6 |
	| Weight Decay | 0.2 |
	| Warmup Ratio | 0.1 |
	| Dropout | 0.3 |
	| Max Grad Norm | 1.0 |
	| Gradient Accumulation | 2 |
	| Early Stopping Patience | 2 |
	| Mixed Precision | FP16 |
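
Most rows in this table correspond to keyword arguments of `transformers.TrainingArguments`. The sketch below only collects those values and works out the effective batch size and approximate step counts. It assumes that "Batch Size" means the per-device batch size, that training ran on a single device, and a hypothetical training-set size; it does not reproduce the author's training script. Dropout and early stopping are configured elsewhere (on the model config and through a callback), so they are left out of the dictionary.

```python
import math

# Values from the table, keyed by the matching TrainingArguments keyword names.
# Assumption: "Batch Size" is the per-device training batch size.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 6,
    "weight_decay": 0.2,
    "warmup_ratio": 0.1,
    "max_grad_norm": 1.0,
    "gradient_accumulation_steps": 2,
    "fp16": True,
}

def schedule_summary(num_train_examples, num_devices=1):
    """Derive effective batch size and approximate step counts from the settings."""
    effective_batch = (
        training_kwargs["per_device_train_batch_size"]
        * training_kwargs["gradient_accumulation_steps"]
        * num_devices
    )
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * training_kwargs["warmup_ratio"])
    return effective_batch, total_steps, warmup_steps

# Hypothetical training-set size, for illustration only.
effective_batch, total_steps, warmup_steps = schedule_summary(num_train_examples=16_000)
print(effective_batch, total_steps, warmup_steps)
```

With the Trainer API, these values could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `"out"` is a placeholder path. Note that `fp16=True` needs a CUDA-capable GPU, and early stopping with a patience of 2 would be added separately via `EarlyStoppingCallback(early_stopping_patience=2)`.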

	---

	## 🚀 Usage Example

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name = "silentone0725/text-detector-model-v2"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	text = "This paragraph was likely written by a machine learning model."
	# Truncate long inputs to the 512-token limit of the DistilBERT backbone
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

	# Gradients are not needed for prediction
	with torch.no_grad():
	    outputs = model(**inputs)

	# Label mapping from Model Details: 0 = Human, 1 = AI
	pred = torch.argmax(outputs.logits, dim=1).item()
	print("🧍 Human" if pred == 0 else "🤖 AI")
	```
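
The usage example reports only the argmax label. To get a confidence score as well, the two logits can be passed through a softmax. The sketch below does this in plain Python on hypothetical logit values, so it runs without downloading the model; with the real model, `torch.softmax(outputs.logits, dim=1)` performs the same computation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the order [Human (0), AI (1)].
logits = [-1.2, 2.3]
p_human, p_ai = softmax(logits)
# Strict comparison matches argmax, which picks index 0 on a tie.
label = "AI" if p_ai > p_human else "Human"
print(f"{label} (P(AI) = {p_ai:.3f})")
```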

	---

	## 📈 W&B Experiment Tracking

	Training metrics were logged using Weights & Biases (W&B).
	📊 [View Training Dashboard →](https://wandb.ai/silentone0725-manipal/huggingface)

	---

	## 📚 Citation

	If you use this model, please cite it as:

	```bibtex
	@misc{silentone0725_text_detector_v2_2025,
	author = {Thakuria, Daksh},
	title = {Text Detector Model v2 — Fine-Tuned DistilBERT for AI vs Human Text Detection},
	year = {2025},
	howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
	}
	```

	---

	## ⚠️ Limitations

	- Trained only on English data.
	- May overestimate AI probability on mixed or partially edited text.
	- Should not be used for moderation or legal decisions without human verification.
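
One way to act on the last point is to route uncertain predictions to a human reviewer rather than acting on the label directly. A minimal sketch, assuming a probability for the AI class is available (for example from a softmax over the logits); the function name `triage` and the 0.10/0.90 thresholds are illustrative and were not tuned on this model:

```python
def triage(p_ai, low=0.10, high=0.90):
    """Map P(AI) to a coarse decision, sending the uncertain middle band to a human."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability in [0, 1]")
    if p_ai >= high:
        return "likely AI"
    if p_ai <= low:
        return "likely human"
    return "needs human review"

for p in (0.02, 0.55, 0.97):
    print(p, "->", triage(p))
```

Even the confident labels are best treated as a signal for a reviewer, not as a final verdict.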

	---

	## ❤️ Credits

	- Developer: Daksh Thakuria (`@silentone0725`)
	- Base Model: [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
	- Backbone: [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
	- Frameworks: 🤗 Transformers, PyTorch, W&B

	---

	> 📦 Last updated: November 2025
	> 🚀 Developed and fine-tuned in Google Colab with W&B tracking