---
language: "en"
license: "apache-2.0"
datasets:
- "silentone0725/ai-human-text-detection-v1"
metrics:
- "accuracy"
- "f1"
model-index:
- name: "Text Detector Model v2"
  results:
  - task:
      type: "text-classification"
      name: "Human vs AI Text Detection"
    dataset:
      name: "AI vs Human Combined Dataset"
      type: "silentone0725/ai-human-text-detection-v1"
    metrics:
    - name: "Accuracy"
      type: "accuracy"
      value: 0.9967
    - name: "F1"
      type: "f1"
      value: 0.9967
tags:
- "ai-detection"
- "text-classification"
- "distilbert"
- "human-vs-ai"
- "nlp"
- "huggingface"
---
# 🧠 Text Detector Model v2: Fine-Tuned AI vs Human Text Classifier
This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** text in English.
It was trained on a large combined dataset spanning diverse genres and writing styles, and is built to generalize well to modern large language model (LLM) outputs.
---
## 🧩 Model Lineage
| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | The author's earlier fine-tuned model, trained on a GPT-4 & human text dataset. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |
---
## 📊 Model Details
| Property | Description |
|-----------|-------------|
| **Task** | Binary Classification: *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Split Ratio** | 70% Train / 15% Validation / 15% Test |
| **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping Patience = 2 |
| **Precision** | Mixed FP16 |
| **Optimizer** | AdamW |
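
The preprocessing code is not included in this card; below is a minimal sketch of how a 70/15/15 split could be reproduced with the 🤗 `datasets` library. The split name (`train`) and the seed are assumptions, not values taken from the actual training run.

```python
from datasets import load_dataset

# Assumption: the dataset ships as a single "train" split on the Hub.
ds = load_dataset("silentone0725/ai-human-text-detection-v1", split="train")

# 70% train / 30% held out, then split the held-out part into 15% validation / 15% test.
first = ds.train_test_split(test_size=0.30, seed=42)
second = first["test"].train_test_split(test_size=0.50, seed=42)

train_ds = first["train"]   # ~70%
val_ds = second["train"]    # ~15%
test_ds = second["test"]    # ~15%
print(len(train_ds), len(val_ds), len(test_ds))
```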
---
## 🧪 Evaluation Metrics
| Metric | Validation | Test |
|:--|:--:|:--:|
| Accuracy | 99.67% | 99.67% |
| F1-Score | 0.9967 | 0.9967 |
| Eval Loss | 0.0156 | 0.0156 |
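
The evaluation code itself is not part of this card; the following is a minimal sketch of a `compute_metrics` function, using the `evaluate` library, that would report the accuracy and F1 values above when passed to the 🤗 `Trainer`.

```python
import numpy as np
import evaluate

# Metric loaders from the evaluate library.
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    """Turn Trainer predictions (logits, labels) into accuracy and F1."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_metric.compute(predictions=preds, references=labels)["accuracy"],
        "f1": f1_metric.compute(predictions=preds, references=labels)["f1"],
    }
```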
---
## 🧠 Training Configuration
| Hyperparameter | Value |
|----------------|--------|
| Learning Rate | 2e-5 |
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |
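
The exact training script is not published here; the sketch below shows how these hyperparameters map onto 🤗 `TrainingArguments` and `EarlyStoppingCallback`. The output directory, evaluation/save strategy, and `metric_for_best_model` are assumptions, and dropout is set on the DistilBERT config rather than in the training arguments.

```python
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    TrainingArguments,
)

# Dropout (0.3) is a DistilBERT config setting, not a TrainingArguments field.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    dropout=0.3,               # hidden-layer dropout
    seq_classif_dropout=0.3,   # classification-head dropout
)

training_args = TrainingArguments(
    output_dir="text-detector-model-v2",   # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=6,
    weight_decay=0.2,
    warmup_ratio=0.1,
    max_grad_norm=1.0,
    gradient_accumulation_steps=2,
    fp16=True,
    eval_strategy="epoch",                 # assumed; use evaluation_strategy on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,           # required for early stopping
    metric_for_best_model="f1",            # assumed
)

# Stop training if the monitored metric does not improve for 2 consecutive evaluations.
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```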
---
## 🚀 Usage Example
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "silentone0725/text-detector-model-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "This paragraph was likely written by a machine learning model."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Label mapping: 0 = Human, 1 = AI
pred = torch.argmax(outputs.logits, dim=-1).item()
print("🧍 Human" if pred == 0 else "🤖 AI")
```
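
For batch scoring, the same checkpoint can also be used through the `pipeline` API. A short sketch; the example texts are illustrative, and the returned label names follow the model's `id2label` mapping (`LABEL_0`/`LABEL_1` unless configured otherwise):

```python
from transformers import pipeline

# Text-classification pipeline wrapping the same checkpoint.
detector = pipeline("text-classification", model="silentone0725/text-detector-model-v2")

texts = [
    "I walked to the corner shop and chatted with the owner about the weather.",
    "As an AI language model, I can provide a structured overview of the topic.",
]
for text, result in zip(texts, detector(texts)):
    print(f"{result['label']} ({result['score']:.4f}): {text[:50]}...")
```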
---
## 📈 W&B Experiment Tracking
Training metrics were logged using **Weights & Biases (W&B)**.
📊 [View Training Dashboard →](https://wandb.ai/silentone0725-manipal/huggingface)
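
The logging setup is not shown in this card; with the `wandb` package installed, the 🤗 `Trainer` picks up W&B automatically once reporting is enabled. A minimal sketch, with the project name assumed from the dashboard URL above:

```python
import os

# Project name assumed from the dashboard URL above.
os.environ["WANDB_PROJECT"] = "huggingface"

# Then enable reporting in TrainingArguments, e.g.:
#   TrainingArguments(..., report_to=["wandb"], run_name="text-detector-model-v2")
```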
---
## 📚 Citation
If you use this model, please cite it as:
```bibtex
@misc{silentone0725_text_detector_v2_2025,
  author       = {Thakuria, Daksh},
  title        = {Text Detector Model v2: Fine-Tuned DistilBERT for AI vs Human Text Detection},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```
---
## ⚠️ Limitations
- Trained only on **English** data.
- May overestimate AI probability on mixed or partially edited text; inspecting the raw probability can help (see the sketch after this list).
- Should not be used for moderation or legal decisions without human verification.
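
Because of the second limitation, it can help to look at the raw class probability rather than the argmax label and apply a stricter decision threshold. A minimal sketch; the 0.9 threshold is an illustrative assumption, not a tuned value:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "silentone0725/text-detector-model-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def ai_probability(text: str) -> float:
    """Return P(AI) for one text under this model's label mapping (0 = Human, 1 = AI)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

p = ai_probability("A human author lightly edited this AI-generated paragraph.")
# Flag as AI only when the model is very confident; 0.9 is an illustrative threshold.
print(f"P(AI) = {p:.4f} ->", "AI" if p >= 0.9 else "Human / uncertain")
```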
---
## ❀️ Credits
- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** 🤗 Transformers, PyTorch, W&B
---
> 📦 *Last updated:* November 2025
> 🚀 *Developed and fine-tuned in Google Colab with W&B tracking*