---
language: "en"
license: "apache-2.0"
datasets:
- "silentone0725/ai-human-text-detection-v1"
metrics:
- "accuracy"
- "f1"
model-index:
- name: "Text Detector Model v2"
  results:
  - task:
      type: "text-classification"
      name: "Human vs AI Text Detection"
    dataset:
      name: "AI vs Human Combined Dataset"
      type: "silentone0725/ai-human-text-detection-v1"
    metrics:
    - name: "Accuracy"
      type: "accuracy"
      value: 0.9967
    - name: "F1"
      type: "f1"
      value: 0.9967
tags:
- "ai-detection"
- "text-classification"
- "distilbert"
- "human-vs-ai"
- "nlp"
- "huggingface"
---

# Text Detector Model v2: Fine-Tuned AI vs Human Text Classifier

This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** English text.
It was trained on a large combined dataset spanning diverse genres and writing styles, with the goal of generalizing to the outputs of modern large language models (LLMs).

---

## Model Lineage

| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and an expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | The earlier model, fine-tuned on a dataset of GPT-4 and human text. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |

---

## Model Details

| Property | Description |
|-----------|-------------|
| **Task** | Binary classification: *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Split Ratio** | 70% Train / 15% Validation / 15% Test |
| **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping Patience = 2 |
| **Precision** | Mixed precision (FP16) |
| **Optimizer** | AdamW |
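
The card does not say how the 70/15/15 split was produced, for example whether it was stratified by label. As a minimal sketch of one way to obtain those proportions, assuming a simple seeded shuffle (the function name `split_dataset` and the seed are hypothetical, not taken from the training code):

```python
import random


def split_dataset(examples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle examples and split them into train/validation/test lists.

    The test set receives whatever remains after the train and
    validation portions, so the three parts always cover every example.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility

    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Toy usage with 1,000 placeholder examples.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

A real pipeline would more likely split by label (stratified) so that both classes keep the same ratio in every part.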

---

## Evaluation Metrics

| Metric | Validation | Test |
|:--|:--:|:--:|
| Accuracy | 99.67% | 99.67% |
| F1-Score | 0.9967 | 0.9967 |
| Eval Loss | 0.0156 | 0.0156 |
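
For readers who want to recompute these metrics on their own predictions, the sketch below shows how accuracy and F1 are derived from label lists. It assumes binary F1 with *AI (1)* as the positive class; the card does not state which F1 averaging was used, and the toy labels are made up for illustration.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy labels: 0 = human, 1 = AI (illustrative only, not real model output).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, f1={f1:.4f}")
```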

---

## Training Configuration

| Hyperparameter | Value |
|----------------|--------|
| Learning Rate | 2e-5 |
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |
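
Several of these values interact: a batch size of 8 with 2 gradient-accumulation steps gives an effective batch of 16 examples per optimizer update, and the warmup ratio applies to the total number of updates. The sketch below works through that arithmetic. The training-set size (`16_000`) and the single-device assumption are hypothetical, since the card gives neither, and the step counts are approximate rather than a reproduction of any trainer's exact rounding.

```python
import math

# Values copied from the table above.
CONFIG = {
    "batch_size": 8,
    "gradient_accumulation_steps": 2,
    "epochs": 6,
    "warmup_ratio": 0.1,
}


def schedule_summary(num_train_examples, cfg=CONFIG, num_devices=1):
    """Estimate optimizer-step counts implied by the hyperparameters."""
    effective_batch = (cfg["batch_size"]
                       * cfg["gradient_accumulation_steps"]
                       * num_devices)
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    # Upper bound: early stopping may end training before the last epoch.
    total_steps = steps_per_epoch * cfg["epochs"]
    warmup_steps = round(total_steps * cfg["warmup_ratio"])
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Hypothetical training-set size, for illustration only.
print(schedule_summary(16_000))
```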

---

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "silentone0725/text-detector-model-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make inference mode explicit (dropout disabled)

text = "This paragraph was likely written by a machine learning model."
# Truncate inputs that exceed the model's maximum sequence length.
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# Label 0 = human-written, label 1 = AI-generated.
pred = torch.argmax(outputs.logits, dim=1).item()

print("Human" if pred == 0 else "AI")
```
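
The example above reports only the arg-max label. If a confidence score is also useful, the two logits can be passed through a softmax. The sketch below does this in plain Python so it runs without downloading the model; the logit values are made up, and `describe` is a hypothetical helper rather than part of the model's API.

```python
import math

LABELS = {0: "Human", 1: "AI"}  # label mapping from the Model Details table


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return (label, probability) for the most likely class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


# Made-up logits; with the real model, use outputs.logits[0].tolist().
label, confidence = describe([-1.2, 2.3])
print(f"{label} ({confidence:.1%})")
```

Softmax scores from a fine-tuned classifier are often overconfident, so they are best read as a relative ranking rather than a calibrated probability.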

---

## W&B Experiment Tracking

Training metrics were logged using **Weights & Biases (W&B)**.
[View the training dashboard](https://wandb.ai/silentone0725-manipal/huggingface)

---

## Citation

If you use this model, please cite it as:

```
@misc{silentone0725_text_detector_v2_2025,
  author = {Thakuria, Daksh},
  title = {Text Detector Model v2: Fine-Tuned DistilBERT for AI vs Human Text Detection},
  year = {2025},
  howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```

---

## Limitations

- Trained only on **English** data.
- May overestimate AI probability on mixed or partially edited text.
- Should not be used for moderation or legal decisions without human verification.
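
One way to act on the last point is to treat the model's score as advisory and send uncertain cases to a person. The sketch below is an illustration only: the thresholds (`0.2` and `0.8`) and the `triage` function are hypothetical and were not tuned or validated for this model.

```python
def triage(ai_probability, low=0.2, high=0.8):
    """Map an AI-probability score to an advisory bucket.

    Scores between `low` and `high` are routed to a human reviewer.
    The other buckets are still only hints, not final decisions.
    """
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must be between 0 and 1")
    if ai_probability >= high:
        return "likely_ai"
    if ai_probability <= low:
        return "likely_human"
    return "needs_human_review"


for score in (0.05, 0.55, 0.93):
    print(score, triage(score))
```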

---

## Credits

- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** 🤗 Transformers, PyTorch, W&B

---

> *Last updated:* November 2025
> *Developed and fine-tuned in Google Colab with W&B tracking*