---
language: "en"
license: "apache-2.0"
datasets:
- "silentone0725/ai-human-text-detection-v1"
metrics:
- "accuracy"
- "f1"
model-index:
- name: "Text Detector Model v2"
  results:
  - task:
      type: "text-classification"
      name: "Human vs AI Text Detection"
    dataset:
      name: "AI vs Human Combined Dataset"
      type: "silentone0725/ai-human-text-detection-v1"
    metrics:
    - name: "Accuracy"
      type: "accuracy"
      value: 0.9967
    - name: "F1"
      type: "f1"
      value: 0.9967
tags:
- "ai-detection"
- "text-classification"
- "distilbert"
- "human-vs-ai"
- "nlp"
- "huggingface"
---

# Text Detector Model v2 – Fine-Tuned AI vs Human Text Classifier

This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** English text.
It was trained on a large combined dataset spanning diverse genres and writing styles, and is built to generalize well to the outputs of modern large language models (LLMs).

---

## Model Lineage

| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and an expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | The author's earlier model, fine-tuned on a GPT-4 and human text dataset. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |

---

## Model Details

| Property | Description |
|-----------|-------------|
| **Task** | Binary classification – *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Split Ratio** | 70% train / 15% validation / 15% test |
| **Regularization** | Dropout = 0.3, weight decay = 0.2, early stopping patience = 2 |
| **Precision** | Mixed (FP16) |
| **Optimizer** | AdamW |

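The 70/15/15 split above can be reproduced with a deterministic shuffle and slice. A minimal, dependency-free sketch (the function name and seed are illustrative, not taken from the actual training script):

```python
import random

def three_way_split(items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle deterministically, then slice into train/validation/test."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (
        items[:n_train],                  # 70% train
        items[n_train:n_train + n_val],   # 15% validation
        items[n_train + n_val:],          # remaining 15% test
    )
```

With 🤗 Datasets the same effect is typically achieved by chaining two `Dataset.train_test_split` calls.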
---

## Evaluation Metrics

| Metric | Validation | Test |
|:--|:--:|:--:|
| Accuracy | 99.67% | 99.67% |
| F1-Score | 0.9967 | 0.9967 |
| Eval Loss | 0.0156 | 0.0156 |

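The accuracy and F1 figures follow the standard binary definitions, with *AI (1)* as the positive class. As a reference point, a dependency-free sketch of both metrics (in practice they are typically computed with libraries such as 🤗 Evaluate or scikit-learn):

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    # Harmonic mean of precision and recall for the positive (AI = 1) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```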
---

## Training Configuration

| Hyperparameter | Value |
|----------------|--------|
| Learning Rate | 2e-5 |
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |

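Assuming the Hugging Face `Trainer` API was used (the card does not include the training script), the table maps onto `TrainingArguments` roughly as follows. The starting checkpoint, output directory, and evaluation/save strategies are assumptions; argument names follow recent `transformers` releases (`evaluation_strategy` is renamed `eval_strategy` in the newest versions), and dropout is set on the DistilBERT config:

```python
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    TrainingArguments,
)

# Dropout = 0.3 goes on the DistilBERT config; the backbone listed in the
# lineage table is used here, but the exact starting checkpoint is an assumption.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    dropout=0.3,
    seq_classif_dropout=0.3,
)

args = TrainingArguments(
    output_dir="text-detector-model-v2",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=6,
    weight_decay=0.2,
    warmup_ratio=0.1,
    max_grad_norm=1.0,
    gradient_accumulation_steps=2,
    fp16=True,                      # mixed precision
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,    # required by the early-stopping callback
    metric_for_best_model="f1",
    report_to="wandb",              # metrics logged to Weights & Biases
)

# Early stopping with patience 2, passed to Trainer(callbacks=[early_stop]).
early_stop = EarlyStoppingCallback(early_stopping_patience=2)
```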
---

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "silentone0725/text-detector-model-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "This paragraph was likely written by a machine learning model."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)

pred = torch.argmax(outputs.logits, dim=-1).item()
print("Human" if pred == 0 else "AI")
```

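The snippet above returns only a hard label; applying softmax to the two logits yields a per-class score you can threshold (note that such scores are not necessarily well calibrated). A dependency-free sketch of the conversion, with illustrative logit values:

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw scores from the classifier head: [human_logit, ai_logit].
probs = softmax([-2.0, 2.0])
ai_probability = probs[1]  # score assigned to the AI class (label 1)
```

On the model's output above, the equivalent is `torch.softmax(outputs.logits, dim=-1)`.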
---

## W&B Experiment Tracking

Training metrics were logged with **Weights & Biases (W&B)**.
[View the training dashboard](https://wandb.ai/silentone0725-manipal/huggingface)

---

## Citation

If you use this model, please cite it as:

```bibtex
@misc{silentone0725_text_detector_v2_2025,
  author       = {Thakuria, Daksh},
  title        = {Text Detector Model v2 -- Fine-Tuned DistilBERT for AI vs Human Text Detection},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```

---

## Limitations

- Trained only on **English** data.
- May overestimate the AI probability on mixed or partially edited text.
- Should not be used for moderation or legal decisions without human verification.

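One way to honor the last point is to act automatically only on high-confidence predictions and route everything else to a reviewer. A minimal sketch (the 0.9 threshold is illustrative, not tuned on this model):

```python
def route(ai_probability, threshold=0.9):
    # Act automatically only when the model is confident either way;
    # everything in between goes to a human reviewer.
    if ai_probability >= threshold:
        return "ai"
    if ai_probability <= 1 - threshold:
        return "human"
    return "needs-human-review"
```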
---

## Credits

- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** Hugging Face Transformers, PyTorch, Weights & Biases

---

> *Last updated:* November 2025
> *Developed and fine-tuned in Google Colab with W&B tracking*