Update README.md

README.md
- "huggingface"
---

# 🧠 Text Detector Model v2 – Fine-Tuned AI vs Human Text Classifier

This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** text in English.
It is trained on a large combined dataset spanning diverse genres and writing styles, and is built to generalize well to modern large language model (LLM) outputs.

---

## 🧩 Model Lineage

| Stage | Model | Description |
|--------|--------|-------------|
| **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and an expanded dataset. |
| **Base** | `silentone0725/text-detector-model` | Prior fine-tuned model, trained on a GPT-4 & human text dataset. |
| **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |
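
As a rough sketch of this lineage: v2 presumably continues fine-tuning from the earlier checkpoint rather than from the raw backbone. The snippet below shows one way to load the base model with the dropout value reported in this card using standard Transformers APIs; it is an illustration, not the author's published training script.

```python
from transformers import AutoModelForSequenceClassification

# Illustrative only: start from the earlier fine-tuned checkpoint and
# raise DistilBERT's dropout to the 0.3 reported in this card.
model = AutoModelForSequenceClassification.from_pretrained(
    "silentone0725/text-detector-model",  # base checkpoint from the table
    dropout=0.3,                # DistilBERT hidden-layer dropout
    seq_classif_dropout=0.3,    # dropout on the classification head
)
```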

---

## 📋 Model Details

| Property | Description |
|-----------|-------------|
| **Task** | Binary classification – *Human (0)* vs *AI (1)* |
| **Languages** | English |
| **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
| **Dataset Size** | 52,492 samples, balanced (26,246 human / 26,246 AI) |
| **Split Ratio** | 70% Train / 15% Validation / 15% Test |
| **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping Patience = 2 |
| **Precision** | Mixed FP16 |
| **Optimizer** | AdamW |
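
The 70/15/15 split can be approximated with the `datasets` library. This is a hedged sketch: the dataset's split name and the seed are assumptions, not the author's published preprocessing.

```python
from datasets import load_dataset

# Load the dataset named in the card (assuming a single "train" split).
ds = load_dataset("silentone0725/ai-human-text-detection-v1", split="train")

# 70% train, then split the remaining 30% in half: 15% validation / 15% test.
# seed=42 is an illustrative choice, not necessarily the one used for v2.
tmp = ds.train_test_split(test_size=0.30, seed=42)
holdout = tmp["test"].train_test_split(test_size=0.50, seed=42)
train_ds, val_ds, test_ds = tmp["train"], holdout["train"], holdout["test"]
```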

---

It builds on `silentone0725/text-detector-model` with enhanced regularization, early stopping, and an expanded dataset, reaching an F1-score of 0.9967 with an eval loss of 0.0156.

---

## 🔧 Training Configuration

| Hyperparameter | Value |
|----------------|--------|
| Batch Size | 8 |
| Epochs | 6 |
| Weight Decay | 0.2 |
| Warmup Ratio | 0.1 |
| Dropout | 0.3 |
| Max Grad Norm | 1.0 |
| Gradient Accumulation | 2 |
| Early Stopping Patience | 2 |
| Mixed Precision | FP16 |
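
These values map onto Hugging Face `TrainingArguments` roughly as sketched below. The `output_dir` is a placeholder, the learning rate is not listed in this table so it is omitted, and dropout is set on the model rather than in the arguments; treat this as an assumption-laden sketch, not the author's exact script.

```python
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="text-detector-model-v2",   # placeholder path
    per_device_train_batch_size=8,
    num_train_epochs=6,
    weight_decay=0.2,
    warmup_ratio=0.1,
    max_grad_norm=1.0,
    gradient_accumulation_steps=2,
    fp16=True,                             # mixed precision
    eval_strategy="epoch",                 # `evaluation_strategy` on older versions
    save_strategy="epoch",
    load_best_model_at_end=True,           # required for early stopping
    metric_for_best_model="eval_loss",
    report_to="wandb",                     # W&B logging (see tracking section below)
)
early_stop = EarlyStoppingCallback(early_stopping_patience=2)  # pass via Trainer(callbacks=[early_stop])
```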

---

## 🚀 Usage Example

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "silentone0725/text-detector-model-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "This paragraph was likely written by a machine learning model."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
pred = torch.argmax(outputs.logits, dim=1).item()
print("🧑 Human" if pred == 0 else "🤖 AI")
```
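
The same check works through the high-level `pipeline` API. Note that the label names depend on the checkpoint's `id2label` mapping, so they may appear as `LABEL_0` / `LABEL_1` rather than Human / AI; this is a sketch, not output verified against this model.

```python
from transformers import pipeline

detector = pipeline("text-classification", model="silentone0725/text-detector-model-v2")
print(detector("This paragraph was likely written by a machine learning model."))
# e.g. [{'label': 'LABEL_1', 'score': 0.99}]; LABEL_1 corresponds to AI per the card
```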

---

## 📊 W&B Experiment Tracking

Training metrics were logged using **Weights & Biases (W&B)**.

🔗 [View Training Dashboard →](https://wandb.ai/silentone0725-manipal/huggingface)

---

If you use this model, please cite it as:

```
@misc{silentone0725_text_detector_v2_2025,
  author = {Thakuria, Daksh},
  title = {Text Detector Model v2 – Fine-Tuned DistilBERT for AI vs Human Text Detection},
  year = {2025},
  howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
}
```

---

## ⚠️ Limitations

- Trained only on **English** data.
- May overestimate AI probability on mixed or partially edited text.
- Should not be used for moderation or legal decisions without human verification; a simple confidence gate is sketched below.
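
One hedged way to act on these caveats is to trust a prediction only above a confidence threshold and route everything else to a human. This reuses `tokenizer` and `model` from the usage example above; the 0.9 threshold is an illustrative choice, not a calibrated value.

```python
import torch

def classify_with_review(text: str, threshold: float = 0.9) -> str:
    # Score the text and convert logits to class probabilities.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1).squeeze()
    confidence, pred = torch.max(probs, dim=0)
    # Below the (illustrative) threshold, defer to human verification.
    if confidence.item() < threshold:
        return "needs human review"
    return "Human" if pred.item() == 0 else "AI"
```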

---

## ❤️ Credits

- **Developer:** Daksh Thakuria (`@silentone0725`)
- **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
- **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
- **Frameworks:** 🤗 Transformers, PyTorch, W&B

---

> 📦 *Last updated:* November 2025
> 💻 *Developed and fine-tuned in Google Colab with W&B tracking*