silentone0725 committed
Commit 2411fe0 · verified
1 Parent(s): ebdddc5

Update README.md

Files changed (1)
  1. README.md +39 -42
README.md CHANGED
@@ -31,25 +31,34 @@ tags:
  - "huggingface"
  ---
 
- # 🧠 Text Detector Model v2 — DistilBERT Fine-Tuned on Human vs AI Text
 
- This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned DistilBERT classifier** designed to distinguish between **human-written** and **AI-generated** text.
- It builds on `silentone0725/text-detector-model` with enhanced regularization, early stopping, and a larger combined dataset for better generalization.
 
  ---
 
- ## 🧩 Model Details
 
  | Property | Description |
  |-----------|-------------|
- | **Base Model** | `text-detector-model` |
- | **Architecture** | Transformer-based text classifier |
- | **Task** | Binary classification — *Human (0)* vs *AI (1)* |
  | **Languages** | English |
- | **Training Dataset** | Combined version of `silentone0725/ai-human-text-detection-v1` |
- | **Split Ratio** | 70% train / 15% validation / 15% test |
- | **Frameworks** | 🤗 Transformers, PyTorch |
- | **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Gradient Clipping, Early Stopping |
 
  ---
@@ -61,21 +70,9 @@ It builds on `silentone0725/text-detector-model` with enhanced regularization, e
  | F1-Score | 0.9967 | 0.9967 |
  | Eval Loss | 0.0156 | 0.0156 |
 
- These results indicate very high model confidence and balance, though performance should be further validated on unseen text domains.
-
  ---
 
- ## 📊 Dataset Citation
-
- **Dataset:** [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1)
- **Size:** 52,492 samples (balanced)
- **Classes:**
- - 🧍 Human: 26,246 samples
- - 🤖 AI: 26,246 samples
-
- ---
-
- ## 🧠 Training Setup
 
  | Hyperparameter | Value |
  |----------------|--------|
@@ -83,16 +80,16 @@ These results indicate very high model confidence and balance, though performanc
  | Batch Size | 8 |
  | Epochs | 6 |
  | Weight Decay | 0.2 |
- | Max Grad Norm | 1.0 |
  | Warmup Ratio | 0.1 |
  | Dropout | 0.3 |
  | Early Stopping Patience | 2 |
  | Mixed Precision | FP16 |
- | Optimizer | AdamW |
 
  ---
 
- ## 🚀 How to Use
 
  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
@@ -103,7 +100,7 @@ model_name = "silentone0725/text-detector-model-v2"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name)
 
- text = "This text was written by an intelligent agent."
  inputs = tokenizer(text, return_tensors="pt")
  outputs = model(**inputs)
  pred = torch.argmax(outputs.logits, dim=1).item()
@@ -115,9 +112,8 @@ print("🧍 Human" if pred == 0 else "🤖 AI")
 
  ## 📈 W&B Experiment Tracking
 
- All training and validation metrics were logged to **Weights & Biases (W&B)**.
- You can view the training dashboard here:
- 🔗 [W&B Project: silentone0725-manipal/huggingface](https://wandb.ai/silentone0725-manipal/huggingface)
 
  ---
@@ -128,29 +124,30 @@ If you use this model, please cite it as:
  ```
  @misc{silentone0725_text_detector_v2_2025,
    author = {Thakuria, Daksh},
-   title = {Text Detector Model v2 — DistilBERT Fine-Tuned for AI vs Human Text Classification},
    year = {2025},
-   howpublished = {\\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
  }
  ```
 
  ---
 
- ## ⚠️ Limitations & Bias
 
- - The dataset includes only **English** text.
- - Overfitting risk is minimized via dropout and early stopping, but may still appear on unseen domains.
- - Not intended for legal or automated moderation without human oversight.
 
  ---
 
- ## ❤️ Acknowledgements
 
- - Base model: [DistilBERT (Hugging Face)](https://huggingface.co/distilbert-base-uncased)
- - Dataset curation and training: *Daksh Thakuria (silentone0725)*
- - Frameworks: 🤗 Transformers, PyTorch, Weights & Biases
 
  ---
 
  > 📦 *Last updated:* November 2025
- > 🚀 *Developed using Colab + Hugging Face + W&B logging pipeline*
 
  - "huggingface"
  ---
 
+ # 🧠 Text Detector Model v2 — Fine-Tuned AI vs Human Text Classifier
 
+ This model (`silentone0725/text-detector-model-v2`) is a **fine-tuned text classifier** that distinguishes between **human-written** and **AI-generated** text in English.
+ It is trained on a large combined dataset spanning diverse genres and writing styles, and is built to generalize well to modern large language model (LLM) outputs.
 
  ---
 
+ ## 🧩 Model Lineage
+
+ | Stage | Model | Description |
+ |--------|--------|-------------|
+ | **v2** | `silentone0725/text-detector-model-v2` | Fine-tuned with stronger regularization, early stopping, and an expanded dataset. |
+ | **Base** | `silentone0725/text-detector-model` | Prior fine-tuned model, trained on a GPT-4 & human text dataset. |
+ | **Backbone** | `distilbert-base-uncased` | Original pretrained transformer from Hugging Face. |
+
+ ---
+
+ ## 📊 Model Details
 
  | Property | Description |
  |-----------|-------------|
+ | **Task** | Binary classification — *Human (0)* vs *AI (1)* |
  | **Languages** | English |
+ | **Dataset** | [`silentone0725/ai-human-text-detection-v1`](https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1) |
+ | **Split Ratio** | 70% train / 15% validation / 15% test |
+ | **Regularization** | Dropout = 0.3, Weight Decay = 0.2, Early Stopping (patience = 2) |
+ | **Precision** | Mixed FP16 |
+ | **Optimizer** | AdamW |
 
  ---
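
The 70% / 15% / 15% split above can be reproduced with a simple shuffled-index partition. A minimal sketch (the `split_indices` helper and the seed are illustrative assumptions, not the exact preprocessing used for training; 52,492 is the dataset size stated in the dataset citation):

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=42):
    """Shuffle indices 0..n-1 and partition them 70/15/15 into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(52_492)
print(len(train_idx), len(val_idx), len(test_idx))  # 36744 7873 7875
```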
 
  | F1-Score | 0.9967 | 0.9967 |
  | Eval Loss | 0.0156 | 0.0156 |
 
  ---
 
+ ## 🧠 Training Configuration
 
  | Hyperparameter | Value |
  |----------------|--------|
  | Batch Size | 8 |
  | Epochs | 6 |
  | Weight Decay | 0.2 |
  | Warmup Ratio | 0.1 |
  | Dropout | 0.3 |
+ | Max Grad Norm | 1.0 |
+ | Gradient Accumulation | 2 |
  | Early Stopping Patience | 2 |
  | Mixed Precision | FP16 |
 
  ---
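
The early-stopping rule in the table (patience = 2 on the validation loss) can be sketched in a few lines. This is an illustration of the stopping criterion only, not the actual 🤗 Trainer callback used during training:

```python
def early_stop_epoch(eval_losses, patience=2):
    """Return the 1-based epoch at which training stops, i.e. when the
    eval loss has failed to improve for `patience` consecutive epochs,
    or None if training runs all epochs to completion."""
    best = float("inf")
    bad = 0
    for epoch, loss in enumerate(eval_losses, start=1):
        if loss < best:
            best, bad = loss, 0  # new best checkpoint; reset the counter
        else:
            bad += 1
            if bad >= patience:
                return epoch
    return None

# Two consecutive non-improving epochs (3 and 4) trigger the stop.
print(early_stop_epoch([0.05, 0.03, 0.031, 0.032, 0.02, 0.01]))  # 4
```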
 
+ ## 🚀 Usage Example
 
  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name)
 
+ text = "This paragraph was likely written by a machine learning model."
  inputs = tokenizer(text, return_tensors="pt")
  outputs = model(**inputs)
  pred = torch.argmax(outputs.logits, dim=1).item()
 
  ## 📈 W&B Experiment Tracking
 
+ Training metrics were logged using **Weights & Biases (W&B)**.
+ 📊 [View Training Dashboard →](https://wandb.ai/silentone0725-manipal/huggingface)
 
  ---
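
The usage example above reports only the argmax label. To also report a confidence score for the *Human (0)* / *AI (1)* decision, the two logits can be passed through a softmax. A minimal pure-Python sketch; the logit values below are made up for illustration and do not come from the model:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical [Human, AI] logits from the classifier head.
logits = [-1.2, 2.7]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print("🧍 Human" if pred == 0 else "🤖 AI", f"(p = {probs[pred]:.3f})")  # 🤖 AI (p = 0.980)
```

In the real pipeline the same effect comes from applying a softmax to `outputs.logits` before the `argmax` call.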
 
  ```
  @misc{silentone0725_text_detector_v2_2025,
    author = {Thakuria, Daksh},
+   title = {Text Detector Model v2 — Fine-Tuned DistilBERT for AI vs Human Text Detection},
    year = {2025},
+   howpublished = {\url{https://huggingface.co/silentone0725/text-detector-model-v2}},
  }
  ```
 
  ---
 
+ ## ⚠️ Limitations
 
+ - Trained only on **English** data.
+ - May overestimate the probability of AI authorship for mixed or partially edited text.
+ - Should not be used for moderation or legal decisions without human verification.
 
  ---
 
+ ## ❤️ Credits
 
+ - **Developer:** Daksh Thakuria (`@silentone0725`)
+ - **Base Model:** [`silentone0725/text-detector-model`](https://huggingface.co/silentone0725/text-detector-model)
+ - **Backbone:** [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased)
+ - **Frameworks:** 🤗 Transformers, PyTorch, W&B
 
  ---
 
  > 📦 *Last updated:* November 2025
+ > 🚀 *Developed and fine-tuned in Google Colab with W&B tracking*