---
library_name: transformers
tags:
- text-classification
- ai-detection
- roberta
- nlp
---

# Model Card for roberta-large-openai-detector-custom

This model distinguishes **AI-generated from human-written text** using RoBERTa-Large fine-tuned on human-written text and outputs from recent large language models.

---

## Model Details

### Model Description

This model is a **binary text classifier** trained to distinguish human-written text from text generated by models such as GPT-4, GPT-3.5, Claude, and LLaMA. It is intended to improve on legacy GPT-2 output detectors by fine-tuning on outputs from these more recent models.

- **Developed by:** Daksh Thakuria
- **Model type:** Transformer-based sequence classification (RoBERTa-Large)
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** Community RoBERTa GPT-2 Detector

### Model Sources

- **Repository:** https://huggingface.co/silentone0725/roberta-large-openai-detector-custom
- **Training Code:** https://github.com/silentone12725/Ai-Gen-Text-Detect

---

## Uses

### Direct Use
Detecting AI-generated text in research, content moderation, and academic-integrity systems.

### Downstream Use
Integration into content-filtering pipelines, analytics tools, or research benchmarks.

### Out-of-Scope Use
- Legal or forensic authorship claims
- Fully automated high-stakes decisions
- Any use that assumes reliable detection of heavily paraphrased text

---

## Bias, Risks, and Limitations

- May misclassify creative or highly structured human writing
- Performance drops under heavy paraphrasing
- Intended for English text only
- Relies on surface text patterns only; it does not use or detect watermarks
- Modest overall accuracy (0.59 in the reported evaluation), so individual predictions can be wrong

### Recommendations
Use as a **decision-support tool**, not a final authority: route flagged texts to human review rather than acting on a score alone.
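
One way to keep the model in a decision-support role is to map its AI-generated probability to three outcomes and send uncertain cases to a person instead of forcing a binary verdict. The sketch below is hypothetical: the `triage` function, the `Verdict` names, and the 0.2/0.8 thresholds are illustrative assumptions, not values validated for this model, and would need tuning on your own labeled data.

```python
from enum import Enum


class Verdict(Enum):
    LIKELY_HUMAN = "likely human-written"
    INCONCLUSIVE = "inconclusive: needs human review"
    LIKELY_AI = "likely AI-generated: flag for human review"


def triage(p_ai: float, low: float = 0.2, high: float = 0.8) -> Verdict:
    """Map P(AI-generated) to a verdict, with an abstain band between low and high.

    The default thresholds are illustrative assumptions, not calibrated values.
    """
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError(f"p_ai must be in [0, 1], got {p_ai}")
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")
    if p_ai >= high:
        return Verdict.LIKELY_AI
    if p_ai <= low:
        return Verdict.LIKELY_HUMAN
    return Verdict.INCONCLUSIVE


for p in (0.05, 0.55, 0.93):
    print(f"{p:.2f} -> {triage(p).value}")
```

Note that even the "likely AI-generated" outcome routes to a human reviewer, which keeps the final decision with a person.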

---

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "silentone0725/roberta-large-openai-detector-custom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

text = "Sample text to evaluate"
# RoBERTa accepts at most 512 tokens; longer inputs are truncated.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():  # no gradients are needed for inference
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)[0]
prediction = probs.argmax().item()
# This card treats label 1 as AI-generated; check model.config.id2label to confirm.
print("AI-generated" if prediction == 1 else "Human-written", f"(p = {probs[prediction].item():.3f})")
```
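
The classification head returns two raw logits per input, [z0, z1]. For two classes, the softmax probability of label 1 reduces to a sigmoid of the logit difference, p = 1 / (1 + exp(-(z1 - z0))), which is handy when you want a score to threshold rather than a hard label. The pure-Python sketch below illustrates that identity; `two_class_probability` is a hypothetical helper, the example logits are made-up numbers rather than real model outputs, and label 1 is taken to mean AI-generated as in the snippet above (verify with `model.config.id2label`).

```python
import math


def two_class_probability(logits: list[float]) -> float:
    """Return P(label 1) from the two logits [z0, z1] of a binary head.

    softmax([z0, z1])[1] equals 1 / (1 + exp(-(z1 - z0))), a sigmoid of the
    logit difference.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    z0, z1 = logits
    d = z1 - z0
    # Numerically stable sigmoid: exp() never receives a large positive argument.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def softmax(logits: list[float]) -> list[float]:
    """Reference softmax that subtracts the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [-0.4, 1.1]  # made-up values for illustration
print(f"P(label 1) = {two_class_probability(example_logits):.4f}")
print(f"softmax check = {softmax(example_logits)[1]:.4f}")
```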

---

## Training Details

### Training Data
- **Dataset:** https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1
- **Contents:** human-written text plus outputs from GPT-4, GPT-3.5, Claude, and LLaMA

### Training Procedure
Fine-tuned on Google Colab GPUs using PyTorch and Hugging Face Transformers.

#### Training Hyperparameters
- Learning rate: 2e-5
- Batch size: 8 (effective batch size: 16)
- Epochs: 6
- Mixed precision: FP16
- Weight decay: 0.2
- Dropout: 0.3
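
For readers reproducing a similar run with the Hugging Face `Trainer`, the listed hyperparameters map onto `TrainingArguments` fields and RoBERTa config attributes roughly as sketched below. This is an illustrative mapping under stated assumptions, not the author's training script (the linked training code is authoritative): the card does not say how the effective batch size of 16 was reached, so two gradient-accumulation steps on a single GPU are assumed, and "Dropout: 0.3" is assumed to apply to both hidden and attention dropout.

```python
# Hyperparameters from this card expressed as keyword arguments for
# Hugging Face `TrainingArguments` and the RoBERTa model config.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2  # ASSUMPTION: not stated in the card
NUM_GPUS = 1          # ASSUMPTION: a single Colab GPU

training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": PER_DEVICE_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    "num_train_epochs": 6,
    "fp16": True,  # mixed precision; expects a CUDA GPU
    "weight_decay": 0.2,
}

# ASSUMPTION: "Dropout: 0.3" covers both hidden-state and attention dropout.
config_overrides = {
    "hidden_dropout_prob": 0.3,
    "attention_probs_dropout_prob": 0.3,
}

# Effective batch = per-device batch x accumulation steps x number of devices.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
print(f"Effective batch size: {effective_batch}")
```

These dictionaries could be passed as `TrainingArguments(output_dir=..., **training_kwargs)` and `AutoModelForSequenceClassification.from_pretrained(base_model, **config_overrides)`, since `from_pretrained` forwards keyword arguments that match config attributes into the model config.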

---

## Evaluation

### Metrics

| Metric | Score |
|--------|-------|
| Accuracy | 0.5904 |
| Precision | 0.5087 |
| Recall | 0.7524 |
| F1 Score | 0.6070 |
| AUC | 0.690 |
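
The reported scores are internally consistent: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and 2 × 0.5087 × 0.7524 / (0.5087 + 0.7524) ≈ 0.6070. With recall well above precision, the model leans toward flagging: if AI-generated is the positive class (as the example code assumes), roughly 49% (1 − 0.5087) of texts it labels AI-generated are human-written. The sketch below computes these metrics from confusion-matrix counts and re-derives the reported F1; `ConfusionCounts` and `binary_metrics` are hypothetical names, and the counts in the last line are made up, not this card's evaluation data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # AI-generated text predicted AI-generated (positive class assumed = AI)
    fp: int  # human-written text predicted AI-generated
    tn: int  # human-written text predicted human-written
    fn: int  # AI-generated text predicted human-written


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising on a zero denominator."""
    return num / den if den else 0.0


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }


# Re-derive the card's F1 from its reported precision and recall (approximately 0.6070).
print(round(f1_from_pr(0.5087, 0.7524), 4))

# Hypothetical counts, for illustration only.
print(binary_metrics(ConfusionCounts(tp=75, fp=72, tn=100, fn=25)))
```

AUC cannot be recovered from these counts, since it summarizes ranking quality across all decision thresholds rather than at a single one.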

---

## Environmental Impact

- **Hardware Type:** NVIDIA T4 / A100
- **Cloud Provider:** Google Colab
- **Compute Region:** Global (Colab infrastructure)

---

## Technical Specifications

### Architecture
RoBERTa-Large transformer encoder with a sequence-classification head.
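
In Hugging Face Transformers, loading a RoBERTa checkpoint with `AutoModelForSequenceClassification` attaches a `RobertaClassificationHead`: it takes the final hidden state of the first (`<s>`) token and applies dropout, a dense layer of the hidden size, tanh, dropout again, and an output projection to the labels. With RoBERTa-Large's hidden size of 1024 and two labels, that head is small next to RoBERTa-Large's roughly 355M parameters. The plain-Python sketch below does the parameter arithmetic; it assumes the default head layout and is not read from this checkpoint's config.

```python
HIDDEN_SIZE = 1024  # RoBERTa-Large hidden size
NUM_LABELS = 2      # human-written vs AI-generated


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def classification_head_params(hidden: int, num_labels: int) -> int:
    # dense: hidden -> hidden (followed by tanh); out_proj: hidden -> num_labels.
    # Dropout and tanh have no trainable parameters.
    return linear_params(hidden, hidden) + linear_params(hidden, num_labels)


print(f"{classification_head_params(HIDDEN_SIZE, NUM_LABELS):,}")  # 1,051,650
```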

### Software
PyTorch, Transformers, scikit-learn.

---

## Citation

**APA:** Thakuria, D. (2026). *AI-Generated Text Detection via Fine-Tuned RoBERTa-Large* [Model]. Hugging Face. https://huggingface.co/silentone0725/roberta-large-openai-detector-custom

---

## Model Card Authors
Daksh Thakuria

## Model Card Contact
Via the author's Hugging Face profile.