+---
+library_name: transformers
+tags:
+  - text-classification
+  - ai-detection
+  - roberta
+  - nlp
+---
+# Model Card for roberta-large-openai-detector-custom
+This model classifies text as **AI-generated or human-written** using a RoBERTa-Large classifier fine-tuned on outputs from modern LLMs.
+---
+## Model Details
+### Model Description
+This model is a **binary text classifier** trained to identify content generated by models such as GPT-4, GPT-3.5, Claude, and LLaMA. It is intended to improve on legacy GPT-2 detectors, which were not trained on the output of modern LLMs.
+- **Developed by:** Daksh Thakuria
+- **Model type:** Transformer-based sequence classification (RoBERTa-Large)
+- **Language(s):** English
+- **License:** Apache 2.0
+- **Finetuned from model:** Community RoBERTa GPT-2 Detector
+### Model Sources
+- **Repository:** https://huggingface.co/silentone0725/roberta-large-openai-detector-custom
+- **Training Code:** https://github.com/silentone12725/Ai-Gen-Text-Detect
+---
+## Uses
+### Direct Use
+Detecting AI-generated text in research, moderation, and academic integrity systems.
+### Downstream Use
+Integration into content filtering pipelines, analytics tools, or research benchmarks.
+### Out-of-Scope Use
+- Legal/forensic authorship claims
+- Fully automated high-stakes decisions
+- Reliable detection of heavily paraphrased text
+---
+## Bias, Risks, and Limitations
+- May misclassify creative or structured human writing
+- Performance drops under heavy paraphrasing
+- English-focused
+- Analyzes surface text only (no watermark detection)
+### Recommendations
+Use as a **decision-support tool**, not a final authority.
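
One way to act on this recommendation is to report a probability and route borderline cases to a human reviewer instead of emitting a hard label. The sketch below is a minimal, pure-Python illustration. It assumes the two logits are ordered `[human, AI]` (consistent with the quick-start snippet, which treats class 1 as AI-generated). The `triage` function and its 0.2/0.8 thresholds are hypothetical values chosen for illustration, not tuned on this model.

```python
import math

def ai_probability(logits):
    """Softmax over two logits [human, ai]; returns P(AI-generated)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def triage(logits, low=0.2, high=0.8):
    """Map a score to a review decision. Thresholds are illustrative only."""
    p = ai_probability(logits)
    if p >= high:
        return "likely AI-generated", p
    if p <= low:
        return "likely human-written", p
    return "uncertain: send to human review", p

label, score = triage([0.1, 0.4])
print(label, round(score, 3))
```

With the real model, `outputs.logits[0].tolist()` from the snippet in How to Get Started could be passed to `triage` in place of the hard-coded list.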
+---
+## How to Get Started
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+model_name = "silentone0725/roberta-large-openai-detector-custom"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+model.eval()  # make sure dropout is disabled for inference
+
+text = "Sample text to evaluate"
+# Truncate inputs to the 512-token limit
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+
+with torch.no_grad():  # gradients are not needed for inference
+    outputs = model(**inputs)
+
+# Class 1 is treated as AI-generated, class 0 as human-written
+prediction = torch.argmax(outputs.logits, dim=1)
+print("AI-generated" if prediction.item() == 1 else "Human-written")
+```
+---
+## Training Details
+### Training Data
+Dataset: https://huggingface.co/datasets/silentone0725/ai-human-text-detection-v1
+
+The dataset contains human-written text alongside outputs from GPT-4, GPT-3.5, Claude, and LLaMA.
+### Training Procedure
+Fine-tuned on Google Colab GPUs using PyTorch and Hugging Face Transformers.
+#### Training Hyperparameters
+- Learning rate: 2e-5
+- Batch size: 8 (effective 16)
+- Epochs: 6
+- Mixed precision: FP16
+- Weight decay: 0.2
+- Dropout: 0.3
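
The listed values can be collected into a configuration sketch. The keys below follow parameter names from `transformers.TrainingArguments` and the RoBERTa model config, but the mapping is an assumption. The card does not say how the effective batch size of 16 was reached (two gradient-accumulation steps on a single device is assumed here) or which dropout setting was changed (`hidden_dropout_prob` is assumed).

```python
# Hyperparameters from this card, keyed by TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,  # assumption: 8 x 2 = effective 16
    "num_train_epochs": 6,
    "fp16": True,
    "weight_decay": 0.2,
}

# Assumption: the 0.3 dropout was applied to the encoder's hidden layers.
model_config_overrides = {"hidden_dropout_prob": 0.3}

# Effective batch size on a single device
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)
```

Under these assumptions, `training_kwargs` could be unpacked into `TrainingArguments` together with an output directory of your choice.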
+---
+## Evaluation
+### Metrics
+| Metric | Score |
+|--------|------|
+| Accuracy | 0.5904 |
+| Precision | 0.5087 |
+| Recall | 0.7524 |
+| F1 Score | 0.6070 |
+| AUC | 0.690 |
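
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two rows. The helper name `f1_from` is illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.5087, 0.7524  # values from the table above
f1 = f1_from(precision, recall)
print(round(f1, 4))  # 0.607, matching the reported 0.6070
```

A precision near 0.51 means that roughly half of the texts flagged as AI-generated in this evaluation were human-written.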
+---
+## Environmental Impact
+- **Hardware Type:** NVIDIA T4 / A100
+- **Cloud Provider:** Google Colab
+- **Compute Region:** Global (Colab infrastructure)
+---
+## Technical Specifications
+### Architecture
+RoBERTa-Large transformer with classification head.
+### Software
+PyTorch, Transformers, scikit-learn.
+---
+## Citation
+**APA:** Thakuria, D. (2026). AI-Generated Text Detection via Fine-Tuned RoBERTa-Large.
+---
+## Model Card Authors
+Daksh Thakuria
+## Model Card Contact
+Via Hugging Face profile.