---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- text-classification
- ai-detection
- academic-text
- ai-generated-text-detection
model-index:
- name: bert-ai-text-detector
  results:
  - task:
      type: text-classification
      name: AI-Generated Text Detection
    dataset:
      name: Custom Academic Text Dataset
      type: custom
    metrics:
    - type: accuracy
      value: 0.9957
    - type: f1
      value: 0.9958
    - type: precision
      value: 0.9923
    - type: recall
      value: 0.9994
---
# BERT-based AI-Generated Academic Text Detector
A fine-tuned BERT model for detecting AI-generated academic text, reaching **99.57% accuracy** on a test set of 185,930 paragraph-level samples.
## Online Demo
🌐 **Try the model online**: [https://followsci.com/ai-detection](https://followsci.com/ai-detection)
Free web interface with real-time detection, no installation or API key required.
## Model Details
### Model Description
- **Model Type**: BERT-base-uncased fine-tuned for binary text classification
- **Architecture**: BERT-base-uncased (110M parameters)
- **Task**: Binary classification (Human-written vs AI-generated text)
- **Input**: Academic text paragraphs (up to 512 tokens)
- **Output**: Binary label (0 = Human-written, 1 = AI-generated) with confidence scores
### Training Information
- **Training Samples**: 1,487,400 paragraph-level samples
- **Validation Samples**: 185,930 paragraph-level samples
- **Test Samples**: 185,930 paragraph-level samples
- **Total Dataset**: 1,859,260 paragraphs
- **Training Data**:
  - Human-written: Academic papers from arXiv
  - AI-generated: Text generated by various large language models (GPT, Claude, etc.)
## Performance
### Test Set Results
| Metric | Value |
|--------|-------|
| **Accuracy** | **99.57%** |
| **F1-Score** | **99.58%** |
| Precision | 99.23% |
| Recall | 99.94% |
| False Positive Rate | 0.82% |
| False Negative Rate | 0.06% |
### Confusion Matrix (Test Set)
| | Predicted: Human | Predicted: AI |
|---|---|---|
| **Actual: Human** | 89,740 (TN) | 740 (FP) |
| **Actual: AI** | 60 (FN) | 95,390 (TP) |
**Inference Speed:** ~20,900 samples/second on RTX 3090 (batch size 64)
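
The test metrics reported above can be re-derived from the confusion matrix counts. The following is a minimal sketch in plain Python; the variable names are illustrative and are not taken from the model's own evaluation code.

```python
# Confusion matrix counts from the test set table above.
tn, fp, fn, tp = 89_740, 740, 60, 95_390

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)   # human text flagged as AI
false_negative_rate = fn / (fn + tp)   # AI text missed

metrics = {
    "Accuracy": accuracy,
    "F1-Score": f1,
    "Precision": precision,
    "Recall": recall,
    "False Positive Rate": false_positive_rate,
    "False Negative Rate": false_negative_rate,
}
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```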
## Usage
### Quick Start
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load model and tokenizer
model_name = "followsci/bert-ai-text-detector"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Detect AI text
text = "Your academic paragraph here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Index 0 = human-written, index 1 = AI-generated
ai_prob = probs[0][1].item() * 100
human_prob = probs[0][0].item() * 100

print(f"AI-generated probability: {ai_prob:.1f}%")
print(f"Human-written probability: {human_prob:.1f}%")

if ai_prob > 50:
    print("Prediction: AI-generated")
else:
    print("Prediction: Human-written")
```
### Batch Processing
```python
import torch

# Reuses `tokenizer` and `model` loaded in the Quick Start example.
texts = [
    "First paragraph...",
    "Second paragraph...",
    # ... more texts
]

# Pad to the longest text in the batch so all rows have the same length
inputs = tokenizer(
    texts,
    return_tensors="pt",
    truncation=True,
    max_length=512,
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

for i, prob in enumerate(probs):
    ai_prob = prob[1].item() * 100
    print(f"Text {i+1}: AI probability = {ai_prob:.1f}%")
```
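
The batch example tokenizes every text in a single call, which may not fit in memory for large collections. One possible approach is to split the input into fixed-size chunks first. The helper below is a hypothetical sketch (`iter_batches` is not part of this model or of Transformers), and the default of 64 simply mirrors the batch size quoted elsewhere in this card.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int = 64) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` with at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# Placeholder paragraphs, used only to show how the input is chunked.
paragraphs = [f"Paragraph {i}" for i in range(150)]
batch_sizes = [len(batch) for batch in iter_batches(paragraphs)]
print(batch_sizes)  # [64, 64, 22]
```

Each yielded batch can then be passed to the tokenizer and model in the same way as `texts` in the batch example.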
### Using with Transformers Pipeline
```python
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="followsci/bert-ai-text-detector",
    tokenizer="followsci/bert-ai-text-detector",
)

# Returns a list with one {"label": ..., "score": ...} dict per input text
result = classifier("Your text here...")
print(result)
```
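
The pipeline returns a list of dictionaries with `label` and `score` keys. This card does not state which label strings the checkpoint emits, so the sketch below assumes the Transformers defaults (`LABEL_0`, `LABEL_1`) and maps them onto the 0 = Human-written, 1 = AI-generated convention described above. The `describe` helper and the sample input are illustrative only.

```python
from typing import Dict, List

# Assumption: the checkpoint uses the default Transformers label names.
# If the model config defines its own `id2label`, adjust this mapping.
LABEL_NAMES = {"LABEL_0": "Human-written", "LABEL_1": "AI-generated"}


def describe(results: List[Dict]) -> List[str]:
    """Turn pipeline results into readable strings; unknown labels pass through."""
    lines = []
    for item in results:
        name = LABEL_NAMES.get(item["label"], item["label"])
        lines.append(f"{name} ({item['score']:.1%})")
    return lines


# Hand-written example in the pipeline's output format, not real model output.
example = [{"label": "LABEL_1", "score": 0.987}]
print(describe(example))
```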
## Training Details
### Training Configuration
- **Base Model**: `bert-base-uncased`
- **Batch Size**: 64
- **Learning Rate**: 5e-5 (with linear warmup)
- **Warmup Steps**: 5,000
- **Max Sequence Length**: 512
- **Optimizer**: AdamW
- **Epochs**: 3
- **Training Time**: ~11 hours (on RTX 3090)
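
To put the warmup setting in context, the step counts implied by this configuration can be worked out directly. The sketch below assumes one optimizer step per batch (no gradient accumulation) and a linear decay to zero after warmup; the card only states "linear warmup", so the decay shape is an assumption.

```python
import math

TRAIN_SAMPLES = 1_487_400
BATCH_SIZE = 64
EPOCHS = 3
WARMUP_STEPS = 5_000
PEAK_LR = 5e-5

# Assumption: one optimizer step per batch, no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (decay is assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, 2_500, 5_000, total_steps):
    print(f"step {step}: lr = {learning_rate(step):.2e}")
```

Under these assumptions, warmup covers roughly the first 7% of training.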
### Dataset Distribution
| Split | Total Samples | Human (Label 0) | AI (Label 1) |
|-------|--------------|-----------------|--------------|
| Train | 1,487,400 | 723,780 (48.7%) | 763,620 (51.3%) |
| Validation | 185,930 | 90,470 (48.7%) | 95,460 (51.3%) |
| Test | 185,930 | 90,480 (48.7%) | 95,450 (51.3%) |
## Limitations
1. **Domain Specificity**: The model is trained primarily on academic text. Performance may degrade on:
   - Casual text or social media content
   - Technical documentation
   - Creative writing
2. **Binary Classification**: The model only distinguishes between "human" and "AI" text, without:
   - Identifying which AI model generated the text
   - Providing confidence intervals
   - Detecting partially AI-assisted text
3. **Paragraph-Level Detection**: The model is optimized for paragraph-level samples:
   - Performance on sentence-level or full-document level may vary
   - Best results achieved with structured academic paragraphs
4. **False Positives**: The test-set false positive rate is about 0.82% (740 of 90,480 human-written paragraphs), so some human-written text will be flagged as AI-generated.
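
Because the model is optimized for paragraph-level input, one way to handle a longer document is to split it into paragraphs and score each one separately. The sketch below assumes paragraphs are separated by blank lines; `split_paragraphs`, `score_document`, and `dummy_score` are hypothetical names, and the scorer is a stand-in rather than the model itself.

```python
import re
from typing import Callable, List, Tuple


def split_paragraphs(document: str) -> List[str]:
    """Split on blank lines and drop empty pieces."""
    parts = re.split(r"\n\s*\n", document)
    return [p.strip() for p in parts if p.strip()]


def score_document(
    document: str, score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Return (paragraph, AI probability) pairs, one per paragraph."""
    return [(p, score_fn(p)) for p in split_paragraphs(document)]


# Stand-in scorer for illustration only. Replace it with a function that
# runs the model on one paragraph and returns the AI probability in [0, 1].
def dummy_score(paragraph: str) -> float:
    return 0.5


doc = "First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph."
for paragraph, prob in score_document(doc, dummy_score):
    print(f"{prob:.2f}  {paragraph}")
```

Keeping per-paragraph scores, instead of collapsing them into a single document score, avoids assuming anything about document-level behavior, which this card does not evaluate.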
## Ethical Considerations
- **Use Case**: This model is intended as a tool for academic integrity and research purposes
- **Bias**: The model may reflect biases present in the training data
- **Misuse**: The model's output should not be used as the sole criterion for academic misconduct decisions
- **Transparency**: Results should be interpreted with context and domain expertise
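
One reason a flag alone is weak evidence is that the share of flagged texts that are truly AI-generated depends on how common AI text is in the population being screened. The sketch below applies Bayes' rule to the test-set recall and false positive rate. The 5% prevalence is a made-up illustration, not a measured value, and the calculation assumes the test-set rates carry over to the screened population.

```python
# Rates taken from the test-set confusion matrix in this card.
RECALL = 95_390 / 95_450               # true positive rate
FALSE_POSITIVE_RATE = 740 / 90_480


def flagged_share_truly_ai(prevalence: float) -> float:
    """Fraction of flagged texts that are AI-generated, for a given AI share."""
    true_flags = RECALL * prevalence
    false_flags = FALSE_POSITIVE_RATE * (1 - prevalence)
    return true_flags / (true_flags + false_flags)


# Prevalence in the test split (about 51.3% AI) reproduces the reported precision.
test_prevalence = 95_450 / 185_930
print(f"test split: {flagged_share_truly_ai(test_prevalence):.2%}")

# Hypothetical population where only 5% of texts are AI-generated.
print(f"5% prevalence (assumed): {flagged_share_truly_ai(0.05):.2%}")
```

At lower prevalence a larger share of flags are false alarms, even though the false positive rate itself is unchanged.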
## License
This model is licensed under the MIT License.
## Contact
- **Email**: raffoduanedonnenfeld@gmail.com
---
<p align="center">
Made with ❤️ for Academic Integrity
</p>