---
library_name: transformers
tags: []
---

# Model Card for Abuzaid01/Ai_Human_text_detect

## Model Details

### AI-Generated Text Detector

This repository contains a RoBERTa-based model trained to distinguish between AI-generated and human-written text. It can help identify content produced by large language model applications such as ChatGPT and Claude, as well as by other AI text generators.

## Model Description

- **Architecture**: RoBERTa-base fine-tuned for binary sequence classification
- **Task**: Detect whether a text was written by a human (label 0) or generated by AI (label 1)
- **Training data**: A balanced dataset of human-written and AI-generated texts
- **Input**: Text of up to 256 tokens
- **Output**: Logits for the two classes; applying softmax yields a probability for each label

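
Inputs longer than 256 tokens are truncated, so only the beginning of a long document gets scored. One possible workaround is to split the token ids into overlapping windows, score each window, and average the per-window probabilities. This is our assumption, not how the model was trained or evaluated. The sketch below covers only the splitting and averaging steps. `chunk_token_ids`, `average_probabilities`, and the default `window`/`stride` values are hypothetical.

```python
from typing import List, Sequence


def chunk_token_ids(
    ids: Sequence[int], window: int = 254, stride: int = 192
) -> List[List[int]]:
    """Split token ids into overlapping windows of at most `window` tokens.

    Assumption: the default window of 254 reserves two slots for RoBERTa's
    <s> and </s> special tokens, so each chunk fits in 256 tokens.
    `stride` (hypothetical default) is the step between window starts; it must
    not exceed `window`, otherwise tokens between windows would be skipped.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    ids = list(ids)
    if len(ids) <= window:
        return [ids]
    chunks = []
    start = 0
    while True:
        chunks.append(ids[start:start + window])
        if start + window >= len(ids):
            break  # this window already reaches the end of the sequence
        start += stride
    return chunks


def average_probabilities(chunk_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-chunk [p_human, p_ai] vectors into one document-level score."""
    if not chunk_probs:
        raise ValueError("need at least one chunk")
    n = len(chunk_probs)
    return [sum(p[i] for p in chunk_probs) / n for i in range(len(chunk_probs[0]))]
```

To use this with the tokenizer and model from the How to use section, the ids could come from `tokenizer(text, add_special_tokens=False)["input_ids"]`. Each chunk could then be wrapped with `tokenizer.build_inputs_with_special_tokens(chunk)` and passed to the model as `input_ids`, and the resulting softmax vectors could be averaged. Plain averaging gives every window equal weight. A weighted average or a maximum over windows would be equally plausible choices, and none of them has been validated for this model.
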
### Use Cases

- **Content moderation**: Identify AI-generated content in submissions
- **Academic integrity**: Help detect AI-generated essays or assignments
- **Research**: Study the differences between human and AI writing patterns
- **Media verification**: Support efforts to label AI-generated content

### Limitations

The model may not perform as well on:

- Very short texts
- Highly technical or specialized content
- Content from newer AI models it was not trained on
- Text that has been deliberately edited to evade detection

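
One way to act on the short-text caveat above is to return an "uncertain" result instead of a label when the input is very short or the top probability is low. This is a minimal sketch and is not part of the released model. `label_with_abstention`, `min_tokens`, and `min_confidence` are hypothetical, and the cut-off values are placeholders that would need tuning on held-out data.

```python
from typing import Optional, Sequence, Tuple

# Index 0 = human-written, 1 = AI-generated, as documented in this model card.
LABELS = ("Human-written", "AI-generated")


def label_with_abstention(
    probabilities: Sequence[float],
    num_tokens: int,
    min_tokens: int = 20,         # hypothetical cut-off, not a tested value
    min_confidence: float = 0.8,  # hypothetical cut-off, not a calibrated value
) -> Tuple[str, Optional[float]]:
    """Return (label, confidence), or an 'Uncertain' result when the input is
    shorter than `min_tokens` or the top-class probability is below
    `min_confidence`."""
    if num_tokens < min_tokens:
        return "Uncertain (text too short)", None
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    if confidence < min_confidence:
        return "Uncertain (low confidence)", confidence
    return LABELS[best], confidence
```

In the How to use example, `probabilities[0].tolist()` and `inputs["input_ids"].shape[1]` could be passed as `probabilities` and `num_tokens`.
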

Made with ❤️ by Abuzaid

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Abuzaid01/Ai_Human_text_detect"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare text for classification, truncating to the 256-token input length
text = "Your text to classify goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Convert logits to probabilities and pick the most likely class
probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_class_idx = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0, predicted_class_idx].item()

# Map class index to label (0 = human-written, 1 = AI-generated)
labels = ["Human-written", "AI-generated"]
predicted_label = labels[predicted_class_idx]

print(f"Prediction: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```