Dataset: ahmadreza13/human-vs-Ai-generated-dataset
This model is designed to detect AI-generated content by analyzing text using a combination of RoBERTa embeddings, Word2Vec embeddings, and engineered linguistic features.
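The card does not say exactly how the three signals are combined. A common pattern, assumed here, is to compute each representation separately and concatenate the results into one vector that a small classifier head consumes.

The pure-Python sketch below is illustrative only:

- `linguistic_features`, `mean_word_vector` and `fuse` are hypothetical helpers.
- The four features are examples, not the model's actual feature set.
- `roberta_vec` stands in for a RoBERTa sentence embedding, such as the first-token hidden state.

```python
import re


def linguistic_features(text):
    """Hypothetical hand-crafted features (examples only, not the model's real set)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    if n_words == 0:
        return [0.0, 0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / n_words
    type_token_ratio = len({w.lower() for w in words}) / n_words  # vocabulary diversity
    avg_sentence_len = n_words / max(len(sentences), 1)           # words per sentence
    punct_ratio = sum(1 for c in text if c in ",;:!?.") / len(text)
    return [avg_word_len, type_token_ratio, avg_sentence_len, punct_ratio]


def mean_word_vector(text, word_vectors, dim):
    """Average the Word2Vec-style vectors of in-vocabulary words (zeros if none match)."""
    vecs = [word_vectors[w] for w in re.findall(r"[a-z']+", text.lower()) if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fuse(roberta_vec, w2v_vec, features):
    """Concatenate the three feature groups into a single classifier input."""
    return list(roberta_vec) + list(w2v_vec) + list(features)


# Toy usage with a 2-dimensional "Word2Vec" vocabulary and a 1-dimensional RoBERTa stand-in
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
text = "Cat and dog."
x = fuse([0.1], mean_word_vector(text, toy_vectors, 2), linguistic_features(text))
print(len(x))  # 1 + 2 + 4 = 7
```

In a real pipeline, `word_vectors` would be a trained Word2Vec model (for example gensim `KeyedVectors`, which supports `in` and indexing by word). Because plain concatenation lets differences in scale between groups dominate, the engineered features are usually normalized before fusion.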
The model uses a hybrid architecture that combines:

- RoBERTa contextual embeddings
- Word2Vec word embeddings
- Engineered linguistic features
The model architecture consists of:
| Metric | Score |
|---|---|
| Precision | 0.9079 |
| Recall | 0.9089 |
| F1 Score | 0.907 |
| ROC AUC | 0.908 |
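The card does not say how these scores were computed. It gives neither the decision threshold nor whether the scores are for one class or averaged over both classes.

For reference, the sketch below computes binary precision, recall and F1 for the AI-generated class (label `1`, an assumption). It uses a 0.5 threshold, the same one as the usage example. It also computes ROC AUC from its pairwise-ranking definition. `binary_metrics` is a hypothetical helper; scikit-learn's `precision_recall_fscore_support(average='binary')` and `roc_auc_score` compute the same quantities.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Precision, recall and F1 for the positive class (label 1), plus ROC AUC."""
    y_pred = [1 if s > threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    pairs = [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
    roc_auc = sum(pairs) / len(pairs) if pairs else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1, "roc_auc": roc_auc}


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'roc_auc': 0.75}
```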
This model is intended to be used as a tool for:
This model should NOT be used to:
```python
# Load model and tokenizer
from transformers import RobertaTokenizer, AutoModelForSequenceClassification
import torch


def predict_with_huggingface_model(text, repo_id="prasoonmhwr/ai_detection_model", device=None):
    """
    Predicts whether a text is AI-generated using a model from the Hugging Face Model Hub.

    Args:
        text (str): The text to predict on.
        repo_id (str): The repository ID of the model on the Hugging Face Hub.
        device (str, optional): "cuda" or "cpu". Defaults to "cuda" if a GPU is available, else "cpu".

    Returns:
        float: The probability (between 0 and 1) that the text is AI-generated.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1. Load the tokenizer
    tokenizer = RobertaTokenizer.from_pretrained(repo_id)

    # 2. Load the model and set it to evaluation mode (disables dropout)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id).to(device)
    model.eval()

    # 3. Tokenize the input text (padded/truncated to 128 tokens) and move it to the device
    inputs = tokenizer(text,
                       add_special_tokens=True,
                       max_length=128,
                       padding='max_length',
                       truncation=True,
                       return_tensors='pt').to(device)

    # 4. Make the prediction (no gradient calculation needed)
    with torch.no_grad():
        logits = model(**inputs).logits

    # 5. Convert the logits to the probability of the AI-generated class
    if logits.shape[-1] == 1:
        # Single-logit head: the sigmoid of the logit is P(AI-generated)
        probability = torch.sigmoid(logits)[0, 0]
    else:
        # Two-logit head: softmax over the classes.
        # Assumption: index 1 is the AI-generated label (check model.config.id2label)
        probability = torch.softmax(logits, dim=-1)[0, 1]
    return probability.item()


if __name__ == '__main__':
    # Example usage
    text_to_predict = "This is a sample text to check if it was written by a human or AI"
    # text_to_predict = "This text was generated by an AI model."  # uncomment to test on an AI-generated text

    # Use the GPU if one is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    repo_id = "prasoonmhwr/ai_detection_model"

    # Make the prediction
    prediction = predict_with_huggingface_model(text_to_predict, repo_id, device)

    # Print the result, using 0.5 as the decision threshold
    print(f"Text: '{text_to_predict}'")
    print(f"Prediction (probability of being AI-generated): {prediction:.4f}")
    if prediction > 0.5:
        print("The model predicts this text is likely AI-generated.")
    else:
        print("The model predicts this text is likely human-generated.")
```
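How the logits should become a probability depends on the classification head, which the card does not describe:

- A single-logit head calls for a sigmoid.
- A two-logit head calls for a softmax. Applying a sigmoid to each logit separately there does not give the AI-class probability.

The pure-Python sketch below shows both cases. `logits_to_ai_probability` is a hypothetical helper, and `ai_index=1` assumes the AI-generated class is the second label; confirm this with `model.config.id2label`.

```python
import math


def logits_to_ai_probability(logits, ai_index=1):
    """Map one text's raw classifier logits to P(AI-generated)."""
    if len(logits) == 1:
        # Single-output head (sigmoid/BCE training): sigmoid of the lone logit
        return 1.0 / (1.0 + math.exp(-logits[0]))
    # Multi-output head (softmax/cross-entropy training): softmax, then pick the AI class
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[ai_index] / sum(exps)


print(logits_to_ai_probability([2.0]))       # ≈ 0.881
print(logits_to_ai_probability([0.0, 2.0]))  # ≈ 0.881, same value
```

With two classes, the softmax probability of class 1 equals the sigmoid of the logit difference `z1 - z0`, which is why both calls print the same value.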
If you use this model in your research, please cite:
```bibtex
@misc{ai_detection_model,
  author    = {Prasoon Mahawar},
  title     = {AI-Generated Content Detection Model},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/prasoonmhwr/ai_detection_model}
}
```