---
title: AI Content Source Identifier
emoji: π
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.16.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: 'AI Text Classifier: Human vs AI vs Paraphrased'
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Model Card for AI Content Classification

Model Description

This model classifies text into one of three categories:

- Human-Written
- AI-Generated
- Paraphrased

It uses the vai0511/ai-content-classifier model, an ELECTRA-based sequence classifier trained on 46,181 text samples.

Uses

Direct Use

- Detecting AI-generated content
- Identifying paraphrased text
- Assisting in content moderation

Out-of-Scope Use

- Not suitable for legal or forensic content verification.
- Should not be used as the sole basis for plagiarism detection.

Limitations & Biases

- Potential Bias: The model was trained on a limited dataset and may not generalize well across all writing styles and languages.
- False Positives/Negatives: Human-written text may be flagged as AI-generated or paraphrased, and AI-generated or paraphrased text may be labeled as human-written.
- Adversarial Attacks: Text with subtle modifications may bypass detection.

Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually.
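
One way to act on this recommendation is to look at the model's confidence as well as its top label, and route uncertain predictions to a human reviewer first. The snippet below is a minimal sketch using only the standard library. The `triage` helper, the `threshold` value of 0.8, and the example logits are all illustrative assumptions, not part of the original model or its evaluation.

```python
import math

# Label order as listed in this model card
LABELS = ("Human-Written", "AI-Generated", "Paraphrased")

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logits, threshold=0.8):
    """Return (label, confidence, needs_review) for one set of logits.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return LABELS[best], confidence, confidence < threshold

# Made-up logits, not real model output
print(triage([2.5, 0.3, 0.1]))   # one class clearly ahead
print(triage([0.9, 1.0, 0.8]))   # near tie, flagged for review
```

A high confidence score does not make a prediction correct, so a flag like this only helps prioritize manual checks; it does not replace them.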

How to Use

Install dependencies:

```bash
pip install transformers torch
```

Load the model and classify text:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "vai0511/ai-content-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

def classify_text(text):
    """Return the predicted category for a single string."""
    # Inputs longer than 512 tokens are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Gradients are not needed for inference
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the class with the highest logit
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}
    return labels[predicted_class]

print(classify_text("This is an example text."))
```
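
Because `classify_text` truncates at 512 tokens, anything past that point in a long document is ignored. One possible workaround is to split the token IDs into overlapping windows and classify each window separately. The sketch below shows only the splitting step, on plain Python lists. The `split_into_windows` helper, the window of 510 (which assumes the tokenizer adds two special tokens per sequence), and the overlap of 64 are all assumptions for illustration, not settings from the original model.

```python
def split_into_windows(token_ids, window=510, overlap=64):
    """Split a list of token IDs into overlapping windows.

    window=510 assumes two special tokens are added per sequence,
    keeping each window within a 512-token limit (an assumption).
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the last window reached the end of the input
        start += step
    return windows

# Stand-in token IDs; a real run would take them from the tokenizer
ids = list(range(1000))
for chunk in split_into_windows(ids):
    print(len(chunk), chunk[0], chunk[-1])
```

How to combine the per-window predictions (for example, a majority vote) is a design choice the model card does not specify.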

Training Details

- Base Model: ELECTRA
- Dataset: 46,181 text samples
- Batch Size: 8 to 16
- Epochs: 3
- Learning Rate: 2e-5 to 3e-5
- Optimizer: AdamW
- Max Token Length: 512
- Preprocessing:
  - Removed duplicates, special characters, and excessive whitespace.
  - Tokenization performed using Hugging Face's AutoTokenizer.
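
The cleaning steps listed under Preprocessing can be sketched roughly as follows. The exact rules used for training are not documented here, so the character whitelist and the `clean_text` and `deduplicate` helpers are assumptions for illustration only.

```python
import re

def clean_text(text):
    """Remove unusual symbols and collapse runs of whitespace."""
    # Assumption: "special characters" means anything other than
    # letters, digits, whitespace, and common punctuation
    text = re.sub(r"[^\w\s.,;:!?()'-]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def deduplicate(texts):
    """Drop exact duplicates after cleaning, keeping the first occurrence."""
    seen = set()
    unique = []
    for raw in texts:
        cleaned = clean_text(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique

samples = ["Hello,   world!", "Hello, world!", "Caf\u00e9 \u2605 open\n\n9am"]
print(deduplicate(samples))
```

Applying the same cleaning at inference time keeps inputs closer to what the model saw during training, although whether the original pipeline did so is not stated.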

License & Attribution

This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0.

- Original Model: vai0511/ai-content-classifier
- License Details: Apache 2.0 License

Disclaimer

This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions.