---
title: AI Content Source Identifier
emoji: 👀
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.16.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: 'AI Text Classifier: Human vs AI vs Paraphrased'
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Model Card for AI Content Classification
Model Description
This model classifies text into one of three categories:
Human-Written
AI-Generated
Paraphrased
It uses the vai0511/ai-content-classifier model, an ELECTRA-based sequence classifier fine-tuned on 46,181 text samples (see Training Details).
Uses
Direct Use
Detecting AI-generated content
Identifying paraphrased text
Assisting in content moderation
Out-of-Scope Use
❌ Not suitable for legal or forensic content verification.
❌ Should not be used as the sole basis for plagiarism detection.
Limitations & Biases
⚠ Potential Bias – The model was trained on a limited dataset (46,181 samples), so it may not generalize well across all writing styles and languages.
⚠ False Positives/Negatives – AI-generated or paraphrased text may be misclassified.
⚠ Adversarial Attacks – Text with subtle modifications may bypass detection.
Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually.
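One way to act on this recommendation is to look at the model's confidence and not only its top label. The sketch below is a minimal illustration, not part of the original model: it turns three logits into probabilities with a softmax and flags low-confidence predictions for manual review. The `triage` helper, the 0.80 threshold, and the sample logits are all assumptions made for illustration; the label order is copied from the How to Use section.

```python
import math

# Label order copied from the "How to Use" section of this card.
LABELS = ["Human-Written", "AI-Generated", "Paraphrased"]

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logits, threshold=0.80):
    """Return (label, confidence, needs_review) for one prediction.

    `threshold` is a hypothetical cut-off, not a value from the model authors.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return LABELS[best], confidence, confidence < threshold

# Made-up logits, for illustration only.
label, confidence, needs_review = triage([0.2, 1.1, 0.9])
print(label, round(confidence, 2), needs_review)
```

With the real model, the logits for a single input could be obtained as `outputs.logits[0].tolist()`. A sensible threshold depends on the use case and should be tuned on labeled examples.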
How to Use
Install dependencies:
pip install transformers torch
Load the model:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "vai0511/ai-content-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Class indices mapped to the three categories this model predicts.
labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}

def classify_text(text):
    # Tokenize, truncating input longer than the 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the class with the highest logit.
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    return labels[predicted_class]

print(classify_text("This is an example text."))
Training Details
Base Model: ELECTRA
Dataset: 46,181 text samples
Batch Size: 8 - 16
Epochs: 3
Learning Rate: 2e-5 - 3e-5
Optimizer: AdamW
Max Token Length: 512
Preprocessing:
Removed duplicates, special characters, and excessive whitespace.
Tokenization performed using Hugging Face's AutoTokenizer.
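The cleaning steps listed under Preprocessing could look roughly like the sketch below. The original cleaning code is not published here, so this is an assumption-laden illustration: the set of characters kept, the order of the steps, and the function names `clean_text` and `preprocess` are all hypothetical.

```python
import re

def clean_text(text):
    """Strip special characters and collapse whitespace in one sample."""
    # Keep word characters, whitespace, and basic punctuation (an assumed set).
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", " ", text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

def preprocess(samples):
    """Clean every sample, then drop empties and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for sample in samples:
        text = clean_text(sample)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

raw = ["Hello,   world!", "Hello, world!", "Tabs\tand ### symbols @@ here \n"]
print(preprocess(raw))
```

Deduplicating after cleaning, rather than before, also catches samples that differ only in spacing or stray symbols.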
License & Attribution
This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0.
🔗 Original Model: vai0511/ai-content-classifier
🔗 License Details: Apache 2.0 License
Disclaimer
This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions.