| title: AI Content Source Identifier | |
| emoji: π | |
| colorFrom: yellow | |
| colorTo: yellow | |
| sdk: gradio | |
| sdk_version: 5.16.1 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: 'AI Text Classifier: Human vs AI vs Paraphrased' | |
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |
| Model Card for AI Content Classification | |
| Model Description | |
| This model classifies text into one of three categories: | |
| Human-Written | |
| AI-Generated | |
| Paraphrased | |
| It uses the vai0511/ai-content-classifier model, an ELECTRA-based sequence classifier fine-tuned on 46,181 text samples. | |
| Uses | |
| Direct Use | |
| Detecting AI-generated content | |
| Identifying paraphrased text | |
| Assisting in content moderation | |
| Out-of-Scope Use | |
| Not suitable for legal or forensic content verification. | |
| Should not be used as the sole basis for plagiarism detection. | |
| Limitations & Biases | |
| Potential Bias: The model was trained on a limited dataset and may not generalize well across all writing styles and languages. | |
| False Positives/Negatives: AI-generated or paraphrased text may be misclassified. | |
| Adversarial Attacks: Text with subtle modifications may bypass detection. | |
| Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually. | |
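One way to act on this recommendation is to look at class probabilities rather than only the top label, and to route uncertain predictions to a human reviewer. The following is a minimal sketch, not part of the original model card: the `triage` helper and the `threshold` value of 0.8 are hypothetical, and the label order is assumed to match the mapping given in the How to Use section.

```python
import math

# Label order assumed to match the mapping in the How to Use section.
LABELS = ["Human-Written", "AI-Generated", "Paraphrased"]


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, threshold=0.8):
    """Return (label, probability, needs_review) for one set of logits.

    `threshold` is a hypothetical cut-off chosen for illustration,
    not a value published by the model authors.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best], probs[best] < threshold
```

The logits for a single input can be obtained from the model output with `outputs.logits[0].tolist()`. A sensible threshold depends on how costly a wrong label is in your setting and should be tuned on data you have verified by hand.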
| How to Use | |
| Install dependencies: | |
| pip install transformers torch | |
| Load the model: | |
| from transformers import AutoModelForSequenceClassification, AutoTokenizer | |
| import torch | |
| model_name = "vai0511/ai-content-classifier" | |
| model = AutoModelForSequenceClassification.from_pretrained(model_name) | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| def classify_text(text): | |
|     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512) | |
|     with torch.no_grad(): | |
|         outputs = model(**inputs) | |
|     predicted_class = torch.argmax(outputs.logits, dim=1).item() | |
|     labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"} | |
|     return labels[predicted_class] | |
| print(classify_text("This is an example text.")) | |
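Because the tokenizer call truncates input at 512 tokens, anything past that point in a long document is ignored. One possible workaround, sketched below, is to split the text into chunks, classify each chunk, and take a majority vote. This is an illustration rather than a documented feature of the model: `split_into_chunks` and `classify_long_text` are hypothetical helpers, and the 300-word chunk size is only a rough guess at what stays under 512 tokens.

```python
from collections import Counter


def split_into_chunks(text, max_words=300):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def classify_long_text(text, classify, max_words=300):
    """Classify each chunk and return (majority_label, per_chunk_labels).

    `classify` is any callable that maps a string to a label, for
    example the classify_text function defined above.
    """
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        raise ValueError("text is empty")
    labels = [classify(chunk) for chunk in chunks]
    # On a tie, Counter keeps the label that appeared first.
    majority = Counter(labels).most_common(1)[0][0]
    return majority, labels
```

A call such as `classify_long_text(document, classify_text)` returns both the overall vote and the per-chunk labels. The per-chunk labels are worth inspecting, since a document can mix human-written and AI-generated passages.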
| Training Details | |
| Base Model: ELECTRA | |
| Dataset: 46,181 text samples | |
| Batch Size: 8 - 16 | |
| Epochs: 3 | |
| Learning Rate: 2e-5 - 3e-5 | |
| Optimizer: AdamW | |
| Max Token Length: 512 | |
| Preprocessing: | |
| Removed duplicates, special characters, and excessive whitespace. | |
| Tokenization performed using Hugging Face's AutoTokenizer. | |
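The preprocessing steps listed above can be approximated as follows. The card does not say which characters counted as "special" or how duplicates were detected, so this sketch makes two assumptions: it keeps word characters, whitespace, and common punctuation, and it treats samples as duplicates only when they are identical after cleaning. `clean_text` and `preprocess` are hypothetical names.

```python
import re


def clean_text(text):
    """Drop characters outside a basic allow-list and collapse whitespace."""
    # Assumption: "special characters" means anything other than word
    # characters, whitespace, and common punctuation.
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(samples):
    """Clean each sample and drop exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for sample in samples:
        text = clean_text(sample)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Samples that become empty after cleaning are dropped as well. Applying similar cleaning to text at inference time may help keep inputs closer to what the model saw during training, although the card does not state that this is required.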
| License & Attribution | |
| This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0. | |
| Original Model: vai0511/ai-content-classifier | |
| License Details: Apache 2.0 License | |
| Disclaimer | |
| This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions. | |