MUSKAN17 / AI_Content_Source_Identifier / README.md

shreeramy: Add application file (ec054c0, 12 months ago)


	---
	title: AI Content Source Identifier
	emoji: 👀
	colorFrom: yellow
	colorTo: yellow
	sdk: gradio
	sdk_version: 5.16.1
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: 'AI Text Classifier: Human vs AI vs Paraphrased'
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


	Model Card for AI Content Classification
	Model Description
	This model classifies text into one of three categories:

	Human-Written
	AI-Generated
	Paraphrased

	It uses the vai0511/ai-content-classifier model, an ELECTRA-based classifier trained on 46,181 text samples (see Training Details).

	Uses
	Direct Use
	Detecting AI-generated content
	Identifying paraphrased text
	Assisting in content moderation
	Out-of-Scope Use
	❌ Not suitable for legal or forensic content verification.
	❌ Should not be used as the sole basis for plagiarism detection.

	Limitations & Biases
	⚠ Potential Bias – The model was trained on a limited dataset (46,181 samples) and may not generalize well across all writing styles and languages.
	⚠ False Positives/Negatives – Human-written text may be flagged as AI-generated or paraphrased, and AI-generated or paraphrased text may be labeled as human-written.
	⚠ Adversarial Attacks – Text with subtle modifications may bypass detection.

	Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually.
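One way to act on this recommendation is to look at how confident each prediction is and send low-confidence cases to a human reviewer first. The sketch below is only an illustration: the `triage` helper and the `0.8` threshold are hypothetical and do not come from the model card, and the logits are assumed to be the three raw scores the classifier returns for one text. Softmax scores may not be well calibrated, so a high value is a hint for prioritizing review, not proof of a correct label.

```python
import math
from typing import List, Tuple

# Class indices as listed in this README
LABELS = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits: List[float], threshold: float = 0.8) -> Tuple[str, float, bool]:
    """Return (label, confidence, needs_review) for one prediction.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return LABELS[best], confidence, confidence < threshold


# Made-up logits for illustration
print(triage([0.2, 3.1, 0.4]))  # one class clearly ahead
print(triage([1.0, 1.2, 0.9]))  # near tie, flagged for review
```

In practice the logits would come from `outputs.logits[0].tolist()` in the classification function shown under How to Use.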

	How to Use
	Install dependencies:

	pip install transformers torch

	Load the model:

	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

	model_name = "vai0511/ai-content-classifier"
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	def classify_text(text):
	    # Tokenize the input; anything beyond 512 tokens is truncated
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
	    # Gradients are not needed for inference
	    with torch.no_grad():
	        outputs = model(**inputs)
	    # Pick the class index with the highest logit
	    predicted_class = torch.argmax(outputs.logits, dim=1).item()
	    labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}
	    return labels[predicted_class]

	print(classify_text("This is an example text."))
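Because the example above truncates input at 512 tokens, anything past that point in a long document is ignored. A minimal sketch of one workaround follows, assuming that splitting on whitespace into overlapping windows and taking a majority vote over the per-window labels is acceptable. Both helper names and the default sizes are hypothetical; 200 words is only a guess at a window that stays under the token limit for typical English text.

```python
from collections import Counter
from typing import Callable, List


def split_into_windows(text: str, window_words: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    if not 0 <= overlap < window_words:
        raise ValueError("overlap must be non-negative and smaller than window_words")
    words = text.split()
    step = window_words - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_words]))
        if start + window_words >= len(words):
            break  # this window already reaches the end of the text
    return windows


def classify_long_text(
    text: str,
    classify_fn: Callable[[str], str],
    window_words: int = 200,
    overlap: int = 50,
) -> str:
    """Classify each window with classify_fn and return the most common label.

    On a tie, the label that was seen first wins.
    """
    windows = split_into_windows(text, window_words, overlap)
    if not windows:
        raise ValueError("text is empty")
    votes = Counter(classify_fn(window) for window in windows)
    return votes.most_common(1)[0][0]
```

With the function defined above, a call would look like `classify_long_text(long_text, classify_text)`. Majority voting is one design choice among several; averaging per-window probabilities is another.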

	Training Details
	Base Model: ELECTRA
	Dataset: 46,181 text samples
	Batch Size: 8 to 16
	Epochs: 3
	Learning Rate: 2e-5 to 3e-5
	Optimizer: AdamW
	Max Token Length: 512
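To give a feel for what these settings imply, the sketch below works out the number of optimizer steps for the two ends of the batch-size range. It assumes that all 46,181 samples were used for training (no held-out split), that the last partial batch of each epoch is kept, and that there is no gradient accumulation; none of this is stated in the model card.

```python
import math

NUM_SAMPLES = 46_181  # dataset size from the list above
EPOCHS = 3            # epochs from the list above


def total_steps(batch_size: int, num_samples: int = NUM_SAMPLES, epochs: int = EPOCHS) -> int:
    """Optimizer steps over all epochs, keeping the last partial batch of each epoch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


for batch_size in (8, 16):
    print(batch_size, total_steps(batch_size))
# 8 17319
# 16 8661
```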
	Preprocessing:

	Removed duplicates, special characters, and excessive whitespace.
	Tokenized the text with Hugging Face’s AutoTokenizer.
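The cleaning steps listed above might look roughly like the sketch below. The exact rules used for training are not documented, so the choice of which characters count as "special" (here: anything other than letters, digits, underscores, whitespace, and basic punctuation) is an assumption, and `clean_text` and `deduplicate` are hypothetical helper names.

```python
import re
from typing import Iterable, List


def clean_text(text: str) -> str:
    """Strip assumed 'special' characters and collapse runs of whitespace."""
    # Keep word characters, whitespace, and basic punctuation; drop everything else
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", "", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def deduplicate(texts: Iterable[str]) -> List[str]:
    """Clean each text and keep only the first occurrence of each cleaned result."""
    seen = set()
    unique = []
    for raw in texts:
        cleaned = clean_text(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


samples = ["Hello,   world!  ©", "Hello, world!", "   ", "Another\n\nsample #1"]
print(deduplicate(samples))
```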

	License & Attribution
	This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0.

	🔗 Original Model: vai0511/ai-content-classifier
	🔗 License Details: Apache 2.0 License

	Disclaimer
	This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions.