Spaces: MUSKAN17 / AI_Content_Source_Identifier

Runtime error

shreeramy committed on Feb 19, 2025

Commit ec054c0

1 Parent(s): bf4f13d

Add application file

Files changed (4)

.gitattributes +3 -0
README.md +78 -2
app.py +35 -0
requirement.txt +0 -0

.gitattributes CHANGED

@@ -1,3 +1,6 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text

+# Ignore virtual environments
+env/
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,8 +1,8 @@
 ---
 title: AI Content Source Identifier
 emoji: 👀
-colorFrom: blue
-colorTo: purple
 sdk: gradio
 sdk_version: 5.16.1
 app_file: app.py
@@ -12,3 +12,79 @@ short_description: 'AI Text Classifier: Human vs AI vs Paraphrased'
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: AI Content Source Identifier
 emoji: 👀
+colorFrom: yellow
+colorTo: yellow
 sdk: gradio
 sdk_version: 5.16.1
 app_file: app.py
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+Model Card for AI Content Classification
+Model Description
+This model classifies text into one of three categories:
+Human-Written
+AI-Generated
+Paraphrased
+It uses the vai0511/ai-content-classifier model, an ELECTRA-based sequence classifier fine-tuned on 46,181 text samples (see Training Details).
+Uses
+Direct Use
+Detecting AI-generated content
+Identifying paraphrased text
+Assisting in content moderation
+Out-of-Scope Use
+❌ Not suitable for legal or forensic content verification.
+❌ Should not be used as the sole basis for plagiarism detection.
+Limitations & Biases
+⚠ Potential Bias – The model was trained on a limited dataset (46,181 samples) and may not generalize well across all writing styles and languages.
+⚠ False Positives/Negatives – AI-generated or paraphrased text may be misclassified.
+⚠ Adversarial Attacks – Text with subtle modifications may bypass detection.
+Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually.
+How to Use
+Install dependencies:
+bash
+pip install transformers torch
+Load the model:
+python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+model_name = "vai0511/ai-content-classifier"
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+def classify_text(text):
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+    with torch.no_grad():
+        outputs = model(**inputs)
+    predicted_class = torch.argmax(outputs.logits, dim=1).item()
+    labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}
+    return labels[predicted_class]
+print(classify_text("This is an example text."))
+Training Details
+Base Model: ELECTRA
+Dataset: 46,181 text samples
+Batch Size: 8 - 16
+Epochs: 3
+Learning Rate: 2e-5 - 3e-5
+Optimizer: AdamW
+Max Token Length: 512
+Preprocessing:
+Removed duplicates, special characters, and excessive whitespace.
+Tokenization performed using Hugging Face’s AutoTokenizer.
+License & Attribution
+This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0.
+🔗 Original Model: vai0511/ai-content-classifier
+🔗 License Details: Apache 2.0 License
+Disclaimer
+This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions.
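
The preprocessing listed under Training Details (duplicate removal, special-character stripping, whitespace collapsing) is not shown in this commit. A minimal sketch of what such a step might look like, assuming "special characters" means anything other than letters, digits, whitespace, and basic punctuation. The function names and the exact character set are hypothetical, not taken from the original training code:

```python
import re

# Hypothetical whitelist: letters, digits, whitespace, basic punctuation.
_SPECIAL_CHARS = re.compile(r"[^\w\s.,;:!?'\"()-]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip special characters, then collapse runs of whitespace."""
    text = _SPECIAL_CHARS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def preprocess(samples: list[str]) -> list[str]:
    """Clean each sample and drop exact duplicates, keeping first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for sample in samples:
        text = clean_text(sample)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Deduplicating after cleaning (rather than before) also catches samples that differ only in whitespace or stripped symbols. Whether the original pipeline did this is not stated.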

app.py ADDED

@@ -0,0 +1,35 @@

+import gradio as gr
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+import torch.nn.functional as F
+# Load the Hugging Face model and tokenizer for text classification
+model_name = "vai0511/ai-content-classifier"
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Classify text and return the predicted label plus per-class percentages
+def classify_text(text: str):
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+    with torch.no_grad():  # Disable gradient calculations for inference
+        outputs = model(**inputs)
+    logits = outputs.logits  # Raw model predictions (logits)
+    probabilities = F.softmax(logits, dim=1)[0].tolist()  # Per-class probabilities for the single input
+    labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}
+    predicted_class = torch.argmax(logits, dim=1).item()
+    result = labels[predicted_class]
+    percentages = {labels[i]: round(p * 100, 2) for i, p in enumerate(probabilities)}
+    return result, percentages
+# Create Gradio interface
+iface = gr.Interface(
+    fn=classify_text,
+    inputs=gr.Textbox(label="Enter Text to Classify"),
+    outputs=[gr.Textbox(label="Classification Result"), gr.JSON(label="Classification Percentages")],
+    live=True
+)
+# Launch Gradio interface
+iface.launch()
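
The post-processing in `classify_text` (softmax over the logits, then argmax and rounding to percentages) can be checked without loading the model. The sketch below reproduces it in plain Python on a hand-written list of logits and adds an optional low-confidence flag, in line with the README's recommendation to treat the output as assistive. The `min_confidence` threshold and the helper names are hypothetical and not part of this commit:

```python
import math

LABELS = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits: list[float], min_confidence: float = 0.6):
    """Return (label, percentages, is_confident) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    percentages = {LABELS[i]: round(p * 100, 2) for i, p in enumerate(probs)}
    return LABELS[best], percentages, probs[best] >= min_confidence
```

For example, `summarize([2.0, 0.5, 0.1])` picks the first label and reports it as confident, while three equal logits give each class one third of the probability mass and fall below the assumed threshold.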

requirement.txt ADDED

Binary file (2.18 kB). View file
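
A dependency list is normally plain text, so a binary listing here is unusual. One possible explanation, not confirmed by anything in this commit, is that the file was saved as UTF-16 (for example by a shell redirect on Windows). Note also that Spaces read Python dependencies from a file named `requirements.txt`, whereas this commit adds `requirement.txt`. A minimal sketch of a helper that re-encodes such a file as UTF-8, assuming the UTF-16 guess is right; the function name is hypothetical:

```python
import codecs
from pathlib import Path


def to_utf8_requirements(src: Path, dst: Path) -> str:
    """Decode src (UTF-16 if it starts with a BOM, else UTF-8) and write dst as UTF-8."""
    raw = src.read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # the BOM selects the byte order
    else:
        text = raw.decode("utf-8")
    # Normalize line endings so each requirement sits on its own line.
    text = "\n".join(text.splitlines()) + "\n"
    dst.write_bytes(text.encode("utf-8"))
    return text
```

It could be called as `to_utf8_requirements(Path("requirement.txt"), Path("requirements.txt"))`. If the file is in some other encoding, the UTF-8 decode raises `UnicodeDecodeError` rather than writing a corrupted copy.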