shreeramy committed on
Commit ec054c0 · 1 Parent(s): bf4f13d

Add application file

Files changed (4)
  1. .gitattributes +3 -0
  2. README.md +78 -2
  3. app.py +35 -0
  4. requirement.txt +0 -0
.gitattributes CHANGED
@@ -1,3 +1,6 @@
+ # Ignore virtual environments
+ env/
+
  *.7z filter=lfs diff=lfs merge=lfs -text
  *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
  title: AI Content Source Identifier
  emoji: 👀
- colorFrom: blue
- colorTo: purple
+ colorFrom: yellow
+ colorTo: yellow
  sdk: gradio
  sdk_version: 5.16.1
  app_file: app.py
@@ -12,3 +12,79 @@ short_description: 'AI Text Classifier: Human vs AI vs Paraphrased'
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+
+ Model Card for AI Content Classification
+ Model Description
+ This model classifies text into one of three categories:
+
+ Human-Written
+ AI-Generated
+ Paraphrased
+ It leverages the vai0511/ai-content-classifier model, which is based on state-of-the-art NLP techniques and trained on diverse datasets for accurate content identification.
+
+ Uses
+ Direct Use
+ Detecting AI-generated content
+ Identifying paraphrased text
+ Assisting in content moderation
+ Out-of-Scope Use
+ ❌ Not suitable for legal or forensic content verification.
+ ❌ Should not be used as the sole basis for plagiarism detection.
+
+ Limitations & Biases
+ ⚠ Potential Bias – The model is trained on a limited dataset, which may not generalize well across all writing styles and languages.
+ ⚠ False Positives/Negatives – AI-generated or paraphrased text may be misclassified.
+ ⚠ Adversarial Attacks – Text with subtle modifications may bypass detection.
+
+ Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually.
+
+ How to Use
+ Install dependencies:
+
+ bash
+ Copy
+ Edit
+ pip install transformers torch
+ Load the model:
+
+ python
+ Copy
+ Edit
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+
+ model_name = "vai0511/ai-content-classifier"
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ def classify_text(text):
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+     with torch.no_grad():
+         outputs = model(**inputs)
+     predicted_class = torch.argmax(outputs.logits, dim=1).item()
+     labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}
+     return labels[predicted_class]
+
+ print(classify_text("This is an example text."))
+ Training Details
+ Base Model: ELECTRA
+ Dataset: 46,181 text samples
+ Batch Size: 8 - 16
+ Epochs: 3
+ Learning Rate: 2e-5 - 3e-5
+ Optimizer: AdamW
+ Max Token Length: 512
+ Preprocessing:
+
+ Removed duplicates, special characters, and excessive whitespace.
+ Tokenization performed using Hugging Face’s AutoTokenizer.
+ License & Attribution
+ This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0.
+
+ 🔗 Original Model: vai0511/ai-content-classifier
+ 🔗 License Details: Apache 2.0 License
+
+ Disclaimer
+ This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions.
+
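The Preprocessing entries in the README's Training Details (duplicate removal, special-character removal, whitespace collapsing) are not spelled out anywhere in this commit. A minimal sketch of what such a cleaning step could look like, using only the Python standard library; the function names `clean_text` and `deduplicate` and the exact set of characters kept are assumptions for illustration, not the original training pipeline:

```python
import re

# Assumption: keep word characters, whitespace, and basic punctuation;
# everything else counts as a "special character".
_SPECIAL_CHARS = re.compile(r"[^\w\s.,;:!?'\"()-]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace special characters with spaces, then collapse whitespace runs."""
    text = _SPECIAL_CHARS.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def deduplicate(samples: list[str]) -> list[str]:
    """Drop empty samples and exact duplicates (after cleaning), keeping order."""
    seen: set[str] = set()
    unique: list[str] = []
    for sample in samples:
        cleaned = clean_text(sample)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique
```

Deduplicating after cleaning, rather than before, also catches samples that differ only in spacing or stray symbols; whether the original pipeline did this is not stated.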
app.py ADDED
@@ -0,0 +1,35 @@
+ import gradio as gr
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ import torch
+ import torch.nn.functional as F
+
+ # Load the Hugging Face model and tokenizer for text classification
+ model_name = "vai0511/ai-content-classifier"
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Function to classify text (Synchronous Function)
+ def classify_text(text: str):
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+     with torch.no_grad(): # Disable gradient calculations for inference
+         outputs = model(**inputs)
+
+     logits = outputs.logits # Raw model predictions (logits)
+     probabilities = F.softmax(logits, dim=1) # Convert logits to probabilities using softmax
+     percentages = probabilities[0].tolist() # Convert probabilities to a list for easy access
+     labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}
+     predicted_class = torch.argmax(logits, dim=1).item()
+     result = labels[predicted_class]
+     percentages = {labels[i]: round(percentages[i] * 100, 2) for i in range(len(percentages))}
+     return result, percentages
+
+ # Create Gradio interface
+ iface = gr.Interface(
+     fn=classify_text,
+     inputs=gr.Textbox(label="Enter Text to Classify"),
+     outputs=[gr.Textbox(label="Classification Result"), gr.JSON(label="Classification Percentages")],
+     live=True
+ )
+
+ # Launch Gradio interface
+ iface.launch()
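Because `classify_text` in app.py returns per-class percentages alongside the label, the README's recommendation to treat the model as an assistive tool could be sketched as a post-processing step that flags low-confidence predictions for manual review. The helper names `softmax_percentages` and `needs_manual_review` and the 70% threshold are hypothetical, chosen only for illustration; the softmax is reimplemented in plain Python so the sketch runs without `torch`:

```python
import math

# Same label mapping as app.py.
LABELS = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}


def softmax_percentages(logits: list[float]) -> dict[str, float]:
    """Turn raw logits into per-label percentages rounded to two decimals."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {LABELS[i]: round(e / total * 100, 2) for i, e in enumerate(exps)}


def needs_manual_review(percentages: dict[str, float], threshold: float = 70.0) -> bool:
    """Flag a prediction whose top class falls below the (assumed) threshold."""
    return max(percentages.values()) < threshold
```

In the Gradio app, the second value returned by `classify_text` has the same shape as the output of `softmax_percentages`, so it could be passed to `needs_manual_review` directly. A suitable threshold would have to be chosen on held-out data.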
requirement.txt ADDED
Binary file (2.18 kB).