import gradio as gr
import torch
import torch.nn.functional as F
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
)
import numpy as np
import re
from collections import Counter
import math
import warnings
warnings.filterwarnings('ignore')
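

# Sketch (hypothetical helpers, not called by the app): the logit-to-probability
# step used in detect_with_model, in plain Python. A softmax turns a classifier's
# logits into class probabilities. Which index means "AI" depends on each model's
# label names, so `ai_index` is an explicit parameter here. With a single logit
# the softmax is always 1.0, so a one-logit head cannot be read this way.
import math


def sketch_softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sketch_ai_probability(logits, ai_index):
    """Probability assigned to the class at `ai_index` (hypothetical helper)."""
    return sketch_softmax(logits)[ai_index]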


class AdvancedAITextDetector:
    def __init__(self):
        """Initialize with multiple specialized detection models"""
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.models = {}
        self.tokenizers = {}
        self.load_all_models()

    def _load_classifier(self, key, model_name):
        """Load a sequence-classification detector under `key`; raises on failure."""
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(model_name)
        model.to(self.device)
        model.eval()
        # Register only after both pieces loaded, so a half-loaded model is never used
        self.tokenizers[key] = tokenizer
        self.models[key] = model

    def load_all_models(self):
        """Load an ensemble of detection models; every model is optional."""
        print("Loading detection models...")

        # Priority 1: PirateXX AI content detector
        try:
            self._load_classifier('pirate', "PirateXX/AI-Content-Detector")
            print("✓ Loaded PirateXX AI detector")
        except Exception as e:
            print(f"Could not load PirateXX detector: {e}")

        # Priority 2: Hello-SimpleAI ChatGPT detector. The English model is tried
        # first because this app targets English text; the Chinese-language
        # variant is only a fallback.
        try:
            self._load_classifier('roberta_detector', "Hello-SimpleAI/chatgpt-detector-roberta")
            print("✓ Loaded SimpleAI ChatGPT detector")
        except Exception as e:
            print(f"Could not load SimpleAI detector: {e}")
            try:
                self._load_classifier('multilingual', "Hello-SimpleAI/chatgpt-detector-roberta-chinese")
                print("✓ Loaded SimpleAI Chinese ChatGPT detector")
            except Exception as e2:
                print(f"Could not load SimpleAI Chinese detector: {e2}")

        # Priority 3: OpenAI's RoBERTa detector
        try:
            self._load_classifier('openai', "roberta-base-openai-detector")
            print("✓ Loaded OpenAI RoBERTa detector")
        except Exception as e:
            print(f"Could not load OpenAI detector: {e}")

        # Priority 4: GPT-2 for perplexity (GPT-2 Medium, falling back to GPT-2 small)
        for gpt2_name in ("gpt2-medium", "gpt2"):
            try:
                tokenizer = GPT2TokenizerFast.from_pretrained(gpt2_name)
                tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
                model = GPT2LMHeadModel.from_pretrained(gpt2_name)
                model.to(self.device)
                model.eval()
                self.tokenizers['gpt2'] = tokenizer
                self.models['gpt2'] = model
                print(f"✓ Loaded {gpt2_name} for perplexity")
                break
            except Exception as e:
                print(f"Could not load {gpt2_name}: {e}")

        if not self.models:
            print("WARNING: No models loaded, using statistical methods only")

    def calculate_perplexity(self, text):
        """Score text by GPT-2 perplexity.

        Returns an AI likelihood in [0, 1]: lower perplexity (more predictable
        text) maps to a higher score. Returns None if GPT-2 is unavailable.
        """
        if 'gpt2' not in self.models:
            return None
        try:
            encodings = self.tokenizers['gpt2'](
                text,
                return_tensors='pt',
                truncation=True,
                max_length=512,
                padding=True,
            ).to(self.device)
            with torch.no_grad():
                outputs = self.models['gpt2'](**encodings, labels=encodings.input_ids)
            # outputs.loss is the mean token-level cross-entropy, so exp(loss) is perplexity
            perplexity = torch.exp(outputs.loss).item()

            # Lower perplexity (< 30) strongly suggests AI;
            # higher perplexity (> 50) suggests human
            if perplexity < 20:
                return 0.9  # Very likely AI
            elif perplexity < 30:
                return 0.7  # Likely AI
            elif perplexity < 50:
                return 0.5  # Uncertain
            elif perplexity < 100:
                return 0.3  # Likely human
            else:
                return 0.1  # Very likely human
        except Exception as e:
            print(f"Perplexity calculation error: {e}")
            return None

    def _ai_label_index(self, model_name, num_labels):
        """Return the logit index of the AI/machine class for a loaded detector.

        Detectors disagree on label order and naming (e.g. 'Fake'/'Real' versus
        'Human'/'ChatGPT'), so the index is looked up in the model's id2label
        names. Uninformative names such as 'LABEL_0'/'LABEL_1' fall back to index 1.
        """
        id2label = getattr(self.models[model_name].config, 'id2label', None) or {}
        for idx, label in id2label.items():
            name = str(label).lower()
            if (name in ('fake', 'ai', 'machine', 'generated') or 'gpt' in name) and int(idx) < num_labels:
                return int(idx)
        return 1

    def detect_with_model(self, text, model_name):
        """Return the probability that `text` is AI-generated according to one detector."""
        if model_name not in self.models:
            return None
        try:
            inputs = self.tokenizers[model_name](
                text,
                return_tensors="pt",
                truncation=True,
                max_length=512,
                padding=True,
            ).to(self.device)
            with torch.no_grad():
                logits = self.models[model_name](**inputs).logits

            if logits.shape[-1] == 1:
                # Softmax over a single logit is always 1.0; assume a one-logit head
                # scores the AI class and squash it with a sigmoid instead
                return torch.sigmoid(logits[0, 0]).item()

            probs = F.softmax(logits, dim=-1)[0]
            return probs[self._ai_label_index(model_name, probs.shape[0])].item()
        except Exception as e:
            print(f"Error with {model_name}: {e}")
            return None

    def advanced_linguistic_analysis(self, text):
        """Comprehensive linguistic analysis for AI detection"""
        scores = {}

        # 1. Sentence-level analysis
        sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
        if len(sentences) > 1:
            # Sentence length variation as std / (mean + 1); AI text is more consistent
            sent_lengths = [len(s.split()) for s in sentences]
            scores['sent_length_std'] = np.std(sent_lengths) / (np.mean(sent_lengths) + 1)

            # Sentence starter diversity (AI often starts sentences similarly)
            starters = [s.split()[0].lower() for s in sentences if s.split()]
            scores['starter_diversity'] = len(set(starters)) / len(starters) if starters else 0

            # Human writing indicator: share of short (< 8 words) or long (> 20 words) sentences
            short_sentences = sum(1 for length in sent_lengths if length < 8)
            long_sentences = sum(1 for length in sent_lengths if length > 20)
            scores['sentence_variety'] = (short_sentences + long_sentences) / len(sent_lengths)

            # Conversational sentence openers (human indicator). The start of each
            # lowercased sentence is matched as a whole word, so multi-word openers
            # such as "you know" and openers followed by a comma ("So, ...") count.
            conversational_starters = ['so', 'well', 'actually', 'basically', 'like',
                                       'you know', 'i mean', 'anyway']
            conv_count = sum(
                1 for s in sentences
                if any(re.match(re.escape(c) + r'\b', s.lower()) for c in conversational_starters)
            )
            scores['conversational_patterns'] = conv_count / len(sentences)

        # 2. N-gram analysis
        words = text.lower().split()
        if len(words) > 3:
            # Trigram repetition (AI repeats phrases more)
            trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
            trigram_counts = Counter(trigrams)
            repeated_trigrams = sum(1 for c in trigram_counts.values() if c > 1)
            scores['trigram_repetition'] = repeated_trigrams / len(trigrams) if trigrams else 0

            # Bigram diversity
            bigrams = [tuple(words[i:i + 2]) for i in range(len(words) - 1)]
            scores['bigram_diversity'] = len(set(bigrams)) / len(bigrams) if bigrams else 0

        # 3. ChatGPT-specific patterns
        chatgpt_score = 0
        human_score = 0

        # Common ChatGPT phrases (weighted by specificity). All entries are
        # lowercase because they are matched against the lowercased text.
        high_confidence_phrases = [
            "it's important to note", "it's worth noting", "it's crucial to",
            "in conclusion", "to summarize", "in summary",
            "let me explain", "let me break", "i'll explain",
            "here's a", "here are some", "this involves",
            "additionally", "furthermore", "moreover",
            "essentially", "basically", "fundamentally",
            "it's essential to", "remember that", "keep in mind",
        ]
        medium_confidence_phrases = [
            "however", "therefore", "thus", "hence",
            "for example", "for instance", "specifically",
            "generally", "typically", "usually", "often",
            "in other words", "that being said", "that said",
        ]
        # Human writing indicators
        human_indicators = [
            "i think", "i feel", "i believe", "i guess", "i suppose",
            "honestly", "frankly", "personally", "in my opinion",
            "you know", "right", "like", "um", "uh", "well",
            "actually", "basically", "literally", "totally", "really",
            "so", "anyway", "btw", "lol", "haha", "omg",
        ]

        # Lowercase and normalize curly apostrophes so "Don’t" matches "don't"
        text_lower = text.lower().replace('\u2019', "'")

        def contains(phrase):
            # Whole-word match: a plain substring test would let "so" fire on "also",
            # "um" on "number" and "like" on "likely"
            return re.search(r'\b' + re.escape(phrase) + r'\b', text_lower) is not None

        chatgpt_score += 0.15 * sum(contains(p) for p in high_confidence_phrases)
        chatgpt_score += 0.08 * sum(contains(p) for p in medium_confidence_phrases)
        human_score += 0.10 * sum(contains(p) for p in human_indicators)

        # Structured lists and colon-heavy layout (very common in ChatGPT output);
        # re.MULTILINE lets a list that starts on the first line match as well
        has_numbered = re.search(r'^[ \t]*\d+[.)]\s', text, re.MULTILINE) is not None
        has_bullets = re.search(r'^[ \t]*[-•*]\s', text, re.MULTILINE) is not None
        has_colons = text.count(':') > 2
        if has_numbered:
            chatgpt_score += 0.25
        if has_bullets:
            chatgpt_score += 0.20
        if has_colons:
            chatgpt_score += 0.10

        # Formal tone indicators (AI) vs informal (human). Formal words are matched
        # as substrings, which also counts inflections such as "implementation";
        # informal words need whole-word matches so "cool" does not fire on "school".
        formal_words = ['utilize', 'implement', 'facilitate', 'enhance', 'optimize',
                        'comprehensive', 'significant', 'substantial', 'various', 'numerous']
        informal_words = ['gonna', 'wanna', 'gotta', 'kinda', 'sorta', 'yeah', 'nah',
                          'awesome', 'cool', 'sucks', 'crazy', 'insane', 'ridiculous']
        formal_count = sum(1 for word in formal_words if word in text_lower)
        informal_count = sum(contains(word) for word in informal_words)
        chatgpt_score += min(formal_count * 0.05, 0.25)
        human_score += min(informal_count * 0.08, 0.3)

        # Contractions and casual language
        contractions = ["don't", "won't", "can't", "isn't", "aren't", "wasn't", "weren't",
                        "i'm", "you're", "he's", "she's", "it's", "we're", "they're"]
        contraction_count = sum(contains(c) for c in contractions)
        human_score += min(contraction_count * 0.05, 0.2)

        scores['chatgpt_patterns'] = min(chatgpt_score, 1.0)
        scores['human_patterns'] = min(human_score, 1.0)

        # 4. Complexity uniformity (AI has uniform complexity)
        if len(sentences) > 2:
            complexities = []
            for sent in sentences:
                words_in_sent = sent.split()
                if words_in_sent:
                    avg_word_len = np.mean([len(w) for w in words_in_sent])
                    complexity = len(words_in_sent) * avg_word_len / 5
                    complexities.append(complexity)
            if complexities:
                cv = np.std(complexities) / (np.mean(complexities) + 1)
                scores['complexity_variance'] = cv

        # 5. Paragraph structure (AI has consistent paragraphs)
        paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
        if len(paragraphs) > 1:
            para_lengths = [len(p.split()) for p in paragraphs]
            para_cv = np.std(para_lengths) / (np.mean(para_lengths) + 1)
            scores['paragraph_consistency'] = 1 - min(para_cv, 1.0)

        # Combine the features into one statistical score. Positive weights push the
        # score toward AI and negative weights toward human. Each feature is clipped
        # to [0, 1] and centered at 0.5, so it moves the score away from the neutral
        # 0.5 start only when it is unusually high or low.
        weights = {
            'chatgpt_patterns': 0.25,          # AI patterns
            'human_patterns': -0.20,           # Human patterns
            'sent_length_std': -0.10,          # Lower std = more AI
            'starter_diversity': -0.08,        # Lower diversity = more AI
            'sentence_variety': -0.12,         # More variety = more human
            'conversational_patterns': -0.15,  # More conversational = more human
            'trigram_repetition': 0.10,        # More repetition = more AI
            'bigram_diversity': -0.08,         # Lower diversity = more AI
            'complexity_variance': -0.08,      # Lower variance = more AI
            'paragraph_consistency': 0.10,     # More consistency = more AI
        }
        final_score = 0.5  # Start neutral
        for feature, value in scores.items():
            if feature in weights:
                clipped = min(max(float(value), 0.0), 1.0)
                final_score += weights[feature] * (clipped - 0.5)

        # Reduce the AI probability when strong human indicators co-occur
        if scores.get('human_patterns', 0) > 0.3 and scores.get('conversational_patterns', 0) > 0.2:
            final_score *= 0.7

        # Increase the AI probability when strong AI indicators co-occur
        if scores.get('chatgpt_patterns', 0) > 0.4 and scores.get('paragraph_consistency', 0) > 0.7:
            final_score = min(final_score * 1.2, 0.95)

        return min(max(final_score, 0), 1), scores

    def detect(self, text):
        """Main detection combining all methods"""
        if not text or len(text.strip()) < 30:
            return {
                "ai_probability": 50.0,
                "classification": "Text Too Short",
                "confidence": "N/A",
                "explanation": "Please provide at least 30 characters of text for analysis.",
                "model_scores": {},
                "statistical_analysis": {},
            }

        all_scores = []
        all_weights = []
        model_results = {}

        # 1. Try each model
        model_weights = {
            'pirate': 0.30,            # If specialized detector available
            'openai': 0.25,            # OpenAI's own detector
            'multilingual': 0.20,      # SimpleAI Chinese-language detector
            'roberta_detector': 0.20,  # SimpleAI English detector
            'perplexity': 0.25,
        }

        # Get model predictions
        for model_name in ['pirate', 'openai', 'multilingual', 'roberta_detector']:
            if model_name in self.models:
                score = self.detect_with_model(text, model_name)
                if score is not None:
                    all_scores.append(score)
                    all_weights.append(model_weights.get(model_name, 0.15))
                    model_results[model_name] = score

        # Get perplexity score
        perp_score = self.calculate_perplexity(text)
        if perp_score is not None:
            all_scores.append(perp_score)
            all_weights.append(model_weights['perplexity'])
            model_results['perplexity'] = perp_score

        # 2. Statistical analysis
        stat_score, stat_details = self.advanced_linguistic_analysis(text)
        all_scores.append(stat_score)
        all_weights.append(0.20)
        model_results['statistical'] = stat_score

        # 3. Calculate weighted final score
        if all_scores:
            # Normalize weights so they sum to 1
            total_weight = sum(all_weights)
            normalized_weights = [w / total_weight for w in all_weights]

            # Weighted average
            final_score = sum(s * w for s, w in zip(all_scores, normalized_weights))

            # Model agreement: nudge the score conservatively when at least two
            # scores are confident (> 0.75 or < 0.25)
            agreement_scores = [s for s in all_scores if s > 0.75 or s < 0.25]
            if len(agreement_scores) >= 2:
                avg_agreement = np.mean(agreement_scores)
                if avg_agreement > 0.75:
                    final_score = min(final_score * 1.05, 0.90)
                elif avg_agreement < 0.25:
                    final_score = max(final_score * 0.95, 0.10)

            # Human-text protection: the statistical analysis and at least two
            # scores overall lean human
            if stat_score < 0.3 and len([s for s in all_scores if s < 0.4]) >= 2:
                final_score = max(final_score * 0.8, 0.15)
        else:
            final_score = 0.5

        # 4. Classification; thresholds are set conservatively to reduce false positives
        if final_score >= 0.75:
            classification = "AI-Generated (High Confidence)"
            confidence = "HIGH"
        elif final_score >= 0.60:
            classification = "Likely AI-Generated"
            confidence = "MEDIUM-HIGH"
        elif final_score >= 0.40:
            classification = "Uncertain"
            confidence = "LOW"
        elif final_score >= 0.25:
            classification = "Likely Human-Written"
            confidence = "MEDIUM"
        else:
            classification = "Human-Written (High Confidence)"
            confidence = "HIGH"

        # 5. Generate explanation
        explanation = self._create_explanation(final_score, model_results, stat_details)

        return {
            "ai_probability": round(final_score * 100, 2),
            "classification": classification,
            "confidence": confidence,
            "explanation": explanation,
            "model_scores": model_results,
            "statistical_analysis": stat_details,
        }

    def _create_explanation(self, score, model_results, stat_details):
        """Create detailed explanation"""
        exp = []

        # Overall assessment (same thresholds as the classification in detect())
        if score >= 0.75:
            exp.append("🤖 STRONG AI INDICATORS: The text exhibits multiple characteristics typical of AI-generated content.")
        elif score >= 0.60:
            exp.append("⚠️ PROBABLE AI: Several AI patterns detected, suggesting machine generation.")
        elif score >= 0.40:
            exp.append("❓ INCONCLUSIVE: Mixed signals - could be AI-assisted or edited content.")
        elif score >= 0.25:
            exp.append("✍️ PROBABLE HUMAN: More human-like characteristics than AI patterns.")
        else:
            exp.append("👤 STRONG HUMAN INDICATORS: Text shows natural human writing patterns.")

        # Model consensus
        if model_results:
            high_ai = [name for name, s in model_results.items() if s > 0.70]
            high_human = [name for name, s in model_results.items() if s < 0.30]
            if len(high_ai) >= 2:
                exp.append(f"\n\n✓ Multiple models detect AI: {', '.join(high_ai)}")
            elif len(high_human) >= 2:
                exp.append(f"\n\n✓ Multiple models detect human writing: {', '.join(high_human)}")

        # AI-specific indicators
        if stat_details.get('chatgpt_patterns', 0) > 0.4:
            exp.append("\n\n⚡ High density of ChatGPT-style phrases and structures detected")
        if stat_details.get('sent_length_std', 1) < 0.3:
            exp.append("\nUnusually consistent sentence lengths (AI characteristic)")
        if stat_details.get('trigram_repetition', 0) > 0.1:
            exp.append("\nRepeated phrase patterns detected")

        # Human-specific indicators
        if stat_details.get('human_patterns', 0) > 0.3:
            exp.append("\n\nStrong human conversational patterns detected")
        if stat_details.get('conversational_patterns', 0) > 0.2:
            exp.append("\n🗣️ Conversational language and casual expressions found")
        if stat_details.get('sentence_variety', 0) > 0.4:
            exp.append("\nNatural sentence length variation (human characteristic)")

        return " ".join(exp)
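

# Sketch (hypothetical helpers, not called by the app): how the perplexity
# signal in calculate_perplexity is derived. When `labels` are passed, a
# Hugging Face causal LM such as GPT2LMHeadModel returns `loss` as the mean
# negative log-likelihood (NLL) of the predicted tokens, and perplexity is
# exp(mean NLL). The per-token NLL list here stands in for what the model
# computes internally; the cut-offs mirror the heuristic ones in
# calculate_perplexity.
import math


def sketch_perplexity_from_nll(token_nlls):
    """Return exp(mean NLL) for a list of per-token negative log-likelihoods."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def sketch_perplexity_to_ai_score(perplexity):
    """Map perplexity to an AI-likelihood score with the app's cut-offs."""
    for upper, score in ((20, 0.9), (30, 0.7), (50, 0.5), (100, 0.3)):
        if perplexity < upper:
            return score
    return 0.1  # perplexity >= 100: very likely human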
# Initialize detector
print("Initializing AI Text Detector...")
detector = AdvancedAITextDetector()
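

# Sketch (hypothetical helpers, not called by the app): the weighted-ensemble
# step of detect() in plain Python. Each detector contributes a (score, weight)
# pair; dividing by the total weight is equivalent to normalizing the weights to
# sum to 1 before taking the weighted mean. The bucketing reuses detect()'s
# thresholds. The agreement boost and human-text protection steps are left out.
def sketch_weighted_ensemble(scores_and_weights):
    """Combine (score, weight) pairs into a weighted-mean AI probability."""
    total = sum(w for _, w in scores_and_weights)
    if total <= 0:
        return 0.5  # neutral fallback, as in detect() when no scores exist
    return sum(s * w for s, w in scores_and_weights) / total


def sketch_classify(score):
    """Bucket a 0-1 score using the thresholds from detect()."""
    if score >= 0.75:
        return "AI-Generated (High Confidence)"
    if score >= 0.60:
        return "Likely AI-Generated"
    if score >= 0.40:
        return "Uncertain"
    if score >= 0.25:
        return "Likely Human-Written"
    return "Human-Written (High Confidence)"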


def analyze_text(text):
    """Gradio interface function: run the detector and format a Markdown report."""
    if not text:
        return "Please enter some text to analyze."

    result = detector.detect(text)

    # Format output. Blank lines around "---" keep it a horizontal rule; directly
    # under a paragraph, Markdown would read it as a heading underline instead.
    output = f"""# AI Detection Results

## **{result['classification']}**

### AI Probability: **{result['ai_probability']}%**

### 🎯 Confidence: **{result['confidence']}**

---

## Analysis Summary

{result['explanation']}

---

## Model Scores
"""

    model_display_names = {
        'openai': '🔷 OpenAI Detector',
        'roberta_detector': '🤖 RoBERTa ChatGPT',
        'multilingual': 'SimpleAI (Chinese)',
        'pirate': '🏴‍☠️ PirateXX',
        'perplexity': 'Perplexity',
        'statistical': 'Statistical',
    }
    if result.get('model_scores'):
        for model, score in result['model_scores'].items():
            if score is not None:
                percentage = round(score * 100, 1)
                bar_length = int(percentage / 5)  # 20-character bar, 5% per block
                bar = '█' * bar_length + '░' * (20 - bar_length)
                output += f"\n- **{model_display_names.get(model, model)}:** {bar} {percentage}%"

    # Statistical details
    if result.get('statistical_analysis'):
        output += "\n\n---\n\n## 🔬 Detailed Linguistic Analysis\n"
        analysis = result['statistical_analysis']

        # AI indicators
        if 'chatgpt_patterns' in analysis:
            output += f"\n- **ChatGPT Pattern Score:** {analysis['chatgpt_patterns']:.2f}/1.00"
        if 'sent_length_std' in analysis:
            output += f"\n- **Sentence Length Variation:** {analysis['sent_length_std']:.3f} (lower = more AI-like)"
        if 'trigram_repetition' in analysis:
            output += f"\n- **Phrase Repetition:** {analysis['trigram_repetition']:.3f}"
        if 'starter_diversity' in analysis:
            output += f"\n- **Sentence Starter Diversity:** {analysis['starter_diversity']:.3f}"

        # Human indicators
        if 'human_patterns' in analysis:
            output += f"\n- **Human Pattern Score:** {analysis['human_patterns']:.2f}/1.00"
        if 'conversational_patterns' in analysis:
            output += f"\n- **Conversational Patterns:** {analysis['conversational_patterns']:.3f}"
        if 'sentence_variety' in analysis:
            output += f"\n- **Sentence Variety:** {analysis['sentence_variety']:.3f} (higher = more human-like)"

    # Visual representation
    ai_prob = result['ai_probability']
    human_prob = 100 - ai_prob
    ai_blocks = int(ai_prob / 5)
    human_blocks = int(human_prob / 5)
    output += f"""

---

## 🎯 Final Verdict

```
AI Generated:  {'█' * ai_blocks}{'░' * (20 - ai_blocks)} {ai_prob:.1f}%
Human Written: {'█' * human_blocks}{'░' * (20 - human_blocks)} {human_prob:.1f}%
```
"""

    # Add disclaimer for low confidence
    if result['confidence'] == "LOW":
        output += "\n\n⚠️ **Note:** Low confidence result. Consider getting human verification."

    return output
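

# Sketch (hypothetical helpers, not called by the app): two of the statistical
# features from advanced_linguistic_analysis, rewritten with the standard
# library only. Sentence-length variation is std / (mean + 1) of the words per
# sentence; np.std defaults to the population standard deviation, so
# statistics.pstdev is the matching choice. Trigram repetition is the number of
# distinct trigrams seen more than once divided by the total trigram count.
import re
import statistics
from collections import Counter


def sketch_sentence_length_variation(text):
    """std / (mean + 1) of sentence lengths; 0.0 with fewer than two sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / (statistics.mean(lengths) + 1)


def sketch_trigram_repetition(text):
    """Share of trigrams whose exact word sequence occurs more than once."""
    words = text.lower().split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    return sum(1 for c in counts.values() if c > 1) / len(trigrams)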


# Create Gradio interface
interface = gr.Interface(
    fn=analyze_text,
    inputs=gr.Textbox(
        lines=12,
        placeholder="Paste text here to check if it's AI-generated...\n\nFor best results, provide at least 100 words.",
        label="Text to Analyze",
    ),
    outputs=gr.Markdown(label="Detection Results"),
    title="Advanced ChatGPT & AI Text Detector",
    description="""
## AI text detection combining multiple methods:

### 🔥 Detection Methods:
- **Multiple AI Detection Models** - Ensemble of specialized detectors
- **Perplexity Analysis** - Measures how predictable the text is to GPT-2 (AI text tends to be more predictable)
- **Pattern Recognition** - Detects ChatGPT-specific writing patterns
- **Linguistic Analysis** - Analyzes sentence structure, vocabulary, and style

### 💡 Best Practices:
- Provide at least **100-200 words** for more reliable detection
- Longer texts generally give more reliable results
- Works best with English text
- Detection is probabilistic - use as guidance, not absolute proof

### 🎯 What This Aims to Detect:
- ChatGPT (GPT-3.5/GPT-4)
- Claude, Gemini, and other LLMs
- AI-assisted or heavily edited content
- Paraphrased AI content

**Note:** No detector is 100% accurate. Treat the result as a probabilistic estimate and use it alongside human judgment.
""",
    examples=[
        # ChatGPT example
        ["Artificial intelligence has revolutionized numerous industries in recent years. It's important to note that this technology offers both opportunities and challenges. Machine learning algorithms can process vast amounts of data, identify patterns, and make predictions with remarkable accuracy. Furthermore, AI applications span various domains including healthcare, finance, and transportation. However, it's crucial to consider the ethical implications. Issues such as bias in algorithms, job displacement, and privacy concerns require careful consideration. Additionally, the development of AI must be guided by responsible practices. In conclusion, while AI presents tremendous potential for innovation and progress, we must approach its implementation thoughtfully and ethically."],
        # Human example - conversational
        ["So yesterday I'm at the coffee shop, right? And this guy next to me is having the LOUDEST phone conversation about his crypto investments. Like, dude, we get it, you bought Dogecoin. But here's the thing - he kept saying he was gonna be a millionaire by next week. Next week! I almost choked on my latte. The barista and I made eye contact and we both just tried not to laugh. I mean, good luck to him and all, but maybe don't count those chickens yet? Anyway, that's my coffee shop drama for the week. Still better than working from home where my cat judges me all day."],
        # Human example - personal reflection
        ["I've been thinking about this whole social media thing lately. You know, I used to post everything - what I ate for breakfast, random thoughts, selfies. But now I'm kinda over it? Like, I still check Instagram and stuff, but I don't feel the need to share every little thing anymore. Maybe I'm getting old, or maybe I just realized that most people don't actually care about my lunch. It's weird how we went from sharing everything to being more private. I think it's actually healthier this way, but I miss the old days sometimes when social media felt more fun and less performative."],
        # Mixed/edited example
        ["The impact of social media on society has been profound. Studies show that people spend an average of 2.5 hours daily on social platforms. But honestly, I think it's probably way more than that - I know I'm constantly checking my phone! These platforms have transformed how we communicate, share information, and even how we see ourselves. There are definitely benefits, like staying connected with friends and family across distances. However, we're also seeing rises in anxiety and depression linked to social media use, especially among teenagers. It's a complex issue that deserves our attention."],
    ],
    theme=gr.themes.Soft(
        primary_hue="blue",
        secondary_hue="indigo",
        neutral_hue="slate",
    ),
    analytics_enabled=False,
    cache_examples=False,
)


if __name__ == "__main__":
    interface.launch()