Space: ayshajavd/code-security-analyzer

ayshajavd committed 25 days ago

Commit bde8632 (verified) · 1 parent: 11d02f9

Updated app with JSON API endpoint, better UI, and REST API docs
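
The JSON report this commit introduces carries an `overall_risk_score` and a `risk_level`. The sketch below restates, as a standalone function, how the diff appears to derive them: the highest severity among detected CWEs is scaled by the mean detection confidence and a factor of 1.2, capped at 100, then bucketed at 80/60/40. The function name `overall_risk` and the two-entry severity table are assumptions for illustration; the full `SEVERITY_MAP` is not visible in this diff.

```python
from typing import Dict, List, Tuple

# Hypothetical severity table for illustration only; the real SEVERITY_MAP
# in app.py is not shown in full in this diff.
SEVERITY_MAP: Dict[str, Tuple[str, int]] = {
    "CWE-89": ("Critical", 95),
    "CWE-79": ("High", 75),
}


def overall_risk(
    detected: List[Tuple[str, float]],
    safe_prob: float,
    severity_map: Dict[str, Tuple[str, int]] = SEVERITY_MAP,
) -> Tuple[int, str]:
    """Return (score, level) following the formula used in build_json_report."""
    if not detected:
        # No findings: risk is the complement of the "safe" probability.
        return max(0, int(100 - 100 * safe_prob)), "Low"

    # Unknown CWEs fall back to ("Low", 30), as in the diff.
    max_sev = max(severity_map.get(cwe, ("Low", 30))[1] for cwe, _ in detected)
    avg_conf = sum(p for _, p in detected) / len(detected)
    score = min(100, int(max_sev * avg_conf * 1.2))

    if score >= 80:
        level = "Critical"
    elif score >= 60:
        level = "High"
    elif score >= 40:
        level = "Medium"
    else:
        level = "Low"
    return score, level


if __name__ == "__main__":
    print(overall_risk([("CWE-89", 0.75), ("CWE-79", 0.5)], safe_prob=0.1))
    print(overall_risk([], safe_prob=0.75))
```

Note that the score depends on the mean confidence, so adding a low-confidence finding can lower the overall score even though more issues were detected.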

Files changed (1): app.py (+264 -288)

@@ -1,10 +1,13 @@
 """
-Code Security Risk Analyzer - Gradio UI
 Analyzes code for OWASP Top 10, CWE vulnerabilities.
 Outputs structured security report with vulnerability details, severity, and fixes.
 """
 import json
 import re
 import torch
 import gradio as gr
 from transformers import (
@@ -110,36 +113,36 @@ SEVERITY_MAP = {
 }
 EXPLANATIONS = {
-    "CWE-89": "**SQL Injection** means an attacker can manipulate your database queries by injecting malicious SQL code through user inputs. This could let them steal, modify, or delete ALL your data. Imagine someone typing `'; DROP TABLE users; --` into a login form.",
     "CWE-79": "**Cross-Site Scripting (XSS)** lets attackers inject malicious JavaScript into your web pages. When other users visit the page, the script runs in their browser - stealing cookies, session tokens, or redirecting them to fake sites.",
-    "CWE-78": "**OS Command Injection** means user input is being passed directly to system commands. An attacker could run ANY command on your server - install malware, steal files, or take complete control.",
-    "CWE-94": "**Code Injection** allows attackers to inject and execute arbitrary code in your application. Functions like `eval()`, `exec()`, or dynamic code compilation with untrusted input are the usual culprits.",
-    "CWE-119": "**Buffer Overflow** happens when your code writes data beyond the allocated memory buffer. Attackers can exploit this to crash your program, corrupt data, or even execute malicious code.",
-    "CWE-125": "**Out-of-bounds Read** means your code reads memory outside the intended buffer. This can leak sensitive data like passwords, encryption keys, or other users' data from memory.",
-    "CWE-190": "**Integer Overflow** occurs when an arithmetic operation produces a value too large for the data type. This can cause crashes, infinite loops, or be chained with buffer overflows for code execution.",
-    "CWE-200": "**Information Exposure** means sensitive data (API keys, passwords, internal paths, stack traces) is being leaked to unauthorized parties through error messages, logs, or responses.",
-    "CWE-264": "**Improper Access Control** means users can access resources or perform actions they shouldn't be authorized for. Missing permission checks let attackers escalate privileges.",
-    "CWE-287": "**Authentication Bypass** means the login/identity verification can be circumvented. Attackers could access any account without knowing the password.",
-    "CWE-310": "**Cryptographic Issues** - you're using weak, broken, or improperly configured encryption. Data you think is protected may be easily decryptable by attackers.",
-    "CWE-352": "**CSRF** tricks authenticated users into performing unwanted actions on your site. An attacker's page could make a user unknowingly transfer money, change their email, or delete their account.",
-    "CWE-362": "**Race Condition** means two operations compete for the same resource without proper synchronization. Attackers can exploit the timing window to bypass security checks.",
-    "CWE-416": "**Use After Free** - memory is being used after it's been freed. Attackers can manipulate the freed memory to execute arbitrary code or crash the application.",
-    "CWE-434": "**Unrestricted File Upload** lets attackers upload malicious files (like web shells) to your server. They could then execute code remotely and take full control.",
-    "CWE-476": "**NULL Pointer Dereference** - your code tries to use a pointer/reference that's NULL. This crashes the program and can be exploited for denial-of-service attacks.",
-    "CWE-502": "**Insecure Deserialization** means untrusted data is being deserialized without validation. Attackers can craft malicious serialized objects that execute code when deserialized.",
-    "CWE-601": "**Open Redirect** lets attackers redirect users from your trusted site to a malicious one. This is commonly used in phishing attacks to steal credentials.",
-    "CWE-787": "**Out-of-bounds Write** - data is written outside the intended memory buffer. This is a severe vulnerability that often leads to remote code execution.",
-    "CWE-798": "**Hardcoded Credentials** - passwords, API keys, or tokens are embedded directly in the source code. Anyone with access to the code (or the compiled binary) can extract them.",
-    "CWE-918": "**SSRF** lets attackers make your server send requests to internal systems. They could scan your network, access internal APIs, or read cloud metadata to steal credentials.",
-    "CWE-22": "**Path Traversal** means user input is used in file paths without sanitization. Attackers can use `../` sequences to access any file on the server - config files, passwords, source code.",
-    "CWE-269": "**Privilege Escalation** - a user can gain higher privileges than intended. A regular user might become an admin, accessing sensitive operations and data.",
-    "CWE-276": "**Incorrect Permissions** - files or resources have permissions that are too permissive. Sensitive files might be world-readable, exposing secrets.",
-    "CWE-327": "**Broken Cryptography** - you're using algorithms like MD5 or SHA1 that are cryptographically broken. Attackers can forge hashes or decrypt data.",
-    "CWE-330": "**Insufficient Randomness** - security-critical random values (tokens, keys, IDs) are predictable. Attackers can guess session tokens or API keys.",
-    "CWE-399": "**Resource Management Issues** - improper handling of system resources can lead to denial of service through resource exhaustion.",
-    "CWE-401": "**Memory Leak** - memory is allocated but never freed. Over time, the application uses more and more memory until it crashes.",
-    "CWE-20": "**Improper Input Validation** - user input isn't properly checked before use. This is the root cause of many other vulnerabilities like injection and overflow attacks.",
-    "CWE-284": "**Broken Access Control** - authorization checks are missing or incorrectly implemented. Users can access other users' data or admin functionality.",
 }
 # ============================================================
@@ -156,16 +159,15 @@ try:
     CLASSIFIER_LOADED = True
     print("Classifier loaded successfully")
 except Exception as e:
-    print(f"Classifier not available yet: {e}")
-    cls_tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")
     cls_model = AutoModelForSequenceClassification.from_pretrained(
-        "microsoft/graphcodebert-base",
         num_labels=31,
         problem_type="multi_label_classification",
     )
     cls_model.eval()
     CLASSIFIER_LOADED = False
-    print("Loaded base GraphCodeBERT as fallback")
 print("Loading fix generator...")
 try:
@@ -175,214 +177,163 @@ try:
     FIXER_LOADED = True
     print("Fix generator loaded successfully")
 except Exception as e:
-    print(f"Fix generator not available yet: {e}")
     fix_tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5p-220m")
     fix_model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5p-220m")
     fix_model.eval()
     FIXER_LOADED = False
-    print("Loaded base CodeT5+ as fallback")
 def detect_language(code: str) -> str:
-    """Detect programming language from code content."""
     code_lower = code[:500].lower()
-    if "<?php" in code_lower:
-        return "PHP"
-    if "package main" in code_lower and "func " in code_lower:
-        return "Go"
     if "#include" in code_lower:
-        if "class " in code_lower or "std::" in code_lower or "cout" in code_lower:
-            return "C++"
         return "C"
-    if "import java" in code_lower or "public class" in code_lower or "public static void main" in code_lower:
-        return "Java"
-    if re.search(r'\b(const |let |var |function |=>|require\(|module\.exports)', code_lower):
-        return "JavaScript"
-    if re.search(r'\b(def |import |from |class |self\.|print\()', code_lower):
-        return "Python"
-    if "fn " in code_lower and "let mut" in code_lower:
-        return "Rust"
     return "Unknown"
-def generate_fix(code: str, language: str) -> str:
-    """Generate a security fix for vulnerable code."""
-    prefix = f"fix {language.lower()}: "
-    input_text = prefix + code
-    input_ids = fix_tokenizer(
-        input_text, return_tensors="pt",
-        max_length=512, truncation=True
-    ).input_ids
     with torch.no_grad():
-        output_ids = fix_model.generate(
-            input_ids,
-            max_length=512,
-            num_beams=5,
-            early_stopping=True,
-            no_repeat_ngram_size=3,
-        )
-    return fix_tokenizer.decode(output_ids[0], skip_special_tokens=True)
-def analyze_code(code: str) -> str:
-    """Main analysis function - returns formatted security report."""
-    if not code or not code.strip():
-        return "Please paste some code to analyze."
-    language = detect_language(code)
-    # Classify
-    inputs = cls_tokenizer(
-        code, return_tensors="pt",
-        max_length=512, truncation=True, padding=True
-    )
     with torch.no_grad():
-        outputs = cls_model(**inputs)
-        logits = outputs.logits
-        probs = torch.sigmoid(logits).squeeze().numpy()
-    # Get detected vulnerabilities (threshold 0.3 for sensitivity)
-    threshold = 0.3
-    detected = []
-    for i, (cwe, prob) in enumerate(zip(TARGET_CWES, probs)):
-        if cwe == "safe":
-            continue
-        if prob > threshold:
-            detected.append((cwe, float(prob)))
-    # Sort by confidence
-    detected.sort(key=lambda x: x[1], reverse=True)
-    safe_prob = float(probs[0])
-    # Build report
-    report = []
-    report.append("# Code Security Analysis Report\n")
-    report.append(f"**Language Detected:** {language}")
-    model_status = "Trained Model" if CLASSIFIER_LOADED else "Base Model (untrained - results are for demo only)"
-    fix_status = "Trained" if FIXER_LOADED else "Base Model"
-    report.append(f"**Model Status:** {model_status}")
-    report.append(f"**Fix Generator:** {fix_status}\n")
     if not detected:
-        overall_score = max(0, int(100 * safe_prob))
-        report.append(f"## No Vulnerabilities Detected")
-        report.append(f"**Overall Risk Score:** {100 - overall_score}/100 (Low Risk)")
-        report.append(f"**Safe Code Confidence:** {safe_prob:.1%}\n")
-        report.append("The analyzed code appears to be safe based on our detection model. "
-                      "However, always review code manually and use additional static analysis tools.")
-        return "\n".join(report)
-    # Calculate overall risk score
-    max_severity = max(SEVERITY_MAP.get(cwe, ("Low", 30))[1] for cwe, _ in detected)
-    avg_confidence = sum(p for _, p in detected) / len(detected)
-    overall_risk = min(100, int(max_severity * avg_confidence * 1.2))
-    if overall_risk >= 80:
-        risk_level = "Critical"
-    elif overall_risk >= 60:
-        risk_level = "High"
-    elif overall_risk >= 40:
-        risk_level = "Medium"
-    else:
         risk_level = "Low"
-    report.append(f"## {len(detected)} Vulnerability(ies) Detected\n")
-    report.append(f"**Overall Risk Score:** {overall_risk}/100 ({risk_level})")
-    report.append(f"**Safe Code Probability:** {safe_prob:.1%}\n")
-    report.append("---\n")
-    # Detail each vulnerability
-    for idx, (cwe, confidence) in enumerate(detected, 1):
-        name = CWE_NAMES.get(cwe, cwe)
-        owasp = CWE_TO_OWASP.get(cwe, "N/A")
-        severity, score = SEVERITY_MAP.get(cwe, ("Medium", 50))
-        explanation = EXPLANATIONS.get(cwe, "This vulnerability could pose a security risk to your application.")
-        exploit_likelihood = min(100, int(confidence * score))
-        report.append(f"### {idx}. {name}")
-        report.append(f"| Property | Value |")
-        report.append(f"|----------|-------|")
-        report.append(f"| **CWE ID** | {cwe} |")
-        report.append(f"| **OWASP Category** | {owasp} |")
-        report.append(f"| **Severity** | {severity} ({score}/100) |")
-        report.append(f"| **Detection Confidence** | {confidence:.1%} |")
-        report.append(f"| **Exploit Likelihood** | {exploit_likelihood}% |")
-        report.append(f"\n**Why This Is Dangerous:**\n{explanation}\n")
-    # Attack chain analysis
     if len(detected) > 1:
-        report.append("---\n")
-        report.append("## Attack Chain Analysis\n")
-        report.append("Multiple vulnerabilities can be chained together for a more severe attack:\n")
-        chain_steps = []
-        has_input = any(c in ["CWE-20", "CWE-89", "CWE-79", "CWE-78", "CWE-94"] for c, _ in detected)
-        has_access = any(c in ["CWE-264", "CWE-269", "CWE-284", "CWE-287"] for c, _ in detected)
-        has_data = any(c in ["CWE-200", "CWE-22", "CWE-125"] for c, _ in detected)
-        has_exec = any(c in ["CWE-119", "CWE-416", "CWE-787", "CWE-502"] for c, _ in detected)
-        step = 1
-        if has_input:
-            chain_steps.append(f"{step}. **Initial Access** - Exploit input validation weakness to inject malicious payload")
-            step += 1
-        if has_access:
-            chain_steps.append(f"{step}. **Privilege Escalation** - Bypass access controls to gain elevated permissions")
-            step += 1
-        if has_data:
-            chain_steps.append(f"{step}. **Data Exfiltration** - Read sensitive files or memory to extract secrets")
-            step += 1
-        if has_exec:
-            chain_steps.append(f"{step}. **Remote Code Execution** - Exploit memory corruption or deserialization for code execution")
-            step += 1
-        if chain_steps:
-            report.append("\n".join(chain_steps))
-        else:
-            vuln_names = [CWE_NAMES.get(c, c) for c, _ in detected[:3]]
-            report.append(f"The combination of **{' + '.join(vuln_names)}** increases the attack surface. "
-                         f"An attacker could exploit one vulnerability to amplify the impact of another.")
-    # Generate fix
-    report.append("\n---\n")
-    report.append("## Suggested Secure Fix\n")
     try:
-        fix = generate_fix(code, language)
-        if fix and fix.strip():
-            report.append(f"```{language.lower()}\n{fix}\n```\n")
-        else:
-            report.append("*Fix generation returned empty result. Please review manually.*\n")
-    except Exception as e:
-        report.append(f"*Fix generation failed: {str(e)}. Please review manually.*\n")
-    report.append("---\n")
-    report.append("*This report was generated by an AI model. Always verify findings with manual code review and additional security tools.*")
-    return "\n".join(report)
 # ============================================================
-# Example Code Snippets
 # ============================================================
 EXAMPLES = [
     ["""import sqlite3
 def get_user(username):
     conn = sqlite3.connect('users.db')
-    cursor = conn.cursor()
     query = f"SELECT * FROM users WHERE username = '{username}'"
-    cursor.execute(query)
-    return cursor.fetchone()
 def login(request):
-    username = request.form['username']
-    password = request.form['password']
-    user = get_user(username)
-    if user and user[2] == password:
         return "Login successful"
     return "Login failed"
 """],
@@ -396,9 +347,7 @@ void process_input(char *user_input) {
 }
 int main(int argc, char *argv[]) {
-    if (argc > 1) {
-        process_input(argv[1]);
-    }
     return 0;
 }
 """],
@@ -407,58 +356,43 @@ const app = express();
 app.get('/search', (req, res) => {
     const query = req.query.q;
-    res.send(`<h1>Search Results for: ${query}</h1>
-              <p>No results found for "${query}"</p>`);
 });
 app.get('/profile/:id', (req, res) => {
-    const userId = req.params.id;
-    db.query('SELECT * FROM users WHERE id = ' + userId, (err, user) => {
-        res.send(`<h2>${user.name}</h2><p>${user.bio}</p>`);
     });
 });
 """],
-    ["""import requests
-import hashlib
-API_KEY = "sk-proj-abc123def456ghi789"
 DB_PASSWORD = "admin123"
-SECRET_KEY = "super_secret_key_2024"
 def connect_to_api():
-    headers = {"Authorization": f"Bearer {API_KEY}"}
-    response = requests.get("https://api.example.com/data", headers=headers)
-    return response.json()
 def hash_password(password):
     return hashlib.md5(password.encode()).hexdigest()
-def verify_admin(token):
-    if token == SECRET_KEY:
-        return True
-    return False
 """],
     ["""import sqlite3
 from hashlib import sha256
-import hmac
-import secrets
 def get_user(username):
     conn = sqlite3.connect('users.db')
-    cursor = conn.cursor()
-    cursor.execute("SELECT * FROM users WHERE username = ?", (username,))
-    return cursor.fetchone()
 def hash_password(password, salt=None):
-    if salt is None:
-        salt = secrets.token_hex(16)
-    hashed = sha256((salt + password).encode()).hexdigest()
-    return f"{salt}:{hashed}"
-def verify_password(password, stored_hash):
-    salt, expected_hash = stored_hash.split(':')
-    actual_hash = sha256((salt + password).encode()).hexdigest()
-    return hmac.compare_digest(actual_hash, expected_hash)
 """],
 ]
@@ -468,68 +402,110 @@ def verify_password(password, stored_hash):
 with gr.Blocks(
     title="Code Security Risk Analyzer",
     theme=gr.themes.Soft(),
-    css="""
-    .report-output { font-size: 14px; }
-    .gradio-container { max-width: 1200px; margin: auto; }
-    """
 ) as demo:
     gr.Markdown("""
-    # AI-Powered Code Security Risk Analyzer
-    ### Detect OWASP Top 10, CWE vulnerabilities, and get secure fixes
-    Paste any code snippet (Python, JavaScript, Java, C, C++, PHP, Go) and get a comprehensive security audit.
-    **Powered by:** GraphCodeBERT (vulnerability detection) + CodeT5+ (fix generation)
     """)
     with gr.Row():
         with gr.Column(scale=1):
-            code_input = gr.Code(
-                label="Paste Your Code Here",
-                language="python",
-                lines=20,
-            )
-            analyze_btn = gr.Button("Analyze Security", variant="primary", size="lg")
         with gr.Column(scale=1):
-            report_output = gr.Markdown(
-                label="Security Report",
-                elem_classes=["report-output"],
-            )
-    gr.Examples(
-        examples=EXAMPLES,
-        inputs=[code_input],
-        label="Example Code Snippets (click to load)",
-    )
-    analyze_btn.click(
-        fn=analyze_code,
-        inputs=[code_input],
-        outputs=[report_output],
     )
     gr.Markdown("""
-    ---
-    ### Vulnerability Categories Covered
-    | OWASP Category | Vulnerabilities |
-    |---|---|
-    | **A01: Broken Access Control** | Path Traversal, IDOR, Missing Auth, Privilege Escalation, CSRF, Open Redirect |
-    | **A02: Cryptographic Failures** | Weak Crypto (MD5/SHA1), Insufficient Randomness, Broken Algorithms |
-    | **A03: Injection** | SQL Injection, XSS, Command Injection, Code Injection, Buffer Overflow |
-    | **A04: Insecure Design** | Race Conditions, Unrestricted Upload, Resource Management |
-    | **A07: Auth Failures** | Improper Authentication, Hardcoded Credentials |
-    | **A08: Integrity Failures** | Insecure Deserialization |
-    | **A10: SSRF** | Server-Side Request Forgery |
-    **Languages:** Python, JavaScript, Java, C, C++, PHP, Go
-    **Models:** [GraphCodeBERT](https://huggingface.co/microsoft/graphcodebert-base) (detection) | [CodeT5+](https://huggingface.co/Salesforce/codet5p-220m) (fix generation)
-    **Dataset:** [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset) - 175K samples from BigVul, PrimeVul, CyberNative DPO
     """)
 if __name__ == "__main__":
     demo.launch()
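
The attack-chain logic in `analyze_code` above tests four groups of CWE IDs with `any(...)` and numbers the matching phases in order. A minimal sketch of the same grouping using set intersection follows; the CWE groups are taken from the diff, while the `PHASES` table and the `build_attack_chain` name are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Phase groupings copied from the diff; the table layout itself is illustrative.
PHASES: List[Tuple[str, Set[str]]] = [
    ("Initial Access", {"CWE-20", "CWE-89", "CWE-79", "CWE-78", "CWE-94"}),
    ("Privilege Escalation", {"CWE-264", "CWE-269", "CWE-284", "CWE-287"}),
    ("Data Exfiltration", {"CWE-200", "CWE-22", "CWE-125"}),
    ("Code Execution", {"CWE-119", "CWE-416", "CWE-787", "CWE-502"}),
]


def build_attack_chain(
    detected: List[Tuple[str, float]],
) -> Optional[List[Dict[str, object]]]:
    """Return ordered chain steps, or None if no chain applies."""
    # A chain is only reported when more than one vulnerability was detected.
    if len(detected) < 2:
        return None

    found = {cwe for cwe, _ in detected}
    steps: List[Dict[str, object]] = []
    for phase, members in PHASES:
        # A non-empty intersection means at least one CWE of this phase was found.
        if found & members:
            steps.append({"step": len(steps) + 1, "phase": phase})
    return steps or None


if __name__ == "__main__":
    print(build_attack_chain([("CWE-89", 0.9), ("CWE-22", 0.5)]))
```

Keeping the groups in one ordered table means the step numbering follows from list position, so adding a phase does not require touching a counter.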

 """
+Code Security Risk Analyzer - Gradio UI + REST API
 Analyzes code for OWASP Top 10, CWE vulnerabilities.
 Outputs structured security report with vulnerability details, severity, and fixes.
+REST API: Use gradio_client or POST to /api/predict
 """
 import json
 import re
+import time
 import torch
 import gradio as gr
 from transformers import (
 }
 EXPLANATIONS = {
+    "CWE-89": "**SQL Injection** means an attacker can manipulate your database queries by injecting malicious SQL code through user inputs. This could let them steal, modify, or delete ALL your data.",
     "CWE-79": "**Cross-Site Scripting (XSS)** lets attackers inject malicious JavaScript into your web pages. When other users visit the page, the script runs in their browser - stealing cookies, session tokens, or redirecting them to fake sites.",
+    "CWE-78": "**OS Command Injection** means user input is being passed directly to system commands. An attacker could run ANY command on your server.",
+    "CWE-94": "**Code Injection** allows attackers to inject and execute arbitrary code. Functions like `eval()`, `exec()`, or dynamic code compilation with untrusted input are the usual culprits.",
+    "CWE-119": "**Buffer Overflow** happens when your code writes data beyond the allocated memory buffer. Attackers can exploit this to crash your program or execute malicious code.",
+    "CWE-125": "**Out-of-bounds Read** means your code reads memory outside the intended buffer. This can leak sensitive data like passwords or encryption keys.",
+    "CWE-190": "**Integer Overflow** occurs when an arithmetic operation produces a value too large for the data type, which can be chained with buffer overflows for code execution.",
+    "CWE-200": "**Information Exposure** means sensitive data (API keys, passwords, stack traces) is being leaked to unauthorized parties.",
+    "CWE-264": "**Improper Access Control** means users can access resources or perform actions they shouldn't be authorized for.",
+    "CWE-287": "**Authentication Bypass** means the login/identity verification can be circumvented.",
+    "CWE-310": "**Cryptographic Issues** - you're using weak, broken, or improperly configured encryption.",
+    "CWE-352": "**CSRF** tricks authenticated users into performing unwanted actions on your site.",
+    "CWE-362": "**Race Condition** means two operations compete for the same resource without proper synchronization.",
+    "CWE-416": "**Use After Free** - memory is being used after it's been freed. Attackers can exploit this for arbitrary code execution.",
+    "CWE-434": "**Unrestricted File Upload** lets attackers upload malicious files (like web shells) to your server.",
+    "CWE-476": "**NULL Pointer Dereference** - your code tries to use a pointer that's NULL, causing crashes.",
+    "CWE-502": "**Insecure Deserialization** means untrusted data is deserialized without validation, enabling code execution.",
+    "CWE-601": "**Open Redirect** lets attackers redirect users from your trusted site to a malicious one for phishing.",
+    "CWE-787": "**Out-of-bounds Write** - data is written outside the intended memory buffer, often leading to remote code execution.",
+    "CWE-798": "**Hardcoded Credentials** - passwords, API keys, or tokens are embedded directly in the source code.",
+    "CWE-918": "**SSRF** lets attackers make your server send requests to internal systems, accessing internal APIs or cloud metadata.",
+    "CWE-22": "**Path Traversal** means user input is used in file paths without sanitization. Attackers can use `../` to access any file on the server.",
+    "CWE-269": "**Privilege Escalation** - a user can gain higher privileges than intended.",
+    "CWE-276": "**Incorrect Permissions** - files or resources have permissions that are too permissive.",
+    "CWE-327": "**Broken Cryptography** - you're using algorithms like MD5 or SHA1 that are cryptographically broken.",
+    "CWE-330": "**Insufficient Randomness** - security-critical random values (tokens, keys) are predictable.",
+    "CWE-399": "**Resource Management Issues** - improper handling of system resources can lead to denial of service.",
+    "CWE-401": "**Memory Leak** - memory is allocated but never freed, eventually causing crashes.",
+    "CWE-20": "**Improper Input Validation** - user input isn't properly checked before use, enabling many other vulnerabilities.",
+    "CWE-284": "**Broken Access Control** - authorization checks are missing or incorrectly implemented.",
 }
 # ============================================================
     CLASSIFIER_LOADED = True
     print("Classifier loaded successfully")
 except Exception as e:
+    print(f"Classifier not available: {e}")
+    cls_tokenizer = AutoTokenizer.from_pretrained("huggingface/CodeBERTa-small-v1")
     cls_model = AutoModelForSequenceClassification.from_pretrained(
+        "huggingface/CodeBERTa-small-v1",
         num_labels=31,
         problem_type="multi_label_classification",
     )
     cls_model.eval()
     CLASSIFIER_LOADED = False
 print("Loading fix generator...")
 try:
     FIXER_LOADED = True
     print("Fix generator loaded successfully")
 except Exception as e:
+    print(f"Fix generator not available: {e}")
     fix_tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5p-220m")
     fix_model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5p-220m")
     fix_model.eval()
     FIXER_LOADED = False
 def detect_language(code: str) -> str:
     code_lower = code[:500].lower()
+    if "<?php" in code_lower: return "PHP"
+    if "package main" in code_lower and "func " in code_lower: return "Go"
     if "#include" in code_lower:
+        if "class " in code_lower or "std::" in code_lower or "cout" in code_lower: return "C++"
         return "C"
+    if "import java" in code_lower or "public class" in code_lower: return "Java"
+    if re.search(r'\b(const |let |var |function |=>|require\(|module\.exports)', code_lower): return "JavaScript"
+    if re.search(r'\b(def |import |from |class |self\.|print\()', code_lower): return "Python"
     return "Unknown"
+def classify_code(code):
+    inputs = cls_tokenizer(code, return_tensors="pt", max_length=512, truncation=True, padding=True)
     with torch.no_grad():
+        probs = torch.sigmoid(cls_model(**inputs).logits).squeeze().numpy()
+    detected = [(cwe, float(p)) for cwe, p in zip(TARGET_CWES, probs) if cwe != "safe" and p > 0.3]
+    detected.sort(key=lambda x: x[1], reverse=True)
+    return detected, float(probs[0]), {cwe: float(p) for cwe, p in zip(TARGET_CWES, probs)}
+def generate_fix(code, language):
+    input_ids = fix_tokenizer(f"fix {language.lower()}: " + code, return_tensors="pt", max_length=512, truncation=True).input_ids
     with torch.no_grad():
+        out = fix_model.generate(input_ids, max_length=512, num_beams=5, early_stopping=True, no_repeat_ngram_size=3)
+    return fix_tokenizer.decode(out[0], skip_special_tokens=True)
+def build_json_report(code):
+    language = detect_language(code)
+    detected, safe_prob, all_probs = classify_code(code)
     if not detected:
+        overall_risk = max(0, int(100 - 100 * safe_prob))
         risk_level = "Low"
+    else:
+        max_sev = max(SEVERITY_MAP.get(c, ("Low", 30))[1] for c, _ in detected)
+        avg_conf = sum(p for _, p in detected) / len(detected)
+        overall_risk = min(100, int(max_sev * avg_conf * 1.2))
+        risk_level = "Critical" if overall_risk >= 80 else "High" if overall_risk >= 60 else "Medium" if overall_risk >= 40 else "Low"
+    vulns = []
+    for cwe, conf in detected:
+        sev, score = SEVERITY_MAP.get(cwe, ("Medium", 50))
+        vulns.append({
+            "cwe_id": cwe, "name": CWE_NAMES.get(cwe, cwe),
+            "owasp_category": CWE_TO_OWASP.get(cwe, "N/A"),
+            "severity": sev, "severity_score": score,
+            "detection_confidence": round(conf, 4),
+            "exploit_likelihood": min(100, int(conf * score)),
+            "explanation": EXPLANATIONS.get(cwe, "Security risk detected.").replace("**", ""),
+        })
+    # Attack chain
+    chain = None
     if len(detected) > 1:
+        steps = []
+        cats = {c for c, _ in detected}
+        if cats & {"CWE-20","CWE-89","CWE-79","CWE-78","CWE-94"}:
+            steps.append({"step": len(steps)+1, "phase": "Initial Access", "description": "Exploit input validation weakness"})
+        if cats & {"CWE-264","CWE-269","CWE-284","CWE-287"}:
+            steps.append({"step": len(steps)+1, "phase": "Privilege Escalation", "description": "Bypass access controls"})
+        if cats & {"CWE-200","CWE-22","CWE-125"}:
+            steps.append({"step": len(steps)+1, "phase": "Data Exfiltration", "description": "Read sensitive files or memory"})
+        if cats & {"CWE-119","CWE-416","CWE-787","CWE-502"}:
+            steps.append({"step": len(steps)+1, "phase": "Code Execution", "description": "Exploit memory corruption"})
+        if steps: chain = steps
+    fix = None
     try:
+        f = generate_fix(code, language)
+        if f and f.strip(): fix = f
+    except Exception: pass  # fix generation is best-effort; fix stays None on failure
+    return {
+        "language": language,
+        "model_status": {"classifier": "trained" if CLASSIFIER_LOADED else "base_model", "fix_generator": "trained" if FIXER_LOADED else "base_model"},
+        "overall_risk_score": overall_risk, "risk_level": risk_level,
+        "safe_probability": round(safe_prob, 4), "num_vulnerabilities": len(vulns),
+        "vulnerabilities": vulns, "attack_chain": chain, "suggested_fix": fix,
+        "all_class_probabilities": all_probs,
+        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
+    }
+def analyze_code(code):
+    if not code or not code.strip(): return "⚠️ Please paste some code to analyze."
+    data = build_json_report(code)
+    r = ["# 🔒 Code Security Analysis Report\n"]
+    r.append(f"**Language:** {data['language']}")
+    r.append(f"**Classifier:** {'✅ Trained' if data['model_status']['classifier']=='trained' else '⚠️ Base Model (demo)'}")
+    r.append(f"**Fix Generator:** {'✅ Trained' if data['model_status']['fix_generator']=='trained' else '⚠️ Base Model'}\n")
+    if data['num_vulnerabilities'] == 0:
+        r.append("## ✅ No Vulnerabilities Detected")
+        r.append(f"**Risk Score:** {data['overall_risk_score']}/100 | **Safe Confidence:** {data['safe_probability']:.1%}\n")
+        r.append("Code appears safe. Always supplement with manual review and SAST tools.")
+        return "\n".join(r)
+    emoji = {"Critical":"🔴","High":"🟠","Medium":"🟡","Low":"🟢"}.get(data['risk_level'],"⚪")
+    r.append(f"## {emoji} {data['num_vulnerabilities']} Vulnerability(ies) Detected\n")
+    r.append(f"**Risk Score:** {data['overall_risk_score']}/100 ({data['risk_level']}) | **Safe Probability:** {data['safe_probability']:.1%}\n---\n")
+    for i, v in enumerate(data['vulnerabilities'], 1):
+        se = {"Critical":"🔴","High":"🟠","Medium":"🟡","Low":"🟢"}.get(v['severity'],"⚪")
+        r.append(f"### {i}. {se} {v['name']}")
+        r.append("| Property | Value |\n|----------|-------|")
+        r.append(f"| **CWE ID** | {v['cwe_id']} |")
+        r.append(f"| **OWASP** | {v['owasp_category']} |")
+        r.append(f"| **Severity** | {v['severity']} ({v['severity_score']}/100) |")
+        r.append(f"| **Confidence** | {v['detection_confidence']:.1%} |")
+        r.append(f"| **Exploit Likelihood** | {v['exploit_likelihood']}% |")
+        r.append(f"\n**Why Dangerous:** {v['explanation']}\n")
+    if data['attack_chain']:
+        r.append("---\n## ⛓️ Attack Chain\n")
+        for s in data['attack_chain']:
+            r.append(f"{s['step']}. **{s['phase']}** — {s['description']}")
+    r.append("\n---\n## 🔧 Suggested Fix\n")
+    if data['suggested_fix']:
+        r.append(f"```{data['language'].lower()}\n{data['suggested_fix']}\n```")
+    else:
+        r.append("*Fix generation unavailable. Please review manually.*")
+    r.append("\n---\n*AI-generated report. Verify with manual review and SAST tools.*")
+    return "\n".join(r)
+def get_json_report(code):
+    if not code or not code.strip():
+        return {"error": "No code provided"}
+    return build_json_report(code)
 # ============================================================
+# Example Snippets
 # ============================================================
 EXAMPLES = [
     ["""import sqlite3
 def get_user(username):
     conn = sqlite3.connect('users.db')
     query = f"SELECT * FROM users WHERE username = '{username}'"
+    return conn.execute(query).fetchone()
 def login(request):
+    user = get_user(request.form['username'])
+    if user and user[2] == request.form['password']:
         return "Login successful"
     return "Login failed"
 """],
 }
 int main(int argc, char *argv[]) {
+    if (argc > 1) process_input(argv[1]);
     return 0;
 }
 """],
 app.get('/search', (req, res) => {
     const query = req.query.q;
+    res.send(`<h1>Results for: ${query}</h1>`);
 });
 app.get('/profile/:id', (req, res) => {
+    db.query('SELECT * FROM users WHERE id = ' + req.params.id, (err, user) => {
+        res.send(`<h2>${user.name}</h2>`);
     });
 });
 """],
+    ["""import requests, hashlib
+API_KEY = "sk-proj-abc123def456"
 DB_PASSWORD = "admin123"
 def connect_to_api():
+    return requests.get("https://api.example.com/data",
+        headers={"Authorization": f"Bearer {API_KEY}"}).json()
 def hash_password(password):
     return hashlib.md5(password.encode()).hexdigest()
 """],
     ["""import sqlite3
 from hashlib import sha256
+import hmac, secrets
 def get_user(username):
     conn = sqlite3.connect('users.db')
+    cur = conn.execute("SELECT * FROM users WHERE username = ?", (username,))
+    return cur.fetchone()
 def hash_password(password, salt=None):
+    salt = salt or secrets.token_hex(16)
+    return f"{salt}:{sha256((salt+password).encode()).hexdigest()}"
+def verify_password(password, stored):
+    salt, expected = stored.split(':')
+    return hmac.compare_digest(sha256((salt+password).encode()).hexdigest(), expected)
 """],
 ]
 with gr.Blocks(
     title="Code Security Risk Analyzer",
     theme=gr.themes.Soft(),
+    css=".gradio-container { max-width: 1200px; margin: auto; }"
 ) as demo:
     gr.Markdown("""
+    # 🔒 AI-Powered Code Security Risk Analyzer
+    ### Detect OWASP Top 10 & CWE vulnerabilities with secure fix suggestions
+    Paste code in Python, JavaScript, Java, C, C++, PHP, or Go.
+    **Models:** [GraphCodeBERT](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier) (detection) + [CodeT5+](https://huggingface.co/ayshajavd/codet5p-vuln-fixer) (fixes) | **Dataset:** [175K samples](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset)
     """)
     with gr.Row():
         with gr.Column(scale=1):
+            code_input = gr.Code(label="Paste Your Code Here", language="python", lines=20)
+            with gr.Row():
+                analyze_btn = gr.Button("🔍 Analyze Security", variant="primary", size="lg")
+                json_btn = gr.Button("📋 JSON Report", variant="secondary", size="lg")
         with gr.Column(scale=1):
+            report_output = gr.Markdown(label="Security Report")
+            json_output = gr.JSON(label="JSON Report", visible=False)
+    gr.Examples(examples=EXAMPLES, inputs=[code_input], label="Example Code Snippets")
+    def show_json(code):
+        return gr.update(visible=True, value=get_json_report(code))
+    analyze_btn.click(fn=analyze_code, inputs=[code_input], outputs=[report_output], api_name="analyze")
+    json_btn.click(fn=show_json, inputs=[code_input], outputs=[json_output])
+    with gr.Accordion("🌐 REST API Documentation", open=False):
+        gr.Markdown("""
+### Python Client
+```python
+from gradio_client import Client
+client = Client("ayshajavd/code-security-analyzer")
+# Get markdown report
+report = client.predict(code="your code here", api_name="/analyze")
+# Get structured JSON report
+json_report = client.predict(code="your code here", api_name="/get_json_report")
+```
+### cURL
+```bash
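+# Markdown report (on recent Gradio versions the POST returns an event_id;
+# fetch the result with a GET to the same URL followed by /<event_id>)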
+curl -X POST https://ayshajavd-code-security-analyzer.hf.space/call/analyze \\
+  -H "Content-Type: application/json" \\
+  -d '{"data": ["your code here"]}'
+# JSON report
+curl -X POST https://ayshajavd-code-security-analyzer.hf.space/call/get_json_report \\
+  -H "Content-Type: application/json" \\
+  -d '{"data": ["your code here"]}'
+```
+### JSON Response Schema
+```json
+{
+  "language": "Python",
+  "overall_risk_score": 85,
+  "risk_level": "Critical",
+  "safe_probability": 0.12,
+  "num_vulnerabilities": 2,
+  "vulnerabilities": [{
+    "cwe_id": "CWE-89",
+    "name": "SQL Injection",
+    "owasp_category": "A03:2021 - Injection",
+    "severity": "Critical",
+    "severity_score": 95,
+    "detection_confidence": 0.92,
+    "exploit_likelihood": 87,
+    "explanation": "..."
+  }],
+  "attack_chain": [{"step": 1, "phase": "Initial Access", "description": "..."}],
+  "suggested_fix": "...",
+  "timestamp": "2025-01-01T00:00:00Z"
+}
+```
+        """)
+    # Explicit API endpoint for JSON reports
+    json_api = gr.Interface(
+        fn=get_json_report,
+        inputs=gr.Textbox(label="code"),
+        outputs=gr.JSON(label="report"),
+        api_name="get_json_report",
     )
     gr.Markdown("""
+---
+### 30 CWE Vulnerability Classes → OWASP Top 10
+| OWASP Category | CWEs |
+|---|---|
+| **A01: Broken Access Control** | CWE-22, 200, 264, 269, 276, 284, 352, 601 |
+| **A02: Cryptographic Failures** | CWE-310, 327, 330 |
+| **A03: Injection** | CWE-20, 78, 79, 89, 94, 119, 125, 190, 401, 416, 476, 787 |
+| **A04: Insecure Design** | CWE-362, 399, 434 |
+| **A07: Auth Failures** | CWE-287, 798 |
+| **A08: Integrity Failures** | CWE-502 |
+| **A10: SSRF** | CWE-918 |
     """)
 if __name__ == "__main__":
     demo.launch()
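
A minimal sketch of how a client might consume the JSON report once it has been fetched, assuming the response matches the "JSON Response Schema" documented in the REST API accordion above. `summarize_report`, `SEVERITY_ORDER`, and the `min_severity` parameter are hypothetical names for illustration and are not part of `app.py`; the sample values are copied from the schema example.

```python
# Hypothetical client-side helper, not part of app.py.
# Assumes the report dict follows the documented JSON response schema.
SEVERITY_ORDER = {"Low": 0, "Medium": 1, "High": 2, "Critical": 3}


def summarize_report(report, min_severity="High"):
    """Return one summary line per vulnerability at or above min_severity."""
    # get_json_report returns {"error": ...} when no code is provided.
    if "error" in report:
        return [f"error: {report['error']}"]

    threshold = SEVERITY_ORDER[min_severity]
    lines = []
    for v in report.get("vulnerabilities", []):
        # Assumption: unknown severity labels are treated as lowest priority.
        if SEVERITY_ORDER.get(v.get("severity"), 0) >= threshold:
            lines.append(
                f"{v['cwe_id']} {v['name']}: {v['severity']} "
                f"({v['detection_confidence']:.0%} confidence)"
            )
    return lines


# Sample report using the values from the documented schema.
sample = {
    "language": "Python",
    "overall_risk_score": 85,
    "risk_level": "Critical",
    "safe_probability": 0.12,
    "num_vulnerabilities": 1,
    "vulnerabilities": [{
        "cwe_id": "CWE-89",
        "name": "SQL Injection",
        "owasp_category": "A03:2021 - Injection",
        "severity": "Critical",
        "severity_score": 95,
        "detection_confidence": 0.92,
        "exploit_likelihood": 87,
        "explanation": "...",
    }],
    "attack_chain": [],
    "suggested_fix": "...",
}

for line in summarize_report(sample):
    print(line)
```

Filtering on the client keeps the endpoint's response unchanged, so a CI job could, for example, fail a build only when this list is non-empty.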