Spaces: mistral-hackaton-2026 / Mistral_SmartContract_Security


sammy786 committed on Mar 1

Commit e91a589 (verified) · 1 Parent(s): b14268f

Deploy Smart Contract Security Analyzer

Files changed (5)

.gitignore +13 -0
DEPLOY.md +66 -0
README.md +86 -14
app.py +256 -0
requirements.txt +8 -0

.gitignore ADDED

	@@ -0,0 +1,13 @@

+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+dist/
+build/
+*.log
+.env
+.venv
+venv/
+flagged/

DEPLOY.md ADDED

	@@ -0,0 +1,66 @@

+# Deploy to HuggingFace Space
+## Quick Deploy
+1. **Upload files to your Space:**
+   ```bash
+   cd space_files
+   git init
+   git remote add origin https://huggingface.co/spaces/mistral-hackaton-2026/Mistral_SmartContract_Security
+   git add .
+   git commit -m "Initial deployment: Smart Contract Security Analyzer"
+   git push -u origin main
+   ```
+2. **Or manually upload:**
+   - Go to: https://huggingface.co/spaces/mistral-hackaton-2026/Mistral_SmartContract_Security/tree/main
+   - Click "Upload files"
+   - Drag and drop:
+     - `app.py`
+     - `requirements.txt`
+     - `README.md`
+3. **Configure Space settings:**
+   - Hardware: Select "CPU basic" (free) or "T4 small" (faster, costs credits)
+   - SDK: Gradio (auto-detected)
+   - Python: 3.10+
+## Files Included
+- `app.py` - Main Gradio application
+- `requirements.txt` - Python dependencies
+- `README.md` - Space description with metadata
+- `.gitignore` - Files to exclude from repo
+## Testing Locally
+```bash
+cd space_files
+pip install -r requirements.txt
+python app.py
+```
+Then open http://localhost:7860 in your browser.
+## Troubleshooting
+**Model loading issues:**
+- The Space needs ~8 GB of RAM for the 4-bit quantized model
+- Upgrade to T4 GPU hardware if CPU is too slow
+**Token authentication:**
+- Your model must be public, or you need to add an HF token to the Space secrets
+- Go to Space settings → Repository secrets → Add HF_TOKEN
+**Build errors:**
+- Check logs in Space's "Logs" tab
+- Ensure all files uploaded correctly
+- Verify requirements.txt versions are compatible
+## Next Steps
+Once deployed:
+1. ✅ Test with sample contracts
+2. ✅ Share URL with hackathon judges
+3. ✅ Monitor usage in Space analytics
+4. ✅ Pin your Space to profile for visibility
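
Note on DEPLOY.md: the manual-upload list and the token troubleshooting step can be checked before pushing. The following is a minimal sketch, not part of the commit. It assumes the three files named in the upload list are the ones the Space needs and that a private model repo is accessed through an `HF_TOKEN` environment variable. The helper name `preflight` is hypothetical.

```python
import os
from pathlib import Path

# Files that DEPLOY.md lists for manual upload.
REQUIRED_FILES = ("app.py", "requirements.txt", "README.md")


def preflight(space_dir, env=None):
    """Return a list of human-readable problems found before deployment.

    `env` defaults to the process environment; pass a dict to test it.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_FILES:
        if not (Path(space_dir) / name).is_file():
            problems.append(f"missing file: {name}")
    # Assumption: the token is only required when the model repo is private.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed only for a private model)")
    return problems
```

Calling `preflight("space_files")` from the directory above `space_files` returns an empty list when all three files exist and the token is set.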

README.md CHANGED

@@ -1,14 +1,86 @@
----
-title: Mistral SmartContract Security
-emoji: 📉
-colorFrom: blue
-colorTo: yellow
-sdk: gradio
-sdk_version: 6.8.0
-app_file: app.py
-pinned: false
-license: apache-2.0
-short_description: Fine-tuned Mistral-7B for smart contract security
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Smart Contract Security Analyzer
+emoji: 🔐
+colorFrom: purple
+colorTo: pink
+sdk: gradio
+sdk_version: 4.0.0
+app_file: app.py
+pinned: true
+license: apache-2.0
+tags:
+  - mistral
+  - security
+  - smart-contract
+  - solidity
+  - vulnerability-detection
+  - fine-tuned
+  - hackathon
+---
+# 🔐 Smart Contract Security Analyzer
+Fine-tuned Mistral-7B for detecting security vulnerabilities in Solidity smart contracts with **custom security tokens** and **structured output generation**.
+## 🏆 Hackathon Submission Highlights
+This model demonstrates **new capabilities not possible without fine-tuning**:
+1. **38 Custom Security Tokens** - Novel vocabulary for precise vulnerability identification
+2. **Structured XML-Style Reports** - Machine-parseable security analysis
+3. **99.6% Accuracy** - 28.6 percentage-point improvement over base Mistral-7B
+4. **Zero False Positives** - 100% precision on balanced test set
+## 📊 Performance Comparison
+| Metric | Base Mistral-7B | Fine-Tuned (Ours) | Improvement |
+|--------|----------------|-------------------|-------------|
+| **Accuracy** | 71.0% | **99.6%** | **+28.6 pp** |
+| **Precision** | 64.2% | **100.0%** | **+35.8 pp** |
+| **Recall** | 100.0% | 99.3% | -0.7 pp |
+| **F1 Score** | 0.782 | **0.996** | **+0.214** |
+| **Custom Tokens** | 0/38 | **25/38 (66%)** | ✨ NEW |
+| **Structured Output** | ❌ | ✅ | ✨ NEW |
+## 🎯 Detected Vulnerabilities
+- **Reentrancy Attacks** - External calls before state updates
+- **Integer Overflow/Underflow** - Arithmetic without checks
+- **Access Control Issues** - Missing authorization modifiers
+- **Unchecked External Calls** - Ignored return values
+- **Denial of Service** - Unbounded loops and gas limit issues
+- **Timestamp Dependence** - Manipulable randomness
+## 🚀 How to Use
+1. **Paste your Solidity contract** in the code editor
+2. **Click "Analyze Contract"** to detect vulnerabilities
+3. **Review the structured report** with severity, location, and fix recommendations
+4. **Toggle "Show custom tokens"** to see the model's internal representation
+Try the sample contracts to see how the model identifies different vulnerability types!
+## 🎓 Training Details
+- **Dataset**: 30,000 balanced smart contracts (50% vulnerable, 50% safe)
+- **Method**: QLoRA (4-bit quantization) fine-tuning
+- **Base Model**: Mistral-7B-Instruct-v0.3
+- **Trainable Parameters**: 41.9M (1.1% of total)
+- **Training Time**: ~5.5 hours on Google Colab G4 GPU
+## ⚠️ Limitations
+- Trained on synthetic contracts - may not generalize to all real-world patterns
+- Static analysis only - cannot detect runtime or logic vulnerabilities
+- Limited to 6 common vulnerability types
+- Best used as a first-pass screening tool, not a replacement for professional audits
+## 📜 License
+Apache 2.0 - Free for commercial and research use
+---
+**Built with ❤️ for the Mistral Hackathon 2026**
+*Demonstrating that fine-tuning unlocks new capabilities: custom tokens + structured outputs = production-ready security analysis*
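
Note on the README metrics: the F1 scores in the Performance Comparison table follow from the reported precision and recall, and the accuracy gain is a difference in percentage points. The sketch below recomputes both from the published figures only. The function names are illustrative and do not come from the repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percentage_point_gain(before_pct, after_pct):
    """Absolute difference between two percentages, rounded to one decimal."""
    return round(after_pct - before_pct, 1)


# (precision, recall) pairs as reported in the README table.
reported = {"base": (0.642, 1.000), "fine_tuned": (1.000, 0.993)}

for name, (precision, recall) in reported.items():
    print(name, round(f1_score(precision, recall), 3))

print(percentage_point_gain(71.0, 99.6))
```

This prints 0.782 for the base model, 0.996 for the fine-tuned model, and 28.6 for the accuracy gain, matching the table. Measured relative to the 71.0% baseline, the same gain is about 40%.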

app.py ADDED

	@@ -0,0 +1,256 @@

+import gradio as gr
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import os
+# Model configuration
+MODEL_ID = "mistral-hackaton-2026/Mistral_SmartContract_Security"
+print("Loading model... This may take a few minutes on first run.")
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
+# Load model with 4-bit quantization for efficient inference
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_use_double_quant=False,
+)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_ID,
+    quantization_config=bnb_config,
+    device_map="auto",
+    torch_dtype=torch.float16,
+    trust_remote_code=True
+)
+print("Model loaded successfully!")
+# Sample contracts for demo
+SAMPLE_CONTRACTS = {
+    "Reentrancy Attack": '''// SPDX-License-Identifier: MIT
+pragma solidity ^0.8.0;
+contract VulnerableBank {
+    mapping(address => uint256) public balances;
+    function deposit() public payable {
+        balances[msg.sender] += msg.value;
+    }
+    function withdraw(uint256 amount) public {
+        require(balances[msg.sender] >= amount);
+        (bool success, ) = msg.sender.call{value: amount}("");
+        require(success);
+        balances[msg.sender] -= amount;  // VULNERABLE: State updated after external call
+    }
+}''',
+    "Integer Overflow": '''// SPDX-License-Identifier: MIT
+pragma solidity 0.7.0;
+contract VulnerableToken {
+    mapping(address => uint256) public balances;
+    function transfer(address to, uint256 amount) public {
+        balances[msg.sender] -= amount;  // VULNERABLE: No underflow check
+        balances[to] += amount;
+    }
+}''',
+    "Access Control Missing": '''// SPDX-License-Identifier: MIT
+pragma solidity ^0.8.0;
+contract VulnerableAdmin {
+    address public owner;
+    constructor() {
+        owner = msg.sender;
+    }
+    function setOwner(address newOwner) public {  // VULNERABLE: Anyone can change owner
+        owner = newOwner;
+    }
+}''',
+    "Secure Contract": '''// SPDX-License-Identifier: MIT
+pragma solidity ^0.8.0;
+contract SecureVault {
+    mapping(address => uint256) public balances;
+    bool private locked;
+    modifier noReentrant() {
+        require(!locked, "No reentrancy");
+        locked = true;
+        _;
+        locked = false;
+    }
+    function withdraw(uint256 amount) public noReentrant {
+        require(balances[msg.sender] >= amount);
+        balances[msg.sender] -= amount;
+        (bool success, ) = msg.sender.call{value: amount}("");
+        require(success);
+    }
+}'''
+}
+def analyze_contract(contract_code, show_tokens=False):
+    """Analyze smart contract for vulnerabilities"""
+    if not contract_code.strip():
+        return "⚠️ Please enter a smart contract to analyze."
+    # Prepare input
+    input_text = f"""<s>[INST] Analyze this contract and provide structured security report:
+{contract_code} [/INST]
+"""
+    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=2048)
+    # Move inputs to whichever device device_map="auto" placed the model on
+    inputs = inputs.to(model.device)
+    # Generate analysis
+    model.eval()
+    with torch.no_grad():
+        outputs = model.generate(
+            **inputs,
+            max_new_tokens=300,
+            temperature=0.3,
+            do_sample=True,
+            top_p=0.95
+        )
+    response = tokenizer.decode(outputs[0], skip_special_tokens=False)
+    analysis = response.split('[/INST]')[-1].strip()
+    # Format output
+    if show_tokens:
+        output = f"### 🔍 Security Analysis (with Custom Tokens)\n\n```\n{analysis}\n```"
+    else:
+        # Clean up for display
+        clean_analysis = analysis.replace('</s>', '').strip()
+        # Check for vulnerabilities
+        is_vulnerable = any([
+            '<CRITICAL>' in analysis,
+            '<HIGH>' in analysis,
+            '<VULN_' in analysis
+        ])
+        if is_vulnerable:
+            output = f"""### 🚨 VULNERABILITY DETECTED
+{clean_analysis}
+---
+**Model**: Mistral-7B Fine-tuned with 38 Custom Security Tokens
+**Accuracy**: 99.6% | **Precision**: 100% | **Recall**: 99.3%
+"""
+        else:
+            output = f"""### ✅ CONTRACT APPEARS SECURE
+{clean_analysis}
+---
+**Model**: Mistral-7B Fine-tuned with 38 Custom Security Tokens
+**Accuracy**: 99.6% | **Precision**: 100% | **Recall**: 99.3%
+"""
+    return output
+# Create Gradio interface
+with gr.Blocks(theme=gr.themes.Soft(), title="Smart Contract Security Analyzer") as demo:
+    gr.Markdown("""
+    # 🔐 Smart Contract Security Analyzer
+    ### Powered by Fine-Tuned Mistral-7B with Custom Security Tokens
+    **Hackathon Submission Features:**
+    - ✨ 38 custom security tokens (NEW capability)
+    - 📊 99.6% accuracy on 30K balanced dataset
+    - 🎯 100% precision (zero false positives)
+    - 📈 +28.6 percentage-point accuracy improvement over base Mistral
+    - 🏗️ Structured XML-style vulnerability reports
+    """)
+    with gr.Row():
+        with gr.Column(scale=2):
+            # gr.Textbox is used because gr.Code in Gradio 4.x lists no "solidity" language and takes no placeholder
+            contract_input = gr.Textbox(
+                label="📝 Paste Solidity Contract Here",
+                lines=20,
+                placeholder="// SPDX-License-Identifier: MIT\npragma solidity ^0.8.0;\n\ncontract MyContract {\n    // Your contract code here\n}"
+            )
+            with gr.Row():
+                analyze_btn = gr.Button("🔍 Analyze Contract", variant="primary", size="lg")
+                clear_btn = gr.Button("🗑️ Clear", size="lg")
+            show_tokens_checkbox = gr.Checkbox(
+                label="Show custom security tokens in output",
+                value=False
+            )
+        with gr.Column(scale=1):
+            gr.Markdown("### 📚 Sample Contracts")
+            sample_dropdown = gr.Dropdown(
+                choices=list(SAMPLE_CONTRACTS.keys()),
+                label="Load Example",
+                value=None
+            )
+            load_sample_btn = gr.Button("Load Sample", size="sm")
+    output_display = gr.Markdown(label="Analysis Result")
+    # Performance stats
+    gr.Markdown("""
+    ---
+    ### 📊 Model Performance Metrics
+    | Metric | Base Mistral | Fine-Tuned | Improvement |
+    |--------|-------------|------------|-------------|
+    | **Accuracy** | 71.0% | **99.6%** | +28.6 pp |
+    | **Precision** | 64.2% | **100.0%** | +35.8 pp |
+    | **Recall** | 100.0% | 99.3% | -0.7 pp |
+    | **F1 Score** | 0.782 | **0.996** | +0.214 |
+    | **Custom Tokens** | 0/38 | **25/38** | NEW! |
+    | **Structured Output** | No | **Yes** | NEW! |
+    **Training Details:**
+    - 30,000 balanced samples (50% vulnerable, 50% safe)
+    - 6 vulnerability types: Reentrancy, Overflow, Access Control, Unchecked Call, DoS, Timestamp
+    - 4-bit quantization with LoRA (1.1% trainable params)
+    - 5.5 hours training on Google Colab G4 GPU
+    """)
+    # Event handlers
+    def load_sample(sample_name):
+        if sample_name:
+            return SAMPLE_CONTRACTS[sample_name]
+        return ""
+    analyze_btn.click(
+        fn=analyze_contract,
+        inputs=[contract_input, show_tokens_checkbox],
+        outputs=output_display
+    )
+    load_sample_btn.click(
+        fn=load_sample,
+        inputs=sample_dropdown,
+        outputs=contract_input
+    )
+    clear_btn.click(
+        fn=lambda: ("", ""),
+        outputs=[contract_input, output_display]
+    )
+# Launch demo
+if __name__ == "__main__":
+    demo.launch()
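
Note on app.py: `analyze_contract` decides whether a contract is vulnerable with three substring checks (`<CRITICAL>`, `<HIGH>`, `<VULN_`). A slightly stricter variant could extract the tags themselves so the UI can list them. This is a sketch under assumptions, not the author's implementation. Only those three markers appear in the commit, so the tag `<VULN_REENTRANCY>` in the usage example and the helper name `summarize_tokens` are hypothetical.

```python
import re

# Severity tags that app.py treats as evidence of a vulnerability.
SEVERITY_TAGS = ("CRITICAL", "HIGH")

# Matches <NAME> and </NAME> where NAME is upper-case; "</s>" does not match.
TOKEN_RE = re.compile(r"<(/?)([A-Z][A-Z0-9_]*)>")


def summarize_tokens(analysis):
    """Collect severity and VULN_* tags found in the model's raw output."""
    severities = []
    vuln_types = []
    for closing_slash, name in TOKEN_RE.findall(analysis):
        if closing_slash:
            continue  # count each tag once, via its opening form
        if name in SEVERITY_TAGS and name not in severities:
            severities.append(name)
        elif name.startswith("VULN_") and name not in vuln_types:
            vuln_types.append(name)
    return {
        "is_vulnerable": bool(severities or vuln_types),
        "severities": severities,
        "vuln_types": vuln_types,
    }
```

For example, `summarize_tokens("<VULN_REENTRANCY><CRITICAL>withdraw()</CRITICAL></s>")` reports one severity and one vulnerability type. Unlike the substring check, this version requires a complete tag, so a truncated `<VULN_` at the end of a generation is ignored.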

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+transformers>=4.36.0
+torch>=2.1.0
+accelerate>=0.25.0
+bitsandbytes>=0.41.0
+peft>=0.7.0
+gradio>=4.0.0
+sentencepiece>=0.1.99
+protobuf>=3.20.0
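
Note on requirements.txt: DEPLOY.md advises verifying that the listed versions are compatible. The sketch below parses `name>=minimum` lines like the ones above and compares a version string against the minimum. It is a simplified illustration with hypothetical helper names. It only understands dotted numeric versions and the `>=` operator, so a real check would more likely rely on a dedicated version-parsing library.

```python
import re


def version_tuple(version):
    """Turn '2.1.0+cu121' into (2, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (numeric components only)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '4.36' compares like '4.36.0'
    b += (0,) * (width - len(b))
    return a >= b


def parse_requirements(text):
    """Map package name to minimum version for every 'name>=version' line."""
    requirements = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, minimum = line.partition(">=")
        if separator:
            requirements[name.strip()] = minimum.strip()
    return requirements
```

Installed versions could be looked up with `importlib.metadata.version(name)` and passed to `meets_minimum` together with the parsed minimum.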