sammy786 committed
Commit e91a589 · verified · 1 Parent(s): b14268f

Deploy Smart Contract Security Analyzer

Files changed (5)
  1. .gitignore +13 -0
  2. DEPLOY.md +66 -0
  3. README.md +86 -14
  4. app.py +256 -0
  5. requirements.txt +8 -0
.gitignore ADDED
@@ -0,0 +1,13 @@
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ *.egg-info/
+ dist/
+ build/
+ *.log
+ .env
+ .venv
+ venv/
+ flagged/
DEPLOY.md ADDED
@@ -0,0 +1,66 @@
+ # Deploy to HuggingFace Space
+
+ ## Quick Deploy
+
+ 1. **Upload files to your Space:**
+    ```bash
+    cd space_files
+    git init
+    git remote add origin https://huggingface.co/spaces/mistral-hackaton-2026/Mistral_SmartContract_Security
+    git add .
+    git commit -m "Initial deployment: Smart Contract Security Analyzer"
+    git push -u origin main
+    ```
+
+ 2. **Or manually upload:**
+    - Go to: https://huggingface.co/spaces/mistral-hackaton-2026/Mistral_SmartContract_Security/tree/main
+    - Click "Upload files"
+    - Drag and drop:
+      - `app.py`
+      - `requirements.txt`
+      - `README.md`
+
+ 3. **Configure Space settings:**
+    - Hardware: Select "CPU basic" (free) or "T4 small" (faster, costs credits)
+    - SDK: Gradio (auto-detected)
+    - Python: 3.10+
+
+ ## Files Included
+
+ - `app.py` - Main Gradio application
+ - `requirements.txt` - Python dependencies
+ - `README.md` - Space description with metadata
+ - `.gitignore` - Files to exclude from repo
+
+ ## Testing Locally
+
+ ```bash
+ cd space_files
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ Then open http://localhost:7860 in your browser.
+
+ ## Troubleshooting
+
+ **Model loading issues:**
+ - Space needs ~8GB RAM for 4-bit quantized model
+ - Upgrade to T4 GPU hardware if CPU is too slow
+
+ **Token authentication:**
+ - Your model must be public or you need to add HF token to Space secrets
+ - Go to Space settings → Repository secrets → Add HF_TOKEN
+
+ **Build errors:**
+ - Check logs in Space's "Logs" tab
+ - Ensure all files uploaded correctly
+ - Verify requirements.txt versions are compatible
+
+ ## Next Steps
+
+ Once deployed:
+ 1. ✅ Test with sample contracts
+ 2. ✅ Share URL with hackathon judges
+ 3. ✅ Monitor usage in Space analytics
+ 4. ✅ Pin your Space to profile for visibility
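Review note on the "Token authentication" item in DEPLOY.md: the committed `app.py` never reads `HF_TOKEN`, so a private model would still fail to load. Below is a minimal sketch of how the secret could be picked up, assuming the Space exposes it as the environment variable `HF_TOKEN` and that the installed `transformers` accepts a `token=` keyword in `from_pretrained` (recent releases do). `resolve_hf_token` is a hypothetical helper, not part of this commit.

```python
import os


def resolve_hf_token(env=None):
    """Return the Hugging Face token from the environment, or None if unset.

    Hypothetical helper: HF_TOKEN is the secret name mentioned in DEPLOY.md.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    token = resolve_hf_token()
    # With a private model, the token would then be passed to the loaders, e.g.:
    #   AutoTokenizer.from_pretrained(MODEL_ID, token=token)
    #   AutoModelForCausalLM.from_pretrained(MODEL_ID, token=token, ...)
    print("token configured" if token else "no token: the model must be public")
```

Returning `None` when the variable is missing keeps the public-model path unchanged, since `token=None` leaves the loaders on their default behaviour.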
README.md CHANGED
@@ -1,14 +1,86 @@
- ---
- title: Mistral SmartContract Security
- emoji: 📉
- colorFrom: blue
- colorTo: yellow
- sdk: gradio
- sdk_version: 6.8.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: Fine-tuned Mistral-7B for smart contract security
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ---
+ title: Smart Contract Security Analyzer
+ emoji: 🔐
+ colorFrom: purple
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 4.0.0
+ app_file: app.py
+ pinned: true
+ license: apache-2.0
+ tags:
+ - mistral
+ - security
+ - smart-contract
+ - solidity
+ - vulnerability-detection
+ - fine-tuned
+ - hackathon
+ ---
+
+ # 🔐 Smart Contract Security Analyzer
+
+ Fine-tuned Mistral-7B for detecting security vulnerabilities in Solidity smart contracts with **custom security tokens** and **structured output generation**.
+
+ ## 🏆 Hackathon Submission Highlights
+
+ This model demonstrates **new capabilities not possible without fine-tuning**:
+
+ 1. **38 Custom Security Tokens** - Novel vocabulary for precise vulnerability identification
+ 2. **Structured XML-Style Reports** - Machine-parseable security analysis
+ 3. **99.6% Accuracy** - 28.6% improvement over base Mistral-7B
+ 4. **Zero False Positives** - 100% precision on balanced test set
+
+ ## 📊 Performance Comparison
+
+ | Metric | Base Mistral-7B | Fine-Tuned (Ours) | Improvement |
+ |--------|----------------|-------------------|-------------|
+ | **Accuracy** | 71.0% | **99.6%** | **+28.6%** |
+ | **Precision** | 64.2% | **100.0%** | **+35.8%** |
+ | **Recall** | 100.0% | 99.3% | -0.7% |
+ | **F1 Score** | 0.782 | **0.996** | **+0.214** |
+ | **Custom Tokens** | 0/38 | **25/38 (66%)** | ✨ NEW |
+ | **Structured Output** | ❌ | ✅ | ✨ NEW |
+
+ ## 🎯 Detected Vulnerabilities
+
+ - **Reentrancy Attacks** - External calls before state updates
+ - **Integer Overflow/Underflow** - Arithmetic without checks
+ - **Access Control Issues** - Missing authorization modifiers
+ - **Unchecked External Calls** - Ignored return values
+ - **Denial of Service** - Unbounded loops and gas limit issues
+ - **Timestamp Dependence** - Manipulable randomness
+
+ ## 🚀 How to Use
+
+ 1. **Paste your Solidity contract** in the code editor
+ 2. **Click "Analyze Contract"** to detect vulnerabilities
+ 3. **Review the structured report** with severity, location, and fix recommendations
+ 4. **Toggle "Show custom tokens"** to see the model's internal representation
+
+ Try the sample contracts to see how the model identifies different vulnerability types!
+
+ ## 🎓 Training Details
+
+ - **Dataset**: 30,000 balanced smart contracts (50% vulnerable, 50% safe)
+ - **Method**: QLoRA (4-bit quantization) fine-tuning
+ - **Base Model**: Mistral-7B-Instruct-v0.3
+ - **Trainable Parameters**: 41.9M (1.1% of total)
+ - **Training Time**: ~5.5 hours on Google Colab G4 GPU
+
+ ## ⚠️ Limitations
+
+ - Trained on synthetic contracts - may not generalize to all real-world patterns
+ - Static analysis only - cannot detect runtime or logic vulnerabilities
+ - Limited to 6 common vulnerability types
+ - Best used as a first-pass screening tool, not a replacement for professional audits
+
+ ## 📜 License
+
+ Apache 2.0 - Free for commercial and research use
+
+ ---
+
+ **Built with ❤️ for the Mistral Hackathon 2026**
+
+ *Demonstrating that fine-tuning unlocks new capabilities: custom tokens + structured outputs = production-ready security analysis*
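Review note on the README performance table: the two F1 rows can be cross-checked from the precision and recall rows alone. The sketch below uses the standard harmonic-mean definition of F1; the function name `f1_score` is only illustrative, and the inputs are the percentages quoted in the table, not independently measured values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values copied from the README table.
    base = f1_score(precision=0.642, recall=1.000)
    tuned = f1_score(precision=1.000, recall=0.993)
    print(f"base F1:       {base:.3f}")   # base F1:       0.782
    print(f"fine-tuned F1: {tuned:.3f}")  # fine-tuned F1: 0.996
```

Both results match the table's F1 row, so precision, recall and F1 are at least mutually consistent. Note that the "Improvement" column for accuracy and precision is a difference in percentage points, not a relative change.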
app.py ADDED
@@ -0,0 +1,256 @@
+ import gradio as gr
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ import os
+
+ # Model configuration
+ MODEL_ID = "mistral-hackaton-2026/Mistral_SmartContract_Security"
+
+ print("Loading model... This may take a few minutes on first run.")
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+ tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
+
+ # Load model with 4-bit quantization for efficient inference
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+     bnb_4bit_use_double_quant=False,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_ID,
+     quantization_config=bnb_config,
+     device_map="auto",
+     torch_dtype=torch.float16,
+     trust_remote_code=True
+ )
+
+ print("Model loaded successfully!")
+
+ # Sample contracts for demo
+ SAMPLE_CONTRACTS = {
+     "Reentrancy Attack": '''// SPDX-License-Identifier: MIT
+ pragma solidity ^0.8.0;
+
+ contract VulnerableBank {
+     mapping(address => uint256) public balances;
+
+     function deposit() public payable {
+         balances[msg.sender] += msg.value;
+     }
+
+     function withdraw(uint256 amount) public {
+         require(balances[msg.sender] >= amount);
+         (bool success, ) = msg.sender.call{value: amount}("");
+         require(success);
+         balances[msg.sender] -= amount; // VULNERABLE: State updated after external call
+     }
+ }''',
+     "Integer Overflow": '''// SPDX-License-Identifier: MIT
+ pragma solidity 0.7.0;
+
+ contract VulnerableToken {
+     mapping(address => uint256) public balances;
+
+     function transfer(address to, uint256 amount) public {
+         balances[msg.sender] -= amount; // VULNERABLE: No underflow check
+         balances[to] += amount;
+     }
+ }''',
+     "Access Control Missing": '''// SPDX-License-Identifier: MIT
+ pragma solidity ^0.8.0;
+
+ contract VulnerableAdmin {
+     address public owner;
+
+     constructor() {
+         owner = msg.sender;
+     }
+
+     function setOwner(address newOwner) public { // VULNERABLE: Anyone can change owner
+         owner = newOwner;
+     }
+ }''',
+     "Secure Contract": '''// SPDX-License-Identifier: MIT
+ pragma solidity ^0.8.0;
+
+ contract SecureVault {
+     mapping(address => uint256) public balances;
+     bool private locked;
+
+     modifier noReentrant() {
+         require(!locked, "No reentrancy");
+         locked = true;
+         _;
+         locked = false;
+     }
+
+     function withdraw(uint256 amount) public noReentrant {
+         require(balances[msg.sender] >= amount);
+         balances[msg.sender] -= amount;
+         (bool success, ) = msg.sender.call{value: amount}("");
+         require(success);
+     }
+ }'''
+ }
+
+ def analyze_contract(contract_code, show_tokens=False):
+     """Analyze smart contract for vulnerabilities"""
+
+     if not contract_code.strip():
+         return "⚠️ Please enter a smart contract to analyze."
+
+     # Prepare input
+     input_text = f"""<s>[INST] Analyze this contract and provide structured security report:
+
+ {contract_code} [/INST]
+
+ """
+
+     inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=2048)
+
+     if torch.cuda.is_available():
+         inputs = inputs.to("cuda")
+
+     # Generate analysis
+     model.eval()
+     with torch.no_grad():
+         outputs = model.generate(
+             **inputs,
+             max_new_tokens=300,
+             temperature=0.3,
+             do_sample=True,
+             top_p=0.95
+         )
+
+     response = tokenizer.decode(outputs[0], skip_special_tokens=False)
+     analysis = response.split('[/INST]')[-1].strip()
+
+     # Format output
+     if show_tokens:
+         output = f"### 🔍 Security Analysis (with Custom Tokens)\n\n```\n{analysis}\n```"
+     else:
+         # Clean up for display
+         clean_analysis = analysis.replace('</s>', '').strip()
+
+         # Check for vulnerabilities
+         is_vulnerable = any([
+             '<CRITICAL>' in analysis,
+             '<HIGH>' in analysis,
+             '<VULN_' in analysis
+         ])
+
+         if is_vulnerable:
+             output = f"""### 🚨 VULNERABILITY DETECTED
+
+ {clean_analysis}
+
+ ---
+ **Model**: Mistral-7B Fine-tuned with 38 Custom Security Tokens
+ **Accuracy**: 99.6% | **Precision**: 100% | **Recall**: 99.3%
+ """
+         else:
+             output = f"""### ✅ CONTRACT APPEARS SECURE
+
+ {clean_analysis}
+
+ ---
+ **Model**: Mistral-7B Fine-tuned with 38 Custom Security Tokens
+ **Accuracy**: 99.6% | **Precision**: 100% | **Recall**: 99.3%
+ """
+
+     return output
+
+ # Create Gradio interface
+ with gr.Blocks(theme=gr.themes.Soft(), title="Smart Contract Security Analyzer") as demo:
+     gr.Markdown("""
+     # 🔐 Smart Contract Security Analyzer
+     ### Powered by Fine-Tuned Mistral-7B with Custom Security Tokens
+
+     **Hackathon Submission Features:**
+     - ✨ 38 custom security tokens (NEW capability)
+     - 📊 99.6% accuracy on 30K balanced dataset
+     - 🎯 100% precision (zero false positives)
+     - 📈 +28.6% improvement over base Mistral
+     - 🏗️ Structured XML-style vulnerability reports
+     """)
+
+     with gr.Row():
+         with gr.Column(scale=2):
+             contract_input = gr.Code(
+                 label="📝 Paste Solidity Contract Here",
+                 language="solidity",
+                 lines=20,
+                 placeholder="// SPDX-License-Identifier: MIT\npragma solidity ^0.8.0;\n\ncontract MyContract {\n // Your contract code here\n}"
+             )
+
+             with gr.Row():
+                 analyze_btn = gr.Button("🔍 Analyze Contract", variant="primary", size="lg")
+                 clear_btn = gr.Button("🗑️ Clear", size="lg")
+
+             show_tokens_checkbox = gr.Checkbox(
+                 label="Show custom security tokens in output",
+                 value=False
+             )
+
+         with gr.Column(scale=1):
+             gr.Markdown("### 📚 Sample Contracts")
+             sample_dropdown = gr.Dropdown(
+                 choices=list(SAMPLE_CONTRACTS.keys()),
+                 label="Load Example",
+                 value=None
+             )
+             load_sample_btn = gr.Button("Load Sample", size="sm")
+
+     output_display = gr.Markdown(label="Analysis Result")
+
+     # Performance stats
+     gr.Markdown("""
+     ---
+     ### 📊 Model Performance Metrics
+
+     | Metric | Base Mistral | Fine-Tuned | Improvement |
+     |--------|-------------|------------|-------------|
+     | **Accuracy** | 71.0% | **99.6%** | +28.6% |
+     | **Precision** | 64.2% | **100.0%** | +35.8% |
+     | **Recall** | 100.0% | 99.3% | -0.7% |
+     | **F1 Score** | 0.782 | **0.996** | +0.214 |
+     | **Custom Tokens** | 0/38 | **25/38** | NEW! |
+     | **Structured Output** | No | **Yes** | NEW! |
+
+     **Training Details:**
+     - 30,000 balanced samples (50% vulnerable, 50% safe)
+     - 6 vulnerability types: Reentrancy, Overflow, Access Control, Unchecked Call, DoS, Timestamp
+     - 4-bit quantization with LoRA (1.1% trainable params)
+     - 5.5 hours training on Google Colab G4 GPU
+     """)
+
+     # Event handlers
+     def load_sample(sample_name):
+         if sample_name:
+             return SAMPLE_CONTRACTS[sample_name]
+         return ""
+
+     analyze_btn.click(
+         fn=analyze_contract,
+         inputs=[contract_input, show_tokens_checkbox],
+         outputs=output_display
+     )
+
+     load_sample_btn.click(
+         fn=load_sample,
+         inputs=sample_dropdown,
+         outputs=contract_input
+     )
+
+     clear_btn.click(
+         fn=lambda: ("", ""),
+         outputs=[contract_input, output_display]
+     )
+
+ # Launch demo
+ if __name__ == "__main__":
+     demo.launch()
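Review note on `analyze_contract`: the post-processing step (cut at `[/INST]`, drop `</s>`, look for severity markers) can be exercised without loading the model. The sketch below pulls that logic into two pure functions, which is an assumption about how the code could be refactored, not how the commit is written. The marker strings are taken from `app.py`; the token `<VULN_REENTRANCY>` in the sample string is made up for illustration, since the commit only shows the `<VULN_` prefix.

```python
# Marker strings copied from app.py; any one of them flags the contract.
VULNERABILITY_MARKERS = ("<CRITICAL>", "<HIGH>", "<VULN_")


def extract_analysis(decoded):
    """Keep the text after the last [/INST] and drop the end-of-sequence token."""
    return decoded.split("[/INST]")[-1].replace("</s>", "").strip()


def is_vulnerable(analysis):
    """Return True if the analysis contains any of the severity markers."""
    return any(marker in analysis for marker in VULNERABILITY_MARKERS)


if __name__ == "__main__":
    # Hypothetical decoded output; <VULN_REENTRANCY> is an invented token name.
    decoded = (
        "<s>[INST] Analyze this contract ... [/INST]\n\n"
        "<VULN_REENTRANCY> <CRITICAL> withdraw() calls out before updating state</s>"
    )
    analysis = extract_analysis(decoded)
    print(analysis)
    print("vulnerable:", is_vulnerable(analysis))
```

Keeping these steps separate from generation would let the banner logic be unit-tested on CPU, which matters here because a report with none of the three markers is shown as "CONTRACT APPEARS SECURE".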
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ transformers>=4.36.0
+ torch>=2.1.0
+ accelerate>=0.25.0
+ bitsandbytes>=0.41.0
+ peft>=0.7.0
+ gradio>=4.0.0
+ sentencepiece>=0.1.99
+ protobuf>=3.20.0
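Review note on `requirements.txt`: DEPLOY.md asks the reader to "verify requirements.txt versions are compatible", and every pin here is a simple `name>=minimum` line. The sketch below checks such lines against the installed packages using only the standard library. It is a rough illustration, not a replacement for pip's resolver: it assumes the `>=` form used in this file, compares only the first three numeric fields, and the helper names are hypothetical.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.1.0+cu121' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for field in version.split("."):
        match = re.match(r"\d+", field)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple((parts + [0, 0, 0])[:3])


def parse_requirement(line):
    """Split a 'name>=minimum' line into (name, minimum version tuple)."""
    name, sep, minimum = line.strip().partition(">=")
    if not sep:
        raise ValueError(f"unsupported requirement format: {line!r}")
    return name.strip(), version_tuple(minimum)


def check_requirements(lines):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for line in lines:
        name, minimum = parse_requirement(line)
        try:
            installed = version_tuple(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed >= minimum else "too old"
    return report


if __name__ == "__main__":
    pins = ["transformers>=4.36.0", "torch>=2.1.0", "gradio>=4.0.0"]
    for package, status in check_requirements(pins).items():
        print(f"{package}: {status}")
```

One thing such a check would not catch: the README front matter pins `sdk_version: 4.0.0` while this file allows any `gradio>=4.0.0`, so the two are worth keeping in sync by hand.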