---
language:
- en
license: llama2
base_model: codellama/CodeLlama-7b-instruct-hf
tags:
- code
- security
- peft
- lora
- qlora
- vulnerability-detection
- api-security
- causal-lm
datasets:
- custom
pipeline_tag: text-generation
---

# API Security QLoRA — Code Llama 7B

A QLoRA adapter for **CodeLlama-7b-instruct-hf**, fine-tuned to detect security vulnerabilities in API endpoint source code. Given a raw code snippet, the model produces a structured analysis that identifies the vulnerability type, severity, and CWE, and proposes a remediated version of the code.

---

## Model Details

| Property | Value |
|---|---|
| **Base Model** | `codellama/CodeLlama-7b-instruct-hf` |
| **Fine-tuning Method** | QLoRA (4-bit NF4 quantization) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 32 |
| **LoRA Dropout** | 0.05 |
| **Target Modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
| **Task** | Causal LM / Code Security Analysis |
| **Training Steps** | 531 |
| **Training Hardware** | Google Colab T4 (16 GB VRAM) |
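
To give a sense of how small the adapter is relative to the base model, the following sketch estimates the number of trainable LoRA parameters implied by the table. The rank, alpha, and target modules come from the table; the hidden size (4096) and layer count (32) are assumptions taken from the published Llama 2 7B architecture, not from this card, and the estimate assumes all four attention projections are square.

```python
# Values taken from the Model Details table.
LORA_RANK = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

# Assumptions: architecture values of the 7B Llama 2 family (not stated in this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two low-rank matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


# Each targeted projection is assumed to map HIDDEN_SIZE -> HIDDEN_SIZE.
per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
trainable_params = per_module * len(TARGET_MODULES) * NUM_LAYERS

# The adapter update is scaled by alpha / r before being added to the frozen weights.
scaling = LORA_ALPHA / LORA_RANK

print(f"parameters per adapted projection: {per_module:,}")
print(f"total trainable LoRA parameters:   {trainable_params:,}")
print(f"LoRA scaling factor (alpha / r):   {scaling}")
```

Under these assumptions the adapter trains roughly 16.8 million parameters, a small fraction of the roughly 7 billion frozen base weights.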

---

## Training Data

Fine-tuned on a custom dataset of **10,000 API-specific vulnerability samples** (synthetic + augmented) covering 19 vulnerability types mapped to the OWASP API Security Top 10.

### Language Distribution

| Language | Share | Frameworks |
|---|---|---|
| Python | 46% | Flask, FastAPI, Django |
| JavaScript | 25% | Express.js, NestJS |
| Java | 15% | Spring Boot |
| PHP / Go / Ruby / C# | 14% | Laravel, Gin, Rails, ASP.NET |

### Vulnerability Distribution

| Vulnerability | Samples | CWE |
|---|---|---|
| SQL Injection | 2,425 | CWE-89 |
| Mass Assignment | 1,307 | CWE-915 |
| Path Traversal | 943 | CWE-22 |
| IDOR | 860 | CWE-639 |
| Broken Authorization | 792 | CWE-285 |
| Command Injection | 600 | CWE-78 |
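
The table lists six of the 19 vulnerability types. As a rough check on how much of the dataset they cover, the sketch below computes each listed type's share, assuming the counts are drawn from the full 10,000-sample dataset described above. The variable names are illustrative only.

```python
TOTAL_SAMPLES = 10_000  # dataset size stated in the Training Data section

# Sample counts copied from the Vulnerability Distribution table.
listed_counts = {
    "SQL Injection": 2425,
    "Mass Assignment": 1307,
    "Path Traversal": 943,
    "IDOR": 860,
    "Broken Authorization": 792,
    "Command Injection": 600,
}

# Fraction of the whole dataset taken up by each listed type.
shares = {name: count / TOTAL_SAMPLES for name, count in listed_counts.items()}

listed_total = sum(listed_counts.values())
unlisted_total = TOTAL_SAMPLES - listed_total  # samples in the types not shown

for name, count in listed_counts.items():
    print(f"{name:<22}{count:>6}{shares[name]:>8.1%}")
print(f"{'Listed types':<22}{listed_total:>6}{listed_total / TOTAL_SAMPLES:>8.1%}")
print(f"{'Other types':<22}{unlisted_total:>6}{unlisted_total / TOTAL_SAMPLES:>8.1%}")
```

Under that assumption the six listed types account for 6,927 samples, which leaves 3,073 samples spread across the types not shown, with SQL Injection alone making up about a quarter of the dataset.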

### Severity Breakdown

- **Critical (43%)**: RCE, SQLi, unauthorized admin access
- **High (41%)**: Data leaks, IDOR, authorization bypass
- **Medium / Clean (16%)**: XSS, input validation warnings, baseline clean samples

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "codellama/CodeLlama-7b-instruct-hf"
adapter_id = "harsharajkumar273/api-security-qlora"

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter_id, use_fast=False)

# Load the base model in half precision, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Example endpoint that builds its SQL query with an f-string.
code_snippet = """
@app.route('/user/<int:user_id>')
def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"
    result = db.execute(query)
    return jsonify(result)
"""

prompt = f"[INST] Analyze this API endpoint for security vulnerabilities:\n\n{code_snippet} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # temperature only takes effect when sampling is enabled.
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.1,
    )

# The decoded text contains the prompt followed by the model's analysis.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
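
Because `generate` returns the prompt tokens followed by the completion, the decoded string in the example above begins with the instruction text. The sketch below shows one way to build the prompt and keep only the analysis. The helper names `build_prompt` and `extract_analysis` are hypothetical and not part of this repository, and the stand-in decoded string is a placeholder rather than real model output.

```python
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"


def build_prompt(code_snippet: str) -> str:
    """Wrap a code snippet in the instruction format used in the usage example."""
    return (
        f"{INST_OPEN} Analyze this API endpoint for security vulnerabilities:"
        f"\n\n{code_snippet} {INST_CLOSE}"
    )


def extract_analysis(decoded: str) -> str:
    """Return the text after the first [/INST] marker, or the whole string if absent."""
    _, marker, tail = decoded.partition(INST_CLOSE)
    return tail.strip() if marker else decoded.strip()


if __name__ == "__main__":
    snippet = "def ping():\n    return 'pong'"
    prompt = build_prompt(snippet)
    # Placeholder for tokenizer.decode(outputs[0], skip_special_tokens=True).
    decoded = prompt + " (model analysis would appear here)"
    print(extract_analysis(decoded))
```

This splits on the first `[/INST]` marker, so a snippet that itself contains that literal string would need different handling, for example slicing the generated token IDs by the prompt length before decoding.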

---

## Integration with API Security Scanner

This adapter is the default model in the [API Security Scanner](https://github.com/harsharajkumar/api-security) project. It is loaded automatically, with no manual path configuration needed:

```bash
git clone https://github.com/harsharajkumar/api-security
cd api-security
pip install -r requirements.txt
streamlit run app.py
```

The scanner downloads this adapter from the Hub on first run and caches it locally.

---

## Intended Use

- Automated API security auditing in CI/CD pipelines
- Developer tooling for identifying vulnerable endpoint patterns
- Security research and OWASP API Security Top 10 education

## Out of Scope

- General-purpose code generation
- Non-API code (UI components, data processing scripts, etc.)
- Production security decisions without human review

---

## Credits

Developed as part of **CS6380 — API Security Project**

**Authors:** Siddhanth Nilesh Jagtap · Tanuj Kenchannavar · Harsha Raj Kumar