"""
Security Agent
===============

The Security Agent acts as a senior application security (AppSec) engineer.
It reviews every changed line through the lens of exploitability, data exposure,
and authentication integrity.

Architecture:
1. Run static analysis tools (Bandit + detect-secrets) on changed files
2. Combine static results with the PR diff and full file contents
3. Send everything to Groq's Llama-3.1-70B with a security-focused system prompt
4. The LLM produces structured JSON findings with CWE IDs and suggested fixes

Why both static tools AND an LLM?

Static tools (Bandit):
✅ Fast, deterministic, and dependable on the patterns it has rules for
✅ Free: no API cost
❌ Can't understand context (doesn't know if input is already sanitized)
❌ Only catches patterns it has rules for

LLM (Llama-3.1-70B):
✅ Understands context, intent, data flow between functions
✅ Can catch novel vulnerability patterns
✅ Provides natural language explanations and fixes
❌ Can hallucinate findings (false positives)
❌ Costs API calls (though Groq's free tier is generous)

Together: static tools provide HIGH-CONFIDENCE anchors, the LLM provides DEPTH.
The Synthesizer (Week 7) will merge and deduplicate their outputs.
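
Example (a minimal sketch of steps 1, 2, and 4, not the project's actual
implementation). The two stub coroutines only stand in for run_bandit and
run_detect_secrets, and the Finding fields (file, line, cwe_id, severity,
description, suggested_fix) are assumed names chosen for illustration; the
real output schema is defined elsewhere.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical schema; the field names are assumptions, not the project's own.
    file: str
    line: int
    cwe_id: str
    severity: str
    description: str
    suggested_fix: str


async def stub_bandit(files: dict[str, str]) -> str:
    # Stand-in for run_bandit: flags any file whose text contains an eval( call.
    hits = [name for name, text in files.items() if "eval(" in text]
    return "Bandit: eval() used in " + ", ".join(hits) if hits else ""


async def stub_detect_secrets(files: dict[str, str]) -> str:
    # Stand-in for run_detect_secrets: flags an AWS-style access key prefix.
    hits = [name for name, text in files.items() if "AKIA" in text]
    return "detect-secrets: possible key in " + ", ".join(hits) if hits else ""


async def static_context(files: dict[str, str]) -> str:
    # Steps 1-2: run the tools one after the other and keep non-empty reports.
    reports = [await stub_bandit(files), await stub_detect_secrets(files)]
    return "\n\n".join(report for report in reports if report)


def parse_findings(raw_json: str) -> list[Finding]:
    # Step 4: turn a JSON array (as an LLM might return it) into typed findings.
    return [Finding(**item) for item in json.loads(raw_json)]


files = {"app.py": "result = eval(user_input)", "util.py": "x = 1"}
context = asyncio.run(static_context(files))

# A hand-written reply used in place of a real LLM call.
llm_reply = json.dumps([{
    "file": "app.py", "line": 1, "cwe_id": "CWE-95", "severity": "high",
    "description": "eval() on user-controlled input",
    "suggested_fix": "Use ast.literal_eval or explicit parsing",
}])
findings = parse_findings(llm_reply)
```

Here context holds only the Bandit stub's report, because the secrets stub
found nothing and empty reports are dropped before joining.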
"""
from __future__ import annotations

from pathlib import Path

import structlog

from app.agents.base_agent import BaseAgent
from app.github.client import PRData
from app.tools.bandit_tool import run_bandit
from app.tools.detect_secrets_tool import run_detect_secrets

logger = structlog.get_logger()


class SecurityAgent(BaseAgent):
    """
    Security-focused code review agent.

    Inherits from BaseAgent, which provides:
    - Groq LLM client (ChatGroq with Llama-3.1-70B)
    - Structured output parsing (with_structured_output)
    - Error handling and timing
    - The review() method that orchestrates the flow

    This class only needs to provide:
    - agent_name: "security"
    - system_prompt: loaded from prompts/security_system.md
    - run_static_analysis(): runs Bandit + detect-secrets
    """

    @property
    def agent_name(self) -> str:
        return "security"

    @property
    def system_prompt(self) -> str:
        """
        Load the system prompt from the Markdown file.

        We store prompts as separate files (not inline strings) because:
        1. They're long (50+ lines), so inline strings clutter the code
        2. They change frequently during prompt tuning (Week 9)
        3. Non-engineers (product managers) can review/edit them
        4. Git diff shows prompt changes clearly
        """
        prompt_path = Path(__file__).resolve().parent.parent.parent / "prompts" / "security_system.md"
        return prompt_path.read_text(encoding="utf-8")

    async def run_static_analysis(self, pr_data: PRData) -> str:
        """
        Run security-specific static analysis tools.

        We run Bandit and detect-secrets in sequence (not parallel) because:
        1. Each takes <5 seconds, so parallelism gains are minimal
        2. They both write to temp dirs, so it is simpler to keep them sequential
        3. If one fails, the other still runs (independent try/except in each tool)

        The results are concatenated into a single string that gets injected
        into the LLM prompt. The LLM uses these as high-confidence signals
        to anchor its own analysis.
        """
        results: list[str] = []

        # Run Bandit (Python security linter)
        bandit_output = await run_bandit(pr_data.file_contents)
        if bandit_output:
            results.append(bandit_output)

        # Run detect-secrets (credential scanner)
        secrets_output = await run_detect_secrets(pr_data.file_contents)
        if secrets_output:
            results.append(secrets_output)

        # Joining an empty list yields "", so no separate empty-case branch is needed.
        return "\n\n".join(results)