Copilot Instructions for Security Incident Analyzer

Project Overview

Security Incident Analyzer is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.

Tech Stack: Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)

Key Insight: The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.

Architecture & Component Map

src/
├── app.py                    # Gradio interface entry point
├── analyzer/
│   ├── security.py          # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py            # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py          # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py           # get_analysis_prompt() template generation
└── utils/
    ├── config.py            # Config class, LLMProvider enum, global config instance
    └── logger.py            # setup_logger() for consistent logging

Data Flow

  1. User submits log → app.py:analyze_incident_sync()
  2. Creates provider via create_provider() factory based on config.llm_provider
  3. Passes log to IncidentAnalyzer.analyze() → calls provider.analyze() with templated prompt
  4. LLM returns text response → IncidentAnalyzer._parse_response() extracts structured data using regex
  5. Returns SecurityAnalysis → Gradio formats for display
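
The flow above can be sketched end to end. The class and method names mirror the component map, but the bodies here are simplified stand-ins (a canned mock response, toy regexes), not the repo's actual implementations:

```python
import asyncio
import re
from dataclasses import dataclass

@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: str

class MockProvider:
    """Stand-in provider returning a canned response (no network)."""
    async def analyze(self, log_text: str) -> str:
        return "Summary: Suspicious login burst detected.\nRisk Level: HIGH"

class IncidentAnalyzer:
    def __init__(self, provider):
        self.provider = provider

    async def analyze(self, log_text: str) -> SecurityAnalysis:
        raw = await self.provider.analyze(log_text)
        return self._parse_response(raw)

    def _parse_response(self, raw: str) -> SecurityAnalysis:
        # Permissive extraction: fall back to defaults when a field is missing.
        summary = re.search(r"Summary:\s*(.+)", raw)
        risk = re.search(r"Risk Level:\s*(\w+)", raw)
        return SecurityAnalysis(
            summary=summary.group(1) if summary else raw[:200],
            risk_level=risk.group(1) if risk else "MEDIUM",
        )

analyzer = IncidentAnalyzer(MockProvider())
result = asyncio.run(analyzer.analyze("Failed password for root from 10.0.0.5"))
print(result.risk_level)  # HIGH
```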

Critical Conventions & Patterns

1. Provider Abstraction Pattern

All LLM interactions go through BaseLLMProvider subclasses. To add a new provider:

  • Subclass BaseLLMProvider and implement async analyze(log_text: str) -> str
  • Update create_provider() factory in provider.py
  • Add to LLMProvider enum in config.py

Why: Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
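
The three steps above, as a minimal sketch. The interface follows the description in this doc, but the real BaseLLMProvider and create_provider() may differ in detail; EchoProvider is a hypothetical new provider used only for illustration:

```python
import abc
import asyncio

class BaseLLMProvider(abc.ABC):
    """The contract every provider implements (sketch of the described interface)."""

    @abc.abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM text response for the given log."""

class EchoProvider(BaseLLMProvider):
    """Hypothetical new provider: subclass and implement async analyze()."""

    async def analyze(self, log_text: str) -> str:
        return f"Summary: received {len(log_text)} chars\nRisk Level: LOW"

# Factory registration (simplified: the real factory keys off
# config.llm_provider and the LLMProvider enum in config.py).
_PROVIDERS = {"echo": EchoProvider}

def create_provider(name: str) -> BaseLLMProvider:
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name}")

print(asyncio.run(create_provider("echo").analyze("abc")))
```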

2. Configuration Management

Environment variables drive runtime behavior via src/utils/config.py:

  • LLM_PROVIDER: Controls which provider to use
  • OPENAI_API_KEY: Required only when LLM_PROVIDER=openai
  • LLM_MODEL: Optional model override; falls back to sensible defaults
  • DEBUG: Enables verbose logging

Validate before use: Call config.validate() in app.py to catch config errors early (e.g., missing API key).
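
A sketch of what that env-driven pattern could look like; the attribute and method names follow the variables listed above, but the real Config class in src/utils/config.py may differ:

```python
import os

class Config:
    """Reads runtime settings from environment variables (sketch)."""

    def __init__(self):
        self.llm_provider = os.getenv("LLM_PROVIDER", "mock")
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        self.llm_model = os.getenv("LLM_MODEL")  # None -> provider default
        self.debug = os.getenv("DEBUG", "").lower() == "true"

    def validate(self) -> None:
        # Fail fast: an OpenAI provider without a key is a config error.
        if self.llm_provider == "openai" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY required when LLM_PROVIDER=openai")

config = Config()
config.validate()  # call early in app.py, before building the UI
```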

3. Structured Output Parsing

LLM responses are free-form text. IncidentAnalyzer._parse_response() uses regex to extract:

  • Summary → First ~200 chars
  • Risk Level → Match against RiskLevel enum, default to MEDIUM
  • Remediation → Multi-line instruction block
  • Indicators → Lines prefixed with -, •, or keywords like "Indicator:"

Why regex, not JSON? Regex parsing is permissive: it tolerates formatting drift and works with free-form output from any model, which improves reliability when switching providers.
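
A simplified version of that permissive extraction; the actual patterns in _parse_response() may differ, and the RiskLevel values here are assumed:

```python
import re

RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}  # assumed enum values

def parse_response(text: str) -> dict:
    """Pull structured fields out of free-form LLM text (sketch)."""
    # Risk level: first word after a "risk level" label, if it matches the enum.
    risk = "MEDIUM"
    m = re.search(r"risk\s*level\s*[:\-]?\s*(\w+)", text, re.IGNORECASE)
    if m and m.group(1).upper() in RISK_LEVELS:
        risk = m.group(1).upper()
    # Indicators: lines prefixed with -, •, or "Indicator:".
    indicators = re.findall(r"^\s*(?:[-•]|Indicator:)\s*(.+)$", text, re.MULTILINE)
    # Summary: first ~200 chars of the response.
    summary = text.strip()[:200]
    return {"summary": summary, "risk_level": risk, "indicators": indicators}

resp = "Risk Level: High\n- failed logins from 10.0.0.5\n• outbound traffic spike"
print(parse_response(resp)["risk_level"])  # HIGH
```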

4. Async/Await for Concurrency

All LLM calls are async (provider.analyze()) to avoid blocking the UI when network latency occurs. app.py wraps async in sync (analyze_incident_sync()) for Gradio compatibility.

Pattern: Use async def in provider classes; call with asyncio.run() in sync contexts.
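
That bridge, as a minimal sketch; the sleep stands in for network latency, where the real provider call would make an HTTP request:

```python
import asyncio

async def analyze(log_text: str) -> str:
    # Stand-in for provider.analyze(): an I/O-bound awaitable call.
    await asyncio.sleep(0)  # placeholder for network latency
    return f"analyzed {len(log_text)} chars"

def analyze_incident_sync(log_text: str) -> str:
    # Gradio event handlers are plain functions, so bridge into asyncio here.
    return asyncio.run(analyze(log_text))

print(analyze_incident_sync("suspicious log line"))  # analyzed 19 chars
```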

5. Logging with Context

Use setup_logger(__name__) at module level. Include severity and context:

logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True)  # exc_info for tracebacks

Common Tasks & Commands

Run Locally

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env

# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets

# Start the app (defaults to http://localhost:7860)
python src/app.py

Test Analysis Logic

# Run all tests
pytest tests/

# Test a specific module
pytest tests/test_analyzer.py -v

# With coverage
pytest --cov=src tests/

Switch LLM Providers

Update .env:

# Use mock (no API required, deterministic)
LLM_PROVIDER=mock

# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Use local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b

Deploy to Hugging Face Spaces

  1. Create a Hugging Face Space (CPU tier is sufficient)
  2. Point Space to this repository
  3. HF detects the Procfile and requirements.txt and automatically launches the Gradio app

Extension Points

Add a new LLM provider:

  1. Create subclass of BaseLLMProvider in src/llm/provider.py
  2. Implement async analyze() method
  3. Add enum value to LLMProvider in src/utils/config.py
  4. Update create_provider() factory
  5. Add integration test

Customize analysis behavior:

  • Modify prompt template in src/llm/prompts.py (e.g., ask for different output format)
  • Adjust parsing regex in IncidentAnalyzer._parse_response() to match new prompt structure
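
For illustration, a hypothetical minimal get_analysis_prompt() whose labeled sections line up with what the parser expects; the real template in src/llm/prompts.py is presumably richer:

```python
def get_analysis_prompt(log_text: str) -> str:
    """Build the analysis prompt (sketch; the real template lives in src/llm/prompts.py)."""
    return (
        "You are a senior security analyst. Analyze the following logs.\n"
        "Respond with these labeled sections so the parser can extract them:\n"
        "Summary: <one-paragraph summary>\n"
        "Risk Level: <LOW|MEDIUM|HIGH|CRITICAL>\n"
        "Remediation: <numbered steps>\n"
        "Indicators: <one '- ' bullet per indicator>\n\n"
        f"Logs:\n{log_text}"
    )

print(get_analysis_prompt("Failed password for root")[:40])
```

Keeping the section labels in one place like this makes it obvious which regex in _parse_response() must change when the prompt changes.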

Add new output field:

  1. Extend SecurityAnalysis dataclass in src/analyzer/models.py
  2. Update prompt to include that field
  3. Update parsing logic in _parse_response()
  4. Update Gradio output format in app.py
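
The steps above, illustrated on a hypothetical confidence field; the real SecurityAnalysis dataclass and parsing code may look different:

```python
import re
from dataclasses import dataclass

@dataclass
class SecurityAnalysis:
    # Existing fields (assumed); step 1 adds `confidence`.
    summary: str = ""
    risk_level: str = "MEDIUM"
    confidence: str = "UNKNOWN"  # new field, with a safe default

def parse_confidence(text: str) -> str:
    # Step 3: extend parsing with a pattern matching the new prompt section.
    m = re.search(r"Confidence:\s*(\w+)", text, re.IGNORECASE)
    return m.group(1).upper() if m else "UNKNOWN"

analysis = SecurityAnalysis(summary="...", risk_level="HIGH",
                            confidence=parse_confidence("Confidence: high"))
print(analysis.confidence)  # HIGH
```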

Testing Strategy

  • Unit tests for parsing logic (test_analyzer.py): Mock LLM responses, verify regex extraction
  • Integration tests for providers (test_llm_providers.py): Mock HTTP responses, test async/await
  • E2E tests for the full flow: Use MockLLMProvider to avoid API costs

Use MockLLMProvider for all tests: it's deterministic and free.
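
A unit test in that style, using a deterministic mock; the helper names here are illustrative, not the repo's actual test code:

```python
import asyncio
import re

class MockLLMProvider:
    """Deterministic provider: no network, no API costs (sketch)."""

    CANNED = "Summary: brute-force attempt detected.\nRisk Level: HIGH"

    async def analyze(self, log_text: str) -> str:
        return self.CANNED

def extract_risk(text: str) -> str:
    # Toy stand-in for the regex extraction under test.
    m = re.search(r"Risk Level:\s*(\w+)", text)
    return m.group(1) if m else "MEDIUM"

def test_parser_extracts_risk_level():
    raw = asyncio.run(MockLLMProvider().analyze("some log"))
    assert extract_risk(raw) == "HIGH"

test_parser_extracts_risk_level()  # runnable directly; pytest discovers it too
```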

Key Files to Review First

  1. src/app.py β€” Entry point, Gradio UI, orchestration
  2. src/analyzer/security.py β€” Core logic: how responses are parsed
  3. src/llm/provider.py β€” How different LLMs are called (the abstraction)
  4. src/utils/config.py β€” Environment-driven configuration
  5. README.md β€” High-level project summary and deployment instructions

Known Patterns & Anti-Patterns

✅ Do:

  • Use environment variables for all configuration
  • Call config.validate() to catch errors early
  • Use async/await for I/O-bound operations
  • Test with MockLLMProvider to keep tests fast and free
  • Log with context (include variable state, not just errors)

❌ Don't:

  • Hardcode API keys or model names in code
  • Make synchronous I/O calls that block the Gradio UI
  • Assume a specific LLM output format; use flexible regex parsing instead
  • Catch exceptions silently without logging
  • Mix provider-specific logic into IncidentAnalyzer (keep separation clean)

Quick Troubleshooting

| Issue | Solution |
| --- | --- |
| ValueError: OPENAI_API_KEY required | Set OPENAI_API_KEY in .env and call config.validate() |
| Gradio not starting | Verify port 7860 is free; set DEBUG=true in .env for more logs |
| LLM calls timing out | Increase the timeout in the provider (default 30s for OpenAI, 60s for local) |
| Parsing misses fields | Check that the prompt format in prompts.py matches the regex patterns in _parse_response() |
| Mock provider not activated | Verify .env has LLM_PROVIDER=mock (mock is the default when unset) |