Copilot Instructions for Security Incident Analyzer

Project Overview

Security Incident Analyzer is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.

Tech Stack: Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)

Key Insight: The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.

Architecture & Component Map

src/
├── app.py                    # Gradio interface entry point
├── analyzer/
│   ├── security.py          # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py            # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py          # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py           # get_analysis_prompt() template generation
└── utils/
    ├── config.py            # Config class, LLMProvider enum, global config instance
    └── logger.py            # setup_logger() for consistent logging

Data Flow

  1. User submits log → app.py:analyze_incident_sync()
  2. Creates provider via create_provider() factory based on config.llm_provider
  3. Passes log to IncidentAnalyzer.analyze() → calls provider.analyze() with templated prompt
  4. LLM returns text response → IncidentAnalyzer._parse_response() extracts structured data using regex
  5. Returns SecurityAnalysis → Gradio formats for display
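
The flow above can be sketched end to end. The class and method names mirror the component map, but the bodies here are simplified stand-ins (a canned mock response, toy regexes), not the repo's actual implementations:

```python
import asyncio
import re
from dataclasses import dataclass

@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: str

class MockProvider:
    """Stand-in provider returning a canned response (no network)."""
    async def analyze(self, log_text: str) -> str:
        return "Summary: Suspicious login burst detected.\nRisk Level: HIGH"

class IncidentAnalyzer:
    def __init__(self, provider):
        self.provider = provider

    async def analyze(self, log_text: str) -> SecurityAnalysis:
        raw = await self.provider.analyze(log_text)
        return self._parse_response(raw)

    def _parse_response(self, raw: str) -> SecurityAnalysis:
        # Permissive extraction: fall back to defaults when a field is missing.
        summary = re.search(r"Summary:\s*(.+)", raw)
        risk = re.search(r"Risk Level:\s*(\w+)", raw)
        return SecurityAnalysis(
            summary=summary.group(1) if summary else raw[:200],
            risk_level=risk.group(1) if risk else "MEDIUM",
        )

analyzer = IncidentAnalyzer(MockProvider())
result = asyncio.run(analyzer.analyze("Failed password for root from 10.0.0.5"))
print(result.risk_level)  # HIGH
```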

Critical Conventions & Patterns

1. Provider Abstraction Pattern

All LLM interactions go through BaseLLMProvider subclasses. To add a new provider:

  • Subclass BaseLLMProvider and implement async analyze(log_text: str) -> str
  • Update create_provider() factory in provider.py
  • Add to LLMProvider enum in config.py

Why: Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
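
The three steps above, as a minimal sketch. The interface follows the description in this doc, but the real BaseLLMProvider and create_provider() may differ in detail; EchoProvider is a hypothetical new provider used only for illustration:

```python
import abc
import asyncio

class BaseLLMProvider(abc.ABC):
    """The contract every provider implements (sketch of the described interface)."""

    @abc.abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM text response for the given log."""

class EchoProvider(BaseLLMProvider):
    """Hypothetical new provider: subclass and implement async analyze()."""

    async def analyze(self, log_text: str) -> str:
        return f"Summary: received {len(log_text)} chars\nRisk Level: LOW"

# Factory registration (simplified: the real factory keys off
# config.llm_provider and the LLMProvider enum in config.py).
_PROVIDERS = {"echo": EchoProvider}

def create_provider(name: str) -> BaseLLMProvider:
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name}")

print(asyncio.run(create_provider("echo").analyze("abc")))
```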

2. Configuration Management

Environment variables drive runtime behavior via src/utils/config.py:

  • LLM_PROVIDER: Controls which provider to use
  • OPENAI_API_KEY: Required only when LLM_PROVIDER=openai
  • LLM_MODEL: Optional model override; falls back to sensible defaults
  • DEBUG: Enables verbose logging

Validate before use: Call config.validate() in app.py to catch config errors early (e.g., missing API key).
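
A sketch of what that env-driven pattern could look like; the attribute and method names follow the variables listed above, but the real Config class in src/utils/config.py may differ:

```python
import os

class Config:
    """Reads runtime settings from environment variables (sketch)."""

    def __init__(self):
        self.llm_provider = os.getenv("LLM_PROVIDER", "mock")
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        self.llm_model = os.getenv("LLM_MODEL")  # None -> provider default
        self.debug = os.getenv("DEBUG", "").lower() == "true"

    def validate(self) -> None:
        # Fail fast: an OpenAI provider without a key is a config error.
        if self.llm_provider == "openai" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY required when LLM_PROVIDER=openai")

config = Config()
config.validate()  # call early in app.py, before building the UI
```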

3. Structured Output Parsing

LLM responses are free-form text. IncidentAnalyzer._parse_response() uses regex to extract:

  • Summary → First ~200 chars
  • Risk Level → Match against RiskLevel enum, default to MEDIUM
  • Remediation → Multi-line instruction block
  • Indicators → Lines prefixed with -, •, or keywords like "Indicator:"

Why regex, not JSON? Regex parsing is permissive: it tolerates formatting drift and works with free-form output from any model, which improves reliability when switching providers.
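
A simplified version of that permissive extraction; the actual patterns in _parse_response() may differ, and the RiskLevel values here are assumed:

```python
import re

RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}  # assumed enum values

def parse_response(text: str) -> dict:
    """Pull structured fields out of free-form LLM text (sketch)."""
    # Risk level: first word after a "risk level" label, if it matches the enum.
    risk = "MEDIUM"
    m = re.search(r"risk\s*level\s*[:\-]?\s*(\w+)", text, re.IGNORECASE)
    if m and m.group(1).upper() in RISK_LEVELS:
        risk = m.group(1).upper()
    # Indicators: lines prefixed with -, •, or "Indicator:".
    indicators = re.findall(r"^\s*(?:[-•]|Indicator:)\s*(.+)$", text, re.MULTILINE)
    # Summary: first ~200 chars of the response.
    summary = text.strip()[:200]
    return {"summary": summary, "risk_level": risk, "indicators": indicators}

resp = "Risk Level: High\n- failed logins from 10.0.0.5\n• outbound traffic spike"
print(parse_response(resp)["risk_level"])  # HIGH
```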

4. Async/Await for Concurrency

All LLM calls are async (provider.analyze()) to avoid blocking the UI when network latency occurs. app.py wraps async in sync (analyze_incident_sync()) for Gradio compatibility.

Pattern: Use async def in provider classes; call with asyncio.run() in sync contexts.
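
That bridge, as a minimal sketch; the sleep stands in for network latency, where the real provider call would make an HTTP request:

```python
import asyncio

async def analyze(log_text: str) -> str:
    # Stand-in for provider.analyze(): an I/O-bound awaitable call.
    await asyncio.sleep(0)  # placeholder for network latency
    return f"analyzed {len(log_text)} chars"

def analyze_incident_sync(log_text: str) -> str:
    # Gradio event handlers are plain functions, so bridge into asyncio here.
    return asyncio.run(analyze(log_text))

print(analyze_incident_sync("suspicious log line"))  # analyzed 19 chars
```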

5. Logging with Context

Use setup_logger(__name__) at module level. Include severity and context:

logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True)  # exc_info for tracebacks

Common Tasks & Commands

Run Locally

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env

# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets

# Start the app (defaults to http://localhost:7860)
python src/app.py

Test Analysis Logic

# Run all tests
pytest tests/

# Test a specific module
pytest tests/test_analyzer.py -v

# With coverage
pytest --cov=src tests/

Switch LLM Providers

Update .env:

# Use mock (no API required, deterministic)
LLM_PROVIDER=mock

# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Use local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b

Deploy to Hugging Face Spaces

  1. Create a Hugging Face Space (CPU tier is sufficient)
  2. Point Space to this repository
  3. HF detects the Procfile and requirements.txt and automatically launches the Gradio app

Extension Points

Add a new LLM provider:

  1. Create subclass of BaseLLMProvider in src/llm/provider.py
  2. Implement async analyze() method
  3. Add enum value to LLMProvider in src/utils/config.py
  4. Update create_provider() factory
  5. Add integration test

Customize analysis behavior:

  • Modify prompt template in src/llm/prompts.py (e.g., ask for different output format)
  • Adjust parsing regex in IncidentAnalyzer._parse_response() to match new prompt structure
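
For illustration, a hypothetical minimal get_analysis_prompt() whose labeled sections line up with what the parser expects; the real template in src/llm/prompts.py is presumably richer:

```python
def get_analysis_prompt(log_text: str) -> str:
    """Build the analysis prompt (sketch; the real template lives in src/llm/prompts.py)."""
    return (
        "You are a senior security analyst. Analyze the following logs.\n"
        "Respond with these labeled sections so the parser can extract them:\n"
        "Summary: <one-paragraph summary>\n"
        "Risk Level: <LOW|MEDIUM|HIGH|CRITICAL>\n"
        "Remediation: <numbered steps>\n"
        "Indicators: <one '- ' bullet per indicator>\n\n"
        f"Logs:\n{log_text}"
    )

print(get_analysis_prompt("Failed password for root")[:40])
```

Keeping the section labels in one place like this makes it obvious which regex in _parse_response() must change when the prompt changes.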

Add new output field:

  1. Extend SecurityAnalysis dataclass in src/analyzer/models.py
  2. Update prompt to include that field
  3. Update parsing logic in _parse_response()
  4. Update Gradio output format in app.py
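
The steps above, illustrated on a hypothetical confidence field; the real SecurityAnalysis dataclass and parsing code may look different:

```python
import re
from dataclasses import dataclass

@dataclass
class SecurityAnalysis:
    # Existing fields (assumed); step 1 adds `confidence`.
    summary: str = ""
    risk_level: str = "MEDIUM"
    confidence: str = "UNKNOWN"  # new field, with a safe default

def parse_confidence(text: str) -> str:
    # Step 3: extend parsing with a pattern matching the new prompt section.
    m = re.search(r"Confidence:\s*(\w+)", text, re.IGNORECASE)
    return m.group(1).upper() if m else "UNKNOWN"

analysis = SecurityAnalysis(summary="...", risk_level="HIGH",
                            confidence=parse_confidence("Confidence: high"))
print(analysis.confidence)  # HIGH
```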

Testing Strategy

  • Unit tests for parsing logic (test_analyzer.py): Mock LLM responses, verify regex extraction
  • Integration tests for providers (test_llm_providers.py): Mock HTTP responses, test async/await
  • E2E tests for the full flow: Use MockLLMProvider to avoid API costs

Use MockLLMProvider for all tests: it's deterministic and free.
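
A unit test in that style, using a deterministic mock; the helper names here are illustrative, not the repo's actual test code:

```python
import asyncio
import re

class MockLLMProvider:
    """Deterministic provider: no network, no API costs (sketch)."""

    CANNED = "Summary: brute-force attempt detected.\nRisk Level: HIGH"

    async def analyze(self, log_text: str) -> str:
        return self.CANNED

def extract_risk(text: str) -> str:
    # Toy stand-in for the regex extraction under test.
    m = re.search(r"Risk Level:\s*(\w+)", text)
    return m.group(1) if m else "MEDIUM"

def test_parser_extracts_risk_level():
    raw = asyncio.run(MockLLMProvider().analyze("some log"))
    assert extract_risk(raw) == "HIGH"

test_parser_extracts_risk_level()  # runnable directly; pytest discovers it too
```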

Key Files to Review First

  1. src/app.py β€” Entry point, Gradio UI, orchestration
  2. src/analyzer/security.py β€” Core logic: how responses are parsed
  3. src/llm/provider.py β€” How different LLMs are called (the abstraction)
  4. src/utils/config.py β€” Environment-driven configuration
  5. README.md β€” High-level project summary and deployment instructions

Known Patterns & Anti-Patterns

✅ Do:

  • Use environment variables for all configuration
  • Call config.validate() to catch errors early
  • Use async/await for I/O-bound operations
  • Test with MockLLMProvider to keep tests fast and free
  • Log with context (include variable state, not just errors)

❌ Don't:

  • Hardcode API keys or model names in code
  • Make synchronous I/O calls that block the Gradio UI
  • Assume a specific LLM output format; use flexible regex parsing instead
  • Catch exceptions silently without logging
  • Mix provider-specific logic into IncidentAnalyzer (keep separation clean)

Quick Troubleshooting

| Issue | Solution |
| --- | --- |
| ValueError: OPENAI_API_KEY required | Set OPENAI_API_KEY in .env and call config.validate() |
| Gradio not starting | Verify port 7860 is free; set DEBUG=true in .env for more logs |
| LLM calls timing out | Increase the timeout in the provider (default 30s for OpenAI, 60s for local) |
| Parsing misses fields | Check that the prompt format in prompts.py matches the regex patterns in _parse_response() |
| Mock provider not activated | Verify .env has LLM_PROVIDER=mock (mock is the default when unset) |