# Copilot Instructions for Security Incident Analyzer
## Project Overview
**Security Incident Analyzer** is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.
**Tech Stack:** Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)
**Key Insight:** The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.
## Architecture & Component Map
```
src/
├── app.py              # Gradio interface entry point
├── analyzer/
│   ├── security.py     # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py       # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py     # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py      # get_analysis_prompt() template generation
└── utils/
    ├── config.py       # Config class, LLMProvider enum, global config instance
    └── logger.py       # setup_logger() for consistent logging
```
### Data Flow
1. User submits log → `app.py:analyze_incident_sync()`
2. Creates provider via `create_provider()` factory based on `config.llm_provider`
3. Passes log to `IncidentAnalyzer.analyze()` → calls `provider.analyze()` with templated prompt
4. LLM returns text response → `IncidentAnalyzer._parse_response()` extracts structured data using regex
5. Returns `SecurityAnalysis` → Gradio formats it for display
## Critical Conventions & Patterns
### 1. Provider Abstraction Pattern
All LLM interactions go through `BaseLLMProvider` subclasses. To add a new provider:
- Subclass `BaseLLMProvider` and implement `async analyze(log_text: str) -> str`
- Update `create_provider()` factory in `provider.py`
- Add to `LLMProvider` enum in `config.py`
**Why:** Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
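A minimal sketch of the pattern, assuming the shapes described above (the `EchoProvider` name is illustrative; the real classes live in `src/llm/provider.py`):

```python
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    @abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM response for the given log text."""


class EchoProvider(BaseLLMProvider):
    """Illustrative provider: returns a canned response (stand-in for an API client)."""

    async def analyze(self, log_text: str) -> str:
        return f"Summary: received {len(log_text)} chars\nRisk Level: LOW"


def create_provider(name: str) -> BaseLLMProvider:
    # Factory keyed on provider name; extend this mapping when adding a provider
    providers = {"echo": EchoProvider}
    try:
        return providers[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}")
```

Because `IncidentAnalyzer` only sees `BaseLLMProvider`, swapping OpenAI for Ollama (or a mock) is a one-line config change, not a code change.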
### 2. Configuration Management
Environment variables drive runtime behavior via `src/utils/config.py`:
- `LLM_PROVIDER`: Controls which provider to use
- `OPENAI_API_KEY`: Required only when `LLM_PROVIDER=openai`
- `LLM_MODEL`: Optional model override; falls back to sensible defaults
- `DEBUG`: Enables verbose logging
**Validate before use:** Call `config.validate()` in `app.py` to catch config errors early (e.g., missing API key).
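A hedged sketch of an env-driven `Config` with fail-fast validation (field names beyond the documented variables are assumptions; the real class is in `src/utils/config.py`):

```python
import os
from dataclasses import dataclass, field


@dataclass
class Config:
    # Each field reads its environment variable at construction time
    llm_provider: str = field(default_factory=lambda: os.getenv("LLM_PROVIDER", "mock"))
    openai_api_key: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    llm_model: str = field(default_factory=lambda: os.getenv("LLM_MODEL", ""))
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "").lower() == "true")

    def validate(self) -> None:
        # Fail fast: the OpenAI provider is unusable without a key
        if self.llm_provider == "openai" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY required when LLM_PROVIDER=openai")
```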
### 3. Structured Output Parsing
LLM responses are free-form text. `IncidentAnalyzer._parse_response()` uses regex to extract:
- **Summary** → First ~200 chars
- **Risk Level** → Match against `RiskLevel` enum, default to MEDIUM
- **Remediation** → Multi-line instruction block
- **Indicators** → Lines prefixed with `-`, `•`, or keywords like "Indicator:"
**Why regex, not JSON?** Regex parsing is permissive and works with whatever format a given LLM emits, which improves reliability across models compared to strict JSON parsing.
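To illustrate, a simplified stand-in for `_parse_response()` (the actual patterns and `RiskLevel` values may differ):

```python
import re

RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}  # assumed RiskLevel values


def parse_response(text: str) -> dict:
    """Permissive extraction from free-form LLM text."""
    risk = re.search(r"risk\s*level\s*:\s*(\w+)", text, re.IGNORECASE)
    level = risk.group(1).upper() if risk else "MEDIUM"
    if level not in RISK_LEVELS:
        level = "MEDIUM"  # default when the model invents a label
    # Indicator lines start with -, •, or an "Indicator:" prefix
    indicators = re.findall(r"^\s*(?:[-•]|Indicator:)\s*(.+)$", text, re.MULTILINE)
    return {
        "summary": text.strip()[:200],
        "risk_level": level,
        "indicators": indicators,
    }
```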
### 4. Async/Await for Concurrency
All LLM calls are `async` (`provider.analyze()`) to avoid blocking the UI when network latency occurs. `app.py` wraps async in sync (`analyze_incident_sync()`) for Gradio compatibility.
**Pattern:** Use `async def` in provider classes; call with `asyncio.run()` in sync contexts.
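The pattern in miniature (the `analyze` body here is a placeholder for a real network call):

```python
import asyncio


async def analyze(log_text: str) -> str:
    # Stand-in for provider.analyze(): awaits I/O without blocking the event loop
    await asyncio.sleep(0)  # placeholder for the network round trip
    return f"analyzed {len(log_text)} chars"


def analyze_incident_sync(log_text: str) -> str:
    # Gradio callbacks here are plain functions, so bridge with asyncio.run()
    return asyncio.run(analyze(log_text))
```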
### 5. Logging with Context
Use `setup_logger(__name__)` at module level. Include severity and context:
```python
logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True) # exc_info for tracebacks
```
## Common Tasks & Commands
### Run Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets
# Start the app (defaults to http://localhost:7860)
python src/app.py
```
### Test Analysis Logic
```bash
# Run all tests
pytest tests/
# Test a specific module
pytest tests/test_analyzer.py -v
# With coverage
pytest --cov=src tests/
```
### Switch LLM Providers
Update `.env`:
```bash
# Use mock (no API required, deterministic)
LLM_PROVIDER=mock
# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
# Use local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b
```
### Deploy to Hugging Face Spaces
1. Create a Hugging Face Space (CPU tier is sufficient)
2. Point Space to this repository
3. HF detects `Procfile` and `requirements.txt` and launches the Gradio app automatically
## Extension Points
**Add a new LLM provider:**
1. Create subclass of `BaseLLMProvider` in `src/llm/provider.py`
2. Implement `async analyze()` method
3. Add enum value to `LLMProvider` in `src/utils/config.py`
4. Update `create_provider()` factory
5. Add integration test
**Customize analysis behavior:**
- Modify prompt template in `src/llm/prompts.py` (e.g., ask for different output format)
- Adjust parsing regex in `IncidentAnalyzer._parse_response()` to match new prompt structure
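For reference, a hypothetical template in the spirit of `get_analysis_prompt()` (the real wording lives in `src/llm/prompts.py`; note how the requested output format mirrors the regexes in `_parse_response()`):

```python
# Hypothetical template; sections match what the parser expects to extract
PROMPT_TEMPLATE = """You are a senior security analyst. Analyze the log below.

Respond with:
Summary: <one paragraph>
Risk Level: <LOW|MEDIUM|HIGH|CRITICAL>
Remediation: <numbered steps>
Indicators: <one per line, prefixed with ->

Log:
{log_text}
"""


def get_analysis_prompt(log_text: str) -> str:
    return PROMPT_TEMPLATE.format(log_text=log_text)
```

If you change a section label here, change the corresponding regex in `_parse_response()` in the same commit.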
**Add new output field:**
1. Extend `SecurityAnalysis` dataclass in `src/analyzer/models.py`
2. Update prompt to include that field
3. Update parsing logic in `_parse_response()`
4. Update Gradio output format in `app.py`
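Step 1 might look like this, assuming the enum/dataclass shapes described in the component map (`affected_assets` is a made-up example field):

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):  # assumed shape of the enum in src/analyzer/models.py
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: RiskLevel
    remediation: str
    indicators: list[str] = field(default_factory=list)
    # New field added with a safe default so existing call sites keep working
    affected_assets: list[str] = field(default_factory=list)
```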
## Testing Strategy
- **Unit tests** for parsing logic (`test_analyzer.py`): Mock LLM responses, verify regex extraction
- **Integration tests** for providers (`test_llm_providers.py`): Mock HTTP responses, test async/await
- **E2E tests** for the full flow: Use `MockLLMProvider` to avoid API costs
Use `MockLLMProvider` for all tests; it's deterministic and free.
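A representative unit test built on a deterministic mock (this `MockLLMProvider` is a simplified stand-in for the real one in `src/llm/provider.py`):

```python
import asyncio


class MockLLMProvider:
    """Deterministic stand-in: same canned response on every call, no API cost."""

    async def analyze(self, log_text: str) -> str:
        return "Summary: test incident\nRisk Level: HIGH\n- brute-force attempt"


def test_mock_provider_returns_high_risk():
    response = asyncio.run(MockLLMProvider().analyze("any log"))
    assert "Risk Level: HIGH" in response
    assert "- brute-force attempt" in response
```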
## Key Files to Review First
1. **`src/app.py`** – Entry point, Gradio UI, orchestration
2. **`src/analyzer/security.py`** – Core logic: how responses are parsed
3. **`src/llm/provider.py`** – How different LLMs are called (the abstraction)
4. **`src/utils/config.py`** – Environment-driven configuration
5. **`README.md`** – High-level project summary and deployment instructions
## Known Patterns & Anti-Patterns
✅ **Do:**
- Use environment variables for all configuration
- Call `config.validate()` to catch errors early
- Use `async`/`await` for I/O-bound operations
- Test with `MockLLMProvider` to keep tests fast and free
- Log with context (include variable state, not just errors)
❌ **Don't:**
- Hardcode API keys or model names in code
- Sync I/O calls that block the Gradio UI
- Assume a specific LLM output formatβ€”use flexible regex parsing
- Catch exceptions silently without logging
- Mix provider-specific logic into `IncidentAnalyzer` (keep separation clean)
## Quick Troubleshooting
| Issue | Solution |
|-------|----------|
| `ValueError: OPENAI_API_KEY required` | Set `OPENAI_API_KEY` in `.env` and call `config.validate()` |
| Gradio not starting | Verify port 7860 is free; check `DEBUG=true` in `.env` for more logs |
| LLM calls timing out | Increase timeout in provider (default 30s for OpenAI, 60s for local) |
| Parsing is missing fields | Check prompt format in `prompts.py` matches regex patterns in `_parse_response()` |
| Mock provider not activated | Verify `.env` has `LLM_PROVIDER=mock` (default is mock if not set) |