# Copilot Instructions for Security Incident Analyzer

## Project Overview

**Security Incident Analyzer** is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.

**Tech Stack:** Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)

**Key Insight:** The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.

## Architecture & Component Map

```
src/
├── app.py              # Gradio interface entry point
├── analyzer/
│   ├── security.py     # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py       # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py     # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py      # get_analysis_prompt() template generation
└── utils/
    ├── config.py       # Config class, LLMProvider enum, global config instance
    └── logger.py       # setup_logger() for consistent logging
```

### Data Flow

1. User submits log → `app.py:analyze_incident_sync()`
2. Creates provider via the `create_provider()` factory based on `config.llm_provider`
3. Passes log to `IncidentAnalyzer.analyze()` → calls `provider.analyze()` with templated prompt
4. LLM returns text response → `IncidentAnalyzer._parse_response()` extracts structured data using regex
5. Returns `SecurityAnalysis` → Gradio formats it for display

## Critical Conventions & Patterns

### 1. Provider Abstraction Pattern

All LLM interactions go through `BaseLLMProvider` subclasses. To add a new provider:

- Subclass `BaseLLMProvider` and implement `async analyze(log_text: str) -> str`
- Update the `create_provider()` factory in `provider.py`
- Add a value to the `LLMProvider` enum in `config.py`

**Why:** Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
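The abstraction can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: class internals, the factory's exact signature, and the mock's response text are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    """Common interface every LLM backend implements."""

    @abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM response for the given log text."""


class MockLLMProvider(BaseLLMProvider):
    """Deterministic provider for tests and secret-free local runs."""

    async def analyze(self, log_text: str) -> str:
        return f"Summary: mock analysis of {len(log_text)} chars\nRisk Level: MEDIUM"


def create_provider(name: str) -> BaseLLMProvider:
    """Factory: map a provider name (from config) to an implementation."""
    providers = {"mock": MockLLMProvider}  # real code would also map "openai", "local"
    if name not in providers:
        raise ValueError(f"Unknown LLM provider: {name}")
    return providers[name]()


print(asyncio.run(create_provider("mock").analyze("failed login x50")))
```

Adding an OpenAI or Ollama backend then means adding one subclass and one dictionary entry; `IncidentAnalyzer` never changes.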
### 2. Configuration Management

Environment variables drive runtime behavior via `src/utils/config.py`:

- `LLM_PROVIDER`: Controls which provider to use
- `OPENAI_API_KEY`: Required only when `LLM_PROVIDER=openai`
- `LLM_MODEL`: Optional model override; falls back to sensible defaults
- `DEBUG`: Enables verbose logging

**Validate before use:** Call `config.validate()` in `app.py` to catch configuration errors early (e.g., a missing API key).

### 3. Structured Output Parsing

LLM responses are free-form text. `IncidentAnalyzer._parse_response()` uses regex to extract:

- **Summary** → First ~200 chars
- **Risk Level** → Match against the `RiskLevel` enum, defaulting to MEDIUM
- **Remediation** → Multi-line instruction block
- **Indicators** → Lines prefixed with `-`, `•`, or keywords like "Indicator:"

**Why regex, not JSON?** Regex is permissive and works with any LLM output format, which improves reliability across models.

### 4. Async/Await for Concurrency

All LLM calls are `async` (`provider.analyze()`) so that network latency does not block the UI. `app.py` wraps async in sync (`analyze_incident_sync()`) for Gradio compatibility.

**Pattern:** Use `async def` in provider classes; call with `asyncio.run()` in sync contexts.

### 5. Logging with Context

Use `setup_logger(__name__)` at module level.
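A plausible minimal shape for `setup_logger()` (the real implementation lives in `src/utils/logger.py`; the formatter string and the `DEBUG` handling here are assumptions):

```python
import logging
import os


def setup_logger(name: str) -> logging.Logger:
    """Return a consistently formatted logger; DEBUG=true raises verbosity."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers when modules are re-imported
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    verbose = os.getenv("DEBUG", "").lower() == "true"
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger


logger = setup_logger(__name__)
```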
Include severity and context:

```python
logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True)  # exc_info for tracebacks
```

## Common Tasks & Commands

### Run Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Copy the environment template
cp .env.example .env
# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets

# Start the app (defaults to http://localhost:7860)
python src/app.py
```

### Test Analysis Logic

```bash
# Run all tests
pytest tests/

# Test a specific module
pytest tests/test_analyzer.py -v

# With coverage
pytest --cov=src tests/
```

### Switch LLM Providers

Update `.env`:

```bash
# Use mock (no API required, deterministic)
LLM_PROVIDER=mock

# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Use a local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b
```

### Deploy to Hugging Face Spaces

1. Create a Hugging Face Space (CPU tier is sufficient)
2. Point the Space to this repository
3. HF detects `Procfile` and `requirements.txt` and automatically launches the Gradio app

## Extension Points

**Add a new LLM provider:**

1. Create a subclass of `BaseLLMProvider` in `src/llm/provider.py`
2. Implement the `async analyze()` method
3. Add an enum value to `LLMProvider` in `src/utils/config.py`
4. Update the `create_provider()` factory
5. Add an integration test

**Customize analysis behavior:**

- Modify the prompt template in `src/llm/prompts.py` (e.g., ask for a different output format)
- Adjust the parsing regex in `IncidentAnalyzer._parse_response()` to match the new prompt structure

**Add a new output field:**

1. Extend the `SecurityAnalysis` dataclass in `src/analyzer/models.py`
2. Update the prompt to include that field
3. Update the parsing logic in `_parse_response()`
4. Update the Gradio output format in `app.py`

## Testing Strategy

- **Unit tests** for parsing logic (`test_analyzer.py`): Mock LLM responses, verify regex extraction
- **Integration tests** for providers (`test_llm_providers.py`): Mock HTTP responses, test async/await
- **E2E tests** for the full flow: Use `MockLLMProvider` to avoid API costs

Use `MockLLMProvider` for all tests: it's deterministic and free.

## Key Files to Review First

1. **`src/app.py`** — Entry point, Gradio UI, orchestration
2. **`src/analyzer/security.py`** — Core logic: how responses are parsed
3. **`src/llm/provider.py`** — How different LLMs are called (the abstraction)
4. **`src/utils/config.py`** — Environment-driven configuration
5. **`README.md`** — High-level project summary and deployment instructions

## Known Patterns & Anti-Patterns

✅ **Do:**

- Use environment variables for all configuration
- Call `config.validate()` to catch errors early
- Use `async`/`await` for I/O-bound operations
- Test with `MockLLMProvider` to keep tests fast and free
- Log with context (include variable state, not just errors)

❌ **Don't:**

- Hardcode API keys or model names in code
- Make sync I/O calls that block the Gradio UI
- Assume a specific LLM output format; use flexible regex parsing
- Catch exceptions silently without logging
- Mix provider-specific logic into `IncidentAnalyzer` (keep the separation clean)

## Quick Troubleshooting

| Issue | Solution |
|-------|----------|
| `ValueError: OPENAI_API_KEY required` | Set `OPENAI_API_KEY` in `.env` and call `config.validate()` |
| Gradio not starting | Verify port 7860 is free; set `DEBUG=true` in `.env` for more logs |
| LLM calls timing out | Increase the timeout in the provider (default 30s for OpenAI, 60s for local) |
| Parsing is missing fields | Check that the prompt format in `prompts.py` matches the regex patterns in `_parse_response()` |
| Mock provider not activated | Verify `.env` has `LLM_PROVIDER=mock` (mock is the default if unset) |
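To make the permissive-regex convention concrete, here is a hedged sketch of the kind of extraction `_parse_response()` performs; the field patterns, the dataclass shape, and the sample text are illustrative assumptions, not the project's actual implementation.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: RiskLevel
    indicators: list[str] = field(default_factory=list)


def parse_response(text: str) -> SecurityAnalysis:
    """Permissive parsing: tolerate formatting quirks from any model."""
    # Risk level: match any enum name case-insensitively; default to MEDIUM
    risk = RiskLevel.MEDIUM
    m = re.search(r"risk\s*level\s*:\s*(\w+)", text, re.IGNORECASE)
    if m:
        try:
            risk = RiskLevel(m.group(1).upper())
        except ValueError:
            pass  # keep the MEDIUM default for unrecognized values
    # Indicators: lines prefixed with '-' or '•'
    indicators = re.findall(r"^\s*[-•]\s*(.+)$", text, re.MULTILINE)
    return SecurityAnalysis(summary=text[:200], risk_level=risk, indicators=indicators)


sample = "Brute-force attempt detected.\nRisk Level: high\n- 50 failed logins\n- single source IP"
print(parse_response(sample).risk_level)  # → RiskLevel.HIGH
```

Note how every field degrades gracefully: an unrecognized risk string falls back to MEDIUM, and missing indicator lines simply yield an empty list.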