# Copilot Instructions for Security Incident Analyzer
## Project Overview
**Security Incident Analyzer** is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.
**Tech Stack:** Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)
**Key Insight:** The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.
## Architecture & Component Map
```
src/
├── app.py                 # Gradio interface entry point
├── analyzer/
│   ├── security.py        # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py          # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py        # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py         # get_analysis_prompt() template generation
└── utils/
    ├── config.py          # Config class, LLMProvider enum, global config instance
    └── logger.py          # setup_logger() for consistent logging
```
### Data Flow
1. User submits log → `app.py:analyze_incident_sync()`
2. Creates provider via `create_provider()` factory based on `config.llm_provider`
3. Passes log to `IncidentAnalyzer.analyze()` → calls `provider.analyze()` with templated prompt
4. LLM returns text response → `IncidentAnalyzer._parse_response()` extracts structured data using regex
5. Returns `SecurityAnalysis` → Gradio formats for display
## Critical Conventions & Patterns
### 1. Provider Abstraction Pattern
All LLM interactions go through `BaseLLMProvider` subclasses. To add a new provider:
- Subclass `BaseLLMProvider` and implement `async analyze(log_text: str) -> str`
- Update the `create_provider()` factory in `provider.py`
- Add a value to the `LLMProvider` enum in `config.py`
**Why:** Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
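The pattern can be sketched as follows. `BaseLLMProvider` and `create_provider()` are the names from `src/llm/provider.py`; the `EchoProvider` class and the exact factory shape here are hypothetical, shown only to illustrate subclassing and registration:

```python
import asyncio
from abc import ABC, abstractmethod

# Minimal sketch of the abstraction; the real BaseLLMProvider in
# src/llm/provider.py may differ in signature details.
class BaseLLMProvider(ABC):
    @abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM response for the given log text."""

class EchoProvider(BaseLLMProvider):
    """Hypothetical provider, used here only to illustrate subclassing."""
    async def analyze(self, log_text: str) -> str:
        return f"Risk Level: LOW\nSummary: received {len(log_text)} chars"

def create_provider(name: str) -> BaseLLMProvider:
    # Factory sketch: map config values to provider classes.
    providers = {"echo": EchoProvider}
    try:
        return providers[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}") from None

response = asyncio.run(create_provider("echo").analyze("failed login x50"))
```

Because callers only see `BaseLLMProvider`, swapping providers is a one-line config change rather than a code change.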
### 2. Configuration Management
Environment variables drive runtime behavior via `src/utils/config.py`:
- `LLM_PROVIDER`: Controls which provider to use
- `OPENAI_API_KEY`: Required only when `LLM_PROVIDER=openai`
- `LLM_MODEL`: Optional model override; falls back to sensible defaults
- `DEBUG`: Enables verbose logging
**Validate before use:** Call `config.validate()` in `app.py` to catch config errors early (e.g., missing API key).
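A minimal sketch of the validate-early pattern; the real `Config` in `src/utils/config.py` reads these values from environment variables and may expose different attribute names:

```python
from dataclasses import dataclass

# Sketch only: the real Config populates these fields from os.environ.
@dataclass
class Config:
    llm_provider: str = "mock"
    openai_api_key: str = ""

    def validate(self) -> None:
        # Fail fast at startup instead of mid-request.
        if self.llm_provider == "openai" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY required when LLM_PROVIDER=openai")

Config().validate()  # the mock default needs no secrets
try:
    Config(llm_provider="openai").validate()
except ValueError as exc:
    error_message = str(exc)
```

Calling `validate()` once in `app.py` turns a confusing mid-request failure into a clear startup error.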
### 3. Structured Output Parsing
LLM responses are free-form text. `IncidentAnalyzer._parse_response()` uses regex to extract:
- **Summary** → First ~200 chars
- **Risk Level** → Match against `RiskLevel` enum, defaulting to MEDIUM
- **Remediation** → Multi-line instruction block
- **Indicators** → Lines prefixed with `-`, `•`, or keywords like "Indicator:"
**Why regex, not JSON?** Permissive regex tolerates format drift across models, whereas strict JSON parsing breaks whenever a model deviates from the schema. This improves reliability across providers.
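The extraction style can be sketched like this. The patterns below are illustrative, not the actual regexes in `_parse_response()`:

```python
import re

# Illustrative patterns; the real ones live in IncidentAnalyzer._parse_response().
RISK_RE = re.compile(r"risk\s*level\s*[:\-]?\s*(critical|high|medium|low)", re.IGNORECASE)
INDICATOR_RE = re.compile(r"^\s*(?:[-•]|Indicator:)\s*(.+)$", re.MULTILINE)

def parse_response(text: str) -> dict:
    match = RISK_RE.search(text)
    return {
        "summary": text.strip()[:200],                                # first ~200 chars
        "risk_level": match.group(1).upper() if match else "MEDIUM",  # default MEDIUM
        "indicators": INDICATOR_RE.findall(text),
    }

result = parse_response(
    "Brute-force attempt detected.\n"
    "Risk Level: High\n"
    "- 50 failed logins from 10.0.0.5\n"
    "• privilege escalation attempt\n"
)
```

Note the case-insensitive match and the MEDIUM fallback: the parser degrades gracefully instead of raising when a model phrases things differently.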
### 4. Async/Await for Concurrency
All LLM calls are `async` (`provider.analyze()`) to avoid blocking the UI when network latency occurs. `app.py` wraps async in sync (`analyze_incident_sync()`) for Gradio compatibility.
**Pattern:** Use `async def` in provider classes; call with `asyncio.run()` in sync contexts.
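A minimal sketch of that bridge; the real `analyze_incident_sync()` in `app.py` does more (provider creation, formatting), and the coroutine body here is a stand-in:

```python
import asyncio

# The async side: in the real app this awaits a network call to the LLM.
async def analyze(log_text: str) -> str:
    await asyncio.sleep(0)  # stands in for awaiting the provider's HTTP call
    return f"analyzed {len(log_text)} chars"

# The sync side: Gradio calls this plain function.
def analyze_incident_sync(log_text: str) -> str:
    # asyncio.run() creates an event loop, runs the coroutine, and cleans up.
    return asyncio.run(analyze(log_text))

output = analyze_incident_sync("suspicious traffic")
```
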
### 5. Logging with Context
Use `setup_logger(__name__)` at module level. Include severity and context:
```python
logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True)  # exc_info for tracebacks
```
## Common Tasks & Commands
### Run Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets
# Start the app (defaults to http://localhost:7860)
python src/app.py
```
### Test Analysis Logic
```bash
# Run all tests
pytest tests/
# Test a specific module
pytest tests/test_analyzer.py -v
# With coverage
pytest --cov=src tests/
```
### Switch LLM Providers
Update `.env`:
```bash
# Use mock (no API required, deterministic)
LLM_PROVIDER=mock
# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
# Use local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b
```
### Deploy to Hugging Face Spaces
1. Create a Hugging Face Space with the Gradio SDK (CPU tier is sufficient)
2. Point the Space to this repository
3. HF installs `requirements.txt` and launches the Gradio app automatically (Spaces uses the `sdk: gradio` metadata in the README front matter, not a `Procfile`)
## Extension Points
**Add a new LLM provider:**
1. Create a subclass of `BaseLLMProvider` in `src/llm/provider.py`
2. Implement the `async analyze()` method
3. Add an enum value to `LLMProvider` in `src/utils/config.py`
4. Update the `create_provider()` factory
5. Add an integration test
**Customize analysis behavior:**
- Modify the prompt template in `src/llm/prompts.py` (e.g., ask for a different output format)
- Adjust the parsing regex in `IncidentAnalyzer._parse_response()` to match the new prompt structure
**Add a new output field:**
1. Extend the `SecurityAnalysis` dataclass in `src/analyzer/models.py`
2. Update the prompt to include that field
3. Update the parsing logic in `_parse_response()`
4. Update the Gradio output format in `app.py`
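Step 1 can be sketched as below. `RiskLevel` and `SecurityAnalysis` are the names from `src/analyzer/models.py`, but their exact fields here are assumed, and `affected_hosts` is a hypothetical new field:

```python
from dataclasses import dataclass, field
from enum import Enum

# Sketch of the models; the real definitions in src/analyzer/models.py may differ.
class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: RiskLevel
    remediation: str = ""
    indicators: list[str] = field(default_factory=list)
    # Hypothetical new field; give it a default so existing call sites keep working.
    affected_hosts: list[str] = field(default_factory=list)

analysis = SecurityAnalysis(summary="Brute force", risk_level=RiskLevel.HIGH)
```

Defaulting the new field keeps older parsers and tests valid while the prompt and regex catch up (steps 2-3).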
## Testing Strategy
- **Unit tests** for parsing logic (`test_analyzer.py`): Mock LLM responses, verify regex extraction
- **Integration tests** for providers (`test_llm_providers.py`): Mock HTTP responses, test async/await
- **E2E tests** for the full flow: Use `MockLLMProvider` to avoid API costs
Use `MockLLMProvider` for all tests; it's deterministic and free.
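A sketch of what such a test looks like. The canned response text here is assumed; the real `MockLLMProvider` in `src/llm/provider.py` may return something different:

```python
import asyncio

# Stand-in for the project's MockLLMProvider: deterministic, no network.
class MockLLMProvider:
    async def analyze(self, log_text: str) -> str:
        return "Risk Level: MEDIUM\nSummary: mock analysis"

def test_mock_provider_is_deterministic():
    provider = MockLLMProvider()
    first = asyncio.run(provider.analyze("log a"))
    second = asyncio.run(provider.analyze("log b"))
    assert first == second        # same output regardless of input
    assert "Risk Level" in first  # parseable by _parse_response()

test_mock_provider_is_deterministic()
```

Because the output never varies, tests against it are fast, free, and reproducible in CI.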
## Key Files to Review First
1. **`src/app.py`** → Entry point, Gradio UI, orchestration
2. **`src/analyzer/security.py`** → Core logic: how responses are parsed
3. **`src/llm/provider.py`** → How different LLMs are called (the abstraction)
4. **`src/utils/config.py`** → Environment-driven configuration
5. **`README.md`** → High-level project summary and deployment instructions
## Known Patterns & Anti-Patterns
✅ **Do:**
- Use environment variables for all configuration
- Call `config.validate()` to catch errors early
- Use `async`/`await` for I/O-bound operations
- Test with `MockLLMProvider` to keep tests fast and free
- Log with context (include variable state, not just errors)
❌ **Don't:**
- Hardcode API keys or model names in code
- Make blocking sync I/O calls that freeze the Gradio UI
- Assume a specific LLM output format; use flexible regex parsing
- Catch exceptions silently without logging
- Mix provider-specific logic into `IncidentAnalyzer` (keep the separation clean)
## Quick Troubleshooting
| Issue | Solution |
|-------|----------|
| `ValueError: OPENAI_API_KEY required` | Set `OPENAI_API_KEY` in `.env` and call `config.validate()` |
| Gradio not starting | Verify port 7860 is free; set `DEBUG=true` in `.env` for more logs |
| LLM calls timing out | Increase the timeout in the provider (default 30s for OpenAI, 60s for local) |
| Parsing misses fields | Check that the prompt format in `prompts.py` matches the regex patterns in `_parse_response()` |
| Mock provider not activated | Verify `.env` has `LLM_PROVIDER=mock` (mock is the default when unset) |