# Copilot Instructions for Security Incident Analyzer

## Project Overview

**Security Incident Analyzer** is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.

**Tech Stack:** Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)

**Key Insight:** The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.

## Architecture & Component Map

```
src/
├── app.py              # Gradio interface entry point
├── analyzer/
│   ├── security.py     # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py       # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py     # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py      # get_analysis_prompt() template generation
└── utils/
    ├── config.py       # Config class, LLMProvider enum, global config instance
    └── logger.py       # setup_logger() for consistent logging
```

### Data Flow

1. User submits log → `app.py:analyze_incident_sync()`
2. Creates provider via the `create_provider()` factory based on `config.llm_provider`
3. Passes log to `IncidentAnalyzer.analyze()` → calls `provider.analyze()` with templated prompt
4. LLM returns text response → `IncidentAnalyzer._parse_response()` extracts structured data using regex
5. Returns `SecurityAnalysis` → Gradio formats it for display

## Critical Conventions & Patterns

### 1. Provider Abstraction Pattern

All LLM interactions go through `BaseLLMProvider` subclasses. To add a new provider:

- Subclass `BaseLLMProvider` and implement `async analyze(log_text: str) -> str`
- Update the `create_provider()` factory in `provider.py`
- Add a value to the `LLMProvider` enum in `config.py`

**Why:** Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
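The abstraction can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: class internals, the factory's exact signature, and the mock's response text are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    """Common interface every LLM backend implements."""

    @abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM response for the given log text."""


class MockLLMProvider(BaseLLMProvider):
    """Deterministic provider for tests and secret-free local runs."""

    async def analyze(self, log_text: str) -> str:
        return f"Summary: mock analysis of {len(log_text)} chars\nRisk Level: MEDIUM"


def create_provider(name: str) -> BaseLLMProvider:
    """Factory: map a provider name (from config) to an implementation."""
    providers = {"mock": MockLLMProvider}  # real code would also map "openai", "local"
    if name not in providers:
        raise ValueError(f"Unknown LLM provider: {name}")
    return providers[name]()


print(asyncio.run(create_provider("mock").analyze("failed login x50")))
```

Adding an OpenAI or Ollama backend then means adding one subclass and one dictionary entry; `IncidentAnalyzer` never changes.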
### 2. Configuration Management

Environment variables drive runtime behavior via `src/utils/config.py`:

- `LLM_PROVIDER`: Controls which provider to use
- `OPENAI_API_KEY`: Required only when `LLM_PROVIDER=openai`
- `LLM_MODEL`: Optional model override; falls back to sensible defaults
- `DEBUG`: Enables verbose logging

**Validate before use:** Call `config.validate()` in `app.py` to catch configuration errors early (e.g., a missing API key).

### 3. Structured Output Parsing

LLM responses are free-form text. `IncidentAnalyzer._parse_response()` uses regex to extract:

- **Summary** → First ~200 chars
- **Risk Level** → Match against the `RiskLevel` enum, defaulting to MEDIUM
- **Remediation** → Multi-line instruction block
- **Indicators** → Lines prefixed with `-`, `•`, or keywords like "Indicator:"

**Why regex, not JSON?** Regex is permissive and works with any LLM output format, which improves reliability across models.

### 4. Async/Await for Concurrency

All LLM calls are `async` (`provider.analyze()`) so that network latency does not block the UI. `app.py` wraps async in sync (`analyze_incident_sync()`) for Gradio compatibility.

**Pattern:** Use `async def` in provider classes; call with `asyncio.run()` in sync contexts.

### 5. Logging with Context

Use `setup_logger(__name__)` at module level.
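A plausible minimal shape for `setup_logger()` (the real implementation lives in `src/utils/logger.py`; the formatter string and the `DEBUG` handling here are assumptions):

```python
import logging
import os


def setup_logger(name: str) -> logging.Logger:
    """Return a consistently formatted logger; DEBUG=true raises verbosity."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers when modules are re-imported
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    verbose = os.getenv("DEBUG", "").lower() == "true"
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger


logger = setup_logger(__name__)
```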
Include severity and context:

```python
logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True)  # exc_info for tracebacks
```

## Common Tasks & Commands

### Run Locally

```bash
# Install dependencies
pip install -r requirements.txt

# Copy the environment template
cp .env.example .env
# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets

# Start the app (defaults to http://localhost:7860)
python src/app.py
```

### Test Analysis Logic

```bash
# Run all tests
pytest tests/

# Test a specific module
pytest tests/test_analyzer.py -v

# With coverage
pytest --cov=src tests/
```

### Switch LLM Providers

Update `.env`:

```bash
# Use mock (no API required, deterministic)
LLM_PROVIDER=mock

# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Use a local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b
```

### Deploy to Hugging Face Spaces

1. Create a Hugging Face Space (CPU tier is sufficient)
2. Point the Space to this repository
3. HF detects `Procfile` and `requirements.txt` and automatically launches the Gradio app

## Extension Points

**Add a new LLM provider:**

1. Create a subclass of `BaseLLMProvider` in `src/llm/provider.py`
2. Implement the `async analyze()` method
3. Add an enum value to `LLMProvider` in `src/utils/config.py`
4. Update the `create_provider()` factory
5. Add an integration test

**Customize analysis behavior:**

- Modify the prompt template in `src/llm/prompts.py` (e.g., ask for a different output format)
- Adjust the parsing regex in `IncidentAnalyzer._parse_response()` to match the new prompt structure

**Add a new output field:**

1. Extend the `SecurityAnalysis` dataclass in `src/analyzer/models.py`
2. Update the prompt to include that field
3. Update the parsing logic in `_parse_response()`
4. Update the Gradio output format in `app.py`

## Testing Strategy

- **Unit tests** for parsing logic (`test_analyzer.py`): Mock LLM responses, verify regex extraction
- **Integration tests** for providers (`test_llm_providers.py`): Mock HTTP responses, test async/await
- **E2E tests** for the full flow: Use `MockLLMProvider` to avoid API costs

Use `MockLLMProvider` for all tests: it's deterministic and free.

## Key Files to Review First

1. **`src/app.py`** — Entry point, Gradio UI, orchestration
2. **`src/analyzer/security.py`** — Core logic: how responses are parsed
3. **`src/llm/provider.py`** — How different LLMs are called (the abstraction)
4. **`src/utils/config.py`** — Environment-driven configuration
5. **`README.md`** — High-level project summary and deployment instructions

## Known Patterns & Anti-Patterns

✅ **Do:**

- Use environment variables for all configuration
- Call `config.validate()` to catch errors early
- Use `async`/`await` for I/O-bound operations
- Test with `MockLLMProvider` to keep tests fast and free
- Log with context (include variable state, not just errors)

❌ **Don't:**

- Hardcode API keys or model names in code
- Make sync I/O calls that block the Gradio UI
- Assume a specific LLM output format; use flexible regex parsing
- Catch exceptions silently without logging
- Mix provider-specific logic into `IncidentAnalyzer` (keep the separation clean)

## Quick Troubleshooting

| Issue | Solution |
|-------|----------|
| `ValueError: OPENAI_API_KEY required` | Set `OPENAI_API_KEY` in `.env` and call `config.validate()` |
| Gradio not starting | Verify port 7860 is free; set `DEBUG=true` in `.env` for more logs |
| LLM calls timing out | Increase the timeout in the provider (default 30s for OpenAI, 60s for local) |
| Parsing is missing fields | Check that the prompt format in `prompts.py` matches the regex patterns in `_parse_response()` |
| Mock provider not activated | Verify `.env` has `LLM_PROVIDER=mock` (mock is the default if unset) |
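To make the permissive-regex convention concrete, here is a hedged sketch of the kind of extraction `_parse_response()` performs; the field patterns, the dataclass shape, and the sample text are illustrative assumptions, not the project's actual implementation.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: RiskLevel
    indicators: list[str] = field(default_factory=list)


def parse_response(text: str) -> SecurityAnalysis:
    """Permissive parsing: tolerate formatting quirks from any model."""
    # Risk level: match any enum name case-insensitively; default to MEDIUM
    risk = RiskLevel.MEDIUM
    m = re.search(r"risk\s*level\s*:\s*(\w+)", text, re.IGNORECASE)
    if m:
        try:
            risk = RiskLevel(m.group(1).upper())
        except ValueError:
            pass  # keep the MEDIUM default for unrecognized values
    # Indicators: lines prefixed with '-' or '•'
    indicators = re.findall(r"^\s*[-•]\s*(.+)$", text, re.MULTILINE)
    return SecurityAnalysis(summary=text[:200], risk_level=risk, indicators=indicators)


sample = "Brute-force attempt detected.\nRisk Level: high\n- 50 failed logins\n- single source IP"
print(parse_response(sample).risk_level)  # → RiskLevel.HIGH
```

Note how every field degrades gracefully: an unrecognized risk string falls back to MEDIUM, and missing indicator lines simply yield an empty list.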