# Copilot Instructions for Security Incident Analyzer
## Project Overview
**Security Incident Analyzer** is an LLM-powered web app for security teams to paste logs/alerts and get immediate, director-level analysis: what happened, severity, remediation steps.
**Tech Stack:** Python, Gradio (UI), async/await (concurrency), pluggable LLM providers (OpenAI/local/mock)
**Key Insight:** The project abstracts LLM providers to support OpenAI, local models (Ollama), and mock inference without changing business logic.
## Architecture & Component Map
```
src/
├── app.py              # Gradio interface entry point
├── analyzer/
│   ├── security.py     # IncidentAnalyzer: parses LLM response into structured results
│   └── models.py       # RiskLevel enum, SecurityAnalysis dataclass
├── llm/
│   ├── provider.py     # Abstract BaseLLMProvider + OpenAI/Local/Mock implementations
│   └── prompts.py      # get_analysis_prompt() template generation
└── utils/
    ├── config.py       # Config class, LLMProvider enum, global config instance
    └── logger.py       # setup_logger() for consistent logging
```
### Data Flow
1. User submits log → `app.py:analyze_incident_sync()`
2. Creates provider via `create_provider()` factory based on `config.llm_provider`
3. Passes log to `IncidentAnalyzer.analyze()` → calls `provider.analyze()` with templated prompt
4. LLM returns text response → `IncidentAnalyzer._parse_response()` extracts structured data using regex
5. Returns `SecurityAnalysis` → Gradio formats it for display
## Critical Conventions & Patterns
### 1. Provider Abstraction Pattern
All LLM interactions go through `BaseLLMProvider` subclasses. To add a new provider:
- Subclass `BaseLLMProvider` and implement `async analyze(log_text: str) -> str`
- Update `create_provider()` factory in `provider.py`
- Add to `LLMProvider` enum in `config.py`
**Why:** Decouples business logic (incident analysis) from LLM infrastructure (API clients, models).
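A minimal sketch of the pattern, assuming the shapes described above (the `EchoProvider` name is illustrative; the real classes live in `src/llm/provider.py`):

```python
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    @abstractmethod
    async def analyze(self, log_text: str) -> str:
        """Return the raw LLM response for the given log text."""


class EchoProvider(BaseLLMProvider):
    """Illustrative provider: returns a canned response (stand-in for an API client)."""

    async def analyze(self, log_text: str) -> str:
        return f"Summary: received {len(log_text)} chars\nRisk Level: LOW"


def create_provider(name: str) -> BaseLLMProvider:
    # Factory keyed on provider name; extend this mapping when adding a provider
    providers = {"echo": EchoProvider}
    try:
        return providers[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}")
```

Because `IncidentAnalyzer` only sees `BaseLLMProvider`, swapping OpenAI for Ollama (or a mock) is a one-line config change, not a code change.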
### 2. Configuration Management
Environment variables drive runtime behavior via `src/utils/config.py`:
- `LLM_PROVIDER`: Controls which provider to use
- `OPENAI_API_KEY`: Required only when `LLM_PROVIDER=openai`
- `LLM_MODEL`: Optional model override; falls back to sensible defaults
- `DEBUG`: Enables verbose logging
**Validate before use:** Call `config.validate()` in `app.py` to catch config errors early (e.g., missing API key).
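A hedged sketch of an env-driven `Config` with fail-fast validation (field names beyond the documented variables are assumptions; the real class is in `src/utils/config.py`):

```python
import os
from dataclasses import dataclass, field


@dataclass
class Config:
    # Each field reads its environment variable at construction time
    llm_provider: str = field(default_factory=lambda: os.getenv("LLM_PROVIDER", "mock"))
    openai_api_key: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
    llm_model: str = field(default_factory=lambda: os.getenv("LLM_MODEL", ""))
    debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "").lower() == "true")

    def validate(self) -> None:
        # Fail fast: the OpenAI provider is unusable without a key
        if self.llm_provider == "openai" and not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY required when LLM_PROVIDER=openai")
```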
### 3. Structured Output Parsing
LLM responses are free-form text. `IncidentAnalyzer._parse_response()` uses regex to extract:
- **Summary** → First ~200 chars
- **Risk Level** → Match against `RiskLevel` enum, default to MEDIUM
- **Remediation** → Multi-line instruction block
- **Indicators** → Lines prefixed with `-`, `•`, or keywords like "Indicator:"
**Why regex, not JSON?** Regex parsing is permissive and works with whatever format a given LLM emits, which improves reliability across models compared to strict JSON parsing.
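To illustrate, a simplified stand-in for `_parse_response()` (the actual patterns and `RiskLevel` values may differ):

```python
import re

RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}  # assumed RiskLevel values


def parse_response(text: str) -> dict:
    """Permissive extraction from free-form LLM text."""
    risk = re.search(r"risk\s*level\s*:\s*(\w+)", text, re.IGNORECASE)
    level = risk.group(1).upper() if risk else "MEDIUM"
    if level not in RISK_LEVELS:
        level = "MEDIUM"  # default when the model invents a label
    # Indicator lines start with -, •, or an "Indicator:" prefix
    indicators = re.findall(r"^\s*(?:[-•]|Indicator:)\s*(.+)$", text, re.MULTILINE)
    return {
        "summary": text.strip()[:200],
        "risk_level": level,
        "indicators": indicators,
    }
```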
### 4. Async/Await for Concurrency
All LLM calls are `async` (`provider.analyze()`) to avoid blocking the UI when network latency occurs. `app.py` wraps async in sync (`analyze_incident_sync()`) for Gradio compatibility.
**Pattern:** Use `async def` in provider classes; call with `asyncio.run()` in sync contexts.
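The pattern in miniature (the `analyze` body here is a placeholder for a real network call):

```python
import asyncio


async def analyze(log_text: str) -> str:
    # Stand-in for provider.analyze(): awaits I/O without blocking the event loop
    await asyncio.sleep(0)  # placeholder for the network round trip
    return f"analyzed {len(log_text)} chars"


def analyze_incident_sync(log_text: str) -> str:
    # Gradio callbacks here are plain functions, so bridge with asyncio.run()
    return asyncio.run(analyze(log_text))
```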
### 5. Logging with Context
Use `setup_logger(__name__)` at module level. Include severity and context:
```python
logger.info(f"Analyzing log input ({len(log_text)} chars)")
logger.error(f"OpenAI API error: {e}", exc_info=True) # exc_info for tracebacks
```
## Common Tasks & Commands
### Run Locally
```bash
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your API key (if using OpenAI)
# LLM_PROVIDER=mock runs without secrets
# Start the app (defaults to http://localhost:7860)
python src/app.py
```
### Test Analysis Logic
```bash
# Run all tests
pytest tests/
# Test a specific module
pytest tests/test_analyzer.py -v
# With coverage
pytest --cov=src tests/
```
### Switch LLM Providers
Update `.env`:
```bash
# Use mock (no API required, deterministic)
LLM_PROVIDER=mock
# Use OpenAI (requires OPENAI_API_KEY)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
# Use local LLM (requires Ollama running on localhost:11434)
LLM_PROVIDER=local
LLM_MODEL=mistral:7b
```
### Deploy to Hugging Face Spaces
1. Create a Hugging Face Space (CPU tier is sufficient)
2. Point Space to this repository
3. HF detects `Procfile` and `requirements.txt` and launches the Gradio app automatically
## Extension Points
**Add a new LLM provider:**
1. Create subclass of `BaseLLMProvider` in `src/llm/provider.py`
2. Implement `async analyze()` method
3. Add enum value to `LLMProvider` in `src/utils/config.py`
4. Update `create_provider()` factory
5. Add integration test
**Customize analysis behavior:**
- Modify prompt template in `src/llm/prompts.py` (e.g., ask for different output format)
- Adjust parsing regex in `IncidentAnalyzer._parse_response()` to match new prompt structure
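For reference, a hypothetical template in the spirit of `get_analysis_prompt()` (the real wording lives in `src/llm/prompts.py`; note how the requested output format mirrors the regexes in `_parse_response()`):

```python
# Hypothetical template; sections match what the parser expects to extract
PROMPT_TEMPLATE = """You are a senior security analyst. Analyze the log below.

Respond with:
Summary: <one paragraph>
Risk Level: <LOW|MEDIUM|HIGH|CRITICAL>
Remediation: <numbered steps>
Indicators: <one per line, prefixed with ->

Log:
{log_text}
"""


def get_analysis_prompt(log_text: str) -> str:
    return PROMPT_TEMPLATE.format(log_text=log_text)
```

If you change a section label here, change the corresponding regex in `_parse_response()` in the same commit.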
**Add new output field:**
1. Extend `SecurityAnalysis` dataclass in `src/analyzer/models.py`
2. Update prompt to include that field
3. Update parsing logic in `_parse_response()`
4. Update Gradio output format in `app.py`
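Step 1 might look like this, assuming the enum/dataclass shapes described in the component map (`affected_assets` is a made-up example field):

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):  # assumed shape of the enum in src/analyzer/models.py
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class SecurityAnalysis:
    summary: str
    risk_level: RiskLevel
    remediation: str
    indicators: list[str] = field(default_factory=list)
    # New field added with a safe default so existing call sites keep working
    affected_assets: list[str] = field(default_factory=list)
```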
## Testing Strategy
- **Unit tests** for parsing logic (`test_analyzer.py`): Mock LLM responses, verify regex extraction
- **Integration tests** for providers (`test_llm_providers.py`): Mock HTTP responses, test async/await
- **E2E tests** for the full flow: Use `MockLLMProvider` to avoid API costs
Use `MockLLMProvider` for all tests; it's deterministic and free.
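A representative unit test built on a deterministic mock (this `MockLLMProvider` is a simplified stand-in for the real one in `src/llm/provider.py`):

```python
import asyncio


class MockLLMProvider:
    """Deterministic stand-in: same canned response on every call, no API cost."""

    async def analyze(self, log_text: str) -> str:
        return "Summary: test incident\nRisk Level: HIGH\n- brute-force attempt"


def test_mock_provider_returns_high_risk():
    response = asyncio.run(MockLLMProvider().analyze("any log"))
    assert "Risk Level: HIGH" in response
    assert "- brute-force attempt" in response
```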
## Key Files to Review First
1. **`src/app.py`** – Entry point, Gradio UI, orchestration
2. **`src/analyzer/security.py`** – Core logic: how responses are parsed
3. **`src/llm/provider.py`** – How different LLMs are called (the abstraction)
4. **`src/utils/config.py`** – Environment-driven configuration
5. **`README.md`** – High-level project summary and deployment instructions
## Known Patterns & Anti-Patterns
✅ **Do:**
- Use environment variables for all configuration
- Call `config.validate()` to catch errors early
- Use `async`/`await` for I/O-bound operations
- Test with `MockLLMProvider` to keep tests fast and free
- Log with context (include variable state, not just errors)
❌ **Don't:**
- Hardcode API keys or model names in code
- Sync I/O calls that block the Gradio UI
- Assume a specific LLM output formatβ€”use flexible regex parsing
- Catch exceptions silently without logging
- Mix provider-specific logic into `IncidentAnalyzer` (keep separation clean)
## Quick Troubleshooting
| Issue | Solution |
|-------|----------|
| `ValueError: OPENAI_API_KEY required` | Set `OPENAI_API_KEY` in `.env` and call `config.validate()` |
| Gradio not starting | Verify port 7860 is free; check `DEBUG=true` in `.env` for more logs |
| LLM calls timing out | Increase timeout in provider (default 30s for OpenAI, 60s for local) |
| Parsing is missing fields | Check prompt format in `prompts.py` matches regex patterns in `_parse_response()` |
| Mock provider not activated | Verify `.env` has `LLM_PROVIDER=mock` (default is mock if not set) |