# Codebase Analysis & Ollama Qwen3.5 Fallback Integration

## Codebase Analysis Summary

I've reviewed all **18 source files** across the entire project. Here's a comprehensive analysis:

---

## Issues Found

### 🔴 Critical Issues

#### 1. `inference.py` — `API_BASE_URL` crashes if env var is missing (Line 23)
```python
API_BASE_URL = os.environ["API_BASE_URL"]  # KeyError if missing!
API_KEY = os.environ["API_KEY"]            # Same problem
```
These use `os.environ[...]` which raises `KeyError` at import-time if the env vars aren't set. The `API_KEY` on line 25 has the same issue. During local development/testing, this crashes immediately.

> [!IMPORTANT]
> **Fix:** Change to `os.getenv()` with sensible defaults, and add the Ollama fallback here.

#### 2. `server/app.py` — `NetworkStatsResponse` model mismatch (Lines 205-207)
The endpoint `GET /stats` returns `NetworkStatsResponse(**env.get_network_stats())` but `get_network_stats()` now returns **extra fields** (`false_flag_accuracy`, `stealth_detection_rate`, `burst_ticks`, `false_flags_seen`, `stealth_attacks_seen`, `config_params`) that **are not defined** in the `NetworkStatsResponse` Pydantic model. This will cause a validation error or silently drop fields depending on Pydantic config.

#### 3. `server/firewall_environment.py` — INSPECT action bug (Lines 410-414)
```python
if inspected and session_id not in self.inspected_sessions:
    metadata["revealed"] = True
    self.inspected_sessions[session_id] = session
    self.pending_sessions[session_id] = session  # ← BUG
```
After popping from `pending_sessions` on line 400, the session is re-added to both `inspected_sessions` AND `pending_sessions`. This creates a **duplicate reference** — the session exists in both pools. When the session is later acted upon again (block after inspect), the code pops from `inspected_sessions` (line 397) but then also tries `pending_sessions.pop()` on line 441. This is not necessarily a crash, but it means:
- The session count in `state()` double-counts inspected sessions
- `_rebuild_queue` already deduplicates, so functional behavior is OK
- But the `pending_session_count` metric is inflated

---

### 🟡 Moderate Issues

#### 4. `data_loader.py` — Session TTL hardcoded vs config-driven (Lines 480-481)
```python
ttl = 2 if malicious else 3  # Hardcoded, ignoring config!
```
The `_build_session` method hardcodes TTL values, but `_spawn_sessions()` in `firewall_environment.py` **overwrites** these at lines 537-540. This is **not a bug** (the overwrite works), but it's dead code that could confuse developers.

#### 5. `server/app.py` — `StepResponse` model mismatch (Line 186)
`env.step()` returns `info.score` and `info.passed` fields, but `StepResponse.info` is typed as `Dict[str, Any]`, so this works. However `StepResponse.state` uses `StateResponse` which doesn't include `focus_observation` as a `List[float]` — this could cause validation issues in edge cases.

#### 6. Unused import in `models.py` (Line 3)
`List` from typing is imported but `Callable` is not — and `Callable` is imported in `graders.py` line 13 directly. Minor, no functional impact.

---

### 🟢 Things That Are Correct

| Component | Status | Notes |
|-----------|--------|-------|
| **Reward Engine** | ✅ Correct | Multi-objective weights sum to 1.0, anti-degenerate policy design is sound |
| **Threat Engine** | ✅ Correct | Kill chain model, stealth blending, escalation modifiers all work correctly |
| **Traffic Generator** | ✅ Correct | 22-dim features, 5 benign profiles, 5×4 malicious profiles, normalization bounds |
| **Grading System** | ✅ Correct | Deterministic seeding, weighted scoring, pass constraints |
| **Heuristic Agent** | ✅ Correct | 8-rule policy matches between `inference.py` and `heuristic_agent.py` |
| **Task Configs** | ✅ Correct | Monotonic difficulty progression across all 8 new parameters |
| **Docker Setup** | ✅ Correct | Proper non-root user, healthcheck, port exposure |
| **OpenEnv YAML** | ✅ Correct | Matches implementation: tasks, tools, action/observation spaces |
| **Test Suite** | ✅ Correct | 14 tests covering generators, rewards, threats, environment, graders |
| **Client** | ✅ Correct | Clean REST client matching all server endpoints |

---

## Proposed Changes

### 1. Ollama Qwen 3.5 Fallback in `inference.py`

The user wants the fallback to use **Ollama with Qwen 3.5** (likely `qwen3:0.6b` or `qwen2.5:3b` via Ollama cloud). I'll update `inference.py` to:

- Add Ollama as the fallback LLM provider when the primary API fails
- Use `qwen2.5:3b` (closest to "qwin3.5" — Qwen 2.5 3B is the widely-available Ollama model)
- Keep the heuristic as the final safety-net fallback

> [!IMPORTANT]
> **Clarification needed:** "qwin3.5 cloud" — I'm interpreting this as **Qwen 2.5 3B** via Ollama's local server (`http://localhost:11434/v1`). Ollama uses OpenAI-compatible API, so we can reuse the same `OpenAI` client.
> 
> If you mean a different model (e.g., `qwen3:0.6b`, `qwen2.5:7b`, or a cloud-hosted Qwen endpoint), please let me know and I'll adjust.

#### [MODIFY] [inference.py](file:///c:/Users/LE/OneDrive/Documents/GitHub/meta_ai_hackathon/inference.py)

- Add `OLLAMA_BASE_URL` and `OLLAMA_MODEL` environment variables with defaults
- Create a secondary `OpenAI` client pointing to Ollama
- In `get_action()`, on primary API failure → try Ollama → then heuristic
- Fix `API_BASE_URL` and `API_KEY` to use `os.getenv()` with defaults

---

### 2. Fix `NetworkStatsResponse` model mismatch

#### [MODIFY] [models.py](file:///c:/Users/LE/OneDrive/Documents/GitHub/meta_ai_hackathon/models.py)

- Add the missing fields to `NetworkStatsResponse`: `false_flag_accuracy`, `stealth_detection_rate`, `burst_ticks`, `false_flags_seen`, `stealth_attacks_seen`, `config_params`

---

### 3. Fix INSPECT duplicate session bug

#### [MODIFY] [firewall_environment.py](file:///c:/Users/LE/OneDrive/Documents/GitHub/meta_ai_hackathon/server/firewall_environment.py)

- Remove the line that re-adds the session to `pending_sessions` after INSPECT (line 414). The session should only be in `inspected_sessions` during the inspection phase.

---

## Open Questions

> [!IMPORTANT]
> 1. **Qwen model version**: I'm defaulting to `qwen2.5:3b` via Ollama at `http://localhost:11434/v1`. Should I use a different model name or a remote Ollama endpoint URL?
> 2. **API_KEY fallback**: Should the Ollama fallback use `"ollama"` as the API key (Ollama doesn't require one), or do you have a specific cloud-hosted Ollama endpoint that needs authentication?

---

## Verification Plan

### Automated Tests
- Run `pytest tests/` to verify no regressions
- Run `python scripts/check_accuracy.py` to validate all parameter checks pass

### Manual Verification
- Test inference with Ollama running locally to verify the fallback chain works