| # Codebase Analysis & Ollama Qwen3.5 Fallback Integration |
|
|
| ## Codebase Analysis Summary |
|
|
| I've reviewed all **18 source files** across the entire project. Here's a comprehensive analysis: |
|
|
| --- |
|
|
| ## Issues Found |
|
|
| ### 🔴 Critical Issues |
|
|
| #### 1. `inference.py`: `API_BASE_URL` crashes if env var is missing (Line 23) |
| ```python |
| API_BASE_URL = os.environ["API_BASE_URL"] # KeyError if missing! |
| API_KEY = os.environ["API_KEY"] # Same problem |
| ``` |
| Both lines use `os.environ[...]`, which raises `KeyError` at import time if the variable is unset (`API_KEY`, on line 25, fails the same way). During local development or testing, importing the module crashes immediately. |
|
|
| > [!IMPORTANT] |
| > **Fix:** Change to `os.getenv()` with sensible defaults, and add the Ollama fallback here. |
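The fix can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `load_api_settings` and both default values are assumptions chosen for local development. Reading from a mapping with `.get()` behaves like `os.getenv()` and makes the function easy to test.

```python
import os

# Assumed defaults for local development; the real project may want different values.
DEFAULT_API_BASE_URL = "http://localhost:11434/v1"  # assumption: local Ollama endpoint
DEFAULT_API_KEY = "ollama"  # assumption: placeholder key for a server that ignores it


def load_api_settings(env=None):
    """Read API settings without raising KeyError when a variable is unset."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("API_BASE_URL", DEFAULT_API_BASE_URL),
        "api_key": env.get("API_KEY", DEFAULT_API_KEY),
    }


# With an empty environment the defaults are used instead of crashing at import time.
settings = load_api_settings({})
print(settings["base_url"])
```

Passing the environment as an argument keeps the lookup out of module import time, so a missing variable no longer breaks `import inference`.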
|
|
| #### 2. `server/app.py`: `NetworkStatsResponse` model mismatch (Lines 205-207) |
| The `GET /stats` endpoint returns `NetworkStatsResponse(**env.get_network_stats())`, but `get_network_stats()` now returns **extra fields** (`false_flag_accuracy`, `stealth_detection_rate`, `burst_ticks`, `false_flags_seen`, `stealth_attacks_seen`, `config_params`) that **are not defined** in the `NetworkStatsResponse` Pydantic model. Depending on the model's `extra` setting, Pydantic either silently drops these fields (the default, `ignore`) or raises a validation error (`forbid`). |
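One way to surface this kind of drift early is to compare the payload keys against the field names the model declares. The sketch below uses plain sets so it runs without Pydantic; the `declared` names are hypothetical placeholders, and in the real code they could be read from the model itself (`model_fields` in Pydantic v2, `__fields__` in v1).

```python
def undeclared_keys(payload, declared_fields):
    """Return the payload keys that the response model does not declare, sorted."""
    return sorted(set(payload) - set(declared_fields))


# Hypothetical subset of fields; the real NetworkStatsResponse fields are not shown here.
declared = {"total_sessions", "blocked_sessions"}

# Stand-in for env.get_network_stats(), including two of the newer fields.
payload = {
    "total_sessions": 10,
    "blocked_sessions": 3,
    "false_flag_accuracy": 0.9,
    "burst_ticks": 2,
}

extra = undeclared_keys(payload, declared)
print(extra)  # ['burst_ticks', 'false_flag_accuracy']
```

A check like this fits naturally in a unit test, where a non-empty result fails the build before the mismatch reaches the endpoint.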
|
|
| #### 3. `server/firewall_environment.py`: INSPECT action bug (Lines 410-414) |
| ```python |
| if inspected and session_id not in self.inspected_sessions: |
|     metadata["revealed"] = True |
|     self.inspected_sessions[session_id] = session |
|     self.pending_sessions[session_id] = session  # ← BUG: session is re-added to the pending pool |
| ``` |
| After the session is popped from `pending_sessions` on line 400, it is re-added to both `inspected_sessions` and `pending_sessions`, so the same object lives in both pools (a **duplicate reference**). When the session is acted on again (for example, a block after an inspect), the code pops it from `inspected_sessions` (line 397) and then also calls `pending_sessions.pop()` on line 441. This does not necessarily crash, but it means: |
| - The session count in `state()` double-counts inspected sessions |
| - The `pending_session_count` metric is inflated |
| - Functional behavior is otherwise OK, because `_rebuild_queue` already deduplicates |
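The double-counting can be reproduced with two plain dictionaries. This is a simplified sketch, not the environment's code: `inspect_session` and `naive_count` are hypothetical stand-ins, and it assumes the count is computed by adding the sizes of the two pools.

```python
def inspect_session(pending, inspected, session_id, re_add_to_pending):
    """Move a session into the inspected pool; optionally mimic the reported bug."""
    session = pending.pop(session_id)
    inspected[session_id] = session
    if re_add_to_pending:
        # Mirrors the flagged line: the session now lives in both pools.
        pending[session_id] = session


def naive_count(pending, inspected):
    """Assumed counting scheme: sum of both pool sizes."""
    return len(pending) + len(inspected)


def make_pools():
    """Two pending sessions and an empty inspected pool."""
    return {"s1": {"id": "s1"}, "s2": {"id": "s2"}}, {}


buggy_pending, buggy_inspected = make_pools()
inspect_session(buggy_pending, buggy_inspected, "s1", re_add_to_pending=True)
buggy_total = naive_count(buggy_pending, buggy_inspected)

fixed_pending, fixed_inspected = make_pools()
inspect_session(fixed_pending, fixed_inspected, "s1", re_add_to_pending=False)
fixed_total = naive_count(fixed_pending, fixed_inspected)

print(buggy_total, fixed_total)  # 3 2
```

With only two real sessions, the buggy path reports three, which is the inflation described above.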
|
|
| --- |
|
|
| ### 🟡 Moderate Issues |
|
|
| #### 4. `data_loader.py`: Session TTL hardcoded vs config-driven (Lines 480-481) |
| ```python |
| ttl = 2 if malicious else 3 # Hardcoded, ignoring config! |
| ``` |
| The `_build_session` method hardcodes TTL values, but `_spawn_sessions()` in `firewall_environment.py` **overwrites** these at lines 537-540. This is **not a bug** (the overwrite works), but it's dead code that could confuse developers. |
|
|
| #### 5. `server/app.py`: `StepResponse` model mismatch (Line 186) |
| `env.step()` returns `info.score` and `info.passed` fields, but `StepResponse.info` is typed as `Dict[str, Any]`, so this works. However, `StepResponse.state` uses `StateResponse`, which doesn't include `focus_observation` as a `List[float]`; this could cause validation issues in edge cases. |
|
|
| #### 6. Unused import in `models.py` (Line 3) |
| `models.py` imports `List` from `typing` but not `Callable`; `graders.py` imports `Callable` directly on line 13. Minor, no functional impact. |
|
|
| --- |
|
|
| ### 🟢 Things That Are Correct |
|
|
| | Component | Status | Notes | |
| |-----------|--------|-------| |
| | **Reward Engine** | ✅ Correct | Multi-objective weights sum to 1.0, anti-degenerate policy design is sound | |
| | **Threat Engine** | ✅ Correct | Kill chain model, stealth blending, escalation modifiers all work correctly | |
| | **Traffic Generator** | ✅ Correct | 22-dim features, 5 benign profiles, 5×4 malicious profiles, normalization bounds | |
| | **Grading System** | ✅ Correct | Deterministic seeding, weighted scoring, pass constraints | |
| | **Heuristic Agent** | ✅ Correct | 8-rule policy matches between `inference.py` and `heuristic_agent.py` | |
| | **Task Configs** | ✅ Correct | Monotonic difficulty progression across all 8 new parameters | |
| | **Docker Setup** | ✅ Correct | Proper non-root user, healthcheck, port exposure | |
| | **OpenEnv YAML** | ✅ Correct | Matches implementation: tasks, tools, action/observation spaces | |
| | **Test Suite** | ✅ Correct | 14 tests covering generators, rewards, threats, environment, graders | |
| | **Client** | ✅ Correct | Clean REST client matching all server endpoints | |
|
|
| --- |
|
|
| ## Proposed Changes |
|
|
| ### 1. Ollama Qwen 3.5 Fallback in `inference.py` |
|
|
| The user wants the fallback to use **Ollama with Qwen 3.5** (likely `qwen3:0.6b` or `qwen2.5:3b` via Ollama cloud). I'll update `inference.py` to: |
|
|
| - Add Ollama as the fallback LLM provider when the primary API fails |
| - Use `qwen2.5:3b` (closest to "qwin3.5"; Qwen 2.5 3B is the widely available Ollama model) |
| - Keep the heuristic as the final safety-net fallback |
|
|
| > [!IMPORTANT] |
| > **Clarification needed:** I'm interpreting "qwin3.5 cloud" as **Qwen 2.5 3B** via Ollama's local server (`http://localhost:11434/v1`). Ollama exposes an OpenAI-compatible API, so we can reuse the same `OpenAI` client. |
| > |
| > If you mean a different model (e.g., `qwen3:0.6b`, `qwen2.5:7b`, or a cloud-hosted Qwen endpoint), please let me know and I'll adjust. |
|
|
| #### [MODIFY] [inference.py](file:///c:/Users/LE/OneDrive/Documents/GitHub/meta_ai_hackathon/inference.py) |
|
|
| - Add `OLLAMA_BASE_URL` and `OLLAMA_MODEL` environment variables with defaults |
| - Create a secondary `OpenAI` client pointing to Ollama |
| - In `get_action()`, on primary API failure → try Ollama → then heuristic |
| - Fix `API_BASE_URL` and `API_KEY` to use `os.getenv()` with defaults |
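The fallback order in the list above can be sketched independently of any client library. This is an illustration under assumptions: `get_action_with_fallback` is a hypothetical helper, each provider is modelled as a plain callable (in the real code these would wrap the primary and Ollama `OpenAI` clients), and the `"BLOCK"`/`"ALLOW"` strings are stand-in actions.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def get_action_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    heuristic: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each provider in order; fall back to the heuristic if all fail."""
    for name, call in providers:
        try:
            action = call(prompt)
        except Exception as exc:  # any provider error triggers the next fallback
            print(f"[fallback] {name} failed: {exc!r}")
            continue
        if action:  # treat an empty reply as a failure too
            return name, action
    return "heuristic", heuristic(prompt)


def primary_down(prompt: str) -> str:
    """Stand-in for the primary API being unreachable."""
    raise ConnectionError("primary API unreachable")


def ollama_ok(prompt: str) -> str:
    """Stand-in for a successful Ollama reply."""
    return "BLOCK"


def heuristic_policy(prompt: str) -> str:
    """Stand-in for the rule-based safety net."""
    return "ALLOW"


source, action = get_action_with_fallback(
    "session features...",
    [("primary", primary_down), ("ollama", ollama_ok)],
    heuristic_policy,
)
print(source, action)  # ollama BLOCK
```

Returning the provider name alongside the action makes it easy to log which tier actually answered.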
|
|
| --- |
|
|
| ### 2. Fix `NetworkStatsResponse` model mismatch |
|
|
| #### [MODIFY] [models.py](file:///c:/Users/LE/OneDrive/Documents/GitHub/meta_ai_hackathon/models.py) |
|
|
| - Add the missing fields to `NetworkStatsResponse`: `false_flag_accuracy`, `stealth_detection_rate`, `burst_ticks`, `false_flags_seen`, `stealth_attacks_seen`, `config_params` |
|
|
| --- |
|
|
| ### 3. Fix INSPECT duplicate session bug |
|
|
| #### [MODIFY] [firewall_environment.py](file:///c:/Users/LE/OneDrive/Documents/GitHub/meta_ai_hackathon/server/firewall_environment.py) |
|
|
| - Remove the line that re-adds the session to `pending_sessions` after INSPECT (line 414). The session should only be in `inspected_sessions` during the inspection phase. |
|
|
| --- |
|
|
| ## Open Questions |
|
|
| > [!IMPORTANT] |
| > 1. **Qwen model version**: I'm defaulting to `qwen2.5:3b` via Ollama at `http://localhost:11434/v1`. Should I use a different model name or a remote Ollama endpoint URL? |
| > 2. **API_KEY fallback**: Should the Ollama fallback use `"ollama"` as the API key (Ollama doesn't require one), or do you have a specific cloud-hosted Ollama endpoint that needs authentication? |
| |
| --- |
| |
| ## Verification Plan |
| |
| ### Automated Tests |
| - Run `pytest tests/` to verify no regressions |
| - Run `python scripts/check_accuracy.py` to validate all parameter checks pass |
| |
| ### Manual Verification |
| - Test inference with Ollama running locally to verify the fallback chain works |
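Before the manual test, it may help to confirm that a local Ollama server is up and which models it has pulled. The sketch below queries Ollama's `/api/tags` endpoint with the standard library only; the helper name `list_ollama_models` and the default URL are assumptions for illustration.

```python
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [model.get("name") for model in data.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; only the heuristic fallback can be exercised.")
else:
    print("Installed models:", models)
```

If the intended model (for example `qwen2.5:3b`) is missing from the list, the Ollama tier of the fallback chain will fail and the heuristic will answer instead.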
| |