Codebase Analysis & Ollama Qwen3.5 Fallback Integration

Codebase Analysis Summary

I've reviewed all 18 source files across the entire project. Here's a comprehensive analysis:

Issues Found

🔴 Critical Issues

1. `inference.py` — `API_BASE_URL` crashes if env var is missing (Line 23)

API_BASE_URL = os.environ["API_BASE_URL"]  # KeyError if missing!
API_KEY = os.environ["API_KEY"]            # Same problem

Both lookups use os.environ[...], which raises KeyError at import time when the variable is unset; API_KEY on line 25 fails the same way. In local development or testing, where these variables are often missing, inference.py fails to import at all.

Fix: Change to os.getenv() with sensible defaults, and add the Ollama fallback here.
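
A minimal sketch of what the non-crashing lookup could look like, assuming the Ollama defaults proposed later in this plan. `Settings` and `load_settings` are hypothetical names, and returning `None` for a missing primary endpoint is an assumption rather than existing behavior in inference.py.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    api_base_url: Optional[str]   # None means "primary API not configured"
    api_key: Optional[str]
    ollama_base_url: str
    ollama_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration without raising KeyError on missing variables."""
    return Settings(
        api_base_url=env.get("API_BASE_URL"),
        api_key=env.get("API_KEY"),
        # Defaults taken from this plan; adjust once the open questions are settled.
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        ollama_model=env.get("OLLAMA_MODEL", "qwen2.5:3b"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Accepting the environment as a mapping keeps the lookup testable without touching the real process environment.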

2. `server/app.py` — `NetworkStatsResponse` model mismatch (Lines 205-207)

The endpoint GET /stats returns NetworkStatsResponse(**env.get_network_stats()), but get_network_stats() now returns extra fields (false_flag_accuracy, stealth_detection_rate, burst_ticks, false_flags_seen, stealth_attacks_seen, config_params) that the NetworkStatsResponse Pydantic model does not define. Under Pydantic's default configuration (extra="ignore") these fields are silently dropped from the response; with extra="forbid" the call raises a validation error.

3. `server/firewall_environment.py` — INSPECT action bug (Lines 410-414)

if inspected and session_id not in self.inspected_sessions:
    metadata["revealed"] = True
    self.inspected_sessions[session_id] = session
    self.pending_sessions[session_id] = session  # ← BUG

Line 400 pops the session from pending_sessions, and this block then re-adds it to both inspected_sessions and pending_sessions, so the same session object lives in both pools. When the session is acted on again (for example, BLOCK after INSPECT), the code pops it from inspected_sessions (line 397) and then also calls pending_sessions.pop() on line 441. This does not necessarily crash, but it has these consequences:

- The session count in state() counts inspected sessions twice
- _rebuild_queue already deduplicates, so functional behavior is unaffected
- The pending_session_count metric is inflated

🟡 Moderate Issues

4. `data_loader.py` — Session TTL hardcoded vs config-driven (Lines 480-481)

ttl = 2 if malicious else 3  # Hardcoded, ignoring config!

The _build_session method hardcodes TTL values, but _spawn_sessions() in firewall_environment.py overwrites them at lines 537-540. This is not a bug (the overwrite works), but the hardcoded assignment has no effect on the final value and could mislead developers into thinking TTL ignores the config.

5. `server/app.py` — `StepResponse` model mismatch (Line 186)

env.step() returns info.score and info.passed, which works because StepResponse.info is typed as Dict[str, Any]. However, StepResponse.state uses StateResponse, which does not declare focus_observation as a List[float]; this could cause validation issues in edge cases.

6. Unused import in `models.py` (Line 3)

models.py imports List from typing but not Callable; graders.py imports Callable directly on line 13. Minor, no functional impact.

🟢 Things That Are Correct

| Component | Status | Notes |
|---|---|---|
| Reward Engine | ✅ Correct | Multi-objective weights sum to 1.0, anti-degenerate policy design is sound |
| Threat Engine | ✅ Correct | Kill chain model, stealth blending, escalation modifiers all work correctly |
| Traffic Generator | ✅ Correct | 22-dim features, 5 benign profiles, 5×4 malicious profiles, normalization bounds |
| Grading System | ✅ Correct | Deterministic seeding, weighted scoring, pass constraints |
| Heuristic Agent | ✅ Correct | 8-rule policy matches between `inference.py` and `heuristic_agent.py` |
| Task Configs | ✅ Correct | Monotonic difficulty progression across all 8 new parameters |
| Docker Setup | ✅ Correct | Proper non-root user, healthcheck, port exposure |
| OpenEnv YAML | ✅ Correct | Matches implementation: tasks, tools, action/observation spaces |
| Test Suite | ✅ Correct | 14 tests covering generators, rewards, threats, environment, graders |
| Client | ✅ Correct | Clean REST client matching all server endpoints |

Proposed Changes

1. Ollama Qwen 3.5 Fallback in `inference.py`

The user wants the fallback to use Ollama with Qwen 3.5 (most likely qwen3:0.6b or qwen2.5:3b, possibly via Ollama cloud). I'll update inference.py to:

- Add Ollama as the fallback LLM provider when the primary API fails
- Use qwen2.5:3b, the closest widely available Ollama model to the requested "qwin3.5"
- Keep the heuristic as the final safety-net fallback

Clarification needed: I'm interpreting "qwin3.5 cloud" as Qwen 2.5 3B served by a local Ollama server (http://localhost:11434/v1). Ollama exposes an OpenAI-compatible API, so the existing OpenAI client library can be reused with a different base URL.

If you mean a different model (e.g., qwen3:0.6b, qwen2.5:7b, or a cloud-hosted Qwen endpoint), please let me know and I'll adjust.

[MODIFY] inference.py

- Add OLLAMA_BASE_URL and OLLAMA_MODEL environment variables with defaults
- Create a secondary OpenAI client pointing to Ollama
- In get_action(), on primary API failure → try Ollama → then heuristic
- Fix API_BASE_URL and API_KEY to use os.getenv() with defaults
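
The three-tier order described above (primary API, then Ollama, then heuristic) can be sketched as a simple provider chain. This is an illustration under assumptions, not the actual get_action() in inference.py: the signature, the `(action, source)` return value, and the three stand-in provider functions are all hypothetical, and in the real module the providers would wrap the two OpenAI clients.

```python
from typing import Any, Callable, Dict, List, Tuple

Observation = Dict[str, Any]
Provider = Callable[[Observation], str]


def get_action(
    observation: Observation,
    providers: List[Tuple[str, Provider]],
    heuristic: Provider,
) -> Tuple[str, str]:
    """Try each LLM provider in order; use the heuristic if every one fails.

    Returns (action, source) so callers can log which tier answered.
    """
    for name, provider in providers:
        try:
            return provider(observation), name
        except Exception as exc:  # network error, bad response, missing config
            print(f"[fallback] {name} failed: {exc!r}")
    return heuristic(observation), "heuristic"


# Hypothetical stand-ins for the real clients.
def primary_llm(obs: Observation) -> str:
    raise ConnectionError("primary API unreachable")


def ollama_llm(obs: Observation) -> str:
    return "INSPECT"


def heuristic_policy(obs: Observation) -> str:
    return "BLOCK"


if __name__ == "__main__":
    chain = [("primary", primary_llm), ("ollama", ollama_llm)]
    print(get_action({"session_id": "s1"}, chain, heuristic_policy))
```

Keeping the heuristic outside the provider list means an empty or fully failing chain still yields an action.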

2. Fix `NetworkStatsResponse` model mismatch

[MODIFY] models.py

Add the missing fields to NetworkStatsResponse: false_flag_accuracy, stealth_detection_rate, burst_ticks, false_flags_seen, stealth_attacks_seen, config_params
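
One way to keep the model and get_network_stats() from drifting apart again is a small check that lists payload keys the model does not declare. The sketch below uses a plain annotated class as a stand-in rather than Pydantic; `NetworkStatsStandIn`, its two fields, and `undeclared_fields` are hypothetical, and the payload values are placeholders. It reads `__annotations__`, so it only sees fields declared directly on the class.

```python
from typing import Any, Dict, Set


class NetworkStatsStandIn:
    """Hypothetical stand-in; the real model's existing fields are not listed in this plan."""
    total_sessions: int
    blocked_sessions: int


def undeclared_fields(model_cls: type, payload: Dict[str, Any]) -> Set[str]:
    """Return payload keys that the model class does not declare as annotations."""
    declared = set(getattr(model_cls, "__annotations__", {}))
    return set(payload) - declared


# Placeholder payload using the six field names reported as missing.
payload = {
    "total_sessions": 10,
    "blocked_sessions": 3,
    "false_flag_accuracy": 0.0,
    "stealth_detection_rate": 0.0,
    "burst_ticks": 0,
    "false_flags_seen": 0,
    "stealth_attacks_seen": 0,
    "config_params": {},
}

if __name__ == "__main__":
    print(sorted(undeclared_fields(NetworkStatsStandIn, payload)))
```

A test asserting that this set is empty would fail as soon as the environment adds a stat that the response model lacks.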

3. Fix INSPECT duplicate session bug

[MODIFY] firewall_environment.py

Remove the line that re-adds the session to pending_sessions after INSPECT (line 414). The session should only be in inspected_sessions during the inspection phase.
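
The effect of removing that line can be illustrated with two plain dicts standing in for the session pools. This is a simplified model, not the code in firewall_environment.py: the `inspect` function, the `readd_to_pending` flag, and the assumption that the count sums both pools without deduplicating are all illustrative.

```python
from typing import Any, Dict

Session = Dict[str, Any]


def inspect(
    pending: Dict[str, Session],
    inspected: Dict[str, Session],
    session_id: str,
    readd_to_pending: bool,
) -> None:
    """Move a session out of the pending pool on INSPECT.

    readd_to_pending=True mimics the current behavior (the line marked BUG);
    readd_to_pending=False mimics the proposed fix.
    """
    session = pending.pop(session_id)
    if session_id not in inspected:
        inspected[session_id] = session
        if readd_to_pending:
            pending[session_id] = session


def naive_session_count(pending: Dict[str, Session], inspected: Dict[str, Session]) -> int:
    # Assumption: the count sums both pools without deduplicating.
    return len(pending) + len(inspected)


def run(readd_to_pending: bool) -> int:
    """Inspect a single session and report how many sessions are counted."""
    pending: Dict[str, Session] = {"s1": {"label": "unknown"}}
    inspected: Dict[str, Session] = {}
    inspect(pending, inspected, "s1", readd_to_pending)
    return naive_session_count(pending, inspected)


if __name__ == "__main__":
    print("with re-add:", run(True))
    print("without re-add:", run(False))
```

With the re-add, one real session is counted in both pools; without it, the session is counted once.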

Open Questions

Qwen model version: I'm defaulting to qwen2.5:3b via Ollama at http://localhost:11434/v1. Should I use a different model name or a remote Ollama endpoint URL?

API_KEY fallback: Should the Ollama fallback use "ollama" as the API key (Ollama doesn't require one), or do you have a specific cloud-hosted Ollama endpoint that needs authentication?
