opnsense-agent-phi35 — a LoRA that gives Phi-3 mini an OPNsense brain

"Did my firewall just get AI?"

Sort of, yes, but not the way you might think. This is a LoRA adapter for the 3.8 B parameter unsloth/Phi-3-mini-4k-instruct, trained to emit structured JSON tool calls for the OPNsense REST API (firewall rules, NAT, WireGuard, Suricata, Unbound, IPsec, OpenVPN, traffic shaping, ACME, cron, monit, diagnostics…).

It does not chat about firewalls — it acts on them. You feed it an admin intent in natural language, it picks the right OPNsense endpoint and produces a well-formed argument blob. A surrounding agent (provided in the training repo) then executes the call with the usual safety rails (scope confirmation, read/write separation, audit log).

⚠️ Note on naming: the repository name keeps the -phi35 suffix for historical reasons, but the actual base model is Phi-3 mini 4k, not Phi-3.5. A Phi-3.5 variant is in the backlog.

TL;DR

| Property | Value |
| --- | --- |
| Base model | `unsloth/Phi-3-mini-4k-instruct` |
| Adapter type | LoRA (PEFT 0.18+) |
| Rank / alpha | `r = 64`, `lora_alpha = 128` |
| Target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Training samples | 13 701 SFT (combined dataset) |
| Final eval loss | 0.29 (run v7) |
| Verification | 102 / 102 OPNsense functions on CAP v1 packets (see Team play) |
| Context length | 4 096 |
| License | MIT (matches Phi-3) |
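
To give a feel for how large the adapter is relative to the 3.8 B base, the sketch below estimates the number of trainable LoRA weights from the rank and target modules listed above. The layer dimensions (hidden size 3072, MLP size 8192, 32 layers, no grouped-query attention) are assumptions about Phi-3 mini and are not stated in this card.

```python
# Rough LoRA size estimate. Each adapted linear layer of shape (d_in, d_out)
# gains two low-rank matrices, A (r x d_in) and B (d_out x r).
R, ALPHA = 64, 128                              # from the table above
HIDDEN, INTERMEDIATE, LAYERS = 3072, 8192, 32   # ASSUMED Phi-3 mini dimensions

SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, r, layers):
    """Trainable weights added by LoRA across all transformer layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * layers


total = lora_param_count(SHAPES, R, LAYERS)
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"{total:,} trainable LoRA weights, scaling {scaling}")
```

Under those assumptions the adapter holds roughly 120 M trainable weights, a few percent of the base model.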

What it can do

The training set covers 102 distinct OPNsense API functions across all the moving parts of a typical firewall deployment:

- Filter rules: list / create / toggle / delete pf rules
- NAT: port-forward, outbound NAT, 1:1
- WireGuard: instances, peers, key rotation
- OpenVPN: server/client, certificates
- IPsec: phase1/phase2, mobile clients
- Routing: static routes, default gateway
- Traffic shaper: pipes, queues, rules (QoS)
- Unbound DNS: overrides, blocklists, restart
- Suricata: toggle rules, reload, alert tail (IDS/IPS)
- ACME: Let's Encrypt certificates, renewal
- Monit: service status, restart
- Cron: list / schedule / toggle jobs
- Diagnostics: interfaces, ARP, sockets, gateway status

Requests that fall outside the training set produce either a polite refusal or hallucinated arguments, so always enforce a tool whitelist on the client side.
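
A client-side whitelist can be as small as the sketch below. It assumes the model returns OpenAI-style `tool_calls` entries; the `ALLOWED_TOOLS` set and the `filter_tool_calls` helper are illustrative names, not part of the reference agent.

```python
import json

# Hypothetical allowlist: expose only the tools you are willing to execute.
ALLOWED_TOOLS = {"get_cron_jobs"}


def filter_tool_calls(tool_calls, allowed=ALLOWED_TOOLS):
    """Split model-proposed tool calls into accepted and rejected lists."""
    accepted, rejected = [], []
    for call in tool_calls:
        function = call.get("function", {})
        name = function.get("name")
        try:
            # OpenAI-style calls carry their arguments as a JSON string.
            args = json.loads(function.get("arguments", "{}"))
        except json.JSONDecodeError:
            rejected.append((name, "arguments are not valid JSON"))
            continue
        if name not in allowed:
            rejected.append((name, "tool not in whitelist"))
            continue
        accepted.append((name, args))
    return accepted, rejected
```

Rejected calls should be logged and never forwarded to the firewall.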

Deployment topology

The natural question is "can I just run this inside my OPNsense box?" You technically can — FreeBSD ports include misc/llama-cpp and a Q4_K_M quant of Phi-3 mini fits in ~2.5 GB of RAM. We strongly recommend you don't. Here's why, and what to do instead.

Option A — embedded (NOT recommended)

┌──────────────────── OPNsense (FreeBSD) ────────────────────┐
│  pf rules · NAT · WireGuard · Suricata · …                 │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ llama-server (Phi-3 + LoRA)  ← CPU contention        │  │
│  │ python agent harness         ← extra attack surface  │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘

Problems: every inference (10–30 s on CPU) steals cycles from the packet-forwarding path, you've widened the attack surface of an edge router, and you're now responsible for a hand-built llama.cpp on each OPNsense upgrade.

Option B — sidecar VM (recommended)

┌── debian-llm VM ──┐         REST/HTTPS         ┌── OPNsense ──┐
│ llama-server      │ ─────  + API key   ─────►  │ pf · NAT · … │
│ + Phi-3 + LoRA    │ ◄── 200/4xx/5xx/JSON ───── │ /api/...     │
│ + agent harness   │                            │              │
└───────────────────┘                            └──────────────┘
        │
        ▼
   admin chat / scripts

OPNsense stays minimal and audited. The LLM runs on a Debian/Ubuntu VM with whatever resources you can afford to give it (CPU-only is fine; GPU optional). The agent talks to OPNsense exclusively via its authenticated REST API, with a scope_confirmed guard before any mutating call.

This is the topology used by the reference opnsense-wg-agent.py which drives this LoRA in production.
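
As a rough illustration of the sidecar talking to the firewall, the sketch below builds (but does not send) an authenticated request and refuses writes unless the scope was confirmed. It assumes OPNsense's usual API authentication, where the key and secret are sent as HTTP Basic credentials. The host name, the `module/controller/command` path, and the function and exception names are placeholders, not taken from the reference agent.

```python
import base64
import urllib.request


class ScopeNotConfirmed(Exception):
    """Raised when a mutating call is attempted without confirmation."""


def build_opnsense_request(base_url, path, api_key, api_secret,
                           mutating=False, scope_confirmed=False, body=b"{}"):
    # Guard: never build a write request unless the operator confirmed it.
    if mutating and not scope_confirmed:
        raise ScopeNotConfirmed(f"refusing mutating call to {path}")
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    if mutating:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/{path.lstrip('/')}",
        data=body if mutating else None,
        method="POST" if mutating else "GET",
        headers=headers,
    )


# Placeholder host and path; send with urllib.request.urlopen(req) when ready.
req = build_opnsense_request(
    "https://opnsense.example", "module/controller/command", "KEY", "SECRET"
)
```

Keeping the guard inside the request builder means no code path can reach a write endpoint without passing through it.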

Team play — part of an agentic SOC

This LoRA was not trained in isolation. It is one of three specialist agents inside an agentic SOC built around a coordinator-pilot pattern. A natural-language request like "block all Chinese IPs that scanned port 22 in the last hour" goes through four stages:

1. Parsed by the coordinator's pilot agent (a larger reasoning LLM, e.g. Qwen 2.5 7B-Instruct).
2. Decomposed into a plan of one or more CAP v1 packets (Coordinator-Agent Packet: a typed JSON envelope with a directive, named entities, and arguments).
3. Dispatched to the right specialist:
   - OPNsense agent (this LoRA): firewall rules, NAT, VPN…
   - WireGuard agent: peer onboarding, key rotation
   - CrowdSec agent: decision lists, bouncers, scenarios
4. Synthesised back into a single human-readable report.

                    natural-language admin request
                                    │
                                    ▼
                ┌───────────────────────────────────────┐
                │  Coordinator / Pilot (Qwen 2.5 7B)    │
                │  plan ▸ execute ▸ synthesise          │
                └───────────────┬───────────────────────┘
                                │  CAP v1 JSON
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
        ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
        │  OPNsense    │ │  WireGuard   │ │  CrowdSec    │
        │  LoRA (this) │ │  LoRA        │ │  LoRA        │
        │  Phi-3 mini  │ │  Phi-3 mini  │ │  Phi-3 mini  │
        └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
               │                │                │
               ▼                ▼                ▼
        OPNsense REST    /etc/wireguard       CrowdSec LAPI

The CAP v1 packet

Each specialist agent consumes a CAP v1 packet — a normalised envelope produced by the coordinator. Example:

{
  "directive": "block_ip",
  "entities": {
    "IP_ADDRESS":  ["203.0.113.42"],
    "INTERFACE":   ["wan"],
    "PORT_NUMBER": [],
    "HOSTNAME":    [],
    "IP_SUBNET":   []
  },
  "args":    {"action": "block"},
  "context": {"source": "coordinator", "run_id": "plan-abc-1234", "confidence": 0.97}
}

The OPNsense LoRA's job is then to map directive → the right OPNsense tool call, with entities and args projected into the call's parameters. This is the format that production traffic actually uses — much narrower than free-form NL prompting, and the model is trained on both representations (~40 % CAP v1 / ~60 % NL → tool-call) in the combined dataset.
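
To make the shape of that projection concrete, here is a deterministic stand-in for what the LoRA learns to do. In production the model performs this mapping; the `DIRECTIVE_TO_TOOL` table, the lower-cased parameter names, and the `project_packet` helper are assumptions made for illustration only.

```python
REQUIRED_KEYS = {"directive", "entities", "args", "context"}

# Hypothetical directive -> tool mapping, for illustration only.
DIRECTIVE_TO_TOOL = {"block_ip": "create_filter_rule"}


def project_packet(packet):
    """Validate a CAP v1 packet and project it into a tool-call dict."""
    missing = REQUIRED_KEYS - packet.keys()
    if missing:
        raise ValueError(f"CAP packet missing keys: {sorted(missing)}")
    tool = DIRECTIVE_TO_TOOL.get(packet["directive"])
    if tool is None:
        raise ValueError(f"unknown directive: {packet['directive']}")
    # Keep only non-empty entity lists; parameter names are lower-cased here.
    params = {k.lower(): v for k, v in packet["entities"].items() if v}
    params.update(packet["args"])
    return {"name": tool, "arguments": params}


packet = {
    "directive": "block_ip",
    "entities": {"IP_ADDRESS": ["203.0.113.42"], "INTERFACE": ["wan"],
                 "PORT_NUMBER": [], "HOSTNAME": [], "IP_SUBNET": []},
    "args": {"action": "block"},
    "context": {"source": "coordinator", "run_id": "plan-abc-1234",
                "confidence": 0.97},
}
call = project_packet(packet)
```

The same validation step is useful in front of the model, since a malformed packet is cheaper to reject before inference than after.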

Integration verification

scripts/verify_opnsense_v2.py exercises CAP v1 → tool_call on the full 102-function surface (one CAP packet per function, with realistic entity payloads):

| Run | Loss | CAP v1 verify | Coverage |
| --- | --- | --- | --- |
| v3 (Mar 1) | 0.32 | n/a | 92 functions |
| v5 (Mar 4) | 0.30 | ≥ 99 % | 99 functions |
| v6 (Mar 5) | 0.28 | 100 % (69/69 on CAP v2 sample) | 99 functions |
| v7 (Mar 7) | 0.2876 | 100 % (102/102) | 102 functions |

The lesson from the v1 → v7 cycles is in the training journal: the right verification target is the production format (CAP v1), not the free-form NL chat format. Once that was understood (around v5), targeted dataset augmentation became surgical instead of guesswork.

What "team play" means in practice

- The OPNsense agent can ask the coordinator for clarification when an entity is ambiguous (e.g. INTERFACE: ["wan", "opt1"]).
- The coordinator can chain CAP packets: "block IP X" may produce (1) CrowdSec.add_decision + (2) OPNsense.create_filter_rule to enforce the same intent at two layers.
- Each agent runs on its own port (3000/3001/3002), so the failure of one specialist does not poison the others: the coordinator marks the failed step and continues the plan.
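
The failure-isolation behaviour can be pictured with a small plan runner like the one below. It is only a sketch: the real coordinator lives in the cyber-agent-engine repository, and here each specialist is modelled as a plain Python callable rather than an HTTP service on its own port.

```python
def run_plan(steps, agents):
    """Run each (agent_name, packet) step; record failures and keep going."""
    report = []
    for agent_name, packet in steps:
        try:
            result = agents[agent_name](packet)
            report.append({"agent": agent_name, "status": "ok", "result": result})
        except Exception as exc:  # one failing specialist must not stop the plan
            report.append({"agent": agent_name, "status": "failed", "error": str(exc)})
    return report


def crowdsec_stub(packet):
    # Stand-in for a specialist that is currently unreachable.
    raise ConnectionError("agent unreachable")


def opnsense_stub(packet):
    # Stand-in for a specialist that answers with a tool call.
    return {"tool": "create_filter_rule"}


report = run_plan(
    [("crowdsec", {"directive": "block_ip"}), ("opnsense", {"directive": "block_ip"})],
    {"crowdsec": crowdsec_stub, "opnsense": opnsense_stub},
)
```

The resulting report keeps one entry per step, which is what the synthesis stage needs to tell the admin which layer was actually enforced.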

The full coordinator + agents architecture is documented in the cyber-agent-engine repository (see coordinator/README.md and AGENTS.md).

Quickstart — running it on a Debian VM

1. Build `llama.cpp` from source (b3813 or later)

Older builds break LoRA loading (specifically, b1-9c69907 silently ignores --lora-init-without-apply, which causes the adapter to be merged at the wrong moment and produces garbage for OPNsense tools).

sudo apt install -y build-essential cmake git git-lfs
# Clone into ~/llama.cpp, the path the later steps assume
git clone https://github.com/ggml-org/llama.cpp ~/llama.cpp
cd ~/llama.cpp && git checkout b3813
cmake -B build -DGGML_CUDA=OFF       # use -DGGML_CUDA=ON for an NVIDIA GPU build
cmake --build build -j"$(nproc)"

2. Get the base model as GGUF

mkdir -p ~/models
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf \
    Phi-3-mini-4k-instruct-q4.gguf \
    --local-dir ~/models/phi-3-mini

(Or any other Q4 / Q5 / Q8 quant of Phi-3-mini-4k-instruct you trust.)

3. Clone this LoRA

git lfs install
git clone https://huggingface.co/patlegu/opnsense-agent-phi35 \
    ~/loras/opnsense-agent-phi35

4. Convert the LoRA to GGUF (one-time)

llama-server wants GGUF, not raw safetensors:

cd ~/llama.cpp
python convert_lora_to_gguf.py \
    --base ~/models/phi-3-mini \
    ~/loras/opnsense-agent-phi35
# produces opnsense-agent-phi35-F16-LoRA.gguf next to the adapter

5. Run `llama-server` with the LoRA applied at load time

~/llama.cpp/build/bin/llama-server \
    -m ~/models/phi-3-mini/Phi-3-mini-4k-instruct-q4.gguf \
    --lora ~/loras/opnsense-agent-phi35/opnsense-agent-phi35-F16-LoRA.gguf \
    --host 0.0.0.0 --port 8080 \
    --ctx-size 4096 \
    -t "$(nproc)"

Add -ngl 33 if you built with CUDA. This offloads all 32 layers plus the output layer to the GPU, giving ~80 tok/s on a 24 GB card versus ~7 tok/s on CPU.

6. Smoke test

curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "phi-3-mini",
      "messages": [
        {"role": "system", "content": "You are an OPNsense agent."},
        {"role": "user",   "content": "List all scheduled cron jobs."}
      ],
      "tools": [{
        "type": "function",
        "function": {
          "name": "get_cron_jobs",
          "description": "Get list of all scheduled Cron jobs",
          "parameters": {"type": "object", "properties": {}, "required": []}
        }
      }],
      "tool_choice": "auto"
    }' | jq '.choices[0].message.tool_calls'

Expected output: a single tool_calls entry pointing at get_cron_jobs with arguments: "{}".
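
The same smoke test can be driven from Python with only the standard library. The sketch below mirrors the curl payload and checks the reply; `build_payload`, `extract_tool_calls`, and `ask` are illustrative helper names, and the URL assumes the server from step 5 is listening on localhost:8080.

```python
import json
import urllib.request

CRON_TOOL = {
    "type": "function",
    "function": {
        "name": "get_cron_jobs",
        "description": "Get list of all scheduled Cron jobs",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}


def build_payload(user_text, tools):
    """Build the same chat-completions body as the curl example."""
    return {
        "model": "phi-3-mini",
        "messages": [
            {"role": "system", "content": "You are an OPNsense agent."},
            {"role": "user", "content": user_text},
        ],
        "tools": tools,
        "tool_choice": "auto",
    }


def extract_tool_calls(response):
    """Return (name, parsed_arguments) pairs from a chat-completions reply."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
            for c in calls]


def ask(payload, url="http://localhost:8080/v1/chat/completions"):
    """POST the payload to llama-server and decode the JSON reply."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

With the server running, `extract_tool_calls(ask(build_payload("List all scheduled cron jobs.", [CRON_TOOL])))` should yield a single `get_cron_jobs` entry with empty arguments.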

Tool-calling format

The LoRA was trained on OpenAI-style messages with tools and tool_calls. A typical training example looks like:

{
  "messages": [
    {"role": "user", "content": "Retrieve all scheduled Cron jobs."},
    {"role": "assistant", "content": null, "tool_calls": [{
      "id": "call_g2zmJfQcE",
      "type": "function",
      "function": {"name": "get_cron_jobs", "arguments": "{}"}
    }]},
    {"role": "tool", "tool_call_id": "call_g2zmJfQcE",
     "content": "{\"status\":\"success\",\"details\":\"…\"}"},
    {"role": "assistant", "content": "The following cron jobs are…"}
  ],
  "tools": [{
    "type": "function",
    "function": {"name": "get_cron_jobs",
                 "description": "Get list of all scheduled Cron jobs",
                 "parameters": {"type":"object","properties":{},"required":[]}}
  }]
}

A minimal agent loop:

1. You: build the tools list (whitelist of safe OPNsense functions).
2. Model: returns a tool_calls[] block.
3. You: dispatch the call to your OPNsense API client, capture the result.
4. You: append a role: "tool" message with the result.
5. Re-prompt → the model summarises the result in natural language.

A reference Python implementation lives in agents/opnsense/ of the training repository.
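
For readers who want the loop in code before opening the repository, here is a compact sketch. It is not the reference implementation: `call_model` and `dispatch` are caller-supplied callables (for example, a wrapper around the chat-completions endpoint and your OPNsense API client), and `agent_turn` is an illustrative name.

```python
import json


def agent_turn(messages, tools, call_model, dispatch, allowed):
    """One user turn: ask the model, run whitelisted tools, re-prompt."""
    reply = call_model(messages, tools)
    messages.append(reply)
    tool_calls = reply.get("tool_calls") or []
    for call in tool_calls:
        name = call["function"]["name"]
        if name not in allowed:
            # Never execute a tool the client did not whitelist.
            result = {"status": "error", "details": "tool not whitelisted"}
        else:
            result = dispatch(name, json.loads(call["function"]["arguments"]))
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    if not tool_calls:
        return reply
    # Re-prompt so the model can summarise the tool results.
    final = call_model(messages, tools)
    messages.append(final)
    return final
```

Passing the model and the dispatcher in as callables keeps the loop testable with stubs, without a running llama-server or firewall.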

Limitations & safety

- Tool selection only: the model does not execute anything on the firewall. Make sure your agent loop applies a scope_confirmed: bool guard before any mutating call (rule create/delete, service restart, etc.).
- Argument hallucination: for tools outside the training set, the model will happily make up plausible-looking parameters. Whitelist your tools and validate every argument server-side.
- Not an autonomous decision-maker: treat it as an admin shortcut ("which API do I call to do X?"), not as a replacement for a human reviewing the change before it lands.
- English-first, French-aware: the training set is mostly English with some French. Output quality degrades quickly in other languages.
- llama.cpp version pin: LoRA loading is broken in some older llama.cpp builds. Use b3813 or later.
- The license of the base model still applies: Phi-3 mini is MIT, but you're responsible for downstream compliance as defined in Microsoft's model card.
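
Argument validation does not need a full JSON Schema engine to catch the most common hallucinations. The sketch below checks a call against the `parameters` object of its tool definition; it covers only required keys, unknown keys, and basic types, and `validate_arguments` is an illustrative helper rather than part of the reference agent.

```python
# JSON Schema type names mapped to Python types (a deliberately small subset).
TYPE_MAP = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}


def validate_arguments(args, schema):
    """Return a list of problems; an empty list means the arguments pass."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument: {key}")
            continue
        type_name = props[key].get("type")
        expected = TYPE_MAP.get(type_name)
        if expected is None:
            continue  # no type constraint this sketch understands
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: expected {type_name}")
    return problems
```

Run it after the whitelist check and before dispatch, and treat any non-empty result as a refusal.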

Training details

| Setting | Value |
| --- | --- |
| Base | `unsloth/phi-3-mini-4k-instruct-bnb-4bit` (4-bit base + LoRA) |
| Epochs | 3 |
| Batch | 2 × 8 grad accum (= 16 effective) |
| Learning rate | 2e-4, cosine, 10 % warmup |
| Sequence length | 4 096 |
| Optimizer | `adamw_8bit` (bf16) |
| Trainer | Unsloth + TRL SFT |
| Dataset | 13 701 SFT examples (combined: base + IDS + traffic-shaping + ACME + IPsec/OpenVPN) |
| Final eval loss | 0.2876 |
| Verification | 102 / 102 functions answered correctly on `verify_opnsense_v2.py` |
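
The settings above imply a training schedule that can be worked out by hand. The sketch below does the arithmetic under explicit assumptions: a single device, all 13 701 examples used for training (no held-out split), and a partial final batch counted as one step. The actual trainer may round differently, so treat the numbers as approximate.

```python
import math

SAMPLES, PER_DEVICE, GRAD_ACCUM, EPOCHS = 13_701, 2, 8, 3
BASE_LR, WARMUP_RATIO = 2e-4, 0.10

effective_batch = PER_DEVICE * GRAD_ACCUM               # examples per optimizer step
steps_per_epoch = math.ceil(SAMPLES / effective_batch)  # last partial batch counts
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run takes about 2 571 optimizer steps, the first 257 of which are warmup.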

The training script and dataset generation pipeline live in the cyber-agent-engine repository: scripts/train_opnsense_lora.py and data/sft/opnsense_combined_train.jsonl.

Acknowledgements

Microsoft Phi-3 — base model
Unsloth — fast LoRA + 4-bit base
OPNsense — the firewall this is built around
llama.cpp — inference runtime

Citation

If you use this adapter in research or production, please cite the parent project:

@software{opnsense_agent_phi35_2026,
  author  = {Le Guyader, P. and contributors},
  title   = {opnsense-agent-phi35: a Phi-3 LoRA for OPNsense tool-calling},
  year    = {2026},
  url     = {https://huggingface.co/patlegu/opnsense-agent-phi35}
}
