opnsense-agent-phi35: a LoRA that gives Phi-3 mini an OPNsense brain
"Did my firewall just get AI?"
Sort of, yes, but not the way you might think. This is a LoRA adapter
on top of the 3.8 B parameter unsloth/Phi-3-mini-4k-instruct,
trained to emit structured JSON tool calls for the
OPNsense REST API (firewall rules, NAT,
WireGuard, Suricata, Unbound, IPsec, OpenVPN, traffic shaping, ACME,
cron, monit, diagnostics…).
It does not chat about firewalls; it acts on them. You feed it an admin intent in natural language, and it picks the right OPNsense endpoint and produces a well-formed argument blob. A surrounding agent (provided in the training repo) then executes the call with the usual safety rails (scope confirmation, read/write separation, audit log).
⚠️ Note on naming: the repository name keeps the
_phi35 suffix for historical reasons, but the actual base model is Phi-3 mini 4k, not Phi-3.5. A Phi-3.5 variant is in the backlog.
TL;DR
| Property | Value |
|---|---|
| Base model | unsloth/Phi-3-mini-4k-instruct |
| Adapter type | LoRA (PEFT 0.18+) |
| Rank / alpha | r = 64, lora_alpha = 128 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training samples | 13 701 SFT (combined dataset) |
| Final eval loss | 0.29 (run v7) |
| Verification | 102 / 102 OPNsense functions on CAP v1 packets (see Team play) |
| Context length | 4 096 |
| License | MIT (matches Phi-3) |
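To put the rank and target modules in the table above in perspective, the sketch below estimates how many trainable parameters such an adapter adds. The layer dimensions (hidden size 3072, MLP size 8192, 32 layers) come from the public Phi-3 mini configuration and are an assumption here, not something this card states; the arithmetic also assumes the un-fused projection layout implied by the target module names.

```python
# Back-of-the-envelope size of the adapter described in the table above.
# ASSUMPTION (not stated in this card): Phi-3 mini uses hidden size 3072,
# MLP intermediate size 8192 and 32 decoder layers.
HIDDEN, INTERMEDIATE, LAYERS = 3072, 8192, 32
RANK, ALPHA = 64, 128  # from the TL;DR table

# (in_features, out_features) of each targeted projection, un-fused layout.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per wrapped linear layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"{total:,} trainable parameters, scaling factor {scaling}")
```

Under these assumptions the adapter adds roughly 120 M trainable parameters, a small fraction of the 3.8 B parameter base model.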
What it can do
The training set covers 102 distinct OPNsense API functions across all the moving parts of a typical firewall deployment:
- Filter rules: list / create / toggle / delete pf rules
- NAT: port-forward, outbound NAT, 1:1
- WireGuard: instances, peers, key rotation
- OpenVPN: server/client, certificates
- IPsec: phase1/phase2, mobile clients
- Routing: static routes, default gateway
- Traffic shaper: pipes, queues, rules (QoS)
- Unbound DNS: overrides, blocklists, restart
- Suricata: toggle rules, reload, alert tail (IDS/IPS)
- ACME: Let's Encrypt certificates, renewal
- Monit: service status, restart
- Cron: list / schedule / toggle jobs
- Diagnostics: interfaces, ARP, sockets, gateway status
Requests outside the training set get either a polite refusal or hallucinated arguments. Always wrap the model with a tool whitelist on the client side.
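A client-side whitelist can be as small as a set of function names checked before anything is dispatched. The sketch below is one possible shape, not the reference agent's code: ALLOWED_TOOLS and filter_tool_calls are hypothetical names, and only get_cron_jobs is taken from this card.

```python
import json

# Hypothetical whitelist; extend it with the OPNsense functions you trust.
ALLOWED_TOOLS = {"get_cron_jobs"}


def filter_tool_calls(tool_calls, allowed=ALLOWED_TOOLS):
    """Split OpenAI-style tool calls into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for call in tool_calls:
        function = call.get("function", {})
        name = function.get("name")
        try:
            args = json.loads(function.get("arguments", "{}"))
        except (TypeError, json.JSONDecodeError):
            rejected.append((name, "arguments are not valid JSON"))
            continue
        if name not in allowed:
            rejected.append((name, "not in whitelist"))
        elif not isinstance(args, dict):
            rejected.append((name, "arguments must be a JSON object"))
        else:
            accepted.append((name, args))
    return accepted, rejected


accepted, rejected = filter_tool_calls([
    {"function": {"name": "get_cron_jobs", "arguments": "{}"}},
    {"function": {"name": "wipe_config", "arguments": "{}"}},  # made-up name
])
print(accepted, rejected)
```

Rejected calls are returned rather than raised so that the caller can log them or feed the reason back to the model.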
Deployment topology
The natural question is "can I just run this inside my OPNsense
box?" You technically can: FreeBSD ports include misc/llama-cpp
and a Q4_K_M quant of Phi-3 mini fits in ~2.5 GB of RAM. We
strongly recommend you don't. Here's why, and what to do instead.
Option A: embedded (NOT recommended)
┌───────────────────── OPNsense (FreeBSD) ─────────────────────┐
│  pf rules · NAT · WireGuard · Suricata · …                   │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  llama-server (Phi-3 + LoRA)   ← CPU contention        │  │
│  │  python agent harness          ← extra attack surface  │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
Problems: every inference (10 to 30 s on CPU) steals cycles from the
packet-forwarding path, you've widened the attack surface of an edge
router, and you're now responsible for a hand-built llama.cpp on
each OPNsense upgrade.
Option B: sidecar VM (recommended)
┌── debian-llm VM ───┐        REST/HTTPS         ┌── OPNsense ───┐
│ llama-server       │ ──── + API key ─────────► │ pf · NAT · …  │
│ + Phi-3 + LoRA     │ ◄─── 200/4xx/5xx/JSON ─── │ /api/...      │
│ + agent harness    │                           │               │
└─────────┬──────────┘                           └───────────────┘
          │
          ▼
  admin chat / scripts
OPNsense stays minimal and audited. The LLM runs on a Debian/Ubuntu
VM with whatever resources you can afford to give it (CPU-only is
fine; GPU optional). The agent talks to OPNsense exclusively via its
authenticated REST API, with a scope_confirmed guard before any
mutating call.
This is the topology used by the reference
opnsense-wg-agent.py
which drives this LoRA in production.
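As a rough illustration of the sidecar pattern, the sketch below prepares an authenticated request to the OPNsense API and refuses to build a mutating one unless scope_confirmed is set. It is not taken from opnsense-wg-agent.py: the function name, the host, and the example path are hypothetical, and treating "has a JSON body" as "mutating" is a simplifying assumption. The one grounded detail is that OPNsense authenticates API calls with a key and secret sent as HTTP Basic credentials.

```python
import base64
import json
import urllib.request


def build_opnsense_request(base_url, path, api_key, api_secret,
                           payload=None, scope_confirmed=False):
    """Prepare (but do not send) an authenticated OPNsense API request."""
    mutating = payload is not None  # assumption: a body means a write
    if mutating and not scope_confirmed:
        raise PermissionError("mutating call requires scope_confirmed=True")

    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if mutating:
        data = json.dumps(payload).encode()
        headers["Content-Type"] = "application/json"

    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers=headers,
        method="POST" if mutating else "GET",
    )


# Hypothetical host and path, for illustration only.
req = build_opnsense_request(
    "https://fw.example/", "/api/example/status/get", "key", "secret"
)
print(req.get_method(), req.full_url)
```

Sending the prepared request (for instance with urllib.request.urlopen and a properly verified TLS context) is left to the caller, which keeps the guard testable without a firewall.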
Team play: part of an agentic SOC
This LoRA was not trained in isolation. It is one of three specialist agents inside an agentic SOC built around a coordinator-pilot pattern. A natural-language request like "block all Chinese IPs that scanned port 22 in the last hour" gets:
- Parsed by the coordinator's pilot agent (a larger reasoning LLM, e.g. Qwen 2.5 7B-Instruct).
- Decomposed into a plan of one or more CAP v1 packets (Coordinator-Agent Packet: a typed JSON envelope with a directive, named entities, and arguments).
- Dispatched to the right specialist:
  - OPNsense agent (this LoRA): firewall rules, NAT, VPN…
  - WireGuard agent: peer onboarding, key rotation
  - CrowdSec agent: decision lists, bouncers, scenarios
- Synthesised back into a single human-readable report.
            natural-language admin request
                           │
                           ▼
       ┌───────────────────────────────────────┐
       │   Coordinator / Pilot (Qwen 2.5 7B)   │
       │      plan ▸ execute ▸ synthesise      │
       └───────────────────┬───────────────────┘
                           │ CAP v1 JSON
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
   ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
   │  OPNsense    │ │  WireGuard   │ │  CrowdSec    │
   │  LoRA (this) │ │  LoRA        │ │  LoRA        │
   │  Phi-3 mini  │ │  Phi-3 mini  │ │  Phi-3 mini  │
   └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
          │                │                │
          ▼                ▼                ▼
    OPNsense REST   /etc/wireguard    CrowdSec LAPI
The CAP v1 packet
Each specialist agent consumes a CAP v1 packet, a normalised envelope produced by the coordinator. Example:
{
"directive": "block_ip",
"entities": {
"IP_ADDRESS": ["203.0.113.42"],
"INTERFACE": ["wan"],
"PORT_NUMBER": [],
"HOSTNAME": [],
"IP_SUBNET": []
},
"args": {"action": "block"},
"context": {"source": "coordinator", "run_id": "plan-abc-1234", "confidence": 0.97}
}
The OPNsense LoRA's job is then to map the directive to the right OPNsense
tool call, with entities and args projected into the call's parameters.
This is the format that production traffic actually uses, which is much
narrower than free-form NL prompting, and the model is trained on
both representations (~40 % CAP v1 / ~60 % NL → tool-call) in the
combined dataset.
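To make the "projection" concrete, here is a deterministic sketch of the kind of mapping the LoRA is trained to perform. In the real system the model learns this mapping; the lookup table, the choice of create_filter_rule for block_ip, and the parameter names source and interface are all illustrative assumptions.

```python
import json

# Hypothetical directive -> tool table; the real mapping is learned by the LoRA.
DIRECTIVE_TO_TOOL = {"block_ip": "create_filter_rule"}


def project_cap_packet(packet):
    """Turn a CAP v1 packet into an OpenAI-style tool call dict."""
    tool = DIRECTIVE_TO_TOOL[packet["directive"]]
    entities = packet.get("entities", {})
    arguments = dict(packet.get("args", {}))
    # Parameter names below are assumptions made for this example.
    if entities.get("IP_ADDRESS"):
        arguments["source"] = entities["IP_ADDRESS"][0]
    if entities.get("INTERFACE"):
        arguments["interface"] = entities["INTERFACE"][0]
    return {
        "type": "function",
        "function": {
            "name": tool,
            "arguments": json.dumps(arguments, sort_keys=True),
        },
    }


packet = {
    "directive": "block_ip",
    "entities": {"IP_ADDRESS": ["203.0.113.42"], "INTERFACE": ["wan"],
                 "PORT_NUMBER": [], "HOSTNAME": [], "IP_SUBNET": []},
    "args": {"action": "block"},
    "context": {"source": "coordinator", "run_id": "plan-abc-1234",
                "confidence": 0.97},
}
call = project_cap_packet(packet)
print(call["function"]["name"], call["function"]["arguments"])
```

Empty entity lists are simply skipped, which mirrors how the example packet carries PORT_NUMBER, HOSTNAME and IP_SUBNET without using them.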
Integration verification
scripts/verify_opnsense_v2.py exercises CAP v1 → tool_call on
the full 102-function surface (one CAP packet per function, with
realistic entity payloads):
| Run | Loss | CAP v1 verify | Coverage |
|---|---|---|---|
| v3 (Mar 1) | 0.32 | - | 92 functions |
| v5 (Mar 4) | 0.30 | ≥ 99 % | 99 functions |
| v6 (Mar 5) | 0.28 | 100 % (69/69 on CAP v2 sample) | 99 functions |
| v7 (Mar 7) | 0.2876 | 100 % (102/102) | 102 functions |
The lesson from the v1 → v7 cycles is in the training journal: the right verification target is the production format (CAP v1), not the free-form NL chat format. Once that was understood (around v5), targeted dataset augmentation became surgical instead of guesswork.
What "team play" means in practice
- The OPNsense agent can ask the coordinator for clarification when an entity is ambiguous (e.g. INTERFACE: ["wan", "opt1"]).
- The coordinator can chain CAP packets: "block IP X" may produce (1) CrowdSec.add_decision + (2) OPNsense.create_filter_rule to enforce the same intent at two layers.
- Each agent runs on its own port (3000/3001/3002), so the failure of one specialist does not poison the others: the coordinator marks the failed step and continues the plan.
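The "mark the failed step and continue" behaviour can be sketched in a few lines. This is not the coordinator's actual code: run_plan, the agent callables, and the report shape are hypothetical, and real agents would be reached over HTTP on their own ports rather than called in-process.

```python
def run_plan(steps, agents):
    """Run each (agent_name, packet) step; record failures and keep going."""
    report = []
    for agent_name, packet in steps:
        try:
            result = agents[agent_name](packet)
            report.append({"agent": agent_name, "status": "ok",
                           "result": result})
        except Exception as exc:  # one failed specialist must not abort the plan
            report.append({"agent": agent_name, "status": "failed",
                           "error": str(exc)})
    return report


def crowdsec_stub(packet):
    # Stand-in for an unreachable specialist.
    raise ConnectionError("CrowdSec agent unreachable")


def opnsense_stub(packet):
    # Stand-in for a specialist that answers normally.
    return {"tool": "create_filter_rule", "directive": packet["directive"]}


plan = [
    ("crowdsec", {"directive": "block_ip"}),
    ("opnsense", {"directive": "block_ip"}),
]
report = run_plan(plan, {"crowdsec": crowdsec_stub, "opnsense": opnsense_stub})
print([step["status"] for step in report])
```

The second step still runs even though the first one failed, which is the isolation property described in the list above.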
The full coordinator + agents architecture is documented in the
cyber-agent-engine
repository (see coordinator/README.md and AGENTS.md).
Quickstart: running it on a Debian VM
1. Build llama.cpp from source (b3813 or later)
Older builds break LoRA loading (specifically, b1-9c69907 silently
ignores --lora-init-without-apply, which causes the adapter to be
merged at the wrong moment and produces garbage for OPNsense tools).
sudo apt install -y build-essential cmake git git-lfs
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && git checkout b3813
cmake -B build -DGGML_CUDA=OFF   # use -DGGML_CUDA=ON instead if you have an NVIDIA GPU
cmake --build build -j"$(nproc)"
2. Get the base model as GGUF
mkdir -p ~/models
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf \
Phi-3-mini-4k-instruct-q4.gguf \
--local-dir ~/models/phi-3-mini
(Or any other Q4 / Q5 / Q8 quant of Phi-3-mini-4k-instruct you
trust.)
3. Clone this LoRA
git lfs install
git clone https://huggingface.co/patlegu/opnsense-agent-phi35 \
~/loras/opnsense-agent-phi35
4. Convert the LoRA to GGUF (one-time)
llama-server wants GGUF, not raw safetensors:
cd ~/llama.cpp
python convert_lora_to_gguf.py \
--base ~/models/phi-3-mini \
~/loras/opnsense-agent-phi35
# produces opnsense-agent-phi35-F16-LoRA.gguf next to the adapter
5. Run llama-server with the LoRA applied at load time
~/llama.cpp/build/bin/llama-server \
-m ~/models/phi-3-mini/Phi-3-mini-4k-instruct-q4.gguf \
--lora ~/loras/opnsense-agent-phi35/opnsense-agent-phi35-F16-LoRA.gguf \
--host 0.0.0.0 --port 8080 \
--ctx-size 4096 \
-t "$(nproc)"
Add -ngl 33 if compiled with CUDA: this offloads all 32 layers + output
to the GPU, and you'll get ~80 tok/s on a 24 GB card vs ~7 tok/s on
CPU.
6. Smoke test
curl -s http://localhost:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "phi-3-mini",
"messages": [
{"role": "system", "content": "You are an OPNsense agent."},
{"role": "user", "content": "List all scheduled cron jobs."}
],
"tools": [{
"type": "function",
"function": {
"name": "get_cron_jobs",
"description": "Get list of all scheduled Cron jobs",
"parameters": {"type": "object", "properties": {}, "required": []}
}
}],
"tool_choice": "auto"
}' | jq '.choices[0].message.tool_calls'
Expected output: a single tool_calls entry pointing at
get_cron_jobs with arguments: "{}".
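If you would rather check the smoke test from a script than by eye, a small helper can pull the first tool call out of the response. The sketch assumes the OpenAI-compatible response shape used in the curl command above; extract_tool_call is a hypothetical name, and the sample response is hand-written to match the expected output, not captured from a server.

```python
import json


def extract_tool_call(response):
    """Return (name, arguments dict) for the first tool call, or None."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if not calls:
        return None
    function = calls[0]["function"]
    return function["name"], json.loads(function["arguments"] or "{}")


# Hand-written sample shaped like the expected smoke-test output.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "get_cron_jobs", "arguments": "{}"},
            }],
        }
    }]
}
print(extract_tool_call(sample))
```

A None return means the model answered in plain text instead of calling a tool, which is worth treating as a failed smoke test.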
Tool-calling format
The LoRA was trained on OpenAI-style messages with tools and
tool_calls. A typical training example looks like:
{
"messages": [
{"role": "user", "content": "Retrieve all scheduled Cron jobs."},
{"role": "assistant", "content": null, "tool_calls": [{
"id": "call_g2zmJfQcE",
"type": "function",
"function": {"name": "get_cron_jobs", "arguments": "{}"}
}]},
{"role": "tool", "tool_call_id": "call_g2zmJfQcE",
"content": "{\"status\":\"success\",\"details\":\"…\"}"},
{"role": "assistant", "content": "The following cron jobs are…"}
],
"tools": [{
"type": "function",
"function": {"name": "get_cron_jobs",
"description": "Get list of all scheduled Cron jobs",
"parameters": {"type":"object","properties":{},"required":[]}}
}]
}
A minimal agent loop:
- You: build the tools list (whitelist of safe OPNsense functions).
- Model: returns a tool_calls[] block.
- You: dispatch the call to your OPNsense API client, capture the result.
- You: append a role: "tool" message with the result.
- Re-prompt: the model summarises the result in natural language.
A reference Python implementation lives in agents/opnsense/ of the
training repository.
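The loop above can be sketched without any network code by passing the model and the API client in as callables. This is an illustration under stated assumptions, not the reference implementation in agents/opnsense/: agent_turn, ask_model and dispatch are hypothetical names, and the stubs stand in for llama-server and the OPNsense client.

```python
import json


def agent_turn(messages, tools, ask_model, dispatch):
    """One pass: model -> tool call(s) -> tool result(s) -> summary."""
    allowed = {tool["function"]["name"] for tool in tools}
    reply = ask_model(messages, tools)
    messages.append(reply)
    tool_calls = reply.get("tool_calls") or []
    for call in tool_calls:
        name = call["function"]["name"]
        if name not in allowed:
            result = {"status": "error",
                      "details": f"tool {name!r} is not whitelisted"}
        else:
            result = dispatch(name, json.loads(call["function"]["arguments"]))
        messages.append({"role": "tool", "tool_call_id": call["id"],
                         "content": json.dumps(result)})
    if tool_calls:
        # Re-prompt so the model can summarise the tool results.
        messages.append(ask_model(messages, tools))
    return messages


def stub_model(messages, tools):
    # Stand-in for llama-server: call the tool first, then summarise.
    if messages[-1]["role"] == "user":
        return {"role": "assistant", "content": None, "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "get_cron_jobs", "arguments": "{}"}}]}
    return {"role": "assistant", "content": "No cron jobs are scheduled."}


def stub_dispatch(name, arguments):
    # Stand-in for the OPNsense API client.
    return {"status": "success", "details": []}


tools = [{"type": "function", "function": {
    "name": "get_cron_jobs",
    "description": "Get list of all scheduled Cron jobs",
    "parameters": {"type": "object", "properties": {}, "required": []}}}]
history = agent_turn(
    [{"role": "user", "content": "List all scheduled cron jobs."}],
    tools, stub_model, stub_dispatch)
print([message["role"] for message in history])
```

Deriving the allowed names from the tools list keeps the whitelist and the prompt in sync: a tool the model was never offered can never be dispatched.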
Limitations & safety
- Tool selection only: the model does not execute anything on the firewall. Make sure your agent loop applies a scope_confirmed: bool guard before any mutating call (rule create/delete, service restart, etc.).
- Argument hallucination: for tools outside the training set, the model will happily make up plausible-looking parameters. Whitelist your tools and validate every argument server-side.
- Not an autonomous decision-maker: treat it as an admin shortcut ("which API do I call to do X?"), not as a replacement for a human reviewing the change before it lands.
- English-first, French-aware: the training set is mostly English with some French. Other languages will degrade fast.
- llama.cpp version pin: LoRA loading is broken in some older llama.cpp builds. Use b3813 or later.
- License of the base model still applies. Phi-3 mini is MIT, but you're responsible for downstream compliance as defined in Microsoft's model card.
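For the argument-validation point in the list above, even a minimal check against the tool's own parameters schema catches the most common hallucinations (missing required arguments and invented ones). The sketch below is deliberately shallow and only looks at top-level keys; validate_arguments is a hypothetical helper, and a full JSON Schema validator would be the more thorough choice.

```python
def validate_arguments(tool, arguments):
    """Return a list of problems; an empty list means the call looks acceptable."""
    schema = tool["function"]["parameters"]
    properties = schema.get("properties", {})
    problems = [
        f"missing required argument: {key}"
        for key in schema.get("required", [])
        if key not in arguments
    ]
    problems += [
        f"unexpected argument: {key}"
        for key in arguments
        if key not in properties
    ]
    return problems


cron_tool = {"type": "function", "function": {
    "name": "get_cron_jobs",
    "description": "Get list of all scheduled Cron jobs",
    "parameters": {"type": "object", "properties": {}, "required": []}}}

print(validate_arguments(cron_tool, {}))
print(validate_arguments(cron_tool, {"force": True}))  # invented argument
```

A non-empty result is a reason to refuse the call or send the problems back to the model, never to "fix up" the arguments silently.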
Training details
| Setting | Value |
|---|---|
| Base | unsloth/phi-3-mini-4k-instruct-bnb-4bit (4-bit base + LoRA) |
| Epochs | 3 |
| Batch | 2 × 8 grad accum (= 16 effective) |
| Learning rate | 2e-4, cosine, 10 % warmup |
| Sequence length | 4 096 |
| Optimizer | adamw_8bit (bf16) |
| Trainer | Unsloth + TRL SFT |
| Dataset | 13 701 SFT examples (combined: base + IDS + traffic-shaping + ACME + IPsec/OpenVPN) |
| Final eval loss | 0.2876 |
| Verification | 102 / 102 functions answered correctly on verify_opnsense_v2.py |
The training script and dataset generation pipeline live in the
cyber-agent-engine
repository: scripts/train_opnsense_lora.py and
data/sft/opnsense_combined_train.jsonl.
Acknowledgements
- Microsoft Phi-3: base model
- Unsloth: fast LoRA + 4-bit base
- OPNsense: the firewall this is built around
- llama.cpp: inference runtime
Citation
If you use this adapter in research or production, please cite the parent project:
@software{opnsense_agent_phi35_2026,
author = {Le Guyader, P. and contributors},
title = {opnsense-agent-phi35: a Phi-3 LoRA for OPNsense tool-calling},
year = {2026},
url = {https://huggingface.co/patlegu/opnsense-agent-phi35}
}