🛡️ MinimoSec V4
Fine-Tuned Cybersecurity LLM — Gemma 4 E4B
Cybersecurity-specialised language model for Portuguese-speaking analysts
📌 Model Description
MinimoSec V4 is a cybersecurity-specialised language model fine-tuned from Google Gemma 4 E4B using supervised fine-tuning (SFT) with Low-Rank Adaptation (LoRA) via the Unsloth framework.
The model was trained on 22,571 Portuguese-language cybersecurity examples covering threat analysis, malware identification, MITRE ATT&CK mapping, YARA rule generation, IOC extraction, and digital forensics. It is designed to assist security analysts, SOC teams, and researchers in Portuguese-speaking environments.
| Specification | Detail |
|---|---|
| Primary Language | Portuguese (pt-PT / pt-BR) |
| Domain | Cybersecurity, Threat Intelligence, Digital Forensics |
| Base Model | google/gemma-4-e4b-it |
| Training Epochs | 1 (V4-final with 3 epochs in development) |
| Quantisation Available | Q4_K_M GGUF (~5.3 GB) |
📊 CyberBench-Hard v1.0
Specialized Cybersecurity Benchmark for Small-Scale SFT Models
About the Benchmark
CyberBench-Hard is a specialized cybersecurity knowledge evaluation benchmark composed of 50 expert-level questions distributed across 10 categories. Questions are designed to test deep technical reasoning, factual accuracy, and hallucination resistance across critical information security domains.
This document presents partial results for categories D (Malware Analysis & Reverse Engineering) and G (MITRE ATT&CK & Threat Intelligence), evaluated on MinimoSec-V4-4B, a small-scale language model with specialized cybersecurity fine-tuning.
Evaluated Model
| Field | Detail |
|---|---|
| Model | MinimoSec-V4-4B |
| Base Architecture | Gemma 3 4B (4 billion parameters) |
| Fine-tuning | SFT (Supervised Fine-Tuning) |
| Dataset | 22,000 cybersecurity-focused samples |
| Specialization | Offensive & Defensive Cybersecurity |
| Evaluator | Lucas Catão de Moraes |
| Date | April 2026 |
| Methodology | Manual per-dimension evaluation with weighted criteria |
Evaluation Criteria
| Dimension | Weight | Description |
|---|---|---|
| Factual Correctness | 30% | Technical accuracy of the information presented |
| Technical Depth | 25% | Level of detail and demonstrated expertise |
| Completeness | 20% | Coverage of all sub-items in the question |
| Clarity & Structure | 15% | Organization, didactics, and readability |
| Absence of Hallucinations | 10% | Absence of fabricated terms, concepts, or data |
Scoring Scale
| Score | Classification |
|---|---|
| 9.0 – 10.0 | Expert-Level |
| 7.5 – 8.9 | Advanced |
| 6.0 – 7.4 | Intermediate |
| 4.0 – 5.9 | Basic |
| < 4.0 | Insufficient |
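The per-question scores in the category tables below follow directly from the weighted criteria and the classification bands above. A minimal sketch of the computation (function names are illustrative, not part of the benchmark tooling):

```python
# Weights from the evaluation criteria table above.
WEIGHTS = {
    "factual": 0.30,
    "depth": 0.25,
    "completeness": 0.20,
    "clarity": 0.15,
    "hallucinations": 0.10,  # the "Absence of Hallucinations" dimension
}

def weighted_score(scores: dict) -> float:
    """Combine per-dimension scores (0-10) into a final 0-10 score."""
    return round(sum(WEIGHTS[k] * v for k, v in scores.items()), 2)

def classify(score: float) -> str:
    """Map a final score onto the benchmark's classification bands."""
    if score >= 9.0:
        return "Expert-Level"
    if score >= 7.5:
        return "Advanced"
    if score >= 6.0:
        return "Intermediate"
    if score >= 4.0:
        return "Basic"
    return "Insufficient"

# Example: question D1 (Static / Dynamic Analysis) from the table below.
d1 = {"factual": 6.0, "depth": 5.5, "completeness": 6.0,
      "clarity": 7.5, "hallucinations": 6.0}
print(weighted_score(d1), classify(weighted_score(d1)))  # 6.1 Intermediate
```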
Category D — Malware Analysis & Reverse Engineering
| # | Topic | Factual | Depth | Completeness | Clarity | Hallucinations | Score | Classification |
|---|---|---|---|---|---|---|---|---|
| D1 | Static / Dynamic Analysis | 6.0 | 5.5 | 6.0 | 7.5 | 6.0 | 6.10 | Intermediate |
| D2 | Packer / Crypter / Unpacking | 5.0 | 4.5 | 3.5 | 7.5 | 5.5 | 5.00 | Basic |
| D3 | Process Hollowing (T1055.012) | 7.0 | 6.0 | 5.5 | 8.0 | 6.5 | 6.55 | Intermediate |
| D4 | DKOM / Kernel Rootkit | 7.0 | 6.5 | 7.0 | 8.5 | 7.0 | 7.10 | Intermediate |
| D5 | DGA / C2 / ML Detection | 6.5 | 5.0 | 6.0 | 7.5 | 7.5 | 6.28 | Intermediate |
| | Category D Average | | | | | | 6.21 | Intermediate |
Category G — MITRE ATT&CK & Threat Intelligence
| # | Topic | Factual | Depth | Completeness | Clarity | Hallucinations | Score | Classification |
|---|---|---|---|---|---|---|---|---|
| G1 | MITRE ATT&CK Hierarchy | 2.0 | 3.0 | 2.0 | 7.0 | 1.5 | 2.95 | Insufficient |
| G2 | IoCs vs IoAs / SIEM / SOAR | 6.5 | 5.5 | 7.0 | 8.5 | 5.5 | 6.55 | Intermediate |
| G3 | Kill Chain / Diamond Model | 5.5 | 4.5 | 5.5 | 8.0 | 4.0 | 5.48 | Basic |
| G4 | Threat Hunting / LOLBins | 6.0 | 6.0 | 6.5 | 8.0 | 5.0 | 6.30 | Intermediate |
| G5 | STIX / TAXII | 5.0 | 4.0 | 5.5 | 7.5 | 4.0 | 5.13 | Basic |
| | Category G Average | | | | | | 5.28 | Basic |
Overall Summary
| Category | Average | Classification | Best Response | Worst Response |
|---|---|---|---|---|
| D — Malware & RE | 6.21 | Intermediate | D4: DKOM / Rootkit (7.10) | D2: Packer / Crypter (5.00) |
| G — MITRE & Threat Intel | 5.28 | Basic | G2: IoCs vs IoAs (6.55) | G1: MITRE ATT&CK (2.95) |
| Global Average (D + G) | 5.74 | Basic | | |
Key Findings
- Best overall response: D4 — DKOM / Kernel Rootkit (7.10 — Intermediate)
- Worst overall response: G1 — MITRE ATT&CK Hierarchy (2.95 — Insufficient)
- Strongest dimension: Clarity & Structure (average 7.80 across all 10 responses)
- Weakest dimension: Absence of Hallucinations (average 5.25 across all 10 responses)
- Highest internal variance: Category G (range from 2.95 to 6.55 = Δ3.60)
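For reproducibility, the category and global averages above can be recomputed from the per-question final scores in the two tables (a small sketch):

```python
from statistics import mean

# Final per-question scores from the Category D and Category G tables.
category_d = {"D1": 6.10, "D2": 5.00, "D3": 6.55, "D4": 7.10, "D5": 6.28}
category_g = {"G1": 2.95, "G2": 6.55, "G3": 5.48, "G4": 6.30, "G5": 5.13}

avg_d = round(mean(category_d.values()), 2)       # 6.21 (Intermediate)
avg_g = round(mean(category_g.values()), 2)       # 5.28 (Basic)
global_avg = round(mean([*category_d.values(), *category_g.values()]), 2)  # 5.74

# Internal variance of Category G: delta between best and worst responses.
delta_g = round(max(category_g.values()) - min(category_g.values()), 2)    # 3.6

print(avg_d, avg_g, global_avg, delta_g)
```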
MinimoSec-V4-4B — Model Analysis
For a 4 billion parameter cybersecurity-specialized model, the CyberBench-Hard results reveal the following:
SFT dataset quality is the determining factor. Category D (better training coverage) outperformed Category G by nearly 1 point, confirming that dataset curation matters more than model size alone. MinimoSec-V4-4B performs at Intermediate level in domains where its training data was strongest.
The model excels at structure and clarity. The Clarity & Structure dimension scored between 7.0–8.5 across all responses, indicating that SFT successfully taught MinimoSec-V4-4B professional formatting and technical communication patterns.
Factual accuracy and hallucinations are the primary limiters. MinimoSec-V4-4B tends to fabricate terms, IDs, and configurations when pushed beyond its training coverage, rather than expressing uncertainty. This is the most critical area for improvement.
The observed performance ceiling for 4B + SFT is ~7.0. MinimoSec-V4-4B's best response scored 7.10 (DKOM / Kernel Rootkit). To reach Advanced classification (7.5+), recommended next steps include: scale-up of the base model, post-SFT alignment via DPO/RLHF, and expanded dataset curation with expert technical review.
MinimoSec-V4-4B is suitable as an intermediate-level cybersecurity assistant for educational and study purposes in its well-trained domains, but should not be used as an authoritative technical reference without human verification.
Benchmark Reference
CyberBench-Hard v1.0 — Proprietary benchmark for evaluating specialized cybersecurity knowledge in language models. 50 expert-level questions across 10 categories. Developed and administered in April 2026.
Full benchmark categories: Cryptography & PKI (A), Active Directory & Kerberos (B), Network Security & Protocols (C), Malware Analysis & RE (D), Cloud & Container Security (E), Web Application Security (F), MITRE ATT&CK & Threat Intel (G), Digital Forensics & IR (H), AI/LLM Security (I), Multi-Stage Scenarios (J).
This document presents partial results for categories D and G (10 out of 50 questions). MinimoSec-V4-4B was evaluated on these categories as representative samples of its cybersecurity knowledge capabilities.
🚀 Quick Start
Ollama (Recommended)
```
ollama run hf.co/dolutech/MinimoSec-V4-GGUF:MinimoSec-V4.Q4_K_M.gguf
```
LM Studio
- Download `MinimoSec-V4-4b.Q4_K_M.gguf` from the GGUF repository
- Load it manually in LM Studio
- Note: also download `MinimoSec-V4-4b.BF16-mmproj.gguf` for multimodal (vision) support
Python (Transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "dolutech/MinimoSec-V4-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Cria uma regra YARA para detetar ransomware que encripta ficheiros .docx e .xlsx."}
]

# Build the prompt with the chat template and append the generation prompt.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling must be enabled for temperature/top_p to take effect.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=1.0, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
💬 Recommended System Prompt
```
És o MinimoSec V4, um assistente especializado em cibersegurança desenvolvido pela Dolutech.
Respondes sempre em Português de Portugal.
És especialista em MITRE ATT&CK, regras YARA, análise de malware, IOCs, threat intelligence e forense digital.
Forneces respostas técnicas, precisas e estruturadas.
```

(English translation: "You are MinimoSec V4, a cybersecurity assistant developed by Dolutech. You always answer in European Portuguese. You are an expert in MITRE ATT&CK, YARA rules, malware analysis, IOCs, threat intelligence, and digital forensics. You provide technical, precise, and structured answers.")
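If you use Ollama, this system prompt can be baked into the model with a Modelfile (a sketch: the `FROM` line reuses the tag from the Quick Start section, and the model name `minimosec-v4` is illustrative):

```
FROM hf.co/dolutech/MinimoSec-V4-GGUF:MinimoSec-V4.Q4_K_M.gguf

SYSTEM """
És o MinimoSec V4, um assistente especializado em cibersegurança desenvolvido pela Dolutech.
Respondes sempre em Português de Portugal.
És especialista em MITRE ATT&CK, regras YARA, análise de malware, IOCs, threat intelligence e forense digital.
Forneces respostas técnicas, precisas e estruturadas.
"""
```

Build and run it with `ollama create minimosec-v4 -f Modelfile`, then `ollama run minimosec-v4`.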
📋 Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-e4b-it |
| Framework | Unsloth 2026.4.5 |
| Method | SFT + LoRA |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training epochs | 1 |
| Max sequence length | 2048 |
| Batch size | 2 (gradient accumulation 4) |
| Dataset size | 22,571 examples |
| Dataset language | Portuguese |
| Hardware | 1× NVIDIA Tesla A100 |
| Quantisation | 4-bit (bitsandbytes, training) / Q4_K_M GGUF (inference) |
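The LoRA hyperparameters above can be summarised as a plain configuration dict (an illustrative reconstruction mirroring the table; the actual run used Unsloth's wrappers, not this dict):

```python
# LoRA hyperparameters from the training table (illustrative reconstruction).
lora_config = {
    "r": 16,           # LoRA rank
    "lora_alpha": 16,  # scaling factor; effective scale = alpha / r = 1.0
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
}

# Effective batch size: per-device batch 2 x gradient accumulation 4.
effective_batch_size = 2 * 4
print(effective_batch_size)  # 8
```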
⚠️ Limitations & Development Phase
This model is in an active research and development phase. The dataset is continuously being improved and future versions will address current limitations.
- Trained with an evolving dataset; the model may reproduce inconsistent information, including incorrect CVEs, imprecise MITRE ATT&CK sub-techniques, or YARA/SIGMA rules with invalid syntax
- Optimised for Portuguese (PT/BR); responses in English may be less precise
- 4B active-parameter model (MoE); complex multi-step reasoning may require enabling thinking mode (`<think>`)
- Not a replacement for a certified security analyst; use exclusively as a study and assistive tool
- Internal benchmarks indicate an average score of 6.33/10 on advanced cybersecurity scenarios; improvements expected in upcoming versions
Roadmap
- V5: expanded dataset focused on specific CVEs, exact MITRE ATT&CK sub-techniques, and valid SIGMA/YARA rules
- V5: multi-epoch training with continuous eval loss monitoring
- V5: comparative benchmark against Gemma 4 base as reference baseline
📜 License
This model is released under the Gemma Terms of Use. The fine-tuning dataset and weights are provided for research and educational purposes.
🏢 About
Developed by Dolutech — cybersecurity research and open-source tooling for Portuguese-speaking communities.