Use with llama-cpp-python

```python
# !pip install llama-cpp-python

from llama_cpp import Llama

# Downloads the Q4_K_M quantisation from the Hugging Face Hub on first use.
llm = Llama.from_pretrained(
    repo_id="dolutech/MinimoSec-V4.2-4b-GGUF",
    filename="MinimoSec-V4.2-4b.Q4_K_M.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "O que é process hollowing?"},
    ]
)
print(response["choices"][0]["message"]["content"])
```

🛡️ MinimoSec V4.2

Fine-Tuned Cybersecurity LLM — Gemma 4 E4B

Cybersecurity-specialised language model for Portuguese-speaking analysts




📌 Model Description

MinimoSec V4.2 is a cybersecurity-specialised language model fine-tuned from Google Gemma 4 E4B using a two-stage training approach: Supervised Fine-Tuning (SFT) with Low-Rank Adaptation (LoRA) followed by Direct Preference Optimization (DPO) for alignment refinement via the Unsloth framework.
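The DPO stage optimises the standard direct-preference objective. As a minimal illustration in plain Python (not the actual training code, which used the Unsloth framework), the per-pair DPO loss compares how much the policy prefers the chosen response over the rejected one, relative to the frozen SFT reference model; `beta=0.1` matches the value in the training details below:

```python
import math

def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen SFT reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# When policy and reference agree exactly, the loss sits at log(2) ≈ 0.693;
# it falls below that as the policy learns to prefer the chosen response.
```

Minimising this loss pushes the policy's preference margin for chosen over rejected responses above the reference model's, which is what drives the hallucination reduction described above.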

The model was trained on 22,571 Portuguese-language cybersecurity examples covering threat analysis, malware identification, MITRE ATT&CK mapping, YARA rule generation, IOC extraction, and digital forensics. The DPO refinement stage significantly improved factual accuracy and reduced hallucinations, particularly on complex technical topics.

| Specification | Detail |
|---|---|
| Primary Language | Portuguese (pt-PT / pt-BR) |
| Domain | Cybersecurity, Threat Intelligence, Digital Forensics |
| Base Model | google/gemma-4-e4b-it |
| Training Method | SFT + LoRA → DPO Alignment |
| Training Epochs | 1 (SFT) + DPO refinement |
| Quantisation Available | Q4_K_M GGUF (~5.3 GB) |

📊 CyberBench-Hard v1.0 — V4.2 Results

Specialized Cybersecurity Benchmark for Small-Scale SFT+DPO Models


About the Benchmark

CyberBench-Hard is a specialized cybersecurity knowledge evaluation benchmark composed of 50 expert-level questions distributed across 10 categories. Questions are designed to test deep technical reasoning, factual accuracy, and hallucination resistance across critical information security domains.

This document presents results comparing MinimoSec-V4.1-4B (SFT-only baseline) against MinimoSec-V4.2-4B (SFT+DPO refinement) for categories D (Malware Analysis & Reverse Engineering) and G (MITRE ATT&CK & Threat Intelligence).


Evaluated Models

| Field | V4.1 (Baseline) | V4.2 (DPO Refined) |
|---|---|---|
| Model | MinimoSec-V4.1-4B | MinimoSec-V4.2-4B |
| Base Architecture | Gemma 3 4B (4 billion parameters) | Gemma 3 4B (4 billion parameters) |
| Fine-tuning | SFT (Supervised Fine-Tuning) | SFT + DPO (Direct Preference Optimization) |
| Dataset | 22,000 cybersecurity-focused samples | 22,000 cybersecurity-focused samples |
| Specialization | Offensive & Defensive Cybersecurity | Offensive & Defensive Cybersecurity |
| Evaluator | Lucas Catão de Moraes | Lucas Catão de Moraes |
| Date | April 2026 | April 2026 |
| Methodology | Manual per-dimension evaluation with weighted criteria | Manual per-dimension evaluation with weighted criteria |

DPO Improvement Summary

| Question | SFT (V4.1) | DPO (V4.2) | Delta | Trend |
|---|---|---|---|---|
| D4 — DKOM/Rootkit | 7.10 | 7.43 | +0.33 | ✅ Improvement |
| G1 — MITRE ATT&CK | 2.95 | 4.18 | +1.23 | ✅ Improvement |
| D3 — Process Hollowing | 6.55 | 6.45 | -0.10 | ⚠️ Slight Regression |
| Average | 5.53 | 6.02 | +0.49 | ✅ Improvement |

Key Achievement: The DPO refinement delivered a +8.9% overall improvement, with the most significant gains on complex conceptual topics (MITRE ATT&CK hierarchy improved by 42%).
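The headline figures follow directly from the three tested questions; a quick sanity check of the arithmetic:

```python
sft = {"D4": 7.10, "G1": 2.95, "D3": 6.55}   # SFT-only scores
dpo = {"D4": 7.43, "G1": 4.18, "D3": 6.45}   # DPO-refined scores

def average(scores: dict) -> float:
    return round(sum(scores.values()) / len(scores), 2)

sft_avg = average(sft)                                        # 5.53
dpo_avg = average(dpo)                                        # 6.02
overall_gain = round((dpo_avg - sft_avg) / sft_avg * 100, 1)  # 8.9 (%)
g1_gain = round((dpo["G1"] - sft["G1"]) / sft["G1"] * 100)    # 42 (%)
```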


Evaluation Criteria

| Dimension | Weight | Description |
|---|---|---|
| Factual Correctness | 30% | Technical accuracy of the information presented |
| Technical Depth | 25% | Level of detail and demonstrated expertise |
| Completeness | 20% | Coverage of all sub-items in the question |
| Clarity & Structure | 15% | Organization, didactics, and readability |
| Absence of Hallucinations | 10% | No fabricated terms, concepts, or data |

Scoring Scale

| Score | Classification |
|---|---|
| 9.0 – 10.0 | Expert-Level |
| 7.5 – 8.9 | Advanced |
| 6.0 – 7.4 | Intermediate |
| 4.0 – 5.9 | Basic |
| < 4.0 | Insufficient |
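Put together, the weights and the scale imply the following scoring procedure (a sketch of the manual methodology; the dimension keys are illustrative, not part of the benchmark):

```python
# Weights from the evaluation criteria table (they sum to 1.0).
WEIGHTS = {
    "factual": 0.30,        # Factual Correctness
    "depth": 0.25,          # Technical Depth
    "completeness": 0.20,   # Completeness
    "clarity": 0.15,        # Clarity & Structure
    "hallucinations": 0.10, # Absence of Hallucinations
}

def weighted_score(dims: dict) -> float:
    """Combine per-dimension scores (0-10) into one weighted score."""
    return round(sum(WEIGHTS[k] * dims[k] for k in WEIGHTS), 2)

def classify(score: float) -> str:
    """Map a weighted score to its band in the scoring scale."""
    if score >= 9.0:
        return "Expert-Level"
    if score >= 7.5:
        return "Advanced"
    if score >= 6.0:
        return "Intermediate"
    if score >= 4.0:
        return "Basic"
    return "Insufficient"
```

For example, D4's 7.43 falls in the Intermediate band and G1's 4.18 in the Basic band, matching the tables below.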

Category D — Malware Analysis & Reverse Engineering

| # | Topic | Factual | Depth | Completeness | Clarity | Hallucinations | Score | Classification |
|---|---|---|---|---|---|---|---|---|
| D1 | Static / Dynamic Analysis | | | | | | | |
| D2 | Packer / Crypter / Unpacking | | | | | | | |
| D3 | Process Hollowing (T1055.012) | | | | | | 6.45 | Intermediate |
| D4 | DKOM / Kernel Rootkit | | | | | | 7.43 | Intermediate |
| D5 | DGA / C2 / ML Detection | | | | | | | |
| | Category D Average | | | | | | 6.94 | Intermediate |

Category G — MITRE ATT&CK & Threat Intelligence

| # | Topic | Factual | Depth | Completeness | Clarity | Hallucinations | Score | Classification |
|---|---|---|---|---|---|---|---|---|
| G1 | MITRE ATT&CK Hierarchy | | | | | | 4.18 | Basic |
| G2 | IoCs vs IoAs / SIEM / SOAR | | | | | | | |
| G3 | Kill Chain / Diamond Model | | | | | | | |
| G4 | Threat Hunting / LOLBins | | | | | | | |
| G5 | STIX / TAXII | | | | | | | |
| | Category G Average | | | | | | 4.18 | Basic |

Detailed Test Results

Test 1 — Best Case: D4 (DKOM / Rootkit)

Question (translated from Portuguese): What is a kernel rootkit on Windows? Explain how DKOM (Direct Kernel Object Manipulation) can hide processes by manipulating the doubly linked EPROCESS list. Which protection mechanisms (PatchGuard/KPP, Secure Boot, HVCI) hinder modern rootkits?

| Metric | V4.1 (SFT) | V4.2 (DPO) | Change |
|---|---|---|---|
| Overall Score | 7.10 | 7.43 | +0.33 |
| Classification | Intermediate | Intermediate | |

Analysis: DPO refinement improved the kernel rootkit explanation, particularly in the technical accuracy of DKOM mechanisms and protection systems description. The model now provides more precise details about EPROCESS manipulation and HVCI protections.


Test 2 — Worst Case: G1 (MITRE ATT&CK)

Question (translated from Portuguese): In the MITRE ATT&CK v18 (Enterprise) framework, explain the difference between Tactics, Techniques, and Sub-techniques. Give concrete examples for the "Defense Evasion" tactic (TA0005), including at least 3 techniques with their IDs and sub-techniques, describing how each one works technically.

| Metric | V4.1 (SFT) | V4.2 (DPO) | Change |
|---|---|---|---|
| Overall Score | 2.95 | 4.18 | +1.23 |
| Classification | Insufficient | Basic | ⬆️ Upgrade |

Analysis: DPO delivered the largest improvement (+42%) on this challenging conceptual question. The V4.2 model better understands the MITRE ATT&CK hierarchy and provides more accurate technique IDs and descriptions, though hallucinations on specific sub-technique details remain a limitation.


Test 3 — Medium Case: D3 (Process Hollowing)

Question (translated from Portuguese): Explain the Process Hollowing technique (T1055.012 in MITRE ATT&CK). Describe the sequence of Windows API calls (CreateProcess, NtUnmapViewOfSection, VirtualAllocEx, WriteProcessMemory, SetThreadContext, ResumeThread). How does this technique differ from classic DLL Injection-based Process Injection?

| Metric | V4.1 (SFT) | V4.2 (DPO) | Change |
|---|---|---|---|
| Overall Score | 6.55 | 6.45 | -0.10 |
| Classification | Intermediate | Intermediate | |

Analysis: Minor regression (-0.10) observed on this already well-understood topic. The SFT-only version had stronger coverage of this specific technique in training data, and DPO refinement slightly shifted emphasis. This represents acceptable variance within the noise threshold.


Overall Summary

| Category | V4.1 Average | V4.2 Average | Improvement | Classification |
|---|---|---|---|---|
| D — Malware & RE | 6.21 | 6.94 | +11.7% | Intermediate |
| G — MITRE & Threat Intel | 5.28 | 4.18* | -20.8% | Basic |
| Global Average (Tested) | 5.53 | 6.02 | +8.9% | Intermediate |

*The V4.2 figure for category G reflects only the re-tested question G1, which was the worst-performing question at baseline; DPO improved it significantly (+42%), but it remains the weakest area.

| | V4.1 (Baseline) | V4.2 (DPO) |
|---|---|---|
| Best Response | D4: DKOM / Rootkit (7.10) | D4: DKOM / Rootkit (7.43) |
| Worst Response | G1: MITRE ATT&CK (2.95) | G1: MITRE ATT&CK (4.18) |
| Best Improvement | | G1: MITRE ATT&CK (+1.23) |

Key Findings — V4.2 DPO Analysis

  1. DPO significantly improves factual accuracy on weak areas. The largest gain (+1.23) was achieved on the worst-performing question (G1), demonstrating DPO's effectiveness at correcting alignment issues.

  2. Strong topics remain stable. D4 (DKOM/Rootkit) improved further (+0.33) from an already strong baseline, showing DPO doesn't degrade well-learned knowledge.

  3. Hallucination reduction on conceptual topics. The MITRE ATT&CK response in V4.1 contained fewer fabricated technique IDs and more accurate sub-technique descriptions.

  4. Minor acceptable variance. D3 showed a slight regression (-0.10), within expected statistical variance for model refinement. This represents a reasonable trade-off for overall improvement.

  5. DPO is essential for 4B parameter models. The +8.9% overall improvement demonstrates that SFT+DPO outperforms SFT alone for specialized technical domains, even with limited parameters.


MinimoSec-V4.2-4B — Model Analysis

For a 4 billion parameter cybersecurity-specialized model with DPO refinement, the CyberBench-Hard results reveal:

  1. SFT+DPO is the optimal training pipeline for small models. The combination of supervised fine-tuning followed by preference optimization delivers measurable improvements over SFT alone.

  2. V4.2 achieves Intermediate level (6.02) on tested domains. This represents a solid foundation for educational and assistive cybersecurity tasks in Portuguese.

  3. Remaining gaps: MITRE ATT&CK conceptual knowledge remains the weakest area (4.18), requiring additional dataset curation for V5.

  4. Performance ceiling observation: the best response (D4: 7.43) suggests that the 4B architecture with the current dataset is approaching a ~7.5 ceiling. Advanced classification (7.5+) may require model scale-up or additional DPO iterations.

  5. V4.2 is suitable as an intermediate-level cybersecurity assistant with improved reliability over V4.1, particularly for malware analysis topics. Human verification remains recommended for critical decisions.


Benchmark Reference

CyberBench-Hard v1.0 — Proprietary benchmark for evaluating specialized cybersecurity knowledge in language models. 50 expert-level questions across 10 categories. Developed and administered in April 2026.

This document presents comparative results between MinimoSec-V4.1 (SFT baseline) and MinimoSec-V4.2 (SFT+DPO refinement) for categories D and G (3 representative questions).

Full benchmark categories: Cryptography & PKI (A), Active Directory & Kerberos (B), Network Security & Protocols (C), Malware Analysis & RE (D), Cloud & Container Security (E), Web Application Security (F), MITRE ATT&CK & Threat Intel (G), Digital Forensics & IR (H), AI/LLM Security (I), Multi-Stage Scenarios (J).


🚀 Quick Start

Ollama (Recommended)

```shell
ollama run hf.co/dolutech/MinimoSec-V4.2-4b-GGUF:MinimoSec-V4.2-4b.Q4_K_M.gguf
```
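Once the model is pulled, it can also be queried programmatically through Ollama's local REST API. A minimal sketch using only the standard library, assuming a default Ollama server on `localhost:11434`:

```python
import json
import urllib.request

MODEL = "hf.co/dolutech/MinimoSec-V4.2-4b-GGUF:MinimoSec-V4.2-4b.Q4_K_M.gguf"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Payload shape expected by Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str, host: str = "http://localhost:11434") -> str:
    """Send one chat turn to a locally running Ollama server."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example (requires the server to be running):
# print(chat("Explica a diferença entre IoCs e IoAs."))
```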

LM Studio

  1. Download MinimoSec-V4.2-4b.Q4_K_M.gguf from the GGUF repository
  2. Load it manually in LM Studio
  3. For multimodal (vision) support, also download MinimoSec-V4.2-4b.BF16-mmproj.gguf

Python (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "dolutech/MinimoSec-V4.2-4B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Cria uma regra YARA para detetar ransomware que encripta ficheiros .docx e .xlsx."}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs, max_new_tokens=512, do_sample=True, temperature=1.0, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

💬 Recommended System Prompt

```text
És o MinimoSec V4.2, um assistente especializado em cibersegurança desenvolvido pela Dolutech.
Respondes sempre em Português de Portugal.
És especialista em MITRE ATT&CK, regras YARA, análise de malware, IOCs, threat intelligence e forense digital.
Forneces respostas técnicas, precisas e estruturadas.
```

(English: You are MinimoSec V4.2, a cybersecurity assistant developed by Dolutech. You always answer in European Portuguese. You are an expert in MITRE ATT&CK, YARA rules, malware analysis, IOCs, threat intelligence, and digital forensics. You provide technical, precise, and structured answers.)
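To use this prompt with the Transformers example above, prepend it as a system turn; Gemma chat templates typically fold the system turn into the first user message. The user question here is illustrative:

```python
SYSTEM_PROMPT = (
    "És o MinimoSec V4.2, um assistente especializado em cibersegurança "
    "desenvolvido pela Dolutech. Respondes sempre em Português de Portugal. "
    "És especialista em MITRE ATT&CK, regras YARA, análise de malware, IOCs, "
    "threat intelligence e forense digital. "
    "Forneces respostas técnicas, precisas e estruturadas."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Explica a diferença entre IoCs e IoAs."},
]
# Pass `messages` to tokenizer.apply_chat_template(...) as in the Quick Start.
```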

📋 Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-4-e4b-it |
| Framework | Unsloth 2026.4.6 |
| Stage 1 — SFT | Supervised Fine-Tuning + LoRA |
| Stage 2 — DPO | Direct Preference Optimization |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| SFT epochs | 1 |
| DPO beta | 0.1 |
| Max sequence length | 2048 |
| Batch size | 2 (gradient accumulation 4) |
| Dataset size | 22,571 examples |
| Dataset language | Portuguese |
| Hardware | 1× NVIDIA Tesla A100 |
| Quantisation | 4-bit (bitsandbytes, training) / Q4_K_M GGUF (inference) |
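With rank 16 applied to each listed projection module, the LoRA overhead per adapted weight matrix is just r·(d_in + d_out) extra trainable parameters. A small calculator to make that concrete (the layer dimensions below are illustrative placeholders, not Gemma's actual shapes):

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Extra trainable parameters for one matrix: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)

# E.g. a hypothetical 2048 x 2048 projection at rank 16:
# 16 * (2048 + 2048) = 65,536 trainable parameters,
# versus 2048 * 2048 = 4,194,304 frozen ones (~1.6%).
```

This is why LoRA fine-tuning of a 4B model fits on a single A100 with 4-bit base weights: only the small adapter matrices receive gradients.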

⚠️ Limitations & Development Phase

This model is in an active research and development phase. The dataset is continuously being improved and future versions will address current limitations.

  • Refined with DPO to reduce hallucinations and improve factual accuracy
  • Trained with an evolving dataset; the model may reproduce inconsistent information, including incorrect CVEs, imprecise MITRE ATT&CK sub-techniques, or YARA/SIGMA rules with invalid syntax
  • Optimised for Portuguese (PT/BR); responses in English may be less precise
  • 4B active parameter model (MoE); complex multi-step reasoning may benefit from enabling the model's thinking mode
  • Not a replacement for a certified security analyst — use exclusively as a study and assistive tool
  • Internal benchmarks indicate an average score of 6.02/10 on tested cybersecurity scenarios; improvements expected in upcoming versions

V4.2 Improvements over V4.1

  • +8.9% overall benchmark improvement
  • +42% improvement on MITRE ATT&CK conceptual knowledge
  • ✅ Reduced hallucinations on technical detail questions
  • ✅ Better factual accuracy on kernel-level topics

Roadmap

  • V5: expanded dataset focused on specific CVEs, exact MITRE ATT&CK sub-techniques, and valid SIGMA/YARA rules
  • V5: additional DPO iterations with expert-curated preference pairs
  • V5: comparative benchmark against Gemma 4 base as reference baseline

📜 License

This model is released under the Gemma Terms of Use. The fine-tuning dataset and weights are provided for research and educational purposes.


🏢 About

Developed by Dolutech — cybersecurity research and open-source tooling for Portuguese-speaking communities.

Website HuggingFace Model Repo GGUF Repo


MinimoSec V4.2 — Bringing specialised cybersecurity intelligence to Portuguese-speaking analysts. 🇵🇹 🇧🇷
