---
language:
- en
- it
tags:
- cybersecurity
- red-team
- ctf
- penetration-testing
- offensive-security
- uncensored
- llama-cpp
- gguf
base_model: Qwen/Qwen3.5-9B
license: apache-2.0
---
<p align="center">
<img src="https://huggingface.co/CorryL/piccolo_gorgone/resolve/main/banner.PNG" alt="Piccolo Gorgone Banner"/>
</p>
# Piccolo Gorgone

<p align="center">
Developed by <strong>CorryL</strong> · Penetration Tester & Ethical Hacker
</p>
---
**Piccolo Gorgone** is a Large Language Model fine-tuned for **red team operations**, **CTF competitions**, and **offensive cybersecurity**. Built on Qwen 3.5 9B and trained on a curated dataset of over **16,000 real-world offensive security examples**, it delivers technically precise and direct responses without the safety restrictions that limit general-purpose models. Piccolo Gorgone is fully **agentic** and natively integrates with the most widely used offensive security frameworks, enabling automated and orchestrated workflows directly from your existing toolchain.
---
## Local Execution & Privacy
Piccolo Gorgone was designed from the ground up to run on **local consumer hardware**, with no dependency on cloud APIs or external services. The choice of a 9B parameter model is deliberate: it represents the optimal balance between **technical capability** and **accessible hardware requirements**, enabling execution on a single consumer GPU with Q4_K_M quantization.
This approach ensures that all sensitive information (penetration test reports, vulnerability details, client data) stays **exclusively on your machine**, never passing through third-party servers.
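As a rough sanity check of the single-GPU claim (assuming Q4_K_M averages about 4.85 bits per weight; KV cache and activations add to this), the weight footprint of a 9B model can be estimated as:

```python
# Rough VRAM estimate for the quantized weights only.
# Assumption: Q4_K_M averages ~4.85 bits per weight; the KV cache
# and activations require additional memory on top of this.
def q4_k_m_weight_gb(n_params_billion: float, bits_per_weight: float = 4.85) -> float:
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

print(f"~{q4_k_m_weight_gb(9):.1f} GB for the weights alone")  # ~5.5 GB
```

At roughly 5.5 GB for the weights, the model fits on an 8 GB consumer GPU with headroom for a modest context window.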
---
## Intended Use
This model is designed for:
- Professional **penetration testers** and red teamers operating in authorized environments
- **CTF competitors** (HackTheBox, CTFtime, and similar platforms)
- Offensive security **researchers and instructors**
- Security teams performing **threat modeling and attack simulation**
> ⚠️ **Disclaimer:** This model is intended exclusively for ethical and professional use in authorized environments. The author bears no responsibility for illegal or unauthorized use.
---
## Agentic Integration
Piccolo Gorgone supports **agentic workflows** and is designed to operate as an autonomous reasoning engine within offensive security pipelines. It is compatible with the following frameworks and tools:
| Framework | Use Case |
|-----------|----------|
| **CAI (Cybersecurity AI)** | Autonomous red team agents and attack orchestration |
| **Roo Code** | AI-assisted code generation and vulnerability research |
| **LangChain / LlamaIndex** | Custom agentic pipelines and tool-calling workflows |
| **OpenAI-compatible APIs** | Drop-in integration via llama-server OpenAI-compatible endpoint |
> Since llama-server exposes an **OpenAI-compatible REST API**, Piccolo Gorgone can be used as a local drop-in replacement in any framework that supports custom endpoints, with no code changes required.
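As a minimal sketch of that drop-in usage (assuming llama-server is already running on `127.0.0.1:8081`, matching the launch command in the Inference section; the model name field is informational when a single GGUF is loaded), a dependency-free client using only the Python standard library:

```python
import json
import urllib.request

# Endpoint exposed by llama-server; host/port match the example
# launch command (adjust if you changed --host/--port).
BASE_URL = "http://127.0.0.1:8081/v1/chat/completions"

def build_request(prompt: str, temperature: float = 1.0) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": "piccolo-gorgone",  # informational with a single loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """POST a chat request to the local server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With the server running:
#   print(chat("Summarize the impact of CVE-2021-44228 in two sentences."))
```

The same endpoint works unchanged with the official `openai` Python client by setting `base_url="http://127.0.0.1:8081/v1"`.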
---
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | Qwen 3.5 9B |
| **Fine-tuning Method** | QLoRA via Unsloth |
| **Format** | GGUF (Q4_K_M) |
| **Context Length** | 128,000 tokens |
---
## Training Dataset
The model was trained on a dataset of **16,272 examples** assembled from the following categories:
| Category | Description |
|----------|-------------|
| **Offensive Knowledge Bases** | Technical guides and offensive techniques from authoritative open sources |
| **CTF Writeups & Solutions** | Real competition writeups and walkthroughs from platforms and academic datasets |
| **Red Team TTPs** | Tactics, Techniques, and Procedures aligned with adversarial frameworks |
| **Exploits & Payloads** | Real-world payloads, shellcode, and proof-of-concept exploits |
| **CVE Database (up to 2025)** | Comprehensive vulnerability data including the most recent 2025 CVEs |
| **Research Papers** | Academic papers on offensive security and adversarial techniques |
> The dataset underwent rigorous deduplication to ensure training quality and stability.
---
## Benchmark
> Comparative benchmark between **Qwen 3.5 9B (base)** and **Piccolo Gorgone** on offensive security tasks.
**Qwen 3.5 9B (base): 8.3%**
<p align="left">
<img src="https://huggingface.co/CorryL/piccolo_gorgone/resolve/main/BM_Qween.png"/>
</p>
**Piccolo Gorgone: 77.1%**
<p align="left">
<img src="https://huggingface.co/CorryL/piccolo_gorgone/resolve/main/BM_PG.png"/>
</p>
---
## Inference
### llama-server (recommended)
```bash
llama-server \
-m Qwen3.5-9B_Piccolo_Gorgone.Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8081 \
-ngl 99 \
-c 32768 \
-fa on \
--cache-reuse 256 \
-ctk q8_0 \
-ctv q8_0 \
-b 512 -ub 512 \
--temp 1.0 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 1.5 \
--repeat-penalty 1.0 \
--repeat-last-n 64 \
--chat-template-kwargs '{"enable_thinking":false}'
```
> The `-c 32768` value defines the active context window. You can increase it up to `131072` to leverage the model's full context, or reduce it based on the available VRAM on your machine. A larger context requires more memory but enables longer conversations and deeper analysis sessions.
> `--chat-template-kwargs '{"enable_thinking":false}'` disables Qwen3.5's internal chain-of-thought reasoning, producing faster and more direct responses, which is ideal for operational use.
### Inference Parameters
| Parameter | Value | Notes |
|-----------|-------|-------|
| `--temp` | `1.0` | Creativity/coherence balance |
| `--top-p` | `0.95` | Nucleus sampling |
| `--top-k` | `20` | Vocabulary filtering |
| `--min-p` | `0.0` | Minimum probability threshold |
| `--presence-penalty` | `1.5` | Reduces topic repetition |
| `--repeat-last-n` | `64` | Repetition penalty window |
| `-ngl` | `99` | Full GPU offload |
| `-c` | `32768` | Context window (adjustable) |
> **Tip:** For analytical tasks such as CVE analysis or code review, lower `--temp` to `0.4–0.6` for more deterministic output.
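Since llama-server also accepts these sampling parameters per request in the OpenAI-style body, the tip above can be applied without restarting the server. A hypothetical helper (preset names and values are illustrative, mirroring the launch flags and the lower analytical temperature):

```python
# Hypothetical per-task sampling presets; values mirror the recommended
# launch flags, with a lower temperature for deterministic analysis.
PRESETS = {
    "operational": {"temperature": 1.0, "top_p": 0.95, "top_k": 20},
    "analysis":    {"temperature": 0.5, "top_p": 0.95, "top_k": 20},
}

def sampling_params(task: str) -> dict:
    """Return sampling parameters to merge into a chat request body."""
    return dict(PRESETS.get(task, PRESETS["operational"]))

print(sampling_params("analysis")["temperature"])  # 0.5
```

Merging the returned dict into the JSON request body overrides the server-side defaults for that request only.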
---
## Author
[LinkedIn](https://www.linkedin.com/in/corrado-liotta-6111a821/) · [Hugging Face](https://huggingface.co/CorryL)