---
language:
- en
- it
tags:
- cybersecurity
- red-team
- ctf
- penetration-testing
- offensive-security
- uncensored
- llama-cpp
- gguf
base_model: Qwen/Qwen3.5-9B
license: apache-2.0
---
<p align="center">
<img src="https://huggingface.co/CorryL/piccolo_gorgone/resolve/main/banner.PNG" alt="Piccolo Gorgone Banner"/>
</p>
# πŸ™ Piccolo Gorgone
![Base Model](https://img.shields.io/badge/Base-Qwen3.5--9B-blue?style=flat-square)
![Fine-tuning](https://img.shields.io/badge/Fine--tuned-Unsloth%20%2B%20QLoRA-orange?style=flat-square)
![Domain](https://img.shields.io/badge/Domain-Offensive%20Security-red?style=flat-square)
![Censorship](https://img.shields.io/badge/Censorship-None-black?style=flat-square)
![Format](https://img.shields.io/badge/Format-GGUF%20Q4__K__M-green?style=flat-square)
<p align="center">
Developed by <strong>CorryL</strong> β€” Penetration Tester & Ethical Hacker
</p>
---
**Piccolo Gorgone** is a Large Language Model fine-tuned for **red team operations**, **CTF competitions**, and **offensive cybersecurity**. Built on Qwen 3.5 9B and trained on a curated dataset of over **16,000 real-world offensive security examples**, it delivers technically precise and direct responses without the safety restrictions that limit general-purpose models. Piccolo Gorgone is fully **agentic** and natively integrates with the most widely used offensive security frameworks, enabling automated and orchestrated workflows directly from your existing toolchain.
---
## Local Execution & Privacy
Piccolo Gorgone was designed from the ground up to run on **local consumer hardware**, with no dependency on cloud APIs or external services. The choice of a 9B parameter model is deliberate: it represents the optimal balance between **technical capability** and **accessible hardware requirements**, enabling execution on a single consumer GPU with Q4_K_M quantization.
This approach ensures that all sensitive information — penetration test reports, vulnerability details, client data — stays **exclusively on your machine**, without ever passing through third-party servers.
---
## Intended Use
This model is designed for:
- Professional **penetration testers** and red teamers operating in authorized environments
- **CTF competitors** (HackTheBox, CTFtime, and similar platforms)
- Offensive security **researchers and instructors**
- Security teams performing **threat modeling and attack simulation**
> ⚠️ **Disclaimer:** This model is intended exclusively for ethical and professional use in authorized environments. The author bears no responsibility for illegal or unauthorized use.
---
## Agentic Integration
Piccolo Gorgone supports **agentic workflows** and is designed to operate as an autonomous reasoning engine within offensive security pipelines. It is compatible with the following frameworks and tools:
| Framework | Use Case |
|-----------|----------|
| **CAI (Cybersecurity AI)** | Autonomous red team agents and attack orchestration |
| **Roo Code** | AI-assisted code generation and vulnerability research |
| **LangChain / LlamaIndex** | Custom agentic pipelines and tool-calling workflows |
| **OpenAI-compatible APIs** | Drop-in integration via llama-server OpenAI-compatible endpoint |
> Since llama-server exposes an **OpenAI-compatible REST API**, Piccolo Gorgone can be used as a local drop-in replacement for any framework that supports custom endpoints β€” no code changes required.
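As a minimal sketch of the drop-in integration described above, the snippet below builds a standard OpenAI-style chat completion request aimed at a local llama-server instance (port `8081`, matching the launch command in the Inference section). The prompt and the `model` name are illustrative assumptions; llama-server typically ignores the `model` field and serves whatever GGUF it was started with.

```python
import json
from urllib import request

# Assumption: llama-server is running locally on port 8081, as in the
# launch command shown in the Inference section of this README.
BASE_URL = "http://localhost:8081/v1"

def build_chat_request(prompt: str, temperature: float = 1.0) -> request.Request:
    """Build an OpenAI-compatible chat completion request for llama-server."""
    body = json.dumps({
        "model": "piccolo-gorgone",  # arbitrary: llama-server serves its loaded GGUF
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Summarize common Log4Shell exploitation steps.")
# To send (with the server running):
#   with request.urlopen(req) as r:
#       print(json.load(r)["choices"][0]["message"]["content"])
```

Because the request shape is plain OpenAI Chat Completions JSON, pointing any framework's `base_url` at `http://localhost:8081/v1` is all the integration work required.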
---
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | Qwen 3.5 9B |
| **Fine-tuning Method** | QLoRA via Unsloth |
| **Format** | GGUF (Q4_K_M) |
| **Context Length** | 128,000 tokens |
---
## Training Dataset
The model was trained on a dataset of **16,272 examples** assembled from the following categories:
| Category | Description |
|----------|-------------|
| πŸ“– **Offensive Knowledge Bases** | Technical guides and offensive techniques from authoritative open sources |
| 🏴 **CTF Writeups & Solutions** | Real competition writeups and walkthroughs from platforms and academic datasets |
| πŸ”΄ **Red Team TTPs** | Tactics, Techniques, and Procedures aligned with adversarial frameworks |
| πŸ—‘οΈ **Exploits & Payloads** | Real-world payloads, shellcode, and proof-of-concept exploits |
| πŸ› **CVE Database (up to 2025)** | Comprehensive vulnerability data including the most recent 2025 CVEs |
| πŸ”¬ **Research Papers** | Academic papers on offensive security and adversarial techniques |
> The dataset underwent rigorous deduplication to ensure training quality and stability.
---
## Benchmark
> πŸ“Š Comparative benchmark between **Qwen 3.5 9B (base)** and **Piccolo Gorgone** on offensive security tasks.
**Qwen 3.5 9B 8.3%**
<p align="left">
<img src="https://huggingface.co/CorryL/piccolo_gorgone/resolve/main/BM_Qween.png" alt="Qwen 3.5 9B benchmark result"/>
</p>
**Piccolo Gorgone 77.1%**
<p align="left">
<img src="https://huggingface.co/CorryL/piccolo_gorgone/resolve/main/BM_PG.png" alt="Piccolo Gorgone benchmark result"/>
</p>
---
## Inference
### llama-server (recommended)
```bash
llama-server \
-m Qwen3.5-9B_Piccolo_Gorgone.Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8081 \
-ngl 99 \
-c 32768 \
-fa on \
--cache-reuse 256 \
-ctk q8_0 \
-ctv q8_0 \
-b 512 -ub 512 \
--temp 1.0 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 1.5 \
--repeat-penalty 1.0 \
--repeat-last-n 64 \
--chat-template-kwargs '{"enable_thinking":false}'
```
> The `-c 32768` value defines the active context window. You can increase it up to `131072` to leverage the model's full context, or reduce it based on the available VRAM on your machine. A larger context requires more memory but enables longer conversations and deeper analysis sessions.
> `--chat-template-kwargs '{"enable_thinking":false}'` disables Qwen3.5's internal chain-of-thought reasoning, producing faster and more direct responses β€” ideal for operational use.
### Inference Parameters
| Parameter | Value | Notes |
|-----------|-------|-------|
| `--temp` | `1.0` | Creativity/coherence balance |
| `--top-p` | `0.95` | Nucleus sampling |
| `--top-k` | `20` | Vocabulary filtering |
| `--min-p` | `0.0` | Minimum probability threshold |
| `--presence-penalty` | `1.5` | Reduces topic repetition |
| `--repeat-last-n` | `64` | Repetition penalty window |
| `-ngl` | `99` | Full GPU offload |
| `-c` | `32768` | Context window (adjustable) |
> **Tip:** For analytical tasks such as CVE analysis or code review, lower `--temp` to `0.4–0.6` for more deterministic output.
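Since the OpenAI-compatible endpoint accepts sampling parameters per request, the lower temperature suggested above can be set for a single analytical query without restarting the server. The request body below is a hedged example; the `model` name is arbitrary for llama-server and the prompt is illustrative.

```python
import json

# Per-request sampling override: a "temperature" field in the request body
# takes precedence over the server-wide --temp default for that request.
analytical_request = {
    "model": "piccolo-gorgone",  # arbitrary: llama-server serves its loaded GGUF
    "messages": [
        {"role": "user", "content": "Review this C function for memory-safety bugs: ..."},
    ],
    "temperature": 0.5,  # deterministic end of the suggested 0.4-0.6 range
    "top_p": 0.95,
}
print(json.dumps(analytical_request, indent=2))
```

POST this body to `http://localhost:8081/v1/chat/completions` for a more deterministic code-review or CVE-analysis pass, while interactive sessions keep the server defaults.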
---
## Author
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-0077B5?style=flat-square&logo=linkedin)](https://www.linkedin.com/in/corrado-liotta-6111a821/)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Profile-yellow?style=flat-square)](https://huggingface.co/CorryL)