Instructions for using CorryL/piccolo_gorgone with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- llama-cpp-python
How to use CorryL/piccolo_gorgone with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="CorryL/piccolo_gorgone",
    filename="Qwen3.5_9B_Piccolo_Gorgone.gguf",
)
response = llm.create_chat_completion(
    messages=[
        # The upstream card defines no input example; this prompt is a placeholder to make the call runnable.
        {"role": "user", "content": "Describe the typical phases of an authorized penetration test."}
    ]
)
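`create_chat_completion` returns an OpenAI-style completion dict; a minimal sketch of extracting the assistant reply, assuming the call above was assigned to `response`:

```python
# The reply text lives under choices[0]["message"]["content"] in the returned dict.
reply = response["choices"][0]["message"]["content"]
print(reply)
```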
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use CorryL/piccolo_gorgone with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf CorryL/piccolo_gorgone
# Run inference directly in the terminal:
llama-cli -hf CorryL/piccolo_gorgone
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf CorryL/piccolo_gorgone
# Run inference directly in the terminal:
llama-cli -hf CorryL/piccolo_gorgone
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf CorryL/piccolo_gorgone
# Run inference directly in the terminal:
./llama-cli -hf CorryL/piccolo_gorgone
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf CorryL/piccolo_gorgone
# Run inference directly in the terminal:
./build/bin/llama-cli -hf CorryL/piccolo_gorgone
Use Docker
docker model run hf.co/CorryL/piccolo_gorgone
- LM Studio
- Jan
- Ollama
How to use CorryL/piccolo_gorgone with Ollama:
ollama run hf.co/CorryL/piccolo_gorgone
- Unsloth Studio
How to use CorryL/piccolo_gorgone with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for CorryL/piccolo_gorgone to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for CorryL/piccolo_gorgone to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for CorryL/piccolo_gorgone to start chatting
- Pi
How to use CorryL/piccolo_gorgone with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf CorryL/piccolo_gorgone
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "CorryL/piccolo_gorgone" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use CorryL/piccolo_gorgone with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf CorryL/piccolo_gorgone
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default CorryL/piccolo_gorgone
Run Hermes
hermes
- Docker Model Runner
How to use CorryL/piccolo_gorgone with Docker Model Runner:
docker model run hf.co/CorryL/piccolo_gorgone
- Lemonade
How to use CorryL/piccolo_gorgone with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull CorryL/piccolo_gorgone
Run and chat with the model
lemonade run user.piccolo_gorgone-{{QUANT_TAG}}
List all available models
lemonade list
Piccolo Gorgone
Developed by CorryL, Penetration Tester & Ethical Hacker
Piccolo Gorgone is a Large Language Model fine-tuned for red team operations, CTF competitions, and offensive cybersecurity. Built on Qwen 3.5 9B and trained on a curated dataset of over 16,000 real-world offensive security examples, it delivers technically precise and direct responses without the safety restrictions that limit general-purpose models. Piccolo Gorgone is fully agentic and natively integrates with the most widely used offensive security frameworks, enabling automated and orchestrated workflows directly from your existing toolchain.
Local Execution & Privacy
Piccolo Gorgone was designed from the ground up to run on local consumer hardware, with no dependency on cloud APIs or external services. The choice of a 9B parameter model is deliberate: it represents the optimal balance between technical capability and accessible hardware requirements, enabling execution on a single consumer GPU with Q4_K_M quantization.
This approach ensures that all sensitive information (penetration test reports, vulnerability details, client data) stays exclusively on your machine, never transiting through third-party servers.
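As a rough sanity check on the single-GPU claim (a back-of-the-envelope estimate, not a figure from the model card), assuming Q4_K_M averages roughly 4.8 bits per weight:

```python
# Back-of-the-envelope VRAM estimate for a 9B-parameter model at Q4_K_M quantization.
# The ~4.8 bits/weight figure is an approximation for Q4_K_M, not an official number.
params = 9e9
bits_per_weight = 4.8
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"Approximate weight memory: {weights_gb:.1f} GB")  # ~5.4 GB for the weights alone
# The KV cache and runtime buffers add overhead that grows with context length,
# so a consumer GPU with 8-12 GB of VRAM is a reasonable target for moderate contexts.
```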
Intended Use
This model is designed for:
- Professional penetration testers and red teamers operating in authorized environments
- CTF competitors (HackTheBox, CTFtime, and similar platforms)
- Offensive security researchers and instructors
- Security teams performing threat modeling and attack simulation
⚠️ Disclaimer: This model is intended exclusively for ethical and professional use in authorized environments. The author bears no responsibility for illegal or unauthorized use.
Agentic Integration
Piccolo Gorgone supports agentic workflows and is designed to operate as an autonomous reasoning engine within offensive security pipelines. It is compatible with the following frameworks and tools:
| Framework | Use Case |
|---|---|
| CAI (Cybersecurity AI) | Autonomous red team agents and attack orchestration |
| Roo Code | AI-assisted code generation and vulnerability research |
| LangChain / LlamaIndex | Custom agentic pipelines and tool-calling workflows |
| OpenAI-compatible APIs | Drop-in integration via llama-server OpenAI-compatible endpoint |
Since llama-server exposes an OpenAI-compatible REST API, Piccolo Gorgone can be used as a local drop-in replacement for any framework that supports custom endpoints, with no code changes required.
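For example, a minimal sketch using the official `openai` Python client against a local `llama-server` instance. The port (8081) matches the `llama-server` command in the Inference section below (llama.cpp defaults to 8080), and the prompt is illustrative rather than an official example:

```python
from openai import OpenAI

# Point the client at the local llama-server endpoint instead of api.openai.com.
# Port 8081 matches the llama-server example in the Inference section; llama.cpp defaults to 8080.
client = OpenAI(base_url="http://localhost:8081/v1", api_key="none")

response = client.chat.completions.create(
    # llama-server typically ignores the model field when a single model is loaded.
    model="CorryL/piccolo_gorgone",
    messages=[
        # Illustrative prompt, not an official example from the model card.
        {"role": "user", "content": "List common Linux privilege-escalation checks for an authorized assessment."}
    ],
)
print(response.choices[0].message.content)
```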
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen 3.5 9B |
| Fine-tuning Method | QLoRA via Unsloth |
| Format | GGUF (Q4_K_M) |
| Context Length | 128,000 tokens |
Training Dataset
The model was trained on a dataset of 16,272 examples assembled from the following categories:
| Category | Description |
|---|---|
| Offensive Knowledge Bases | Technical guides and offensive techniques from authoritative open sources |
| CTF Writeups & Solutions | Real competition writeups and walkthroughs from platforms and academic datasets |
| Red Team TTPs | Tactics, Techniques, and Procedures aligned with adversarial frameworks |
| Exploits & Payloads | Real-world payloads, shellcode, and proof-of-concept exploits |
| CVE Database (up to 2025) | Comprehensive vulnerability data including the most recent 2025 CVEs |
| Research Papers | Academic papers on offensive security and adversarial techniques |
The dataset underwent rigorous deduplication to ensure training quality and stability.
Benchmark
Comparative benchmark between Qwen 3.5 9B (base) and Piccolo Gorgone on offensive security tasks.
| Model | Score |
|---|---|
| Qwen 3.5 9B (base) | 8.3% |
| Piccolo Gorgone | 77.1% |
Inference
llama-server (recommended)
llama-server \
-m Qwen3.5-9B_Piccolo_Gorgone.Q4_K_M.gguf \
--host 0.0.0.0 \
--port 8081 \
-ngl 99 \
-c 32768 \
-fa on \
--cache-reuse 256 \
-ctk q8_0 \
-ctv q8_0 \
-b 512 -ub 512 \
--temp 1.0 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 1.5 \
--repeat-penalty 1.0 \
--repeat-last-n 64 \
--chat-template-kwargs '{"enable_thinking":false}'
The `-c 32768` value defines the active context window. You can increase it up to `131072` to leverage the model's full context, or reduce it based on the available VRAM on your machine. A larger context requires more memory but enables longer conversations and deeper analysis sessions.
`--chat-template-kwargs '{"enable_thinking":false}'` disables Qwen3.5's internal chain-of-thought reasoning, producing faster and more direct responses, ideal for operational use.
Inference Parameters
| Parameter | Value | Notes |
|---|---|---|
| `--temp` | 1.0 | Creativity/coherence balance |
| `--top-p` | 0.95 | Nucleus sampling |
| `--top-k` | 20 | Vocabulary filtering |
| `--min-p` | 0.0 | Minimum probability threshold |
| `--presence-penalty` | 1.5 | Reduces topic repetition |
| `--repeat-last-n` | 64 | Repetition penalty window |
| `-ngl` | 99 | Full GPU offload |
| `-c` | 32768 | Context window (adjustable) |
Tip: For analytical tasks such as CVE analysis or code review, lower `--temp` to 0.4–0.6 for more deterministic output.
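If you drive the server through its OpenAI-compatible API rather than `llama-cli`, the same adjustment can be made per request. A minimal sketch, assuming a local `llama-server` on port 8081 as configured above (the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8081/v1", api_key="none")  # local llama-server endpoint

# Override the sampling temperature per request for analytical tasks such as CVE analysis or code review.
analysis = client.chat.completions.create(
    model="CorryL/piccolo_gorgone",
    temperature=0.5,  # within the 0.4-0.6 range suggested above
    messages=[
        # Illustrative prompt, not an official example from the model card.
        {"role": "user", "content": "Summarize the impact and remediation of CVE-2021-44228 in three sentences."}
    ],
)
print(analysis.choices[0].message.content)
```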