Instructions to use Jashan887/55_BaronLLM_Offensive_GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Jashan887/55_BaronLLM_Offensive_GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Jashan887/55_BaronLLM_Offensive_GGUF", dtype="auto")
- llama-cpp-python
How to use Jashan887/55_BaronLLM_Offensive_GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jashan887/55_BaronLLM_Offensive_GGUF",
    filename="baronllm-llama3.1-v1-q6_k.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Jashan887/55_BaronLLM_Offensive_GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Use Docker
docker model run hf.co/Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
- LM Studio
- Jan
- vLLM
How to use Jashan887/55_BaronLLM_Offensive_GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jashan887/55_BaronLLM_Offensive_GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jashan887/55_BaronLLM_Offensive_GGUF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
- SGLang
How to use Jashan887/55_BaronLLM_Offensive_GGUF with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Jashan887/55_BaronLLM_Offensive_GGUF" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jashan887/55_BaronLLM_Offensive_GGUF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Jashan887/55_BaronLLM_Offensive_GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Jashan887/55_BaronLLM_Offensive_GGUF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Ollama
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Ollama:
ollama run hf.co/Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
- Unsloth Studio
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jashan887/55_BaronLLM_Offensive_GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jashan887/55_BaronLLM_Offensive_GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jashan887/55_BaronLLM_Offensive_GGUF to start chatting
- Pi
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Run Hermes
hermes
- Docker Model Runner
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Docker Model Runner:
docker model run hf.co/Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
- Lemonade
How to use Jashan887/55_BaronLLM_Offensive_GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K
Run and chat with the model
lemonade run user.55_BaronLLM_Offensive_GGUF-Q6_K
List all available models
lemonade list
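The llama.cpp, vLLM, and SGLang servers above all expose an OpenAI-compatible `/v1/chat/completions` endpoint, so one small client can talk to any of them. The following is a minimal sketch using only the Python standard library. The ports come from the examples above (8000 for vLLM, 30000 for SGLang, 8080 as used in the Pi and Hermes configuration for llama.cpp); the helper names `build_chat_request` and `send_chat_request` are hypothetical, not part of any library.

```python
import json
import urllib.request

MODEL_ID = "Jashan887/55_BaronLLM_Offensive_GGUF"

# Base URLs taken from the examples above; adjust if you changed host or port.
BASE_URLS = {
    "llama.cpp": "http://localhost:8080/v1",
    "vllm": "http://localhost:8000/v1",
    "sglang": "http://localhost:30000/v1",
}


def build_chat_request(backend, user_message, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URLS[backend]}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With one of the servers running, `send_chat_request(build_chat_request("vllm", "What is the capital of France?"))` should return the reply text. The Pi and Hermes configurations above refer to the llama.cpp model as `Jashan887/55_BaronLLM_Offensive_GGUF:Q6_K`; pass that string as `model=` if you want the names to match.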
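When called without streaming, `create_chat_completion` in llama-cpp-python returns an OpenAI-style dictionary rather than a plain string. The sketch below shows one way to pull the reply text out of it. `extract_reply` is a hypothetical helper, and `example_completion` is a hand-written stand-in for a real response, not actual model output.

```python
def extract_reply(completion):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for a real response, used only for illustration.
example_completion = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(example_completion))  # prints "Paris."
```

In practice you would pass the return value of `llm.create_chat_completion(...)` directly to `extract_reply`.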
Finetuned by Alican Kiraz
Links:
- Medium: https://alican-kiraz1.medium.com/
- Linkedin: https://tr.linkedin.com/in/alican-kiraz
- X: https://x.com/AlicanKiraz0
- YouTube: https://youtube.com/@alicankiraz0
BaronLLM is a large-language model fine-tuned for offensive cybersecurity research & adversarial simulation.
It provides structured guidance, exploit reasoning, and red-team scenario generation while enforcing safety constraints to prevent disallowed content.
Run Private GGUFs from the Hugging Face Hub
You can run private GGUFs from your personal account or from an associated organisation account in two simple steps:
- Copy your Ollama SSH key. You can do so via:
cat ~/.ollama/id_ed25519.pub | pbcopy
- Add the corresponding key to your Hugging Face account by going to your account settings and clicking on “Add new SSH key.”
That's it! You can now run private GGUFs from the Hugging Face Hub: ollama run hf.co/{username}/{repository}.
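`pbcopy` is specific to macOS, so on other systems you may prefer to print the key and copy it by hand. The sketch below reads the same key file and builds the `ollama run` command from the pattern above. Both function names are hypothetical, and the optional `tag` argument assumes the `:Q6_K` suffix style used elsewhere on this page.

```python
from pathlib import Path


def read_ollama_public_key(path="~/.ollama/id_ed25519.pub"):
    """Return the Ollama public SSH key as a single stripped line."""
    key_path = Path(path).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(f"no Ollama public key found at {key_path}")
    return key_path.read_text(encoding="utf-8").strip()


def ollama_run_command(username, repository, tag=None):
    """Build the `ollama run` command for a GGUF repository on the Hub."""
    target = f"hf.co/{username}/{repository}"
    if tag:
        target += f":{tag}"
    return f"ollama run {target}"
```

For example, `print(read_ollama_public_key())` shows the key to paste into your Hugging Face account settings, and `ollama_run_command("Jashan887", "55_BaronLLM_Offensive_GGUF", "Q6_K")` reproduces the Ollama command shown earlier.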
Key Features
| Capability | Details |
|---|---|
| Adversary Simulation | Generates full ATT&CK chains, C2 playbooks, and social-engineering scenarios. |
| Exploit Reasoning | Performs step-by-step vulnerability analysis (e.g., SQLi, XXE, deserialization) with code-level explanations, and can generate working PoC code. |
| Payload Refactoring | Suggests obfuscated or multi-stage payload logic without disclosing raw malicious binaries. |
| Log & Artifact Triage | Classifies and summarizes attack traces from SIEM, PCAP, or EDR JSON. |
Quick Start
pip install "transformers>=4.42" accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "AlicanKiraz/BaronLLM-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

def generate(prompt, **kwargs):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, **kwargs)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate("Assess the exploitability of CVE-2024-45721 in a Kubernetes cluster"))
Inference API
from huggingface_hub import InferenceClient
ic = InferenceClient(model_id)
ic.text_generation("Generate a red-team plan targeting an outdated Fortinet appliance")
Model Details
| Attribute | Value |
|---|---|
| Base | Llama-3.1-8B-Instruct |
| Seq Len | 8,192 tokens |
| Quantization | 6-bit variations |
| Languages | EN |
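The sequence length bounds the prompt and the generated tokens together, so a long prompt leaves less room for output. The sketch below does that arithmetic with the 8,192-token figure from the table and the `max_new_tokens=512` used in the Quick Start helper. The function names are hypothetical, and it assumes prompt and completion share a single context window.

```python
SEQ_LEN = 8192                # sequence length from the Model Details table
DEFAULT_MAX_NEW_TOKENS = 512  # value used by the Quick Start generate() helper


def remaining_prompt_budget(max_new_tokens=DEFAULT_MAX_NEW_TOKENS, seq_len=SEQ_LEN):
    """Return how many prompt tokens fit once output tokens are reserved."""
    if max_new_tokens >= seq_len:
        raise ValueError("max_new_tokens must be smaller than the sequence length")
    return seq_len - max_new_tokens


def fits_context(prompt_tokens, max_new_tokens=DEFAULT_MAX_NEW_TOKENS, seq_len=SEQ_LEN):
    """Check whether a prompt of the given token count leaves room for the output."""
    return prompt_tokens <= remaining_prompt_budget(max_new_tokens, seq_len)


print(remaining_prompt_budget())  # prints 7680
```

To get `prompt_tokens` for a real prompt, count the token IDs the tokenizer produces, for example `len(tokenizer(prompt)["input_ids"])` with the tokenizer loaded in the Quick Start.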
Training Data Sources (curated)
- Public vulnerability databases (NVD/CVE, VulnDB).
- Exploit write-ups from trusted researchers (Project Zero, PortSwigger, NCC Group).
- Red-team reports (with permission & redactions).
- Synthetic ATT&CK chains, auto-generated and human-vetted.
Note: No copyrighted exploit code or proprietary malware datasets were used.
Dataset filtering removed raw shellcode/binary payloads.
Safety & Alignment
- Policy Gradient RLHF with security-domain SMEs.
- An OpenAI/Anthropic-style policy prohibits direct malware source, ransomware builders, and instructions facilitating illicit activity.
- Continuous red-teaming via SecEval v0.3.
Prompting Guidelines
| Goal | Template |
|---|---|
| Exploit Walkthrough | "ROLE: Senior Pentester\nOBJECTIVE: Analyse CVE-2023-XXXXX step by step …" |
| Red-Team Exercise | "Plan an ATT&CK chain (Initial Access → Exfiltration) for an on-prem AD env …" |
| Log Triage | "Given the following Zeek logs, identify C2 traffic patterns …" |
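The first template uses a `ROLE:` / `OBJECTIVE:` layout. A small helper can assemble prompts in that shape, shown here with the log-triage goal. This is a sketch: `build_prompt` is a hypothetical helper, the "SOC Analyst" role is an illustrative choice, and appending the input data after a blank line is an assumption, since the table elides the rest of each template.

```python
def build_prompt(role, objective, data=""):
    """Assemble a ROLE/OBJECTIVE prompt, optionally followed by input data."""
    prompt = f"ROLE: {role}\nOBJECTIVE: {objective}"
    if data:
        # Assumption: supporting data is appended after a blank line.
        prompt += f"\n\n{data.strip()}"
    return prompt


triage_prompt = build_prompt(
    role="SOC Analyst",  # illustrative role, not taken from the table
    objective="Given the following Zeek logs, identify C2 traffic patterns",
    data="<paste Zeek conn.log lines here>",
)
print(triage_prompt)
```

The resulting string can be passed to the `generate` helper from the Quick Start or sent as the user message to any of the chat endpoints.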
Use temperature=0.3, top_p=0.9 for focused, near-deterministic reasoning; raise both for brainstorming.
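In Transformers, `temperature` and `top_p` only take effect when sampling is enabled, so it is safest to pass `do_sample=True` explicitly alongside them. The sketch below keeps the recommended values as a preset. `sampling_kwargs` is a hypothetical helper, and the "brainstorming" numbers are assumptions, since the guideline only says to raise the values.

```python
SAMPLING_PRESETS = {
    # Values recommended above for focused reasoning.
    "reasoning": {"do_sample": True, "temperature": 0.3, "top_p": 0.9},
    # Assumed values: the guideline only says to "raise" them for brainstorming.
    "brainstorming": {"do_sample": True, "temperature": 0.9, "top_p": 0.95},
}


def sampling_kwargs(mode="reasoning", **overrides):
    """Return generation keyword arguments for the given mode."""
    if mode not in SAMPLING_PRESETS:
        raise ValueError(f"unknown mode: {mode!r}")
    kwargs = dict(SAMPLING_PRESETS[mode])
    kwargs.update(overrides)
    return kwargs


print(sampling_kwargs("reasoning"))
```

With the Quick Start helper, this would be used as `generate(prompt, **sampling_kwargs("reasoning"))`.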
This project does not pursue any profit.
"Those who shed light on others do not remain in darkness..."
Model tree for Jashan887/55_BaronLLM_Offensive_GGUF
Base model
meta-llama/Llama-3.1-8B