# RedSage-Qwen3-8B-DPO

## Model Summary
RedSage-Qwen3-8B-DPO is the final, aligned version of the RedSage cybersecurity LLM series developed by RISysLab. It represents the fourth and final stage of the RedSage training pipeline.
This model is fine-tuned from RedSage-Qwen3-8B-Ins using Direct Preference Optimization (DPO) on the AllenAI Tulu 3 Preference Mixture. This alignment stage significantly enhances the model's general reasoning capabilities and safety behaviors while maintaining the deep cybersecurity domain expertise acquired during previous stages.
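As background, DPO optimizes the policy directly on preference pairs, using the SFT model as a frozen reference, without training a separate reward model. The sketch below shows the standard per-pair DPO loss in plain Python; it is illustrative only and is not the RedSage training code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference (SFT) model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): the loss shrinks as the policy prefers the
    # chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response relative to the reference -> low loss.
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

Minimizing this loss pushes the policy's relative preference for the chosen response above the reference model's, while `beta` controls how far the policy may drift from the reference.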
- Developed by: RISysLab
- Repository: GitHub
- Base Model: RISys-Lab/RedSage-Qwen3-8B-Ins
## Training Lineage
RedSage employs a multi-stage training pipeline. This model represents the output of Stage 4.
- Stage 1: Continual Pre-Training (CPT) -> RedSage-Qwen3-8B-CFW
- Stage 2: Targeted Pre-Training -> RedSage-Qwen3-8B-Base
- Stage 3: Supervised Fine-Tuning (SFT) -> RedSage-Qwen3-8B-Ins
- Stage 4: Direct Preference Optimization (DPO) -> RedSage-Qwen3-8B-DPO (current model), trained on the Tulu 3 Preference Mixture
## Dataset: Preference Alignment
The model was aligned using the following high-quality preference dataset to ensure robust instruction following and general reasoning:
- Dataset: allenai/llama-3.1-tulu-3-8b-preference-mixture
- Description: A comprehensive collection of preference data used to align the Tulu 3 models, focusing on helpfulness, factuality, and safety.
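For readers unfamiliar with preference data, each record pairs one prompt with a preferred ("chosen") and a dispreferred ("rejected") response. The toy record below mirrors that layout; the exact field names are an assumption about the Tulu 3 schema (the real dataset can be inspected via `datasets.load_dataset("allenai/llama-3.1-tulu-3-8b-preference-mixture")`), and the helper simply extracts the triple a DPO trainer consumes.

```python
# Toy record mirroring a chosen/rejected preference pair (field names
# are an assumption about the Tulu 3 format, not taken from the dataset).
record = {
    "prompt": "What is a CVE?",
    "chosen": [
        {"role": "user", "content": "What is a CVE?"},
        {"role": "assistant",
         "content": "A CVE is a publicly catalogued vulnerability identifier."},
    ],
    "rejected": [
        {"role": "user", "content": "What is a CVE?"},
        {"role": "assistant", "content": "No idea."},
    ],
}

def to_dpo_pair(rec):
    """Extract (prompt, chosen_text, rejected_text) for a DPO trainer."""
    last = lambda msgs: msgs[-1]["content"]
    return rec["prompt"], last(rec["chosen"]), last(rec["rejected"])

print(to_dpo_pair(record))
```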
## Performance & Evaluation
RedSage-Qwen3-8B-DPO achieves the best balance between specialized domain knowledge and general capability among all RedSage variants.
### 1. RedSage-Bench (0-shot)
| Category | Qwen3-8B (non-reasoning) | RedSage-8B-DPO |
|---|---|---|
| Macro Average | 81.85 | 84.83 |
| Knowledge (General) | 80.46 | 82.48 |
| Knowledge (Frameworks) | 78.82 | 83.80 |
| Skill (Offensive) | 86.16 | 88.54 |
| Tools (CLI) | 83.92 | 86.30 |
| Tools (Kali) | 75.56 | 79.30 |
### 2. External Cybersecurity Benchmarks (0-shot)
| Benchmark | Qwen3-8B (non-reasoning) | RedSage-8B-DPO |
|---|---|---|
| Mean | 75.71 | 81.10 |
| CTI-Bench (MCQ) | 62.76 | 70.84 |
| CTI-Bench (RCM) | 54.00 | 70.60 |
| CyberMetric (500) | 88.60 | 90.00 |
| MMLU (Security) | 76.00 | 79.00 |
| SecBench (En) | 73.26 | 80.06 |
| SecEva (MCQ) | 65.46 | 74.22 |
| SECURE (CWET) | 88.11 | 91.35 |
| SECURE (KCV) | 87.42 | 82.86 |
| SECURE (MEAT) | 85.75 | 91.00 |
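As a sanity check on the table above, the Mean row is the unweighted average of the nine benchmark scores:

```python
# Per-benchmark scores copied from the external cybersecurity table,
# in row order (CTI-Bench MCQ/RCM, CyberMetric, MMLU-Security,
# SecBench, SecEva, SECURE CWET/KCV/MEAT).
qwen = [62.76, 54.00, 88.60, 76.00, 73.26, 65.46, 88.11, 87.42, 85.75]
redsage = [70.84, 70.60, 90.00, 79.00, 80.06, 74.22, 91.35, 82.86, 91.00]

mean = lambda xs: round(sum(xs) / len(xs), 2)
print(mean(qwen), mean(redsage))  # -> 75.71 81.1
```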
### 3. OpenLLM Leaderboard (General Benchmark)
| Benchmark | Qwen3-8B (non-reasoning) | RedSage-8B-DPO |
|---|---|---|
| Mean | 65.92 | 74.33 |
| MMLU | 73.59 | 77.07 |
| ARC-C | 62.54 | 71.76 |
| GSM8K | 75.66 | 82.71 |
| HellaSwag | 56.70 | 79.87 |
| TruthfulQA | 45.23 | 52.47 |
| WinoGrande | 62.51 | 73.01 |
| IFEval | 85.21 | 83.44 |
## Usage
Use the standard chat template for inference.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "RISys-Lab/RedSage-Qwen3-8B-DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Define the chat messages
messages = [
    {"role": "system", "content": "You are RedSage, a helpful cybersecurity assistant."},
    {"role": "user", "content": "Analyze the following log entry for potential indicators of compromise: 'POST /cgi-bin/test-cgi?* HTTP/1.1'"}
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Intended Use
- Primary Use: General-purpose cybersecurity assistance, log analysis, threat intelligence summarization, and educational queries.
- Benefits: Stronger instruction adherence and general reasoning than the SFT-only version, thanks to preference-based alignment.
- Limitations: While aligned, the model may still produce incorrect information. Always verify outputs in critical security environments.
## Citation
If you use this model or dataset, please cite our paper:
```bibtex
@inproceedings{suryanto2026redsage,
  title={RedSage: A Cybersecurity Generalist {LLM}},
  author={Naufal Suryanto and Muzammal Naseer and Pengfei Li and Syed Talal Wasim and Jinhui Yi and Juergen Gall and Paolo Ceravolo and Ernesto Damiani},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=W4FAenIrQ2}
}
```