Instructions to use krystal7/llama-8b-reflect-sft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use krystal7/llama-8b-reflect-sft with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="krystal7/llama-8b-reflect-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("krystal7/llama-8b-reflect-sft")
model = AutoModelForCausalLM.from_pretrained("krystal7/llama-8b-reflect-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use krystal7/llama-8b-reflect-sft with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "krystal7/llama-8b-reflect-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "krystal7/llama-8b-reflect-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/krystal7/llama-8b-reflect-sft

SGLang

How to use krystal7/llama-8b-reflect-sft with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "krystal7/llama-8b-reflect-sft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "krystal7/llama-8b-reflect-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "krystal7/llama-8b-reflect-sft" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "krystal7/llama-8b-reflect-sft",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use krystal7/llama-8b-reflect-sft with Docker Model Runner:
```
docker model run hf.co/krystal7/llama-8b-reflect-sft
```

Llama 8B Reflect SFT

krystal7/llama-8b-reflect-sft is the SFT release from the Reflector project. It is a Llama 3-family 8B instruction model trained to use reflection-oriented behavior before producing a final answer, with an emphasis on risk-aware generation and safety-aware self-correction.

Reflector targets a practical failure mode in safety alignment: a model may recognize surface-level unsafe prompts, but still struggle with indirect harmful requests, multi-step risky reasoning, jailbreak-style framing, or ambiguous dual-use questions. The core idea is to internalize a self-reflection step so the model can inspect intent, identify potential harm, and redirect toward a safe and useful response at reasoning time.

Paper: Reflector: Internalizing Self-Reflection into Language Models (arXiv:2605.20654)

Code: https://github.com/mjc-ma-01/self-reflection-llm

Model Highlights

Reflection-based safety alignment for Llama 3-family 8B models.
Trained with the Reflector SFT pipeline on harmful-pattern and general-pattern data.
Designed for risk-aware generation, harmful-intent recognition, and safer multi-step reasoning.
Intended to handle indirect harmful requests and jailbreak-style framing better than plain instruction-following behavior.
Compatible with local deployment through transformers and OpenAI-compatible vLLM serving.

Intended Use

This model is intended for research and application prototyping around:

safety-aware chat assistants
reflective reasoning studies
harmful-intent detection and safe redirection
local Llama-style deployment experiments
evaluation of reasoning-time alignment methods

It is not a replacement for a full production safety stack. Deployments should still use policy filters, monitoring, rate limits, and domain-specific review.
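As an illustration of the layering described above, the following is a minimal sketch of a wrapper that puts a keyword policy filter, a per-user rate limit, and an audit log around any generation callable. The class name `GuardedChat`, the keyword list, and the refusal strings are hypothetical and not part of the Reflector project; a real deployment would likely use a dedicated moderation service rather than substring matching.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class GuardedChat:
    """Hypothetical wrapper: rate limit, then policy filter, then generate."""

    def __init__(
        self,
        generate: Callable[[str], str],
        blocked_terms: List[str],
        max_calls: int,
        per_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = generate
        self._blocked_terms = [term.lower() for term in blocked_terms]
        self._max_calls = max_calls
        self._per_seconds = per_seconds
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = {}
        # Each entry is (user_id, outcome), kept for later monitoring.
        self.audit_log: List[Tuple[str, str]] = []

    def ask(self, user_id: str, prompt: str) -> str:
        now = self._clock()
        calls = self._calls.setdefault(user_id, deque())
        # Drop timestamps that have left the sliding window.
        while calls and now - calls[0] >= self._per_seconds:
            calls.popleft()
        if len(calls) >= self._max_calls:
            self.audit_log.append((user_id, "rate_limited"))
            return "Rate limit exceeded. Please retry later."
        calls.append(now)

        # Naive substring filter, used here only as a stand-in for a policy check.
        lowered = prompt.lower()
        if any(term in lowered for term in self._blocked_terms):
            self.audit_log.append((user_id, "blocked"))
            return "This request is not allowed by policy."

        reply = self._generate(prompt)
        self.audit_log.append((user_id, "answered"))
        return reply
```

The `generate` argument can be any function that maps a prompt to a reply, for example a thin wrapper around the Transformers or vLLM calls shown elsewhere in this card. The model's own reflection behavior then acts as one layer among several rather than the only safeguard.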

Quick Start with Transformers

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krystal7/llama-8b-reflect-sft"

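# Keep downloads in a local cache directory. These variables are set after
# transformers has been imported, so cache_dir is also passed explicitly below.
os.environ.setdefault("HF_HOME", "./hf_cache")
os.environ.setdefault("HUGGINGFACE_HUB_CACHE", os.path.join(os.environ["HF_HOME"], "hub"))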

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    cache_dir=os.environ["HUGGINGFACE_HUB_CACHE"],
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    cache_dir=os.environ["HUGGINGFACE_HUB_CACHE"],
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant."},
    {"role": "user", "content": "How can I learn about lock mechanisms for a security class without doing anything illegal?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        temperature=None,
        pad_token_id=tokenizer.eos_token_id,
    )

answer = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(answer.strip())
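The quick start above selects bfloat16 on GPU and float32 otherwise, which roughly doubles the memory needed for weights in the float32 case. The sketch below works through that arithmetic. It assumes a nominal 8e9 parameters (the exact count for this checkpoint is not stated in this card) and covers weights only, not the KV cache or activations.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}

# Assumption: nominal parameter count for an "8B" model, not an exact figure.
NOMINAL_PARAMS = 8e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate memory for model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("bfloat16", "float32"):
    gib = weight_memory_gib(NOMINAL_PARAMS, dtype)
    print(f"{dtype}: about {gib:.1f} GiB for weights")
```

Under these assumptions the weights need roughly 15 GiB in bfloat16 and roughly 30 GiB in float32, so CPU-only loading in float32 needs correspondingly more RAM.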

vLLM Serving

pip install vllm

export HF_HOME=./hf_cache
export HUGGINGFACE_HUB_CACHE=$HF_HOME/hub

vllm serve krystal7/llama-8b-reflect-sft \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --served-model-name reflector-sft

OpenAI-compatible client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="reflector-sft",
    messages=[
        {"role": "system", "content": "You are a helpful and harmless assistant."},
        {"role": "user", "content": "Explain how to handle an ambiguous dual-use safety question responsibly."},
    ],
    temperature=0,
    max_tokens=512,
)

print(response.choices[0].message.content)
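If the `openai` package is not available, the same endpoint can be called with the Python standard library. This is a minimal sketch, assuming the server was started with the `vllm serve` command above, so the `model` field must match the `--served-model-name` value (`reflector-sft`). The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    model: str,
    user_text: str,
    system_text: Optional[str] = None,
    max_tokens: int = 512,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    payload = {
        "model": model,  # must match the name the server exposes
        "messages": messages,
        "temperature": 0,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example usage (requires a running server):
# req = build_chat_request(
#     "http://localhost:8000/v1",
#     "reflector-sft",
#     "Explain how to handle an ambiguous dual-use safety question responsibly.",
#     system_text="You are a helpful and harmless assistant.",
# )
# print(send(req))
```

Building the request separately from sending it makes it easy to inspect the payload before any network call is made.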

Training Summary

This checkpoint was trained with the Reflector SFT pipeline.

| Item | Description |
| --- | --- |
| Base family | Llama 3-family 8B instruction model |
| Training stage | Supervised fine-tuning |
| Data schema | harmful pattern + general pattern |
| Objective | learn reflection-oriented, safety-aware responses |
| Output format | standard HuggingFace causal LM checkpoint |

The repository also includes a GDPO RL pipeline. This model card describes only the released SFT checkpoint.

Evaluation

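The following benchmark export uses 50 samples per benchmark and was produced with the current Reflector evaluation pipeline. The 50 XSTest samples are split across two rows, with helpfulness_proxy reported on 26 and refusal_rate on 24. The harmful-pattern training data is not used as a public benchmark, and SimpleQA loading is retained only for future data reuse.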

| Model | Benchmark | Dataset key | Metric | Score | n (samples) |
| --- | --- | --- | --- | --- | --- |
| krystal7/llama-8b-reflect-sft | Do-Not-Answer | donot | refusal_rate | 0.7600 | 50 |
| krystal7/llama-8b-reflect-sft | General Benchmark | general | helpfulness_proxy | 1.0000 | 50 |
| krystal7/llama-8b-reflect-sft | GSM8K | gsm8k | accuracy | 0.7000 | 50 |
| krystal7/llama-8b-reflect-sft | StrongREJECT | strongreject | refusal_rate | 1.0000 | 50 |
| krystal7/llama-8b-reflect-sft | ToxicityPrompts | wildchat | refusal_rate | 0.9400 | 50 |
| krystal7/llama-8b-reflect-sft | XSTest | xstest | helpfulness_proxy | 0.6154 | 26 |
| krystal7/llama-8b-reflect-sft | XSTest | xstest | refusal_rate | 1.0000 | 24 |
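Because each row rests on at most 50 samples, it can help to read the scores as counts with an uncertainty range. The sketch below converts the scores in the table above to counts and computes 95% Wilson score intervals. It assumes each score is a simple proportion (successes divided by n), which this card does not state explicitly, and the interval method is a choice made for this example rather than part of the Reflector evaluation pipeline.

```python
import math

# (benchmark, metric, score, n) copied from the table above.
ROWS = [
    ("Do-Not-Answer", "refusal_rate", 0.7600, 50),
    ("General Benchmark", "helpfulness_proxy", 1.0000, 50),
    ("GSM8K", "accuracy", 0.7000, 50),
    ("StrongREJECT", "refusal_rate", 1.0000, 50),
    ("ToxicityPrompts", "refusal_rate", 0.9400, 50),
    ("XSTest", "helpfulness_proxy", 0.6154, 26),
    ("XSTest", "refusal_rate", 1.0000, 24),
]


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Return the (lower, upper) Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


for benchmark, metric, score, n in ROWS:
    # Assumption: the score is successes / n, so rounding recovers the count.
    successes = round(score * n)
    lower, upper = wilson_interval(successes, n)
    print(f"{benchmark:18s} {metric:18s} {successes:2d}/{n}  "
          f"95% interval: {lower:.2f} to {upper:.2f}")
```

Even a perfect score on 50 samples leaves a lower bound of roughly 0.93 under this method, which is one reason to treat the table as a lightweight export rather than a certification.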

Limitations

The model can still make factual errors or produce incomplete refusals.
Safety behavior should be evaluated in the target deployment domain before release.
The benchmark table is a lightweight reproducibility export, not a comprehensive safety certification.
Reflection-style behavior may vary with decoding settings, system prompts, and prompt formatting.

Citation

@misc{reflector2026,
  title         = {Reflector: Internalizing Self-Reflection into Language Models},
  year          = {2026},
  eprint        = {2605.20654},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}