Instructions to use justinj92/Delphermes-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use justinj92/Delphermes-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="justinj92/Delphermes-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("justinj92/Delphermes-8B")
model = AutoModelForCausalLM.from_pretrained("justinj92/Delphermes-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use justinj92/Delphermes-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "justinj92/Delphermes-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "justinj92/Delphermes-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/justinj92/Delphermes-8B

SGLang

How to use justinj92/Delphermes-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "justinj92/Delphermes-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "justinj92/Delphermes-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "justinj92/Delphermes-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "justinj92/Delphermes-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use justinj92/Delphermes-8B with Docker Model Runner:
```
docker model run hf.co/justinj92/Delphermes-8B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Delphermes-8B

This is a merged LoRA model based on Qwen/Qwen3-8B, SFT on Hermes3 + Dolphin Dataset. The model demonstrates strong performance across reasoning, mathematical problem-solving, and commonsense understanding tasks.

Model Details

Base Model: Qwen/Qwen3-8B
Language: English (en)
Library: transformers
Training Method: LoRA fine-tuning with Axolotl
Infrastructure: 8xB200 Cluster from PrimeIntellect
Training Framework: DeepSpeed Zero2

Performance

Benchmark	Score	Description
HellaSwag	88%	Commonsense reasoning and natural language inference
GSM8K	89%	Grade school math word problems
TheoryPlay	80%	Theory of mind and social reasoning tasks

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "justinj92/Delphermes-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example usage for reasoning tasks
text = "Sarah believes that her keys are in her purse, but they are actually on the kitchen table. Where will Sarah look for her keys?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs, 
    max_length=200,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Chat Format

This model supports the Hermes chat format:

def format_chat(messages):
    formatted = ""
    for message in messages:
        role = message["role"]
        content = message["content"]
        if role == "system":
            formatted += f"<|im_start|>system\n{content}<|im_end|>\n"
        elif role == "user":
            formatted += f"<|im_start|>user\n{content}<|im_end|>\n"
        elif role == "assistant":
            formatted += f"<|im_start|>assistant\n{content}<|im_end|>\n"
    formatted += "<|im_start|>assistant\n"
    return formatted

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve this math problem: A store has 45 apples. If they sell 1/3 of them in the morning and 1/5 of the remaining apples in the afternoon, how many apples are left?"}
]

prompt = format_chat(messages)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300, temperature=0.1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training Details

Training Framework: Axolotl with DeepSpeed Zero2 optimization
Hardware: 8x NVIDIA B200 GPUs (PrimeIntellect cluster)
Base Model: Qwen/Qwen3-8B
Training Method: Low-Rank Adaptation (LoRA)
Dataset: NousResearch/Hermes-3-Dataset + QuixiAI/dolphin
Training Duration: 28 hours
Learning Rate: 0.0004
Batch Size: 8
Sequence Length: 4096

Evaluation Methodology

All evaluations were conducted using:

HellaSwag: Standard validation set with 4-way multiple choice accuracy
GSM8K: Test set with exact match accuracy on final numerical answers
TheoryPlay: Validation set with accuracy on theory of mind reasoning tasks

Limitations

The model may still struggle with very complex mathematical proofs
Performance on non-English languages may be limited
May occasionally generate inconsistent responses in edge cases
Training data cutoff affects knowledge of recent events

Ethical Considerations

This model has been trained on curated datasets and should be used responsibly. Users should:

Verify important information from the model
Be aware of potential biases in training data
Use appropriate content filtering for production applications

Citation

@misc{Delphermes-8B,
  title={Delphermes-8B: A Fine-tuned Language Model for Reasoning Tasks},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/justinj92/Delphermes-8B}
}