Instructions to use TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B with libraries and local apps.

Libraries

How to use TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

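# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the checkpoint's stored dtype; device_map="auto" (requires
# the accelerate package) spreads the weights across the available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Build the prompt with the model's chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))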
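
Before loading the model directly, it can help to estimate how much memory the weights alone need. The sketch below is a back-of-the-envelope calculation, assuming a nominal 70 billion parameters (taken from the model name, not an exact count) and 2 bytes per parameter for bfloat16, the dtype given in the merge configuration further down this card. The function name is hypothetical, and the figure ignores the KV cache and activations.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Rough memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: nominal 70e9 parameters, bfloat16 = 2 bytes per parameter.
bf16_gib = weight_memory_gib(70e9, 2)
# Assumption: float32 = 4 bytes per parameter, for comparison.
fp32_gib = weight_memory_gib(70e9, 4)

print(f"bfloat16 weights: about {bf16_gib:.0f} GiB")
print(f"float32 weights:  about {fp32_gib:.0f} GiB")
```

The bfloat16 estimate comes out at roughly 130 GiB, which is why the weights usually have to be split across several GPUs or replaced with a quantized version.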

Local Apps

vLLM

How to use TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
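
The same request can be sent from Python instead of curl. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default `base_url` assumes the vLLM server started above on port 8000. It relies on the response following the OpenAI chat completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL_ID = "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt and return the assistant's reply text (server must be running)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call, commented out because it needs a live server:
# print(chat("What is the capital of France?"))
```

For a server listening on a different port, such as the SGLang examples on port 30000, pass `base_url="http://localhost:30000"`.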

SGLang

How to use TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B with Docker Model Runner:

```
docker model run hf.co/TareksTesting/Dungeonmaster-Expanded-R1-LLaMa-70B
```

Dungeonmaster is meant specifically for creative roleplay with stakes and consequences, built from the curated models listed below.

Dungeonmaster Expanded features 2 extra models, bringing the total up to 7! Admittedly, I was concerned about putting that many models into a single merge. But you never know, so I decided to try both versions and see...

NB: I think the reasoning got too diluted. It works well as a normal model, but 'thinking' doesn't seem to work.
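
One way to check whether 'thinking' shows up in a given reply is to look for a reasoning block in the output. The sketch below assumes the `<think>...</think>` tag convention used by DeepSeek-R1-style models; this card does not state which format the merge would emit, so treat the tag names as an assumption. The helper name `split_reasoning` is hypothetical.

```python
import re

# Assumption: reasoning, when present, is wrapped in <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None when no think block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    # Remove the think block and keep everything else as the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


with_block = split_reasoning("<think>Plan the ambush.</think>The door creaks open.")
without_block = split_reasoning("The door creaks open.")
print(with_block)
print(without_block)
```

If the first element comes back as `None` across many generations, the model is behaving as a normal, non-reasoning model, which matches the note above.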

My ideal vision for Dungeonmaster was these 7 models:

LatitudeGames/Wayfarer-Large-70B-Llama-3.3 - A fine-tuned model specifically designed for this very application.
ArliAI/Llama-3.3-70B-ArliAI-RPMax-v1.4 - Another fine-tune trained on RP datasets.
Sao10K/70B-L3.3-mhnnn-x1 - For some extra creativity.
TheDrummer/Anubis-70B-v1 - Another excellent RP fine-tune.
EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1 - For its strong descriptive writing.
SicariusSicariiStuff/Negative_LLAMA_70B - To assist with the darker undertones.
TheDrummer/Fallen-Llama-3.3-R1-70B-v1 - The secret sauce, a completely unhinged thinking model that turns things up to 11.

Mergekit

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the Linear DELLA merge method, with TareksLab/Genesis-R1-L3.3-70B as the base.
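
To give a feel for what DELLA does to each model before the linear combination, here is a toy sketch of magnitude-ranked drop and rescale on a short list of delta values (the differences between a fine-tune and the base). It is an illustration of the idea, not mergekit's implementation. It assumes that keep probabilities rise linearly with magnitude rank across the range `density - epsilon` to `density + epsilon`, which is my reading of how mergekit documents these parameters; the function names are hypothetical.

```python
import random


def keep_probabilities(deltas, density, epsilon):
    """Give larger-magnitude deltas a higher chance of being kept.

    Assumption: probabilities span [density - epsilon, density + epsilon].
    """
    n = len(deltas)
    if n == 1:
        return [density]
    # Indices sorted from smallest to largest magnitude.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    probs = [0.0] * n
    for rank, idx in enumerate(order):
        probs[idx] = (density - epsilon) + 2 * epsilon * rank / (n - 1)
    return probs


def magnitude_prune(deltas, density, epsilon, rng):
    """Randomly drop deltas, rescaling survivors by 1/p to preserve the expected value."""
    probs = keep_probabilities(deltas, density, epsilon)
    return [d / p if rng.random() < p else 0.0 for d, p in zip(deltas, probs)]


toy_deltas = [0.01, -0.5, 0.2, -0.03, 1.0]
# density=0.7 and epsilon=0.2 are the values from this card's configuration.
probs = keep_probabilities(toy_deltas, density=0.7, epsilon=0.2)
pruned = magnitude_prune(toy_deltas, density=0.7, epsilon=0.2, rng=random.Random(0))
print(probs)
print(pruned)
```

Under these assumptions the smallest delta is kept about half the time and the largest about 90% of the time, so on average 70% of the deltas survive.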

Models Merged

The following models were included in the merge:

ArliAI/Llama-3.3-70B-ArliAI-RPMax-v1.4
SicariusSicariiStuff/Negative_LLAMA_70B
LatitudeGames/Wayfarer-Large-70B-Llama-3.3
TheDrummer/Anubis-70B-v1
TheDrummer/Fallen-Llama-3.3-R1-70B-v1
TareksLab/Genesis-R1-L3.3-70B
EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
Sao10K/70B-L3.3-mhnnn-x1

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: LatitudeGames/Wayfarer-Large-70B-Llama-3.3
  - model: ArliAI/Llama-3.3-70B-ArliAI-RPMax-v1.4
  - model: Sao10K/70B-L3.3-mhnnn-x1
  - model: TheDrummer/Anubis-70B-v1
  - model: EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
  - model: SicariusSicariiStuff/Negative_LLAMA_70B
  - model: TheDrummer/Fallen-Llama-3.3-R1-70B-v1
merge_method: della_linear
chat_template: llama3
base_model: TareksLab/Genesis-R1-L3.3-70B
parameters:
  weight: 0.14
  density: 0.7
  epsilon: 0.2
  lambda: 1.1
  normalize: true
dtype: bfloat16
tokenizer:
  source: TareksLab/Genesis-R1-L3.3-70B
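
The `weight`, `normalize`, and `lambda` values above can be read as a weighted average of the seven models' deltas, scaled and added back to the base. The sketch below works through that arithmetic on a toy two-element "tensor". It assumes that `normalize: true` divides the weighted sum by the sum of the weights and that `lambda` scales the merged delta before it is added to the base; this matches my understanding of mergekit's documentation but is not confirmed by this card. It also leaves out the DELLA pruning step, and the function name is hypothetical.

```python
def linear_merge(base, deltas, weights, lam=1.0, normalize=True):
    """Combine per-model deltas linearly and add the result to the base values."""
    total = sum(weights)
    merged = []
    for j, b in enumerate(base):
        mixed = sum(w * d[j] for w, d in zip(weights, deltas))
        if normalize:
            mixed /= total  # assumption: divide by the sum of the weights
        merged.append(b + lam * mixed)  # assumption: lambda scales the merged delta
    return merged


base = [1.0, 2.0]
# Seven models, each with the same toy delta relative to the base.
deltas = [[0.1, -0.2] for _ in range(7)]
weights = [0.14] * 7  # 7 * 0.14 = 0.98, so normalization rescales each weight to 1/7

merged = linear_merge(base, deltas, weights, lam=1.1)
print(merged)
```

Because all seven weights are equal, normalization turns them into a plain average, and `lambda: 1.1` then amplifies that averaged delta by 10%.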