Instructions to use TheDrummer/Behemoth-R1-123B-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use TheDrummer/Behemoth-R1-123B-v2-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheDrummer/Behemoth-R1-123B-v2-GGUF",
    filename="Behemoth-R1-123B-v2d-Q4_K_M-00001-of-00002.gguf",
)

# No official example prompt is defined for this model; any chat message works.
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)
```
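The call returns a dict following the OpenAI chat-completion schema, so the reply text can be read like this (a minimal sketch; the prompt is illustrative):

```python
# Reuses the `llm` object from above; the reply lives under choices[0].
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(response["choices"][0]["message"]["content"])
```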
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use TheDrummer/Behemoth-R1-123B-v2-GGUF with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
Use Docker
```sh
docker model run hf.co/TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
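If you start `llama-server` with any of the options above, it exposes an OpenAI-compatible API, by default on port 8080. A minimal sketch of querying it with the `openai` Python client; the port assumes default flags, and the model name is arbitrary since the server serves a single loaded model:

```python
# pip install openai
from openai import OpenAI

# llama-server listens on http://localhost:8080 by default (assumes default flags)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Behemoth-R1-123B-v2",  # name is not checked when only one model is loaded
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
)
print(response.choices[0].message.content)
```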
- LM Studio
- Jan
- Ollama
How to use TheDrummer/Behemoth-R1-123B-v2-GGUF with Ollama:
```sh
ollama run hf.co/TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
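Once pulled, the model can also be called programmatically through Ollama's local API (default port 11434). A minimal sketch with the official `ollama` Python package; the prompt is illustrative:

```python
# pip install ollama
import ollama

# Same model tag as the `ollama run` command above
response = ollama.chat(
    model="hf.co/TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Newer ollama-python versions also support response.message.content
print(response["message"]["content"])
```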
- Unsloth Studio
How to use TheDrummer/Behemoth-R1-123B-v2-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for TheDrummer/Behemoth-R1-123B-v2-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for TheDrummer/Behemoth-R1-123B-v2-GGUF to start chatting
```
Use Hugging Face Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TheDrummer/Behemoth-R1-123B-v2-GGUF to start chatting
```
- Docker Model Runner
How to use TheDrummer/Behemoth-R1-123B-v2-GGUF with Docker Model Runner:
```sh
docker model run hf.co/TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
- Lemonade
How to use TheDrummer/Behemoth-R1-123B-v2-GGUF with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull TheDrummer/Behemoth-R1-123B-v2-GGUF:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.Behemoth-R1-123B-v2-GGUF-Q4_K_M
```
List all available models
```sh
lemonade list
```
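Lemonade Server also exposes an OpenAI-compatible endpoint, so the pulled model can be queried from code. The base URL below assumes Lemonade's default port 8000 and `/api/v1` path; check the Lemonade docs if your install differs:

```python
# pip install openai
from openai import OpenAI

# Assumption: Lemonade Server's default OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="user.Behemoth-R1-123B-v2-GGUF-Q4_K_M",  # same name as `lemonade run` above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```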
Join our Discord! https://discord.gg/BeaverAI
Nearly 7,000 members strong 💪 A hub for users and makers alike!
Drummer is open for work / employment (I'm a Software Engineer). Contact me through any of these channels: https://linktr.ee/thelocaldrummer
Thank you to everyone who subscribed through Patreon. Your support helps me chug along in this brave new world.
Drummer proudly presents...
Behemoth R1 123B v2 🦣
Usage
- Use Mistral v7 (Non-Tekken), i.e., Mistral v3 + `[SYSTEM_PROMPT]`. Warning: using the wrong version / whitespacing may deteriorate performance.
- Prefill `<think>` to ensure reasoning (and test your patience). You can slightly steer the thinking by prefixing the think tag (e.g., `<immoral_think>`); see the sketch below.
- Works great even without reasoning.
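A minimal sketch of the prefill trick using the llama-cpp-python `llm` object from earlier, via a raw completion so the assistant turn can be started with the think tag. The system and user text are illustrative, and the exact template whitespace is an assumption; verify it against the tokenizer config, since the warning above says wrong whitespacing hurts performance:

```python
# Mistral v7 (Non-Tekken) layout: [SYSTEM_PROMPT]...[/SYSTEM_PROMPT][INST]...[/INST]
# (whitespace handling is an assumption; check the tokenizer config)
prompt = (
    "[SYSTEM_PROMPT]You are a creative roleplay assistant.[/SYSTEM_PROMPT]"
    "[INST]Describe the tavern as the stranger walks in.[/INST]"
    "<think>"  # prefill: forces the model to open with a reasoning block
)

output = llm.create_completion(prompt, max_tokens=1024)
print(output["choices"][0]["text"])
```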
Rationale for Reasoning
Hear me out for a second. I know it's crazy to have a 123B dense model spend precious output tokens on reasoning, but if you're a fan of Largestral, then consider the following...
Sometimes, you'd want to leave the character responses untouched. Reasoning divides the AI response into two phases: planning & execution. It gives you the opportunity to 'modify' the planning phase without messing with the character's execution.
The planning phase will also pick apart the scenario, break down nuances, and surface implicit story elements. If it's erroneous, then you have a chance to correct the AI before the execution phase. If it's missing details, then you can wrangle it during the planning phase and watch it unfold in the execution phase.
Nutshell: Reasoning adds another useful dimension for these creative uses.
Description
As far as I see, this doesn't even feel like Behemoth. It's something way better. It's the top 3 you've ever made. This is a solid cook my man.
Characters in particular are portrayed so much better and more authentically, which was Largestral's biggest problem. Dialogue is much improved, and the smarts 2411 had have been retained quite well. Its prose has changed for the better without the overconfidence in base.
This is so much better than any other 2411 tune I've tried tbh. It's doing quite well on adherence.
After a few messages, the model gets pretty smart. In fact, so smart that it tries to analyze why I want to do some particular RP. The model is getting better with a nasty prefill.
This model continues to surprise and impress me. It's really exactly what I wanted Largestral 2411 to be. I cannot overstate how much better it is than the base and any other tune of it. From what I remember, it actually feels as good as Nemotron Ultra.
Yes, super intelligent, and something about it makes characters have much more texture and personality than other models.
Links
- Original: https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2
- GGUF: https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2-GGUF
- iMatrix (recommended): https://huggingface.co/bartowski/TheDrummer_Behemoth-R1-123B-v2-GGUF
- EXL3: https://huggingface.co/ArtusDev/TheDrummer_Behemoth-R1-123B-v2-EXL3
Base model: mistralai/Mistral-Large-Instruct-2411