Instructions for using DarkArtsForge/Morax-24B-v2 with libraries and local apps.

Libraries

How to use DarkArtsForge/Morax-24B-v2 with Transformers:

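```
# Use a pipeline as a high-level helper
from transformers import pipeline

# device_map="auto" requires the accelerate package
pipe = pipeline(
    "text-generation",
    model="DarkArtsForge/Morax-24B-v2",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline returns the whole conversation; the last message is the model's reply
result = pipe(messages)
print(result[0]["generated_text"][-1]["content"])
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DarkArtsForge/Morax-24B-v2")
# torch_dtype="auto" keeps the checkpoint's dtype (bfloat16) instead of float32;
# device_map="auto" (needs accelerate) places layers on the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "DarkArtsForge/Morax-24B-v2",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```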

Local Apps

vLLM

How to use DarkArtsForge/Morax-24B-v2 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this command blocks; run the next step in a second terminal):
vllm serve "DarkArtsForge/Morax-24B-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DarkArtsForge/Morax-24B-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
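The same request can be sent from Python. This is a minimal sketch using only the standard library (no OpenAI client package), assuming the server from the step above is listening on localhost:8000. The helper names `build_chat_request` and `chat` are hypothetical, and the `max_tokens`/`temperature` defaults are illustrative values, not settings recommended by the model author.

```python
import json
import urllib.request

# Default vLLM address from the example above (assumption: unchanged port).
BASE_URL = "http://localhost:8000/v1"
MODEL = "DarkArtsForge/Morax-24B-v2"


def build_chat_request(messages, max_tokens=256, temperature=0.7):
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    # max_tokens and temperature are illustrative defaults, not values from the model card.
    return {
        "model": MODEL,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(messages, base_url=BASE_URL, timeout=300):
    """POST a chat request to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style responses carry the text in choices[0].message.content.
    return payload["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat([{"role": "user", "content": "What is the capital of France?"}]))
```

Passing `base_url="http://localhost:30000/v1"` should work the same way for the SGLang server below, since both expose the OpenAI-compatible route.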


SGLang

How to use DarkArtsForge/Morax-24B-v2 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (this command blocks; run the next step in a second terminal):
python3 -m sglang.launch_server \
    --model-path "DarkArtsForge/Morax-24B-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DarkArtsForge/Morax-24B-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
# Replace <secret> with your Hugging Face access token
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DarkArtsForge/Morax-24B-v2" \
        --host 0.0.0.0 \
        --port 30000
# From a second terminal, once the model has loaded, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "DarkArtsForge/Morax-24B-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
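Downloading and loading a 24B model can take several minutes after the container starts, so a request sent immediately may be refused. Below is a minimal polling sketch, assuming the server exposes the OpenAI-compatible `GET /v1/models` route (SGLang and vLLM both do). `wait_for_server` is a hypothetical helper and the timeout values are arbitrary.

```python
import json
import time
import urllib.request


def wait_for_server(base_url="http://localhost:30000/v1", timeout=600.0, interval=5.0):
    """Poll GET {base_url}/models until the server answers or `timeout` seconds pass.

    Returns the list of served model IDs, or None if the server never responded.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                data = json.load(resp)
            return [m["id"] for m in data.get("data", [])]
        except (OSError, ValueError):
            # OSError covers refused connections, HTTP errors while loading, and timeouts;
            # ValueError covers a non-JSON reply. Either way, retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Calling `wait_for_server()` before the curl request returns the served model IDs once the server is ready, or `None` on timeout.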

Docker Model Runner
How to use DarkArtsForge/Morax-24B-v2 with Docker Model Runner:
```
docker model run hf.co/DarkArtsForge/Morax-24B-v2
```
Quantized versions of this model are available for use in llama.cpp, Ollama, LM Studio, and other compatible apps.

⚠️ Warning: This model can produce narratives and RP that contain violent and graphic erotic content. Adjust your system prompt accordingly. Also, use the Mistral Tekken chat template for best results.
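The card recommends the Mistral Tekken template without spelling it out. For frontends that take a raw prompt string, here is a minimal sketch that assumes the V7-Tekken layout used by Mistral Small 3.1-era models (`[SYSTEM_PROMPT]`, `[INST]`, `[/INST]`, and `</s>` closing each assistant turn). The exact tokens and the `render_tekken` helper are assumptions; prefer the tokenizer's bundled chat template (the merge config sets `chat_template: auto`) whenever your tooling can use it.

```python
def render_tekken(messages, bos="<s>", eos="</s>"):
    """Render chat messages into a V7-Tekken-style prompt string (assumed layout)."""
    parts = [bos]
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            parts.append(f"[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            # Completed assistant turns are closed with the EOS token.
            parts.append(f"{content}{eos}")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    # If the last message is a user turn, the string ends on [/INST],
    # leaving the model to write the assistant reply.
    return "".join(parts)
```

For a system prompt plus one user message this yields `<s>[SYSTEM_PROMPT]...[/SYSTEM_PROMPT][INST]...[/INST]`, which is where generation should start.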

👹 Morax 24B v2

This is a creative, intelligent, uncensored, and evil-tuned merge of pre-trained language models created using mergekit.

The result of this merge is quite interesting: sometimes it chooses to think and sometimes it doesn't. You can force thinking on or off if you want, though.

Morax v2 should have no refusals and therefore should not require any ablation or jailbreaks. It also scored noticeably higher on Q0 Bench (10351 compared to v1's 7656, roughly 35% higher).

Version 2 is noticeably more deranged and depraved than v1. While v1 had a few refusals, v2 is confirmed not to have any, due to the normalize: false setting.

I noticed it also uses thinking more often than v1, but you can add <think></think> to disable it.
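One way to force the mode is to prefill the start of the assistant turn. This sketch assumes the model reasons inside <think>...</think> tags, that an opening <think> tag invites reasoning, and that an empty <think></think> block skips it; none of this is documented beyond the note above, and `add_thinking_prefill` is a hypothetical helper.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def add_thinking_prefill(prompt: str, thinking: bool) -> str:
    """Append an assistant-turn prefill intended to switch thinking on or off.

    `prompt` should already end where the assistant reply begins, e.g. the
    output of a chat template rendered with add_generation_prompt=True.
    """
    if thinking:
        # Open a reasoning block and let the model fill it in.
        return prompt + THINK_OPEN
    # An empty reasoning block, intended to make the model answer directly.
    return prompt + THINK_OPEN + THINK_CLOSE
```

With Transformers, render the prompt with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, pass it through the helper, then tokenize it with `add_special_tokens=False` so the BOS token from the template is not added twice.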

I tested several Q0 Bench prompts that usually trigger an early EOS in other models, such as Naphula-Archives/T13-24B-GGUF, and in thinking mode this model does not appear to terminate early on any of them.

Merge Details

Merge Method

This model was merged using the della merge method.
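DELLA's key step (MAGPRUNE in the paper) drops each fine-tune's delta parameters stochastically, giving larger-magnitude deltas a higher chance of surviving, and rescales the survivors so each delta's expected value is unchanged. The sketch below applies this to a single row of deltas in pure Python. It assumes keep probabilities are spaced linearly by magnitude rank from `density - epsilon` to `density + epsilon`, which is our reading of how mergekit's `density` and `epsilon` parameters interact rather than a copy of its implementation; `magprune` is a hypothetical name.

```python
import random


def magprune(delta, density, epsilon, rescale=True, rng=None):
    """Magnitude-based stochastic pruning of one row of delta weights.

    Larger |delta| gets a higher keep probability; survivors are divided by
    their keep probability so the expected value of each entry is unchanged.
    """
    if rng is None:
        rng = random.Random()
    n = len(delta)
    # Rank entries by magnitude: rank 0 = smallest |delta|.
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    ranks = [0] * n
    for r, i in enumerate(order):
        ranks[i] = r
    out = []
    for i, d in enumerate(delta):
        # Keep probability rises linearly from density - epsilon to density + epsilon.
        frac = ranks[i] / (n - 1) if n > 1 else 0.5
        p_keep = min(1.0, max(0.0, density - epsilon + 2 * epsilon * frac))
        if p_keep > 0 and rng.random() < p_keep:
            out.append(d / p_keep if rescale else d)
        else:
            out.append(0.0)
    return out
```

With this card's settings (`density: 0.9`, `epsilon: 0.09`, `rescale: true`), keep probabilities would range from 0.81 for the smallest deltas to 0.99 for the largest, so about 90% of each fine-tune's deltas survive on average.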

Models Merged

The following models were included in the merge:

Darkhn/Magistral-2509-24B-Text-Only
Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
TheDrummer/Precog-24B-v1

Configuration

The following YAML configuration was used to produce this model:

```
architecture: MistralForCausalLM
base_model: B:\24B\Darkhn--Magistral-2509-24B-Text-Only
models:
  - model: B:\24B\!BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: A:\LLM\.cache\24B\!models--TheDrummer--Precog-24B-v1
    parameters:
      weight: 2.0
      density: 0.9
      epsilon: 0.09
merge_method: della
parameters:
  lambda: 1.0
  normalize: false
  int8_mask: false
  rescale: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
```
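To see how `weight`, `normalize`, and `lambda` interact, here is a simplified per-parameter sketch of the combine step, assuming a TIES-style sign election as in mergekit's `della` method. The function `della_combine` is hypothetical; it takes already-pruned deltas and ignores dtype handling.

```python
def della_combine(base, deltas, weights, lam=1.0, normalize=False):
    """Combine already-pruned per-model delta vectors into merged weights.

    base: base-model weights; deltas: one list per merged model; weights: per-model weights.
    """
    merged = []
    for j, b in enumerate(base):
        contribs = [w * d[j] for w, d in zip(weights, deltas)]
        # Sign election: keep the sign carrying the larger total weighted mass.
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        agree = [(w, c) for w, c in zip(weights, contribs) if c * sign > 0]
        value = sum(c for _, c in agree)
        if normalize and agree:
            # Divide by the summed weights of the models whose deltas agreed.
            value /= sum(w for w, _ in agree)
        # lambda scales the merged delta before adding it back to the base.
        merged.append(b + lam * value)
    return merged
```

With this card's weights (1.0 and 2.0), two agreeing deltas of 0.1 merge to 0.3 when `normalize: false`, but to 0.1 if `normalize: true`, because normalization divides by the summed weights (3.0). Leaving it off keeps the fine-tunes' contributions at full, additive strength.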


Model tree for DarkArtsForge/Morax-24B-v2

Merged from: Darkhn/Magistral-2509-24B-Text-Only, Naphula/BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly, TheDrummer/Precog-24B-v1
Finetunes: 1 model
Merges: 1 model
Quantizations: 5 models

Paper for DarkArtsForge/Morax-24B-v2

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling. arXiv:2406.11617, published Jun 17, 2024.