Restore original fused float16 model.safetensors for MLX (4.2GB) 0f60d3e verified efops committed about 1 hour ago
v0.5.8: GPTQ W4A16 quantized model for vLLM CPU (~4GB) 6f37080 verified efops committed about 14 hours ago
Delete model.safetensors.index.json with huggingface_hub 3ccfa93 verified efops committed about 14 hours ago
Delete model-00004-of-00004.safetensors with huggingface_hub 7829d1c verified efops committed about 14 hours ago
Delete model-00003-of-00004.safetensors with huggingface_hub 8a3dca8 verified efops committed about 14 hours ago
Delete model-00002-of-00004.safetensors with huggingface_hub 82c8c01 verified efops committed about 14 hours ago
Delete model-00001-of-00004.safetensors with huggingface_hub 2a6918f verified efops committed about 14 hours ago
v0.5.8: Replace MLX-quantized with proper dequantized safetensors for llm-compressor a692ea7 verified efops committed about 23 hours ago
Fix config.json: remove invalid GGML quantization fields 37ba332 verified efops committed about 24 hours ago
Fix tokenizer_class: TokenizersBackend → PreTrainedTokenizerFast 47fef26 verified efops committed 1 day ago