Instructions for using MainStack/marvy-1-14B with libraries, inference providers, notebooks, and local apps.

Libraries

Transformers

How to use MainStack/marvy-1-14B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MainStack/marvy-1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MainStack/marvy-1-14B")
model = AutoModelForCausalLM.from_pretrained("MainStack/marvy-1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

MLX

How to use MainStack/marvy-1-14B with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("MainStack/marvy-1-14B")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

vLLM

How to use MainStack/marvy-1-14B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MainStack/marvy-1-14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
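
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server from the previous step is listening on localhost:8000; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: same host and port as the curl call
MODEL_ID = "MainStack/marvy-1-14B"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the parsed response body. Keeping the request construction separate from the network call makes the payload easy to inspect without a live server.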

Use Docker

docker model run hf.co/MainStack/marvy-1-14B

SGLang

How to use MainStack/marvy-1-14B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MainStack/marvy-1-14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
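
Loading a 14B model can take a while, so a request sent right after launching the server may be refused. Below is a small polling sketch. It assumes the server exposes the OpenAI-style `GET /v1/models` route once it is ready; the function names `models_endpoint_is_up` and `wait_until_ready` are hypothetical helpers, not part of SGLang.

```python
import time
import urllib.request


def models_endpoint_is_up(base_url, timeout_s=5.0):
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and socket timeouts
        # are all OSError subclasses: treat them as "not ready yet".
        return False


def wait_until_ready(probe, timeout_s=120.0, interval_s=2.0):
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A typical call would be `wait_until_ready(lambda: models_endpoint_is_up("http://localhost:30000/v1"))`, which returns `True` as soon as the route answers and `False` if the timeout passes first.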

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MainStack/marvy-1-14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Pi

How to use MainStack/marvy-1-14B with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "MainStack/marvy-1-14B"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MainStack/marvy-1-14B"
        }
      ]
    }
  }
}
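
If `~/.pi/agent/models.json` already lists other providers, editing it by hand risks overwriting them. The sketch below merges the entry shown above into an existing configuration. The field names are copied from the JSON above; the helper names `add_mlx_provider` and `update_models_file` are illustrative and not part of Pi.

```python
import json
from pathlib import Path


def add_mlx_provider(config, model_id, base_url="http://localhost:8080/v1"):
    """Return a copy of config with an "mlx-lm" provider entry for model_id.

    Other providers are preserved; an existing "mlx-lm" entry is replaced.
    """
    providers = dict(config.get("providers", {}))
    providers["mlx-lm"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    return {**config, "providers": providers}


def update_models_file(path, model_id):
    """Read path if it exists, add the provider entry, and write it back."""
    path = Path(path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_mlx_provider(config, model_id), indent=2) + "\n")
```

Calling `update_models_file("~/.pi/agent/models.json", "MainStack/marvy-1-14B")` would write the merged file in place, so it may be worth backing up the original first.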

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MainStack/marvy-1-14B with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "MainStack/marvy-1-14B"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MainStack/marvy-1-14B

Run Hermes

hermes

MLX LM

How to use MainStack/marvy-1-14B with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "MainStack/marvy-1-14B"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "MainStack/marvy-1-14B"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default):
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "MainStack/marvy-1-14B",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
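
The server answers the curl call above with a JSON document. As a hedged sketch, assuming the response follows the OpenAI chat-completions shape (a `choices` list whose items carry a `message` with `content`), the assistant text can be pulled out as follows. `extract_reply` is a hypothetical helper name, and the sample response is made up for illustration.

```python
def extract_reply(response):
    """Return the assistant text from a chat-completions response dict.

    Raises ValueError if the response carries no choices.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Illustrative response only; a real server returns additional fields.
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}
reply_text = extract_reply(sample_response)
```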

Docker Model Runner

How to use MainStack/marvy-1-14B with Docker Model Runner:
```
docker model run hf.co/MainStack/marvy-1-14B
```

tgetsov committed about 20 hours ago

Commit 1bbfd88 (verified) · 1 parent: a2dd49b

Upload USAGE.md with huggingface_hub

Files changed (1)

USAGE.md +23 -23

@@ -1,6 +1,6 @@
-# Using marvy-14B
-marvy-14B is a ServiceNow delivery specialist. This guide covers every common
 way to run it — cloud or fully local — plus how to wire it into OpenCode.
 - [Choosing a format](#choosing-a-format)
@@ -11,7 +11,7 @@ way to run it — cloud or fully local — plus how to wire it into OpenCode.
 - [LM Studio (GUI + local server)](#lm-studio-gui--local-server)
 - [Ollama / llama.cpp (GGUF)](#ollama--llamacpp-gguf)
 - [LoRA adapter (apply on the base)](#lora-adapter-apply-on-the-base)
-- [Use marvy-14B in OpenCode](#use-marvy-14b-in-opencode)
 - [Prompt recipes per task](#prompt-recipes-per-task)
 ---
@@ -20,10 +20,10 @@ way to run it — cloud or fully local — plus how to wire it into OpenCode.
 | You want… | Use | Repo |
 |---|---|---|
-| Max quality, GPU/server | Merged FP16 | `MainStack/marvy-14B` |
-| Apple Silicon, native speed | Merged (MLX) | `MainStack/marvy-14B` |
-| Laptop / CPU / Ollama / LM Studio | GGUF (Q4_K_M or Q8_0) | `MainStack/marvy-14B-GGUF` |
-| Smallest download, compose yourself | LoRA adapter (~175 MB) | `MainStack/marvy-14B-lora` |
 ---
@@ -52,7 +52,7 @@ professional English.
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
-model_id = "MainStack/marvy-14B"
 tok = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
@@ -70,12 +70,12 @@ print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
 ```bash
 pip install vllm
-vllm serve MainStack/marvy-14B --served-model-name marvy-14B
 ```
 ```bash
 curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
-  "model": "marvy-14B", "temperature": 0.4,
   "messages": [
     {"role":"system","content":"You are a senior ServiceNow delivery consultant. ..."},
     {"role":"user","content":"Draft the Incident Management section of an SDD."}
@@ -88,22 +88,22 @@ curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/
 pip install mlx-lm
 # one-off
-python -m mlx_lm generate --model MainStack/marvy-14B \
   --system-prompt "You are a senior ServiceNow delivery consultant. ..." \
   --prompt "Write test cases for a Major Incident workflow." --max-tokens 1024 --temp 0.4
 # OpenAI-compatible server
-python -m mlx_lm server --model MainStack/marvy-14B --port 8080
 ```
 ## LM Studio (GUI + local server)
-1. **Install the model** — either search `MainStack/marvy-14B-GGUF` in the
    in-app model browser, or place a local copy under
-   `~/.lmstudio/models/MainStack/marvy-14B/` (MLX or GGUF layout).
 2. **Load** it from the GUI, or:
    ```bash
-   lms load MainStack/marvy-14B
    lms server start          # OpenAI-compatible on http://localhost:1234/v1
    ```
 3. In the Chat tab, set the system prompt (above) and temperature ~0.4.
@@ -112,10 +112,10 @@ python -m mlx_lm server --model MainStack/marvy-14B --port 8080
 ```bash
 # Ollama — pull straight from the Hub
-ollama run hf.co/MainStack/marvy-14B-GGUF:Q4_K_M
 # llama.cpp
-llama-cli -hf MainStack/marvy-14B-GGUF:Q4_K_M \
   -p "Write a user story with acceptance criteria for P1 SLA escalation." --temp 0.4
 ```
@@ -139,19 +139,19 @@ from peft import PeftModel
 from transformers import AutoModelForCausalLM, AutoTokenizer
 base = "Qwen/Qwen2.5-14B-Instruct"
 model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
-model = PeftModel.from_pretrained(model, "MainStack/marvy-14B-lora")
 ```
 ---
-## Use marvy-14B in OpenCode
 marvy runs behind any OpenAI-compatible endpoint (LM Studio, mlx_lm server,
 vLLM). Register it as a custom provider in `opencode.json`.
 1. **Start a local server** (LM Studio shown; adjust port for others):
    ```bash
-   lms load MainStack/marvy-14B && lms server start   # http://localhost:1234/v1
    ```
 2. **Add the provider** to your project `opencode.json` (or global
    `~/.config/opencode/opencode.json`):
@@ -163,15 +163,15 @@ vLLM). Register it as a custom provider in `opencode.json`.
          "name": "LM Studio (local)",
          "options": { "baseURL": "http://localhost:1234/v1" },
          "models": {
-           "marvy-14B": { "name": "marvy-14B (ServiceNow delivery)" }
          }
        }
      }
    }
    ```
-3. **Select** `lmstudio/marvy-14B` in the OpenCode model picker.
-> marvy-14B is a drafting specialist, not a tool-use/agentic fine-tune. It excels
 > at producing delivery artifacts inside chat; for MCP tool-calling agent loops,
 > keep a frontier model as the orchestrator and switch to marvy for drafting.

+# Using marvy-1-14B
+marvy-1-14B is a ServiceNow delivery specialist. This guide covers every common
 way to run it — cloud or fully local — plus how to wire it into OpenCode.
 - [Choosing a format](#choosing-a-format)
 - [LM Studio (GUI + local server)](#lm-studio-gui--local-server)
 - [Ollama / llama.cpp (GGUF)](#ollama--llamacpp-gguf)
 - [LoRA adapter (apply on the base)](#lora-adapter-apply-on-the-base)
+- [Use marvy-1-14B in OpenCode](#use-marvy-14b-in-opencode)
 - [Prompt recipes per task](#prompt-recipes-per-task)
 ---
 | You want… | Use | Repo |
 |---|---|---|
+| Max quality, GPU/server | Merged FP16 | `MainStack/marvy-1-14B` |
+| Apple Silicon, native speed | Merged (MLX) | `MainStack/marvy-1-14B` |
+| Laptop / CPU / Ollama / LM Studio | GGUF (Q4_K_M or Q8_0) | `MainStack/marvy-1-14B-GGUF` |
+| Smallest download, compose yourself | LoRA adapter (~175 MB) | `MainStack/marvy-1-14B-lora` |
 ---
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "MainStack/marvy-1-14B"
 tok = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
 ```bash
 pip install vllm
+vllm serve MainStack/marvy-1-14B --served-model-name marvy-1-14B
 ```
 ```bash
 curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
+  "model": "marvy-1-14B", "temperature": 0.4,
   "messages": [
     {"role":"system","content":"You are a senior ServiceNow delivery consultant. ..."},
     {"role":"user","content":"Draft the Incident Management section of an SDD."}
 pip install mlx-lm
 # one-off
+python -m mlx_lm generate --model MainStack/marvy-1-14B \
   --system-prompt "You are a senior ServiceNow delivery consultant. ..." \
   --prompt "Write test cases for a Major Incident workflow." --max-tokens 1024 --temp 0.4
 # OpenAI-compatible server
+python -m mlx_lm server --model MainStack/marvy-1-14B --port 8080
 ```
 ## LM Studio (GUI + local server)
+1. **Install the model** — either search `MainStack/marvy-1-14B-GGUF` in the
    in-app model browser, or place a local copy under
+   `~/.lmstudio/models/MainStack/marvy-1-14B/` (MLX or GGUF layout).
 2. **Load** it from the GUI, or:
    ```bash
+   lms load MainStack/marvy-1-14B
    lms server start          # OpenAI-compatible on http://localhost:1234/v1
    ```
 3. In the Chat tab, set the system prompt (above) and temperature ~0.4.
 ```bash
 # Ollama — pull straight from the Hub
+ollama run hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M
 # llama.cpp
+llama-cli -hf MainStack/marvy-1-14B-GGUF:Q4_K_M \
   -p "Write a user story with acceptance criteria for P1 SLA escalation." --temp 0.4
 ```
 from transformers import AutoModelForCausalLM, AutoTokenizer
 base = "Qwen/Qwen2.5-14B-Instruct"
 model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
+model = PeftModel.from_pretrained(model, "MainStack/marvy-1-14B-lora")
 ```
 ---
+## Use marvy-1-14B in OpenCode
 marvy runs behind any OpenAI-compatible endpoint (LM Studio, mlx_lm server,
 vLLM). Register it as a custom provider in `opencode.json`.
 1. **Start a local server** (LM Studio shown; adjust port for others):
    ```bash
+   lms load MainStack/marvy-1-14B && lms server start   # http://localhost:1234/v1
    ```
 2. **Add the provider** to your project `opencode.json` (or global
    `~/.config/opencode/opencode.json`):
          "name": "LM Studio (local)",
          "options": { "baseURL": "http://localhost:1234/v1" },
          "models": {
+           "marvy-1-14B": { "name": "marvy-1-14B (ServiceNow delivery)" }
          }
        }
      }
    }
    ```
+3. **Select** `lmstudio/marvy-1-14B` in the OpenCode model picker.
+> marvy-1-14B is a drafting specialist, not a tool-use/agentic fine-tune. It excels
 > at producing delivery artifacts inside chat; for MCP tool-calling agent loops,
 > keep a frontier model as the orchestrator and switch to marvy for drafting.