Instructions to use MainStack/marvy-1-14B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

Transformers

How to use MainStack/marvy-1-14B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MainStack/marvy-1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MainStack/marvy-1-14B")
model = AutoModelForCausalLM.from_pretrained("MainStack/marvy-1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

MLX

How to use MainStack/marvy-1-14B with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("MainStack/marvy-1-14B")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

vLLM

How to use MainStack/marvy-1-14B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MainStack/marvy-1-14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
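
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server was started with the `vllm serve` command above and is listening on the default `localhost:8000`. The helper names `build_chat_payload` and `chat`, and the `max_tokens` default, are illustrative choices rather than part of vLLM.

```python
import json
import urllib.request

# Assumed values: these mirror the `vllm serve` and curl commands above.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "MainStack/marvy-1-14B"


def build_chat_payload(prompt, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url=BASE_URL, timeout=120):
    """POST the prompt to /chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's reply. Passing a different `base_url` (for example `http://localhost:30000/v1`) should let the same helper talk to the other OpenAI-compatible servers on this page.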

Use Docker

docker model run hf.co/MainStack/marvy-1-14B

SGLang

How to use MainStack/marvy-1-14B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MainStack/marvy-1-14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MainStack/marvy-1-14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Pi

How to use MainStack/marvy-1-14B with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "MainStack/marvy-1-14B"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MainStack/marvy-1-14B"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MainStack/marvy-1-14B with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "MainStack/marvy-1-14B"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MainStack/marvy-1-14B

Run Hermes

hermes

MLX LM

How to use MainStack/marvy-1-14B with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "MainStack/marvy-1-14B"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "MainStack/marvy-1-14B"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default):
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "MainStack/marvy-1-14B",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Docker Model Runner

How to use MainStack/marvy-1-14B with Docker Model Runner:

```
docker model run hf.co/MainStack/marvy-1-14B
```

tgetsov committed 1 day ago

Commit 7e2e677 (verified) · 1 Parent(s): 6ae68c1

Upload README.md with huggingface_hub

Files changed (1)

README.md +52 -23

@@ -20,7 +20,7 @@ tags:
   - sft
   - mlx
 model-index:
-  - name: marvy-14B
     results:
       - task:
           type: text-generation
@@ -37,17 +37,17 @@ model-index:
             name: Test cross-entropy loss
 ---
-# marvy-14B
 **The first open, fine-tuned LLM for the full ServiceNow delivery lifecycle — from business analysis to validation.**
-marvy-14B is an open-source language model fine-tuned for the complete ServiceNow delivery lifecycle: business analysis, requirements, stakeholder mapping, systems inventory, Solution Design Documents, user stories with acceptance criteria, implementation planning, test cases, and validation. Where general-purpose models treat ServiceNow as one topic among many, marvy is built to draft the actual artifacts a delivery team produces — in the structure and sequence real engagements follow. It is a first-draft specialist, not a consultant replacement, and it is not an agentic or tool-use fine-tune.
 It was built by [MainStack](https://huggingface.co/MainStack), a consultancy specializing in ServiceNow Agentic Delivery. marvy is a LoRA SFT fine-tune of [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache-2.0), trained on ~1,958 anonymized artifacts from real engagements (~887k tokens), rigorously redacted to zero residual PII per an automated leakage scanner. Its test perplexity of 13.107 was measured on a project- and customer-disjoint held-out split — the model generalizes to unseen work rather than memorizing the training set.
 > Released under **Apache-2.0**. Built with Qwen — see `NOTICE`.
-## Why marvy-14B
 - **Drafts the full lifecycle, not just snippets.** Business analysis through validation — the artifacts and sequence real delivery teams actually work in.
 - **OOTB-first and implementation-grade.** Tuned to favor out-of-the-box correctness and produce drafts you can review, not rewrite.
@@ -68,7 +68,7 @@ It was built by [MainStack](https://huggingface.co/MainStack), a consultancy spe
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
-model_id = "MainStack/marvy-14B"
 tok = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
@@ -93,22 +93,22 @@ print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
 ```bash
 pip install vllm
-vllm serve MainStack/marvy-14B
 ```
 ### Ollama (via GGUF)
-Use the companion repo [`MainStack/marvy-14B-GGUF`](https://huggingface.co/MainStack/marvy-14B-GGUF):
 ```bash
-ollama run hf.co/MainStack/marvy-14B-GGUF:Q4_K_M
 ```
 ### MLX (Apple Silicon native)
 ```bash
 pip install mlx-lm
-python -m mlx_lm generate --model MainStack/marvy-14B \
   --system-prompt "You are a senior ServiceNow delivery consultant..." \
   --prompt "Draft the Platform Architecture section of an ITSM SDD." \
   --max-tokens 1024 --temp 0.4
@@ -116,13 +116,13 @@ python -m mlx_lm generate --model MainStack/marvy-14B \
 ### LoRA-only (apply on top of the base)
-If you prefer a tiny adapter (~175 MB) on top of the BF16 base, see [`MainStack/marvy-14B-lora`](https://huggingface.co/MainStack/marvy-14B-lora).
 ---
 ## Intended use
-marvy-14B is designed to produce implementation-grade first drafts across the ServiceNow delivery lifecycle — accelerating the artifacts a practitioner would otherwise write from scratch, then review and refine. Built for solution architects, business analysts, technical consultants, and project managers. Typical tasks:
 | Task family            | What it produces                                                                |
 |------------------------|---------------------------------------------------------------------------------|
@@ -206,18 +206,47 @@ Train / val / test are split **by project**, so no customer appears in more than
 ## Evaluation
-Test-set evaluation on the **project-disjoint** test split (252 records from two customers never seen in training/val), 50 batches:
-| Metric | Value |
-|---|---|
-| Test cross-entropy loss | **2.573** |
-| Test perplexity | **13.107** |
-> Note: two test sequences exceed 2,048 tokens and are truncated by the MLX eval harness. The reported figure is therefore a slight upper bound on true loss. Full-length scoring is planned for v2.
-To reproduce or validate these results yourself — including a base-vs-marvy
-comparison and qualitative task probes — see [`VALIDATION.md`](./VALIDATION.md)
-and run [`validate.sh`](./validate.sh).
 ---
@@ -228,7 +257,7 @@ and run [`validate.sh`](./validate.sh).
 - **Synthetic instructions.** User prompts are templated paraphrases (3–5 variants per task); assistant outputs are the original human-authored artifacts.
 - **English-only.** The corpus is English.
 - **Not a replacement for a consultant.** Output is first-draft, implementation-grade content that requires expert review before client delivery or production use.
-- **No tool use / function calling fine-tune.** `marvy-14B` is a text-completion specialist; agentic tool use is left to the orchestrator.
 - **Hallucination risk on instance-specific facts.** The model will confidently invent `sys_id`s, plugin IDs, and table fields if asked about specifics it has not seen. Always verify against an actual ServiceNow instance.
 - **No safety fine-tune beyond the base.** Inherits Qwen2.5-14B-Instruct safety behavior; no additional RLHF.
@@ -244,10 +273,10 @@ This model is a derivative of **Qwen2.5-14B-Instruct** (Apache-2.0). See `NOTICE
 ```bibtex
 @software{marvy_14b_2026,
-  title  = {marvy-14B: A ServiceNow delivery lifecycle fine-tune of Qwen2.5-14B-Instruct},
   author = {MainStack},
   year   = {2026},
-  url    = {https://huggingface.co/MainStack/marvy-14B},
   license= {Apache-2.0}
 }

   - sft
   - mlx
 model-index:
+  - name: marvy-1-14B
     results:
       - task:
           type: text-generation
             name: Test cross-entropy loss
 ---
+# marvy-1-14B
 **The first open, fine-tuned LLM for the full ServiceNow delivery lifecycle — from business analysis to validation.**
+marvy-1-14B is an open-source language model fine-tuned for the complete ServiceNow delivery lifecycle: business analysis, requirements, stakeholder mapping, systems inventory, Solution Design Documents, user stories with acceptance criteria, implementation planning, test cases, and validation. Where general-purpose models treat ServiceNow as one topic among many, marvy is built to draft the actual artifacts a delivery team produces — in the structure and sequence real engagements follow. It is a first-draft specialist, not a consultant replacement, and it is not an agentic or tool-use fine-tune.
 It was built by [MainStack](https://huggingface.co/MainStack), a consultancy specializing in ServiceNow Agentic Delivery. marvy is a LoRA SFT fine-tune of [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache-2.0), trained on ~1,958 anonymized artifacts from real engagements (~887k tokens), rigorously redacted to zero residual PII per an automated leakage scanner. Its test perplexity of 13.107 was measured on a project- and customer-disjoint held-out split — the model generalizes to unseen work rather than memorizing the training set.
 > Released under **Apache-2.0**. Built with Qwen — see `NOTICE`.
+## Why marvy-1-14B
 - **Drafts the full lifecycle, not just snippets.** Business analysis through validation — the artifacts and sequence real delivery teams actually work in.
 - **OOTB-first and implementation-grade.** Tuned to favor out-of-the-box correctness and produce drafts you can review, not rewrite.
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "MainStack/marvy-1-14B"
 tok = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
 ```bash
 pip install vllm
+vllm serve MainStack/marvy-1-14B
 ```
 ### Ollama (via GGUF)
+Use the companion repo [`MainStack/marvy-1-14B-GGUF`](https://huggingface.co/MainStack/marvy-1-14B-GGUF):
 ```bash
+ollama run hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M
 ```
 ### MLX (Apple Silicon native)
 ```bash
 pip install mlx-lm
+python -m mlx_lm generate --model MainStack/marvy-1-14B \
   --system-prompt "You are a senior ServiceNow delivery consultant..." \
   --prompt "Draft the Platform Architecture section of an ITSM SDD." \
   --max-tokens 1024 --temp 0.4
 ### LoRA-only (apply on top of the base)
+If you prefer a tiny adapter (~175 MB) on top of the BF16 base, see [`MainStack/marvy-1-14B-lora`](https://huggingface.co/MainStack/marvy-1-14B-lora).
 ---
 ## Intended use
+marvy-1-14B is designed to produce implementation-grade first drafts across the ServiceNow delivery lifecycle — accelerating the artifacts a practitioner would otherwise write from scratch, then review and refine. Built for solution architects, business analysts, technical consultants, and project managers. Typical tasks:
 | Task family            | What it produces                                                                |
 |------------------------|---------------------------------------------------------------------------------|
 ## Evaluation
+### Fine-tuned vs. base — efficiency on the held-out test set
+The cleanest measure of the fine-tune's value is to score the **same base
+model twice** — plain vs. with the marvy adapter — on the **project-disjoint**
+test split (252 records from two customers never seen in training/val), using
+per-token cross-entropy/perplexity on the **assistant tokens only**
+(prompt-masked, the same objective used in training). Lower perplexity = the
+model assigns higher probability to the real, human-authored delivery artifact.
+![marvy-1-14B vs base — perplexity by task](./marvy_vs_base_ppl.png)
+![How much fine-tuning improved each task](./marvy_improvement.png)
+**Overall: perplexity 8.91 → 6.03, a 32.3% reduction** on unseen customers.
+| Task | Base ppl | marvy-1-14B ppl | Improvement |
+|---|---:|---:|---:|
+| Systems inventory | 77.07 | 10.53 | **−86.3%** |
+| Requirements extraction | 46.76 | 9.39 | **−79.9%** |
+| Stakeholder mapping | 27.81 | 6.91 | **−75.2%** |
+| Story authoring | 15.38 | 7.86 | **−48.9%** |
+| Validation / critique | 9.72 | 8.23 | −15.3% |
+| Business analysis | 7.14 | 6.66 | −6.6% |
+| SDD design | 4.48 | 4.40 | −1.7% |
+| **Overall** | **8.91** | **6.03** | **−32.3%** |
+The gains are largest on **structured, format-heavy artifacts** (inventories,
+requirements, stakeholder registers, stories) where the base model wanders from
+the expected schema; they are smaller on long-form prose (SDD sections, business
+analysis) where the base was already competent. This is the honest, expected
+shape of a domain SFT.
+> Notes: the test customers (`Customer-CHEM-01`, `Customer-FININST-01`) appear in
+> neither train nor val, so this reflects generalization, not memorization. The
+> test split happens to cover 7 of the 11 task families. An earlier MLX
+> batch-eval reported aggregate ppl ≈ 13.1 with 2,048-token truncation; the
+> figures above recompute per-task with full assistant-token masking, so the
+> base-vs-marvy **delta** is the result of interest.
+Reproduce it yourself: `bash benchmark/run_benchmark.sh` (see
+[`VALIDATION.md`](./VALIDATION.md) for qualitative probes too).
 ---
 - **Synthetic instructions.** User prompts are templated paraphrases (3–5 variants per task); assistant outputs are the original human-authored artifacts.
 - **English-only.** The corpus is English.
 - **Not a replacement for a consultant.** Output is first-draft, implementation-grade content that requires expert review before client delivery or production use.
+- **No tool use / function calling fine-tune.** `marvy-1-14B` is a text-completion specialist; agentic tool use is left to the orchestrator.
 - **Hallucination risk on instance-specific facts.** The model will confidently invent `sys_id`s, plugin IDs, and table fields if asked about specifics it has not seen. Always verify against an actual ServiceNow instance.
 - **No safety fine-tune beyond the base.** Inherits Qwen2.5-14B-Instruct safety behavior; no additional RLHF.
 ```bibtex
 @software{marvy_14b_2026,
+  title  = {marvy-1-14B: A ServiceNow delivery lifecycle fine-tune of Qwen2.5-14B-Instruct},
   author = {MainStack},
   year   = {2026},
+  url    = {https://huggingface.co/MainStack/marvy-1-14B},
   license= {Apache-2.0}
 }
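
The Evaluation section of the updated README reports perplexity on assistant tokens only and a percentage improvement over the base model. The benchmark script itself is not shown here, so the following is only a pure-Python sketch of the arithmetic, assuming per-token natural-log probabilities and a boolean assistant mask are already available. The function names `masked_perplexity` and `relative_change` are hypothetical.

```python
import math


def masked_perplexity(token_logprobs, assistant_mask):
    """Perplexity over the tokens where assistant_mask is True.

    token_logprobs: natural-log probability the model gave each target token.
    assistant_mask: True for assistant tokens, False for prompt tokens.
    """
    scored = [lp for lp, keep in zip(token_logprobs, assistant_mask) if keep]
    if not scored:
        raise ValueError("no assistant tokens to score")
    cross_entropy = -sum(scored) / len(scored)  # mean negative log-likelihood
    return math.exp(cross_entropy)


def relative_change(base_ppl, tuned_ppl):
    """Signed percentage change from the base perplexity to the tuned one."""
    return 100.0 * (tuned_ppl - base_ppl) / base_ppl


# Toy sequence: two prompt tokens (masked out) and two assistant tokens.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.5), math.log(0.125)]
mask = [False, False, True, True]
toy_ppl = masked_perplexity(logprobs, mask)  # geometric mean of 1/0.5 and 1/0.125

# Overall row of the table: 8.91 -> 6.03.
overall = relative_change(8.91, 6.03)
print(f"{overall:.1f}%")  # prints "-32.3%"
```

Because the table's perplexities are rounded to two decimals, recomputing a per-task percentage from them can differ from the published figure in the last digit.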