Instructions to use MainStack/marvy-1-14B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MainStack/marvy-1-14B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MainStack/marvy-1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MainStack/marvy-1-14B")
model = AutoModelForCausalLM.from_pretrained("MainStack/marvy-1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

MLX

How to use MainStack/marvy-1-14B with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("MainStack/marvy-1-14B")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)


vLLM

How to use MainStack/marvy-1-14B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MainStack/marvy-1-14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
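The same request can be sent from Python with no extra dependencies. This is a minimal sketch that uses only the standard library. It assumes the vLLM server above is listening on `http://localhost:8000` and exposes the OpenAI-compatible `/v1/chat/completions` route. `build_chat_request` and `chat` are hypothetical helper names and are not part of vLLM. SGLang and `mlx_lm.server` serve the same route, so only `base_url` needs to change (port 30000 for SGLang, 8080 for `mlx_lm.server`).

```python
import json
import urllib.request

def build_chat_request(base_url, model, user_content, temperature=0.4, max_tokens=1024):
    """Build (but do not send) a POST request for an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(base_url, model, user_content, timeout=300):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, user_content)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, call `print(chat("http://localhost:8000", "MainStack/marvy-1-14B", "What is the capital of France?"))`.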

Use Docker

docker model run hf.co/MainStack/marvy-1-14B

SGLang

How to use MainStack/marvy-1-14B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MainStack/marvy-1-14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MainStack/marvy-1-14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MainStack/marvy-1-14B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Pi

How to use MainStack/marvy-1-14B with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "MainStack/marvy-1-14B"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MainStack/marvy-1-14B"
        }
      ]
    }
  }
}
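If `~/.pi/agent/models.json` already defines other providers, pasting the block over the file would drop them. The sketch below merges the `mlx-lm` provider into the existing file instead. `add_mlx_provider` is a hypothetical helper. The only assumption it makes about Pi's schema is what the snippet above shows: a top-level `providers` object keyed by provider name.

```python
import json
from pathlib import Path

# Provider entry copied from the snippet above
PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "MainStack/marvy-1-14B"}],
}

def add_mlx_provider(config_path):
    """Insert or replace the "mlx-lm" provider while keeping any other providers."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("providers", {})["mlx-lm"] = dict(PROVIDER)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage: `add_mlx_provider("~/.pi/agent/models.json")`.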

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MainStack/marvy-1-14B with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "MainStack/marvy-1-14B"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MainStack/marvy-1-14B

Run Hermes

hermes

MLX LM

How to use MainStack/marvy-1-14B with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "MainStack/marvy-1-14B"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "MainStack/marvy-1-14B"
# Calling the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "MainStack/marvy-1-14B",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Docker Model Runner
How to use MainStack/marvy-1-14B with Docker Model Runner:
```
docker model run hf.co/MainStack/marvy-1-14B
```

Commit be43504 (verified) by tgetsov, committed 1 day ago. Parent: aa1b9ff.

Upload README.md with huggingface_hub

Files changed (1): README.md added (+263 -0)

	@@ -0,0 +1,263 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-14B-Instruct
+base_model_relation: finetune
+library_name: transformers
+pipeline_tag: text-generation
+language:
+  - en
+tags:
+  - servicenow
+  - itsm
+  - csdm
+  - itom
+  - delivery
+  - solution-design
+  - user-stories
+  - business-analysis
+  - qwen2.5
+  - lora
+  - sft
+  - mlx
+model-index:
+  - name: marvy-14B
+    results:
+      - task:
+          type: text-generation
+          name: ServiceNow Delivery SFT (project-disjoint test split)
+        metrics:
+          - type: perplexity
+            value: 13.107
+            name: Test perplexity
+          - type: loss
+            value: 2.573
+            name: Test cross-entropy loss
+---
+# marvy-14B
+**The first open, fine-tuned LLM for the full ServiceNow delivery lifecycle — from business analysis to validation.**
+marvy-14B is an open-source language model fine-tuned for the complete ServiceNow delivery lifecycle: business analysis, requirements, stakeholder mapping, systems inventory, Solution Design Documents, user stories with acceptance criteria, implementation planning, test cases, and validation. Where general-purpose models treat ServiceNow as one topic among many, marvy is built to draft the actual artifacts a delivery team produces — in the structure and sequence real engagements follow. It is a first-draft specialist, not a consultant replacement, and it is not an agentic or tool-use fine-tune.
+It was built by [MainStack](https://huggingface.co/MainStack), a consultancy specializing in ServiceNow Agentic Delivery. marvy is a LoRA SFT fine-tune of [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache-2.0). It was trained on ~1,958 anonymized artifacts from real engagements (~887k tokens). After redaction, an automated leakage scanner reports zero residual emails, hostnames, or mapped real names. The test perplexity of 13.107 was measured on a project- and customer-disjoint held-out split, so the figure reflects performance on engagements the model never saw during training rather than recall of the training set.
+> Released under **Apache-2.0**. Built with Qwen — see `NOTICE`.
+## Why marvy-14B
+- **Drafts the full lifecycle, not just snippets.** Business analysis through validation — the artifacts and sequence real delivery teams actually work in.
+- **OOTB-first and implementation-grade.** Tuned to favor out-of-the-box correctness and produce drafts you can review, not rewrite.
+- **Runs locally and privately.** Ships as merged BF16 weights, a LoRA adapter, and GGUF quants. Run it on Apple Silicon via LM Studio or Ollama, and your engagement data never leaves your machine.
+- **Trained on real, anonymized delivery work.** ~1,958 redacted engagement artifacts (~887k tokens); an automated leakage scanner reports zero residual emails, hostnames, or mapped real names.
+- **Open and Apache-2.0.** Built on Qwen2.5-14B-Instruct — inspect it, fine-tune it, and deploy it on your own terms.
+📖 **Full docs:** [`USAGE.md`](./USAGE.md) (every runtime + OpenCode wiring) ·
+[`VALIDATION.md`](./VALIDATION.md) (prove the fine-tune works) ·
+[`validate.sh`](./validate.sh) (one-command probe harness)
+---
+## Quick start
+### Transformers
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "MainStack/marvy-14B"
+tok = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+SYSTEM = (
+  "You are a senior ServiceNow delivery consultant. You produce precise, "
+  "implementation-grade artifacts: business analyses, requirements, solution "
+  "design documents, user stories with acceptance criteria, test cases, and "
+  "validation reviews. You favor out-of-the-box capabilities, cite concrete "
+  "tables/plugins/sys_ids when relevant, and write in clear professional English."
+)
+messages = [
+  {"role": "system", "content": SYSTEM},
+  {"role": "user", "content": "Write a ServiceNow user story with acceptance criteria for SLA escalation on P1 incidents."},
+]
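+# return_dict=True returns the attention mask alongside input_ids
+inputs = tok.apply_chat_template(
+  messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
+).to(model.device)
+# do_sample=True is required for temperature/top_p to take effect
+out = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.4, top_p=0.9)
+print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))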
+```
+### vLLM
+```bash
+pip install vllm
+vllm serve MainStack/marvy-14B
+```
+### Ollama (via GGUF)
+Use the companion repo [`MainStack/marvy-14B-GGUF`](https://huggingface.co/MainStack/marvy-14B-GGUF):
+```bash
+ollama run hf.co/MainStack/marvy-14B-GGUF:Q4_K_M
+```
+### MLX (Apple Silicon native)
+```bash
+pip install mlx-lm
+python -m mlx_lm generate --model MainStack/marvy-14B \
+  --system-prompt "You are a senior ServiceNow delivery consultant..." \
+  --prompt "Draft the Platform Architecture section of an ITSM SDD." \
+  --max-tokens 1024 --temp 0.4
+```
+### LoRA-only (apply on top of the base)
+If you prefer a tiny adapter (~175 MB) on top of the BF16 base, see [`MainStack/marvy-14B-lora`](https://huggingface.co/MainStack/marvy-14B-lora).
+---
+## Intended use
+marvy-14B is designed to produce implementation-grade first drafts across the ServiceNow delivery lifecycle — accelerating the artifacts a practitioner would otherwise write from scratch, then review and refine. Built for solution architects, business analysts, technical consultants, and project managers. Typical tasks:
+| Task family            | What it produces                                                                |
+|------------------------|---------------------------------------------------------------------------------|
+| `business_analysis`    | Structured BA reports from SOWs / discovery notes                               |
+| `requirements_extraction` | Functional/non-functional requirements with acceptance bullets               |
+| `stakeholder_mapping`  | RACI / influence-interest grids from raw notes                                  |
+| `systems_inventory`    | CMDB-shaped systems inventories from architecture inputs                        |
+| `sdd_design`           | Solution Design Document sections (architecture, integrations, data model)      |
+| `story_authoring`      | User stories with crisp acceptance criteria                                     |
+| `implementation_planning` | Story-level implementation plans citing tables/plugins                       |
+| `test_case_generation` | Test cases per story, mapped to acceptance criteria                             |
+| `validation_critique`  | Gap analysis, follow-up questions, assumption checks against source docs        |
+| `delivery_chain`       | Multi-turn: story → implementation → test, end-to-end                           |
+### Recommended system prompt
+```
+You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade
+artifacts: business analyses, requirements, solution design documents, user stories with
+acceptance criteria, test cases, and validation reviews. You favor out-of-the-box
+capabilities, cite concrete tables/plugins/sys_ids when relevant, and write in clear
+professional English.
+```
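In code, the recommended system prompt becomes the first message of every conversation. This is a minimal sketch. `RECOMMENDED_SYSTEM_PROMPT` and `build_messages` are hypothetical names. The optional `history` argument assumes you want multi-turn chains such as `delivery_chain` (story → implementation → test) to keep earlier turns in context.

```python
RECOMMENDED_SYSTEM_PROMPT = (
    "You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade "
    "artifacts: business analyses, requirements, solution design documents, user stories with "
    "acceptance criteria, test cases, and validation reviews. You favor out-of-the-box "
    "capabilities, cite concrete tables/plugins/sys_ids when relevant, and write in clear "
    "professional English."
)

def build_messages(user_request, history=None):
    """Return a chat message list: system prompt, prior (user, assistant) turns, new request."""
    messages = [{"role": "system", "content": RECOMMENDED_SYSTEM_PROMPT}]
    for user_turn, assistant_turn in history or []:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_request})
    return messages
```

The returned list can be passed directly to `tok.apply_chat_template(...)` or to an OpenAI-compatible `messages` field.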
+### Recommended generation settings
+| Use case                    | temperature | top_p | max_new_tokens |
+|-----------------------------|-------------|-------|----------------|
+| Structured artifacts (SDD, stories) | 0.3 – 0.5 | 0.9 | 1024 – 4096 |
+| Exploratory brainstorming   | 0.7 – 0.9   | 0.95  | 1024           |
+| Validation / critique       | 0.2 – 0.4   | 0.9   | 1024 – 2048    |
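The table can be encoded as presets and passed to Hugging Face `generate()`. This is a minimal sketch. The preset names are hypothetical, and so is the choice of one value from each recommended range: the midpoint for temperature and the upper bound for `max_new_tokens`. `do_sample=True` is set explicitly because `temperature` and `top_p` only take effect when sampling.

```python
# Ranges come from the table above; each preset picks one value per range.
GENERATION_PRESETS = {
    "structured_artifacts": {"temperature": 0.4, "top_p": 0.9, "max_new_tokens": 4096},
    "exploratory_brainstorming": {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 1024},
    "validation_critique": {"temperature": 0.3, "top_p": 0.9, "max_new_tokens": 2048},
}

def generation_kwargs(use_case, **overrides):
    """Return keyword arguments for transformers' model.generate()."""
    if use_case not in GENERATION_PRESETS:
        raise KeyError(f"unknown use case {use_case!r}; choose from {sorted(GENERATION_PRESETS)}")
    # Build a fresh dict so callers cannot mutate the shared presets
    kwargs = {"do_sample": True, **GENERATION_PRESETS[use_case]}
    kwargs.update(overrides)
    return kwargs
```

Usage: `out = model.generate(**inputs, **generation_kwargs("structured_artifacts"))`.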
+---
+## Training data
+| Item | Value |
+|---|---|
+| Source | Anonymized real engagement artifacts (`.md`, `.csv`, `.json`, `.mmd`, `.txt`) |
+| Total records | **1,958** (after schema + exact-dedupe) |
+| Estimated tokens | **~887k** |
+| Splits (project-disjoint) | train 1,359 · val 347 · test 252 |
+| Tasks | 11 task families (the 10 in the Intended use table, plus `case_study`) |
+| Multi-turn share | `delivery_chain` (158 records) — story→implementation→test |
+### Privacy & redaction
+- All customer/partner names → stable aliases (e.g. `Customer-FIN-03`, `Customer-ENERGY-01`).
+- Emails → `user@example.com`; hostnames → `instance.example.service-now.com`; IPs → RFC 5737 range; `key: value` secrets → `[REDACTED]`.
+- Credential/login/VPN files excluded entirely; bulk CMDB dumps >1.5 MB excluded.
+- ServiceNow `sys_id`s and table/plugin names preserved (instance-local, technically valuable, low risk).
+- A leakage scanner asserts **0** residual emails, hostnames, or mapped real names in message content.
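The redaction rules above can be sketched with regular expressions. This is an illustrative sketch only, not MainStack's pipeline. All of the following are assumptions: the regex patterns, the `*.service-now.com` hostname match, the list of secret key names, the sequential mapping of IPs into `192.0.2.0/24` (one of the RFC 5737 documentation ranges), and the hypothetical `ALIASES` table. Real name detection would need far more than a lookup table.

```python
import ipaddress
import re

# Hypothetical alias table (real name -> stable alias); illustrative only
ALIASES = {"Acme Energy": "Customer-ENERGY-01"}

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
HOST = re.compile(r"\b(?:[\w-]+\.)+service-now\.com\b", re.IGNORECASE)  # assumed host pattern
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Assumed secret key names for `key: value` lines
SECRET = re.compile(r"(?im)^(\s*[\w.-]*(?:password|secret|token|api_key)[\w.-]*\s*:\s*).+$")

PLACEHOLDER_EMAIL = "user@example.com"
PLACEHOLDER_HOST = "instance.example.service-now.com"
# RFC 5737 documentation ranges
DOC_NETS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]

def redact(text):
    """Apply alias, email, hostname, IP, and secret rules, in that order."""
    ip_map = {}

    def map_ip(match):
        # Stable per-document mapping into 192.0.2.0/24 (assumes < 255 distinct IPs)
        ip = match.group(0)
        if ip not in ip_map:
            ip_map[ip] = f"192.0.2.{len(ip_map) + 1}"
        return ip_map[ip]

    for real, alias in ALIASES.items():
        text = text.replace(real, alias)
    text = EMAIL.sub(PLACEHOLDER_EMAIL, text)
    text = HOST.sub(PLACEHOLDER_HOST, text)
    text = IPV4.sub(map_ip, text)
    return SECRET.sub(r"\1[REDACTED]", text)

def _is_doc_ip(s):
    try:
        ip = ipaddress.ip_address(s)
    except ValueError:
        return False
    return any(ip in net for net in DOC_NETS)

def leak_count(text):
    """Count residual emails, hostnames, non-documentation IPs, and mapped real names."""
    emails = [e for e in EMAIL.findall(text) if e != PLACEHOLDER_EMAIL]
    hosts = [h for h in HOST.findall(text) if h.lower() != PLACEHOLDER_HOST]
    ips = [i for i in IPV4.findall(text) if not _is_doc_ip(i)]
    names = sum(text.count(real) for real in ALIASES)
    return len(emails) + len(hosts) + len(ips) + names
```

A scanner in this style only finds what its patterns describe. A count of 0 means "no matches for these rules", which is why the card scopes its claim to emails, hostnames, and mapped names.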
+### Split integrity
+Train / val / test are split **by project**, so no customer appears in more than one split. The largest project is forced into `train` to keep eval honest:
+- val projects: `Customer-ENERGY-01`
+- test projects: `Customer-CHEM-01`, `Customer-FININST-01`
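A project-disjoint split of this kind can be sketched as follows. The sketch assumes three things: each record carries a `project` field; the held-out projects are chosen explicitly (here, the aliases listed above); and every other project, including the largest, goes to `train`. `project_disjoint_split` is a hypothetical helper, not the script used to build marvy's splits.

```python
from collections import Counter

def project_disjoint_split(records, val_projects, test_projects):
    """Assign whole projects to train/val/test so no project spans two splits."""
    val_projects, test_projects = set(val_projects), set(test_projects)
    if val_projects & test_projects:
        raise ValueError("a project cannot be held out in both val and test")
    splits = {"train": [], "val": [], "test": []}
    if not records:
        return splits
    # Keep the largest project in train, as the card describes
    largest, _ = Counter(r["project"] for r in records).most_common(1)[0]
    if largest in val_projects | test_projects:
        raise ValueError(f"largest project {largest!r} must stay in train")
    for r in records:
        if r["project"] in val_projects:
            splits["val"].append(r)
        elif r["project"] in test_projects:
            splits["test"].append(r)
        else:
            splits["train"].append(r)
    return splits
```

Because assignment happens per project, no record from a held-out customer can leak into `train`, whatever the record-level counts turn out to be.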
+---
+## Training procedure
+| Setting | Value |
+|---|---|
+| Method | LoRA SFT (QLoRA-style: LoRA on 4-bit base) |
+| Base model | `mlx-community/Qwen2.5-14B-Instruct-4bit` (training) → fused onto `Qwen/Qwen2.5-14B-Instruct` BF16 (release) |
+| Framework | [MLX-LM](https://github.com/ml-explore/mlx-lm) 0.31.3 |
+| Hardware | Apple Silicon (M-series), Metal |
+| Max sequence length | 8,192 |
+| Batch size / grad accum | 1 / 16 (effective batch 16) |
+| Iterations | 350 (~4 epochs over 1,359 train records) |
+| Optimizer | AdamW, cosine decay, warmup 20, lr 1e-4 → 1e-6 |
+| LoRA rank / scale / dropout | 32 / 20.0 / 0.0 |
+| LoRA target keys | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
+| Adapted layers | top 16 transformer layers |
+| Prompt masking | yes — loss computed only on assistant turns |
+| Seed | 42 |
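The LoRA rank and target keys determine the adapter size. The rough check below assumes Qwen2.5-14B's published configuration: hidden size 5120, MLP size 13824, and 8 key/value heads of dimension 128, so `k_proj` and `v_proj` output 1024. It also assumes FP32 adapter weights. Each adapted linear layer of shape in × out adds `rank × (in + out)` parameters (the A and B matrices). The result, about 175 MiB, lines up with the ~175 MB adapter mentioned in Quick start, but the storage dtype remains an assumption.

```python
# Assumed Qwen2.5-14B dimensions (from its public config.json)
HIDDEN, INTERMEDIATE, KV_DIM = 5120, 13824, 8 * 128

# (in_features, out_features) for each LoRA target key
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank=32, num_layers=16):
    """LoRA adds A (in x rank) and B (rank x out) to each target linear layer."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * num_layers

params = lora_param_count()
size_mib = params * 4 / 2**20  # assuming 4-byte FP32 weights
print(f"{params:,} parameters, {size_mib:.1f} MiB")  # 45,875,200 parameters, 175.0 MiB
```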
+---
+## Evaluation
+Test-set evaluation on the **project-disjoint** test split (252 records from two customers never seen in training/val), 50 batches:
+| Metric | Value |
+|---|---|
+| Test cross-entropy loss | **2.573** |
+| Test perplexity | **13.107** |
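+> Note: two test sequences exceed 2,048 tokens and are truncated by the MLX eval harness, so their tail tokens are not scored. Tokens late in a long sequence generally see more context and score lower loss, so dropping them probably makes the reported figure a slight upper bound on true loss. Full-length scoring is planned for v2.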
+To reproduce or validate these results yourself — including a base-vs-marvy
+comparison and qualitative task probes — see [`VALIDATION.md`](./VALIDATION.md)
+and run [`validate.sh`](./validate.sh).
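The reported test loss and perplexity are a single measurement expressed two ways: perplexity is the exponential of the mean per-token cross-entropy, in nats. A quick consistency check:

```python
import math

test_loss = 2.573  # reported mean cross-entropy, nats per token
test_ppl = 13.107  # reported perplexity

# exp(2.573) is about 13.105; the small gap comes from rounding the loss to 3 decimals
print(math.exp(test_loss))
print(round(math.log(test_ppl), 3))  # 2.573
```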
+---
+## Limitations & known issues
+- **Text-only sources.** SOWs/SDDs/workbooks in `.docx/.pptx/.pdf/.xlsx` are not parsed in this build. Coverage of binary-only engagements is therefore thin.
+- **Project concentration.** ~95% of records come from ~12 data-rich projects; the long tail contributes a single case study each. Some task families (e.g. `case_study`, `validation_critique`) are smaller and may exhibit higher variance.
+- **Synthetic instructions.** User prompts are templated paraphrases (3–5 variants per task); assistant outputs are the original human-authored artifacts.
+- **English-only.** The corpus is English.
+- **Not a replacement for a consultant.** Output is first-draft, implementation-grade content that requires expert review before client delivery or production use.
+- **No tool use / function calling fine-tune.** `marvy-14B` is a text-completion specialist; agentic tool use is left to the orchestrator.
+- **Hallucination risk on instance-specific facts.** The model will confidently invent `sys_id`s, plugin IDs, and table fields if asked about specifics it has not seen. Always verify against an actual ServiceNow instance.
+- **No safety fine-tune beyond the base.** Inherits Qwen2.5-14B-Instruct safety behavior; no additional RLHF.
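One way to act on the hallucination warning is to extract every sys_id-shaped token from a draft and check each one against the instance before trusting it. ServiceNow sys_ids are 32-character hexadecimal strings, and the REST Table API serves a single record at `/api/now/table/{table}/{sys_id}`. This is a minimal sketch with several assumptions. You must supply the table each sys_id belongs to, because the model's claim about the table is itself unverified. The instance URL is a placeholder, authentication is left out, and the helper names are hypothetical.

```python
import re

SYS_ID = re.compile(r"\b[0-9a-f]{32}\b")

def candidate_sys_ids(draft):
    """Return unique sys_id-shaped tokens in order of first appearance."""
    seen = []
    for token in SYS_ID.findall(draft.lower()):
        if token not in seen:
            seen.append(token)
    return seen

def table_api_url(instance_url, table, sys_id):
    """URL for a GET against the ServiceNow Table API; a missing record typically returns 404."""
    return f"{instance_url.rstrip('/')}/api/now/table/{table}/{sys_id}"
```

A token that matches the pattern but returns no record should be treated as invented until proven otherwise.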
+---
+## License
+Released under the **Apache License 2.0** (see `LICENSE`).
+This model is a derivative of **Qwen2.5-14B-Instruct** (Apache-2.0). See `NOTICE` for attribution.
+## Citation
+```bibtex
+@software{marvy_14b_2026,
+  title  = {marvy-14B: A ServiceNow delivery lifecycle fine-tune of Qwen2.5-14B-Instruct},
+  author = {MainStack},
+  year   = {2026},
+  url    = {https://huggingface.co/MainStack/marvy-14B},
+  license= {Apache-2.0}
+}
+@misc{qwen2.5,
+  title  = {Qwen2.5: A Party of Foundation Models},
+  author = {Qwen Team},
+  year   = {2024},
+  url    = {https://qwenlm.github.io/blog/qwen2.5/}
+}
+```
+## Acknowledgements
+- **Qwen team** at Alibaba Cloud for the Qwen2.5 family.
+- **Apple MLX team** for `mlx` and `mlx-lm`, enabling native Apple Silicon training.
+- **Hugging Face** for hosting and the surrounding ecosystem.