# Validating marvy-1-14B
This guide gives you three independent ways to confirm the fine-tune actually
learned the ServiceNow delivery style, from a 60-second smoke test to a
quantitative base-vs-marvy comparison on a held-out, customer-disjoint test set.
> TL;DR: run `bash docs/validate.sh` (from the model repo) for the quick path,
> or follow the manual steps below.
---
## What "working" means here
marvy-1-14B is a **specialist drafting model**. A successful fine-tune should show:
1. **Format fidelity**: it emits the delivery artifact shape on cue (user
   stories with acceptance criteria, SDD sections, test cases with
   pre-conditions/steps/expected results) without being told the structure.
2. **Domain voice**: OOTB-first framing, ServiceNow tables/plugins, ITIL/CSDM
   vocabulary, `sys_id` citations where relevant.
3. **Lower loss than the base** on held-out ServiceNow delivery text.
The base model (Qwen2.5-14B-Instruct) is a strong generalist and will produce
*plausible* answers; the point of validation is to show marvy is **more
on-format, more domain-specific, and lower-perplexity** on this task.
---
## Test 1: 60-second smoke test (qualitative)
Prompt the model with a bare instruction and check it produces a correctly
structured artifact with no format coaching.
### LM Studio (local)
```bash
lms load MainStack/marvy-1-14B
lms server start # OpenAI-compatible on http://localhost:1234/v1
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "marvy-1-14B",
    "temperature": 0.4,
    "messages": [
      {"role": "system", "content": "You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade artifacts and favor out-of-the-box capabilities."},
      {"role": "user", "content": "Write a user story with acceptance criteria for auto-escalating P1 incidents that breach a 15-minute response SLA."}
    ]
  }' | python3 -c "import sys,json;print(json.load(sys.stdin)['choices'][0]['message']['content'])"
```
### MLX (Apple Silicon)
```bash
python -m mlx_lm generate --model MainStack/marvy-1-14B \
  --system-prompt "You are a senior ServiceNow delivery consultant..." \
  --prompt "Write a user story with acceptance criteria for auto-escalating P1 incidents that breach a 15-minute response SLA." \
  --max-tokens 512 --temp 0.4
```
### Pass criteria
- [ ] Output is a **user story** (`As a … I want … so that …`) followed by
  discrete, testable **acceptance criteria**.
- [ ] References ServiceNow concretely (e.g. `incident`, SLA definitions,
  `sla_definition`, escalation/notification, assignment groups).
- [ ] No meta-chatter ("Sure, here is…") dominating the answer; it reads like a
  backlog item, not a chatbot reply.
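
The checklist above can be pre-screened automatically before a human reads the output. The following is a minimal sketch only: `smoke_check`, the regular expressions, and the keyword list are hypothetical helpers invented for illustration, and the two-keyword threshold is an arbitrary assumption. It does not replace a reviewer's judgment.

```python
import re

# Hypothetical heuristics; tune them to your own prompts and outputs.
STORY_RE = re.compile(
    r"\bAs an?\b.+?\bI want\b.+?\bso that\b", re.IGNORECASE | re.DOTALL
)
CRITERIA_RE = re.compile(r"acceptance criteria", re.IGNORECASE)
CHATTER_RE = re.compile(
    r"^\s*(sure|certainly|of course|here is|here's)\b", re.IGNORECASE
)
# Assumed keyword list, loosely based on the examples in the checklist.
SERVICENOW_TERMS = ("incident", "sla", "assignment group", "escalat", "notification")


def smoke_check(text):
    """Return one boolean per pass criterion for a generated answer."""
    lowered = text.lower()
    return {
        "user_story": bool(STORY_RE.search(text)),
        "acceptance_criteria": bool(CRITERIA_RE.search(text)),
        # Assumption: at least two domain keywords counts as "concrete".
        "servicenow_terms": sum(t in lowered for t in SERVICENOW_TERMS) >= 2,
        # Only flags chatter that opens the answer, not chatter elsewhere.
        "no_meta_chatter": not bool(CHATTER_RE.match(text)),
    }
```

Feed it the text printed by either command above; any `False` value marks the criterion to inspect by hand.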
---
## Test 2: Task-coverage probes (qualitative, one per skill)
Run each prompt with the recommended system prompt. Each should yield the
artifact named, in the right shape.
| # | Prompt | Expect |
|---|--------|--------|
| 1 | "Draft the Incident Management section of an SDD for a greenfield ITSM implementation. Include assignment rules and SLA design." | SDD section: architecture/process, assignment rules (condition/action/order), SLA table |
| 2 | "Extract structured requirements (id, category, priority, target phase, success metric) from: 'We need to replace email-based access requests with a catalog item routed for manager approval.'" | Tabular/structured requirements with priorities & metrics |
| 3 | "Write a test case for the story: 'Restrict the Assignment Group field on incidents to groups with the itil role.'" | Test case: pre-conditions, steps, expected results, pass/fail |
| 4 | "We are migrating CMDB to CSDM. Produce the foundation-data load sequence and the CI classes involved." | CSDM/CMDB sequence, classes (cmdb_ci_*), foundation order |
| 5 | "Validate this requirement against best practice and list follow-up questions: 'All incidents must auto-close after 3 days.'" | Critique + concrete follow-up questions + risks |
### Pass criteria
At least **4 of 5** produce the correct artifact type with ServiceNow-specific,
implementation-grade content (not generic ITSM prose).
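
To run the five probes in one pass, a small client along these lines may help. It is a sketch under assumptions: it targets the LM Studio endpoint and model name from Test 1, it treats the Test 1 system prompt as the recommended one, and `build_payload`, `run_probe`, and `coverage_passes` are hypothetical names. Judging whether each artifact is correct stays a manual step; the code only applies the 4-of-5 rule to verdicts you supply.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible LM Studio server started in Test 1.
BASE_URL = "http://localhost:1234/v1/chat/completions"
# Assumption: the Test 1 system prompt is the recommended one.
SYSTEM_PROMPT = (
    "You are a senior ServiceNow delivery consultant. You produce precise, "
    "implementation-grade artifacts and favor out-of-the-box capabilities."
)


def build_payload(prompt, model="marvy-1-14B", temperature=0.4):
    """Build the chat-completions request body for one probe."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    }


def run_probe(prompt, url=BASE_URL):
    """POST one probe and return the assistant text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["message"]["content"]


def coverage_passes(verdicts, required=4):
    """Apply the 4-of-5 rule to reviewer verdicts (True = correct artifact)."""
    return sum(bool(v) for v in verdicts) >= required
```

Call `run_probe` once per table row, review each answer, then pass the five booleans to `coverage_passes`.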
---
## Test 3: Quantitative: base vs marvy on the held-out test set
This is the strongest signal. The test split is **customer-disjoint**: it holds
two customers that never appear in training or validation, so it measures
generalization, not memorization.
### With the MLX training kit (in the source repo)
```bash
cd training
# marvy (fine-tuned adapter on the base)
python -m mlx_lm lora \
  --model mlx-community/Qwen2.5-14B-Instruct-4bit \
  --adapter-path train/adapters \
  --data train/data --test --test-batches 50
# -> Test loss 2.573, Test ppl 13.107 (lower is better)
# base (no adapter) for comparison
python -m mlx_lm lora \
  --model mlx-community/Qwen2.5-14B-Instruct-4bit \
  --data train/data --test --test-batches 50
# -> expect a HIGHER loss/ppl than marvy
```
### Pass criteria
- [ ] marvy's **test perplexity is meaningfully lower** than the base on the
same held-out split.
- [ ] No data leakage: the test customers (`Customer-CHEM-01`,
`Customer-FININST-01`) are absent from `train.jsonl` / `valid.jsonl`.
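
The leakage criterion can be checked with a plain substring scan. This sketch assumes the customer identifiers appear verbatim somewhere in each JSONL record and that the files live under `train/data/`, as the `--data` flag above suggests. `find_leaks` and `check_file` are hypothetical helpers, and a clean result does not rule out leakage through paraphrased or anonymized text.

```python
# Identifiers of the held-out test customers, taken from the checklist above.
TEST_CUSTOMERS = ("Customer-CHEM-01", "Customer-FININST-01")


def find_leaks(lines, customers=TEST_CUSTOMERS):
    """Return (line_number, customer) pairs where a test customer appears."""
    leaks = []
    for number, line in enumerate(lines, start=1):
        for customer in customers:
            if customer in line:
                leaks.append((number, customer))
    return leaks


def check_file(path):
    """Scan one JSONL file line by line; the path layout is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return find_leaks(handle)
```

For example, `check_file("train/data/train.jsonl")` and `check_file("train/data/valid.jsonl")` should both return an empty list if the split is clean.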
> Reference result for this release: **test loss 2.573 / ppl 13.107** on 50
> batches of the customer-disjoint test split (two sequences longer than 2048
> tokens are truncated by the eval harness, so this is a slight upper bound).
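
The reference loss and perplexity are consistent with perplexity being the exponential of the mean per-token loss in nats. That relationship is an assumption about how the eval harness reports its numbers, not something this guide states. Under that assumption, a short sketch for turning the two reported losses into a comparable figure (`perplexity` and `relative_ppl_drop` are hypothetical helper names):

```python
import math


def perplexity(loss):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss)


def relative_ppl_drop(base_loss, tuned_loss):
    """Fractional perplexity reduction of the tuned model versus the base."""
    return 1.0 - math.exp(tuned_loss - base_loss)


# The reference loss reproduces the reported perplexity to within rounding.
reference_ppl = perplexity(2.573)
```

Pass the loss printed by the base run as `base_loss` and 2.573 (or your own adapter result) as `tuned_loss`. No base value is given here because this guide does not report one.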
---
## Interpreting results
| Symptom | Likely cause | Action |
|---|---|---|
| Generic ITSM prose, no ServiceNow specifics | wrong/short system prompt | use the full recommended system prompt; temp 0.3-0.5 |
| Rambling, no artifact structure | temperature too high | lower to 0.3-0.4 |
| Invents `sys_id`s / plugin IDs | expected limitation | verify against a real instance; never trust IDs blindly |
| marvy ppl ≈ base ppl | adapter not applied / wrong checkpoint | confirm `--adapter-path` points at the trained adapter (iter-150) |
marvy-1-14B is a first-draft assistant. All output must be reviewed by a qualified
ServiceNow consultant before client delivery or production configuration.