# Validating marvy-1-14B
This guide gives you three independent ways to confirm the fine-tune actually
learned the ServiceNow delivery style, from a 60-second smoke test to a
quantitative base-vs-marvy comparison on a held-out, customer-disjoint test set.
> TL;DR: run `bash docs/validate.sh` (from the model repo) for the quick path,
> or follow the manual steps below.
---
## What "working" means here
marvy-1-14B is a **specialist drafting model**. A successful fine-tune should show:
1. **Format fidelity**: it emits the delivery artifact shape on cue (user
stories with acceptance criteria, SDD sections, test cases with
pre-conditions/steps/expected results) without being told the structure.
2. **Domain voice**: OOTB-first framing, ServiceNow tables/plugins, ITIL/CSDM
vocabulary, `sys_id` citations where relevant.
3. **Lower loss than the base** on held-out ServiceNow delivery text.
The base model (Qwen2.5-14B-Instruct) is a strong generalist and will produce
*plausible* answers; the point of validation is to show marvy is **more
on-format, more domain-specific, and lower-perplexity** on this task.
---
## Test 1: 60-second smoke test (qualitative)
Prompt the model with a bare instruction and check it produces a correctly
structured artifact with no format coaching.
### LM Studio (local)
```bash
lms load MainStack/marvy-1-14B
lms server start # OpenAI-compatible on http://localhost:1234/v1
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "marvy-1-14B",
    "temperature": 0.4,
    "messages": [
      {"role": "system", "content": "You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade artifacts and favor out-of-the-box capabilities."},
      {"role": "user", "content": "Write a user story with acceptance criteria for auto-escalating P1 incidents that breach a 15-minute response SLA."}
    ]
  }' | python3 -c "import sys,json;print(json.load(sys.stdin)['choices'][0]['message']['content'])"
```
### MLX (Apple Silicon)
```bash
python -m mlx_lm generate --model MainStack/marvy-1-14B \
  --system-prompt "You are a senior ServiceNow delivery consultant..." \
  --prompt "Write a user story with acceptance criteria for auto-escalating P1 incidents that breach a 15-minute response SLA." \
  --max-tokens 512 --temp 0.4
```
### Pass criteria
- [ ] Output is a **user story** (`As a … I want … so that …`) followed by
discrete, testable **acceptance criteria**.
- [ ] References ServiceNow concretely (e.g. `incident`, SLA definitions,
`sla_definition`, escalation/notification, assignment groups).
- [ ] No meta-chatter ("Sure, here is…") dominating the answer; it reads like a
backlog item, not a chatbot reply.
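
The checklist above can be roughly pre-screened in code before a human reads the output. The following is a minimal sketch, not part of the model repo: the function name `check_smoke_output`, the regular expressions, and the term list are all assumptions chosen for illustration, and a passing result only means the text has the right surface shape.

```python
import re

# Hypothetical heuristics for the Test 1 checklist; tune them to your prompts.
USER_STORY = re.compile(
    r"\bas an?\b.+?\bI want\b.+?\bso that\b", re.IGNORECASE | re.DOTALL
)
ACCEPTANCE = re.compile(r"acceptance criteria", re.IGNORECASE)
LIST_ITEM = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+", re.MULTILINE)
SERVICENOW_TERMS = ("incident", "sla", "escalat", "assignment group", "notification")
META_OPENERS = ("sure", "certainly", "of course", "here is", "here's")


def check_smoke_output(text: str) -> dict:
    """Return one boolean per heuristic check; True means the check passed."""
    lowered = text.lower()
    return {
        # "As a ... I want ... so that ..." appears somewhere in the text.
        "user_story": bool(USER_STORY.search(text)),
        # An "acceptance criteria" label followed by at least two list items.
        "acceptance_criteria": bool(ACCEPTANCE.search(text))
        and len(LIST_ITEM.findall(text)) >= 2,
        # At least two ServiceNow-flavoured terms are mentioned.
        "servicenow_terms": sum(term in lowered for term in SERVICENOW_TERMS) >= 2,
        # The reply does not open with chatbot-style filler.
        "no_meta_opener": not lowered.lstrip().startswith(META_OPENERS),
    }
```

Substring matching is deliberately crude (for example, `sla` also matches inside unrelated words), so treat a failure as a prompt to read the output rather than as a verdict.
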
---
## Test 2: Task-coverage probes (qualitative, one per skill)
Run each prompt with the recommended system prompt. Each should yield the
artifact named, in the right shape.
| # | Prompt | Expect |
|---|--------|--------|
| 1 | "Draft the Incident Management section of an SDD for a greenfield ITSM implementation. Include assignment rules and SLA design." | SDD section: architecture/process, assignment rules (condition/action/order), SLA table |
| 2 | "Extract structured requirements (id, category, priority, target phase, success metric) from: 'We need to replace email-based access requests with a catalog item routed for manager approval.'" | Tabular/structured requirements with priorities & metrics |
| 3 | "Write a test case for the story: 'Restrict the Assignment Group field on incidents to groups with the itil role.'" | Test case: pre-conditions, steps, expected results, pass/fail |
| 4 | "We are migrating CMDB to CSDM. Produce the foundation-data load sequence and the CI classes involved." | CSDM/CMDB sequence, classes (cmdb_ci_*), foundation order |
| 5 | "Validate this requirement against best practice and list follow-up questions: 'All incidents must auto-close after 3 days.'" | Critique + concrete follow-up questions + risks |
### Pass criteria
At least **4 of 5** produce the correct artifact type with ServiceNow-specific,
implementation-grade content (not generic ITSM prose).
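
Running the five probes by hand is fine, but a small script keeps the system prompt and temperature identical across them. The sketch below assumes the LM Studio endpoint and model name from Test 1 (`http://localhost:1234/v1/chat/completions`, `marvy-1-14B`); the helper names `build_payload`, `ask`, and `suite_passes` are hypothetical, and the pass/fail judgement for each probe is still made by a human reviewer.

```python
import json
import urllib.request

# Endpoint, model name and system prompt are copied from the Test 1 example;
# adjust them if your local server differs.
ENDPOINT = "http://localhost:1234/v1/chat/completions"
MODEL = "marvy-1-14B"
SYSTEM_PROMPT = (
    "You are a senior ServiceNow delivery consultant. You produce precise, "
    "implementation-grade artifacts and favor out-of-the-box capabilities."
)


def build_payload(prompt: str, temperature: float = 0.4) -> dict:
    """Build an OpenAI-style chat request body for one probe."""
    return {
        "model": MODEL,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    }


def ask(prompt: str) -> str:
    """Send one probe to the local server and return the reply text."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def suite_passes(verdicts: list, required: int = 4) -> bool:
    """Apply the pass rule: at least `required` probes judged correct."""
    return sum(bool(v) for v in verdicts) >= required
```

A typical session calls `ask(...)` once per row of the table, records a reviewer's True/False verdict for each reply, and passes the five verdicts to `suite_passes`.
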
---
## Test 3: Quantitative: base vs marvy on the held-out test set
This is the strongest signal. The test split is **customer-disjoint** (two
customers that never appear in training or validation), so it measures
generalization, not memorization.
### With the MLX training kit (in the source repo)
```bash
cd training
# marvy (fine-tuned adapter on the base)
python -m mlx_lm lora \
  --model mlx-community/Qwen2.5-14B-Instruct-4bit \
  --adapter-path train/adapters \
  --data train/data --test --test-batches 50
# -> Test loss 2.573, Test ppl 13.107 (lower is better)
# base (no adapter) for comparison
python -m mlx_lm lora \
  --model mlx-community/Qwen2.5-14B-Instruct-4bit \
  --data train/data --test --test-batches 50
# -> expect a HIGHER loss/ppl than marvy
```
### Pass criteria
- [ ] marvy's **test perplexity is meaningfully lower** than the base on the
same held-out split.
- [ ] No data leakage: the test customers (`Customer-CHEM-01`,
`Customer-FININST-01`) are absent from `train.jsonl` / `valid.jsonl`.
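
The leakage criterion can be checked with a plain text scan. This is a sketch under one loud assumption: that the customer identifiers appear verbatim somewhere in each record's raw JSON line. If the data is anonymized or the customer is stored only in external metadata, this scan will report nothing and proves nothing. The function names and the `train/data` default (taken from the `--data` flag above) are illustrative.

```python
from pathlib import Path

# Identifiers of the two held-out customers named in the pass criteria.
TEST_CUSTOMERS = ("Customer-CHEM-01", "Customer-FININST-01")


def find_leaks(lines, needles=TEST_CUSTOMERS) -> list:
    """Return (line_number, identifier) for every identifier found in `lines`."""
    leaks = []
    for lineno, line in enumerate(lines, start=1):
        for needle in needles:
            if needle in line:
                leaks.append((lineno, needle))
    return leaks


def check_splits(data_dir="train/data", names=("train.jsonl", "valid.jsonl")) -> dict:
    """Scan each split file that exists; map file name to its list of leaks."""
    report = {}
    for name in names:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # Missing files are skipped rather than treated as clean.
        with path.open(encoding="utf-8") as handle:
            report[name] = find_leaks(handle)
    return report
```

An empty list for both `train.jsonl` and `valid.jsonl` is consistent with a clean split; any entry gives the line number to inspect.
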
> Reference result for this release: **test loss 2.573 / ppl 13.107** on 50
> batches of the customer-disjoint test split (two sequences >2048 tokens are
> truncated by the eval harness, so this is a slight upper bound).
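
To compare the two runs, it helps to remember how the two reported numbers relate. The sketch below assumes the harness reports perplexity as the exponential of the mean per-token cross-entropy loss in nats, which is consistent with the reference figures (`exp(2.573)` is about 13.1). The helper names are hypothetical, and no base-model figure is given in this guide, so substitute the loss from your own base run.

```python
import math


def perplexity(mean_loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(mean_loss)


def relative_ppl_reduction(base_loss: float, tuned_loss: float) -> float:
    """Fraction by which perplexity drops from base to tuned (0.0 = no change).

    ppl_tuned / ppl_base = exp(tuned_loss - base_loss), so the reduction is
    one minus that ratio. A negative value means the tuned model is worse.
    """
    return 1.0 - math.exp(tuned_loss - base_loss)
```

Because the ratio depends only on the loss difference, a reduction near zero reproduces the "marvy ppl ≈ base ppl" symptom and usually points at the adapter not being loaded.
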
---
## Interpreting results
| Symptom | Likely cause | Action |
|---|---|---|
| Generic ITSM prose, no ServiceNow specifics | wrong/short system prompt | use the full recommended system prompt; temp 0.3–0.5 |
| Rambling, no artifact structure | temperature too high | lower to 0.3–0.4 |
| Invents `sys_id`s / plugin IDs | expected limitation | verify against a real instance; never trust IDs blindly |
| marvy ppl ≈ base ppl | adapter not applied / wrong checkpoint | confirm `--adapter-path` points at the trained adapter (iter-150) |
marvy-1-14B is a first-draft assistant. All output must be reviewed by a qualified
ServiceNow consultant before client delivery or production configuration.