marvy-1-14B-GGUF

GGUF quants of marvy-1-14B, the first open LLM for the full ServiceNow delivery lifecycle. Run it locally and privately on Apple Silicon, LM Studio, or Ollama.

GGUF quantizations of MainStack/marvy-1-14B for use with llama.cpp, Ollama, LM Studio, and compatible runtimes.

Released under Apache-2.0. Built with Qwen โ€” see NOTICE.

Files

File Quant Size (approx) Use when
marvy-1-14B-Q4_K_M.gguf Q4_K_M ~9 GB Default โ€” best size/quality balance, laptops
marvy-1-14B-Q8_0.gguf Q8_0 ~16 GB Highest fidelity, near-FP16 quality

Quick start

Ollama

ollama run hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M

llama.cpp

./llama-cli -hf MainStack/marvy-1-14B-GGUF:Q4_K_M \
  -p "Write a ServiceNow user story with acceptance criteria for P1 SLA escalation." \
  --temp 0.4

LM Studio

  1. In the model browser, search MainStack/marvy-1-14B-GGUF and download a quant (Q4_K_M recommended), or drop the .gguf into ~/.lmstudio/models/MainStack/marvy-1-14B-GGUF/.
  2. Load it, set the system prompt below, temperature ~0.4.
  3. To use from code/OpenCode, start the local server:
    lms server start          # OpenAI-compatible on http://localhost:1234/v1
    

Use in OpenCode

Point OpenCode at the local LM Studio (or llama.cpp) server as an OpenAI-compatible provider โ€” see USAGE.md for the exact opencode.json snippet.

Recommended system prompt

You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade
artifacts: business analyses, requirements, solution design documents, user stories with
acceptance criteria, test cases, and validation reviews. You favor out-of-the-box
capabilities, cite concrete tables/plugins/sys_ids when relevant, and write in clear
professional English.

๐Ÿ“– Full usage (all runtimes + OpenCode wiring): USAGE.md ยท Validate it works: VALIDATION.md

Provenance & limitations

See the merged model card for the full training data, anonymization methodology, evaluation (test ppl 13.107 on a project-disjoint split), and limitations. Quantization adds the usual minor quality reduction versus the FP16 model.

License & attribution

Dual-licensed: weights Apache-2.0, MainStack contributions (cards, docs, benchmark) CC-BY-4.0 โ€” see LICENSING.md. If you use marvy-1-14B as a baseline, fine-tune it, distill from it, or evaluate against it, please credit MainStack and link to https://huggingface.co/MainStack/marvy-1-14B. Keep the NOTICE file intact (required by Apache-2.0 ยง4) and cite the entry on the merged model card.

Downloads last month
39
GGUF
Model size
15B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for MainStack/marvy-1-14B-GGUF

Base model

Qwen/Qwen2.5-14B
Quantized
(1)
this model