How to use from the llama-cpp-python library
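# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download (or reuse the cached copy of) one quant from the Hub and load it.
# filename must name a file in the repo; see the Files table below.
llm = Llama.from_pretrained(
    repo_id="MainStack/marvy-1-14B-GGUF",
    filename="marvy-1-14B-Q4_K_M.gguf",
)
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
# The reply text sits in the first choice of the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])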

marvy-1-14B-GGUF

GGUF quantizations of MainStack/marvy-1-14B, the first open LLM for the full ServiceNow delivery lifecycle, for use with llama.cpp, Ollama, LM Studio, and compatible runtimes. Run it locally and privately, including on Apple Silicon.

Released under Apache-2.0. Built with Qwen — see NOTICE.

Files

| File | Quant | Size (approx.) | Use when |
|---|---|---|---|
| marvy-1-14B-Q4_K_M.gguf | Q4_K_M | ~9 GB | Default: best size/quality balance, laptops |
| marvy-1-14B-Q8_0.gguf | Q8_0 | ~16 GB | Highest fidelity, near-FP16 quality |
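
As a rough aid for choosing between the two files, the sketch below picks the largest quant that fits in a given amount of free memory. The sizes are the approximate figures from the table above; the `pick_quant` helper and its `headroom_gb` allowance (for the KV cache and runtime overhead) are illustrative assumptions, not guidance from the model card.

```python
from typing import Optional

# Approximate file sizes in GB, copied from the Files table.
QUANT_SIZES_GB = {
    "marvy-1-14B-Q4_K_M.gguf": 9.0,
    "marvy-1-14B-Q8_0.gguf": 16.0,
}


def pick_quant(available_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant that fits, or None if neither does.

    headroom_gb is an assumed allowance for the KV cache and runtime
    overhead; it is not a figure from the model card.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + headroom_gb <= available_gb
    ]
    # max() compares sizes first, so the largest fitting file wins.
    return max(fitting)[1] if fitting else None
```

Actual memory use depends on context length and runtime, so treat the result as a starting point and adjust `headroom_gb` to match what you observe.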

Quick start

Ollama

ollama run hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M
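
Once the model has been pulled, it can also be called from code through Ollama's local REST API. The following is a minimal sketch using only the standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`) and reuses the temperature of 0.4 suggested elsewhere in this card. The helper names `build_chat_payload` and `chat` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed: Ollama's default local address
MODEL = "hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M"  # same name as in `ollama run`


def build_chat_payload(prompt: str, temperature: float = 0.4) -> dict:
    """Build the JSON body for a single non-streaming chat request."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
        "options": {"temperature": temperature},
    }


def chat(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, `chat("Draft a test case for P1 SLA escalation.")` should return the model's reply as a string.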

llama.cpp

./llama-cli -hf MainStack/marvy-1-14B-GGUF:Q4_K_M \
  -p "Write a ServiceNow user story with acceptance criteria for P1 SLA escalation." \
  --temp 0.4

LM Studio

  1. In the model browser, search for MainStack/marvy-1-14B-GGUF and download a quant (Q4_K_M recommended), or drop the .gguf into ~/.lmstudio/models/MainStack/marvy-1-14B-GGUF/.
  2. Load it, paste in the recommended system prompt (see below), and set the temperature to about 0.4.
  3. To use it from code or OpenCode, start the local server:
    lms server start          # OpenAI-compatible on http://localhost:1234/v1
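
With the server from step 3 running, any OpenAI-compatible client can talk to it. Here is a minimal standard-library sketch of a chat completion request against that endpoint. The `MODEL_ID` value is a placeholder assumption (use whatever identifier LM Studio shows for the loaded model), and the helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # address given in step 3
MODEL_ID = "marvy-1-14b-gguf"  # hypothetical; replace with the ID LM Studio lists


def build_request(user_prompt: str, system_prompt: str,
                  temperature: float = 0.4) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /chat/completions route."""
    body = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def complete(user_prompt: str, system_prompt: str) -> str:
    """Send the request to the local server and return the reply text."""
    with urllib.request.urlopen(build_request(user_prompt, system_prompt)) as response:
        return json.loads(response.read())["choices"][0]["message"]["content"]
```

The same request shape should work against a llama.cpp server if `BASE_URL` is changed to that server's address.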
    

Use in OpenCode

Point OpenCode at the local LM Studio (or llama.cpp) server as an OpenAI-compatible provider — see USAGE.md for the exact opencode.json snippet.

Recommended system prompt

You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade
artifacts: business analyses, requirements, solution design documents, user stories with
acceptance criteria, test cases, and validation reviews. You favor out-of-the-box
capabilities, cite concrete tables/plugins/sys_ids when relevant, and write in clear
professional English.
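
In code, the prompt above goes in as the first message, with the `system` role. The sketch below shows one way to do that with llama-cpp-python. It assumes `llm` is an already loaded `llama_cpp.Llama` instance; `build_messages` and `generate` are illustrative helper names, and the default temperature of 0.4 follows the value suggested elsewhere in this card.

```python
SYSTEM_PROMPT = (
    "You are a senior ServiceNow delivery consultant. You produce precise, "
    "implementation-grade artifacts: business analyses, requirements, solution "
    "design documents, user stories with acceptance criteria, test cases, and "
    "validation reviews. You favor out-of-the-box capabilities, cite concrete "
    "tables/plugins/sys_ids when relevant, and write in clear professional English."
)


def build_messages(user_prompt: str) -> list:
    """Put the recommended system prompt ahead of the user's request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


def generate(llm, user_prompt: str, temperature: float = 0.4) -> str:
    """Run one chat completion; llm is expected to be a llama_cpp.Llama instance."""
    result = llm.create_chat_completion(
        messages=build_messages(user_prompt),
        temperature=temperature,
    )
    return result["choices"][0]["message"]["content"]
```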

📖 Full usage (all runtimes + OpenCode wiring): USAGE.md · Validate it works: VALIDATION.md

Provenance & limitations

See the merged model card for the full training data, anonymization methodology, evaluation (test ppl 13.107 on a project-disjoint split), and limitations. Quantization adds the usual minor quality reduction versus the FP16 model.
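
For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token, so a perplexity of 13.107 corresponds to roughly 2.57 nats per token. The sketch below shows the standard definition on toy inputs. It does not reproduce the reported evaluation, whose tokenizer and context settings are not described here.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy input: a model that assigns probability 0.5 to every token
# is as uncertain as a fair coin flip per token.
toy_logprobs = [math.log(0.5)] * 4
toy_ppl = perplexity(toy_logprobs)

# Going the other way: the mean NLL implied by the reported test perplexity.
reported_mean_nll = math.log(13.107)
```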

License & attribution

Dual-licensed: weights Apache-2.0, MainStack contributions (cards, docs, benchmark) CC-BY-4.0 — see LICENSING.md. If you use marvy-1-14B as a baseline, fine-tune it, distill from it, or evaluate against it, please credit MainStack and link to https://huggingface.co/MainStack/marvy-1-14B. Keep the NOTICE file intact (required by Apache-2.0 §4) and cite the entry on the merged model card.

GGUF metadata: model size 15B params; architecture qwen2.
Model tree for MainStack/marvy-1-14B-GGUF: base model Qwen/Qwen2.5-14B; this repository is a quantized derivative.