marvy-1-14B-GGUF / README.md
tgetsov's picture
Upload README.md with huggingface_hub
87b4514 verified
metadata
license: apache-2.0
base_model: MainStack/marvy-1-14B
base_model_relation: quantized
pipeline_tag: text-generation
language:
  - en
tags:
  - servicenow
  - itsm
  - csdm
  - delivery
  - gguf
  - llama.cpp
  - ollama
  - quantized
  - qwen2.5

marvy-1-14B-GGUF

GGUF quants of marvy-1-14B, the first open LLM for the full ServiceNow delivery lifecycle. Run it locally and privately on Apple Silicon, LM Studio, or Ollama.

GGUF quantizations of MainStack/marvy-1-14B for use with llama.cpp, Ollama, LM Studio, and compatible runtimes.

Released under Apache-2.0. Built with Qwen — see NOTICE.

Files

File Quant Size (approx) Use when
marvy-1-14B-Q4_K_M.gguf Q4_K_M ~9 GB Default — best size/quality balance, laptops
marvy-1-14B-Q8_0.gguf Q8_0 ~16 GB Highest fidelity, near-FP16 quality

Quick start

Ollama

ollama run hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M

llama.cpp

./llama-cli -hf MainStack/marvy-1-14B-GGUF:Q4_K_M \
  -p "Write a ServiceNow user story with acceptance criteria for P1 SLA escalation." \
  --temp 0.4

LM Studio

  1. In the model browser, search MainStack/marvy-1-14B-GGUF and download a quant (Q4_K_M recommended), or drop the .gguf into ~/.lmstudio/models/MainStack/marvy-1-14B-GGUF/.
  2. Load it, set the system prompt below, temperature ~0.4.
  3. To use from code/OpenCode, start the local server:
    lms server start          # OpenAI-compatible on http://localhost:1234/v1
    

Use in OpenCode

Point OpenCode at the local LM Studio (or llama.cpp) server as an OpenAI-compatible provider — see USAGE.md for the exact opencode.json snippet.

Recommended system prompt

You are a senior ServiceNow delivery consultant. You produce precise, implementation-grade
artifacts: business analyses, requirements, solution design documents, user stories with
acceptance criteria, test cases, and validation reviews. You favor out-of-the-box
capabilities, cite concrete tables/plugins/sys_ids when relevant, and write in clear
professional English.

📖 Full usage (all runtimes + OpenCode wiring): USAGE.md · Validate it works: VALIDATION.md

Provenance & limitations

See the merged model card for the full training data, anonymization methodology, evaluation (test ppl 13.107 on a project-disjoint split), and limitations. Quantization adds the usual minor quality reduction versus the FP16 model.

License & attribution

Dual-licensed: weights Apache-2.0, MainStack contributions (cards, docs, benchmark) CC-BY-4.0 — see LICENSING.md. If you use marvy-1-14B as a baseline, fine-tune it, distill from it, or evaluate against it, please credit MainStack and link to https://huggingface.co/MainStack/marvy-1-14B. Keep the NOTICE file intact (required by Apache-2.0 §4) and cite the entry on the merged model card.