tgetsov committed on
Commit 1bbfd88 · verified · 1 Parent(s): a2dd49b

Upload USAGE.md with huggingface_hub

Files changed (1)
  1. USAGE.md +23 -23
USAGE.md CHANGED
@@ -1,6 +1,6 @@
1
- # Using marvy-14B
2
 
3
- marvy-14B is a ServiceNow delivery specialist. This guide covers every common
4
  way to run it — cloud or fully local — plus how to wire it into OpenCode.
5
 
6
  - [Choosing a format](#choosing-a-format)
@@ -11,7 +11,7 @@ way to run it — cloud or fully local — plus how to wire it into OpenCode.
11
  - [LM Studio (GUI + local server)](#lm-studio-gui--local-server)
12
  - [Ollama / llama.cpp (GGUF)](#ollama--llamacpp-gguf)
13
  - [LoRA adapter (apply on the base)](#lora-adapter-apply-on-the-base)
14
- - [Use marvy-14B in OpenCode](#use-marvy-14b-in-opencode)
15
  - [Prompt recipes per task](#prompt-recipes-per-task)
16
 
17
  ---
@@ -20,10 +20,10 @@ way to run it — cloud or fully local — plus how to wire it into OpenCode.
20
 
21
  | You want… | Use | Repo |
22
  |---|---|---|
23
- | Max quality, GPU/server | Merged FP16 | `MainStack/marvy-14B` |
24
- | Apple Silicon, native speed | Merged (MLX) | `MainStack/marvy-14B` |
25
- | Laptop / CPU / Ollama / LM Studio | GGUF (Q4_K_M or Q8_0) | `MainStack/marvy-14B-GGUF` |
26
- | Smallest download, compose yourself | LoRA adapter (~175 MB) | `MainStack/marvy-14B-lora` |
27
 
28
  ---
29
 
@@ -52,7 +52,7 @@ professional English.
52
  ```python
53
  from transformers import AutoTokenizer, AutoModelForCausalLM
54
 
55
- model_id = "MainStack/marvy-14B"
56
  tok = AutoTokenizer.from_pretrained(model_id)
57
  model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
58
 
@@ -70,12 +70,12 @@ print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
70
 
71
  ```bash
72
  pip install vllm
73
- vllm serve MainStack/marvy-14B --served-model-name marvy-14B
74
  ```
75
 
76
  ```bash
77
  curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
78
- "model": "marvy-14B", "temperature": 0.4,
79
  "messages": [
80
  {"role":"system","content":"You are a senior ServiceNow delivery consultant. ..."},
81
  {"role":"user","content":"Draft the Incident Management section of an SDD."}
@@ -88,22 +88,22 @@ curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/
88
  pip install mlx-lm
89
 
90
  # one-off
91
- python -m mlx_lm generate --model MainStack/marvy-14B \
92
  --system-prompt "You are a senior ServiceNow delivery consultant. ..." \
93
  --prompt "Write test cases for a Major Incident workflow." --max-tokens 1024 --temp 0.4
94
 
95
  # OpenAI-compatible server
96
- python -m mlx_lm server --model MainStack/marvy-14B --port 8080
97
  ```
98
 
99
  ## LM Studio (GUI + local server)
100
 
101
- 1. **Install the model** — either search `MainStack/marvy-14B-GGUF` in the
102
  in-app model browser, or place a local copy under
103
- `~/.lmstudio/models/MainStack/marvy-14B/` (MLX or GGUF layout).
104
  2. **Load** it from the GUI, or:
105
  ```bash
106
- lms load MainStack/marvy-14B
107
  lms server start # OpenAI-compatible on http://localhost:1234/v1
108
  ```
109
  3. In the Chat tab, set the system prompt (above) and temperature ~0.4.
@@ -112,10 +112,10 @@ python -m mlx_lm server --model MainStack/marvy-14B --port 8080
112
 
113
  ```bash
114
  # Ollama — pull straight from the Hub
115
- ollama run hf.co/MainStack/marvy-14B-GGUF:Q4_K_M
116
 
117
  # llama.cpp
118
- llama-cli -hf MainStack/marvy-14B-GGUF:Q4_K_M \
119
  -p "Write a user story with acceptance criteria for P1 SLA escalation." --temp 0.4
120
  ```
121
 
@@ -139,19 +139,19 @@ from peft import PeftModel
139
  from transformers import AutoModelForCausalLM, AutoTokenizer
140
  base = "Qwen/Qwen2.5-14B-Instruct"
141
  model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
142
- model = PeftModel.from_pretrained(model, "MainStack/marvy-14B-lora")
143
  ```
144
 
145
  ---
146
 
147
- ## Use marvy-14B in OpenCode
148
 
149
  marvy runs behind any OpenAI-compatible endpoint (LM Studio, mlx_lm server,
150
  vLLM). Register it as a custom provider in `opencode.json`.
151
 
152
  1. **Start a local server** (LM Studio shown; adjust port for others):
153
  ```bash
154
- lms load MainStack/marvy-14B && lms server start # http://localhost:1234/v1
155
  ```
156
  2. **Add the provider** to your project `opencode.json` (or global
157
  `~/.config/opencode/opencode.json`):
@@ -163,15 +163,15 @@ vLLM). Register it as a custom provider in `opencode.json`.
163
  "name": "LM Studio (local)",
164
  "options": { "baseURL": "http://localhost:1234/v1" },
165
  "models": {
166
- "marvy-14B": { "name": "marvy-14B (ServiceNow delivery)" }
167
  }
168
  }
169
  }
170
  }
171
  ```
172
- 3. **Select** `lmstudio/marvy-14B` in the OpenCode model picker.
173
 
174
- > marvy-14B is a drafting specialist, not a tool-use/agentic fine-tune. It excels
175
  > at producing delivery artifacts inside chat; for MCP tool-calling agent loops,
176
  > keep a frontier model as the orchestrator and switch to marvy for drafting.
177
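The change in this commit is a rename from `marvy-14B` to `marvy-1-14B` across repo IDs, headings, and commands. A rename like this can leave old identifiers behind in places that are easy to miss, for example lowercase heading anchors in a table of contents. The sketch below is one way to scan a file's text for leftovers. The function name `find_stale_references` and the sample text are hypothetical and not part of the repository.

```python
import re

OLD_NAME = "marvy-14B"  # identifier used before this commit


def find_stale_references(text, old=OLD_NAME):
    """Return (line_number, line) pairs that still contain the old name.

    Matching is case-insensitive so lowercase anchor slugs are caught too.
    """
    pattern = re.compile(re.escape(old), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical sample: the visible link text was renamed, the anchor slug was not.
sample = (
    "# Using marvy-1-14B\n"
    "- [Use marvy-1-14B in OpenCode](#use-marvy-14b-in-opencode)\n"
    "vllm serve MainStack/marvy-1-14B\n"
)
stale = find_stale_references(sample)
```

Because `marvy-1-14B` does not contain `marvy-14B` as a substring, correctly renamed lines are not reported; only lines that still carry the old spelling come back.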
 
 
1
+ # Using marvy-1-14B
2
 
3
+ marvy-1-14B is a ServiceNow delivery specialist. This guide covers every common
4
  way to run it — cloud or fully local — plus how to wire it into OpenCode.
5
 
6
  - [Choosing a format](#choosing-a-format)
 
11
  - [LM Studio (GUI + local server)](#lm-studio-gui--local-server)
12
  - [Ollama / llama.cpp (GGUF)](#ollama--llamacpp-gguf)
13
  - [LoRA adapter (apply on the base)](#lora-adapter-apply-on-the-base)
14
+ - [Use marvy-1-14B in OpenCode](#use-marvy-1-14b-in-opencode)
15
  - [Prompt recipes per task](#prompt-recipes-per-task)
16
 
17
  ---
 
20
 
21
  | You want… | Use | Repo |
22
  |---|---|---|
23
+ | Max quality, GPU/server | Merged FP16 | `MainStack/marvy-1-14B` |
24
+ | Apple Silicon, native speed | Merged (MLX) | `MainStack/marvy-1-14B` |
25
+ | Laptop / CPU / Ollama / LM Studio | GGUF (Q4_K_M or Q8_0) | `MainStack/marvy-1-14B-GGUF` |
26
+ | Smallest download, compose yourself | LoRA adapter (~175 MB) | `MainStack/marvy-1-14B-lora` |
27
 
28
  ---
29
 
 
52
  ```python
53
  from transformers import AutoTokenizer, AutoModelForCausalLM
54
 
55
+ model_id = "MainStack/marvy-1-14B"
56
  tok = AutoTokenizer.from_pretrained(model_id)
57
  model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
58
 
 
70
 
71
  ```bash
72
  pip install vllm
73
+ vllm serve MainStack/marvy-1-14B --served-model-name marvy-1-14B
74
  ```
75
 
76
  ```bash
77
  curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
78
+ "model": "marvy-1-14B", "temperature": 0.4,
79
  "messages": [
80
  {"role":"system","content":"You are a senior ServiceNow delivery consultant. ..."},
81
  {"role":"user","content":"Draft the Incident Management section of an SDD."}
 
88
  pip install mlx-lm
89
 
90
  # one-off
91
+ python -m mlx_lm generate --model MainStack/marvy-1-14B \
92
  --system-prompt "You are a senior ServiceNow delivery consultant. ..." \
93
  --prompt "Write test cases for a Major Incident workflow." --max-tokens 1024 --temp 0.4
94
 
95
  # OpenAI-compatible server
96
+ python -m mlx_lm server --model MainStack/marvy-1-14B --port 8080
97
  ```
98
 
99
  ## LM Studio (GUI + local server)
100
 
101
+ 1. **Install the model** — either search `MainStack/marvy-1-14B-GGUF` in the
102
  in-app model browser, or place a local copy under
103
+ `~/.lmstudio/models/MainStack/marvy-1-14B/` (MLX or GGUF layout).
104
  2. **Load** it from the GUI, or:
105
  ```bash
106
+ lms load MainStack/marvy-1-14B
107
  lms server start # OpenAI-compatible on http://localhost:1234/v1
108
  ```
109
  3. In the Chat tab, set the system prompt (above) and temperature ~0.4.
 
112
 
113
  ```bash
114
  # Ollama — pull straight from the Hub
115
+ ollama run hf.co/MainStack/marvy-1-14B-GGUF:Q4_K_M
116
 
117
  # llama.cpp
118
+ llama-cli -hf MainStack/marvy-1-14B-GGUF:Q4_K_M \
119
  -p "Write a user story with acceptance criteria for P1 SLA escalation." --temp 0.4
120
  ```
121
 
 
139
  from transformers import AutoModelForCausalLM, AutoTokenizer
140
  base = "Qwen/Qwen2.5-14B-Instruct"
141
  model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
142
+ model = PeftModel.from_pretrained(model, "MainStack/marvy-1-14B-lora")
143
  ```
144
 
145
  ---
146
 
147
+ ## Use marvy-1-14B in OpenCode
148
 
149
  marvy runs behind any OpenAI-compatible endpoint (LM Studio, mlx_lm server,
150
  vLLM). Register it as a custom provider in `opencode.json`.
151
 
152
  1. **Start a local server** (LM Studio shown; adjust port for others):
153
  ```bash
154
+ lms load MainStack/marvy-1-14B && lms server start # http://localhost:1234/v1
155
  ```
156
  2. **Add the provider** to your project `opencode.json` (or global
157
  `~/.config/opencode/opencode.json`):
 
163
  "name": "LM Studio (local)",
164
  "options": { "baseURL": "http://localhost:1234/v1" },
165
  "models": {
166
+ "marvy-1-14B": { "name": "marvy-1-14B (ServiceNow delivery)" }
167
  }
168
  }
169
  }
170
  }
171
  ```
172
+ 3. **Select** `lmstudio/marvy-1-14B` in the OpenCode model picker.
173
 
174
+ > marvy-1-14B is a drafting specialist, not a tool-use/agentic fine-tune. It excels
175
  > at producing delivery artifacts inside chat; for MCP tool-calling agent loops,
176
  > keep a frontier model as the orchestrator and switch to marvy for drafting.
177
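The OpenAI-compatible servers this guide mentions (LM Studio on port 1234, `mlx_lm` on 8080, vLLM on 8000) can also be called from a script rather than `curl`. The following is a minimal sketch using only the Python standard library. The helper names `build_chat_request` and `send` are hypothetical, the system prompt is left elided exactly as in the guide, and the model name is an assumption that must match whatever identifier your server actually exposes.

```python
import json
import urllib.request

# Values taken from the guide; adjust the port for mlx_lm (8080) or vLLM (8000).
BASE_URL = "http://localhost:1234/v1"
MODEL = "marvy-1-14B"  # assumption: must match the name your server reports
# Elided in the guide; paste the full system prompt here.
SYSTEM_PROMPT = "You are a senior ServiceNow delivery consultant. ..."


def build_chat_request(user_prompt, base_url=BASE_URL, model=MODEL, temperature=0.4):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, once a server is running:
#   print(send(build_chat_request("Draft the Incident Management section of an SDD.")))
```

Splitting request construction from sending keeps the payload inspectable without a live server, which helps when checking that the model name and temperature match the values used elsewhere in the guide.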