intrect committed
Commit 2448e8a · verified · 1 Parent(s): f650ef7

docs: update model card with GGUF formats, benchmarks, usage examples

Files changed (1): README.md +98 -35
README.md CHANGED
@@ -10,6 +10,8 @@ tags:
 - stock-analysis
 - reasoning
 - dpo
 base_model: Qwen/Qwen2.5-7B-Instruct
 pipeline_tag: text-generation
 ---
@@ -25,51 +27,92 @@ VELA is specialized for Korean stock-market news analysis and investment research
 | Item | Description |
 |------|------|
 | **Base Model** | Qwen/Qwen2.5-7B-Instruct |
-| **Training Stage** | SFT + DPO v4 |
 | **Parameters** | 7.6B |
 | **Context Length** | 8,192 tokens |
-| **Precision** | BFloat16 |
 | **License** | Apache 2.0 |

 ## Training Pipeline

 ```
 Qwen2.5-7B-Instruct
   ↓
 SFT (930K samples)
-  - Korean stock-news analysis
-  - Research report generation
-  - Reasoning-trace training
   ↓
-DPO v4 (7,681 pairs)
   - Chinese/English leak correction
   - Stronger Korean output
   - Better format compliance
   ↓
-VELA v1.0
 ```

 ## Capabilities

 - **News impact analysis**: predicts the market impact of stock-related news
-- **Research report generation**: writes structured investment-analysis reports
-- **Reasoning Trace**: generates step-by-step analytical reasoning
 - **Multi-source synthesis**: integrates news, price, and supply/demand data

 ## Usage

-### Transformers

 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch

 model = AutoModelForCausalLM.from_pretrained(
-    "intrect/vela",
     torch_dtype=torch.bfloat16,
     device_map="auto"
 )
-tokenizer = AutoTokenizer.from_pretrained("intrect/vela")

 messages = [
     {"role": "system", "content": "당신은 한국 주식 전문 애널리스트입니다."},
@@ -88,46 +131,66 @@ outputs = model.generate(
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

-### vLLM (Recommended for Production)

 ```python
 from vllm import LLM, SamplingParams

-llm = LLM(model="intrect/vela", dtype="bfloat16")
 params = SamplingParams(temperature=0.7, max_tokens=1024)

 prompts = ["삼성전자 HBM 시장 전망을 분석해주세요."]
 outputs = llm.generate(prompts, params)
 ```

-### MLX (Apple Silicon)
-
-The MLX-converted model will be provided in a separate repository.

 ## Output Format

-VELA generates structured output of the following form:

 ```markdown
 ## Executive Summary
 [2-3문장 핵심 요약]

 ## Key Metrics
 | 지표 | 수치 |
 |------|------|
-| 현재가 | ₩XX,XXX |
-| PER | XX.X |
-| ... | ... |

 ## 시장 동향 분석
-[상세 분석]
-
 ## 리스크 요인
-- 리스크 1
-- 리스크 2
-
 ## 투자 의견
-[종합 의견]
 ```
@@ -139,13 +202,11 @@ VELA generates structured output of the following form:
 | Reasoning Traces | 5K | reasoning-process training |
 | DPO Pairs | 7.7K | preference alignment |

-## DPO v4 Improvements
-
-DPO v4 addresses the following issues:

-- ✅ **Chinese leak removed**: prevents Chinese-character output
 - ✅ **English leak reduced**: minimizes unnecessary English
-- ✅ **Format compliance**: strictly follows the specified output format
 - ✅ **Korean quality**: natural Korean phrasing

 ## Limitations
@@ -153,6 +214,7 @@ DPO v4 addresses the following issues:
 - No real-time price-data access (external API required)
 - Provides information, not investment advice
 - 8K context limits long-document processing

 ## Citation

@@ -162,7 +224,7 @@ DPO v4 addresses the following issues:
   author={intrect},
   year={2026},
   publisher={Hugging Face},
-  url={https://huggingface.co/intrect/vela}
 }
 ```
 
@@ -170,8 +232,9 @@ DPO v4 addresses the following issues:

 | Version | Date | Changes |
 |------|------|----------|
-| v1.0 (DPO v4) | 2026-01-28 | DPO v4 merged; Chinese/English leak resolved |
-| v0.9 (SFT) | 2026-01-15 | SFT base model released |

 ---
 
 - stock-analysis
 - reasoning
 - dpo
+- gguf
+- llama-cpp
 base_model: Qwen/Qwen2.5-7B-Instruct
 pipeline_tag: text-generation
 ---
 
 | Item | Description |
 |------|------|
 | **Base Model** | Qwen/Qwen2.5-7B-Instruct |
+| **Training** | SFT (930K) + DPO (7,681 pairs) |
 | **Parameters** | 7.6B |
 | **Context Length** | 8,192 tokens |
 | **License** | Apache 2.0 |

+### Available Formats
+
+| Format | File | Size | Use Case |
+|--------|------|------|----------|
+| **BF16** (safetensors) | `model.safetensors` | 15 GB | Full precision, GPU inference |
+| **GGUF Q8_0** | `vela-q8_0.gguf` | 7.6 GB | High-quality quantized, GPU/CPU |
+| **GGUF Q4_K_M** | `vela-q4_k_m.gguf` | 4.4 GB | Fast & lightweight, GPU/CPU |
+
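To make the size trade-off concrete, the table above can drive a simple format chooser. A minimal sketch (a hypothetical helper, not shipped with the repo; file names and sizes come from the table, the 20% KV-cache headroom is an assumption):

```python
# File names and on-disk sizes (GB) from the "Available Formats" table.
FORMATS = [
    ("vela-q4_k_m.gguf", 4.4),    # GGUF Q4_K_M: fast & lightweight
    ("vela-q8_0.gguf", 7.6),      # GGUF Q8_0: higher quality
    ("model.safetensors", 15.0),  # BF16: full precision
]

def pick_format(mem_gb: float) -> str:
    """Pick the largest artifact that fits, reserving ~20% for KV cache (assumed headroom)."""
    usable = 0.8 * mem_gb
    best = FORMATS[0][0]  # fall back to the smallest file
    for name, size in FORMATS:
        if size <= usable:
            best = name
    return best

print(pick_format(12.0))  # 12 GB GPU -> vela-q8_0.gguf
print(pick_format(8.0))   # 8 GB GPU  -> vela-q4_k_m.gguf
```

The headroom fraction is a rough rule of thumb; actual memory use also depends on context length and offload settings.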
 ## Training Pipeline

 ```
 Qwen2.5-7B-Instruct
   ↓
 SFT (930K samples)
+  - Korean stock-news analysis (412K)
+  - Research report generation (50K)
+  - Reasoning-trace training (5K)
   ↓
+DPO (7,681 pairs)
   - Chinese/English leak correction
   - Stronger Korean output
   - Better format compliance
   ↓
+VELA
 ```

 ## Capabilities

 - **News impact analysis**: predicts the market impact of stock-related news
+- **Research report generation**: structured investment-analysis reports (7 sections)
+- **Reasoning Trace**: step-by-step analytical reasoning (JSON format)
 - **Multi-source synthesis**: integrates news, price, and supply/demand data

+## Quantization Benchmark
+
+RTX 3060 12GB, llama-cpp-python, n_gpu_layers=-1, n_ctx=4096
+
+| Format | Speed (tok/s) | Chinese Leak | Quality |
+|--------|--------------|--------------|---------|
+| **Q4_K_M** | **36 tok/s** | 0/5 CLEAN | Reasoning trace + report OK |
+| **Q8_0** | 25 tok/s | 0/5 CLEAN | Reasoning trace + report OK |
+
+> Stress test: 5 consecutive runs (alternating synthesis and 3K-token reasoning traces); zero Chinese leaks in both formats.
+
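For capacity planning, the throughput numbers above map directly to wall-clock time per report. A quick sketch (speeds taken from the benchmark table; the 1,024-token report length is an assumption matching the usage examples' `max_tokens`):

```python
# Decode speeds (tok/s) from the benchmark table above.
SPEEDS_TOK_S = {"Q4_K_M": 36.0, "Q8_0": 25.0}

def decode_seconds(fmt: str, tokens: int = 1024) -> float:
    """Rough decode time for one report; ignores prompt processing and model load."""
    return tokens / SPEEDS_TOK_S[fmt]

print(f"Q4_K_M: {decode_seconds('Q4_K_M'):.0f}s")  # ~28s per 1024-token report
print(f"Q8_0:   {decode_seconds('Q8_0'):.0f}s")    # ~41s per 1024-token report
```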
 ## Usage

+### llama-cpp-python (Recommended for GGUF)
+
+```python
+from llama_cpp import Llama
+
+model = Llama(
+    model_path="vela-q4_k_m.gguf",  # or vela-q8_0.gguf
+    n_ctx=4096,
+    n_gpu_layers=-1,  # full GPU offload
+    chat_format="chatml",
+)
+
+response = model.create_chat_completion(
+    messages=[
+        {"role": "system", "content": "당신은 한국 주식 전문 애널리스트입니다."},
+        {"role": "user", "content": "삼성전자 HBM 사업 전망을 분석해주세요."},
+    ],
+    max_tokens=1024,
+    temperature=0.7,
+)
+print(response["choices"][0]["message"]["content"])
+```
+
+### Transformers (BF16)

 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch

 model = AutoModelForCausalLM.from_pretrained(
+    "intrect/VELA",
     torch_dtype=torch.bfloat16,
     device_map="auto"
 )
+tokenizer = AutoTokenizer.from_pretrained("intrect/VELA")

 messages = [
     {"role": "system", "content": "당신은 한국 주식 전문 애널리스트입니다."},
 
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```

+### vLLM

 ```python
 from vllm import LLM, SamplingParams

+llm = LLM(model="intrect/VELA", dtype="bfloat16")
 params = SamplingParams(temperature=0.7, max_tokens=1024)

 prompts = ["삼성전자 HBM 시장 전망을 분석해주세요."]
 outputs = llm.generate(prompts, params)
 ```

+### Ollama
+
+```bash
+# Modelfile
+FROM ./vela-q4_k_m.gguf
+TEMPLATE """<|im_start|>system
+{{ .System }}<|im_end|>
+<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"""
+PARAMETER temperature 0.7
+PARAMETER num_ctx 4096
+
+# Build and run:
+#   ollama create vela -f Modelfile
+#   ollama run vela
+```
 
 ## Output Format

+VELA supports two output modes:
+
+### 1. Reasoning Trace (analysis process)
+
+```json
+{
+  "step": 1,
+  "thought": "삼성전자 HBM3E 12단 양산 관련 뉴스 확인. 추가 수주 현황과 시장 점유율 파악 필요.",
+  "action": "search",
+  "query": "삼성전자 HBM3E 12단 수주 시장점유율",
+  "confidence": 0.45
+}
+```
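Since downstream code will usually consume these traces programmatically, each step is worth validating before acting on it. A minimal sketch (the required fields are inferred from the example above, not an official schema; `validate_step` is a hypothetical helper):

```python
import json

# Field names and types inferred from the reasoning-trace example above.
REQUIRED = {"step": int, "thought": str, "action": str, "confidence": float}

def validate_step(raw: str) -> dict:
    """Parse one reasoning-trace step and check the fields shown in the example."""
    obj = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(obj.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key!r}")
    if not 0.0 <= obj["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return obj

step = validate_step(
    '{"step": 1, "thought": "HBM news check", "action": "search", '
    '"query": "Samsung HBM3E orders", "confidence": 0.45}'
)
print(step["action"])  # search
```

`query` is left optional here, since the example ties it to the `search` action.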
+
+### 2. Synthesis Report (final report)

 ```markdown
+# EOD 리포트: 삼성전자 (005930.KS)
+
 ## Executive Summary
 [2-3문장 핵심 요약]

 ## Key Metrics
 | 지표 | 수치 |
 |------|------|

 ## 시장 동향 분석
+## 수급 분석
+## 뉴스 영향 분석
 ## 리스크 요인
 ## 투자 의견
 ```

 ## Training Data
 
 | Reasoning Traces | 5K | reasoning-process training |
 | DPO Pairs | 7.7K | preference alignment |

+## DPO Improvements
+
+- ✅ **Chinese leak removed**: stress test 10/10 CLEAN
 - ✅ **English leak reduced**: minimizes unnecessary English
+- ✅ **Format compliance**: reasoning-trace JSON + 7-section report
 - ✅ **Korean quality**: natural Korean phrasing

 ## Limitations

 - No real-time price-data access (external API required)
 - Provides information, not investment advice
 - 8K context limits long-document processing
+- Numeric hallucinations possible (figures require external verification)

 ## Citation

   author={intrect},
   year={2026},
   publisher={Hugging Face},
+  url={https://huggingface.co/intrect/VELA}
 }
 ```

 | Version | Date | Changes |
 |------|------|----------|
+| v1.1 | 2026-02-12 | Added GGUF quantized models (Q4_K_M, Q8_0); benchmarks |
+| v1.0 | 2026-01-28 | DPO merged; Chinese/English leak resolved |
+| v0.9 | 2026-01-15 | SFT base model released |

 ---