intrect committed on
Commit 8bdb2c7 · verified · 1 Parent(s): d7f1985

docs: add MLX 4-bit format, update quant links in model card

Files changed (1)
  1. README.md +10 -5
README.md CHANGED
@@ -13,6 +13,9 @@ tags:
 - gguf
 - llama-cpp
 - mlx
+- apple-silicon
+- 4bit
+- quantized
 base_model: Qwen/Qwen2.5-7B-Instruct
 pipeline_tag: text-generation
 ---
@@ -52,10 +55,9 @@ Performs structured investment analysis based on Reasoning Traces.
 
 | Format | File | Size | Use Case |
 |--------|------|------|----------|
-| **BF16** (safetensors) | `model-*.safetensors` | 14.5 GB | Full precision, GPU inference |
-| **GGUF Q4_K_M** | `vela-dpo-v6-q4_k_m.gguf` | 4.4 GB | Fast & lightweight, GPU/CPU |
-
-> MLX 4-bit quantized model to be released in a separate repo (optimized for Apple Silicon)
+| **BF16** (safetensors) | [`model-*.safetensors`](https://huggingface.co/intrect/VELA/tree/main) | 14.5 GB | Full precision, GPU inference |
+| **GGUF Q4_K_M** | [`vela-dpo-v6-q4_k_m.gguf`](https://huggingface.co/intrect/VELA/blob/main/vela-dpo-v6-q4_k_m.gguf) | 4.4 GB | llama.cpp / Ollama / LM Studio |
+| **MLX 4-bit** | [`mlx-int4/`](https://huggingface.co/intrect/VELA/tree/main/mlx-int4) | 4.0 GB | Apple Silicon (M1/M2/M3/M4) |
 
 ---
 
@@ -300,7 +302,10 @@ print(outputs[0].outputs[0].text)
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("intrect/VELA")  # or local MLX 4-bit path
+# Download only the mlx-int4 folder from the Hub
+from huggingface_hub import snapshot_download
+mlx_path = snapshot_download("intrect/VELA", allow_patterns="mlx-int4/*")
+model, tokenizer = load(f"{mlx_path}/mlx-int4")
 
 response = generate(
     model,
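The `allow_patterns` argument passed to `snapshot_download` in the updated snippet is a shell-style glob filter, so only paths under `mlx-int4/` are fetched instead of the full 14.5 GB repo. A minimal sketch of that matching behavior using Python's stdlib `fnmatch` (the file listing below is hypothetical, not the actual repo contents):

```python
from fnmatch import fnmatch

# Hypothetical repo file listing (illustrative only)
repo_files = [
    "README.md",
    "model-00001-of-00004.safetensors",
    "vela-dpo-v6-q4_k_m.gguf",
    "mlx-int4/model.safetensors",
    "mlx-int4/config.json",
]

# allow_patterns="mlx-int4/*" keeps only paths matching the glob,
# so only the MLX quantized weights and config would be downloaded
allowed = [f for f in repo_files if fnmatch(f, "mlx-int4/*")]
print(allowed)
```

This is why pointing `load()` at `f"{mlx_path}/mlx-int4"` works: the snapshot directory contains only that subfolder's files.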