File size: 16,592 Bytes
4c80134
 
 
 
 
 
 
 
 
f6007ba
 
 
4c80134
 
 
 
f6007ba
4c80134
 
f6007ba
4c80134
 
f6007ba
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4c80134
f6007ba
4c80134
f6007ba
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4c80134
 
f6007ba
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
language: ko
license: apache-2.0
base_model: google/gemma-3-1b-it
tags:
- math
- korean
- sft
- gemma
- distillation
datasets:
- NotoriousH2/HRM8K
---

# Gemma-3-1B-IT Math SFT

`google/gemma-3-1b-it`๋ฅผ ํ•œ๊ตญ์–ด ์ˆ˜ํ•™ ๋ฌธ์ œ(GSM8K)์— ๋Œ€ํ•ด ๊ต์‚ฌ ์ฆ๋ฅ˜ SFTํ•œ ๋ชจ๋ธ.

## ์„ฑ๋Šฅ

| Benchmark | Score |
|-----------|-------|
| HRM8K eval GSM8K (264๋ฌธ์ œ, Korean) | **~44.9%** (3ํšŒ ํ‰๊ท ) |
| HRM8K eval MATH (577๋ฌธ์ œ, Korean) | ~17% |

ํ‰๊ฐ€: temperature=0, vLLM ์„œ๋น™, max_tokens=2048

## ๋ฐ์ดํ„ฐ ์ƒ์„ฑ ํŒŒ์ดํ”„๋ผ์ธ

### ์›๋ณธ ๋ฐ์ดํ„ฐ
- **GSM8K train set**: ์˜์–ด ์ดˆ๋“ฑ ์ˆ˜ํ•™ 7,473๋ฌธ์ œ ([openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k))
- **ํ‰๊ฐ€ ๋ฐ์ดํ„ฐ**: [HRM8K](https://huggingface.co/datasets/NotoriousH2/HRM8K) eval set 841๋ฌธ์ œ (GSM8K 264 + MATH 577, ํ•œ๊ตญ์–ด)

### ๊ต์‚ฌ ๋ชจ๋ธ
- **๋ชจ๋ธ**:  (AWQ 4bit)
- **์„œ๋น™**: vLLM 0.11.0 (๊ตฌ๋ฒ„์ „ ํ•„์š”, AWQ ํ˜ธํ™˜), 
- **์ค‘์š”**: ์ด ๊ต์‚ฌ ๋ชจ๋ธ์€ HRM8K ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•œ ๋ฐ”๋กœ ๊ทธ ๋ชจ๋ธ. ๋‹ค๋ฅธ ๊ต์‚ฌ ๋ชจ๋ธ(Qwen3.5-9B/35B ๋“ฑ)์€ ์Šคํƒ€์ผ ๋ถˆ์ผ์น˜๋กœ -10%p ์ด์ƒ ์„ฑ๋Šฅ ํ•˜๋ฝ.

### 2๋‹จ๊ณ„ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ


### ์ตœ์ข… ํ•™์Šต ๋ฐ์ดํ„ฐ ํ˜•์‹

- ์ด 26,254๊ฐœ (train 95% / eval 5% split)
- ์‹œ์Šคํ…œ ํ”„๋กฌํ”„ํŠธ: 

## ํ•™์Šต ์„ค์ •



## ์žฌํ˜„ ๋ฐฉ๋ฒ•

INFO 03-19 14:51:58 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=3426235) INFO 03-19 14:52:04 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=3426235) INFO 03-19 14:52:04 [utils.py:233] non-default args: {'model_tag': 'cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', 'port': 8001, 'model': 'cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', 'max_model_len': 4096, 'gpu_memory_utilization': 0.8}
(APIServer pid=3426235) INFO 03-19 14:52:06 [model.py:547] Resolved architecture: Qwen3MoeForCausalLM
(APIServer pid=3426235) INFO 03-19 14:52:06 [model.py:1510] Using max model len 4096
(APIServer pid=3426235) INFO 03-19 14:52:07 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 03-19 14:52:19 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=3426796) INFO 03-19 14:52:25 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=3426796) INFO 03-19 14:52:25 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', speculative_config=None, tokenizer='cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=3426796) INFO 03-19 14:52:26 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 54, in _init_executor
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     self.collective_rpc("init_device")
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 259, in init_device
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     self.worker.init_device()  # type: ignore
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 187, in init_device
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708]     raise ValueError(
(EngineCore_DP0 pid=3426796) ERROR 03-19 14:52:26 [core.py:708] ValueError: Free memory on device (7.29/93.1 GiB) on startup is less than desired GPU memory utilization (0.8, 74.48 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
INFO 03-19 14:52:34 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=3427292) INFO 03-19 14:52:40 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=3427292) INFO 03-19 14:52:40 [utils.py:233] non-default args: {'model_tag': './outputs/models/c17d-gemma-3-1b-it-Math', 'model': './outputs/models/c17d-gemma-3-1b-it-Math', 'dtype': 'bfloat16', 'max_model_len': 4096, 'gpu_memory_utilization': 0.85}
(APIServer pid=3427292) INFO 03-19 14:52:51 [model.py:547] Resolved architecture: Gemma3ForCausalLM
(APIServer pid=3427292) INFO 03-19 14:52:51 [model.py:1510] Using max model len 4096
(APIServer pid=3427292) INFO 03-19 14:52:51 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 03-19 14:52:58 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=3428117) INFO 03-19 14:53:03 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=3428117) INFO 03-19 14:53:03 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='./outputs/models/c17d-gemma-3-1b-it-Math', speculative_config=None, tokenizer='./outputs/models/c17d-gemma-3-1b-it-Math', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=./outputs/models/c17d-gemma-3-1b-it-Math, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=3428117) INFO 03-19 14:53:05 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 54, in _init_executor
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     self.collective_rpc("init_device")
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 259, in init_device
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     self.worker.init_device()  # type: ignore
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]   File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 187, in init_device
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708]     raise ValueError(
(EngineCore_DP0 pid=3428117) ERROR 03-19 14:53:05 [core.py:708] ValueError: Free memory on device (7.29/93.1 GiB) on startup is less than desired GPU memory utilization (0.85, 79.13 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.

## ํŒŒ์ผ
- : SFT ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ
- : HRM8K ํ‰๊ฐ€ ์Šคํฌ๋ฆฝํŠธ (vLLM OpenAI API ํ˜ธํ™˜ ์„œ๋ฒ„ ํ•„์š”)