File size: 16,592 Bytes
4c80134 f6007ba 4c80134 f6007ba 4c80134 f6007ba 4c80134 f6007ba 4c80134 f6007ba 4c80134 f6007ba 4c80134 f6007ba | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 | ---
language: ko
license: apache-2.0
base_model: google/gemma-3-1b-it
tags:
- math
- korean
- sft
- gemma
- distillation
datasets:
- NotoriousH2/HRM8K
---
# Gemma-3-1B-IT Math SFT
`google/gemma-3-1b-it`๋ฅผ ํ๊ตญ์ด ์ํ ๋ฌธ์ (GSM8K)์ ๋ํด ๊ต์ฌ ์ฆ๋ฅ SFTํ ๋ชจ๋ธ.
## ์ฑ๋ฅ
| Benchmark | Score |
|-----------|-------|
| HRM8K eval GSM8K (264๋ฌธ์ , Korean) | **~44.9%** (3ํ ํ๊ท ) |
| HRM8K eval MATH (577๋ฌธ์ , Korean) | ~17% |
ํ๊ฐ: temperature=0, vLLM ์๋น, max_tokens=2048
## ๋ฐ์ดํฐ ์์ฑ ํ์ดํ๋ผ์ธ
### ์๋ณธ ๋ฐ์ดํฐ
- **GSM8K train set**: ์์ด ์ด๋ฑ ์ํ 7,473๋ฌธ์ ([openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k))
- **ํ๊ฐ ๋ฐ์ดํฐ**: [HRM8K](https://huggingface.co/datasets/NotoriousH2/HRM8K) eval set 841๋ฌธ์ (GSM8K 264 + MATH 577, ํ๊ตญ์ด)
### ๊ต์ฌ ๋ชจ๋ธ
- **๋ชจ๋ธ**: (AWQ 4bit)
- **์๋น**: vLLM 0.11.0 (๊ตฌ๋ฒ์ ํ์, AWQ ํธํ),
- **์ค์**: ์ด ๊ต์ฌ ๋ชจ๋ธ์ HRM8K ํ์ต ๋ฐ์ดํฐ๋ฅผ ์์ฑํ ๋ฐ๋ก ๊ทธ ๋ชจ๋ธ. ๋ค๋ฅธ ๊ต์ฌ ๋ชจ๋ธ(Qwen3.5-9B/35B ๋ฑ)์ ์คํ์ผ ๋ถ์ผ์น๋ก -10%p ์ด์ ์ฑ๋ฅ ํ๋ฝ.
### 2๋จ๊ณ ๋ฐ์ดํฐ ์์ฑ
### ์ต์ข
ํ์ต ๋ฐ์ดํฐ ํ์
- ์ด 26,254๊ฐ (train 95% / eval 5% split)
- ์์คํ
ํ๋กฌํํธ:
## ํ์ต ์ค์
## ์ฌํ ๋ฐฉ๋ฒ
INFO 03-19 14:51:58 [__init__.py:216] Automatically detected platform cuda.
[1;36m(APIServer pid=3426235)[0;0m INFO 03-19 14:52:04 [api_server.py:1839] vLLM API server version 0.11.0
[1;36m(APIServer pid=3426235)[0;0m INFO 03-19 14:52:04 [utils.py:233] non-default args: {'model_tag': 'cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', 'port': 8001, 'model': 'cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', 'max_model_len': 4096, 'gpu_memory_utilization': 0.8}
[1;36m(APIServer pid=3426235)[0;0m INFO 03-19 14:52:06 [model.py:547] Resolved architecture: Qwen3MoeForCausalLM
[1;36m(APIServer pid=3426235)[0;0m INFO 03-19 14:52:06 [model.py:1510] Using max model len 4096
[1;36m(APIServer pid=3426235)[0;0m INFO 03-19 14:52:07 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 03-19 14:52:19 [__init__.py:216] Automatically detected platform cuda.
[1;36m(EngineCore_DP0 pid=3426796)[0;0m INFO 03-19 14:52:25 [core.py:644] Waiting for init message from front-end.
[1;36m(EngineCore_DP0 pid=3426796)[0;0m INFO 03-19 14:52:25 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', speculative_config=None, tokenizer='cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=compressed-tensors, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[1;36m(EngineCore_DP0 pid=3426796)[0;0m INFO 03-19 14:52:26 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] EngineCore failed to start.
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] Traceback (most recent call last):
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] self.model_executor = executor_class(vllm_config)
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] self._init_executor()
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 54, in _init_executor
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] self.collective_rpc("init_device")
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] return func(*args, **kwargs)
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 259, in init_device
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] self.worker.init_device() # type: ignore
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 187, in init_device
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] raise ValueError(
[1;36m(EngineCore_DP0 pid=3426796)[0;0m ERROR 03-19 14:52:26 [core.py:708] ValueError: Free memory on device (7.29/93.1 GiB) on startup is less than desired GPU memory utilization (0.8, 74.48 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
INFO 03-19 14:52:34 [__init__.py:216] Automatically detected platform cuda.
[1;36m(APIServer pid=3427292)[0;0m INFO 03-19 14:52:40 [api_server.py:1839] vLLM API server version 0.11.0
[1;36m(APIServer pid=3427292)[0;0m INFO 03-19 14:52:40 [utils.py:233] non-default args: {'model_tag': './outputs/models/c17d-gemma-3-1b-it-Math', 'model': './outputs/models/c17d-gemma-3-1b-it-Math', 'dtype': 'bfloat16', 'max_model_len': 4096, 'gpu_memory_utilization': 0.85}
[1;36m(APIServer pid=3427292)[0;0m INFO 03-19 14:52:51 [model.py:547] Resolved architecture: Gemma3ForCausalLM
[1;36m(APIServer pid=3427292)[0;0m INFO 03-19 14:52:51 [model.py:1510] Using max model len 4096
[1;36m(APIServer pid=3427292)[0;0m INFO 03-19 14:52:51 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 03-19 14:52:58 [__init__.py:216] Automatically detected platform cuda.
[1;36m(EngineCore_DP0 pid=3428117)[0;0m INFO 03-19 14:53:03 [core.py:644] Waiting for init message from front-end.
[1;36m(EngineCore_DP0 pid=3428117)[0;0m INFO 03-19 14:53:03 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='./outputs/models/c17d-gemma-3-1b-it-Math', speculative_config=None, tokenizer='./outputs/models/c17d-gemma-3-1b-it-Math', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=./outputs/models/c17d-gemma-3-1b-it-Math, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[1;36m(EngineCore_DP0 pid=3428117)[0;0m INFO 03-19 14:53:05 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] EngineCore failed to start.
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] Traceback (most recent call last):
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] self.model_executor = executor_class(vllm_config)
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] self._init_executor()
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 54, in _init_executor
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] self.collective_rpc("init_device")
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] return func(*args, **kwargs)
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 259, in init_device
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] self.worker.init_device() # type: ignore
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] File "/tmp/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 187, in init_device
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] raise ValueError(
[1;36m(EngineCore_DP0 pid=3428117)[0;0m ERROR 03-19 14:53:05 [core.py:708] ValueError: Free memory on device (7.29/93.1 GiB) on startup is less than desired GPU memory utilization (0.85, 79.13 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
## ํ์ผ
- : SFT ํ์ต ์คํฌ๋ฆฝํธ
- : HRM8K ํ๊ฐ ์คํฌ๋ฆฝํธ (vLLM OpenAI API ํธํ ์๋ฒ ํ์)
|