2025-06-17 12:44:52,511 - INFO - [main_preprocess] - --- PREPROCESS-PHASE 1: Starting generation of rejected responses ---
2025-06-17 12:44:52,511 - INFO - [prepare_datasets] - Loading dataset from train-reasoning-v2.csv
2025-06-17 12:44:56,454 - INFO - [get_balanced_memory] - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2025-06-17 12:44:58,974 - INFO - [main_preprocess] - Generating with map_batch_size = 4
2025-06-17 14:24:28,082 - INFO - [main_preprocess] - Rejected response generation complete.
2025-06-17 14:24:28,082 - INFO - [main_preprocess] - Releasing the generation model's VRAM...
2025-06-17 14:24:28,082 - INFO - [main_preprocess] - VRAM released.
2025-06-17 14:24:28,082 - INFO - [main_preprocess] - --- PREPROCESS-PHASE 2: Starting reward score computation ---
2025-06-17 14:24:28,082 - INFO - [__init__] - Loading offline reward model: google/gemma-2b-it
2025-06-17 14:24:31,163 - INFO - [get_balanced_memory] - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2025-06-17 15:01:13,102 - INFO - [main_preprocess] - Reward score computation complete.
2025-06-17 15:01:13,102 - INFO - [main_preprocess] - Releasing the reward model's VRAM...
2025-06-17 15:01:13,150 - INFO - [main_preprocess] - VRAM released.
2025-06-17 15:01:13,150 - INFO - [main_preprocess] - --- PREPROCESS-PHASE 3: Saving the preprocessed datasets ---
2025-06-17 15:01:13,276 - INFO - [main_preprocess] - Data saved to train_preprocessed.jsonl and eval_preprocessed.jsonl
2025-06-17 15:01:13,276 - INFO - [main_preprocess] - Preprocessing fully complete!