Tiiny
/

SmallThinker-3B-Preview

Text Generation

text-generation-inference


Yixin Song committed on Jan 6, 2025

Commit

d29e0f4

·

verified ·

1 Parent(s): ccbda51

Update README.md

Files changed (1)

README.md +50 -16

README.md CHANGED

@@ -35,30 +35,64 @@ SmallThinker is designed for the following use cases:
 The model was trained using 8 H100 GPUs with a global batch size of 16. The specific configuration is as follows:
-```
-neat_packing: true
-cutoff_len: 16384
-per_device_train_batch_size: 2
-gradient_accumulation_steps: 1
-learning_rate: 1.0e-5
-num_train_epochs: 3
-lr_scheduler_type: cosine
-warmup_ratio: 0.02
-bf16: true
-ddp_timeout: 180000000
-weight_decay: 0.0
-```
 The SFT (Supervised Fine-Tuning) process was conducted in two phases:
 1. First Phase:
    - Used only the PowerInfer/QWQ-LONGCOT-500K dataset
    - Trained for 1.5 epochs
 2. Second Phase:
    - Combined training with PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets
    - Continued training for 2 additional epochs
 ## Limitations & Disclaimer

 The model was trained using 8 H100 GPUs with a global batch size of 16. The specific configuration is as follows:
 The SFT (Supervised Fine-Tuning) process was conducted in two phases:
 1. First Phase:
    - Used only the PowerInfer/QWQ-LONGCOT-500K dataset
    - Trained for 1.5 epochs
+```
+### model
+model_name_or_path: saves/qwen2-01-qat/full/sft/checkpoint-24000
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+deepspeed: examples/deepspeed/ds_z3_config.json
+### dataset
+dataset: o1-v2
+template: qwen
+neat_packing: true
+cutoff_len: 16384
+overwrite_cache: true
+preprocessing_num_workers: 16
+### output
+output_dir: saves/qwen2-01-qat/full/sft
+logging_steps: 1
+save_steps: 1000
+plot_loss: true
+overwrite_output_dir: true
+```
 2. Second Phase:
    - Combined training with PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets
    - Continued training for 2 additional epochs
+```
+### model
+model_name_or_path: /home/syx/Qwen2.5-3B-Instruct
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+deepspeed: examples/deepspeed/ds_z3_config.json
+### dataset
+dataset: o1-v2, o1-v3
+template: qwen
+neat_packing: true
+cutoff_len: 16384
+overwrite_cache: true
+preprocessing_num_workers: 16
+### output
+output_dir: saves/qwen2-01-qat/full/sft
+logging_steps: 1
+save_steps: 1000
+plot_loss: true
+overwrite_output_dir: true
+```
 ## Limitations & Disclaimer
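
The README states a global batch size of 16 on 8 H100 GPUs, and the configuration removed in this commit sets `per_device_train_batch_size: 2` and `gradient_accumulation_steps: 1`. A minimal sketch of how those figures relate, assuming plain data parallelism across all 8 GPUs (the helper name `global_batch_size` and the dictionary below are illustrative, not part of the repository):

```python
def global_batch_size(num_gpus: int, per_device_batch_size: int, grad_accum_steps: int) -> int:
    """Sequences consumed per optimizer step, assuming every GPU is a data-parallel rank."""
    return num_gpus * per_device_batch_size * grad_accum_steps


# Values copied from the configuration block removed in this commit.
removed_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
}

NUM_GPUS = 8  # "8 H100 GPUs" per the README text

effective = global_batch_size(
    NUM_GPUS,
    removed_config["per_device_train_batch_size"],
    removed_config["gradient_accumulation_steps"],
)
print(effective)  # 16, matching the stated global batch size
```

The configuration blocks added in this commit, as shown in the diff, do not list these two keys, so the check above relies on the removed block and the prose.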