Rosie Zhao
rosieyzh
AI & ML interests
theory of machine learning, deep learning
Llama-3.2-1B Warmstart RLVR - Translation
rlvr_llama1_warmstart_bleu_alma_rbz_{128, 256}_ckpt_{i}_of_10
- rosieyzh/rlvr_llama1_warmstart_bleu_alma_rbz_256_ckpt_1_of_10 • 1B
- rosieyzh/rlvr_llama1_warmstart_bleu_alma_rbz_256_ckpt_2_of_10 • 1B
- rosieyzh/rlvr_llama1_warmstart_bleu_alma_rbz_256_ckpt_3_of_10 • 1B
- rosieyzh/rlvr_llama1_warmstart_bleu_alma_rbz_256_ckpt_4_of_10 • 1B
Qwen2.5-1.5B Warmstart RLVR - Code
rosieyzh/rlvr_qwen15_warmstart_code200_rbz_{32, 64}_ckpt_{i}_of_10
- rosieyzh/rlvr_qwen15_warmstart_code200_rbz_32_ckpt_1_of_10 • 2B
- rosieyzh/rlvr_qwen15_warmstart_code200_rbz_32_ckpt_2_of_10 • 2B
- rosieyzh/rlvr_qwen15_warmstart_code200_rbz_32_ckpt_3_of_10 • 2B
- rosieyzh/rlvr_qwen15_warmstart_code200_rbz_32_ckpt_4_of_10 • 2B
Qwen2.5-1.5B SFT - Code
rosieyzh/sft_qwen15_code200_lr_{1e-5, 5e-6}_{cosine, constant}_bsz_{64, 128}_ckpt_{i}_of_5
- rosieyzh/sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_1_of_5 • 2B
- rosieyzh/sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_2_of_5 • 2B
- rosieyzh/sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_3_of_5 • 2B
- rosieyzh/sft_qwen15_code200_lr_1e-5_cosine_bsz_64_ckpt_4_of_5 • 2B
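The brace notation in the collection names reads like shell brace expansion: each `{...}` group lists alternatives, and `{i}` stands for the checkpoint index. A minimal Python sketch of expanding the SFT pattern into candidate repository IDs follows. The helper name `expand_pattern` and the reading of `{i}` as running from 1 to 5 are assumptions for illustration, and the page does not confirm that every combination was actually published.

```python
import itertools
import re


def expand_pattern(pattern, index_range):
    """Expand `{a, b}` alternative groups and the `{i}` index placeholder."""
    # re.split with a capturing group returns literal text and brace
    # contents alternately: literal, group, literal, group, ..., literal.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = []
    for pos, part in enumerate(parts):
        if pos % 2 == 0:
            choices.append([part])  # literal text, single option
        elif part.strip().lower() == "i":
            choices.append([str(i) for i in index_range])  # checkpoint index
        else:
            choices.append([alt.strip() for alt in part.split(",")])
    return ["".join(combo) for combo in itertools.product(*choices)]


SFT_PATTERN = (
    "rosieyzh/sft_qwen15_code200_lr_{1e-5, 5e-6}_{cosine, constant}"
    "_bsz_{64, 128}_ckpt_{i}_of_5"
)
sft_repo_ids = expand_pattern(SFT_PATTERN, range(1, 6))
print(len(sft_repo_ids))
print(sft_repo_ids[0])
```

The same helper should work for the `rbz_{32, 64}_ckpt_{i}_of_10` patterns by passing `range(1, 11)` as the index range.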
OLMo-1B-as_fm3_tg_omi1_omi2
OLMo 1B model pretrained on Algebraic Stack, FineMath3, TinyGSM, OMI1, and OMI2. Includes checkpoints from PPO training on the GSM8K train split.
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_ppo • Text Generation • 1B
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode1 • Text Generation • 1B
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode2 • Text Generation • 1B
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode3 • Text Generation • 1B
Qwen2.5-1.5B RLVR - GSM8K
rosieyzh/rlvr_qwen15_gsm8k_rbz_{32, 64}_ckpt_{i}_of_10
Llama-3.2-1B RLVR - Translation
rlvr_llama1_bleu_alma_rbz_{128,256}_ckpt_{i}_of_10
128: [7, 12, 21, 36, 62, 106, 182, 313, 535, 917]
256: [3, 6, 10, 18, 31, 53, 91, 156, 267, 458]
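Each of the two lists above has ten entries, one per checkpoint index in the naming pattern. The page does not say what the numbers measure; reading them as the training step at which each checkpoint was saved is an assumption. Under that reading, a short sketch that pairs each checkpoint name with its listed value (the names `LISTED_VALUES` and `checkpoint_table` are illustrative, not from the source):

```python
# Values copied from the collection description, keyed by the rbz setting.
LISTED_VALUES = {
    128: [7, 12, 21, 36, 62, 106, 182, 313, 535, 917],
    256: [3, 6, 10, 18, 31, 53, 91, 156, 267, 458],
}


def checkpoint_table(listed_values):
    """Map each checkpoint name to its listed value (assumed: a training step)."""
    table = {}
    for rbz, values in listed_values.items():
        for i, value in enumerate(values, start=1):
            name = f"rlvr_llama1_bleu_alma_rbz_{rbz}_ckpt_{i}_of_{len(values)}"
            table[name] = value
    return table


table = checkpoint_table(LISTED_VALUES)
for name, value in table.items():
    print(f"{name}: {value}")
```

The listed values are roughly geometrically spaced, so later checkpoints lie further apart than early ones.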
Qwen2.5-1.5B RLVR - Code
rosieyzh/rlvr_qwen15_code200_rbz_{32, 64}_ckpt_{i}_of_10
OLMo-150M and OLMo-1B Pretrained Models
Models pretrained from scratch, used in "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining".
OLMo-1B-as_fm3_tg_omi2
OLMo 1B model pretrained on Algebraic Stack, FineMath3, TinyGSM, and OpenMathInstruct2. Includes checkpoints from PPO training on the GSM8K train split.
Qwen2.5-1.5B Warmstart RLVR - GSM8K
rosieyzh/rlvr_qwen15_warmstart_gsm8k_rbz_{32, 64}_ckpt_{i}_of_10