Rosie Zhao (rosieyzh)
AI & ML interests: theory of machine learning, deep learning
Llama-3.2-1B Warmstart RLVR - Summarization
rosieyzh/rlvr_llama1_warmstart_rouge_xsum_rbz_{32,64}_ckpt_{i}_of_10
Llama-3.2-1B SFT - Summarization
rosieyzh/sft_llama1_xsum_lr_1e-5_cosine_bsz_{64,128}_ckpt_{i}_of_5
Qwen2.5-1.5B RLVR - GSM8K
rosieyzh/rlvr_qwen15_gsm8k_rbz_{32,64}_ckpt_{i}_of_10
Llama-3.2-1B RLVR - Translation
rlvr_llama1_bleu_alma_rbz_{128,256}_ckpt_{i}_of_10
Checkpoint steps for rbz 128: [7, 12, 21, 36, 62, 106, 182, 313, 535, 917]
Checkpoint steps for rbz 256: [3, 6, 10, 18, 31, 53, 91, 156, 267, 458]
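The step lists above pair each of the 10 checkpoints with the training step at which it was saved, per rollout batch size (rbz). A minimal sketch of turning them into (repo name, step) pairs — the `ckpt_repos` helper is illustrative, not part of the release:

```python
# Checkpoint steps per rollout batch size (rbz), from the listing above
ckpt_steps = {
    128: [7, 12, 21, 36, 62, 106, 182, 313, 535, 917],
    256: [3, 6, 10, 18, 31, 53, 91, 156, 267, 458],
}

def ckpt_repos(rbz):
    """Pair each 1-based checkpoint index with its training step."""
    return [
        (f"rlvr_llama1_bleu_alma_rbz_{rbz}_ckpt_{i}_of_10", step)
        for i, step in enumerate(ckpt_steps[rbz], start=1)
    ]

# First checkpoint of the rbz=128 run: ckpt_1_of_10, saved at step 7
print(ckpt_repos(128)[0])
```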
Qwen2.5-1.5B Warmstart RLVR - Code
rosieyzh/rlvr_qwen15_warmstart_code200_rbz_{32,64}_ckpt_{i}_of_10
Qwen2.5-1.5B SFT - Code
rosieyzh/sft_qwen15_code200_lr_{1e-5,5e-6}_{cosine,constant}_bsz_{64,128}_ckpt_{i}_of_5
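The brace notation in these entries denotes a grid of runs (each combination of learning rate, schedule, and batch size, times 5 checkpoints each). A minimal sketch of expanding one such pattern into concrete repo IDs — the expansion code is illustrative, only the repo names themselves come from the listing:

```python
from itertools import product

# Hyperparameter grid encoded in the pattern
# rosieyzh/sft_qwen15_code200_lr_{1e-5,5e-6}_{cosine,constant}_bsz_{64,128}_ckpt_{i}_of_5
lrs = ["1e-5", "5e-6"]
schedules = ["cosine", "constant"]
batch_sizes = [64, 128]
num_ckpts = 5

repo_ids = [
    f"rosieyzh/sft_qwen15_code200_lr_{lr}_{sched}_bsz_{bsz}_ckpt_{i}_of_{num_ckpts}"
    for lr, sched, bsz in product(lrs, schedules, batch_sizes)
    for i in range(1, num_ckpts + 1)
]

print(len(repo_ids))  # 2 lrs x 2 schedules x 2 batch sizes x 5 checkpoints = 40
print(repo_ids[0])
```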
OLMo-1B-as_fm3_tg_omi1_omi2
OLMo 1B model pretrained on Algebraic Stack, FineMath3, TinyGSM, OMI1, and OMI2. Includes checkpoints from PPO training on the GSM8K train set.
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_ppo
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode1
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode2
- rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode3
Qwen2.5-1.5B SFT - Unstructured Code
rosieyzh/sft_qwen15_swallow_lr_1e-5_cosine_bsz_{64,128}_ckpt_{i}_of_5
- rosieyzh/sft_qwen15_swallow_lr_1e-5_cosine_bsz_128_ckpt_1_of_5
- rosieyzh/sft_qwen15_swallow_lr_1e-5_cosine_bsz_128_ckpt_2_of_5
- rosieyzh/sft_qwen15_swallow_lr_1e-5_cosine_bsz_128_ckpt_3_of_5
- rosieyzh/sft_qwen15_swallow_lr_1e-5_cosine_bsz_128_ckpt_4_of_5
Llama-3.2-1B RLVR - Summarization
rosieyzh/rlvr_llama1_rouge_xsum_rbz_{32,64}_ckpt_{i}_of_10
Qwen2.5-1.5B Warmstart RLVR - GSM8K
rosieyzh/rlvr_qwen15_warmstart_gsm8k_rbz_{32,64}_ckpt_{i}_of_10
Llama-3.2-1B Warmstart RLVR - Translation
rlvr_llama1_warmstart_bleu_alma_rbz_{128,256}_ckpt_{i}_of_10
Llama-3.2-1B SFT - Translation
rosieyzh/sft_llama1_alma_lr_1e-5_cosine_bsz_{64,128}_ckpt_{i}_of_5
Qwen2.5-1.5B RLVR - Code
rosieyzh/rlvr_qwen15_code200_rbz_{32,64}_ckpt_{i}_of_10
OLMo-150M and OLMo-1B Pretrained Models
Models pretrained from scratch, used in "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining".
OLMo-1B-as_fm3_tg_omi2
OLMo 1B model pretrained on Algebraic Stack, FineMath3, TinyGSM, and OpenMathInstruct2. Includes checkpoints from PPO training on the GSM8K train set.
Synthetic Multimodal Datasets
Datasets used in "Understanding the Design Space and Cross-Modality Transfer for Vision-Language Models"