1231czx/llama32_pt_5b_rl_70
4B • Updated
• 1
1231czx/llama32_pt_5b_rl_30
4B • Updated
• 1
1231czx/llama32_it_rl_120
4B • Updated
• 1
1231czx/llama32_it_rl_100
4B • Updated
• 1
1231czx/llama32_it_rl_140
4B • Updated
• 1
Text Generation
• 3B • Updated
Text Generation
• 8B • Updated
• 1
1231czx/qwen_self_corr_star_baseline_ep1
Text Generation
• 8B • Updated
• 1
1231czx/qwen_self_corr_star_plus_baseline_ep1
Text Generation
• 8B • Updated
• 1
1231czx/qw_ppo_self_corr_dpo_iter2_turn2
Text Generation
• 8B • Updated
1231czx/qw_ppo_self_corr_dpo_iter1_turn2
Text Generation
• 8B • Updated
1231czx/qwen_self_corr_star_plus_baseline
Text Generation
• 8B • Updated
• 1
1231czx/qwen_self_corr_star_baseline
Text Generation
• 8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step260
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step240
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step220
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step200
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step180
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step160
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step140
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step120
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step100
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step80
8B • Updated
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step60
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step40
8B • Updated
• 1
1231czx/gg_regular_prompt_ppo_correctness_reward_round2_step20
8B • Updated
1231czx/gg_regular_prompt_ppo_format_reward_step250
8B • Updated