zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-40k-2
Updated
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-40k
8B • Updated
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-new-40k
Updated
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-medium-0_01-new
Text Generation
• 8B • Updated • 1
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-medium-0_05-new
Text Generation
• 8B • Updated • 5
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-high-0_1-new
Text Generation
• 8B • Updated • 2
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-high-0_5-new
Text Generation
• 8B • Updated • 1
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-3
8B • Updated
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-2
8B • Updated • 1
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-high-0_3-1024
Text Generation
• 8B • Updated • 2
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-high-0_1-1024
Text Generation
• 8B • Updated • 2
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-high-0_5-1024
8B • Updated
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-new
Text Generation
• 8B • Updated • 5
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-high-0_05-1024
8B • Updated
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-medium-0_01-1024
Text Generation
• 8B • Updated • 2
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-medium-high
Text Generation
• 8B • Updated • 2
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-medium
8B • Updated • 1
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low
Text Generation
• 8B • Updated • 2
zijianh/Qwen-2.5-7B-Simple-RL-length
Text Generation
• 8B • Updated • 2
zijianh/Qwen-2.5-7B-Simple-RL
Text Generation
• 8B • Updated • 1
zijianh/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Updated
zijianh/Qwen2.5-1.5B-Open-R1-Distill
Text Generation
• 2B • Updated • 1