AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-NoPrompt-iter1
Updated
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-DRIFT-iter3-T1.0
Updated
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter3-T1.0
Text Generation
• 333k • Updated • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter3-RPO
Text Generation
• 841k • Updated • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-SPIN-iter4
Updated
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter4
Updated
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-DRIFT-NoPrompt-iter1
Updated
AmberYifan/Phi-4-mini-instruct-wildfeedback-seed
Updated
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-DRIFT-iter3-RPO
Updated
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter3-RPO
Updated
AmberYifan/Qwen3-4B-MATH-GRPO-len-control-tuned-test
Updated
AmberYifan/Qwen3-4B-MATH-GRPO-len-control-tuned
Updated
AmberYifan/Qwen3-4B-OpenR1Math-MARL-structure-v2
Updated
AmberYifan/Qwen3-4B-Polaris-MARL-structure-v2
Updated
AmberYifan/Qwen3-4B-MATH-MARL-structure-v2
Updated
AmberYifan/Qwen3-1.7B-MATH-MARL-structure
Updated
AmberYifan/Qwen3-1.7B-MATH-MARL-tuned
Updated
AmberYifan/Qwen3-4B-MATH-MARL-tuned
Updated
AmberYifan/Qwen3-4B-MATH-GRPO-tuned
Updated
AmberYifan/Qwen3-4B-MATH-MARL-structure-loop-penalty-v2-32
Updated
AmberYifan/Qwen3-4B-OpenR1Math-MARL-structure-loop-penalty-v2
Updated
AmberYifan/Phi-4-mini-reasoning-MATH-MARL-structure-loop-penalty-v2
Updated
AmberYifan/Qwen3-4B-MATH-MARL-structure-loop-penalty-v2
4B • Updated
AmberYifan/Qwen3-4B-MATH-MARL-structure
Text Generation
• 4B • Updated • 4 • 1
AmberYifan/Qwen3-4B-MATH-GRPO-len-control
Text Generation
• 4B • Updated
AmberYifan/Qwen3-4B-MATH-GRPO-acc
Text Generation
• 4B • Updated • 1
AmberYifan/llama3-8b-full-pretrain-low-len-1m-en-sft
Text Generation
• 8B • Updated
AmberYifan/llama3-8b-full-pretrain-high-len-1m-en-sft
Text Generation
• 8B • Updated
AmberYifan/llama3-8b-full-pretrain-high-len-1m-en
Text Generation
• 8B • Updated
AmberYifan/llama3-8b-full-pretrain-low-len-1m-en
Text Generation
• 8B • Updated • 1