AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-SPIN-iter2
Text Generation
• 8B • Updated
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-DRIFT-iter2
Text Generation
• 8B • Updated
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-iterDPO-iter1
Text Generation
• 8B • Updated • 2
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-SPIN-iter1
Text Generation
• 8B • Updated
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-DRIFT-iter1
Text Generation
• 8B • Updated • 1
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed
Text Generation
• 8B • Updated • 2
AmberYifan/Qwen2.5-7B-Instruct-userfeedback-on-policy-iter3
Text Generation
• 8B • Updated
AmberYifan/Qwen2.5-7B-Instruct-userfeedback-SPIN-iter3
Text Generation
• 8B • Updated
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-spin-iter3
Text Generation
• 8B • Updated
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-nspin-iter3
Text Generation
• 8B • Updated
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-iterdpo-iter3
Text Generation
• 8B • Updated
AmberYifan/Qwen2.5-7B-Code-dense-reward-3k
Text Generation
• 8B • Updated • 1 • 1
AmberYifan/Qwen2.5-1.5B-Code-GRPO-dense-reward-3k
Text Generation
• 2B • Updated • 1
AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-gpt-sft
Text Generation
• 8B • Updated • 1
AmberYifan/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-gpt-sft
Text Generation
• 8B • Updated • 1
AmberYifan/llama3-8b-full-pretrain-mix-high-tweet-1m-en-gpt-sft
Text Generation
• 8B • Updated • 2
AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en-gpt-sft
Text Generation
• 8B • Updated • 1
AmberYifan/llama3-8b-full-pretrain-control-tweet-1m-en-gpt-sft
Text Generation
• 8B • Updated • 1
AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-gpt
Text Generation
• 8B • Updated • 4
AmberYifan/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-gpt
Text Generation
• 8B • Updated
AmberYifan/llama3-8b-full-pretrain-mix-high-tweet-1m-en-gpt
Text Generation
• 8B • Updated • 2
AmberYifan/llama3-8b-full-pretrain-control-tweet-1m-en-gpt
Text Generation
• 8B • Updated
AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en-gpt
Text Generation
• 8B • Updated • 2
AmberYifan/DAPO-Coding-Qwen2.5-1.5B-Instruct
Text Generation
• 2B • Updated • 5 • 1
AmberYifan/Qwen2.5-1.5B-Instruct-GRPO-Code-old-reward
Text Generation
• 2B • Updated • 5
AmberYifan/Qwen2.5-1.5B-Code-GRPO-dense-reward
Text Generation
• 2B • Updated • 1
AmberYifan/Qwen2.5-7B-Code-GRPO-fix-reward
Text Generation
• 8B • Updated • 1
AmberYifan/Qwen2.5-1.5B-Code-GRPO-fix-reward
Text Generation
• 2B • Updated • 1 • 1
AmberYifan/Qwen2.5-7B-SFT-Code-GRPO
Text Generation
• 8B • Updated
AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-iterDPO-iter2
Text Generation
• 8B • Updated • 1