Yifan Wang

AmberYifan

AI & ML interests

None yet

Recent Activity

authored a paper 4 days ago

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

upvoted a paper 4 days ago

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

published a model 4 months ago

AmberYifan/Qwen2.5-3B-MATH-MARL-structure-only

AmberYifan's models (276)

AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-SPIN-iter2

Text Generation • 8B • Updated Jul 19, 2025 • 1

AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-DRIFT-iter2

Text Generation • 8B • Updated Jul 19, 2025 • 1

AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-iterDPO-iter1

Text Generation • 8B • Updated Jul 18, 2025 • 6

AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-SPIN-iter1

Text Generation • 8B • Updated Jul 18, 2025 • 1

AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-DRIFT-iter1

Text Generation • 8B • Updated Jul 18, 2025 • 2 • 1

AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed

Text Generation • 8B • Updated Jul 17, 2025 • 1

AmberYifan/Qwen2.5-7B-Instruct-userfeedback-on-policy-iter3

Text Generation • 8B • Updated Jul 11, 2025 • 4

AmberYifan/Qwen2.5-7B-Instruct-userfeedback-SPIN-iter3

Text Generation • 8B • Updated Jul 11, 2025 • 2

AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-spin-iter3

Text Generation • 8B • Updated Jul 10, 2025 • 2

AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-nspin-iter3

Text Generation • 8B • Updated Jul 10, 2025 • 2

AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-iterdpo-iter3

Text Generation • 8B • Updated Jul 10, 2025 • 2

AmberYifan/Qwen2.5-7B-Code-dense-reward-3k

Text Generation • 8B • Updated Jul 4, 2025 • 4 • 1

AmberYifan/Qwen2.5-1.5B-Code-GRPO-dense-reward-3k

Text Generation • 2B • Updated Jul 4, 2025 • 13 • 1

AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-gpt-sft

Text Generation • 8B • Updated Jul 3, 2025 • 1

AmberYifan/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-gpt-sft

Text Generation • 8B • Updated Jul 3, 2025 • 2

AmberYifan/llama3-8b-full-pretrain-mix-high-tweet-1m-en-gpt-sft

Text Generation • 8B • Updated Jul 3, 2025 • 2

AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en-gpt-sft

Text Generation • 8B • Updated Jul 3, 2025 • 2

AmberYifan/llama3-8b-full-pretrain-control-tweet-1m-en-gpt-sft

Text Generation • 8B • Updated Jul 3, 2025 • 3

AmberYifan/llama3-8b-full-pretrain-mix-low-tweet-1m-en-gpt

Text Generation • 8B • Updated Jul 3, 2025 • 1

AmberYifan/llama3-8b-full-pretrain-mix-mid-tweet-1m-en-gpt

Text Generation • 8B • Updated Jul 3, 2025 • 1

AmberYifan/llama3-8b-full-pretrain-mix-high-tweet-1m-en-gpt

Text Generation • 8B • Updated Jul 3, 2025 • 2

AmberYifan/llama3-8b-full-pretrain-control-tweet-1m-en-gpt

Text Generation • 8B • Updated Jul 3, 2025 • 1

AmberYifan/llama3-8b-full-pretrain-junk-tweet-1m-en-gpt

Text Generation • 8B • Updated Jul 3, 2025 • 1

AmberYifan/DAPO-Coding-Qwen2.5-1.5B-Instruct

Text Generation • 2B • Updated Jul 3, 2025 • 9 • 1

AmberYifan/Qwen2.5-1.5B-Instruct-GRPO-Code-old-reward

Text Generation • 2B • Updated Jul 2, 2025 • 5

AmberYifan/Qwen2.5-1.5B-Code-GRPO-dense-reward

Text Generation • 2B • Updated Jul 2, 2025 • 4 • 1

AmberYifan/Qwen2.5-7B-Code-GRPO-fix-reward

Text Generation • 8B • Updated Jul 2, 2025 • 2 • 1

AmberYifan/Qwen2.5-1.5B-Code-GRPO-fix-reward

Text Generation • 2B • Updated Jul 2, 2025 • 9 • 1

AmberYifan/Qwen2.5-7B-SFT-Code-GRPO

Text Generation • 8B • Updated Jun 30, 2025 • 2

AmberYifan/Qwen2.5-7B-Instruct-ultrafeedback-iterDPO-iter2

Text Generation • 8B • Updated Jun 30, 2025 • 4 • 1