sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42-checkpoint-3000 Updated Jul 14, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42-checkpoint-4000 Updated Jul 14, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-mixed-llm-judge-42 Text Generation • 8B • Updated Jul 10, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-7 Text Generation • 8B • Updated Jul 7, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-6 Text Generation • 8B • Updated Jul 7, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-5 Text Generation • 8B • Updated Jul 6, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-4 Text Generation • 8B • Updated Jul 6, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-3 Text Generation • 8B • Updated Jul 6, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-2 Text Generation • 8B • Updated Jul 6, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-1 Text Generation • 8B • Updated Jul 5, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-AT-1 Text Generation • 8B • Updated Jul 5, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-hard-AT-4 Text Generation • 8B • Updated Jul 3, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-easy-AT-4 Text Generation • 8B • Updated Jul 3, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-hard-AT-3 Text Generation • 8B • Updated Jul 3, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-easy-AT-3 Text Generation • 8B • Updated Jul 2, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-hard-AT-2 Text Generation • 8B • Updated Jul 2, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-easy-AT-2 Text Generation • 8B • Updated Jul 2, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-hard-AT-1 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-directly-output-easy-AT-1 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-AT-3 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-16-NEW-TRAIN-new-reward-AT-3 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-AT-2 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-16-NEW-TRAIN-new-reward-AT-2 Text Generation • 8B • Updated Jul 1, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-20-NEW-TRAIN-AT-1 Text Generation • 8B • Updated Jun 30, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-short-16-NEW-TRAIN-new-reward-AT-1 Text Generation • 8B • Updated Jun 30, 2025