W-61/llama-3-8b-base-beta-dpo-ultrafeedback-4xh200-batch-128-20260424-044124 Text Generation • 8B • Updated 30 days ago • 475
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.4 Text Generation • 8B • Updated 29 days ago • 320
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.6 Text Generation • 8B • Updated 29 days ago • 324
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-0.85 Text Generation • 8B • Updated 29 days ago • 355
jackf857/llama-3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.5-s_star-1.0 Text Generation • 8B • Updated 29 days ago • 422
jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.4 Text Generation • 8B • Updated 29 days ago • 340
jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.6 Text Generation • 8B • Updated 29 days ago • 290
jackf857/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.5-s_star-0.85 Text Generation • 8B • Updated 29 days ago • 383
W-61/llama3-8b-base-new-method-s_star0.6-20260425-180936 Text Generation • 8B • Updated 28 days ago • 269
jackf857/qwen3-8b-base-sft-hh-helpful-4xh200-batch-64-20260417-214452 Text Generation • 8B • Updated 28 days ago • 886
jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452 Text Generation • 8B • Updated 28 days ago • 960
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.4 Text Generation • 8B • Updated 28 days ago • 254
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.6 Text Generation • 8B • Updated 28 days ago • 247
jackf857/qwen3-8b-base-new-dpo-hh-harmless-4xh200-batch-64-q_t-0.45-s_star-0.85 Text Generation • 8B • Updated 28 days ago • 239
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.4 Text Generation • 8B • Updated 28 days ago • 26
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.6 Text Generation • 8B • Updated 28 days ago • 75
jackf857/qwen3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-q_t-0.45-s_star-0.85 Text Generation • 8B • Updated 28 days ago • 266
W-61/llama-3-8b-base-r-dpo-ultrafeedback-4xh200-batch-128-20260426-105614 Text Generation • 8B • Updated 28 days ago • 336
W-61/llama-3-8b-base-orpo-ultrafeedback-4xh200-batch-128-20260426-105614 Text Generation • 8B • Updated 28 days ago • 18
W-61/qwen3-8b-base-kto-ultrafeedback-4xh200-batch-128-20260426-105614 Text Generation • 8B • Updated 27 days ago • 27
jackf857/llama-3-8b-base-orpo-ultrafeedback-4xh200-rerun Text Generation • 8B • Updated 27 days ago • 127
jackf857/qwen3-8b-base-kto-ultrafeedback-4xH200-batch-128 Text Generation • 8B • Updated 21 days ago • 33
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.2-20260426-185938 Text Generation • 8B • Updated 27 days ago • 39
jackf857/llama-3-8b-base-cpo-ultrafeedback-4xH200-batch-128-rerun Text Generation • 8B • Updated 27 days ago • 131
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.2-20260425-111846 Text Generation • 8B • Updated 27 days ago • 10
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.4-20260425-111846 Text Generation • 8B • Updated 27 days ago • 14
W-61/llama3-8b-base-new-method-s_star0.6-20260426-230653 Text Generation • 8B • Updated 27 days ago • 238
jackf857/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.4-s_star-0.5 Text Generation • 8B • Updated 27 days ago • 174
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-s_star-0.2-20260425-111846 Text Generation • 8B • Updated 27 days ago • 16
W-61/llama-3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-s_star-0.4-20260425-111846 Text Generation • 8B • Updated 27 days ago • 208