W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.4-4xh200-batch-64-20260421-204233 Text Generation • 8B • Updated Apr 21 • 8
W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.6-4xh200-batch-64-20260421-213851 Text Generation • 8B • Updated Apr 21 • 5
jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.4-4xh200-batch-64-20260421-214335-rerun Text Generation • 8B • Updated Apr 21 • 6
W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.85-4xh200-batch-64-20260421-213851 Text Generation • 8B • Updated Apr 21 • 6
jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.6-4xh200-batch-64-20260421-214335-rerun Text Generation • 8B • Updated Apr 21 • 5
W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260421-213851 Text Generation • 8B • Updated Apr 21 • 7
jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.85-4xh200-batch-64-20260421-233802 Text Generation • 8B • Updated Apr 22 • 5
jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star1.0-4xh200-batch-64-20260421-233802 Text Generation • 8B • Updated Apr 22 • 8
W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star0.6-4xh200-batch-64-20260422-051621 Text Generation • 8B • Updated Apr 22 • 7
W-61/llama-3-8b-base-new-dpo-hh-harmless-s_star1.0-4xh200-batch-64-20260422-051621 Text Generation • 8B • Updated Apr 22 • 9
jackf857/qwen3-8b-base-epsilon-dpo-ultrafeedback-4xh200-batch-128 Text Generation • 8B • Updated Apr 22 • 11
jackf857/qwen3-8b-base-sft-hh-helpful-8xh200-20260414-192602-232981 Text Generation • 8B • Updated Apr 23 • 4
W-61/qwen3-8b-base-cpo-ultrafeedback-4xh200-batch-128-20260422-131855 Text Generation • 8B • Updated Apr 23 • 31
W-61/qwen3-8b-base-epsilon-dpo-ultrafeedback-4xh200-batch-128-20260422-131855 Text Generation • 8B • Updated Apr 23 • 22
W-61/qwen3-8b-base-ipo-ultrafeedback-4xh200-batch-128-20260422-131855 Text Generation • 8B • Updated Apr 23 • 23
W-61/qwen3-8b-base-slic-hf-ultrafeedback-4xh200-batch-128-20260422-131855 Text Generation • 8B • Updated Apr 23 • 27
W-61/qwen3-8b-base-r-dpo-ultrafeedback-4xh200-batch-128-20260422-131855 Text Generation • 8B • Updated about 1 month ago • 32
jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.4-q_t0.4 Text Generation • 8B • Updated about 1 month ago • 522 • 1
jackf857/llama-3-8b-base-new-dpo-harmless-s_star0.6-q_t0.4 Text Generation • 8B • Updated about 1 month ago • 599
W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6 Text Generation • 8B • Updated about 1 month ago • 526
jackf857/qwen3-8b-base-margin-dpo-hh-helpful-4xh200-batch-64-20260423-233948 Text Generation • 8B • Updated about 1 month ago • 466
jackf857/qwen3-8b-base-margin-dpo-hh-harmless-4xh200-batch-64-20260423-234249 Text Generation • 8B • Updated about 1 month ago • 446
W-61/llama3-8b-base-new-method-q_t-0.4-s_star0.6-beta-next-batch Text Generation • 8B • Updated about 1 month ago • 379
jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64-20260424-013732 Text Generation • 8B • Updated about 1 month ago • 318 • 1
W-61/qwen3-8b-base-beta-dpo-ultrafeedback-4xh200-batch-128-20260423-040315 Text Generation • 8B • Updated 30 days ago • 360
jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64-20260424-025105 Text Generation • 8B • Updated 30 days ago • 306
W-61/qwen3-8b-base-margin-dpo-ultrafeedback-4xh200-batch-128-20260423-040315 Text Generation • 8B • Updated 30 days ago • 485
jackf857/qwen3-8b-base-epsilon-dpo-hh-helpful-4xh200-batch-64-20260424-040306 Text Generation • 8B • Updated 30 days ago • 430
jackf857/qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260424-040415 Text Generation • 8B • Updated 30 days ago • 421