jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.85-4xh200-batch-64-20260421-233802 Text Generation • 8B • Updated Apr 22 • 4
jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.6-4xh200-batch-64-20260421-214335-rerun Text Generation • 8B • Updated Apr 21 • 3
jackf857/llama-3-8b-base-new-dpo-hh-helpful-s_star0.4-4xh200-batch-64-20260421-214335-rerun Text Generation • 8B • Updated Apr 21 • 3
jackf857/llama-3-8b-base-new-dpo-harmless-4xh200-s_star1.0 Text Generation • 8B • Updated Apr 21 • 4
jackf857/qwen3-8b-base-beta-dpo-hh-helpful-4xh200-batch-64 Text Generation • 8B • Updated Apr 20 • 4
jackf857/qwen3-8b-base-beta-dpo-hh-harmless-4xh200-batch-64 Text Generation • 8B • Updated Apr 20 • 5
jackf857/qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64 Text Generation • 8B • Updated Apr 20 • 3
jackf857/llama-3-8b-base-margin-dpo-hh-harmless-batch-size-64 Text Generation • 8B • Updated Apr 17 • 3