Offline RL with Neg Samples - a SpectralPO Collection

SpectralPO's Collections

DeepSeek-R1-Distill-Llama-8B

DeepSeek-R1-Distill-Qwen-32B

Qwen2.5-32B-Instruct

Qwen2.5-14B-Instruct

DeepSeek-R1-Distill-Qwen-7B

Qwen2.5-7B-Instruct

Offline RL with Neg Samples

updated Apr 27, 2025