Phantomcloak19/TV-CGRPO-Qwen2-5-3B-Instruct_two_obj_scalar-QLoRA-TRL 3B • Updated 7 days ago • 54
Phantomcloak19/safe-grpo-qlora-Qwen3-4B-long-saftey-grpo-mixed-llm-sources Text Generation • Updated 19 days ago • 37
Phantomcloak19/TV-CGRPO-Qwen2-5-3B-Instruct_no_advantage_adj-QLoRA-TRL 3B • Updated 20 days ago • 19
Phantomcloak19/safe-grpo-qlora-Qwen3-4B-long-saftey-grpo-mixed-merged Text Generation • Updated 21 days ago • 35
Phantomcloak19/TV-CGRPO-Qwen2-5-3B-Instruct_no_lagrangian-QLoRA-TRL 3B • Updated 21 days ago • 14
Phantomcloak19/safe-grpo-qlora-Qwen2.5-3B-Instruct-long-saftey-grpo-mixed-llm-sources Text Generation • Updated 22 days ago • 36
Phantomcloak19/safe-grpo-qlora-Qwen2.5-3B-Instruct-long-saftey-grpo-mixed-merged Text Generation • Updated 23 days ago • 32
Phantomcloak19/safe-grpo-qlora-gemma-2-2b-it-long-saftey-grpo-mixed-llm-sources Text Generation • Updated 24 days ago • 26