Safetensors
qwen2
leave-one-out
loo-domain-knowledge
grpo
lr=5e-6
batch_size=2
group_size=2
max_steps=12
qlora
quantize=4bit_nf4
lora_rank=64
lora_alpha=128
lora_dropout=0.05
max_completion_length=2048
gradient_checkpointing
bf16
ddp_workers=2
hub_model_id=tinyllms/qwen2.5-7b-instruct-grpo-loo-domain-knowledge-20steps
ray_job=raysubmit_YbfK7VEEYMGjEp9h
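The tags above pair GRPO with group_size=2 and a LoRA adapter with lora_rank=64 and lora_alpha=128. The sketch below is a minimal illustration of what those two settings imply. It assumes the common GRPO advantage, where each reward has its group mean subtracted and is divided by the group standard deviation plus a small epsilon, and the standard LoRA scaling factor alpha / r. The helper names (`group_advantages`, `lora_scaling`) are hypothetical, and neither formula is read from this run's training code.

```python
# Minimal sketch, ASSUMING the common GRPO group-relative advantage
# ((reward - group mean) / (group std + eps)) and standard LoRA scaling (alpha / r).
# Helper names are hypothetical; this is not the training code behind this model.
from statistics import mean, pstdev

GROUP_SIZE = 2      # group_size=2: completions sampled per prompt
LORA_RANK = 64      # lora_rank=64
LORA_ALPHA = 128    # lora_alpha=128


def group_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's group of completions."""
    mu = mean(rewards)
    # Population std; implementations using the sample std would give about +/-0.71 for two values.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update (B @ A) in standard LoRA."""
    return alpha / rank


if __name__ == "__main__":
    # With only two completions per prompt, any unequal pair maps to roughly +1 / -1,
    # while an equal pair yields zero advantage (no learning signal for that prompt).
    print(group_advantages([0.9, 0.1]))
    print(group_advantages([0.5, 0.5]))
    print(lora_scaling(LORA_ALPHA, LORA_RANK))
```

With group_size=2, the advantage therefore encodes only which of the two completions scored higher, not by how much. Under the alpha / r assumption, lora_alpha=128 with lora_rank=64 scales the adapter update by 2.0.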
Welcome to the community
The community tab is the place to discuss and collaborate with the HF community!