How can I reproduce the results in your blog?
At present, my student model is LLaMA3.2-1B and the teacher model is Qwen3-4B. For training and testing I am using the Qwen3-4B version of the CountDown dataset (27.7K examples).
I am training on two GPUs, and the training hyperparameters are as follows:
```python
training_args = GOLDConfig(
    save_strategy="steps",
    save_steps=500,
    learning_rate=5e-5,
    warmup_ratio=0.05,
    per_device_train_batch_size=16,
    max_completion_length=512,
    teacher_model_name_or_path=teacher_name,
    teacher_tokenizer_name_or_path=teacher_name,
    bf16=True,
    use_uld_loss=True,
    uld_use_hybrid_loss=True,
    push_to_hub=False,
    report_to=[],
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    max_steps=3000,
    logging_steps=10,
    gradient_accumulation_steps=1,
    lmbda=1.0,
    beta=0.0,
    uld_crossentropy_weight=0.0,
    uld_distillation_weight=1.0,
)
```
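For reference, here is the effective training schedule implied by the config above, assuming the 2 data-parallel GPUs mentioned in the post (a quick arithmetic check, not part of the training code). Note that `max_steps=3000` caps training well short of the requested `num_train_epochs=5`:

```python
# Effective schedule implied by the GOLDConfig above.
# Assumption: 2 data-parallel GPUs, as stated in the post.
num_gpus = 2
per_device_batch = 16   # per_device_train_batch_size
grad_accum = 1          # gradient_accumulation_steps
max_steps = 3000
warmup_ratio = 0.05
dataset_size = 27_700   # CountDown Qwen3-4B version

effective_batch = num_gpus * per_device_batch * grad_accum
warmup_steps = int(max_steps * warmup_ratio)
samples_seen = effective_batch * max_steps
epochs_actually_run = samples_seen / dataset_size

print(effective_batch)      # 32
print(warmup_steps)         # 150
print(epochs_actually_run)  # ~3.5 epochs, so max_steps ends training before epoch 5
```

So despite `num_train_epochs=5`, the run stops after roughly 3.5 passes over the 27.7K examples.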
Currently, the loss decreases from 1.8 to around 0.1. However, the trained model is unable to generate outputs in the required format. Moreover, its responses are almost entirely irrelevant to the questions and do not form complete sentences.
Are you using the model for tasks different from Countdown? Just curious about your setup.
Nope, just this Countdown task.