Question about NaN logprob
#23, opened by hzhiqi
Hi,
Thanks for the model. I ran into an issue when fine-tuning it with PEFT, using TRL GRPOTrainer + vLLM.
First, TRL gives a bunch of warning messages like the following:
WARNING vllm_serve.py:413: Generated NaN logprob, token logprob 'Logprob(logprob=nan, rank=0, decoded_token='<pad>')' will be ignored
Then, it fails with the error message:
{'type': 'float_type', 'loc': ('response', 'logprobs', 7, 4092), 'msg': 'Input should be a valid number', 'input': None}
It seems the <pad> token does not have a valid log probability. I tried the following, but the error persists.
- set prompt_logprobs = None in generation_kwargs of GRPOConfig
- set tokenizer.pad_token = tokenizer.eos_token and model.config.pad_token_id = tokenizer.eos_token_id
Any suggestions? Thank you!
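In case it helps narrow this down, here is a minimal sketch of how the bad entries could be located before they reach validation. It assumes the logprobs come back as a list of per-sequence lists of floats, which is my reading of the 'loc' path in the error above and may not match the actual response layout. The helper name find_invalid_logprobs and the sample data are hypothetical and not part of TRL or vLLM.

```python
import math
from typing import List, Optional, Tuple


def find_invalid_logprobs(
    logprobs: List[List[Optional[float]]],
) -> List[Tuple[int, int]]:
    """Return (sequence_index, token_index) pairs whose logprob is None or non-finite."""
    bad = []
    for seq_idx, seq in enumerate(logprobs):
        for tok_idx, value in enumerate(seq):
            # None is what the validation error reports; NaN is what the warning reports.
            if value is None or not math.isfinite(value):
                bad.append((seq_idx, tok_idx))
    return bad


# Hypothetical sample data: the second sequence has one dropped value and one NaN.
sample = [[-0.12, -1.5], [-0.3, None, float("nan")]]
print(find_invalid_logprobs(sample))  # [(1, 1), (1, 2)]
```

Mapping the reported positions back to token ids would show whether every invalid entry is a <pad> token or whether other tokens are affected too.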