Question about NaN logprob

#23
by hzhiqi - opened

Hi,

Thanks for the model. I ran into an issue when fine-tuning it with PEFT using TRL's GRPOTrainer and vLLM.

First, TRL emits many warnings like the following:

WARNING vllm_serve.py:413: Generated NaN logprob, token logprob 'Logprob(logprob=nan, rank=0, decoded_token='<pad>')' will be ignored

Then, it fails with the error message:

{'type': 'float_type', 'loc': ('response', 'logprobs', 7, 4092), 'msg': 'Input should be a valid number', 'input': None}

It seems vLLM produces NaN as the log probability of the <pad> token, and that value then reaches response validation as None. I tried the following, but the error persists:

  • Setting prompt_logprobs = None in the generation_kwargs of GRPOConfig
  • Setting tokenizer.pad_token = tokenizer.eos_token and model.config.pad_token_id = tokenizer.eos_token_id
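
For reference, here is a minimal sketch of how I think the NaN ends up as the validation error above. It is only my guess at the failure mode, not the actual TRL or vLLM code: it assumes the server serializes NaN as JSON null and that the client schema expects a plain float. The helper names (to_json_safe, validate_floats) and the logprob values are made up for illustration.

```python
import json
import math


def to_json_safe(logprobs):
    """Replace NaN with None, mimicking a serializer that writes NaN as JSON null (assumption)."""
    return [None if math.isnan(lp) else lp for lp in logprobs]


def validate_floats(values):
    """Hypothetical strict check, similar in spirit to a float-typed response schema."""
    errors = []
    for i, v in enumerate(values):
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            errors.append({"loc": i, "msg": "Input should be a valid number", "input": v})
    return errors


# Made-up values; the NaN stands in for the <pad> token's logprob
server_side = [-0.25, float("nan"), -1.5]

payload = json.dumps(to_json_safe(server_side))  # NaN becomes null on the wire
client_side = json.loads(payload)                # null becomes None on the client
errors = validate_floats(client_side)
print(errors)
```

If this is right, changing the pad token on the training side would not help, because the NaN would already be present in the logprobs returned by generation.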

Any suggestions? Thank you!
