When evaluating on Wiki2, I just get Loss: NaN, while with gemma-3-1b-it it works

by jvonrad - opened Jun 11, 2025

Jun 11, 2025

Why doesn't it work for the -pt version? Can someone help?

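    import torch
    from transformers import AutoModelForCausalLM

    # args comes from the script's argument parser (not shown)
    model = AutoModelForCausalLM.from_pretrained(
        args.path,
        torch_dtype=getattr(torch, args.torch_dtype.split('.')[-1]),
        trust_remote_code=True,
    ).to("cuda" if torch.cuda.is_available() else "cpu").eval()

    # ... inside the evaluation loop over Wiki2 windows, where i, begin, end,
    # input_ids and target_ids are set (loop header not shown):
        with torch.no_grad():
            outputs = model(input_ids, labels=target_ids)
            loss = outputs.loss

        # Skip windows whose loss is NaN and report where they occur
        if torch.isnan(loss):
            print(f"NaN loss at i={i}, begin={begin}, end={end}")
            continue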
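The `i`, `begin`, `end`, `input_ids` and `target_ids` names look like the sliding-window recipe from the Hugging Face "Perplexity of fixed-length models" guide. As an illustration only, the plain-Python sketch below reproduces that window bookkeeping and a token-weighted perplexity without loading a model. `sliding_windows` and `perplexity` are hypothetical helpers, and the per-window mean loss and scored-token count are assumed to come from the model call. Rather than silently `continue`-ing past NaN windows, it reports how many were skipped, since dropping windows changes the result.

```python
import math
from typing import Iterator, List, Tuple


def sliding_windows(seq_len: int, max_length: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, trg_len) spans over a token sequence of length seq_len.

    Only the last trg_len labels of each window are targets; the earlier ones
    are context and would be set to -100 in target_ids.
    """
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end  # can differ from stride on the last window
        yield begin, end, trg_len
        prev_end = end
        if end == seq_len:
            break


def perplexity(window_losses: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Token-weighted perplexity from (mean_loss, n_scored_tokens) pairs.

    n_scored_tokens is the number of labels that actually entered the loss.
    Windows with a non-finite loss (or no scored tokens) are skipped and counted.
    Returns (ppl, n_skipped); ppl is NaN if nothing could be scored.
    """
    nll_sum, n_tokens, n_skipped = 0.0, 0, 0
    for loss, n in window_losses:
        if not math.isfinite(loss) or n == 0:
            n_skipped += 1
            continue
        nll_sum += loss * n  # undo the per-window mean before averaging globally
        n_tokens += n
    if n_tokens == 0:
        return float("nan"), n_skipped
    return math.exp(nll_sum / n_tokens), n_skipped
```

For example, `list(sliding_windows(10, 4, 2))` gives `[(0, 4, 4), (2, 6, 2), (4, 8, 2), (6, 10, 2)]`; the `trg_len` values sum to 10, so each position falls in the target span of exactly one window.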

BalakrishnaCh

Google org Jun 11, 2025

Hi @jonny-vr,

Welcome to the Google Gemma family of open-source models. The primary distinction between the pre-trained (pt) and instruction-tuned (it) models lies in their training objectives. Pre-trained models are trained on general text from sources such as Wikipedia and books, whereas instruction-tuned models undergo further training specifically to follow instructions.

I have run both the pre-trained and instruction-tuned models locally and evaluated their loss values. Both models produce numeric (non-NaN) loss values. Please find the attached gist for your reference. I tested with the sample input IDs from the example code.

Key points to consider:

- Kindly verify that the parameters and arguments passed to the model, particularly the data type, are correct. An unsupported data type can cause the model weights to load incorrectly, resulting in erroneous outputs.
- The issue may also stem from the input and label IDs provided to the model.
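Both points can be checked without touching the model. The plain-Python sketch below assumes two common NaN sources: float16 overflow (half precision tops out at 65504, so a larger activation becomes `inf`, and `inf - inf` is NaN) and a window with no scored labels (a mean-reduced loss over zero tokens is 0/0). The helpers `to_float16`, `to_bfloat16` and `scored_targets` are hypothetical names used only for illustration. In practice, loading the model with `torch_dtype=torch.bfloat16` or `torch.float32` rather than `torch.float16` is a common first thing to try with Gemma models.

```python
import math
import struct
from typing import List

IGNORE_INDEX = -100  # label value skipped by Hugging Face causal-LM losses


def to_float16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 via round-to-nearest-even on its float32 bits (NaN handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def scored_targets(labels: List[int]) -> int:
    """Count labels that contribute to a causal-LM loss for one sequence.

    Position t predicts token t+1, so the first label is dropped by the
    internal shift; labels equal to IGNORE_INDEX are skipped.
    """
    return sum(1 for y in labels[1:] if y != IGNORE_INDEX)


# Float16 tops out at 65504: a larger activation becomes inf, and inf - inf is NaN.
big = 70000.0
print(to_float16(big), to_float16(big) - to_float16(big))  # inf nan
print(to_bfloat16(big))  # finite: bfloat16 keeps float32's exponent range

# A window whose labels are all masked leaves nothing to average: 0/0 -> NaN.
print(scored_targets([-100, -100, -100, -100]))  # 0
```

If `scored_targets` returns 0 for some window's `target_ids`, or the NaN disappears after switching away from float16, that points to the likely cause.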

If you require any further help, reach out to me; I'm more than happy to help.

Thanks.

Danik8

Sep 1, 2025

This comment has been hidden (marked as Off-Topic)
