Multiple bugs prevent model.generate() from working - AttributeError and KeyError issues

#6
by teddyk251

The model's implementation has several bugs that prevent model.generate() from working properly. The issues appear to stem from missing None checks when handling cached key-value pairs during generation.

Steps to Reproduce:

from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is required because the model ships its own modeling code.
tokenizer = AutoTokenizer.from_pretrained("lelapa/InkubaLM-0.4B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("lelapa/InkubaLM-0.4B", trust_remote_code=True)
model.to('cuda')

text = "Today I planned to"
inputs = tokenizer(text, return_tensors="pt").to('cuda')

# Fails with the errors listed below unless use_cache=False is also passed.
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=20,
    pad_token_id=tokenizer.eos_token_id
)

Errors Encountered:

  1. Line 578 in VulavulaLlamaModel.forward():
AttributeError: 'NoneType' object has no attribute 'shape'

Code: past_key_values_length = past_key_values[0][0].shape[2]
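
A defensive rewrite of this line would treat an empty or partially filled cache as zero past length. This is only a sketch against the line quoted above, not a tested patch to the model repo:

# Sketch: fall back to 0 when the cache or its first entry is None.
past_key_values_length = 0
if (past_key_values is not None
        and past_key_values[0] is not None
        and past_key_values[0][0] is not None):
    past_key_values_length = past_key_values[0][0].shape[2]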

  2. Line 232 in VulavulaLlamaAttention.forward():
AttributeError: 'NoneType' object has no attribute 'shape'

Code: kv_seq_len += past_key_value[0].shape[-2]
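
The same guard pattern applies here, again as a sketch only:

# Sketch: only add the cached length when a cached key tensor exists.
if past_key_value is not None and past_key_value[0] is not None:
    kv_seq_len += past_key_value[0].shape[-2]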

  3. Lines 149-150 in VulavulaLlamaAttention.forward():
TypeError: expected Tensor as element 0 in argument 0, but got NoneType

Code: key_states = torch.cat([past_key_value[0], key_states], dim=2)
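
A sketch of a guarded version, assuming value_states is concatenated the same way on the adjacent line:

# Sketch: skip the concatenation when the cache entry is empty.
if past_key_value is not None and past_key_value[0] is not None:
    key_states = torch.cat([past_key_value[0], key_states], dim=2)
    value_states = torch.cat([past_key_value[1], value_states], dim=2)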

  4. Cache access issues:
KeyError: 'Cache only has 1 layers, attempted to access layer with index 1'
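
This last message comes from transformers' cache_utils: recent versions of generate() pass a Cache object (e.g. DynamicCache) whose per-layer entries are created lazily via update(), so indexing it like a legacy tuple of per-layer tuples raises a KeyError for layers that have not been written yet. Below is a sketch of the pattern the modeling code would presumably need to adopt; layer_idx is assumed to be the attention layer's index, as in upstream Llama:

from transformers.cache_utils import DynamicCache

# Sketch: work with a Cache object instead of tuple indexing.
if past_key_values is None:
    past_key_values = DynamicCache()

# Past length without calling .shape on a possibly-None tensor.
past_key_values_length = past_key_values.get_seq_length()

# Inside attention: update() appends to (or creates) this layer's entry and
# returns the full key/value tensors, avoiding the per-layer KeyError.
key_states, value_states = past_key_values.update(key_states, value_states, layer_idx)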

Pattern:
All errors seem to stem from the code assuming that if past_key_values or past_key_value is not None, then it contains valid tensor data for every layer. During generation, however, these structures can hold None entries or be indexed past the layers the cache has actually populated.

Current Workaround:
Setting use_cache=False works but significantly impacts generation speed.
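
For reference, the workaround is just the extra flag on the generate() call from the reproduction above:

# Workaround: disable the KV cache entirely (correct output, slower decoding).
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=20,
    pad_token_id=tokenizer.eos_token_id,
    use_cache=False
)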

Environment:

  • transformers version: 4.55.4
  • torch version: 2.7.1+cu128
  • Python version: 3.12.11
