Transformer Pipeline
Loading the Gemma 2b-it model with this code:
import transformers
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_version = 2
model_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"
model_config = f"/kaggle/input/gemma/transformers/2b-it/{model_version}/config.json"
tokenizer_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"

model_config = AutoConfig.from_pretrained(model_config)
model = AutoModelForCausalLM.from_pretrained(model_id, config=model_config, device_map="auto")
# tokenizer_config.json in the same directory is picked up automatically
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
Executing the generation as follows:
input_text = "Write a python function to print all elements of a list."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
Some text is generated. But when creating a transformers.pipeline as follows, the only text in the output is the input text:
query_pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto",
    framework="pt",
)
input_text = "Write a python function to print all elements of a list."
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)
print(f"Result: {result}")
This is the output:
Result: [{'generated_text': 'Write a python function to print all elements of a list.'}]
Is this procedure correct, or are there some mistakes?
Instead, when the chat template is applied to the input in this way before running the pipeline, some text is generated:
chat = [
    {"role": "user", "content": input_text},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
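The call itself is not shown above; presumably the templated prompt is passed to the same text-generation pipeline, along these lines (a sketch, not from the original post):

result = query_pipeline(
    prompt,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)
print(result[0]["generated_text"])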
But text is also generated when creating a pipeline of type "conversational" and passing a chat like this:
chat = [
    {"role": "user", "content": input_text},
]
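For reference, a minimal sketch of that conversational variant, assuming a transformers version that still ships the (since-deprecated) "conversational" task:

from transformers import Conversation

conv_pipeline = transformers.pipeline(
    task="conversational",
    model=model,
    tokenizer=tokenizer,
)
conversation = Conversation(chat)  # wraps the list of role/content dicts
result = conv_pipeline(conversation, max_new_tokens=64)
print(result.messages[-1]["content"])  # the assistant's reply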
Is there a problem with the TextGenerationPipeline?
I am struggling with this as well.
Is this using the right chat template and control tokens under the hood?
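One quick way to see what the template and control tokens look like (a sketch, not from the thread):

rendered = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(repr(rendered))                # Gemma renders <bos> plus <start_of_turn>/<end_of_turn> markers
print(tokenizer.special_tokens_map)  # shows the configured <bos>, <eos>, etc.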
I had the same issue: the generated_text was the same as the input. I found a way to fix it.
Modify the code:
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)
to:
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
    add_special_tokens=True,
)
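The fix points at a missing <bos> token: calling the tokenizer directly adds special tokens by default, while the text-generation pipeline in this transformers version apparently did not, so the model never saw the <bos> it was trained to expect. A quick way to see the difference (a sketch, not from the thread):

ids_with = tokenizer(input_text, add_special_tokens=True)["input_ids"]
ids_without = tokenizer(input_text, add_special_tokens=False)["input_ids"]
print(ids_with[0] == tokenizer.bos_token_id)     # True: <bos> is prepended
print(ids_without[0] == tokenizer.bos_token_id)  # False: no <bos>

This would also explain why the chat-templated prompt generated text: Gemma's chat template renders the <bos> token into the prompt string itself.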
To use the pipeline, the chat template must be applied. Using the pipeline without the chat template does not generate any new tokens.
Interesting, cc @ArthurZ @Rocketknight1, do you think there is something we need to upstream in the transformers pipeline?
But shouldn't the text-generation pipeline produce new tokens, as it does for all the other models?
With gemma-7b-it, it sometimes generates tokens for me as well.
Hi,
Apologies for the late reply. Could you please confirm whether the above-mentioned issue has been resolved? If you require any further assistance, please let us know.
Thanks for your patience.