Update README.md
#18 opened by aynot
This PR proposes improvements to the vLLM usage example:
- Updates the instruction and query template to match the format used in the Transformers example (removes unnecessary newlines).
- Fixes a bug in the input creation procedure: sets `add_generation_prompt=True` in `apply_chat_template` and removes the `suffix` and `suffix_tokens` variables.

Previously, the `<|im_end|>\n` tokens were added twice: once by `apply_chat_template` and again via `suffix_tokens`, which resulted in inconsistent input strings.
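To make the duplication concrete, here is an illustrative sketch (not the actual vLLM or Transformers code) using a hand-rolled ChatML-style template function; the function name mimics `apply_chat_template`, and the message content is a placeholder:

```python
# Illustrative reproduction of the double-suffix bug described above.
# This is a minimal stand-in for tokenizer.apply_chat_template, not the real API.

IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def apply_chat_template(messages, add_generation_prompt=False):
    """Minimal ChatML-style template: each turn already ends with <|im_end|>\n."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Opens the assistant turn so the model generates a reply.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)

messages = [{"role": "user", "content": "Hello"}]

# Old approach: the template already terminated the turn, and appending a
# manual suffix added "<|im_end|>\n" a second time.
buggy = apply_chat_template(messages) + f"{IM_END}\n"
assert buggy.count(f"{IM_END}\n") == 2  # duplicated turn terminator

# Fixed approach: let add_generation_prompt=True handle the ending.
fixed = apply_chat_template(messages, add_generation_prompt=True)
assert fixed.count(f"{IM_END}\n") == 1
```

With the fix, the prompt ends cleanly in the assistant turn opener rather than a stray second terminator, matching what the chat template alone produces.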