Update README.md

#18
by aynot - opened

This PR proposes improvements to the vLLM usage example:

  1. Updates the instruction and query template to match the format used in the Transformers example (removes unnecessary newlines).

  2. Fixes a bug in the input-creation procedure: sets `add_generation_prompt=True` in `apply_chat_template` and removes the `suffix` and `suffix_tokens` variables.
    Previously, the `<|im_end|>\n` token sequence was appended twice, once by `apply_chat_template` and again via `suffix_tokens`, which produced inconsistent input strings.
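The double-append bug can be illustrated with a minimal, hypothetical stand-in for the ChatML-style template (this is not the model's actual tokenizer template, just a sketch of the prompt-assembly logic):

```python
def apply_chat_template(messages, add_generation_prompt=False):
    """Hypothetical stand-in for tokenizer.apply_chat_template (string output).

    Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n,
    mimicking ChatML-style templates.
    """
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # The template itself opens the assistant turn; no manual suffix needed.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [{"role": "user", "content": "Hello"}]

# Buggy variant: the template already closes the user turn with <|im_end|>\n,
# then the manual suffix appends the same tokens a second time.
buggy = apply_chat_template(messages) + "<|im_end|>\n"

# Fixed variant: add_generation_prompt=True, no suffix appended.
fixed = apply_chat_template(messages, add_generation_prompt=True)

assert buggy.endswith("<|im_end|>\n<|im_end|>\n")      # duplicated end tokens
assert fixed.endswith("<|im_start|>assistant\n")        # clean generation prompt
```

With `add_generation_prompt=True`, the template is solely responsible for terminating the conversation and opening the assistant turn, so the prompt string is consistent with what the Transformers example produces.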

