fastest inference

by ehartford - opened Apr 3, 2024

Hi, I would like advice on the fastest way to run inference with this model.
I want to run it on 5 million samples, and it seems that will take several months unless I find a faster way.

evan-nexusflow

Nexusflow org Apr 3, 2024

Hi @ehartford,

I have found DeepSpeed Inference to work quite well for this model; it lets you use tensor parallelism to split the model across GPUs.

Here are some links to get started:
https://deepspeed.readthedocs.io/en/latest/inference-init.html
https://www.deepspeed.ai/tutorials/inference-tutorial/
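
For reference, a minimal sketch of what wrapping an already loaded Hugging Face model with DeepSpeed tensor parallelism might look like. The helper names `tp_kwargs` and `wrap_for_inference` are made up for illustration, and `float16` is an assumed dtype; how you load the model itself depends on its class and is left out here.

```python
import os


def tp_kwargs(world_size: int) -> dict:
    """Keyword arguments asking deepspeed.init_inference for tensor parallelism."""
    # tp_size is the number of GPUs the model weights are sharded across.
    return {"tensor_parallel": {"tp_size": world_size}}


def wrap_for_inference(model, world_size: int):
    """Wrap a loaded torch model in a DeepSpeed inference engine (hypothetical helper)."""
    # Imported lazily so this file can be imported on a machine without GPUs.
    import torch
    import deepspeed

    # float16 is an assumption; pick the dtype that matches your checkpoint.
    return deepspeed.init_inference(model, dtype=torch.float16, **tp_kwargs(world_size))


# The `deepspeed` launcher sets WORLD_SIZE for each process; default to 1 otherwise.
world_size = int(os.environ.get("WORLD_SIZE", "1"))
```

You would then start the script with the DeepSpeed launcher, for example `deepspeed --num_gpus 4 run_inference.py` (the script name is a placeholder), and call the returned engine the same way you would call the original model.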

Also note that it is a bit faster to have the tokenizer pad to 'longest' rather than 'max_length'.
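
To see why, here is a rough sketch that counts how many token positions the model has to process under each padding strategy. The sequence lengths, batch size, and `max_length=512` are made-up numbers for illustration; with a Hugging Face tokenizer the two strategies correspond to `padding="longest"` and `padding="max_length"`.

```python
from typing import List


def padded_token_count(lengths: List[int], batch_size: int,
                       strategy: str, max_length: int = 512) -> int:
    """Total token positions processed when batches are padded with `strategy`."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        # Truncate each sequence to max_length, as the tokenizer would.
        batch = [min(n, max_length) for n in lengths[start:start + batch_size]]
        # "longest": pad to the longest sequence in this batch.
        # "max_length": pad every sequence to the fixed max_length.
        width = max(batch) if strategy == "longest" else max_length
        total += width * len(batch)
    return total


# Hypothetical token counts for eight samples, processed in batches of four.
lengths = [12, 40, 7, 100, 33, 58, 21, 9]
longest = padded_token_count(lengths, 4, "longest")
fixed = padded_token_count(lengths, 4, "max_length")
print(longest, fixed)  # 632 4096
```

In this toy case padding to the longest sequence processes far fewer positions, so the gain depends on how much shorter your typical sample is than `max_length`.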

Hope this helps!
