How are the ONNX files for this model generated?

#21
by bhavikatekwani - opened

๐Ÿ‘‹๐Ÿฝ Hello!

I'm trying to use nomic-embed-text-v1.5 and was wondering how the ONNX files here were created?

I would like to optimize them for use with TensorRT, but I'm running into some issues that might be resolved by understanding how you export the models.

Thanks for your help.

I believe @Xenova converted them; he may be able to share the script. I can answer questions about any errors you're seeing, though! Are you able to post your error logs?

I used Optimum, and you can see how to do it here: https://github.com/huggingface/optimum/pull/1874 (still a WIP PR)
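For reference, an Optimum export of this model boils down to a single CLI call; a sketch (the output directory name is arbitrary, and the exact flags used for the files in this repo may differ):

```shell
# Install Optimum with its ONNX exporter extras.
pip install "optimum[exporters]"

# Export nomic-embed-text-v1.5 to ONNX.
# --trust-remote-code is needed because the model ships custom code on the Hub.
optimum-cli export onnx \
  --model nomic-ai/nomic-embed-text-v1.5 \
  --trust-remote-code \
  nomic-embed-onnx/
```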

Thank you @zpn and @Xenova !

@zpn there are no errors as of now, just that a simple ONNX-to-TensorRT conversion doesn't yield as performant a model as I expected.
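For context, by "simple conversion" I mean the straightforward `trtexec` path, roughly like this (file names are placeholders, and the input names and shape ranges are assumptions you'd adjust to your workload):

```shell
# Build a TensorRT engine directly from the exported ONNX file.
# min/opt/max shapes are illustrative; pick them to match your batch and
# sequence sizes. FP16 is usually the first easy win on modern GPUs.
trtexec --onnx=model.onnx \
        --saveEngine=model.engine \
        --fp16 \
        --minShapes=input_ids:1x128,attention_mask:1x128 \
        --optShapes=input_ids:8x256,attention_mask:8x256 \
        --maxShapes=input_ids:32x512,attention_mask:32x512
```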

@Xenova I'm actually the author of that PR 😄 I was asking about the conversion because the inputs and outputs differ between this repo and what I get via Optimum:

  • model.onnx as you generated it has these inputs:
    [screenshot: model.onnx graph inputs]

  • This is what I get from Optimum (token_embeddings and sentence_embedding):
    [screenshot: Optimum export graph outputs]

Just wanted to make sure that the PR is still correct.

Hmm, I've had mixed results with TensorRT in the past. Are you able to post the ONNX/TensorRT graph? I imagine there may be a lot of unoptimized code.

@zpn actually the TensorRT stuff worked out fine. There may be a lot of unoptimized code, but that's possibly something that can be detected with https://github.com/daquexian/onnx-simplifier?
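In case it's useful, onnx-simplifier is a one-liner from the CLI; a sketch (file names are placeholders):

```shell
pip install onnxsim

# Fold constants and strip redundant ops, writing a simplified copy
# that is usually friendlier to downstream converters like TensorRT.
onnxsim model.onnx model_simplified.onnx
```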

Thanks for the resource, I'll take a look! I'm sure there are a lot of unnecessary expensive ops :)

zpn changed discussion status to closed

Just putting it here, another great resource for optimizing ONNX models: https://github.com/tsingmicro-toolchain/OnnxSlim
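Its CLI usage is similar to onnx-simplifier; a sketch (file names are placeholders):

```shell
pip install onnxslim

# Write an optimized copy of the graph.
onnxslim model.onnx model_slim.onnx
```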
