# embeddinggemma-300m-GGUF-with-dense-modules

Original model: google/embeddinggemma-300m

Original research into the numerical stability of embedding models between llama-cpp and Ollama.
## Motivation
When migrating our inference environment from Ollama to llama-cpp, we noticed that the currently available GGUF conversions of this model were missing the "dense modules", resulting in vastly different output. In addition, the original GGUF files from Ollama were incompatible with llama-cpp because the model architecture deviated slightly.

We therefore decided to create a custom conversion derived from the original model.
## Process
```shell
uv run python ../llama.cpp/convert_hf_to_gguf.py ../embeddinggemma-300m \
  --outfile ../embeddinggemma-300m-GGUF-with-dense-modules/embeddinggemma-300M-BF16-with-dense.gguf \
  --outtype bf16 \
  --sentence-transformers-dense-modules
```
Here `../llama.cpp` is a local clone of the llama.cpp repository, `../embeddinggemma-300m` is a local clone of the original model repository google/embeddinggemma-300m, and `../embeddinggemma-300m-GGUF-with-dense-modules` is the target directory for this repository.
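To illustrate why omitting the dense modules changes the output so drastically: in the Sentence-Transformers pipeline, extra linear projections are applied after pooling, so a conversion that skips them returns a vector from a different space entirely. The following sketch uses random weights and illustrative dimensions (not the real model's parameters) to show that the projected and unprojected vectors are far from interchangeable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled sentence embedding from the transformer backbone.
# The 768/3072 dimensions are illustrative assumptions, not the model's
# actual weights.
pooled = rng.normal(size=768)

# Stand-ins for the Sentence-Transformers "dense modules": linear
# projections applied after pooling.
W1 = rng.normal(size=(768, 3072)) / np.sqrt(768)
W2 = rng.normal(size=(3072, 768)) / np.sqrt(3072)

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length, as embedding pipelines typically do."""
    return v / np.linalg.norm(v)

with_dense = normalize(pooled @ W1 @ W2)   # full pipeline
without_dense = normalize(pooled)          # dense modules skipped

# Cosine similarity between the two variants is nowhere near 1.0:
# the vectors live in different spaces and are not comparable.
cos = float(with_dense @ without_dense)
print(f"cosine similarity with vs. without dense modules: {cos:.3f}")
```

With real model weights the exact similarity will differ, but the point stands: downstream similarity scores computed from a conversion without the dense modules cannot be compared against embeddings produced by the full pipeline.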