|
|
--- |
|
|
tags: |
|
|
- static-embeddings |
|
|
--- |
|
|
# Static Embeddings |
|
|
|
|
|
This project contains multilingual static embeddings suitable for generating
quick embeddings on edge devices. They are re-packaged from other projects into
production-ready assets.
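At inference time a static embedding model is just a token-id lookup followed by mean pooling, which is why it runs quickly on edge devices. A minimal sketch in NumPy with a toy embedding table (real models use tens of thousands of tokens and 128+ dimensions):

```python
import numpy as np

# Toy embedding table: 5 tokens, 4 dimensions.
rng = np.random.default_rng(0)
table = rng.normal(size=(5, 4)).astype(np.float32)

def embed(token_ids):
    """Mean-pool the static vectors for a sequence of token ids."""
    vectors = table[token_ids]              # (n_tokens, dims) lookup
    pooled = vectors.mean(axis=0)           # mean pooling
    return pooled / np.linalg.norm(pooled)  # unit-normalize for cosine use

sentence = embed([0, 2, 4])
```

There is no attention or other context mixing, so the whole forward pass is a gather plus an average.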
|
|
|
|
|
## Models |
|
|
|
|
|
* [minishlab/potion-retrieval-32M/](models/minishlab/potion-retrieval-32M/README.md) |
|
|
* [minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md) |
|
|
* [sentence-transformers/static-retrieval-mrl-en-v1/](models/sentence-transformers/static-retrieval-mrl-en-v1/README.md) |
|
|
* [sentence-transformers/static-similarity-mrl-multilingual-v1/](models/sentence-transformers/static-similarity-mrl-multilingual-v1/README.md)
|
|
|
|
|
## Updating |
|
|
|
|
|
Add models to `scripts/build_models.py`. |
|
|
|
|
|
```sh |
|
|
# Install dependencies and log in to Hugging Face:
|
|
pipx install huggingface_hub |
|
|
huggingface-cli login |
|
|
|
|
|
# Re-build the models: |
|
|
uv run scripts/build_models.py |
|
|
|
|
|
# Version control: |
|
|
git add . |
|
|
git commit -m 'Updated the models' |
|
|
git push |
|
|
git tag v1.0.0 -m 'Model release description' |
|
|
git push origin tag v1.0.0 |
|
|
|
|
|
# Upload the models |
|
|
uv run scripts/upload_models.py --tag v1.0.0 |
|
|
``` |
|
|
|
|
|
## Precision |
|
|
|
|
|
For static embeddings compared with cosine similarity, precision is less important
than it is for other workloads. In an end-to-end test in Firefox, the cosine
similarity was measured between the same mean-pooled result stored at different
precisions. Note that the vector math happens in f32 space; only the embedding
storage uses a lower precision.
|
|
|
|
|
> f32 vs f16: cosine similarity = 1.00000000<br/> |
|
|
> → They are essentially identical in direction. |
|
|
> |
|
|
> f32 vs f8: cosine similarity = 0.99956375<br/> |
|
|
> → Very close, only tiny quantization effects. |
|
|
|
|
|
Note that this was measured with `torch.float8_e4m3fn`; `torch.float8_e5m2` generally
has more loss.
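The f16 half of this comparison can be sketched in NumPy (NumPy has no f8 type, so only the f32 vs f16 round trip is reproduced here, and on a random vector rather than a real model output):

```python
import numpy as np

rng = np.random.default_rng(0)
v32 = rng.normal(size=256).astype(np.float32)

# Round-trip through f16 storage, then do the math back in f32,
# mirroring "store at low precision, compute in f32".
v16 = v32.astype(np.float16).astype(np.float32)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine(v32, v16)
```

The per-component rounding errors are tiny and uncorrelated, so the direction of the vector is essentially unchanged.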
|
|
|
|
|
Precision also affects download size. For instance, with the larger
[minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md)
model, the `fp32` weights are 228M compressed, while `fp8_e4m3` is only 51M and still
has competitive quantization quality.
|
|
|
|
|
| precision | dimensions | size | |
|
|
| ------------- | ---------- | ------- | |
|
|
| fp32 | 128 | 228M | |
|
|
| fp16 | 128 | 114M | |
|
|
| **fp8_e4m3** | 128 | **51M** | |
|
|
| fp8_e5m2 | 128 | 44M | |
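As a rough sanity check on the table, the raw storage ratio follows from bytes per value (4 for fp32, 2 for fp16, 1 for either fp8 variant), while the compressed ratio can differ because lower-precision weights also compress differently. A quick sketch using the sizes from the table above:

```python
# Bytes per stored value for each precision: fp16 halves fp32,
# and both fp8 variants halve fp16 again.
bytes_per_value = {"fp32": 4, "fp16": 2, "fp8_e4m3": 1, "fp8_e5m2": 1}

# Compressed sizes (MB) from the table above.
compressed_mb = {"fp32": 228, "fp16": 114, "fp8_e4m3": 51, "fp8_e5m2": 44}

for name, nbytes in bytes_per_value.items():
    raw_ratio = bytes_per_value["fp32"] / nbytes
    real_ratio = compressed_mb["fp32"] / compressed_mb[name]
    print(f"{name:9s} raw {raw_ratio:.0f}x, compressed {real_ratio:.2f}x")
```

Note the fp8 files come out smaller than the raw 4x ratio predicts, since the quantized weights compress better.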
|
|
|