|
|
--- |
|
|
tags: |
|
|
- static-embeddings |
|
|
--- |
|
|
# Static Embeddings |
|
|
|
|
|
This project contains multilingual static embeddings suitable for generating
quick embeddings on edge devices. They are re-packaged from other projects into
production-ready assets.
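At inference time a static embedding model is just a token-id lookup followed by mean pooling, which is why it runs quickly on edge devices. A minimal sketch in NumPy with a toy embedding table (real models use tens of thousands of tokens and 128+ dimensions):

```python
import numpy as np

# Toy embedding table: 5 tokens, 4 dimensions.
rng = np.random.default_rng(0)
table = rng.normal(size=(5, 4)).astype(np.float32)

def embed(token_ids):
    """Mean-pool the static vectors for a sequence of token ids."""
    vectors = table[token_ids]              # (n_tokens, dims) lookup
    pooled = vectors.mean(axis=0)           # mean pooling
    return pooled / np.linalg.norm(pooled)  # unit-normalize for cosine use

sentence = embed([0, 2, 4])
```

There is no attention or other context mixing, so the whole forward pass is a gather plus an average.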
|
|
|
|
|
## Models |
|
|
|
|
|
* [minishlab/potion-retrieval-32M/](models/minishlab/potion-retrieval-32M/README.md) |
|
|
* [minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md) |
|
|
* [sentence-transformers/static-retrieval-mrl-en-v1/](models/sentence-transformers/static-retrieval-mrl-en-v1/README.md) |
|
|
* [sentence-transformers/static-similarity-mrl-multilingual-v1/](models/sentence-transformers/static-similarity-mrl-multilingual-v1/README.md)
|
|
|
|
|
## Updating |
|
|
|
|
|
Add models to `scripts/build_models.py`. |
|
|
|
|
|
```sh |
|
|
# Install dependencies and log in to Hugging Face:
|
|
pipx install huggingface_hub |
|
|
huggingface-cli login |
|
|
|
|
|
# Re-build the models: |
|
|
uv run scripts/build_models.py |
|
|
|
|
|
# Version control: |
|
|
git add . |
|
|
git commit -m 'Updated the models' |
|
|
git push |
|
|
git tag v1.0.0 -m 'Model release description' |
|
|
git push origin tag v1.0.0 |
|
|
|
|
|
# Upload the models |
|
|
uv run scripts/upload_models.py --tag v1.0.0 |
|
|
``` |
|
|
|
|
|
## Precision |
|
|
|
|
|
For static embeddings compared with cosine similarity, precision is less important
than it is for other workloads. In an end-to-end test in Firefox, the cosine
similarity was measured between the same mean-pooled result stored at different
precisions. Note that the vector math happens in f32 space; only the embedding
storage uses a lower precision.
|
|
|
|
|
> f32 vs f16: cosine similarity = 1.00000000<br/> |
|
|
> → They are essentially identical in direction. |
|
|
> |
|
|
> f32 vs f8: cosine similarity = 0.99956375<br/> |
|
|
> → Very close, only tiny quantization effects. |
|
|
|
|
|
Note that this was measured with `torch.float8_e4m3fn`; `torch.float8_e5m2` generally
has more loss.
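The f16 half of this comparison can be sketched in NumPy (NumPy has no f8 type, so only the f32 vs f16 round trip is reproduced here, and on a random vector rather than a real model output):

```python
import numpy as np

rng = np.random.default_rng(0)
v32 = rng.normal(size=256).astype(np.float32)

# Round-trip through f16 storage, then do the math back in f32,
# mirroring "store at low precision, compute in f32".
v16 = v32.astype(np.float16).astype(np.float32)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine(v32, v16)
```

The per-component rounding errors are tiny and uncorrelated, so the direction of the vector is essentially unchanged.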
|
|
|
|
|
Precision also affects download size. For instance, with the larger
[minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md)
model, the `fp32` weights are 228M compressed, while `fp8_e4m3` is only 51M and still
has competitive quantization quality.
|
|
|
|
|
| precision | dimensions | size | |
|
|
| ------------- | ---------- | ------- | |
|
|
| fp32 | 128 | 228M | |
|
|
| fp16 | 128 | 114M | |
|
|
| **fp8_e4m3** | 128 | **51M** | |
|
|
| fp8_e5m2 | 128 | 44M | |
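As a rough sanity check on the table, the raw storage ratio follows from bytes per value (4 for fp32, 2 for fp16, 1 for either fp8 variant), while the compressed ratio can differ because lower-precision weights also compress differently. A quick sketch using the sizes from the table above:

```python
# Bytes per stored value for each precision: fp16 halves fp32,
# and both fp8 variants halve fp16 again.
bytes_per_value = {"fp32": 4, "fp16": 2, "fp8_e4m3": 1, "fp8_e5m2": 1}

# Compressed sizes (MB) from the table above.
compressed_mb = {"fp32": 228, "fp16": 114, "fp8_e4m3": 51, "fp8_e5m2": 44}

for name, nbytes in bytes_per_value.items():
    raw_ratio = bytes_per_value["fp32"] / nbytes
    real_ratio = compressed_mb["fp32"] / compressed_mb[name]
    print(f"{name:9s} raw {raw_ratio:.0f}x, compressed {real_ratio:.2f}x")
```

Note the fp8 files come out smaller than the raw 4x ratio predicts, since the quantized weights compress better.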
|
|
|