[Announce] polars-luxical package
#2 by permutans
I made a Polars extension for using the model: https://github.com/lmmx/polars-luxical (also on PyPI at https://pypi.org/project/polars-luxical/).
It's similar in spirit to polars-fastembed (a Polars extension wrapping the fastembed Rust crate). Running the same benchmark from that repo shows it to be the fastest model available, as far as I can see:
- polars-luxical embeds the 708 Python PEPs at 0.5 ms per 1k tokens on CPU, or 1.8 s total runtime (including Python interpreter startup etc.)
- for comparison, Snowflake Arctic Embed XS on GPU takes 3.5 ms per 1k tokens and All-Mini-LM-V6 takes 8 ms per 1k tokens
Because the model is not meant for search (though it can be used for deduplication), I added a ‘half match demo’ script to the benchmark subdir. On that task it achieves about 97% accuracy at matching each Python PEP's first half to its second half, similar to the experiment described in the blog post.
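For anyone unfamiliar with the idea, here is a rough sketch of what a half-match evaluation looks like. This is not the script from the repo: `toy_embed`, `split_in_half` and `half_match_accuracy` are hypothetical names, and the hashed bag-of-words embedder is only a stand-in so the example runs without downloading a model. It assumes the task is scored by checking whether each first half's nearest second half (by cosine similarity) comes from the same document.

```python
import zlib

import numpy as np


def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical stand-in for a real embedding model:
    hashed bag-of-words counts, L2-normalised."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def split_in_half(text: str) -> tuple[str, str]:
    """Split a document into two halves at the middle word."""
    words = text.split()
    mid = len(words) // 2
    return " ".join(words[:mid]), " ".join(words[mid:])


def half_match_accuracy(docs: list[str], embed=toy_embed) -> float:
    """Fraction of first halves whose most similar second half
    belongs to the same document."""
    firsts, seconds = zip(*(split_in_half(d) for d in docs))
    a = np.stack([embed(t) for t in firsts])
    b = np.stack([embed(t) for t in seconds])
    sims = a @ b.T  # cosine similarity, since the rows are unit-norm
    predicted = sims.argmax(axis=1)
    return float((predicted == np.arange(len(docs))).mean())
```

To try it with a real model, pass any function that maps a string to a unit-norm vector as `embed`; the scoring logic stays the same.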