# Multilingual ColBERT embeddings as a service
## Goal
- Deploy [Antoine Louis](https://huggingface.co/antoinelouis)' [colbert-xm](https://huggingface.co/antoinelouis/colbert-xm) as an inference service: text(s) in, vector(s) out.
## Motivation
- Use the service as one component of a broader RAG solution.
## Steps followed
- Duplicate the original model repo, including its LFS pointers, following [this procedure](https://huggingface.co/docs/hub/repositories-next-steps#how-to-duplicate-or-fork-a-repo-including-lfs-pointers).
- Add a custom handler script as described [here](https://huggingface.co/docs/inference-endpoints/guides/custom_handler).
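
For orientation, the custom handler guide linked above expects a `handler.py` that defines an `EndpointHandler` class with `__init__(self, path="")` and `__call__(self, data)`. The sketch below shows only that shape and is not this repo's actual handler: `placeholder_encode` and the extra `encode` argument are hypothetical stand-ins, so the example runs without downloading the model.

```python
from typing import Any, Callable, Dict, List

# ColBERT-style output: one vector per token, so each text maps to a list of vectors.
TokenVectors = List[List[float]]


def placeholder_encode(texts: List[str]) -> List[TokenVectors]:
    """Hypothetical stand-in for the model: one 2-d vector per whitespace token."""
    return [[[float(len(token)), 0.0] for token in text.split()] for text in texts]


class EndpointHandler:
    def __init__(
        self,
        path: str = "",
        encode: Callable[[List[str]], List[TokenVectors]] = placeholder_encode,
    ):
        # `path` is the model directory handed over by the serving toolkit.
        # A real handler would load the tokenizer and weights from it here;
        # the `encode` argument is an assumption added to keep this sketch testable.
        self.path = path
        self.encode = encode

    def __call__(self, data: Dict[str, Any]) -> List[TokenVectors]:
        # Accept either a single string or a list of strings under "inputs".
        inputs = data.get("inputs", data)
        if isinstance(inputs, str):
            inputs = [inputs]
        return self.encode(list(inputs))
```

Calling `EndpointHandler()({"inputs": ["first chunk", "second chunk"]})` returns one list of token vectors per input text, which matches the "text(s) in, vector(s) out" goal.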
## Local development and testing
### Build and start the hf_endpoints_emulator Docker container
See [hf_endpoints_emulator](https://pypi.org/project/hf-endpoints-emulator/).
```bash
docker-compose up -d --build
```
The service can take a while to become ready, given the size of the model (> 3 GB).
## How to test locally
Send sample requests to the running container:
```bash
./embed_single_query.sh
./embed_two_chunks.sh
```
Run the test suite inside the container:
```bash
docker-compose exec hf_endpoints_emulator pytest
```
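
The same kind of request can also be sent from Python. This is a minimal sketch, not a transcription of the shell scripts: it assumes the emulator is published at `http://localhost:5000` and accepts a JSON body of the form `{"inputs": ...}`, so check `docker-compose.yml` and the scripts for the real URL and payload. `build_request` and `embed` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, List, Union

# Assumption: host and port of the emulator; adjust to match docker-compose.yml.
EMULATOR_URL = "http://localhost:5000"


def build_request(texts: Union[str, List[str]], url: str = EMULATOR_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the text(s) under the "inputs" key."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: Union[str, List[str]], url: str = EMULATOR_URL, timeout: float = 60.0) -> Any:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(texts, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `embed("What is ColBERT?")` should return the vectors for a single query, and `embed(["first chunk", "second chunk"])` the vectors for two chunks.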
## Check output
```bash
docker-compose logs --follow hf_endpoints_emulator
```