	# Multilingual ColBERT embeddings as a service

	## Goal

	- Deploy [Antoine Louis](https://huggingface.co/antoinelouis)' [colbert-xm](https://huggingface.co/antoinelouis/colbert-xm) as an inference service: text(s) in, vector(s) out

	## Motivation

	- Use the service in a broader retrieval-augmented generation (RAG) solution

	## Steps followed

	- Duplicate the original repo, including its LFS pointers, following [this procedure](https://huggingface.co/docs/hub/repositories-next-steps#how-to-duplicate-or-fork-a-repo-including-lfs-pointers)
	- Add a custom handler script as described [here](https://huggingface.co/docs/inference-endpoints/guides/custom_handler)

	## Local development and testing

	### Build and start the hf_endpoints_emulator Docker container

	The container runs [hf_endpoints_emulator](https://pypi.org/project/hf-endpoints-emulator/), an emulator for Inference Endpoints custom handlers.

	```bash
	docker-compose up -d --build
	```

	The service can take a while to become ready, because the model is larger than 3 GB.
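Because the model needs time to load, requests sent right after `docker-compose up` may fail. One way to handle this is to poll until the service answers. The following is a minimal sketch only: `wait_until_ready` is a hypothetical helper that is not part of this repository, and the URL in the usage note is an assumption.

```shell
# Hypothetical helper (not part of this repository): retry a probe command
# until it succeeds or the attempt budget is used up.
# Usage: wait_until_ready MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_ready() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the probe quietly; exit status 0 means the service answered.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_until_ready 30 5 curl --silent --output /dev/null http://localhost:5000/` would try for up to about two and a half minutes. The address `http://localhost:5000/` is assumed here; use whatever host and port `docker-compose.yml` publishes. Without `--fail`, curl treats any HTTP response as success, so this only checks that the server is listening, not that the model has finished loading.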

	### How to test locally

	Run the sample scripts against the running container:

	```bash
	./embed_single_query.sh
	./embed_two_chunks.sh
	```

	Run the test suite inside the container:

	```bash
	docker-compose exec hf_endpoints_emulator pytest
	```
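The contents of the scripts above are not shown here. As a rough illustration of "text(s) in, vector(s) out", a request could be built and sent as sketched below. Everything in it is an assumption: the function names are hypothetical, the default address `http://localhost:5000` may not match your setup, and the `{"inputs": [...]}` payload shape follows the common Inference Endpoints convention, which this repository's handler may or may not use.

```shell
# Hypothetical sketch; the repository's own scripts may differ.
# Assumed emulator address, overridable through the environment.
ENDPOINT_URL="${ENDPOINT_URL:-http://localhost:5000}"

# Escape backslashes and double quotes so a text can sit inside a JSON string.
# Control characters such as newlines are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build {"inputs": [...]} from one or more texts (assumed payload shape).
build_payload() {
  local items=""
  local text
  for text in "$@"; do
    items="${items:+$items, }\"$(json_escape "$text")\""
  done
  printf '{"inputs": [%s]}' "$items"
}

# POST the texts to the endpoint and print the raw response body.
embed() {
  curl --silent --show-error \
    --header 'Content-Type: application/json' \
    --data "$(build_payload "$@")" \
    "$ENDPOINT_URL"
}
```

Once the container is up, `embed "What is ColBERT?"` would send one text and `embed "first chunk" "second chunk"` would send two. For texts containing newlines or other control characters, a proper JSON encoder is a safer choice than the simple escaping used here.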

	### Check output

	Follow the container logs to inspect requests and errors:

	```bash
	docker-compose logs --follow hf_endpoints_emulator
	```