---
title: Text2vector
emoji: 📊
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
short_description: Create a vector embedding from text
---

# Embedding API

API that serves an embedding model ([intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)) for generating multilingual text embeddings.<br>
The model converts each text string into a 1024-dimensional vector.<br>
Send a `POST` request with a list of texts to the `/embed` endpoint, and the API returns the corresponding embeddings.<br>
A maximum of 2000 characters per text is enforced so that the tokenizer does not truncate the input and thereby lose information.<br>
Each text must start with either `"query: "` or `"passage: "`.<br>
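The two input rules above can also be checked on the client before a request is sent. The following is a minimal sketch, not code from this repository: `validate_texts`, `MAX_CHARS`, and `ALLOWED_PREFIXES` are hypothetical names, and it assumes the 2000-character limit counts the prefix as part of the text. The server's own validation and error messages may differ.

```python
# Hypothetical client-side check mirroring the documented input rules.
# Assumption: the 2000-character limit includes the "query: "/"passage: " prefix.
MAX_CHARS = 2000
ALLOWED_PREFIXES = ("query: ", "passage: ")


def validate_texts(texts: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every text passes both rules."""
    problems: list[str] = []
    for i, text in enumerate(texts):
        # str.startswith accepts a tuple and matches any of its prefixes.
        if not text.startswith(ALLOWED_PREFIXES):
            problems.append(f"texts[{i}]: must start with 'query: ' or 'passage: '")
        if len(text) > MAX_CHARS:
            problems.append(f"texts[{i}]: {len(text)} characters exceeds {MAX_CHARS}")
    return problems


if __name__ == "__main__":
    print(validate_texts(["query: what is the capital of France?"]))  # []
    print(validate_texts(["Paris is the capital of France."]))  # one prefix problem
```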

The API is deployed as a Hugging Face Docker Space:<br>
[https://emilbm-text2vector.hf.space](https://emilbm-text2vector.hf.space)

The Swagger UI can be accessed at:<br>
[https://emilbm-text2vector.hf.space/docs](https://emilbm-text2vector.hf.space/docs)


## Features

- FastAPI-based REST API
- `/embed` endpoint for generating embeddings from a list of texts
- `/health` endpoint for checking the API status
- Uses Hugging Face Transformers and PyTorch
- Includes linting and unit tests
- Dockerfile for containerization
- CI/CD with GitHub Actions to build, lint, test, and deploy to Hugging Face
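The model card for intfloat/multilingual-e5-large turns the per-token hidden states into one sentence vector by averaging over the non-padding tokens (masked mean pooling) and then L2-normalizing the result. The pure-Python sketch below illustrates those two steps on toy 2-dimensional vectors instead of 1024-dimensional PyTorch tensors. Whether `app/embeddings.py` pools and normalizes in exactly this way is an assumption, and `mean_pool` and `l2_normalize` are hypothetical names.

```python
# Pure-Python illustration of masked mean pooling + L2 normalization, as shown
# on the intfloat/multilingual-e5-large model card. Assumption: app/embeddings.py
# follows the same recipe using PyTorch tensors.
import math


def mean_pool(hidden_states: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding tokens.

    Assumes at least one token is unmasked.
    """
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m]
    dim = len(hidden_states[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


if __name__ == "__main__":
    # Three token vectors (toy dimension 2 instead of 1024); the last one is padding.
    tokens = [[1.0, 3.0], [3.0, 1.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    pooled = mean_pool(tokens, mask)  # [2.0, 2.0]
    print(l2_normalize(pooled))
```

In the real model both steps operate on batched tensors of shape (batch, tokens, 1024); the list-based version only shows the arithmetic.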

## Local Development

### Requirements

- Python 3.12+
- [uv](https://docs.astral.sh/uv/)
- (Optional) Docker

### Installation

1. Clone the repository:
```sh
git clone https://github.com/EmilbMadsen/embedding-api.git
cd embedding-api
```

2. Create a virtual environment and activate it:
```sh
uv venv
source .venv/bin/activate
```

3. Install dependencies:
```sh
uv sync
```

### Formatting, Linting and Unit Tests

- Formatting (with Black and Ruff) and linting (with Black, Ruff, and MyPy):
```sh
make format
make lint
```
- Run unit tests:
```sh
make test
```

### Running Locally (without Docker)

Start the API server with Uvicorn:

```sh
uvicorn app.main:app --reload --port 7860
```

### Running Locally (with Docker)

Build and start the API server with Docker:

```sh
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```

### Test the endpoint

Test the endpoint with `curl`:
```sh
curl -X 'POST' \
  'http://127.0.0.1:7860/embed' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "texts": [
    "query: what is the capital of France?",
    "passage: Paris is the capital of France."
  ]
}'
```
Or through the Swagger UI at `http://127.0.0.1:7860/docs`.
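The `curl` call above can also be made from Python with only the standard library. This is a sketch with hypothetical helper names (`build_embed_request`, `embed`); the base URL defaults to the local server and can be swapped for the deployed Space. The 60-second timeout is an assumption meant to allow for a slow first request, for example while a sleeping Space starts up.

```python
import json
import urllib.request

# Local server by default; the deployed Space is https://emilbm-text2vector.hf.space
BASE_URL = "http://127.0.0.1:7860"


def build_embed_request(texts: list[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (JSON body with a "texts" list)."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], base_url: str = BASE_URL, timeout: float = 60.0) -> list[list[float]]:
    """Send the request and return the "embeddings" field of the JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx responses, for example
    when a text breaks the input rules.
    """
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embeddings"]
```

With the server running, `embed(["query: what is the capital of France?", "passage: Paris is the capital of France."])` should return two vectors of 1024 floats each. Keeping request construction separate from sending makes the request easy to inspect or test without a network connection.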



## Usage

### Embed Endpoint

- POST `/embed`
- Request body:
```json
{
  "texts": [
    "query: what is the capital of France?",
    "passage: Paris is the capital of France."
  ]
}
```
- Response (one 1024-dimensional vector per input text, in the same order):
```json
{
  "embeddings": [[...], [...]]
}
```
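A typical use of the response is ranking passages by their similarity to a query. The sketch below assumes the first embedding in the response belongs to a `query: ` text and the rest to `passage: ` texts, and computes cosine similarity in pure Python. `cosine` and `rank_passages` are hypothetical helpers, and the 3-dimensional sample vectors stand in for real 1024-dimensional embeddings.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def rank_passages(response_json: str) -> list[tuple[int, float]]:
    """Rank passages by similarity to the query, highest first.

    Assumes embeddings[0] is the query and embeddings[1:] are passages;
    returned indices refer to positions in the original "texts" list.
    """
    query, *passages = json.loads(response_json)["embeddings"]
    scores = [(i, cosine(query, p)) for i, p in enumerate(passages, start=1)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy response with 3-dimensional vectors instead of the real 1024.
    sample = '{"embeddings": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]}'
    for index, score in rank_passages(sample):
        print(index, round(score, 3))
```

If the API already returns unit-length vectors (this README does not say), the cosine similarity reduces to a plain dot product.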

### Health Endpoint

- GET `/health`
- Response:
```json
{
  "status": "ok"
}
```
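Hugging Face Spaces on free hardware can go to sleep after a period of inactivity, so the first request after a pause may fail or hang while the container restarts. The following is a minimal polling sketch against `/health` using only the standard library; the helper names (`fetch_health`, `wait_until_healthy`), attempt count, and delay are assumptions rather than settings from this repository.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET {base_url}/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(
    base_url: str,
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> bool:
    """Poll /health until it reports {"status": "ok"}; return False if attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError and timeouts (server still
            # starting); ValueError covers a non-JSON body. Retry in both cases.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_healthy("https://emilbm-text2vector.hf.space")` returns `True` once the Space answers. The `fetch` parameter can be replaced with a stub, which lets the retry logic be tested without a running server.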

## Project Structure

```
app/
    main.py             # FastAPI app
    embeddings.py       # Embedding logic
    models.py           # Request/response models
    logger.py           # Logging setup
tests/
    test_api.py         # API tests
    test_embeddings.py  # Embedding tests
```