	---
	title: Embedding
	emoji: 🐠
	colorFrom: purple
	colorTo: gray
	sdk: docker
	pinned: false
	short_description: Simple API for sentence-transformers/all-MiniLM-L6-v2
	---

	# Embedder Service (HuggingFace Space)

	A lightweight microservice exposing sentence-transformers embeddings over HTTP.

	- Model: `sentence-transformers/all-MiniLM-L6-v2`
	- Sequential queueing: handles one request at a time to avoid resource spikes.

	## Endpoints

	- `GET /health` → `{ ok: true, model: string, loaded: boolean }`
	- `POST /embed`
	  - Request:

	```
	{
	"texts": ["hello world", "another document"]
	}
	```

	  - Response:

	```
	{
	"vectors": [[0.01, -0.02, ...], [0.03, -0.01, ...]],
	"model": "sentence-transformers/all-MiniLM-L6-v2"
	}
	```
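The request and response shapes above can be exercised from Python with only the standard library. This is a minimal sketch, not part of the service itself: the helper names (`build_embed_request`, `parse_embed_response`, `embed`) are hypothetical, the base URL is taken from the cURL example in this README, and the check that one vector comes back per input text is an assumption based on the sample response.

```python
import json
import urllib.request

# Assumed base URL, copied from the cURL example in this README.
BASE_URL = "https://binkhoale1812-embedding.hf.space"


def build_embed_request(texts, base_url=BASE_URL):
    """Build a POST /embed request carrying {"texts": [...]} as JSON."""
    payload = json.dumps({"texts": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body, expected_count):
    """Decode the response body and return (vectors, model_name).

    Assumption: the service returns exactly one vector per input text.
    """
    data = json.loads(body)
    vectors = data["vectors"]
    if len(vectors) != expected_count:
        raise ValueError(f"expected {expected_count} vectors, got {len(vectors)}")
    return vectors, data["model"]


def embed(texts, base_url=BASE_URL, timeout=60):
    """Send the texts to the service and return (vectors, model_name)."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embed_response(response.read(), len(texts))
```

Calling `embed(["hello world", "another document"])` would return the two vectors and the model name. The generous default timeout is a guess that allows for a slow first request.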

	## Deploy on HF Spaces

	1. Create a new Space (Docker type)
	2. Upload `app.py`, `Dockerfile`, `requirements.txt`
	3. Set Space hardware to CPU (Small is fine)
	4. The Space will serve on port 7860 by default

	## Example cURL

	```
	curl -s -X POST https://binkhoale1812-embedding.hf.space/embed \
	  -H 'Content-Type: application/json' \
	  -d '{"texts": ["An embedding request", "Second input"]}' | jq .
	```

	## Notes

	- The service loads the model lazily on the first request, so that request takes longer than later ones.
	- When several clients call the service concurrently, a semaphore serializes the requests to limit memory and CPU spikes.
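The two behaviors in these notes, lazy loading and semaphore-based serialization, can be combined in a few lines of `asyncio`. The sketch below is illustrative only and is not taken from the Space's `app.py`: the class name `LazyEmbedder` and the `load_model` callable are hypothetical, and the only thing assumed about the loaded model is that it has an `encode(texts)` method.

```python
import asyncio


class LazyEmbedder:
    """Loads the model on first use and runs one encode call at a time."""

    def __init__(self, load_model):
        self._load_model = load_model       # hypothetical zero-argument loader
        self._model = None                  # stays None until the first request
        self._gate = asyncio.Semaphore(1)   # one permit: requests run sequentially

    @property
    def loaded(self):
        """True once the model has been loaded (what /health could report)."""
        return self._model is not None

    async def embed(self, texts):
        async with self._gate:
            if self._model is None:
                # The lazy load happens inside the gate, so it runs at most once.
                self._model = await asyncio.to_thread(self._load_model)
            # Run the blocking encode call off the event loop thread.
            return await asyncio.to_thread(self._model.encode, texts)
```

A loader such as `lambda: SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")` could be passed in. Because the single-permit semaphore guards both the load and the encode step, concurrent callers wait their turn instead of triggering parallel model loads or parallel inference. `asyncio.to_thread` needs Python 3.9 or later, and creating the instance inside the running event loop avoids loop-binding issues on older versions.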