---
title: FastEmbed EN Embeddings
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---
# FastEmbed Code Embeddings Server
CPU-optimized embedding server using **FastEmbed** with ONNX quantized models.
## Models
- Dense: BAAI/bge-base-en-v1.5 (768 dim)
- Sparse: Qdrant/bm25 (BM25, 0.01 GB)
- Reranker: jinaai/jina-reranker-v1-turbo-en (0.13 GB)
**Total: ~0.78 GB** - Fits easily in CPU Basic (2 vCPU, 16 GB RAM)
## API Endpoints
### Dense Embeddings
```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": ["def hello(): pass", "class Foo: ..."], "model": "code-embed"}'
```
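The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_embeddings_request` and `post_json` are illustrative, `YOUR_SPACE` is the same placeholder as in the curl call, and since the response schema is not described here the helper simply returns the parsed JSON unchanged.

```python
import json
import urllib.request


def build_embeddings_request(base_url, texts, model="code-embed"):
    """Build the POST request for /v1/embeddings (mirrors the curl call above)."""
    payload = json.dumps({"input": list(texts), "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request, timeout=30):
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_embeddings_request(
    "https://YOUR_SPACE.hf.space",
    ["def hello(): pass", "class Foo: ..."],
)
# result = post_json(req)  # uncomment once YOUR_SPACE points at a running Space
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.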
### Sparse BM25 Embeddings
```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/sparse/embeddings \
-H "Content-Type: application/json" \
-d '{"input": ["search query", "document text"]}'
```
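A sparse embedding stores only the non-zero entries of a vector, usually as token indices paired with weights. The sketch below shows how two such vectors could be compared on the client side. It assumes each vector arrives as parallel `indices` and `values` lists; that JSON shape is an assumption for illustration, not a documented response format of this Space.

```python
def sparse_dot(query, document):
    """Dot product of two sparse vectors given as {"indices": [...], "values": [...]}."""
    doc_weights = dict(zip(document["indices"], document["values"]))
    return sum(
        value * doc_weights.get(index, 0.0)
        for index, value in zip(query["indices"], query["values"])
    )


# Made-up vectors for illustration; only indices 17 and 42 overlap.
query_vec = {"indices": [3, 17, 42], "values": [1.0, 1.0, 1.0]}
doc_vec = {"indices": [17, 42, 99], "values": [0.5, 2.0, 0.7]}
score = sparse_dot(query_vec, doc_vec)
```

Only indices present in both vectors contribute to the score, which is why sparse scoring stays cheap even with a large vocabulary.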
### Hybrid Search Embeddings
```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/hybrid/embeddings \
-H "Content-Type: application/json" \
-d '{"input": ["code snippet"]}'
```
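Once dense and sparse embeddings are available, the two result lists still have to be merged at query time. One common option is reciprocal rank fusion (RRF), sketched below. This is an illustrative technique chosen for the example, not necessarily what this Space or your vector store does; the function name and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_ranking = ["a", "b", "c"]   # ids ordered by dense similarity
sparse_ranking = ["b", "d", "a"]  # ids ordered by BM25 score
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

RRF uses only rank positions, so the dense and sparse scores do not need to be on the same scale.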
### Reranking
```bash
curl -X POST https://YOUR_SPACE.hf.space/v1/rerank \
-H "Content-Type: application/json" \
-d '{"query": "python async function", "documents": ["doc1", "doc2", "doc3"]}'
```
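A reranker assigns each document a relevance score for the query. The sketch below orders documents by such scores. It assumes the scores come back as a list aligned with the input `documents`; the response format is not documented here, so treat that alignment and the helper name `rank_documents` as hypothetical.

```python
def rank_documents(documents, scores, top_k=None):
    """Return (document, score) pairs sorted by descending score."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


docs = ["doc1", "doc2", "doc3"]
example_scores = [0.12, 0.87, 0.45]  # made-up values for illustration
top = rank_documents(docs, example_scores, top_k=2)
```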
## Features
- **ONNX Runtime**: Optimized CPU inference, no PyTorch overhead
- **Model Caching**: Models loaded once, reused across requests
- **Hybrid Search**: Dense + sparse (BM25) for better retrieval
- **Code-Optimized**: `jina-embeddings-v2-base-code` is trained specifically for code
## Performance
Compared to PyTorch-based SentenceTransformers:
- **5-10x faster** on CPU
- **5x smaller** model footprint
- **Lower latency**: ONNX quantization + caching
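Speed-ups like these depend on hardware and batch size, so it can help to measure them on your own machine. The sketch below is a small timing helper; `median_latency` and `fake_embed` are illustrative names, and the placeholder workload should be swapped for a real embedding call before drawing any conclusions.

```python
import statistics
import time


def median_latency(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fake_embed(texts):
    """Placeholder workload; swap in a real embedding call to benchmark it."""
    return [[float(len(text))] for text in texts]


latency = median_latency(fake_embed, ["def hello(): pass"] * 100)
```

Using the median rather than the mean keeps a single slow run (for example, a cold first call) from dominating the result.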