---
title: Embedding Inference API
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Embedding Inference API

A FastAPI-based inference service that generates text embeddings with JobBERT v2/v3, Jina AI v3, and Voyage AI.

## Features

- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose, long documents), Voyage AI (hosted service, requires an API key)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment

## Supported Models

| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
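
The table can double as a client-side sanity check. The sketch below copies its values into a lookup keyed by the `model` identifiers used elsewhere in this README; `MODEL_SPECS` and `check_dimension` are hypothetical names for illustration and are not part of the API.

```python
# Values copied from the "Supported Models" table.
# Keys are the `model` identifiers accepted by the API.
MODEL_SPECS = {
    "jobbertv2": {"dimension": 768, "max_tokens": 512},
    "jobbertv3": {"dimension": 768, "max_tokens": 512},
    "jina": {"dimension": 1024, "max_tokens": 8192},
    "voyage": {"dimension": 1024, "max_tokens": 32000},
}


def check_dimension(model: str, vector: list) -> bool:
    """Return True if the vector length matches the table entry for `model`."""
    return len(vector) == MODEL_SPECS[model]["dimension"]
```

A check like this is useful before writing vectors to an index that was created with a fixed dimension.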

## Quick Start

### Local Development

1. **Install dependencies:**
   ```bash
   cd embedding
   pip install -r requirements.txt
   ```

2. **Run the API:**
   ```bash
   python api.py
   ```

3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs

### Docker Deployment

1. **Build the image:**
   ```bash
   docker build -t embedding-api .
   ```

2. **Run the container:**
   ```bash
   docker run -p 7860:7860 embedding-api
   ```

3. **With Voyage AI (optional):**
   ```bash
   docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
   ```

## Hugging Face Spaces Deployment

### Option 1: Using Hugging Face CLI

1. **Install Hugging Face CLI:**
   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```

2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)

3. **Clone and push:**
   ```bash
   git clone https://huggingface.co/spaces/your-username/embedding-api
   cd embedding-api
   
   # Copy files from embedding folder
   cp /path/to/embedding/Dockerfile .
   cp /path/to/embedding/api.py .
   cp /path/to/embedding/requirements.txt .
   cp /path/to/embedding/README.md .
   
   git add .
   git commit -m "Initial commit"
   git push
   ```

4. **Configure environment (optional):**
   - Go to your Space settings
   - Add `VOYAGE_API_KEY` secret if using Voyage AI

### Option 2: Manual Upload

1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed

## API Usage

### Health Check

```bash
curl http://localhost:7860/health
```

Response:
```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
  "voyage_available": false,
  "api_key_required": false
}
```
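
A client may want to confirm that a particular model is usable before sending traffic. The helper below is a minimal sketch that inspects a `/health` payload of the shape shown above. It assumes Voyage AI is reported only through `voyage_available` and not through `models_loaded`, which this README does not confirm; `model_ready` is a hypothetical name.

```python
def model_ready(health: dict, model: str) -> bool:
    """Decide from a /health payload whether `model` can serve requests."""
    if health.get("status") != "healthy":
        return False
    if model == "voyage":
        # Assumption: Voyage availability is signalled by this flag only.
        return bool(health.get("voyage_available"))
    return model in health.get("models_loaded", [])


# Example payload, identical to the response shown above.
sample = {
    "status": "healthy",
    "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
    "voyage_available": False,
    "api_key_required": False,
}
print(model_ready(sample, "jina"))    # True
print(model_ready(sample, "voyage"))  # False
```

In practice the payload would come from `requests.get(f"{BASE_URL}/health").json()`.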

### Generate Embeddings (Elasticsearch Compatible)

The main `/embed` endpoint follows the Elasticsearch inference API request format. Select the model with the `model` query parameter; when it is omitted, JobBERT v3 is used.

#### Single Text (JobBERT v3 - default)

Without an API key:
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Software Engineer"
  }'
```

With an API key (needed when `REQUIRE_API_KEY` is `true`):
```bash
curl -X POST "http://localhost:7860/embed" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "input": "Software Engineer"
  }'
```

Response:
```json
{
  "embedding": [0.123, -0.456, 0.789, ...]
}
```

#### Single Text with Model Selection

```bash
# JobBERT v2
curl -X POST "http://localhost:7860/embed?model=jobbertv2" \
  -H "Content-Type: application/json" \
  -d '{"input": "Data Scientist"}'

# JobBERT v3 (recommended)
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{"input": "Product Manager"}'

# Jina AI
curl -X POST "http://localhost:7860/embed?model=jina" \
  -H "Content-Type: application/json" \
  -d '{"input": "Machine Learning Engineer"}'
```

#### Multiple Texts (Batch)

```bash
curl -X POST "http://localhost:7860/embed?model=jobbertv3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Software Engineer", "Data Scientist", "Product Manager"]
  }'
```

Response:
```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, -0.567, ...],
    [0.345, -0.678, ...]
  ]
}
```
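
Because a single string returns `embedding` and a list returns `embeddings`, calling code has to handle two response shapes. The sketch below normalizes both into a list of vectors; `extract_vectors` is a hypothetical helper, not something the API provides.

```python
def extract_vectors(result: dict) -> list:
    """Return a list of vectors from either /embed response shape."""
    if "embeddings" in result:
        # Batch request: already a list of vectors.
        return result["embeddings"]
    if "embedding" in result:
        # Single-text request: wrap the one vector in a list.
        return [result["embedding"]]
    raise KeyError("response has neither 'embedding' nor 'embeddings'")
```

Downstream code can then iterate over the result without checking which request form was used.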

#### Jina AI with Task Type

```bash
curl -X POST "http://localhost:7860/embed?model=jina&task=retrieval.query" \
  -H "Content-Type: application/json" \
  -d '{"input": "What is machine learning?"}'
```

**Jina AI Tasks (query parameter):**
- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)
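
A typical use of the two retrieval tasks is to embed the search query with `retrieval.query`, embed the documents with `retrieval.passage`, and rank documents by similarity. The sketch below shows only the ranking step in plain Python, assuming cosine similarity as the metric (this README does not prescribe one); `cosine_similarity` and `rank_passages` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by descending similarity to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Here `query_vec` would be the `embedding` returned for the query and `passage_vecs` the `embeddings` returned for a batch of documents.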

#### Voyage AI (requires `VOYAGE_API_KEY` on the server)

```bash
curl -X POST "http://localhost:7860/embed?model=voyage&input_type=document" \
  -H "Content-Type: application/json" \
  -d '{"input": "This is a document to embed"}'
```

**Voyage AI Input Types (query parameter):**
- `document`: For documents/passages
- `query`: For search queries
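
The `model`, `task`, and `input_type` options are all passed as query parameters on `/embed`. Rather than concatenating strings, a client can build the URL with the standard library so that values are encoded correctly. This is a minimal sketch; `embed_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def embed_url(base_url: str, model=None, task=None, input_type=None) -> str:
    """Build the /embed URL, including only the parameters that were given."""
    params = {"model": model, "task": task, "input_type": input_type}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/embed"
    return f"{url}?{query}" if query else url


print(embed_url("http://localhost:7860", model="jina", task="retrieval.query"))
# http://localhost:7860/embed?model=jina&task=retrieval.query
```

With `requests`, passing the same dictionary through the `params=` argument achieves the same encoding.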

### Batch Endpoint (Original Format)

For compatibility, the original batch endpoint is still available at `/embed/batch`:

```bash
curl -X POST http://localhost:7860/embed/batch \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
  }'
```

Response includes metadata:
```json
{
  "embeddings": [[0.123, ...], [0.234, ...]],
  "model": "jobbertv3",
  "dimension": 768,
  "num_texts": 2
}
```
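
The metadata fields make it possible to verify a batch response before using it. The following sketch checks that `num_texts` and `dimension` agree with the returned vectors; `validate_batch_response` is a hypothetical client-side helper.

```python
def validate_batch_response(result: dict, expected_count: int) -> None:
    """Raise ValueError if the metadata disagrees with the returned vectors."""
    embeddings = result["embeddings"]
    if result["num_texts"] != expected_count or len(embeddings) != expected_count:
        raise ValueError(
            f"expected {expected_count} embeddings, got {len(embeddings)}"
        )
    for index, vector in enumerate(embeddings):
        if len(vector) != result["dimension"]:
            raise ValueError(
                f"embedding {index} has length {len(vector)}, "
                f"expected {result['dimension']}"
            )
```

Call it with the parsed JSON and the number of texts that were sent.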

### List Available Models

```bash
curl http://localhost:7860/models
```

## Python Client Examples

### Elasticsearch-Compatible Format (Recommended)

```python
import requests

BASE_URL = "http://localhost:7860"
API_KEY = "your-api-key-here"  # Optional, only if API key is required

# Headers (include API key if required)
headers = {}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

# Single embedding (JobBERT v3 - default)
response = requests.post(
    f"{BASE_URL}/embed",
    headers=headers,
    json={"input": "Software Engineer"}
)
response.raise_for_status()  # Fail early on HTTP errors (e.g. a missing or invalid API key)
result = response.json()
embedding = result["embedding"]  # Single vector
print(f"Embedding dimension: {len(embedding)}")

# Single embedding with model selection
response = requests.post(
    f"{BASE_URL}/embed?model=jina",
    headers=headers,
    json={"input": "Data Scientist"}
)
embedding = response.json()["embedding"]

# Batch embeddings
response = requests.post(
    f"{BASE_URL}/embed?model=jobbertv3",
    headers=headers,
    json={"input": ["Software Engineer", "Data Scientist", "Product Manager"]}
)
result = response.json()
embeddings = result["embeddings"]  # List of vectors
print(f"Generated {len(embeddings)} embeddings")

# Jina AI with task
response = requests.post(
    f"{BASE_URL}/embed?model=jina&task=retrieval.query",
    headers=headers,
    json={"input": "What is Python?"}
)

# Voyage AI with input type
response = requests.post(
    f"{BASE_URL}/embed?model=voyage&input_type=document",
    headers=headers,
    json={"input": "Document text here"}
)
```

### Python Client Class with API Key Support

```python
import requests
from typing import List, Union, Optional

class EmbeddingClient:
    def __init__(self, base_url: str, api_key: Optional[str] = None, model: str = "jobbertv3"):
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.headers = {}
        if api_key:
            self.headers["Authorization"] = f"Bearer {api_key}"
    
    def embed(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        """Get embeddings for single text or batch"""
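        response = requests.post(
            f"{self.base_url}/embed",
            params={"model": self.model},  # requests URL-encodes the query string
            headers=self.headers,
            json={"input": text},
            timeout=30,  # seconds; without a timeout, requests can wait indefinitely
        )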
        response.raise_for_status()
        result = response.json()
        
        if isinstance(text, str):
            return result["embedding"]
        else:
            return result["embeddings"]

# Usage
client = EmbeddingClient(
    base_url="https://YOUR-SPACE.hf.space",
    api_key="your-api-key-here",  # Optional
    model="jobbertv3"
)

# Single embedding
embedding = client.embed("Software Engineer")
print(f"Dimension: {len(embedding)}")

# Batch embeddings
embeddings = client.embed(["Software Engineer", "Data Scientist"])
print(f"Generated {len(embeddings)} embeddings")
```

### Batch Format (Original)

```python
import requests

url = "http://localhost:7860/embed/batch"

response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
response.raise_for_status()  # Fail early on HTTP errors
result = response.json()
embeddings = result["embeddings"]
print(f"Model: {result['model']}, Dimension: {result['dimension']}")
```

## Environment Variables

- `PORT`: Server port (default: 7860)
- `API_KEY`: Secret that clients send as `Authorization: Bearer <key>` (optional, but recommended for production)
- `REQUIRE_API_KEY`: Set to `true` to reject requests that do not carry the API key (default: `false`)
- `VOYAGE_API_KEY`: Voyage AI API key (optional, required for Voyage embeddings)
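
To make the defaults concrete, here is one way a service could read these variables. It is a sketch only and not the code in `api.py`, which this README does not show. The `Settings` class, the `load_settings` function, and the case-insensitive handling of `true` are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    api_key: Optional[str]
    require_api_key: bool
    voyage_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        port=int(env.get("PORT", "7860")),
        api_key=env.get("API_KEY") or None,  # empty string counts as unset
        # Assumption: "true" is matched case-insensitively.
        require_api_key=env.get("REQUIRE_API_KEY", "false").strip().lower() == "true",
        voyage_api_key=env.get("VOYAGE_API_KEY") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.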

### Setting Up API Key Authentication

#### Local Development

```bash
# Set environment variables
export API_KEY="your-secret-key-here"
export REQUIRE_API_KEY="true"

# Run the API
python api.py
```

#### Hugging Face Spaces

1. Go to your Space settings
2. Click on "Variables and secrets"
3. Add secrets:
   - Name: `API_KEY`, Value: `your-secret-key-here`
   - Name: `REQUIRE_API_KEY`, Value: `true`
4. Restart your Space

#### Docker

```bash
docker run -p 7860:7860 \
  -e API_KEY="your-secret-key-here" \
  -e REQUIRE_API_KEY="true" \
  embedding-api
```

## Interactive Documentation

Once the API is running, visit:
- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc

## Notes

- Models are downloaded automatically on first startup (roughly 2-3 GB in total)
- Voyage AI requires an API key from https://www.voyageai.com/
- The first request to each model may be slower because the model has to load
- Use batch processing for better performance (send multiple texts at once)
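
For large inputs, batching usually means splitting the texts into fixed-size chunks and sending one request per chunk. The helper below is a minimal sketch of that splitting step; `chunked` is a hypothetical name, and the right chunk size depends on the model and the memory available to the container.

```python
from typing import Iterator, List


def chunked(texts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


titles = ["Software Engineer", "Data Scientist", "Product Manager"]
for batch in chunked(titles, 2):
    print(batch)
# ['Software Engineer', 'Data Scientist']
# ['Product Manager']
```

Each yielded list can be sent as the `input` field of one `/embed` request.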

## Troubleshooting

### Models not loading
- Check available disk space (the models need roughly 3 GB)
- Ensure internet connection for model download
- Check logs for specific error messages

### Voyage AI not working
- Verify `VOYAGE_API_KEY` is set correctly
- Check API key has sufficient credits
- Ensure `voyageai` package is installed

### Out of memory
- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits

## License

This API uses models with different licenses:
- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service