Alpine Embeddings

Run embedding models locally on Alpine Linux in Docker using Node.js: zero Python, zero glibc, pure WASM inference.

What is this?

A lightweight REST API that generates text embeddings using transformer models. It runs entirely in a small (roughly 100-150MB) Docker container on Alpine Linux with no native dependencies.

| Feature | Detail |
|---|---|
| Runtime | Node.js 20 on Alpine Linux |
| Inference | ONNX Runtime (WASM backend) via @xenova/transformers |
| Model | Xenova/bge-small-en-v1.5 (384 dimensions, 32MB) |
| API | REST (Express.js) |
| Image size | ~150MB |
| Startup | ~5s (model loads from cache) |
| Latency | ~50-100ms per embedding |

Quick Start

# Clone
git clone https://huggingface.co/asusf15/alpine-embeddings
cd alpine-embeddings

# Build
docker build -t embeddings .

# Run
docker run -p 3000:3000 embeddings

Server starts on http://localhost:3000. Model loads automatically on first boot (~5s).

API

POST /embed

Generate embeddings for text.

Request:

{"text": "Hello world"}

Batch request:

{"text": ["Hello world", "Another sentence", "Third one"]}

Response:

{
  "embeddings": [[0.0011, -0.0146, 0.0203, ...]],
  "dims": 384,
  "model": "Xenova/bge-small-en-v1.5",
  "elapsed_ms": 52
}
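
Both request shapes produce a list of embeddings, so a handler has to treat a single string as a one-item batch. The sketch below shows one way that normalization could look. `normalizeTexts` and `MAX_TEXTS` are hypothetical names and are not taken from the project's server.js; the 128-item cap comes from the Limits section.

```javascript
// Hypothetical helper: not taken from the project's server.js.
const MAX_TEXTS = 128; // per-request limit stated under "Limits"

function normalizeTexts(body) {
  const value = body == null ? undefined : body.text;
  // Wrap a single string so both request shapes become an array.
  const texts = typeof value === 'string' ? [value] : value;
  if (!Array.isArray(texts) || texts.length === 0) {
    throw new TypeError('"text" must be a string or a non-empty array of strings');
  }
  if (!texts.every((t) => typeof t === 'string')) {
    throw new TypeError('every item in "text" must be a string');
  }
  if (texts.length > MAX_TEXTS) {
    throw new RangeError(`at most ${MAX_TEXTS} texts per request`);
  }
  return texts;
}
```

With this shape, `{"text": "Hello world"}` and `{"text": ["Hello world"]}` are handled identically, which matches the nested `embeddings` array in the response above.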

GET /health

Health check.

{"status": "ready", "model": "Xenova/bge-small-en-v1.5"}

Usage Examples

cURL (Linux/Mac)

curl -X POST http://localhost:3000/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'

PowerShell (Windows)

Invoke-RestMethod -Uri http://localhost:3000/embed -Method Post -ContentType "application/json" -Body '{"text": "Hello world"}'

JavaScript (fetch)

const res = await fetch('http://localhost:3000/embed', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: ['Hello', 'World'] })
});
const { embeddings } = await res.json();
console.log(embeddings[0].length); // 384
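
A common next step is comparing two of the returned vectors. The sketch below computes cosine similarity in plain JavaScript; `cosineSimilarity` is an illustrative helper and is not part of this project's API. It makes no assumption about whether the server returns normalized vectors.

```javascript
// Illustrative helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen here: similarity with a zero vector is 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Given the `embeddings` array from the fetch example, `cosineSimilarity(embeddings[0], embeddings[1])` returns a value between -1 and 1, with higher values indicating more similar texts.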

Python (requests)

import requests
r = requests.post('http://localhost:3000/embed', json={"text": "Hello world"})
embedding = r.json()["embeddings"][0]  # 384-dim vector

Configuration

Set via environment variables:

| Variable | Default | Description |
|---|---|---|
| PORT | 3000 | Server port |
| MODEL | Xenova/bge-small-en-v1.5 | HuggingFace model ID |

# Use a different model
docker run -p 3000:3000 -e MODEL=Xenova/all-MiniLM-L6-v2 embeddings

# Different port
docker run -p 8080:8080 -e PORT=8080 embeddings
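
For readers wondering how such variables are typically consumed, here is a minimal sketch of reading them with the documented defaults. `loadConfig` and `DEFAULTS` are hypothetical names; the project's server.js may read its settings differently.

```javascript
// Hypothetical sketch: the project's server.js may read its settings differently.
const DEFAULTS = { port: 3000, model: 'Xenova/bge-small-en-v1.5' };

function loadConfig(env) {
  // Fall back to the documented default when PORT is unset or empty.
  const port =
    env.PORT === undefined || env.PORT === '' ? DEFAULTS.port : Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid PORT: ${env.PORT}`);
  }
  return { port, model: env.MODEL || DEFAULTS.model };
}

// Usage: const config = loadConfig(process.env);
```

Taking the environment as a parameter, rather than reading `process.env` inside the function, keeps the defaults easy to test.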

Available Models

Any ONNX embedding (feature-extraction) model from the Xenova collection should work. Some options:

| Model | Dims | Size | Quality |
|---|---|---|---|
| Xenova/bge-small-en-v1.5 | 384 | 32MB | ⭐ Best for English |
| Xenova/all-MiniLM-L6-v2 | 384 | 22MB | Good, smallest |
| Xenova/bge-base-en-v1.5 | 768 | 110MB | Higher quality |
| Xenova/multilingual-e5-small | 384 | 113MB | Multilingual |

How It Works

The key challenge: onnxruntime-node (native ONNX runtime) requires glibc, but Alpine uses musl libc. The solution:

  1. Install @xenova/transformers (which depends on onnxruntime-node)
  2. Stub onnxruntime-node to re-export onnxruntime-web instead
  3. onnxruntime-web uses WASM, so it runs on any OS/libc
  4. Set numThreads = 1 (WASM workers not needed in Node.js server)
  5. Copy WASM binaries to where transformers.js expects them

This gives you the full transformer inference pipeline (tokenizer + model) running in pure WASM on Alpine.
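
The stubbing step (step 2) can be sketched as a small Node.js script that replaces the installed onnxruntime-node package with a one-line re-export. This is an assumption about how the stub could be built, not the project's actual Dockerfile logic; `writeOnnxStub` and the `0.0.0-stub` version string are hypothetical.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: the project's Dockerfile may create the stub differently.
function writeOnnxStub(nodeModulesDir) {
  const stubDir = path.join(nodeModulesDir, 'onnxruntime-node');
  // Remove the native package (if present) and start from an empty directory.
  fs.rmSync(stubDir, { recursive: true, force: true });
  fs.mkdirSync(stubDir, { recursive: true });
  // Minimal manifest so Node resolves the package to index.js.
  fs.writeFileSync(
    path.join(stubDir, 'package.json'),
    JSON.stringify({ name: 'onnxruntime-node', version: '0.0.0-stub', main: 'index.js' }, null, 2)
  );
  // Anything that requires onnxruntime-node now receives the WASM-based runtime.
  fs.writeFileSync(
    path.join(stubDir, 'index.js'),
    "module.exports = require('onnxruntime-web');\n"
  );
  return stubDir;
}
```

Calling `writeOnnxStub('node_modules')` after `npm install` would apply the stub. Step 4 is typically a one-liner in the server code, `env.backends.onnx.wasm.numThreads = 1`, where `env` is the settings object exported by @xenova/transformers.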

Project Structure

├── Dockerfile       # Alpine + Node 20, WASM stubbing
├── package.json     # @xenova/transformers + onnxruntime-web + express
├── server.js        # Express API with /embed and /health
├── preload.js       # (Optional) Pre-download model during build
└── .dockerignore    # Exclude node_modules from context

Sharing

Push to Docker Hub

docker tag embeddings yourusername/alpine-embeddings:latest
docker push yourusername/alpine-embeddings:latest

Others can run it directly

docker pull yourusername/alpine-embeddings:latest
docker run -p 3000:3000 yourusername/alpine-embeddings:latest

Share as code (this repo)

git clone https://huggingface.co/asusf15/alpine-embeddings
cd alpine-embeddings
docker build -t embeddings .
docker run -p 3000:3000 embeddings

Docker Compose

version: '3.8'
services:
  embeddings:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MODEL=Xenova/bge-small-en-v1.5
    restart: unless-stopped

docker compose up -d

Limits

  • Max 128 texts per request
  • Max 10MB request body
  • Single-threaded WASM (~50-100ms per text)
  • First request after cold start takes ~5s (model loading)
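
Clients with more than 128 texts need to split them across several requests. The sketch below shows one way to do that; `chunkTexts` and `embedAll` are hypothetical client-side helpers, and the URL assumes the default port from Quick Start.

```javascript
const MAX_TEXTS_PER_REQUEST = 128; // limit stated in the list above

// Split an array of texts into batches of at most `size` items.
function chunkTexts(texts, size = MAX_TEXTS_PER_REQUEST) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < texts.length; i += size) {
    batches.push(texts.slice(i, i + size));
  }
  return batches;
}

// Send the batches one at a time and concatenate the embeddings in input order.
// Uses the global fetch available in Node.js 18 and later.
async function embedAll(texts, url = 'http://localhost:3000/embed') {
  const all = [];
  for (const batch of chunkTexts(texts)) {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: batch })
    });
    if (!res.ok) throw new Error(`embed request failed: ${res.status}`);
    const { embeddings } = await res.json();
    all.push(...embeddings);
  }
  return all;
}
```

Batches are sent sequentially because inference is single-threaded, so parallel requests would likely queue on the server rather than finish sooner.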

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
