Typelevel-BERT

A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from BAAI/bge-large-en-v1.5 to achieve fast client-side inference.

Highlights

  • 93.3% of teacher model quality (NDCG@10)
  • 30x smaller than teacher (11M vs 335M parameters)
  • 10.7 MB quantized ONNX model
  • 1.5ms inference latency (CPU, seq_len=128)
  • Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation

Model Details

Property              Value
Model Type            BERT encoder (text embedding)
Architecture          4-layer transformer
Hidden Size           256
Attention Heads       4
Parameters            11.2M
Embedding Dimension   256
Max Sequence Length   512
Vocabulary            bert-base-uncased (30,522 tokens)
Pooling               Mean pooling
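The table above lists mean pooling as the pooling strategy. As a point of reference, mean pooling averages the token-level hidden states, weighted by the attention mask so that padding tokens are excluded. A minimal NumPy sketch (the exported ONNX model already applies this internally, so this is illustrative only):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Masked mean over the sequence axis.

    token_embeddings: (batch, seq_len, hidden) final-layer hidden states
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts
```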

Usage

Browser/Node.js (transformers.js)

import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  dtype: 'q8',  // use the INT8 quantized model (10.7 MB); `quantized: true` is the older v2 option
});

// Generate embeddings
const embedding = await extractor("How to sequence effects in cats-effect", {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data);  // Float32Array(256)

Python (ONNX Runtime)

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model outputs mean-pooled embeddings; L2-normalize them for cosine similarity
embedding = outputs[0]  # (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
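Because the embeddings are L2-normalized, cosine similarity reduces to a dot product, so retrieval over a precomputed document matrix is a single matrix-vector product. A minimal search sketch (the `top_k` helper and its parameters are illustrative, not part of this repository):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=5):
    """Return indices and scores of the k most similar documents.

    query_emb: (256,) L2-normalized query embedding
    doc_embs:  (n_docs, 256) L2-normalized document embeddings
    """
    scores = doc_embs @ query_emb        # dot product == cosine similarity here
    idx = np.argsort(-scores)[:k]        # highest-scoring documents first
    return idx, scores[idx]
```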

Performance

Metric          Typelevel-BERT   Teacher (BGE-large)   % of Teacher
NDCG@10         0.853            0.915                 93.3%
MRR             0.900            0.963                 93.5%
Recall@10       96.7%            96.7%                 100%
Parameters      11.2M            335M                  3.3%
Model Size      10.7 MB          ~1.2 GB               0.9%
Latency (CPU)   1.5ms            ~15ms                 10x faster

Training

  • Teacher Model: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
  • Training Data: 30,598 text chunks from Typelevel ecosystem documentation
  • Distillation Method: Knowledge distillation with MSE + cosine similarity loss
  • Hardware: Apple M3 Max (MPS)
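The combined MSE + cosine similarity loss above can be sketched as follows. Note the card does not specify how the 256-dim student embeddings are compared against the 1024-dim teacher embeddings; this sketch assumes the student output has already been mapped into the teacher's space (e.g. by a learned linear projection), and the `alpha` weighting is a hypothetical choice, not the actual training configuration:

```python
import numpy as np

def distill_loss(student_proj, teacher_emb, alpha=0.5):
    """Knowledge-distillation loss: weighted MSE + cosine distance.

    student_proj: (batch, 1024) student embeddings projected to teacher space
    teacher_emb:  (batch, 1024) frozen teacher embeddings
    alpha:        weighting between the two terms (assumed value)
    """
    mse = np.mean((student_proj - teacher_emb) ** 2)
    s = student_proj / np.linalg.norm(student_proj, axis=-1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    cos_dist = 1.0 - np.mean(np.sum(s * t, axis=-1))  # 0 when directions match
    return alpha * mse + (1 - alpha) * cos_dist
```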

Intended Use

This model is designed for:

  • Semantic search in functional programming documentation
  • Document retrieval for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
  • Browser-based inference via transformers.js or ONNX Runtime Web
  • Client-side embeddings for privacy-preserving search applications

Limitations

  1. Domain Specialization: Optimized for FP documentation; may underperform on general text
  2. English Only: Trained exclusively on English documentation
  3. Vocabulary: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized

Files

File                        Size      Description
model.safetensors           42.6 MB   PyTorch weights
onnx/model.onnx             42.4 MB   Full-precision ONNX
onnx/model_quantized.onnx   10.7 MB   INT8 quantized ONNX
config.json                 -         Model configuration
tokenizer.json              -         Fast tokenizer
vocab.txt                   -         Vocabulary file

Citation

@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}

License

MIT
