# Typelevel-BERT

A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from BAAI/bge-large-en-v1.5 for fast client-side inference.
## Highlights
- 93.3% of teacher model quality (NDCG@10)
- 30x smaller than teacher (11M vs 335M parameters)
- 10.7 MB quantized ONNX model
- 1.5ms inference latency (CPU, seq_len=128)
- Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation
## Model Details
| Property | Value |
|---|---|
| Model Type | BERT encoder (text embedding) |
| Architecture | 4-layer transformer |
| Hidden Size | 256 |
| Attention Heads | 4 |
| Parameters | 11.2M |
| Embedding Dimension | 256 |
| Max Sequence Length | 512 |
| Vocabulary | bert-base-uncased (30,522 tokens) |
| Pooling | Mean pooling |
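The quantized ONNX export already emits pooled embeddings, but when running the raw `model.safetensors` weights, the mean pooling listed above has to be applied to the last-layer token states manually. A minimal numpy sketch (toy shapes, not the real 256-dim hidden size):

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings, ignoring padding positions.

    hidden_states:  (batch, seq_len, hidden) last-layer token states
    attention_mask: (batch, seq_len)         1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)                    # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                 # avoid div-by-zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last position is padding), hidden size 2
h = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
m = np.array([[1, 1, 0]])
print(mean_pool(h, m))  # [[2. 3.]] — the padded position is excluded
```

The attention-mask weighting matters: averaging over padding positions would pull every embedding toward the pad token's representation.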
## Usage

### Browser/Node.js (transformers.js)

```js
import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically).
// transformers.js v3 selects the quantized weights via `dtype`
// (the older @xenova/transformers v2 used `quantized: true`).
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  dtype: 'q8', // INT8 quantized model (10.7 MB)
});

// Generate embeddings
const embedding = await extractor('How to sequence effects in cats-effect', {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data); // Float32Array(256)
```
### Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load the quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model outputs pooled embeddings, so only L2 normalization remains
embedding = outputs[0]  # (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
```
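Because the embeddings are L2-normalized, cosine similarity reduces to a plain dot product, so ranking a corpus against a query is a single matrix-vector multiply. A minimal search sketch over toy 4-dim vectors standing in for real 256-dim model output:

```python
import numpy as np

def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k most similar documents.

    Both inputs are assumed L2-normalized, so cosine similarity
    is just doc_embs @ query_emb.
    """
    scores = doc_embs @ query_emb  # (num_docs,)
    return np.argsort(-scores)[:k].tolist()

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Hypothetical precomputed document embeddings (toy dimensionality)
docs = np.stack([
    normalize(np.array([1.0, 0.0, 0.0, 0.0])),
    normalize(np.array([0.9, 0.1, 0.0, 0.0])),
    normalize(np.array([0.0, 0.0, 1.0, 0.0])),
])
query = normalize(np.array([1.0, 0.05, 0.0, 0.0]))
print(top_k(query, docs, k=2))  # [0, 1]
```

For a real documentation index, `docs` would be the normalized embeddings of each text chunk, computed once ahead of time and shipped alongside the model.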
## Performance
| Metric | Typelevel-BERT | Teacher (BGE-large) | % of Teacher |
|---|---|---|---|
| NDCG@10 | 0.853 | 0.915 | 93.3% |
| MRR | 0.900 | 0.963 | 93.5% |
| Recall@10 | 96.7% | 96.7% | 100% |
| Parameters | 11.2M | 335M | 3.3% |
| Model Size | 10.7 MB | ~1.2 GB | 0.9% |
| Latency (CPU) | 1.5ms | ~15ms | 10x faster |
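NDCG@10 and MRR in the table are standard ranking metrics. For readers unfamiliar with them, a minimal pure-Python sketch of how each is computed for a single query (binary relevance assumed; this is the textbook formulation, not necessarily the exact benchmark harness):

```python
import math

def ndcg_at_k(ranked_rels: list[int], k: int = 10) -> float:
    """NDCG@k for one query. ranked_rels[i] is the relevance (0/1)
    of the document the model ranked at position i."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_rels[:k]))
    ideal = sorted(ranked_rels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_rels: list[int]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for i, rel in enumerate(ranked_rels):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

# One query: the single relevant document was ranked second
print(round(ndcg_at_k([0, 1, 0]), 3))  # 0.631
print(mrr([0, 1, 0]))                  # 0.5
```

Corpus-level scores like those in the table are the mean of these per-query values.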
## Training
- Teacher Model: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
- Training Data: 30,598 text chunks from Typelevel ecosystem documentation
- Distillation Method: Knowledge distillation with MSE + cosine similarity loss
- Hardware: Apple M3 Max (MPS)
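The combined MSE + cosine-similarity objective can be sketched as below. Since the teacher emits 1024-dim embeddings and the student 256-dim, some bridge between the spaces is needed; the learned projection and the equal loss weighting here are illustrative assumptions, not the model's actual training recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def distill_loss(student_emb, teacher_emb, proj, alpha=0.5):
    """Distillation loss sketch: weighted MSE + cosine distance.

    student_emb: (batch, 256)  student embeddings
    teacher_emb: (batch, 1024) frozen teacher embeddings
    proj:        (256, 1024)   assumed learned projection into teacher space
    """
    projected = student_emb @ proj  # (batch, 1024)
    mse = np.mean((projected - teacher_emb) ** 2)
    cos = np.sum(projected * teacher_emb, axis=1) / (
        np.linalg.norm(projected, axis=1) * np.linalg.norm(teacher_emb, axis=1)
    )
    cos_dist = np.mean(1.0 - cos)
    return alpha * mse + (1 - alpha) * cos_dist

# Toy batch: random stand-ins for real student/teacher outputs
s = rng.normal(size=(4, 256))
t = rng.normal(size=(4, 1024))
W = rng.normal(size=(256, 1024)) * 0.01
print(distill_loss(s, t, W))
```

The two terms are complementary: MSE matches embedding magnitudes, while the cosine term directly optimizes the angular relationships that retrieval ultimately depends on.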
## Intended Use
This model is designed for:
- Semantic search in functional programming documentation
- Document retrieval for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
- Browser-based inference via transformers.js or ONNX Runtime Web
- Client-side embeddings for privacy-preserving search applications
## Limitations
- Domain Specialization: Optimized for FP documentation; may underperform on general text
- English Only: Trained exclusively on English documentation
- Vocabulary: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized
## Files

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 42.6 MB | PyTorch weights |
| `onnx/model.onnx` | 42.4 MB | Full-precision ONNX |
| `onnx/model_quantized.onnx` | 10.7 MB | INT8 quantized ONNX |
| `config.json` | - | Model configuration |
| `tokenizer.json` | - | Fast tokenizer |
| `vocab.txt` | - | Vocabulary file |
## Citation

```bibtex
@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}
```
## License
MIT