Typelevel-BERT

A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from BAAI/bge-large-en-v1.5 to achieve fast client-side inference.

Highlights

  • 93.3% of teacher model quality (NDCG@10)
  • 30x smaller than teacher (11M vs 335M parameters)
  • 10.7 MB quantized ONNX model
  • 1.5ms inference latency (CPU, seq_len=128)
  • Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation

Model Details

Property              Value
Model Type            BERT encoder (text embedding)
Architecture          4-layer transformer
Hidden Size           256
Attention Heads       4
Parameters            11.2M
Embedding Dimension   256
Max Sequence Length   512
Vocabulary            bert-base-uncased (30,522 tokens)
Pooling               Mean pooling
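The table above lists mean pooling as the pooling strategy. As a point of reference, mean pooling averages the token-level hidden states, weighted by the attention mask so that padding tokens are excluded. A minimal NumPy sketch (the exported ONNX model already applies this internally, so this is illustrative only):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Masked mean over the sequence axis.

    token_embeddings: (batch, seq_len, hidden) final-layer hidden states
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts
```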

Usage

Browser/Node.js (transformers.js)

import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  dtype: 'q8',  // use the INT8 quantized model (10.7 MB); `quantized: true` is the older v2 option
});

// Generate embeddings
const embedding = await extractor("How to sequence effects in cats-effect", {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data);  // Float32Array(256)

Python (ONNX Runtime)

import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model outputs mean-pooled embeddings; L2-normalize them for cosine similarity
embedding = outputs[0]  # (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
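Because the embeddings are L2-normalized, cosine similarity reduces to a dot product, so retrieval over a precomputed document matrix is a single matrix-vector product. A minimal search sketch (the `top_k` helper and its parameters are illustrative, not part of this repository):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=5):
    """Return indices and scores of the k most similar documents.

    query_emb: (256,) L2-normalized query embedding
    doc_embs:  (n_docs, 256) L2-normalized document embeddings
    """
    scores = doc_embs @ query_emb        # dot product == cosine similarity here
    idx = np.argsort(-scores)[:k]        # highest-scoring documents first
    return idx, scores[idx]
```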

Performance

Metric          Typelevel-BERT   Teacher (BGE-large)   % of Teacher
NDCG@10         0.853            0.915                 93.3%
MRR             0.900            0.963                 93.5%
Recall@10       96.7%            96.7%                 100%
Parameters      11.2M            335M                  3.3%
Model Size      10.7 MB          ~1.2 GB               0.9%
Latency (CPU)   1.5ms            ~15ms                 10x faster

Training

  • Teacher Model: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
  • Training Data: 30,598 text chunks from Typelevel ecosystem documentation
  • Distillation Method: Knowledge distillation with MSE + cosine similarity loss
  • Hardware: Apple M3 Max (MPS)
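The combined MSE + cosine similarity loss above can be sketched as follows. Note the card does not specify how the 256-dim student embeddings are compared against the 1024-dim teacher embeddings; this sketch assumes the student output has already been mapped into the teacher's space (e.g. by a learned linear projection), and the `alpha` weighting is a hypothetical choice, not the actual training configuration:

```python
import numpy as np

def distill_loss(student_proj, teacher_emb, alpha=0.5):
    """Knowledge-distillation loss: weighted MSE + cosine distance.

    student_proj: (batch, 1024) student embeddings projected to teacher space
    teacher_emb:  (batch, 1024) frozen teacher embeddings
    alpha:        weighting between the two terms (assumed value)
    """
    mse = np.mean((student_proj - teacher_emb) ** 2)
    s = student_proj / np.linalg.norm(student_proj, axis=-1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    cos_dist = 1.0 - np.mean(np.sum(s * t, axis=-1))  # 0 when directions match
    return alpha * mse + (1 - alpha) * cos_dist
```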

Intended Use

This model is designed for:

  • Semantic search in functional programming documentation
  • Document retrieval for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
  • Browser-based inference via transformers.js or ONNX Runtime Web
  • Client-side embeddings for privacy-preserving search applications

Limitations

  1. Domain Specialization: Optimized for FP documentation; may underperform on general text
  2. English Only: Trained exclusively on English documentation
  3. Vocabulary: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized

Files

File                        Size      Description
model.safetensors           42.6 MB   PyTorch weights
onnx/model.onnx             42.4 MB   Full-precision ONNX
onnx/model_quantized.onnx   10.7 MB   INT8 quantized ONNX
config.json                 -         Model configuration
tokenizer.json              -         Fast tokenizer
vocab.txt                   -         Vocabulary file

Citation

@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}

License

MIT
