---
license: mit
language:
- en
library_name: transformers
tags:
- sentence-transformers
- feature-extraction
- text-embeddings
- semantic-search
- onnx
- transformers.js
- bert
- knowledge-distillation
datasets:
- custom
pipeline_tag: feature-extraction
model-index:
- name: typelevel-bert
results:
- task:
type: retrieval
name: Document Retrieval
dataset:
type: custom
name: FP-Doc Benchmark v1
metrics:
- type: ndcg_at_10
value: 0.853
name: NDCG@10
- type: mrr
value: 0.900
name: MRR
- type: recall_at_10
value: 0.967
name: Recall@10
---
# Typelevel-BERT
A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) to achieve fast client-side inference.
## Highlights
- **93.3%** of teacher model quality (NDCG@10)
- **30x smaller** than teacher (11M vs 335M parameters)
- **10.7 MB** quantized ONNX model
- **1.5ms** inference latency (CPU, seq_len=128)
- Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation
## Model Details
| Property | Value |
|----------|-------|
| **Model Type** | BERT encoder (text embedding) |
| **Architecture** | 4-layer transformer |
| **Hidden Size** | 256 |
| **Attention Heads** | 4 |
| **Parameters** | 11.2M |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 |
| **Vocabulary** | bert-base-uncased (30,522 tokens) |
| **Pooling** | Mean pooling |
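The mean pooling listed above averages token embeddings while ignoring padding positions via the attention mask. A minimal numpy sketch (shapes are illustrative; this is not the model's internal code):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over the sequence, skipping padded positions."""
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero
    return summed / counts

# Illustrative shapes: batch=1, seq_len=4, hidden=256 (matching the embedding dim)
tokens = np.random.rand(1, 4, 256).astype(np.float32)
mask = np.array([[1, 1, 1, 0]])  # last position is padding
pooled = mean_pool(tokens, mask)  # shape (1, 256)
```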
## Usage
### Browser/Node.js (transformers.js)
```javascript
import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  quantized: true, // Use INT8 quantized model (10.7 MB)
});

// Generate embeddings
const embedding = await extractor('How to sequence effects in cats-effect', {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data); // Float32Array(256)
```
### Python (ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load the quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model outputs mean-pooled embeddings; L2-normalize them for cosine similarity
embedding = outputs[0]  # shape (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
```
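Because the embeddings are L2-normalized, cosine similarity reduces to a dot product, which makes ranking documents against a query a single matrix multiply. A sketch, using random unit vectors as stand-ins for real model outputs:

```python
import numpy as np

def top_k(query_emb, doc_embs, k=3):
    """Rank documents by cosine similarity (dot product of unit vectors)."""
    scores = doc_embs @ query_emb          # (n_docs,)
    order = np.argsort(-scores)[:k]        # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]

# Stand-in unit vectors in place of real 256-dim model embeddings
rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
query = unit(rng.normal(size=256))
docs = np.stack([unit(rng.normal(size=256)) for _ in range(5)])
results = top_k(query, docs)  # [(doc_index, score), ...] best-first
```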
## Performance
| Metric | Typelevel-BERT | Teacher (BGE-large) | % of Teacher |
|--------|----------------|---------------------|--------------|
| NDCG@10 | 0.853 | 0.915 | 93.3% |
| MRR | 0.900 | 0.963 | 93.5% |
| Recall@10 | 96.7% | 96.7% | 100% |
| Parameters | 11.2M | 335M | 3.3% |
| Model Size | 10.7 MB | ~1.2 GB | 0.9% |
| Latency (CPU) | 1.5ms | ~15ms | ~10% (10x faster) |
## Training
- **Teacher Model**: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
- **Training Data**: 30,598 text chunks from Typelevel ecosystem documentation
- **Distillation Method**: Knowledge distillation with MSE + cosine similarity loss
- **Hardware**: Apple M3 Max (MPS)
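The "MSE + cosine similarity loss" above can be sketched as follows. The equal weighting and the use of same-dimension inputs are assumptions for brevity; in practice a learned projection from the student's 256-dim space to the teacher's 1024-dim space (or vice versa) would be needed before comparing embeddings:

```python
import numpy as np

def distill_loss(student, teacher, alpha=0.5):
    """Combined MSE + cosine-distance distillation loss.

    alpha is an assumed weighting, not the value used in training.
    student, teacher: (batch, dim) embedding matrices of matching dimension.
    """
    mse = np.mean((student - teacher) ** 2)
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    cos_dist = 1.0 - np.mean(np.sum(s * t, axis=1))  # 1 - mean cosine similarity
    return alpha * mse + (1.0 - alpha) * cos_dist
```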
## Intended Use
This model is designed for:
- **Semantic search** in functional programming documentation
- **Document retrieval** for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
- **Browser-based inference** via transformers.js or ONNX Runtime Web
- **Client-side embeddings** for privacy-preserving search applications
## Limitations
1. **Domain Specialization**: Optimized for FP documentation; may underperform on general text
2. **English Only**: Trained exclusively on English documentation
3. **Vocabulary**: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized
## Files
| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 42.6 MB | PyTorch weights |
| `onnx/model.onnx` | 42.4 MB | Full precision ONNX |
| `onnx/model_quantized.onnx` | 10.7 MB | INT8 quantized ONNX |
| `config.json` | - | Model configuration |
| `tokenizer.json` | - | Fast tokenizer |
| `vocab.txt` | - | Vocabulary file |
## Citation
```bibtex
@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}
```
## License
MIT