---
license: mit
language:
- en
library_name: transformers
tags:
- sentence-transformers
- feature-extraction
- text-embeddings
- semantic-search
- onnx
- transformers.js
- bert
- knowledge-distillation
datasets:
- custom
pipeline_tag: feature-extraction
model-index:
- name: typelevel-bert
results:
- task:
type: retrieval
name: Document Retrieval
dataset:
type: custom
name: FP-Doc Benchmark v1
metrics:
- type: ndcg_at_10
value: 0.853
name: NDCG@10
- type: mrr
value: 0.900
name: MRR
- type: recall_at_10
value: 0.967
name: Recall@10
---
# Typelevel-BERT
A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) to achieve fast client-side inference.
## Highlights
- **93.3%** of teacher model quality (NDCG@10)
- **30x smaller** than teacher (11M vs 335M parameters)
- **10.7 MB** quantized ONNX model
- **1.5ms** inference latency (CPU, seq_len=128)
- Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation
## Model Details
| Property | Value |
|----------|-------|
| **Model Type** | BERT encoder (text embedding) |
| **Architecture** | 4-layer transformer |
| **Hidden Size** | 256 |
| **Attention Heads** | 4 |
| **Parameters** | 11.2M |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 |
| **Vocabulary** | bert-base-uncased (30,522 tokens) |
| **Pooling** | Mean pooling |
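The mean pooling listed above averages token embeddings while ignoring padding positions, using the attention mask as weights. A minimal NumPy sketch (with toy shapes; the released ONNX model applies this pooling internally):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings across the sequence, skipping padding."""
    # Broadcast mask (batch, seq_len, 1) against embeddings (batch, seq_len, hidden)
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Toy example: batch of 1, seq_len 3, hidden 2; last position is padding
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]] — padding token excluded
```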
## Usage
### Browser/Node.js (transformers.js)
```javascript
import { pipeline } from '@huggingface/transformers';
// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  dtype: 'q8', // Use INT8 quantized model (10.7 MB)
});
// Generate embeddings
const embedding = await extractor("How to sequence effects in cats-effect", {
pooling: 'mean',
normalize: true,
});
console.log(embedding.data); // Float32Array(256)
```
### Python (ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
# Download and load quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)
# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)
# Run inference
outputs = session.run(None, {
"input_ids": inputs["input_ids"].astype(np.int64),
"attention_mask": inputs["attention_mask"].astype(np.int64),
})
# The model already outputs mean-pooled embeddings; L2-normalize for cosine similarity
embedding = outputs[0] # (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
```
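Because the normalized embeddings are unit-length, cosine similarity reduces to a dot product, which makes ranking documents against a query trivial. A sketch using stand-in vectors (real usage would substitute 256-dim embeddings produced by the session above):

```python
import numpy as np

def rank(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by descending cosine similarity.
    Assumes all embeddings are already L2-normalized, so cosine
    similarity is just a dot product."""
    scores = doc_embs @ query_emb
    return np.argsort(-scores)

# Toy 2-dim unit vectors standing in for real 256-dim embeddings
query = np.array([1.0, 0.0])
docs = np.array([
    [0.6, 0.8],   # doc 0: partially aligned
    [1.0, 0.0],   # doc 1: same direction as query
    [0.0, 1.0],   # doc 2: orthogonal
])
print(rank(query, docs))  # [1 0 2]
```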
## Performance
| Metric | Typelevel-BERT | Teacher (BGE-large) | % of Teacher |
|--------|----------------|---------------------|--------------|
| NDCG@10 | 0.853 | 0.915 | 93.3% |
| MRR | 0.900 | 0.963 | 93.5% |
| Recall@10 | 96.7% | 96.7% | 100% |
| Parameters | 11.2M | 335M | 3.3% |
| Model Size | 10.7 MB | ~1.2 GB | 0.9% |
| Latency (CPU) | 1.5ms | ~15ms | 10x faster |
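For reference, the retrieval metrics in the table can be computed as follows, assuming binary relevance labels (an assumption; the benchmark's exact grading scheme isn't specified here):

```python
import math

def ndcg_at_k(ranked_rels, k=10):
    """NDCG@k over a ranked list of binary relevance labels (1 = relevant)."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_rels[:k]))
    ideal = sorted(ranked_rels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(list_of_ranked_rels):
    """Mean reciprocal rank of the first relevant result per query."""
    total = 0.0
    for rels in list_of_ranked_rels:
        for i, rel in enumerate(rels):
            if rel:
                total += 1.0 / (i + 1)
                break
    return total / len(list_of_ranked_rels)

# One query with its only relevant document at rank 2
print(ndcg_at_k([0, 1, 0]))  # ≈ 0.63
print(mrr([[0, 1, 0]]))      # 0.5
```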
## Training
- **Teacher Model**: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
- **Training Data**: 30,598 text chunks from Typelevel ecosystem documentation
- **Distillation Method**: Knowledge distillation with MSE + cosine similarity loss
- **Hardware**: Apple M3 Max (MPS)
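The "MSE + cosine similarity loss" above can be sketched as follows. Since the student's 256-dim embeddings and the teacher's 1024-dim embeddings live in different spaces, this sketch assumes a projection matrix aligning them (a standard choice, but hypothetical; the exact distillation setup is not documented here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical learned projection: student space (256) -> teacher space (1024)
W = rng.standard_normal((256, 1024)) * 0.01

def distill_loss(student_emb, teacher_emb, alpha=0.5):
    """Weighted sum of MSE and cosine-distance terms (sketch, not a trainer)."""
    projected = student_emb @ W
    mse = np.mean((projected - teacher_emb) ** 2)
    # Cosine distance: 1 minus mean cosine similarity across the batch
    p = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    cos = 1.0 - np.mean(np.sum(p * t, axis=1))
    return alpha * mse + (1.0 - alpha) * cos

student = rng.standard_normal((8, 256))
teacher = rng.standard_normal((8, 1024))
print(distill_loss(student, teacher))  # non-negative scalar
```

When the projected student embeddings match the teacher exactly, both terms vanish and the loss is zero.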
## Intended Use
This model is designed for:
- **Semantic search** in functional programming documentation
- **Document retrieval** for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
- **Browser-based inference** via transformers.js or ONNX Runtime Web
- **Client-side embeddings** for privacy-preserving search applications
## Limitations
1. **Domain Specialization**: Optimized for FP documentation; may underperform on general text
2. **English Only**: Trained exclusively on English documentation
3. **Vocabulary**: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized
## Files
| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 42.6 MB | PyTorch weights |
| `onnx/model.onnx` | 42.4 MB | Full precision ONNX |
| `onnx/model_quantized.onnx` | 10.7 MB | INT8 quantized ONNX |
| `config.json` | - | Model configuration |
| `tokenizer.json` | - | Fast tokenizer |
| `vocab.txt` | - | Vocabulary file |
## Citation
```bibtex
@misc{typelevel-bert,
title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
author={Daniel Spiewak},
year={2025},
url={https://huggingface.co/djspiewak/typelevel-bert}
}
```
## License
MIT