---
license: mit
language:
- en
library_name: transformers
tags:
- sentence-transformers
- feature-extraction
- text-embeddings
- semantic-search
- onnx
- transformers.js
- bert
- knowledge-distillation
datasets:
- custom
pipeline_tag: feature-extraction
model-index:
- name: typelevel-bert
  results:
  - task:
      type: retrieval
      name: Document Retrieval
    dataset:
      type: custom
      name: FP-Doc Benchmark v1
    metrics:
    - type: ndcg_at_10
      value: 0.853
      name: NDCG@10
    - type: mrr
      value: 0.900
      name: MRR
    - type: recall_at_10
      value: 0.967
      name: Recall@10
---

# Typelevel-BERT

A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) to achieve fast client-side inference.

## Highlights

- **93.3%** of teacher model quality (NDCG@10)
- **30x smaller** than teacher (11M vs 335M parameters)
- **10.7 MB** quantized ONNX model
- **1.5ms** inference latency (CPU, seq_len=128)
- Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation

## Model Details

| Property | Value |
|----------|-------|
| **Model Type** | BERT encoder (text embedding) |
| **Architecture** | 4-layer transformer |
| **Hidden Size** | 256 |
| **Attention Heads** | 4 |
| **Parameters** | 11.2M |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 |
| **Vocabulary** | bert-base-uncased (30,522 tokens) |
| **Pooling** | Mean pooling |
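
The ONNX exports return already-pooled sentence embeddings (see the Python example below), but if you load the PyTorch weights directly you apply mean pooling yourself. A minimal sketch, assuming the `model.safetensors` weights load as a standard `BertModel` via `transformers` (this loading path is an assumption, not part of the card's examples):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the PyTorch weights load as a plain BertModel.
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
model = AutoModel.from_pretrained("djspiewak/typelevel-bert")

inputs = tokenizer("Streaming with FS2", return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state   # (1, seq_len, 256)

# Mean pooling: average token embeddings, ignoring padding via the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()       # (1, seq_len, 1)
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)  # (1, 256), unit length
```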

## Usage

### Browser/Node.js (transformers.js)

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  quantized: true,  // Use INT8 quantized model (10.7 MB)
});

// Generate embeddings
const embedding = await extractor("How to sequence effects in cats-effect", {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data);  // Float32Array(256)
```

### Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model output is already mean-pooled; L2-normalize it for cosine similarity
embedding = outputs[0]  # (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
```
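
With normalized embeddings, ranking documents against a query is just a dot product. A small follow-on example reusing `session` and `tokenizer` from the snippet above (the mini-corpus and the `embed` helper are illustrative, not part of the model):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Embed one string with the ONNX session/tokenizer defined above."""
    enc = tokenizer(text, return_tensors="np", padding=True, truncation=True)
    out = session.run(None, {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
    })[0]
    return (out / np.linalg.norm(out, axis=1, keepdims=True))[0]

# Hypothetical documentation chunks; replace with your own corpus.
docs = [
    "Resource.make acquires a resource and guarantees its release",
    "Deriving Encoder and Decoder instances for case classes in Circe",
    "FS2 streams are pull-based and evaluated incrementally",
]
query = embed("How do I safely acquire and release a resource?")
scores = np.array([query @ embed(d) for d in docs])  # cosine similarity of unit vectors
print(docs[int(scores.argmax())])
```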

## Performance

| Metric | Typelevel-BERT | Teacher (BGE-large) | vs. Teacher |
|--------|----------------|---------------------|--------------|
| NDCG@10 | 0.853 | 0.915 | 93.3% |
| MRR | 0.900 | 0.963 | 93.5% |
| Recall@10 | 96.7% | 96.7% | 100% |
| Parameters | 11.2M | 335M | 3.3% |
| Model Size | 10.7 MB | ~1.2 GB | 0.9% |
| Latency (CPU) | 1.5ms | ~15ms | 10x faster |
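
Latency figures like the one above can be sanity-checked with a simple timing loop over the quantized ONNX session (this harness is an assumption, not the exact benchmark used; it reuses `session` and `tokenizer` from the Python usage example):

```python
import time
import numpy as np

# Build a fixed 128-token input, matching the seq_len used in the latency figure.
enc = tokenizer("How to sequence effects in cats-effect " * 16,
                return_tensors="np", padding="max_length",
                truncation=True, max_length=128)
feed = {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
}

for _ in range(10):            # warm-up
    session.run(None, feed)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, feed)
print(f"{(time.perf_counter() - start) / runs * 1000:.2f} ms per query")
```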

## Training

- **Teacher Model**: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
- **Training Data**: 30,598 text chunks from Typelevel ecosystem documentation
- **Distillation Method**: Knowledge distillation with MSE + cosine similarity loss (sketched after this list)
- **Hardware**: Apple M3 Max (MPS)
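
The training code is not bundled with this card, but the stated objective corresponds to penalizing both the absolute difference and the angular difference between student and teacher embeddings. A rough sketch (the loss weighting and the learned projection from the 1024-dim teacher space down to 256 dims are assumptions):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb, teacher_emb, projection, alpha=0.5):
    """MSE + cosine-similarity distillation loss (sketch only)."""
    teacher_proj = projection(teacher_emb)                 # (batch, 256)
    mse = F.mse_loss(student_emb, teacher_proj)
    cos = 1.0 - F.cosine_similarity(student_emb, teacher_proj, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos

# Example shapes: batch of 32, 1024-dim teacher, 256-dim student.
projection = torch.nn.Linear(1024, 256)
student_emb = torch.randn(32, 256)
teacher_emb = torch.randn(32, 1024)
loss = distillation_loss(student_emb, teacher_emb, projection)
```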

## Intended Use

This model is designed for:
- **Semantic search** in functional programming documentation
- **Document retrieval** for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
- **Browser-based inference** via transformers.js or ONNX Runtime Web
- **Client-side embeddings** for privacy-preserving search applications

## Limitations

1. **Domain Specialization**: Optimized for FP documentation; may underperform on general text
2. **English Only**: Trained exclusively on English documentation
3. **Vocabulary**: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized (see the quick check below)
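
A quick way to see the vocabulary effect is to tokenize a few library-specific identifiers with the bundled tokenizer; most will split into several WordPiece subwords (the terms below are arbitrary examples):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")

for term in ["Kleisli", "MonadCancel", "bracketCase", "hylomorphism"]:
    print(term, "->", tokenizer.tokenize(term))
```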

## Files

| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 42.6 MB | PyTorch weights |
| `onnx/model.onnx` | 42.4 MB | Full precision ONNX |
| `onnx/model_quantized.onnx` | 10.7 MB | INT8 quantized ONNX |
| `config.json` | - | Model configuration |
| `tokenizer.json` | - | Fast tokenizer |
| `vocab.txt` | - | Vocabulary file |

## Citation

```bibtex
@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}
```

## License

MIT