---
license: mit
language:
- en
library_name: transformers
tags:
- sentence-transformers
- feature-extraction
- text-embeddings
- semantic-search
- onnx
- transformers.js
- bert
- knowledge-distillation
datasets:
- custom
pipeline_tag: feature-extraction
model-index:
- name: typelevel-bert
  results:
  - task:
      type: retrieval
      name: Document Retrieval
    dataset:
      type: custom
      name: FP-Doc Benchmark v1
    metrics:
    - type: ndcg_at_10
      value: 0.853
      name: NDCG@10
    - type: mrr
      value: 0.900
      name: MRR
    - type: recall_at_10
      value: 0.967
      name: Recall@10
---

# Typelevel-BERT

A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) for fast client-side inference.

## Highlights

- **93.3%** of teacher model quality (NDCG@10)
- **30x smaller** than the teacher (11M vs. 335M parameters)
- **10.7 MB** quantized ONNX model
- **1.5 ms** inference latency (CPU, seq_len=128)
- Optimized for Cats, Cats Effect, FS2, http4s, Doobie, and Circe documentation

## Model Details

| Property | Value |
|----------|-------|
| **Model Type** | BERT encoder (text embedding) |
| **Architecture** | 4-layer transformer |
| **Hidden Size** | 256 |
| **Attention Heads** | 4 |
| **Parameters** | 11.2M |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 |
| **Vocabulary** | bert-base-uncased (30,522 tokens) |
| **Pooling** | Mean pooling |

## Usage

### Browser/Node.js (transformers.js)

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  quantized: true, // Use the INT8 quantized model (10.7 MB)
});

// Generate embeddings
const embedding = await extractor("How to sequence effects in cats-effect", {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data); // Float32Array(256)
```

### Python (ONNX Runtime)

```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load the quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model outputs pooled embeddings; just apply L2 normalization
embedding = outputs[0]  # (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
```

## Performance

| Metric | Typelevel-BERT | Teacher (BGE-large) | % of Teacher |
|--------|----------------|---------------------|--------------|
| NDCG@10 | 0.853 | 0.915 | 93.3% |
| MRR | 0.900 | 0.963 | 93.5% |
| Recall@10 | 96.7% | 96.7% | 100% |
| Parameters | 11.2M | 335M | 3.3% |
| Model Size | 10.7 MB | ~1.2 GB | 0.9% |
| Latency (CPU) | 1.5 ms | ~15 ms | 10x faster |

## Training

- **Teacher Model**: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
- **Training Data**: 30,598 text chunks from Typelevel ecosystem documentation
- **Distillation Method**: Knowledge distillation with a combined MSE + cosine similarity loss
- **Hardware**: Apple M3 Max (MPS)

## Intended Use

This model is designed for:

- **Semantic search** over functional programming documentation
- **Document retrieval** for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
- **Browser-based inference** via transformers.js or ONNX Runtime Web
- **Client-side embeddings** for privacy-preserving search applications

## Limitations

1. **Domain Specialization**: Optimized for FP documentation; may underperform on general text
2. **English Only**: Trained exclusively on English documentation
3. **Vocabulary**: Uses the bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized

## Files

| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 42.6 MB | PyTorch weights |
| `onnx/model.onnx` | 42.4 MB | Full precision ONNX |
| `onnx/model_quantized.onnx` | 10.7 MB | INT8 quantized ONNX |
| `config.json` | - | Model configuration |
| `tokenizer.json` | - | Fast tokenizer |
| `vocab.txt` | - | Vocabulary file |

## Citation

```bibtex
@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}
```

## License

MIT
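## Appendix: Similarity Search Sketch

Because the usage examples above produce L2-normalized embeddings, ranking documents against a query reduces to a dot product (which equals cosine similarity for unit vectors). A minimal, self-contained sketch in Python follows; the random vectors are toy stand-ins for real model outputs, and the `normalize`/`top_k` helpers are illustrative names, not part of this repository:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize rows so that a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def top_k(query: np.ndarray, docs: np.ndarray, k: int = 10) -> list:
    """Return indices of the k document embeddings most similar to the query."""
    scores = docs @ query  # cosine similarity, since all vectors are unit-length
    return np.argsort(-scores)[:k].tolist()

# Toy stand-ins for real embeddings (in practice: 256-dim model outputs)
rng = np.random.default_rng(0)
doc_embeddings = normalize(rng.standard_normal((100, 256)))

# A query identical to document 42 must rank that document first
query = doc_embeddings[42]
assert top_k(query, doc_embeddings, k=5)[0] == 42
```

In a real deployment the document embeddings would be computed once offline and shipped alongside the model, so only the query embedding is computed in the browser at search time.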
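For reference, the mean pooling named in the model details (and requested via `pooling: 'mean'` in the transformers.js example) averages token embeddings while ignoring padding positions via the attention mask. The ONNX export already applies this internally, but a sketch of the computation may help when working with raw token-level outputs; `mean_pool` is an illustrative helper, not an API of this repository:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling.

    token_embeddings: (batch, seq_len, dim) per-token hidden states
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)        # sum over real tokens
    counts = mask.sum(axis=1).clip(min=1e-9)              # avoid division by zero
    return summed / counts                                 # (batch, dim)

# Two real tokens (values 1 and 3) and two padding positions: mean is 2
emb = np.array([[[1.0, 1.0], [3.0, 3.0], [9.0, 9.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0, 0]])
print(mean_pool(emb, mask))  # [[2. 2.]]
```

Note that padding tokens must be excluded from both the sum and the count; naively averaging over the full sequence would skew embeddings toward zero for short inputs in a padded batch.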