---
license: mit
language:
- en
library_name: transformers
tags:
- sentence-transformers
- feature-extraction
- text-embeddings
- semantic-search
- onnx
- transformers.js
- bert
- knowledge-distillation
datasets:
- custom
pipeline_tag: feature-extraction
model-index:
- name: typelevel-bert
results:
- task:
type: retrieval
name: Document Retrieval
dataset:
type: custom
name: FP-Doc Benchmark v1
metrics:
- type: ndcg_at_10
value: 0.853
name: NDCG@10
- type: mrr
value: 0.900
name: MRR
- type: recall_at_10
value: 0.967
name: Recall@10
---
# Typelevel-BERT
A compact, browser-deployable text embedding model specialized for searching Typelevel/FP documentation. Distilled from [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) to achieve fast client-side inference.
## Highlights
- **93.3%** of teacher model quality (NDCG@10)
- **30x smaller** than teacher (11M vs 335M parameters)
- **10.7 MB** quantized ONNX model
- **1.5ms** inference latency (CPU, seq_len=128)
- Optimized for Cats, Cats Effect, FS2, http4s, Doobie, Circe documentation
## Model Details
| Property | Value |
|----------|-------|
| **Model Type** | BERT encoder (text embedding) |
| **Architecture** | 4-layer transformer |
| **Hidden Size** | 256 |
| **Attention Heads** | 4 |
| **Parameters** | 11.2M |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 |
| **Vocabulary** | bert-base-uncased (30,522 tokens) |
| **Pooling** | Mean pooling |
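The mean pooling listed above averages token embeddings while ignoring padding positions via the attention mask. A minimal numpy sketch (shapes are illustrative; this is not the model's internal code):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over the sequence, skipping padded positions."""
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero
    return summed / counts

# Illustrative shapes: batch=1, seq_len=4, hidden=256 (matching the embedding dim)
tokens = np.random.rand(1, 4, 256).astype(np.float32)
mask = np.array([[1, 1, 1, 0]])  # last position is padding
pooled = mean_pool(tokens, mask)  # shape (1, 256)
```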
## Usage
### Browser/Node.js (transformers.js)
```javascript
import { pipeline } from '@huggingface/transformers';

// Load the model (downloads automatically)
const extractor = await pipeline('feature-extraction', 'djspiewak/typelevel-bert', {
  quantized: true, // Use INT8 quantized model (10.7 MB)
});

// Generate embeddings
const embedding = await extractor('How to sequence effects in cats-effect', {
  pooling: 'mean',
  normalize: true,
});

console.log(embedding.data); // Float32Array(256)
```
### Python (ONNX Runtime)
```python
import onnxruntime as ort
import numpy as np
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load the quantized model
model_path = hf_hub_download("djspiewak/typelevel-bert", "onnx/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained("djspiewak/typelevel-bert")
session = ort.InferenceSession(model_path)

# Tokenize input
text = "Resource management and safe cleanup"
inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})

# The model outputs mean-pooled embeddings; L2-normalize them for cosine similarity
embedding = outputs[0]  # shape (1, 256)
embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
```
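Because the embeddings are L2-normalized, cosine similarity reduces to a dot product, which makes ranking documents against a query a single matrix multiply. A sketch, using random unit vectors as stand-ins for real model outputs:

```python
import numpy as np

def top_k(query_emb, doc_embs, k=3):
    """Rank documents by cosine similarity (dot product of unit vectors)."""
    scores = doc_embs @ query_emb          # (n_docs,)
    order = np.argsort(-scores)[:k]        # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in order]

# Stand-in unit vectors in place of real 256-dim model embeddings
rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
query = unit(rng.normal(size=256))
docs = np.stack([unit(rng.normal(size=256)) for _ in range(5)])
results = top_k(query, docs)  # [(doc_index, score), ...] best-first
```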
## Performance
| Metric | Typelevel-BERT | Teacher (BGE-large) | % of Teacher |
|--------|----------------|---------------------|--------------|
| NDCG@10 | 0.853 | 0.915 | 93.3% |
| MRR | 0.900 | 0.963 | 93.5% |
| Recall@10 | 96.7% | 96.7% | 100% |
| Parameters | 11.2M | 335M | 3.3% |
| Model Size | 10.7 MB | ~1.2 GB | 0.9% |
| Latency (CPU) | 1.5ms | ~15ms | ~10% (10x faster) |
## Training
- **Teacher Model**: BAAI/bge-large-en-v1.5 (335M parameters, 1024-dim embeddings)
- **Training Data**: 30,598 text chunks from Typelevel ecosystem documentation
- **Distillation Method**: Knowledge distillation with MSE + cosine similarity loss
- **Hardware**: Apple M3 Max (MPS)
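The "MSE + cosine similarity loss" above can be sketched as follows. The equal weighting and the use of same-dimension inputs are assumptions for brevity; in practice a learned projection from the student's 256-dim space to the teacher's 1024-dim space (or vice versa) would be needed before comparing embeddings:

```python
import numpy as np

def distill_loss(student, teacher, alpha=0.5):
    """Combined MSE + cosine-distance distillation loss.

    alpha is an assumed weighting, not the value used in training.
    student, teacher: (batch, dim) embedding matrices of matching dimension.
    """
    mse = np.mean((student - teacher) ** 2)
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    cos_dist = 1.0 - np.mean(np.sum(s * t, axis=1))  # 1 - mean cosine similarity
    return alpha * mse + (1.0 - alpha) * cos_dist
```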
## Intended Use
This model is designed for:
- **Semantic search** in functional programming documentation
- **Document retrieval** for Typelevel ecosystem libraries (Cats, Cats Effect, FS2, http4s, Doobie, Circe)
- **Browser-based inference** via transformers.js or ONNX Runtime Web
- **Client-side embeddings** for privacy-preserving search applications
## Limitations
1. **Domain Specialization**: Optimized for FP documentation; may underperform on general text
2. **English Only**: Trained exclusively on English documentation
3. **Vocabulary**: Uses bert-base-uncased vocabulary; some FP-specific terms may be suboptimally tokenized
## Files
| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 42.6 MB | PyTorch weights |
| `onnx/model.onnx` | 42.4 MB | Full precision ONNX |
| `onnx/model_quantized.onnx` | 10.7 MB | INT8 quantized ONNX |
| `config.json` | - | Model configuration |
| `tokenizer.json` | - | Fast tokenizer |
| `vocab.txt` | - | Vocabulary file |
## Citation
```bibtex
@misc{typelevel-bert,
  title={Typelevel-BERT: Distilled Text Embeddings for FP Documentation Search},
  author={Daniel Spiewak},
  year={2025},
  url={https://huggingface.co/djspiewak/typelevel-bert}
}
```
## License
MIT