---
language:
- en
license: mit
library_name: gobed
tags:
- embeddings
- semantic-search
- int8
- quantized
- static-embeddings
- sentence-embeddings
pipeline_tag: sentence-similarity
---
# Bed - Int8 Quantized Static Embeddings for Semantic Search
Ultra-fast static embedding model with int8 quantization for semantic search, optimized for the [gobed](https://github.com/lee101/gobed) Go library.
## Model Details
| Property | Value |
|----------|-------|
| **Dimensions** | 512 |
| **Precision** | int8 + scale factors |
| **Vocabulary** | 30,522 tokens |
| **Model Size** | 15 MB |
| **Format** | safetensors |
## Performance
- **Embedding latency**: 0.16ms average
- **Throughput**: 6,200+ embeddings/sec
- **Memory**: 15 MB (7.9x smaller than the float32 version)
- **Compression ratio**: 87.4% space reduction vs original
## Usage with gobed (Go)
```bash
go get github.com/lee101/gobed
```
```go
package main

import (
    "fmt"
    "log"

    "github.com/lee101/gobed"
)

func main() {
    // Create a search engine; the model is loaded automatically
    engine, err := gobed.NewAutoSearchEngine()
    if err != nil {
        log.Fatal(err)
    }
    defer engine.Close()

    // Index a few documents keyed by ID
    docs := map[string]string{
        "doc1": "machine learning and neural networks",
        "doc2": "natural language processing",
    }
    engine.AddDocuments(docs)

    // Retrieve the top 3 matches for the query
    // (metadata and error return values omitted for brevity)
    results, _, _ := engine.SearchWithMetadata("AI research", 3)
    for _, r := range results {
        fmt.Printf("[%.3f] %s\n", r.Similarity, r.Content)
    }
}
```
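Each result carries a `Similarity` score. This card does not specify the scoring function, but embedding search engines typically rank by cosine similarity between the query and document vectors; a minimal reference implementation of that (an assumption, not gobed's confirmed internals):
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for query and document embeddings
query = np.array([0.1, 0.7, 0.2], dtype=np.float32)
doc = np.array([0.2, 0.6, 0.1], dtype=np.float32)
print(cosine_similarity(query, doc))  # close to 1.0 for similar vectors
```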
## Download Model Manually
```bash
# Clone the model repository
git clone https://huggingface.co/lee101/bed
# Or download specific files
wget https://huggingface.co/lee101/bed/resolve/main/modelint8_512dim.safetensors
wget https://huggingface.co/lee101/bed/resolve/main/tokenizer.json
```
## Using huggingface_hub (Python)
```python
from huggingface_hub import hf_hub_download
# Download model file
model_path = hf_hub_download(repo_id="lee101/bed", filename="modelint8_512dim.safetensors")
# Download tokenizer
tokenizer_path = hf_hub_download(repo_id="lee101/bed", filename="tokenizer.json")
```
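To sanity-check a download, you can open the file with the `safetensors` package and list its contents (a sketch; the tensor names and shapes depend on how the file was exported):
```python
from huggingface_hub import hf_hub_download
from safetensors.numpy import load_file

model_path = hf_hub_download(repo_id="lee101/bed", filename="modelint8_512dim.safetensors")
tensors = load_file(model_path)
for name, arr in tensors.items():
    print(name, arr.dtype, arr.shape)
```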
## Model Architecture
This model uses static embeddings with int8 quantization:
- **Embedding layer**: 30,522 x 512 int8 weights
- **Scale factors**: 30,522 float32 scale values (one per token)
- **Tokenizer**: WordPiece tokenizer (same as BERT)
Embeddings are computed by:
1. Tokenizing input text
2. Looking up int8 embeddings for each token
3. Multiplying by scale factors to reconstruct float values
4. Mean pooling across tokens
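A minimal NumPy sketch of steps 2-4 (the array names here are illustrative, not the tensor names inside the safetensors file):
```python
import numpy as np

def embed(token_ids, weights_int8, scales):
    """weights_int8: (vocab, 512) int8; scales: (vocab,) float32."""
    rows = weights_int8[token_ids].astype(np.float32)  # look up int8 rows
    rows *= scales[token_ids][:, None]                 # rescale per token
    return rows.mean(axis=0)                           # mean pool -> (512,)

# Toy example with a 2-token vocabulary and 2 dimensions
weights = np.array([[127, -64], [32, 96]], dtype=np.int8)
scales = np.array([0.01, 0.02], dtype=np.float32)
print(embed([0, 1], weights, scales))
```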
## Quantization Details
- Original model: 30,522 x 1024 float32 (119 MB)
- Quantized model: 30,522 x 512 int8 + 30,522 float32 scales (15 MB)

The roughly 8x size reduction comes from two changes: the dimensionality is halved (1024 to 512) and each weight is stored in 1 byte instead of 4.
Per-vector quantization preserves relative magnitudes:
```python
import numpy as np

# embedding_vector: one float32 row of the original embedding matrix
max_abs = np.max(np.abs(embedding_vector))
scale = max_abs / 127.0  # map the largest magnitude onto the int8 extreme
quantized = np.round(embedding_vector / scale).astype(np.int8)
```
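Dequantization is the inverse (multiply by the scale), and the per-element reconstruction error is bounded by half a quantization step. A quick self-contained check:
```python
import numpy as np

rng = np.random.default_rng(0)
embedding_vector = rng.standard_normal(512).astype(np.float32)

scale = np.max(np.abs(embedding_vector)) / 127.0
quantized = np.round(embedding_vector / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

# Rounding error is at most scale / 2 per element
print(np.max(np.abs(dequantized - embedding_vector)) <= scale / 2 + 1e-6)
```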
## Files
- `modelint8_512dim.safetensors` - Quantized embeddings and scales
- `tokenizer.json` - HuggingFace tokenizer
## License
MIT License - see [gobed repository](https://github.com/lee101/gobed) for details.
## Citation
```bibtex
@software{gobed,
  author = {Lee Penkman},
  title = {gobed: Ultra-Fast Semantic Search for Go},
  url = {https://github.com/lee101/gobed},
  year = {2024}
}
```