---
language:
  - en
license: mit
library_name: gobed
tags:
  - embeddings
  - semantic-search
  - int8
  - quantized
  - static-embeddings
  - sentence-embeddings
pipeline_tag: sentence-similarity
---

# Bed - Int8 Quantized Static Embeddings for Semantic Search

Ultra-fast, int8-quantized static embedding model for semantic search. Optimized for the [gobed](https://github.com/lee101/gobed) Go library.

## Model Details

| Property | Value |
|----------|-------|
| **Dimensions** | 512 |
| **Precision** | int8 + scale factors |
| **Vocabulary** | 30,522 tokens |
| **Model Size** | 15 MB |
| **Format** | safetensors |

## Performance

- **Embedding latency**: 0.16ms average
- **Throughput**: 6,200+ embeddings/sec
- **Memory**: 15 MB (7.9x smaller than float32 version)
- **Compression ratio**: 87.4% space reduction vs original

## Usage with gobed (Go)

```bash
go get github.com/lee101/gobed
```

```go
package main

import (
    "fmt"
    "log"
    "github.com/lee101/gobed"
)

func main() {
    engine, err := gobed.NewAutoSearchEngine()
    if err != nil {
        log.Fatal(err)
    }
    defer engine.Close()

    docs := map[string]string{
        "doc1": "machine learning and neural networks",
        "doc2": "natural language processing",
    }
    engine.AddDocuments(docs)

    results, _, _ := engine.SearchWithMetadata("AI research", 3)
    for _, r := range results {
        fmt.Printf("[%.3f] %s\n", r.Similarity, r.Content)
    }
}
```

## Download Model Manually

```bash
# Clone the model repository
git clone https://huggingface.co/lee101/bed

# Or download specific files
wget https://huggingface.co/lee101/bed/resolve/main/modelint8_512dim.safetensors
wget https://huggingface.co/lee101/bed/resolve/main/tokenizer.json
```

## Using huggingface_hub (Python)

```python
from huggingface_hub import hf_hub_download

# Download model file
model_path = hf_hub_download(repo_id="lee101/bed", filename="modelint8_512dim.safetensors")

# Download tokenizer
tokenizer_path = hf_hub_download(repo_id="lee101/bed", filename="tokenizer.json")
```

## Model Architecture

This model uses static embeddings with int8 quantization:

- **Embedding layer**: 30,522 x 512 int8 weights
- **Scale factors**: 30,522 float32 scale values (one per token)
- **Tokenizer**: WordPiece tokenizer (same as BERT)

Embeddings are computed by:
1. Tokenizing input text
2. Looking up int8 embeddings for each token
3. Multiplying by scale factors to reconstruct float values
4. Mean pooling across tokens
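The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the gobed implementation: the tiny random table and scales stand in for the real 30,522 x 512 tensors, and token IDs are assumed to come from the WordPiece tokenizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real tensors (illustrative shapes, not 30,522 x 512).
vocab_size, dim = 100, 8
int8_table = rng.integers(-127, 128, size=(vocab_size, dim), dtype=np.int8)
scales = rng.uniform(0.01, 0.1, size=vocab_size).astype(np.float32)  # one scale per token

def embed(token_ids):
    """Dequantize the int8 rows for each token, then mean-pool into one vector."""
    rows = int8_table[token_ids].astype(np.float32)  # step 2: look up int8 embeddings
    rows *= scales[token_ids, None]                  # step 3: rescale to float values
    return rows.mean(axis=0)                         # step 4: mean pooling across tokens

vec = embed([3, 17, 42])
print(vec.shape)  # (8,)
```

Because the table is static (no transformer forward pass), each embedding is just an index lookup, a multiply, and a mean, which is what makes sub-millisecond latency possible.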

## Quantization Details

- Original model: 30,522 x 1024 float32 (119 MB)
- Quantized model: 30,522 x 512 int8 + 30,522 float32 scales (15 MB)

The 7.9x size reduction combines halving the dimensionality (1024 to 512) with the 4x savings of int8 over float32, less the small overhead of the per-vector scales.

Per-vector quantization preserves relative magnitudes:
```python
import numpy as np

# embedding_vector: one float32 row of the original embedding table
max_abs = np.abs(embedding_vector).max()
scale = max_abs / 127.0
quantized = np.round(embedding_vector / scale).astype(np.int8)
```
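A self-contained round trip shows why this scheme loses so little precision: since values are rounded to the nearest of 255 levels, the reconstruction error of any component is at most half a quantization step, i.e. `scale / 2`. The 512-dim Gaussian vector below is synthetic, used only to demonstrate the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(512).astype(np.float32)  # hypothetical float32 embedding row

scale = np.abs(v).max() / 127.0                  # per-vector scale factor
q = np.round(v / scale).astype(np.int8)          # int8 weights as stored in the model
v_hat = q.astype(np.float32) * scale             # dequantized reconstruction

max_err = np.abs(v - v_hat).max()
print(max_err)  # bounded by scale / 2, the half-step rounding error
```

For cosine-similarity search this small per-component error barely moves the ranking, which is why the quantized model can stand in for the float32 original.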

## Files

- `modelint8_512dim.safetensors` - Quantized embeddings and scales
- `tokenizer.json` - WordPiece tokenizer (Hugging Face `tokenizers` format)

## License

MIT License - see [gobed repository](https://github.com/lee101/gobed) for details.

## Citation

```bibtex
@software{gobed,
  author = {Lee Penkman},
  title = {gobed: Ultra-Fast Semantic Search for Go},
  url = {https://github.com/lee101/gobed},
  year = {2024}
}
```