PipeOwl
A transformer-free semantic retrieval engine.
PipeOwl performs deterministic vocabulary scoring over a static embedding field:
score = α⋅base + β⋅Δfield
where:
Features:
| item | value |
|---|---|
| vocab size | 495,090 |
| embedding dim | 1024 |
| storage format | safetensors |
| model size | ~2.03 GB |
| languages | multilingual (Chinese / English dominant) |
| startup time | ~30s |
| query latency | ~103-104 ms |
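As a sanity check on the table, the reported model size is consistent with storing one 1024-dimensional vector per vocabulary entry as 32-bit floats. The float32 storage type is an assumption, not something stated here; the sketch below only does the arithmetic.

```python
# Back-of-the-envelope check of the reported model size.
# ASSUMPTION (not stated in the source): embeddings are stored as float32.
VOCAB_SIZE = 495_090     # from the table
EMBEDDING_DIM = 1024     # from the table
BYTES_PER_VALUE = 4      # float32, assumed

size_bytes = VOCAB_SIZE * EMBEDDING_DIM * BYTES_PER_VALUE
size_gb = size_bytes / 1e9  # decimal gigabytes

print(f"{size_bytes:,} bytes = {size_gb:.2f} GB")
# prints: 2,027,888,640 bytes = 2.03 GB
```

If the file holds additional tensors or uses a different dtype, the real layout will differ; the match with ~2.03 GB is suggestive only.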
git clone https://huggingface.co/WangKaiLin/PipeOwl-1.3-multilingual
cd PipeOwl-1.3-multilingual
pip install numpy safetensors
python quickstart.py
https://hackmd.io/@galaxy4552/SyWQ92cFWx
Please enter words: 雪鴞
Top-K Tokens:
1.004 | 雪鴞
0.823 | 鴟鴞
0.820 | 鴞
0.700 | 長耳鴞
0.686 | 雪橇
Please enter words: happy
Top-K Tokens:
0.998 | happy
0.888 | happiness
0.863 | heureux
0.857 | happyness
0.854 | gelukkig
pipeowl-1.3-multilingual/
├ README.md
├ config.json
├ DATA_SOURCES.md
├ LICENSE
├ quickstart.py
├ engine.py
├ vocabulary.json
└ pipeowl.safetensors
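The tensor names stored in `pipeowl.safetensors` are not listed here, so it can help to inspect the file before loading it. A minimal sketch, using only the standard library and the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `parse_header`, `read_header`, and `describe_tensors` are hypothetical and not part of this repository:

```python
import json
import struct


def parse_header(raw: bytes) -> dict:
    """Decode the JSON header from the leading bytes of a safetensors file."""
    # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
    (header_len,) = struct.unpack("<Q", raw[:8])
    return json.loads(raw[8:8 + header_len].decode("utf-8"))


def read_header(path: str) -> dict:
    """Read only the header of a safetensors file, not the tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(header_len))


def describe_tensors(header: dict) -> dict:
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `describe_tensors(read_header("pipeowl.safetensors"))` from the repository root should list each stored tensor with its dtype and shape without reading the roughly 2 GB of tensor data into memory.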
PipeOwl-1.3 uses a mixed multilingual vocabulary containing:
Total vocabulary size: 495k tokens
All tokens share the same embedding field.
Core formula:
score = α⋅base + β⋅Δfield
This provides a lightweight O(n) semantic scoring method suited to low-latency environments such as input method editors.
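The linear combination and the single pass over the vocabulary can be sketched as follows. This is an illustration under stated assumptions, not the code in `engine.py`: the definitions of `base` and `Δfield` are not given here, so the sketch assumes `base` is cosine similarity to the query vector and `Δfield` is one scalar adjustment per token. The function names, the default `alpha` and `beta` values, and the toy data are all hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, vocab, embeddings, delta_field, alpha=1.0, beta=0.1, k=5):
    """Score every token once and return the k best (score, token) pairs.

    ASSUMPTIONS, not from the source: `base` is cosine similarity to the
    query vector, and `delta_field` holds one scalar adjustment per token.
    """
    scored = (
        (alpha * cosine(query_vec, emb) + beta * delta, token)
        for token, emb, delta in zip(vocab, embeddings, delta_field)
    )
    # One pass over the n vocabulary entries; the heap only ever holds k items.
    return heapq.nlargest(k, scored)


# Toy data: three tokens with 2-dimensional embeddings (hypothetical values).
vocab = ["happy", "happiness", "sad"]
embeddings = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0]]
delta_field = [0.0, 0.5, 0.0]

results = top_k([1.0, 0.0], vocab, embeddings, delta_field, k=2)
for score, token in results:
    print(f"{score:.3f} | {token}")
```

Because each token is scored independently with no model forward pass, the cost grows linearly with vocabulary size. A practical version would vectorize the loop with NumPy over the full embedding matrix.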
License: MIT