PipeOwl
This release introduces FP16 storage to reduce model size and startup time.
A transformer-free semantic retrieval engine.
PipeOwl performs deterministic vocabulary scoring over a static embedding field:
score = α⋅base + β⋅Δfield
where:
Features:
| item | value |
|---|---|
| vocab size | 495,090 |
| embedding dim | 1024 |
| storage format | safetensors (FP16) |
| model size | ~1.01 GB |
| languages | multilingual (Chinese / English dominant) |
| startup time | ~2s |
| query latency | ~101-105 ms (CPU) |
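As a rough sanity check on the table above, the reported model size matches what a single FP16 embedding matrix of `vocab size × embedding dim` would occupy. The sketch below assumes the file is dominated by one such matrix and ignores the small safetensors header. The FP32 line is included only for comparison; the source does not state which format earlier releases used.

```python
# Figures taken from the spec table above.
VOCAB_SIZE = 495_090
EMBEDDING_DIM = 1024

# Bytes per stored value for each floating-point format.
BYTES_PER_VALUE = {"fp16": 2, "fp32": 4}


def matrix_bytes(dtype):
    """Size in bytes of one vocab-by-dim embedding matrix (assumed layout)."""
    return VOCAB_SIZE * EMBEDDING_DIM * BYTES_PER_VALUE[dtype]


for dtype in ("fp16", "fp32"):
    print(f"{dtype}: {matrix_bytes(dtype) / 1e9:.2f} GB")
# fp16: 1.01 GB
# fp32: 2.03 GB
```

Under these assumptions, halving the bytes per value halves the file size, which is consistent with the ~1.01 GB figure in the table.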
```shell
git clone https://huggingface.co/WangKaiLin/PipeOwl-1.4-multilingual
cd PipeOwl-1.4-multilingual
pip install numpy safetensors
python quickstart.py
```
https://hackmd.io/@galaxy4552/SyWQ92cFWx
```
Please enter words: 雪鴞
Top-K Tokens:
1.004 | 雪鴞
0.823 | 鴟鴞
0.820 | 鴞
0.700 | 長耳鴞
0.686 | 雪橇

Please enter words: happy
Top-K Tokens:
0.998 | happy
0.888 | happiness
0.863 | heureux
0.857 | happyness
0.854 | gelukkig
```
```
pipeowl-1.4-multilingual/
├ README.md
├ config.json
├ DATA_SOURCES.md
├ LICENSE
├ quickstart.py
├ engine.py
├ vocabulary.json
└ pipeowl_fp16.safetensors
```
PipeOwl-1.4 uses a mixed multilingual vocabulary containing:
Total vocabulary size: 495,090 tokens (~495k)
All tokens share the same embedding field.
Core formula:
score = α⋅base + β⋅Δfield
where:
This provides a lightweight O(n) semantic scoring method, suited to low-latency environments such as input method editors.
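To make the linear-scan idea concrete, here is a minimal pure-Python sketch of scoring every vocabulary token once with `score = α⋅base + β⋅Δfield` and keeping the top k. It is not taken from `engine.py`. The definitions of `base` and `Δfield` are not given here, so the sketch assumes `base` is the cosine similarity between the query and token vectors and `Δfield` is a precomputed per-token scalar. The names `cosine`, `top_k`, and `delta_field` are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, vocab, embeddings, delta_field, alpha=1.0, beta=0.0, k=5):
    """Score each token in one pass over the vocabulary and return the k best.

    Assumptions (not confirmed by the PipeOwl source):
      - base is the cosine similarity of the query and token vectors
      - delta_field holds one precomputed scalar adjustment per token
    """
    scored = (
        (alpha * cosine(query_vec, emb) + beta * delta, token)
        for token, emb, delta in zip(vocab, embeddings, delta_field)
    )
    # nlargest keeps only k candidates in memory while scanning all n tokens.
    return heapq.nlargest(k, scored)


# Toy data for illustration only; real vectors would have 1024 dimensions.
vocab = ["owl", "bird", "sled"]
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
delta_field = [0.0, 0.05, 0.0]

for score, token in top_k([1.0, 0.0], vocab, embeddings, delta_field,
                          alpha=1.0, beta=1.0, k=2):
    print(f"{score:.3f} | {token}")
```

Each token is visited exactly once, so the work grows linearly with vocabulary size. A practical implementation would presumably vectorize the same computation with NumPy over the FP16 matrix rather than loop in Python.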
MIT