PipeOwl
A transformer-free semantic retrieval engine.
PipeOwl performs deterministic vocabulary scoring over a static embedding field:
score = α⋅base + β⋅Δfield

where α and β are fixed weighting coefficients, base is a token's static vocabulary score, and Δfield is its adjustment from the embedding field.
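A minimal sketch of this scoring rule in NumPy. The function and parameter names (`combine_scores`, `alpha`, `beta`, `base_scores`, `field_delta`) and the default weights are illustrative assumptions, not PipeOwl's actual API:

```python
import numpy as np

def combine_scores(base_scores: np.ndarray, field_delta: np.ndarray,
                   alpha: float = 0.7, beta: float = 0.3) -> np.ndarray:
    """Linearly blend a static base score with a per-token field adjustment.

    Implements score = alpha * base + beta * delta_field element-wise.
    The 0.7 / 0.3 defaults are made up for this sketch.
    """
    return alpha * base_scores + beta * field_delta

# Toy example: three tokens with hypothetical base and delta values.
base = np.array([0.9, 0.2, 0.5])
delta = np.array([0.1, 0.8, 0.5])
scores = combine_scores(base, delta)
print(scores)  # [0.66 0.38 0.5 ]
```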
Features:
| Item | Value |
|---|---|
| vocab size | 26155 |
| embedding dim | 768 |
| storage format | safetensors (FP16) |
| model size | ~38.7 MB |
| languages | Japanese |
| startup time | <1 s |
| query latency | ~3–4 ms (CPU, full vocabulary scan) |
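A quick back-of-envelope check that the FP16 embedding table accounts for nearly all of the reported ~38.7 MB model size (26155 tokens × 768 dims × 2 bytes per FP16 value):

```python
# Size of the raw FP16 embedding table; numbers come from the feature table above.
vocab_size, dim, fp16_bytes = 26155, 768, 2
table_mb = vocab_size * dim * fp16_bytes / (1024 ** 2)
print(f"{table_mb:.1f} MB")  # ~38.3 MB; the remainder is vocabulary/metadata
```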
Quickstart:

```bash
git clone https://huggingface.co/WangKaiLin/PipeOwl-1.5-jp
cd PipeOwl-1.5-jp
pip install numpy safetensors
python quickstart.py
```
Example semantic retrieval results:

```text
Please enter words: 日
Top-K Tokens:
0.974 | 日
0.794 | 日の
0.789 | 翌日
0.777 | 週
0.775 | 週間

Please enter words: 行
Top-K Tokens:
0.961 | 行
0.794 | 行こ
0.787 | 執り行
0.787 | 入
0.784 | 起

Please enter words: 東京
Top-K Tokens:
0.979 | 東京
0.872 | 大阪
0.868 | 名古屋
0.849 | 横浜
0.848 | 目黒
```
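Rankings like these come from an exact similarity scan over every vocabulary row, with no ANN index. A minimal sketch with a toy vocabulary and random embeddings (both made up for illustration; the real table lives in `pipeowl_fp16.safetensors`):

```python
import numpy as np

# Toy stand-in for the vocabulary and FP16 embedding table.
rng = np.random.default_rng(0)
vocab = ["東京", "大阪", "名古屋", "横浜", "目黒", "日", "週"]
emb = rng.standard_normal((len(vocab), 8)).astype(np.float16)

# Cast to FP32 for math and unit-normalize once, so a dot product is cosine similarity.
emb32 = emb.astype(np.float32)
emb32 /= np.linalg.norm(emb32, axis=1, keepdims=True)

def top_k(query_token: str, k: int = 5):
    """Score every vocabulary row against the query token, return the best k."""
    q = emb32[vocab.index(query_token)]
    sims = emb32 @ q                  # full-vocabulary scan
    order = np.argsort(-sims)[:k]     # exact top-k, highest similarity first
    return [(float(sims[i]), vocab[i]) for i in order]

for score, tok in top_k("東京", k=3):
    print(f"{score:.3f} | {tok}")    # the query token itself ranks first at ~1.0
```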
Environment:
Average query latency, relative to baselines:
| Comparison | Speedup |
|---|---|
| vs BM25 | 11.7× faster |
| vs Embedding | 7.9× faster |
| vs FAISS Flat | 9.0× faster |
| vs FAISS HNSW | 6.4× faster |
PipeOwl shows 6–12× lower latency compared with common retrieval baselines in this setup.
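These figures are machine- and setup-dependent. A rough way to probe the raw scan cost at PipeOwl's stated scale (26155 × 768, FP32 math) on your own CPU; this is an illustrative sketch, not the benchmark harness used above:

```python
import time
import numpy as np

# Random matrix at the model's stated scale; absolute timings vary by machine.
emb = np.random.default_rng(1).standard_normal((26155, 768)).astype(np.float32)
q = emb[0]

emb @ q  # warm-up so BLAS setup isn't timed

reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    sims = emb @ q               # one full-vocabulary scan per iteration
elapsed_ms = (time.perf_counter() - t0) / reps * 1e3
print(f"{elapsed_ms:.2f} ms per scan")
```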
Benchmark dataset: https://huggingface.co/datasets/WangKaiLin/pipeowl-1.5-jp-benchmark
Repository layout:

```text
pipeowl-1.5-jp/
├ README.md
├ config.json
├ DATA_SOURCES.md
├ LICENSE
├ quickstart.py
├ engine.py
├ vocabulary.json
└ pipeowl_fp16.safetensors
```
License: MIT