PipeOwl
A transformer-free semantic retrieval engine.
PipeOwl performs deterministic vocabulary scoring over a static embedding field:
score = α⋅base + (1 - α⋅base)⋅Δfield
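where `base` and `Δfield` are per-token values (shown as `base` and `delta` in the debug output below) and `α` weights the base term.

Token NLL: 12.943284891453972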
Features:
| item | value |
|---|---|
| vocab size | 26155 |
| embedding dim | 256 |
| storage format | safetensors (FP16) |
| model size | ~13.2 MB |
| languages | Japanese |
| startup time | <1s |
| query latency | ~1 ms (CPU, full vocabulary scan) |
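The scoring rule and the full vocabulary scan can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in `engine.py`: it assumes `base` is the cosine similarity between the query vector and each vocabulary vector, that `Δfield` is one scalar per token, and that `α = 1`. The names `score_vocabulary` and `top_k` are hypothetical, and the embeddings are random stand-ins with the shape given in the table.

```python
import numpy as np

def score_vocabulary(query_vec, emb, delta, alpha=1.0):
    """Score every vocabulary row against one query vector.

    Assumption: `base` is cosine similarity; `delta` is a per-token scalar.
    """
    emb32 = emb.astype(np.float32)            # upcast FP16 storage for the math
    q = query_vec.astype(np.float32)
    emb_unit = emb32 / np.linalg.norm(emb32, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    base = emb_unit @ q_unit                  # one similarity per vocabulary entry
    # score = alpha*base + (1 - alpha*base)*delta, as in the formula above
    return alpha * base + (1.0 - alpha * base) * delta

def top_k(scores, k=5):
    """Return the indices of the k highest scores, best first."""
    idx = np.argpartition(-scores, k - 1)[:k]  # unordered top-k
    return idx[np.argsort(-scores[idx])]       # sort only those k

# Random stand-in data with the shape from the table (not the real weights).
rng = np.random.default_rng(0)
emb = rng.standard_normal((26155, 256)).astype(np.float16)
delta = rng.uniform(0.0, 0.9, size=26155).astype(np.float32)

scores = score_vocabulary(emb[42], emb, delta)
best = top_k(scores, k=5)
print(best, scores[best])
```

Because the scan is a single matrix-vector product over 26155 rows, it needs no index structure, and the same query always yields the same ranking.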
Quickstart:

```
git clone https://huggingface.co/WangKaiLin/Pipeowl-1.8.3-jp-Whitebox
cd Pipeowl-1.8.3-jp-Whitebox
pip install numpy safetensors
python debug.py
```
Example semantic retrieval results:
```
Please enter words: 東京
Top-K Debug:
1 東京 | base=1.000 | delta=0.478 | final=1.000
2 は | base=-0.294 | delta=0.907 | final=0.880
3 大阪 | base=0.679 | delta=0.346 | final=0.790
4 パリ | base=0.597 | delta=0.419 | final=0.766
5 名古屋 | base=0.646 | delta=0.284 | final=0.747

Please enter words: 大阪
Top-K Debug:
1 大阪 | base=1.000 | delta=0.346 | final=1.000
2 は | base=-0.200 | delta=0.907 | final=0.889
3 東京 | base=0.679 | delta=0.478 | final=0.832
4 関西 | base=0.756 | delta=0.252 | final=0.817
5 尼崎 | base=0.710 | delta=0.367 | final=0.816
```
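The `final` column can be checked against the scoring formula by hand. The sketch below recomputes it from the printed `base` and `delta` values, assuming `α = 1`; that value is inferred from the numbers and is not stated in this card. The helper name `combine` is hypothetical.

```python
def combine(base, delta, alpha=1.0):
    """score = alpha*base + (1 - alpha*base)*delta (alpha = 1 is an assumption)."""
    return alpha * base + (1.0 - alpha * base) * delta

# (token, base, delta, final) copied from the two example queries above.
rows = [
    ("東京", 1.000, 0.478, 1.000),
    ("は", -0.294, 0.907, 0.880),
    ("大阪", 0.679, 0.346, 0.790),
    ("パリ", 0.597, 0.419, 0.766),
    ("名古屋", 0.646, 0.284, 0.747),
    ("大阪", 1.000, 0.346, 1.000),
    ("は", -0.200, 0.907, 0.889),
    ("東京", 0.679, 0.478, 0.832),
    ("関西", 0.756, 0.252, 0.817),
    ("尼崎", 0.710, 0.367, 0.816),
]

# Largest gap between the recomputed score and the printed `final` value.
max_err = max(abs(combine(b, d) - f) for _, b, d, f in rows)
print(f"largest deviation from printed final: {max_err:.4f}")
```

With `α = 1`, every row agrees with the printed value to within the rounding of the three-decimal inputs. The rows also show two properties of the rule: a token with `base = 1` always scores 1 regardless of `delta`, and a token with a negative `base` but a large `delta` (は in both queries) can still rank near the top.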
Repository layout:

```
Pipeowl-1.8.3-jp-Whitebox/
├ README.md
├ config.json
├ DATA_SOURCES.md
├ debug.py
├ LICENSE
├ quickstart.py
├ engine.py
├ vocabulary.json
└ pipeowl_fp16.safetensors
```
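To see what the weights file contains before reading `engine.py`, the tensors can be listed with the `safetensors` NumPy loader. This is a small inspection sketch, not part of the repository: the helper name `describe_weights` is hypothetical, and no tensor names are assumed because this card does not document them.

```python
from pathlib import Path

def describe_weights(path):
    """Map each tensor name in a .safetensors file to its (shape, dtype)."""
    # Imported lazily so this module loads even before `pip install safetensors`.
    from safetensors.numpy import load_file
    tensors = load_file(str(path))
    return {name: (arr.shape, str(arr.dtype)) for name, arr in tensors.items()}

weights = Path("pipeowl_fp16.safetensors")
if weights.exists():
    for name, (shape, dtype) in describe_weights(weights).items():
        print(name, shape, dtype)
else:
    print("Run this from inside the cloned repository.")
```

Given the table above, one would expect to find an FP16 tensor of shape `(26155, 256)`, though the listing is the place to confirm that.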
License: MIT

Base model: WangKaiLin/PipeOwl-1.5-jp