PipeOwl-1.5-Japanese (Geometric Embedding)

A transformer-free semantic retrieval engine.

PipeOwl performs deterministic vocabulary scoring over a static embedding field:

score = α⋅base + β⋅Δfield

where:

  • base = cosine similarity in embedding space
  • Δfield = static scalar field bias
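
The scoring rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in engine.py: the function name `pipeowl_score`, the default values of `alpha` and `beta`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pipeowl_score(query_vec, emb, delta_field, alpha=1.0, beta=0.1):
    """Return alpha * base + beta * delta_field for every vocabulary row."""
    # base: cosine similarity between the query and each embedding row
    q = query_vec / np.linalg.norm(query_vec)
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    base = e @ q
    # delta_field: one static scalar bias per vocabulary entry
    return alpha * base + beta * delta_field

# Toy data (hypothetical): 3 tokens, 2 dimensions
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
delta_field = np.array([0.0, 0.5, 0.0])
scores = pipeowl_score(np.array([1.0, 0.0]), emb, delta_field)
print(scores)
```

Because the bias is a per-token constant, it can shift the ranking without any query-dependent computation beyond the single matrix-vector product.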

Features:

  • O(n) scoring: one linear scan over the vocabulary per query.
  • No attention.
  • No transformer weights.
  • CPU-friendly (<40 MB model).

Architecture

  • Static embedding table (V × D)
  • Aligned vocabulary index
  • Optional scalar bias field (Δfield)
  • Linear scoring
  • Pluggable decoder stage
  • Targets CPU environments and low-latency systems (e.g., input method editors, IME).
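
As a rough sketch of how the components listed above could fit together, the toy class below keeps a normalized embedding table, an aligned vocabulary index, an optional bias field, and a decoder callable applied to the ranked list. The class name `TinyEngine`, its parameters, and the in-vocabulary-only lookup are assumptions for illustration; the actual engine.py may build query vectors and decode results differently.

```python
import numpy as np

class TinyEngine:
    """Hypothetical miniature of the pipeline: table + index + bias + decoder."""

    def __init__(self, vocab, emb, delta_field=None, alpha=1.0, beta=1.0, decoder=None):
        self.vocab = list(vocab)
        # Aligned vocabulary index: token -> row of the embedding table
        self.index = {tok: i for i, tok in enumerate(self.vocab)}
        emb = np.asarray(emb, dtype=np.float32)
        # Pre-normalize rows so a dot product equals cosine similarity
        self.emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        if delta_field is None:
            delta_field = np.zeros(len(self.vocab), dtype=np.float32)
        self.delta = np.asarray(delta_field, dtype=np.float32)
        self.alpha, self.beta = alpha, beta
        # Pluggable decoder: any callable that post-processes the ranked list
        self.decoder = decoder or (lambda ranked: ranked)

    def query(self, token, k=5):
        q = self.emb[self.index[token]]  # assumption: query is an in-vocabulary token
        scores = self.alpha * (self.emb @ q) + self.beta * self.delta
        k = min(k, len(scores))
        top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in linear time
        top = top[np.argsort(-scores[top])]        # order only the k survivors
        ranked = [(self.vocab[i], float(scores[i])) for i in top]
        return self.decoder(ranked)

# Toy usage with made-up vectors
engine = TinyEngine(["a", "b", "c"], [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
                    decoder=lambda ranked: [tok for tok, _ in ranked])
result = engine.query("a", k=2)
print(result)
```

Using `np.argpartition` before sorting keeps the selection step linear in the vocabulary size, which matches the linear-scan design described above.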

Model Specs

Item           | Value
Vocab size     | 26,155
Embedding dim  | 768
Storage format | safetensors (FP16)
Model size     | ~38.7 MB
Languages      | Japanese
Startup time   | <1 s
Query latency  | ~3-4 ms (CPU, full vocabulary scan)
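
As a sanity check on the listed size, the raw embedding table can be estimated from the vocabulary size, embedding dimension, and 2 bytes per FP16 value. The constant names below are illustrative; the idea that the listed figure also covers other tensors and the safetensors header is an assumption, not something the source states.

```python
VOCAB_SIZE = 26155       # from the specs above
EMBED_DIM = 768          # from the specs above
BYTES_PER_FP16 = 2       # half precision stores each value in 2 bytes

table_bytes = VOCAB_SIZE * EMBED_DIM * BYTES_PER_FP16
size_mb = round(table_bytes / 1e6, 1)       # decimal megabytes
size_mib = round(table_bytes / 2**20, 1)    # binary mebibytes

print(table_bytes)  # 40174080
print(size_mb)      # 40.2
print(size_mib)     # 38.3
```

The table alone comes to roughly 38.3 MiB, which is in the same range as the listed model size.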

Attribution

See DATA_SOURCES.md.

Quickstart

git clone https://huggingface.co/WangKaiLin/PipeOwl-1.5-jp
cd PipeOwl-1.5-jp

pip install numpy safetensors

python quickstart.py

Example semantic retrieval results:

Please enter words: 日

Top-K Tokens:
0.974 | 日
0.794 | 日の
0.789 | 翌日
0.777 | 週
0.775 | 週間

Please enter words: 行

Top-K Tokens:
0.961 | 行
0.794 | 行こ
0.787 | 執り行
0.787 | 入
0.784 | 起

Please enter words: 東京

Top-K Tokens:
0.979 | 東京
0.872 | 大阪
0.868 | 名古屋
0.849 | 横浜
0.848 | 目黒

Benchmark (CPU)

Environment:

  • Vocab size: 26,155
  • Embedding dimension: 768
  • Hardware: CPU

Average query latency:

  • PipeOwl: 0.0036 sec
  • BM25: 0.0421 sec
  • Embedding: 0.0283 sec
  • FAISS Flat: 0.0324 sec
  • FAISS HNSW: 0.0230 sec

Comparison    | Speedup
vs BM25       | 11.7× faster
vs Embedding  | 7.9× faster
vs FAISS Flat | 9.0× faster
vs FAISS HNSW | 6.4× faster

PipeOwl shows 6–12× lower latency compared with common retrieval baselines in this setup.
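
The speedup figures follow directly from the average latencies reported above: each one is the baseline latency divided by the PipeOwl latency. The short snippet below reproduces that arithmetic; the variable names are illustrative only.

```python
# Average query latencies in seconds, copied from the benchmark above
latency_sec = {
    "PipeOwl": 0.0036,
    "BM25": 0.0421,
    "Embedding": 0.0283,
    "FAISS Flat": 0.0324,
    "FAISS HNSW": 0.0230,
}

pipeowl = latency_sec["PipeOwl"]
# Speedup = baseline latency / PipeOwl latency, rounded to one decimal
speedup = {
    name: round(t / pipeowl, 1)
    for name, t in latency_sec.items()
    if name != "PipeOwl"
}

for name, factor in speedup.items():
    print(f"vs {name}: {factor}x faster")
```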

Benchmark repo: https://huggingface.co/datasets/WangKaiLin/pipeowl-1.5-jp-benchmark

Repository Structure

pipeowl-1.5-jp/
 ├ README.md
 ├ config.json
 ├ DATA_SOURCES.md 
 ├ LICENSE
 ├ quickstart.py
 ├ engine.py
 ├ vocabulary.json
 └ pipeowl_fp16.safetensors

License

MIT
