PipeOwl-1.6-tw (Geometric Embedding)

A transformer-free semantic retrieval engine.

PipeOwl performs deterministic vocabulary scoring over a static embedding field:

score = α⋅base + β⋅Δfield

where:

  • base = cosine similarity in embedding space
  • Δfield = static scalar field bias
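
The scoring rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in engine.py: the function name `score_vocabulary`, the default values of `alpha` and `beta`, and the assumption that Δfield holds one scalar per vocabulary entry are all hypothetical.

```python
import numpy as np

def score_vocabulary(query, table, field=None, alpha=1.0, beta=0.0):
    """Return alpha * cosine(query, row) + beta * field for every table row.

    query: (D,) vector, table: (V, D) matrix, field: optional (V,) vector.
    alpha and beta defaults are placeholders, not PipeOwl's real values.
    """
    table = np.asarray(table, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalise both sides so the dot product equals cosine similarity.
    q = query / (np.linalg.norm(query) + 1e-12)
    rows = table / (np.linalg.norm(table, axis=1, keepdims=True) + 1e-12)
    base = rows @ q  # one pass over the vocabulary
    if field is None:
        return alpha * base
    return alpha * base + beta * np.asarray(field, dtype=np.float32)

# Toy data: three "tokens" in a 2-dimensional space.
table = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
field = np.array([0.0, 0.5, 0.0])
scores = score_vocabulary(np.array([1.0, 0.0]), table, field, alpha=1.0, beta=0.2)
print(scores)
```

With these toy inputs the first row matches the query exactly, the second row scores only through its field bias, and the third row sits in between.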

Features:

  • Scoring cost is O(n) in the vocabulary size n: each query makes one pass over the embedding table.
  • No attention.
  • No transformer weights.

Architecture

  • Static embedding table (V × D)
  • Aligned vocabulary index
  • Optional scalar bias field (Δfield)
  • Linear scoring
  • Pluggable decoder stage
  • Targets CPU environments and low-latency systems (e.g. input method editors, IME).
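
As a rough sketch of how the aligned vocabulary index and the pluggable decoder stage might fit together, the snippet below picks the k highest scores and hands them to an optional callable. The names `top_k` and `decoder`, and the idea that a decoder receives a list of (token, score) pairs, are assumptions made for illustration; the real interface in engine.py may differ.

```python
import numpy as np

def top_k(scores, vocab, k=5, decoder=None):
    """Return the k best (token, score) pairs, optionally post-processed.

    vocab[i] is assumed to be the token for row i of the embedding table.
    decoder is a hypothetical hook: any callable taking the list of hits.
    """
    scores = np.asarray(scores)
    k = min(k, scores.shape[0])
    if k <= 0:
        hits = []
    else:
        # argpartition selects the k largest scores without a full sort.
        idx = np.argpartition(-scores, k - 1)[:k]
        # Sort only the k selected entries, best first.
        idx = idx[np.argsort(-scores[idx])]
        hits = [(vocab[i], float(scores[i])) for i in idx]
    return decoder(hits) if decoder is not None else hits

# Toy vocabulary and scores, not real model output.
vocab = ["a", "b", "c", "d"]
scores = np.array([0.1, 0.9, 0.4, 0.7])
hits = top_k(scores, vocab, k=2)
joined = top_k(scores, vocab, k=2, decoder=lambda hs: " ".join(t for t, _ in hs))
print(hits, joined)
```

Selecting with `np.argpartition` keeps the selection step linear in the vocabulary size, which fits the linear-scoring design described above.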

Model Specs

item            value
vocab size      161783
embedding dim   1024
storage format  safetensors (FP16)
model size      ~316 MB
languages       tw
startup time    ~1 s
query latency   ~33-35 ms
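
The listed model size can be cross-checked from the vocabulary size and embedding dimension. The arithmetic below assumes the file is dominated by the V × D table stored at 2 bytes per FP16 value; any scalar bias field and file header would add only a small amount on top.

```python
# Values taken from the Model Specs table above.
VOCAB_SIZE = 161_783
EMBEDDING_DIM = 1024
BYTES_PER_FP16 = 2

table_bytes = VOCAB_SIZE * EMBEDDING_DIM * BYTES_PER_FP16
size_mib = table_bytes / 2**20   # binary megabytes (MiB)
size_mb = table_bytes / 10**6    # decimal megabytes (MB)

print(table_bytes)         # 331331584
print(round(size_mib, 1))  # 316.0
print(round(size_mb, 1))   # 331.3
```

Under this assumption the "~316 MB" figure corresponds to binary megabytes (MiB); in decimal megabytes the same table is about 331 MB.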

Attribution

See DATA_SOURCES.md for data sources and attribution.

Quickstart

git clone https://huggingface.co/WangKaiLin/PipeOwl-1.6-tw
cd PipeOwl-1.6-tw

pip install numpy safetensors

python quickstart.py

Example semantic retrieval results (the prompt 請輸入句子 means "Enter a sentence"):

請輸入句子: 今天

Top-K Tokens:
0.960 | 今天
0.893 | 今日
0.829 | 本日
0.827 | 今
0.822 | 這天

請輸入句子: 好睏

Top-K Tokens:
0.834 | 睏
0.834 | 好
0.792 | 好不
0.787 | 不錯
0.776 | 睏倦

請輸入句子: 水餃

Top-K Tokens:
0.962 | 水餃
0.831 | 餃
0.788 | 餃子
0.785 | 蒸餃
0.678 | 水漬

Repository Structure

pipeowl-1.6-tw/
 ├ README.md
 ├ config.json
 ├ DATA_SOURCES.md 
 ├ LICENSE
 ├ quickstart.py
 ├ engine.py
 ├ vocabulary.json
 └ pipeowl_fp16.safetensors

License

MIT
