metadata
language:
- zh
tags:
- embeddings
- retrieval
- transformer-free
- safetensors
- edge-ai
license: mit
PipeOwl-1.6-tw (Geometric Embedding)
A transformer-free semantic retrieval engine.
PipeOwl performs deterministic vocabulary scoring over a static embedding field:
score = α⋅base + β⋅Δfield
where:
- base = cosine similarity between the query embedding and each vocabulary embedding
- Δfield = static, per-token scalar bias
- α, β = scalar weights
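The scoring rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `engine.py`: the function name `score_vocabulary`, its parameters, the default weights, and the toy 2-dimensional table are all assumptions made for the example.

```python
import numpy as np

def score_vocabulary(query, table, delta_field=None, alpha=1.0, beta=0.0):
    """Return alpha * cosine(query, row) + beta * delta_field for every row.

    Hypothetical helper: names and default weights are illustrative only.
    """
    table = np.asarray(table, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalize both sides so the dot product equals cosine similarity.
    q = query / (np.linalg.norm(query) + 1e-12)
    rows = table / (np.linalg.norm(table, axis=1, keepdims=True) + 1e-12)
    base = rows @ q  # shape (V,): one cosine similarity per vocabulary entry
    if delta_field is None:
        return alpha * base
    return alpha * base + beta * np.asarray(delta_field, dtype=np.float32)

# Toy table with V = 3 entries and D = 2 dimensions (made-up values).
table = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
delta = np.array([0.0, 0.5, 0.0])
scores = score_vocabulary(np.array([1.0, 0.0]), table, delta, alpha=1.0, beta=0.1)
print(scores)
```

One matrix-vector product touches each of the V rows once, which is where the linear cost in vocabulary size comes from.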
Features:
- Query cost is O(n) in the vocabulary size n.
- No attention.
- No transformer weights.
Architecture
- Static embedding table (V × D)
- Aligned vocabulary index
- Optional scalar bias field (Δfield)
- Linear scoring
- Pluggable decoder stage
- Target deployment: CPU environments and low-latency systems such as input method editors (IMEs).
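As a rough sketch of how the aligned vocabulary index and a pluggable decoder stage might fit together, the snippet below selects the top-k scores and hands them to a decoder callable. The names `top_k` and `tokens_only`, the decoder interface, and the toy scores are assumptions for illustration; the actual interface in `engine.py` may differ.

```python
import numpy as np

def top_k(scores, vocab, k=5):
    """Return the k highest-scoring (token, score) pairs, best first."""
    scores = np.asarray(scores)
    k = min(k, scores.shape[0])
    if k <= 0:
        return []
    # argpartition isolates the k largest scores without a full sort;
    # only those k candidates are then sorted.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    # vocab[i] is the token for row i: the index is aligned with the table.
    return [(vocab[i], float(scores[i])) for i in idx]

def tokens_only(pairs):
    """Hypothetical decoder stage: keep only the token strings."""
    return [token for token, _ in pairs]

# Toy data: four vocabulary entries with made-up scores.
vocab = ["今天", "今日", "水餃", "本日"]
scores = np.array([0.9, 0.8, 0.1, 0.7])
hits = top_k(scores, vocab, k=3)
tokens = tokens_only(hits)
print(tokens)
```

Because the decoder only receives `(token, score)` pairs, it can be swapped for another callable without touching the scoring step.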
Model Specs
| item | value |
|---|---|
| vocab size | 161783 |
| embedding dim | 1024 |
| storage format | safetensors (FP16) |
| model size | ~316 MB |
| languages | Traditional Chinese (zh-TW) |
| startup time | ~1s |
| query latency | ~33-35 ms |
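The model size in the table can be cross-checked from the vocabulary size and embedding dimension. The short calculation below assumes the file is dominated by a single V × D FP16 table (2 bytes per value) and ignores the safetensors header and any Δfield tensor.

```python
VOCAB_SIZE = 161_783
EMBEDDING_DIM = 1024
BYTES_PER_FP16 = 2

# Size of a dense V x D table stored as 16-bit floats.
table_bytes = VOCAB_SIZE * EMBEDDING_DIM * BYTES_PER_FP16
table_mib = table_bytes / 2**20

print(f"{table_bytes} bytes = {table_mib:.1f} MiB")
# prints "331331584 bytes = 316.0 MiB"
```

Under that assumption, the "~316 MB" figure matches the table size expressed in binary megabytes (MiB).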
Attribution
Quickstart
git clone https://huggingface.co/WangKaiLin/PipeOwl-1.6-tw
cd PipeOwl-1.6-tw
pip install numpy safetensors
python quickstart.py
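To check the weights file without running the full engine, a sketch like the following may help. It uses `safetensors.numpy.load_file`, which returns a dict mapping tensor names to NumPy arrays. The helper names `summarize` and `inspect_model` are hypothetical, and the tensor name `embeddings` in the toy call is a placeholder, since the actual key names inside `pipeowl_fp16.safetensors` are not documented here.

```python
import numpy as np

def summarize(tensors):
    """Map each tensor name to its (shape, dtype) for a quick sanity check."""
    return {name: (tuple(arr.shape), str(arr.dtype)) for name, arr in tensors.items()}

def inspect_model(weights_path="pipeowl_fp16.safetensors"):
    """Load every tensor in the file and summarize it."""
    # Imported lazily so summarize() stays usable without safetensors installed.
    from safetensors.numpy import load_file
    return summarize(load_file(weights_path))

# Toy call on an in-memory dict; "embeddings" is a placeholder name.
demo = summarize({"embeddings": np.zeros((4, 8), dtype=np.float16)})
print(demo)
```

Inside the cloned repository, calling `inspect_model()` should list the real tensor names; the embedding table would be expected to have shape `(161783, 1024)` and dtype `float16` if it matches the specs above.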
Example semantic retrieval results (the prompt 請輸入句子 means "Please enter a sentence"):
請輸入句子: 今天
Top-K Tokens:
0.960 | 今天
0.893 | 今日
0.829 | 本日
0.827 | 今
0.822 | 這天
請輸入句子: 好睏
Top-K Tokens:
0.834 | 睏
0.834 | 好
0.792 | 好不
0.787 | 不錯
0.776 | 睏倦
請輸入句子: 水餃
Top-K Tokens:
0.962 | 水餃
0.831 | 餃
0.788 | 餃子
0.785 | 蒸餃
0.678 | 水漬
Repository Structure
pipeowl-1.6-tw/
├ README.md
├ config.json
├ DATA_SOURCES.md
├ LICENSE
├ quickstart.py
├ engine.py
├ vocabulary.json
└ pipeowl_fp16.safetensors
LICENSE
MIT