PipeOwl-1.3-multilingual (Geometric Embedding)

A transformer-free semantic retrieval engine.

PipeOwl performs deterministic vocabulary scoring over a static embedding field:

score = α⋅base + β⋅Δfield

where:

  • base = cosine similarity in embedding space
  • Δfield = static scalar field bias
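The formula above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code in engine.py: the function name `score_vocabulary`, the toy table, and the default weights are assumptions made for the example, and rows are assumed to be non-zero so they can be normalized.

```python
import numpy as np

def score_vocabulary(query_vec, emb, delta_field=None, alpha=1.0, beta=0.0):
    """Return one score per vocabulary row: alpha * base + beta * delta_field."""
    # base: cosine similarity between the query and every row of the table.
    q = query_vec / np.linalg.norm(query_vec)
    rows = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    base = rows @ q
    if delta_field is None:
        # The scalar bias field is optional; without it the score is alpha * base.
        return alpha * base
    return alpha * base + beta * delta_field

# Toy 3-token table with D = 2 (hypothetical values, for illustration only).
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
delta = np.array([0.0, 0.5, 0.0])
scores = score_vocabulary(emb[0], emb, delta, alpha=1.0, beta=0.1)
print(scores)
```

With these toy values the query matches row 0 exactly (base = 1), is orthogonal to row 1 (base = 0, so its score comes only from the bias term), and sits at 45 degrees to row 2.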

Features:

  • O(n) scoring, linear in the vocabulary size n.
  • No attention.
  • No transformer weights.

Architecture

  • Static embedding table (V × D)
  • Aligned vocabulary index
  • Optional scalar bias field
  • Linear scoring
  • Pluggable decoder stage
  • Targets CPU environments and low-latency systems (e.g. input method editors, IME).
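One way the aligned vocabulary index and the pluggable decoder stage could fit together is sketched below. The source does not describe the decoder interface, so the `top_k` helper and the `decoder` callable are hypothetical; the sketch only assumes that row i of the score vector corresponds to entry i of the vocabulary.

```python
import numpy as np

def top_k(scores, vocab, k=5, decoder=None):
    """Pick the k highest-scoring tokens, then hand them to an optional decoder."""
    k = min(k, len(vocab))
    # argpartition finds the k best indices in O(n); only those k are then sorted.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    hits = [(float(scores[i]), vocab[i]) for i in idx]
    # Hypothetical decoder stage: any callable that post-processes the hits.
    return decoder(hits) if decoder is not None else hits

# Toy vocabulary and scores (illustrative values, not model output).
vocab = ["owl", "snow", "sled", "happy"]
scores = np.array([0.9, 0.2, 0.7, 0.1])

best = top_k(scores, vocab, k=2)
tokens_only = top_k(scores, vocab, k=2, decoder=lambda hits: [tok for _, tok in hits])
print(best)
print(tokens_only)
```

Selecting with `argpartition` before sorting keeps the whole retrieval step linear in the vocabulary size, which matches the O(n) claim for scoring.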

Model Specs

item | value
vocab size | 495,090
embedding dim | 1024
storage format | safetensors
model size | ~2.03 GB
languages | multilingual (Chinese / English dominant)
startup time | ~30 s
query latency | ~103-104 ms
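As a sanity check on these numbers, the size of a V × D table can be computed directly. The sketch below assumes the embeddings are stored as 32-bit floats, which the source does not state; under that assumption the result agrees with the listed model size.

```python
vocab_size = 495_090
embedding_dim = 1024
bytes_per_value = 4  # assumption: float32 storage

# Total bytes needed for a dense V x D embedding table.
table_bytes = vocab_size * embedding_dim * bytes_per_value
print(table_bytes)                  # 2027888640
print(round(table_bytes / 1e9, 2))  # 2.03 (decimal GB)
```

The same product without the byte factor, 495,090 × 1024 ≈ 507 million, is the number of multiply-adds a full O(n) scan of the table performs per query.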

Attribution

See DATA_SOURCES.md.

Quickstart

git clone https://huggingface.co/WangKaiLin/PipeOwl-1.3-multilingual
cd PipeOwl-1.3-multilingual

pip install numpy safetensors

python quickstart.py

See full experimental notes here:

https://hackmd.io/@galaxy4552/SyWQ92cFWx

Example:

Please enter words: 雪鴞

Top-K Tokens:
1.004 | 雪鴞
0.823 | 鴟鴞
0.820 | 鴞
0.700 | 長耳鴞
0.686 | 雪橇

Please enter words: happy

Top-K Tokens:
0.998 | happy
0.888 | happiness
0.863 | heureux
0.857 | happyness
0.854 | gelukkig

Repository Structure

pipeowl-1.3-multilingual/
 ├ README.md
 ├ config.json
 ├ DATA_SOURCES.md 
 ├ LICENSE
 ├ quickstart.py
 ├ engine.py
 ├ vocabulary.json
 └ pipeowl.safetensors

Multilingual Vocabulary

PipeOwl-1.3 uses a mixed multilingual vocabulary containing:

  • Chinese words
  • English words
  • Mathematical symbols
  • Symbolic / byte fallback tokens

Total vocabulary size: 495,090 tokens (~495k)

All tokens share the same embedding field.

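PipeOwl is a geometric retrieval system built on a static semantic field.

Core formula:

score = α⋅base + β⋅Δfield

where:

  • base = embedding cosine similarity
  • Δfield = static field offset
  • α / β are tunable weights

It provides a lightweight O(n) semantic scoring method suited to low-latency environments such as input method editors (IME).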

LICENSE

MIT
