PipeOwl-1.4-multilingual (Geometric Embedding)

This release introduces FP16 storage to reduce model size and startup time.

A transformer-free semantic retrieval engine.

PipeOwl performs deterministic vocabulary scoring over a static embedding field:

score = α⋅base + β⋅Δfield

where:

base = cosine similarity in embedding space
Δfield = static scalar field bias
α, β = tunable weights
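The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in engine.py: the function name `pipeowl_style_scores`, the default `alpha` and `beta` values, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pipeowl_style_scores(query, emb, delta_field, alpha=1.0, beta=0.1):
    """Return alpha * cosine(query, emb[i]) + beta * delta_field[i] for every row i.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    emb = emb.astype(np.float32)      # upcast in case the table is stored as FP16
    query = query.astype(np.float32)
    # Cosine similarity: dot product divided by the product of the norms.
    emb_norms = np.linalg.norm(emb, axis=1)
    q_norm = np.linalg.norm(query)
    base = (emb @ query) / np.maximum(emb_norms * q_norm, 1e-12)
    # Linear combination of the similarity term and the static bias term.
    return alpha * base + beta * delta_field

# Toy data: 3 vocabulary entries in a 2-dimensional space (illustrative values only).
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
delta_field = np.array([0.5, 0.0, -0.5])
query = np.array([1.0, 0.0])

scores = pipeowl_style_scores(query, emb, delta_field, alpha=1.0, beta=0.1)
print(scores)
```

Because the bias term is added on top of the cosine similarity, a score can exceed 1 under this reading of the formula, which would be consistent with the 1.004 reported for 雪鴞 in the example output.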

Features:

O(n) scoring over a vocabulary of n tokens.
No attention.
No transformer weights.
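Selecting the best matches can also stay linear in the vocabulary size. The sketch below uses `np.argpartition`, which finds the k largest scores without sorting the whole array. Whether PipeOwl selects its top-k this way is not stated, so treat the helper `top_k` and its toy inputs as assumptions.

```python
import numpy as np

def top_k(scores, vocab, k=5):
    """Return the k highest-scoring (score, token) pairs, best first."""
    k = min(k, len(scores))
    if k <= 0:
        return []
    # argpartition places the k largest scores in the last k slots in O(n) time.
    idx = np.argpartition(scores, len(scores) - k)[-k:]
    # Only those k candidates are sorted, which costs O(k log k).
    idx = idx[np.argsort(scores[idx])[::-1]]
    return [(float(scores[i]), vocab[i]) for i in idx]

# Illustrative tokens and scores, not taken from the real model.
vocab = ["owl", "sled", "snow", "happy", "bird"]
scores = np.array([0.82, 0.69, 0.70, 0.10, 0.75])

result = top_k(scores, vocab, k=3)
print(result)
```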

Changes in 1.4

Embedding storage converted from FP32 to FP16
Model size reduced from ~1.9 GB to ~1.01 GB
Startup time reduced from ~30 s to ~2 s
Scoring pipeline unchanged
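The effect of the FP32 to FP16 conversion can be illustrated on a random stand-in table. This sketch only shows that storage halves and that cosine similarities move very little for this toy data; the table shape, the random seed, and the helper `cosine_to_all` are assumptions, and the numbers say nothing about the real PipeOwl embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in embedding table (1000 x 64); not the real model.
emb32 = rng.standard_normal((1000, 64)).astype(np.float32)
emb16 = emb32.astype(np.float16)   # half-precision copy, as used for storage

def cosine_to_all(query, table):
    """Cosine similarity of `query` against every row, computed in FP32."""
    table = table.astype(np.float32)
    query = query.astype(np.float32)
    return (table @ query) / (np.linalg.norm(table, axis=1) * np.linalg.norm(query))

query = emb32[0]
diff = np.abs(cosine_to_all(query, emb32) - cosine_to_all(query, emb16))

print(emb32.nbytes, emb16.nbytes)   # FP16 uses half the bytes of FP32
print(float(diff.max()))            # largest change in cosine similarity
```

Upcasting to FP32 before the dot product, as done here, is one common way to keep arithmetic precise while still storing the table in FP16.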

Architecture

Static embedding table (V × D)
Aligned vocabulary index
Optional scalar bias field
Linear scoring
Pluggable decoder stage

Targets CPU environments and low-latency systems such as input method editors (IME).
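One way to picture how these components fit together is the toy class below. It is a hypothetical sketch, not the layout of engine.py: the class name `ToyEngine`, its fields, the assumption that the query is an in-vocabulary word, and the shape of the decoder callback are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

import numpy as np

Ranked = List[Tuple[float, str]]

@dataclass
class ToyEngine:
    vocab: List[str]                      # vocab[i] names row i of `emb` (aligned index)
    emb: np.ndarray                       # static embedding table, shape (V, D)
    bias: Optional[np.ndarray] = None     # optional scalar bias field, shape (V,)
    alpha: float = 1.0
    beta: float = 1.0
    decoder: Optional[Callable[[Ranked], Ranked]] = None  # pluggable decoder stage

    def query(self, word: str, k: int = 3) -> Ranked:
        # Assumes the query is already in the vocabulary; list.index is a
        # linear lookup, a dict would be the usual choice for a large vocabulary.
        q = self.emb[self.vocab.index(word)].astype(np.float32)
        table = self.emb.astype(np.float32)
        base = (table @ q) / (np.linalg.norm(table, axis=1) * np.linalg.norm(q))
        scores = self.alpha * base                      # linear scoring
        if self.bias is not None:
            scores = scores + self.beta * self.bias
        order = np.argsort(scores)[::-1][:k]
        ranked = [(float(scores[i]), self.vocab[i]) for i in order]
        return self.decoder(ranked) if self.decoder else ranked

engine = ToyEngine(
    vocab=["owl", "bird", "sled"],
    emb=np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]], dtype=np.float16),
    # Example decoder: drop weak candidates before returning them.
    decoder=lambda ranked: [(s, t) for s, t in ranked if s > 0.5],
)
hits = engine.query("owl")
print(hits)
```

Keeping the decoder as a plain callable means a caller can swap in filtering, re-ranking, or formatting without touching the scoring step.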

Model Specs

item	value
vocab size	495,090
embedding dim	1024
storage format	safetensors (FP16)
model size	~1.01 GB
languages	multilingual (Chinese / English dominant)
startup time	~2s
query latency	~101-105 ms (CPU)
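The model size follows directly from the vocabulary size and embedding dimension listed above. The short calculation below assumes the file is dominated by a single V × D embedding table and ignores the safetensors header and any bias field.

```python
VOCAB_SIZE = 495_090   # from the specs table
EMBED_DIM = 1024       # from the specs table

def table_bytes(bytes_per_value):
    """Size in bytes of a dense VOCAB_SIZE x EMBED_DIM table."""
    return VOCAB_SIZE * EMBED_DIM * bytes_per_value

fp32 = table_bytes(4)  # 4 bytes per FP32 value
fp16 = table_bytes(2)  # 2 bytes per FP16 value

print(f"FP32: {fp32 / 1e9:.2f} GB ({fp32 / 2**30:.2f} GiB)")  # FP32: 2.03 GB (1.89 GiB)
print(f"FP16: {fp16 / 1e9:.2f} GB ({fp16 / 2**30:.2f} GiB)")  # FP16: 1.01 GB (0.94 GiB)
```

The FP16 result matches the listed ~1.01 GB in decimal units. The ~1.9 GB quoted for the earlier FP32 storage is close to the 1.89 GiB figure, so that number may have been reported in binary units.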

Attribution

Data sources and their attribution are listed in DATA_SOURCES.md.

Quickstart

git clone https://huggingface.co/WangKaiLin/PipeOwl-1.4-multilingual
cd PipeOwl-1.4-multilingual

pip install numpy safetensors

python quickstart.py

See full experimental notes here:

https://hackmd.io/@galaxy4552/SyWQ92cFWx

Example:

Please enter words： 雪鴞

Top-K Tokens:
1.004 | 雪鴞
0.823 | 鴟鴞
0.820 | 鴞
0.700 | 長耳鴞
0.686 | 雪橇

Please enter words： happy

Top-K Tokens:
0.998 | happy
0.888 | happiness
0.863 | heureux
0.857 | happyness
0.854 | gelukkig

Repository Structure

PipeOwl-1.4-multilingual/
 ├ README.md
 ├ config.json
 ├ DATA_SOURCES.md
 ├ LICENSE
 ├ quickstart.py
 ├ engine.py
 ├ vocabulary.json
 └ pipeowl_fp16.safetensors

Multilingual Vocabulary

PipeOwl-1.4 uses a mixed multilingual vocabulary containing:

Chinese words
English words
Mathematical symbols
Symbolic / byte fallback tokens

Total vocabulary size: 495,090 tokens (~495k)

All tokens share the same embedding field.

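PipeOwl is a geometric retrieval system built on a static semantic field.

Core formula:

score = α⋅base + β⋅Δfield

where:

base = embedding cosine similarity
Δfield = static field offset
α / β = tunable weights

It provides a lightweight O(n) semantic scoring method suited to low-latency environments such as input methods.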

License

MIT
