Pipeowl-1.10-multilingual (Geometric Embedding)

A transformer-free semantic retrieval engine.

Features:

  • O(n) over vocabulary.
  • No attention.
  • No transformer weights.

Architecture

  • Static embedding table (V × D)
  • Aligned vocabulary index
  • Linear scoring
  • Pluggable decoder stage

Model Specs

item value
token size 734803
embedding dim 512
storage format safetensors (FP16)
data size ~728 MB
languages multilingual
startup time ~912 ms
query latency ~65-72 ms

Quickstart

git clone https://huggingface.co/WangKaiLin/PipeOwl-1.10-multilingual
cd PipeOwl-1.10-multilingual

pip install numpy safetensors

python quickstart.py

Example:

Example semantic retrieval results:

請輸入句子: 確實

Top-K Tokens:
1.000 | 確實
0.871 | 的確
0.848 | 确实
0.825 | 確かに
0.796 | дійсно

請輸入句子: 今天好想睡覺

Top-K Tokens:
0.711 | 今天
0.691 | 今天的
0.677 | 睡觉
0.658 | 睡覺
0.653 | 今日は

請輸入句子: i want to sleep

Top-K Tokens:
0.735 | sleep
0.686 | спать
0.671 | schlafen
0.642 | tidur
0.638 | want

請輸入句子: 哈囉你好阿

Top-K Tokens:
0.823 | 哈囉
0.808 | 你好
0.777 | こんにちは
0.767 | 嘿
0.765 | 嗨

Repository Structure

PipeOwl-1.10-multilingual/
 ├ README.md
 ├ config.json
 ├ LICENSE
 ├ quickstart.py
 ├ engine.py
 ├ tokenizer.json
 └ pipeowl.safetensors

LICENSE

MIT

Downloads last month
31
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collections including WangKaiLin/PipeOwl-1.10-multilingual