---
language: zh
tags:
- embeddings
- retrieval
- numpy
- transformer-free
license: mit
---

# PipeOwl-1.0 (Geometric Embedding)

PipeOwl is a transformer-free geometric embedding package built on a **static embedding field** stored as NumPy arrays.

This repo provides:

- `L1_base_embeddings.npy`: float32 (V, 1024) embedding table (unit-normalized)
- `L1_base_vocab.json`: list of vocab strings aligned to embedding rows
- `delta_base_scalar.npy`: float32 (V,) optional scalar bias field
- a minimal inference engine (`engine.py`) and usage script (`quickstart.py`)

---

## Attribution

The base embedding vectors were generated using **BGE (Apache-2.0)** via inference (model outputs).
This repository **does not redistribute any original BGE model weights**.

---

## Quickstart

```bash
pip install numpy
python quickstart.py
```

Or minimal usage:

```python
from engine import PipeOwlEngine, PipeOwlConfig

engine = PipeOwlEngine(PipeOwlConfig())
q = engine.encode("雪鴞好可愛")  # "Snowy owls are so cute"
# use q for similarity / retrieval
```

## Files

- data/L1_base_embeddings.npy : embedding table (float32, V×1024)
- data/L1_base_vocab.json : vocab aligned with rows
- data/delta_base_scalar.npy : scalar bias (float32, V)
- engine.py : minimal runtime
- quickstart.py : example script

## Notes

No safetensors / pytorch_model.bin is included because this model is distributed as a static NumPy embedding field.

---

## Parameter Size

~165M embedding parameters (static matrix)

## Intended Use

- Semantic similarity
- Lightweight retrieval
- Geometric experimentation

## Limitations

- No contextual modeling
- No token interaction modeling
- Domain performance varies

---

## Stress Test Results (Hard Retrieval Setting)

- corpus size = 1200
- eval size = 200
- ood ratio = 0.28

| Model   | in-domain MRR@10 | OOD MRR@10 |
|---------|------------------|------------|
| MiniLM  | 0.019            | 0.026      |
| BGE     | 0.026            | 0.009      |
| PipeOwl | 0.013            | 0.023      |

Note: This test uses a harder corpus and adversarial-style queries.
Absolute scores are low because the difficulty was deliberately scaled up.

See full experimental notes here:

---

```bash
pipeowl/
│
├─ README.md
├─ LICENSE
│
├─ engine.py
├─ quickstart.py
│
└─ data/
   ├─ L1_base_embeddings.npy
   ├─ delta_base_scalar.npy
   └─ L1_base_vocab.json
```
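
The Quickstart leaves "use q for similarity / retrieval" open. Because the embedding table is described as unit-normalized, one plausible approach is a plain dot product against every row, which then equals cosine similarity. The sketch below is an assumption, not the logic in `engine.py`: the `top_k` helper is hypothetical, the table is a tiny synthetic stand-in, and `delta_base_scalar.npy` is left out because this README does not say how the bias is applied.

```python
import numpy as np


def top_k(query_vec, table, vocab, k=3):
    """Return the k vocab entries whose rows score highest against query_vec.

    Hypothetical helper: assumes every row of `table` has unit L2 norm.
    """
    q = np.asarray(query_vec, dtype=np.float32)
    q = q / np.linalg.norm(q)          # normalize so the dot product is a cosine
    scores = table @ q                 # shape (V,), one cosine score per row
    k = min(k, len(vocab))
    order = np.argsort(-scores)[:k]    # indices of the k highest scores
    return [(vocab[i], float(scores[i])) for i in order]


# Synthetic stand-in for the real files. With the repo checked out, one would
# instead load data/L1_base_embeddings.npy with np.load and
# data/L1_base_vocab.json with json.load.
rng = np.random.default_rng(0)
table = rng.standard_normal((5, 8)).astype(np.float32)
table /= np.linalg.norm(table, axis=1, keepdims=True)  # unit-normalize rows
vocab = ["a", "b", "c", "d", "e"]

# Querying with row 2 itself should rank "c" first with a score near 1.0.
hits = top_k(table[2], table, vocab, k=3)
print(hits)
```

With the real engine, `query_vec` would be the output of `engine.encode(...)`. A zero query vector would divide by zero here, so a production version would need a guard for that case.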
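
The stress test reports MRR@10 without showing how it was computed. As a reference for reading those numbers, here is a minimal sketch of the standard metric, assuming exactly one relevant document per query. The function name `mrr_at_k` and the toy rankings are illustrative and do not come from the evaluation code.

```python
def mrr_at_k(ranked_lists, relevant_ids, k=10):
    """Mean reciprocal rank, counting only hits within the top k results.

    ranked_lists: one ranked list of document ids per query.
    relevant_ids: the single relevant document id for each query (assumption).
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_ids):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id == relevant:
                total += 1.0 / rank   # reciprocal rank of the first hit
                break                 # a miss within the top k contributes 0
    return total / len(ranked_lists)


# Toy example: hit at rank 1, hit at rank 4, and no hit within the top 10.
ranked_lists = [
    ["d1", "d2", "d3"],
    ["d9", "d8", "d7", "d2"],
    ["d5", "d6"],
]
relevant_ids = ["d1", "d2", "d0"]

score = mrr_at_k(ranked_lists, relevant_ids, k=10)
print(round(score, 4))  # (1 + 1/4 + 0) / 3
```

Under this definition, a score near 0.02 means the relevant document is usually outside the top 10, which is consistent with the note that the setting is deliberately hard.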