# PIXIE-Glyph-v1.0
PIXIE-Glyph-v1.0 is a jina-clip-based text–image embedding model trained for Korean and English multimodal retrieval, developed by TelePIX Co., Ltd. PIXIE stands for TelePIX Intelligent Embedding, representing TelePIX’s high-performance embedding technology. This model is optimized for text-to-image retrieval on visually rich content—especially figures and tables in academic papers. In addition to strong retrieval quality, PIXIE-Glyph-v1.0 is designed for practical deployment, offering fast retrieval latency and dimension-flexible embeddings via Matryoshka Representation Learning (MRL).
## Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 (default), truncatable down to 64 with MRL
- Similarity Function: Cosine Similarity
- Language: Multilingual — optimized for high performance in Korean and English
- Domain Specialization: Academic figure/table retrieval
- License: cc-by-nc-4.0
PIXIE-Glyph-v1.0 supports Matryoshka Representation Learning (MRL), enabling users to truncate the embedding dimensionality (e.g., 1024 → 512 → 256 → 128 → 64) to reduce storage, bandwidth, and similarity-compute cost, while preserving strong retrieval performance.
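For example, a minimal sketch of dimension-flexible usage via the standard Sentence Transformers `truncate_dim` argument (the query string is illustrative):

```python
from sentence_transformers import SentenceTransformer

# Load the model with a reduced embedding size (any of 1024/512/256/128/64).
model = SentenceTransformer(
    "telepix/PIXIE-Glyph-v1.0",
    trust_remote_code=True,
    truncate_dim=256,
)

# Truncation happens inside encode(); normalize_embeddings=True re-normalizes
# the truncated vectors so cosine similarity remains well-defined.
emb = model.encode(["transformer architecture diagram"], normalize_embeddings=True)
print(emb.shape)  # (1, 256)
```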
## Quality Benchmarks
PIXIE-Glyph-v1.0 delivers strong text-to-image retrieval performance and is particularly effective for academic figure/table search and visual document retrieval. The table below reports:
- nDCG@5 (higher is better), measuring ranking quality against ground-truth relevance.
- Latency (ms/query) measured on SciCap with H100 (single GPU) and batch=128. Latency here denotes the average per-query retrieval time over a 20,000-item corpus under the evaluation setup described.
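The exact benchmarking harness is not published with this card; the sketch below shows one way such a per-query figure can be reproduced, assuming a precomputed corpus of 20,000 normalized image embeddings (the random tensor is a stand-in for real SciCap figure embeddings):

```python
import time

import torch
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("telepix/PIXIE-Glyph-v1.0", trust_remote_code=True)

# Hypothetical stand-ins for the evaluation data: 20,000 precomputed image
# embeddings and one batch of 128 text queries.
corpus_embeddings = torch.nn.functional.normalize(torch.randn(20_000, 1024), dim=-1)
queries = ["example query"] * 128

start = time.perf_counter()
query_embeddings = model.encode(
    queries,
    prompt_name="retrieval.query",
    normalize_embeddings=True,
    convert_to_tensor=True,
)
# Keep both operands on the same device before scoring.
corpus_embeddings = corpus_embeddings.to(query_embeddings.device)
scores = util.cos_sim(query_embeddings, corpus_embeddings)  # (128, 20000)
top5 = scores.topk(k=5, dim=-1)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"avg per-query retrieval time: {elapsed_ms / len(queries):.2f} ms")
```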
### Benchmark Overview and Dataset Descriptions
| Model | # Params | Latency (ms/query) | SciCap (ko, en) | KoViDoRe (v1) | EnViDoRe (v1, v2) |
|---|---|---|---|---|---|
| telepix/PIXIE-Glyph-v1.0 | 0.9B | 1.60 | 0.3957 | 0.1962 | 0.4688 |
| nomic-ai/colnomic-embed-multimodal-3b | 3B | 503.00 | 0.4758 | 0.7755 | 0.8097 |
| vidore/colSmol-500M | 0.5B | 59.45 | 0.2045 | 0.1549 | 0.7042 |
| vidore/colSmol-256M | 0.3B | 40.25 | 0.1701 | 0.1244 | 0.6491 |
| jinaai/jina-clip-v2 | 0.9B | 1.60 | 0.2491 | 0.1311 | 0.4544 |
To help interpret the evaluation results above, we summarize the intent and characteristics of each benchmark.
#### SciCap (ko, en)
SciCap is constructed from the SCICAP figure–caption dataset introduced in “SciCap: Generating Captions for Scientific Figures”. For bilingual evaluation, the English captions were additionally translated into Korean, yielding (caption, figure image) pairs for 10k Korean and 10k English queries.
#### KoViDoRe (v1)
KoViDoRe is a Korean Visual Document Retrieval benchmark designed to evaluate retrieval systems on real-world Korean document images across multiple settings and corpora.
#### EnViDoRe (v1, v2)
EnViDoRe covers the English portion of ViDoRe (the Visual Document Retrieval benchmark), a benchmark suite for evaluating document retrieval systems on visually rich documents across tasks, domains, and languages.
## Direct Use (Text to Image Retrieval)
```python
import os

from PIL import Image
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer, util

model_name = 'telepix/PIXIE-Glyph-v1.0'
image_dir = 'images'
os.makedirs(image_dir, exist_ok=True)

# Embedding dimensionality; thanks to MRL this can be lowered to 512/256/128/64.
truncate_dim = 1024

model = SentenceTransformer(
    model_name,
    trust_remote_code=True,
    truncate_dim=truncate_dim,
)

# Sample figures hosted in the model repository.
target_filenames = [
    'attention_visualizations.png',
    'scaled_dot_product_attention.png',
    'the_transformer_model_architecture.png',
    'variations_on_the_transformer.png',
    'beautiful_sunset.png',
]

loaded_images = []
valid_image_paths = []
for filename in target_filenames:
    repo_path = f"{image_dir}/{filename}"
    local_path = os.path.join(image_dir, filename)
    try:
        # Download each sample image from the model repo into ./images/.
        hf_hub_download(
            repo_id=model_name,
            filename=repo_path,
            local_dir=".",
            local_dir_use_symlinks=False,
        )
        img = Image.open(local_path).convert("RGB")
        loaded_images.append(img)
        valid_image_paths.append(local_path)
    except Exception as e:
        print(f"Failed to process {filename}: {e}")

# Queries in Korean and English.
queries = [
    '트랜스포머 전체 아키텍처 구조 찾아줘',  # "Find the overall Transformer architecture diagram"
    'attention이 어떻게 이뤄지는지 예시 자료 있나?',  # "Any example material showing how attention works?"
    'how performance changes depending on the model parameters.',
]

# Encode images and queries; the query prompt applies the model's
# retrieval-query instruction to the text side.
image_embeddings = model.encode(loaded_images, normalize_embeddings=True)
query_embeddings = model.encode(queries, prompt_name='retrieval.query', normalize_embeddings=True)

# Cosine similarity between every query and every image.
results = util.cos_sim(query_embeddings, image_embeddings)

print("\n" + "=" * 50)
for i, query in enumerate(queries):
    print(f"Query {i+1}: '{query}'")
    k = min(5, len(valid_image_paths))
    if k == 0:
        print(" - No images available to search.")
        continue
    scores, indices = results[i].topk(k=k)
    for score, idx in zip(scores, indices):
        print(f" - [Score: {score:.4f}] {valid_image_paths[idx]}")
    print("-" * 50)
```
## License
The PIXIE-Glyph-v1.0 model is licensed under CC BY-NC 4.0.
## Citation

```bibtex
@misc{TelePIX-PIXIE-Glyph-v1.0,
  title={PIXIE-Glyph-v1.0},
  author={TelePIX AI Research Team and Bongmin Kim},
  year={2026},
  url={https://huggingface.co/telepix/PIXIE-Glyph-v1.0}
}
```
## Contact
If you have any suggestions or questions about PIXIE, please reach out to the authors at bmkim@telepix.net.