Ziv Embedder - Code Aware (ONNX)
This is an ONNX export of Alibaba-NLP/gte-modernbert-base, prepared for Ziv, a local semantic code search engine for Python repositories.
Ziv uses this embedder to improve code-aware search quality in version 0.4.0. The model is optimized for local inference with onnxruntime, making it lightweight, fast, and practical for offline developer workflows.
Why this model?
Ziv needs embeddings that work well for code search and code understanding while staying fully local. This model is designed to support that goal with:
- Code-aware semantic search
- Fast local inference
- No cloud dependency
- No API keys
- ONNX runtime compatibility
Compared to a standard Python-based embedding stack, this setup is easier to ship and more efficient to run inside a local developer tool.
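To make "semantic search" concrete, here is a minimal sketch of ranking code snippets by cosine similarity between a query embedding and stored snippet embeddings. It is not Ziv's actual implementation: the function names, the snippet identifiers, and the toy 3-dimensional vectors (standing in for the model's 768-dimensional output) are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return (snippet_id, score) pairs, best match first."""
    scored = [
        (snippet_id, cosine_similarity(query_vec, vec))
        for snippet_id, vec in snippet_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical index: toy 3-d vectors instead of real 768-d embeddings.
index = {
    "utils.py::read_json": [0.9, 0.1, 0.0],
    "db.py::connect": [0.0, 1.0, 0.0],
    "io.py::load_config": [0.8, 0.2, 0.1],
}
results = rank_snippets([1.0, 0.0, 0.0], index, top_k=2)
for snippet_id, score in results:
    print(f"{score:.3f}  {snippet_id}")
```

A real index would hold one embedding per function or chunk and would likely use a vectorized or approximate nearest-neighbor search rather than a Python loop.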
Usage with Ziv
ziv init --model code
ziv start
Model details
| Property | Value |
|---|---|
| Base model | Alibaba-NLP/gte-modernbert-base |
| Model type | Text embedding |
| Embedding dimension | 768 |
| Max sequence length | 8192 |
| Runtime | onnxruntime |
| Primary use | Semantic code search / code understanding |
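For use outside Ziv, the following is a rough sketch of running the export directly with onnxruntime and the tokenizers library. Several things here are assumptions rather than facts stated in this card: that the graph accepts `input_ids` and `attention_mask`, that its first output is either per-token hidden states or an already pooled vector, and that CLS pooling followed by L2 normalization is appropriate (as in the upstream model's usage). The `embed` function and `model_dir` parameter are hypothetical names.

```python
def embed(texts, model_dir="."):
    """Embed each text with model.onnx and return an (n, dim) array.

    Third-party imports are local so this module loads without them.
    """
    import os

    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_file(os.path.join(model_dir, "tokenizer.json"))
    tokenizer.enable_truncation(max_length=8192)  # max sequence length above

    session = ort.InferenceSession(
        os.path.join(model_dir, "model.onnx"),
        providers=["CPUExecutionProvider"],
    )
    # Only feed inputs the exported graph actually declares.
    input_names = {inp.name for inp in session.get_inputs()}

    vectors = []
    for text in texts:  # one text per call, so no padding is needed
        enc = tokenizer.encode(text)
        feeds = {
            "input_ids": np.array([enc.ids], dtype=np.int64),
            "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        }
        feeds = {k: v for k, v in feeds.items() if k in input_names}
        out = session.run(None, feeds)[0]
        # Assumption: (1, seq_len, dim) hidden states -> take the CLS token;
        # otherwise treat the output as an already pooled (1, dim) vector.
        vec = out[0, 0] if out.ndim == 3 else out[0]
        vectors.append(vec / np.linalg.norm(vec))
    return np.stack(vectors)
```

Calling `embed(["def read_json(path): ..."], model_dir="path/to/this/repo")` should then yield one 768-dimensional row per input, provided the assumptions above hold for this export.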
Files
| File | Description |
|---|---|
| model.onnx | ONNX model weights and graph |
| tokenizer.json | Tokenizer vocabulary and rules |
| tokenizer_config.json | Tokenizer settings |
| config.json | Model architecture config |
| 1_Pooling/config.json | Pooling configuration |
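As an illustration of what the pooling configuration controls, the sketch below turns per-token vectors into a single embedding based on flags in the Sentence Transformers pooling format (`pooling_mode_cls_token`, `pooling_mode_mean_tokens`). The inline JSON and the toy vectors are made up for the example; the actual values in this repository's `1_Pooling/config.json` are not reproduced here, and `pool` is a hypothetical helper.

```python
import json


def pool(token_vectors, attention_mask, pooling_config):
    """Collapse per-token vectors into one vector per the pooling flags."""
    if pooling_config.get("pooling_mode_cls_token"):
        # CLS pooling: use the vector of the first token.
        return list(token_vectors[0])
    if pooling_config.get("pooling_mode_mean_tokens"):
        # Mean pooling: average only over tokens the mask marks as real.
        kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
        if not kept:
            raise ValueError("attention mask selects no tokens")
        dim = len(kept[0])
        return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    raise ValueError("unsupported pooling mode in this sketch")


# Illustrative config, not copied from this repository.
config = json.loads(
    '{"word_embedding_dimension": 3,'
    ' "pooling_mode_cls_token": true,'
    ' "pooling_mode_mean_tokens": false}'
)
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # last position is padding

cls_vec = pool(tokens, mask, config)
mean_vec = pool(tokens, mask, {"pooling_mode_mean_tokens": True})
```

Whichever mode the file selects, downstream code should apply the same pooling at index time and at query time so that the vectors remain comparable.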
Relation to the original model
This model is based on Alibaba-NLP/gte-modernbert-base, developed by Tongyi Lab, Alibaba Group.
This repository does not claim ownership of the original model weights or training recipe. It provides an ONNX-exported runtime version tailored for Ziv and local inference.
The original model and its concepts should be credited to:
- Tongyi Lab, Alibaba Group
- The gte-modernbert model authors
- The broader Sentence Transformers ecosystem
License
This model is released under the Apache 2.0 License, consistent with the upstream model license.
Original model: Alibaba-NLP/gte-modernbert-base
Citation
If you use this model or the upstream base model in your work, please cite the original papers:
@inproceedings{zhang2024mgte,
title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track},
pages={1393--1412},
year={2024}
}
@article{li2023towards,
title={Towards general text embeddings with multi-stage contrastive learning},
author={Li, Zehan and Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan},
journal={arXiv preprint arXiv:2308.03281},
year={2023}
}