---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- transformers
---

## LGAI-Embedding-Preview

We have trained the **LGAI-Embedding-Preview** model based on the [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) LLM.

The initial goal is to reproduce the baseline model and verify the workflow for uploading results:

- [x] Checkpoint
- [x] Technical report

## MTEB

Inference is performed with in-context examples for MTEB evaluation.
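
The exact prompt format is described in the technical report; as a rough illustration only, the sketch below shows one common way instruction-tuned embedding models prepend a task instruction and a single in-context example to the query before encoding. The template, instruction, and example text here are assumptions, not the model's documented format.

```python
# Illustrative sketch only: this template is an assumption, not the
# documented LGAI-Embedding-Preview prompt format (see arXiv:2506.07438).
def build_prompt(instruction: str, example_query: str, example_response: str, query: str) -> str:
    """Prepend a task instruction and one in-context example to a query."""
    return (
        f"Instruct: {instruction}\n"
        f"Example query: {example_query}\n"
        f"Example response: {example_response}\n"
        f"Query: {query}"
    )

prompt = build_prompt(
    instruction="Given a web search query, retrieve relevant passages that answer the query.",
    example_query="who wrote the play Hamlet",
    example_response="Hamlet was written by William Shakespeare.",
    query="what is the boiling point of water at sea level",
)
# `prompt` is then fed to the embedding model in place of the raw query.
```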

## Model Information

- Model Size: 7B
- Embedding Dimension: 4096
- Max Input Tokens: 32k
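
As a quick sanity check, the figures above can be compared against the model config; a minimal sketch, assuming a hypothetical repo id (replace it with this repository's actual id):

```python
from transformers import AutoConfig, AutoTokenizer

repo_id = "LGAI-EXAONE/LGAI-Embedding-Preview"  # assumed repo id; replace as needed

cfg = AutoConfig.from_pretrained(repo_id)
tok = AutoTokenizer.from_pretrained(repo_id)

print(cfg.hidden_size)       # expected: 4096, the embedding dimension listed above
print(tok.model_max_length)  # roughly 32k; may be a large sentinel if unset in the tokenizer config
```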

## Requirements

```
transformers>=4.48.3
```
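
For completeness, here is a minimal usage sketch with the `sentence-transformers` library (matching the tags above). The repo id is an assumption; substitute this repository's actual id.

```python
# Minimal usage sketch; the repo id is an assumption, replace with the actual one.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LGAI-EXAONE/LGAI-Embedding-Preview")  # assumed repo id

queries = ["What is the capital of South Korea?"]
documents = [
    "Seoul is the capital and largest city of South Korea.",
    "Mistral-7B is a seven-billion-parameter language model.",
]

# Normalized embeddings, so the dot product equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
d_emb = model.encode(documents, normalize_embeddings=True)

print(q_emb.shape)      # expected: (1, 4096)
print(q_emb @ d_emb.T)  # cosine similarities; the first document should score higher
```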

## Citation

If you find this repository useful, please consider citing it.

```
@misc{choi2025lgaiembeddingpreviewtechnicalreport,
      title={LGAI-EMBEDDING-Preview Technical Report},
      author={Jooyoung Choi and Hyun Kim and Hansol Jang and Changwook Jun and Kyunghoon Bae and Hyewon Choi and Stanley Jungkyu Choi and Honglak Lee and Chulmin Yun},
      year={2025},
      eprint={2506.07438},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.07438},
}
```