---
library_name: transformers
tags: []
---

# FinE5: Finance-Adapted Text Embedding Model

This financial embedding model is fine-tuned on a synthesized finance corpus, following the training pipeline of e5-mistral-7b-instruct (Wang et al., 2023). It ranks first on the FinMTEB benchmark (as of Feb 16, 2025), with no overlap between the training data and the benchmark test set.

The training data and pipeline are detailed in the paper.

This repository contains the tokenizer of FinE5.
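
Since FinE5 follows the e5-mistral-7b-instruct recipe, queries are typically prefixed with a task instruction before embedding. The sketch below shows that formatting step; the repository id `yixuantt/FinE5` and the example task string are assumptions, and the model-loading lines are commented out because they download multi-gigabyte weights.

```python
# Minimal sketch of e5-mistral-style instructed queries for FinE5.
# The repo id "yixuantt/FinE5" below is a placeholder assumption --
# substitute the actual Hugging Face repository name.

def get_detailed_instruct(task_description: str, query: str) -> str:
    """Format a query with its task instruction, as in e5-mistral-7b-instruct."""
    return f"Instruct: {task_description}\nQuery: {query}"

# Hypothetical retrieval task for a finance corpus.
task = "Given a financial question, retrieve relevant passages that answer it"
queries = [get_detailed_instruct(task, "What drives changes in a firm's cost of capital?")]

# Model loading (commented out; requires downloading ~7B weights):
# from transformers import AutoTokenizer, AutoModel
# tokenizer = AutoTokenizer.from_pretrained("yixuantt/FinE5")  # assumed repo id
# model = AutoModel.from_pretrained("yixuantt/FinE5")
# batch = tokenizer(queries, padding=True, return_tensors="pt")
# embeddings = model(**batch).last_hidden_state[:, -1]  # last-token pooling

print(queries[0])
```

Documents (as opposed to queries) are embedded without the instruction prefix under this recipe.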

---

# Citation

If you find our work helpful, please cite:
|
|
```bibtex
@misc{tang2025finmtebfinancemassivetext,
      title={FinMTEB: Finance Massive Text Embedding Benchmark},
      author={Yixuan Tang and Yi Yang},
      year={2025},
      eprint={2502.10990},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.10990},
}

@misc{tang2024needdomainspecificembeddingmodels,
      title={Do We Need Domain-Specific Embedding Models? An Empirical Investigation},
      author={Yixuan Tang and Yi Yang},
      year={2024},
      eprint={2409.18511},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.18511},
}
```

Code for FinMTEB: https://github.com/yixuantt/FinMTEB

---

Thanks to the [MTEB](https://github.com/embeddings-benchmark/mteb) benchmark.

* This model should not be used for any commercial purpose. Refer to the [license](https://spdx.org/licenses/CC-BY-NC-4.0) for the detailed terms.