---
library_name: transformers
tags: []
---

# FinE5: Finance-Adapted Text Embedding Model

This financial embedding model is fine-tuned on a synthesized finance corpus, following the training pipeline of e5-mistral-7b-instruct (Wang et al., 2023). It ranks first on the FinMTEB benchmark (as of Feb 16, 2025), with no overlap between the training data and the benchmark test set.

The training data and pipeline are detailed in the paper.

This repository contains the tokenizer of FinE5.
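
Since FinE5 follows the e5-mistral-7b-instruct recipe, queries are typically prefixed with a task instruction before embedding. The sketch below shows that formatting step; the repository id `yixuantt/FinE5` and the example task string are assumptions, and the model-loading lines are commented out because they download multi-gigabyte weights.

```python
# Minimal sketch of e5-mistral-style instructed queries for FinE5.
# The repo id "yixuantt/FinE5" below is a placeholder assumption --
# substitute the actual Hugging Face repository name.

def get_detailed_instruct(task_description: str, query: str) -> str:
    """Format a query with its task instruction, as in e5-mistral-7b-instruct."""
    return f"Instruct: {task_description}\nQuery: {query}"

# Hypothetical retrieval task for a finance corpus.
task = "Given a financial question, retrieve relevant passages that answer it"
queries = [get_detailed_instruct(task, "What drives changes in a firm's cost of capital?")]

# Model loading (commented out; requires downloading ~7B weights):
# from transformers import AutoTokenizer, AutoModel
# tokenizer = AutoTokenizer.from_pretrained("yixuantt/FinE5")  # assumed repo id
# model = AutoModel.from_pretrained("yixuantt/FinE5")
# batch = tokenizer(queries, padding=True, return_tensors="pt")
# embeddings = model(**batch).last_hidden_state[:, -1]  # last-token pooling

print(queries[0])
```

Documents (as opposed to queries) are embedded without the instruction prefix under this recipe.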

---

# Citation

If you find our work helpful, please cite:
|
|
```bibtex
@misc{tang2025finmtebfinancemassivetext,
      title={FinMTEB: Finance Massive Text Embedding Benchmark},
      author={Yixuan Tang and Yi Yang},
      year={2025},
      eprint={2502.10990},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.10990},
}

@misc{tang2024needdomainspecificembeddingmodels,
      title={Do We Need Domain-Specific Embedding Models? An Empirical Investigation},
      author={Yixuan Tang and Yi Yang},
      year={2024},
      eprint={2409.18511},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.18511},
}
```

Code for FinMTEB: https://github.com/yixuantt/FinMTEB

---

Thanks to the [MTEB](https://github.com/embeddings-benchmark/mteb) benchmark.

* This model should not be used for any commercial purpose. Refer to the [license](https://spdx.org/licenses/CC-BY-NC-4.0) for the detailed terms.