openbmb/MiniCPM-Embedding-Light


Kaguya-19 committed on Jan 24, 2025

Commit 91b71e3 · 1 Parent(s): 321b9bb

citation

Files changed (1)

README.md +2 -1


@@ -11989,7 +11989,7 @@ MiniCPM-Embedding-Light结构上采取双向注意力和 Weighted Mean Pooling [
 - Outstanding cross-lingual retrieval capabilities between Chinese and English.
 - Long-text support (up to 8192 tokens).
 - Dense vectors and token-level sparse vectors.
-- Variable dense vector dimensions (Matryoshka representation).
+- Variable dense vector dimensions (Matryoshka representation [2]).
 MiniCPM-Embedding-Light incorporates bidirectional attention and Weighted Mean Pooling [1] in its architecture. The model underwent multi-stage training using approximately 260 million training examples, including open-source, synthetic, and proprietary data.
@@ -12000,6 +12000,7 @@ We also invite you to explore the UltraRAG series:
 - Domain Adaptive RAG Framework: [UltraRAG](https://github.com/openbmb/UltraRAG)
 [1] Muennighoff, N. (2022). Sgpt: Gpt sentence embeddings for semantic search. arXiv preprint arXiv:2202.08904.
+[2] Kusupati, Aditya, et al. "Matryoshka representation learning." Advances in Neural Information Processing Systems 35 (2022): 30233-30249.
 ## 模型信息 Model Information
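
The "variable dense vector dimensions" bullet touched by this commit refers to embeddings whose leading components can be used on their own. The following is a minimal sketch of how such truncation is commonly applied, assuming vectors are compared by cosine similarity and are therefore re-normalized after truncation. `truncate_embedding` is a hypothetical helper, not part of the model's API, and the input vector is toy data rather than real model output.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize the result.

    Hypothetical helper for illustration only; the model card may
    recommend a different procedure.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 3-dimensional "embedding" truncated to its first 2 dimensions.
short = truncate_embedding([3.0, 4.0, 12.0], 2)
```

Whether a truncated vector retains enough retrieval quality depends on how the model was trained, so any choice of `dim` should be checked against the model card.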