ikuyamada committed on
Commit 48208bb · verified · 1 Parent(s): 0d770d3

Upload README.md with huggingface_hub

Files changed (1): README.md (+14 −15)

README.md CHANGED
@@ -7,10 +7,10 @@ language:
 license: apache-2.0
 library_name: transformers
 base_model:
-- Shitao/RetroMAE
+- RetroMAE
 model_index:
 - name: kpr-retromae
-results:
+  results:
 ---
 
 # Knowledgeable Embedding: kpr-retromae
@@ -21,7 +21,7 @@ model_index:
 
 A key limitation of large language models (LLMs) is their inability to capture less-frequent or up-to-date entity knowledge, often leading to factual inaccuracies and hallucinations. Retrieval-augmented generation (RAG), which incorporates external knowledge through retrieval, is a common approach to mitigate this issue.
 
-Although RAG typically relies on embedding-based retrieval, the embedding models themselves are also based on language models and therefore struggle with queries involving less-frequent entities ([Sciavolino et al., 2021](https://arxiv.org/abs/2109.08535)), often failing to retrieve the crucial knowledge needed to overcome this limitation.
+Although RAG typically relies on embedding-based retrieval, the embedding models themselves are also based on language models and therefore struggle with queries involving less-frequent entities, often failing to retrieve the crucial knowledge needed to overcome this limitation.
 
 **Knowledgeable Embedding** enhances performance on such queries by injecting real-world entity knowledge into embeddings, making them more *knowledgeable*.
 
@@ -39,7 +39,7 @@ For further details, refer to [our paper](https://arxiv.org/abs/2507.03922) or [
 | [knowledgeable-ai/kpr-bge-base-en-v1.5](https://huggingface.co/knowledgeable-ai/kpr-bge-base-en-v1.5) | 112M | [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) |
 | [knowledgeable-ai/kpr-bge-large-en-v1.5](https://huggingface.co/knowledgeable-ai/kpr-bge-large-en-v1.5) | 340M | [bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) |
 
-For practical use, we recommend `knowledgeable-ai/kpr-bge-*`, which significantly outperforms state-of-the-art models on queries involving less-frequent entities while performing comparably on other queries, as reported in [our paper](https://arxiv.org/abs/2507.03922).
+For practical use, we recommend `knowledgeable-ai/kpr-bge-en-*`, which significantly outperforms state-of-the-art models on queries involving less-frequent entities while performing comparably on other queries, as reported in [our paper](https://arxiv.org/abs/2507.03922).
 
 Regarding the model size, we do not count the entity embeddings since they are stored in CPU memory and have a negligible impact on runtime performance. See [this page](https://github.com/knowledgeable-embedding/knowledgeable-embedding/wiki/Internals-of-Knowledgeable-Embedding) for details.
 
@@ -50,7 +50,7 @@ Regarding the model size, we do not count the entity embeddings since they are s
 - Maximum Sequence Length: 512
 - Embedding Dimension: 768
 
-## Usage
+## How to use
 
 This model can be used via [Hugging Face Transformers](https://github.com/huggingface/transformers) or [Sentence Transformers](https://github.com/UKPLab/sentence-transformers):
 
@@ -63,8 +63,8 @@ import torch
 MODEL_NAME_OR_PATH = "knowledgeable-ai/kpr-retromae"
 
 input_texts = [
-    "Who founded Dominican Liberation Party?",
-    "Who owns Mompesson House?"
+    "Who founded Dominican Liberation Party?",
+    "Who owns Mompesson House?"
 ]
 
 # Load model and tokenizer from the Hugging Face Hub
@@ -89,8 +89,8 @@ from sentence_transformers import SentenceTransformer
 MODEL_NAME_OR_PATH = "knowledgeable-ai/kpr-retromae"
 
 input_texts = [
-    "Who founded Dominican Liberation Party?",
-    "Who owns Mompesson House?"
+    "Who founded Dominican Liberation Party?",
+    "Who owns Mompesson House?"
 ]
 
 # Load model from the Hugging Face Hub
@@ -115,14 +115,13 @@ This model is licensed under the Apache License, Version 2.0.
 ## Citation
 
 If you use this model in your research, please cite the following paper:
-
 [Dynamic Injection of Entity Knowledge into Dense Retrievers](https://arxiv.org/abs/2507.03922)
 
 ```bibtex
 @article{yamada2025kpr,
-  title={Dynamic Injection of Entity Knowledge into Dense Retrievers},
-  author={Ikuya Yamada and Ryokan Ri and Takeshi Kojima and Yusuke Iwasawa and Yutaka Matsuo},
-  journal={arXiv preprint arXiv:2507.03922},
-  year={2025}
+  title={Dynamic Injection of Entity Knowledge into Dense Retrievers},
+  author={Ikuya Yamada and Ryokan Ri and Takeshi Kojima and Yusuke Iwasawa and Yutaka Matsuo},
+  journal={arXiv preprint arXiv:2507.03922},
+  year={2025}
 }
+```
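The usage snippets shown in the diff are truncated before the encoding and retrieval step. Downstream of `model.encode(...)`, dense retrieval typically ranks passages by cosine similarity (or dot product) between the query embedding and each passage embedding. Below is a dependency-free sketch of that ranking step using mock 4-dimensional vectors (the real embeddings are 768-dimensional per the model card); the `cosine_similarity` helper, the mock values, and the passage names are illustrative, not part of the model's API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Mock embeddings: one query and two candidate passages (values are illustrative).
query = [0.1, 0.9, 0.2, 0.4]
passages = {
    "passage_a": [0.1, 0.8, 0.3, 0.5],
    "passage_b": [0.9, 0.1, 0.7, 0.0],
}

# Rank passages by descending similarity to the query.
ranked = sorted(
    passages,
    key=lambda p: cosine_similarity(query, passages[p]),
    reverse=True,
)
print(ranked[0])  # passage_a: its mock vector points closest to the query's
```

With real embeddings, the same ranking would be done over vectors returned by the model rather than hand-written lists.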