Add pipeline tag and library name metadata

#1
by nielsr (HF Staff) - opened
Files changed (1): README.md (+13 -9)
README.md CHANGED
````diff
@@ -1,10 +1,12 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-Embedding-8B
+library_name: transformers
+pipeline_tag: feature-extraction
 language:
 - en
 - zh
-base_model:
-- Qwen/Qwen3-Embedding-8B
+license: apache-2.0
 tags:
 - embedding
 - retriever
@@ -16,7 +18,7 @@ tags:
 [![Paper](https://img.shields.io/badge/Paper-arXiv%3A2512.17220-red)](https://arxiv.org/pdf/2512.17220)
 [![Model](https://img.shields.io/badge/HuggingFace-MiA--Emb--8B-yellow)](https://huggingface.co/MindscapeRAG/MiA-Emb-8B)
 
-This repository provides the inference implementation for **MiA-Emb (Mindscape-Aware Embedding)**, the retriever component in the **MiA-RAG** framework.
+This repository provides the inference implementation for **MiA-Emb (Mindscape-Aware Embedding)**, the retriever component in the **MiA-RAG** framework, as presented in the paper [Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding](https://huggingface.co/papers/2512.17220).
 
 **MiA-RAG** introduces explicit **global context awareness** via a **Mindscape**—a document-level semantic scaffold constructed by **hierarchical summarization**. By conditioning **both retrieval and generation** on the same Mindscape, MiA-RAG enables globally grounded retrieval and more coherent long-context reasoning.
 
@@ -56,7 +58,7 @@ pip install torch transformers>=4.53.0
 
 ### 1) Initialization
 
-> MiA-Emb-8B is initialized from **`Qwen3-Embedding-8B`**.
+> MiA-Emb-8B is a LoRA adapter initialized from **`Qwen/Qwen3-Embedding-8B`**.
 
 ```python
 import torch
@@ -99,13 +101,17 @@ Use this mode to retrieve narrative text chunks. A **Global Summary** is injecte
 def get_query_prompt(query, summary="", residual=False):
     """Construct input prompt with global summary (Eq. 5 in paper)."""
     task_desc = "Given a search query with the book's summary, retrieve relevant chunks or helpful entities summaries from the given context that answer the query"
-    summary_prefix = "\n\nHere is the summary providing possibly useful global information. Please encode the query based on the summary:\n"
+    summary_prefix = "
+
+Here is the summary providing possibly useful global information. Please encode the query based on the summary:
+"
 
     # Insert PAD token to capture residual embedding before the summary
     middle_token = tokenizer.pad_token if residual else ""
 
     return (
-        f"Instruct: {task_desc}\n"
+        f"Instruct: {task_desc}
+"
         f"Query: {query}{middle_token}{summary_prefix}{summary}{node_delimiter}"
     )
@@ -210,8 +216,6 @@ print(f"Node Similarity: {final_score.item():.4f}")
 
 ## 📜 Citation
 
-
-
 If you find this work useful, please cite:
 
 ```bibtex
````
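For reference, applying the metadata hunk leaves the opening lines of the README front matter as below (reassembled from the kept and added lines of that hunk; the `tags:` list continues beyond the lines shown in the diff):

```yaml
---
base_model:
- Qwen/Qwen3-Embedding-8B
library_name: transformers
pipeline_tag: feature-extraction
language:
- en
- zh
license: apache-2.0
tags:
- embedding
- retriever
```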
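The `get_query_prompt` touched by the last code hunk depends on values defined elsewhere in the README (`tokenizer`, `node_delimiter`). A minimal standalone sketch of the same prompt construction, with the newlines written as `\n` escapes so the snippet parses as regular Python, and with placeholder values for the pad token and `node_delimiter` (both are assumptions for illustration, not the repository's actual configuration):

```python
# Standalone sketch of the query-prompt construction from the README diff.
# PAD_TOKEN and NODE_DELIMITER are hypothetical placeholders; the real values
# come from the repository's tokenizer and config.
PAD_TOKEN = "<|endoftext|>"
NODE_DELIMITER = "<|endoftext|>"


def get_query_prompt(query, summary="", residual=False):
    """Construct input prompt with global summary (Eq. 5 in paper)."""
    task_desc = (
        "Given a search query with the book's summary, retrieve relevant "
        "chunks or helpful entities summaries from the given context that "
        "answer the query"
    )
    summary_prefix = (
        "\n\nHere is the summary providing possibly useful global "
        "information. Please encode the query based on the summary:\n"
    )
    # Insert PAD token to capture residual embedding before the summary
    middle_token = PAD_TOKEN if residual else ""
    return (
        f"Instruct: {task_desc}\n"
        f"Query: {query}{middle_token}{summary_prefix}{summary}{NODE_DELIMITER}"
    )


print(get_query_prompt("Who betrays the protagonist?", summary="A short book summary."))
```

With `residual=True`, the pad token lands between the query and the summary prefix, matching the in-code comment about capturing a residual embedding before the summary.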