|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
- zh |
|
|
base_model: |
|
|
- Qwen/Qwen3-Embedding-8B |
|
|
tags: |
|
|
- embedding |
|
|
- retriever |
|
|
- RAG |
|
|
--- |
|
|
|
|
|
# Mindscape-Aware RAG (MiA-RAG) |
|
|
|
|
|
[](https://arxiv.org/pdf/2512.17220) |
|
|
[](https://huggingface.co/MindscapeRAG/MiA-Emb-8B) |
|
|
|
|
|
This repository provides the inference implementation for **MiA-Emb (Mindscape-Aware Embedding)**, the retriever component in the **MiA-RAG** framework. |
|
|
|
|
|
**MiA-RAG** introduces explicit **global context awareness** via a **Mindscape**—a document-level semantic scaffold constructed by **hierarchical summarization**. By conditioning **both retrieval and generation** on the same Mindscape, MiA-RAG enables globally grounded retrieval and more coherent long-context reasoning. |
|
|
|
|
|
--- |
|
|
|
|
|
## ✨ Key Features |
|
|
|
|
|
- **Mindscape as Global Semantic Scaffold** |
|
|
Builds a Mindscape through **hierarchical bottom-up summarization** (chunk summaries → global summary) and uses it as persistent global memory. |
|
|
|
|
|
- **Mindscape-Aware Capabilities** |
|
|
Supports the three core benefits for long-context understanding: |
|
|
- **Enriched Understanding**: fill in missing context and resolve underspecified meanings |
|
|
- **Selective Retrieval**: bias retrieval toward the active topic’s semantic frame |
|
|
- **Integrative Reasoning**: interpret retrieved evidence within a coherent global context |
|
|
|
|
|
- **Dual-Granularity Retrieval** |
|
|
- **Chunk Retrieval** for narrative passages (standard RAG) |
|
|
- **Node Retrieval** for knowledge graph entities (GraphRAG-style) |
|
|
|
|
|
- **State-of-the-Art Retrieval Performance** |
|
|
Strong results on long-context benchmarks such as NarrativeQA and DetectiveQA, outperforming strong baselines including Qwen3-Embedding and [SitEmb](https://huggingface.co/SituatedEmbedding/SitEmb-v1.5-Qwen3). |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## 🚀 Usage |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install torch transformers>=4.53.0 |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
### 1) Initialization |
|
|
|
|
|
> MiA-Emb-8B is initialized from **`Qwen3-Embedding-8B`**. |
|
|
|
|
|
```python |
|
|
import torch |
|
|
import torch.nn.functional as F |
|
|
from transformers import AutoTokenizer, AutoModel |
|
|
|
|
|
# Configuration |
|
|
|
|
|
device = "cuda" if torch.cuda.is_available() else "cpu" |
|
|
|
|
|
# Inference Parameters |
|
|
residual = True # Enable residual connection logic |
|
|
residual_factor = 0.5 # Balance between local and global |
|
|
node_delimiter = "<|repo_name|>" # Special token for Node tasks |
|
|
|
|
|
# Load Tokenizer (base) |
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
|
"Qwen/Qwen3-Embedding-8B", |
|
|
trust_remote_code=True, |
|
|
padding_side="left" |
|
|
) |
|
|
|
|
|
# Load Model |
|
|
model = AutoModel.from_pretrained( |
|
|
"MindscapeRAG/MiA-Emb-8B", |
|
|
trust_remote_code=True, |
|
|
torch_dtype=torch.bfloat16, |
|
|
attn_implementation="flash_attention_2", |
|
|
device_map={"": 0} |
|
|
) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
### 2) Chunk Retrieval |
|
|
|
|
|
Use this mode to retrieve narrative text chunks. A **Global Summary** is injected into the prompt as the “Mindscape”. |
|
|
|
|
|
```python |
|
|
def get_query_prompt(query, summary="", residual=False): |
|
|
"""Construct input prompt with global summary (Eq. 5 in paper).""" |
|
|
task_desc = "Given a search query with the book's summary, retrieve relevant chunks or helpful entities summaries from the given context that answer the query" |
|
|
summary_prefix = "\n\nHere is the summary providing possibly useful global information. Please encode the query based on the summary:\n" |
|
|
|
|
|
# Insert PAD token to capture residual embedding before the summary |
|
|
middle_token = tokenizer.pad_token if residual else "" |
|
|
|
|
|
return ( |
|
|
f"Instruct: {task_desc}\n" |
|
|
f"Query: {query}{middle_token}{summary_prefix}{summary}{node_delimiter}" |
|
|
) |
|
|
|
|
|
def encode_chunk(texts, is_query=False, residual=False): |
|
|
batch = tokenizer( |
|
|
texts, |
|
|
max_length=4096, |
|
|
padding=True, |
|
|
truncation=True, |
|
|
return_tensors="pt" |
|
|
).to(model.device) |
|
|
|
|
|
outputs = model(**batch) |
|
|
|
|
|
# 1) Main Embedding (Last Token) |
|
|
emb_main = last_token_pool(outputs.last_hidden_state, batch["attention_mask"]) |
|
|
|
|
|
# 2) Residual Embedding (PAD Token) |
|
|
emb_res = None |
|
|
if residual and is_query: |
|
|
emb_res = extract_residual_token(outputs, batch, tokenizer.pad_token_id) |
|
|
|
|
|
emb_main = F.normalize(emb_main, p=2, dim=-1) |
|
|
emb_res = F.normalize(emb_res, p=2, dim=-1) if emb_res is not None else None |
|
|
return emb_main, emb_res |
|
|
|
|
|
|
|
|
# --- Example --- |
|
|
query = "Who is the protagonist?" |
|
|
global_summ = "A summary of the entire book..." |
|
|
chunk = "Harry looked at the scar on his forehead." |
|
|
|
|
|
# Encode |
|
|
q_emb, q_res = encode_chunk( |
|
|
[get_query_prompt(query, global_summ, residual=True)], |
|
|
is_query=True, |
|
|
residual=True |
|
|
) |
|
|
c_emb, _ = encode_chunk([chunk], is_query=False) |
|
|
|
|
|
# Score Fusion |
|
|
score = q_emb @ c_emb.T |
|
|
if q_res is not None: |
|
|
score = (1 - residual_factor) * score + residual_factor * (q_res @ c_emb.T) |
|
|
|
|
|
print(f"Chunk Similarity: {score.item():.4f}") |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
### 3) Node Retrieval |
|
|
|
|
|
MiA-Emb can retrieve knowledge graph entities (**Nodes**). This mode extracts embeddings from the `<|repo_name|>` token position. |
|
|
|
|
|
**Candidate format:** |
|
|
`Entity Name : Entity Description` |
|
|
|
|
|
Example: |
|
|
`Mary Campbell Smith : Mary Campbell Smith is mentioned as the translator...` |
|
|
|
|
|
```python |
|
|
def encode_node_query(texts, residual=True, node_delimiter="<|repo_name|>"): |
|
|
batch = tokenizer(texts, padding=True, return_tensors="pt").to(model.device) |
|
|
outputs = model(**batch) |
|
|
|
|
|
# 1) Node Main Embedding: extract from <|repo_name|> position |
|
|
node_id = tokenizer.encode(node_delimiter, add_special_tokens=False)[0] |
|
|
q_emb_node = extract_specific_token(outputs, batch, node_id) |
|
|
|
|
|
# 2) Residual Embedding: extract from [PAD] position |
|
|
q_emb_res = extract_residual_token(outputs, batch, tokenizer.pad_token_id) if residual else None |
|
|
|
|
|
q_emb_node = F.normalize(q_emb_node, p=2, dim=-1) |
|
|
q_emb_res = F.normalize(q_emb_res, p=2, dim=-1) if q_emb_res is not None else None |
|
|
return q_emb_node, q_emb_res |
|
|
|
|
|
|
|
|
# --- Example --- |
|
|
query = "Who is the protagonist?" |
|
|
global_summ = "A summary of the entire book..." |
|
|
|
|
|
# 1) Encode Query (Node Token) |
|
|
q_emb_node, q_emb_res = encode_node_query( |
|
|
[get_query_prompt(query, global_summ, residual=True)], |
|
|
residual=True |
|
|
) |
|
|
|
|
|
# 2) Encode Entity Candidate |
|
|
entity_text = "Harry Potter : The main protagonist of the series..." |
|
|
n_emb, _ = encode_chunk([entity_text], is_query=False) |
|
|
|
|
|
# 3) Score Fusion |
|
|
final_score = (1 - residual_factor) * (q_emb_node @ n_emb.T) |
|
|
if q_emb_res is not None: |
|
|
final_score = final_score + residual_factor * (q_emb_res @ n_emb.T) |
|
|
|
|
|
print(f"Node Similarity: {final_score.item():.4f}") |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## 📜 Citation |
|
|
|
|
|
|
|
|
|
|
|
If you find this work useful, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{li2025mindscapeawareretrievalaugmentedgeneration, |
|
|
title={Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding}, |
|
|
author={Yuqing Li and Jiangnan Li and Zheng Lin and Ziyan Zhou and Junjie Wu and Weiping Wang and Jie Zhou and Mo Yu}, |
|
|
year={2025}, |
|
|
eprint={2512.17220}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CL}, |
|
|
url={https://arxiv.org/abs/2512.17220}, |
|
|
} |
|
|
``` |