---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-Embedding-0.6B
tags:
- embedding
- retriever
- RAG
---

# Mindscape-Aware RAG (MiA-RAG)

[![Paper](https://img.shields.io/badge/Paper-arXiv%3A2512.17220-red)](https://arxiv.org/pdf/2512.17220)
[![Model](https://img.shields.io/badge/HuggingFace-MiA--Emb--0.6B-yellow)](https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B)

This repository provides the inference implementation for **MiA-Emb (Mindscape-Aware Embedding)**, the retriever component of the **MiA-RAG** framework.

**MiA-RAG** introduces explicit **global context awareness** via a **Mindscape**: a document-level semantic scaffold constructed by **hierarchical summarization**. By conditioning **both retrieval and generation** on the same Mindscape, MiA-RAG enables globally grounded retrieval and more coherent long-context reasoning.

---

## ✨ Key Features

- **Mindscape as Global Semantic Scaffold**
  Builds a Mindscape through **hierarchical bottom-up summarization** (chunk summaries → global summary) and uses it as persistent global memory (a minimal construction sketch is shown after the Initialization section below).
- **Mindscape-Aware Capabilities**
  Supports three core benefits for long-context understanding:
  - **Enriched Understanding**: fill in missing context and resolve underspecified meanings
  - **Selective Retrieval**: bias retrieval toward the active topic’s semantic frame
  - **Integrative Reasoning**: interpret retrieved evidence within a coherent global context
- **Dual-Granularity Retrieval**
  - **Chunk Retrieval** for narrative passages (standard RAG)
  - **Node Retrieval** for knowledge graph entities (GraphRAG-style)
- **State-of-the-Art Retrieval Performance**
  Strong results on long-context benchmarks such as NarrativeQA and DetectiveQA, outperforming competitive baselines including Qwen3-Embedding and [SitEmb](https://huggingface.co/SituatedEmbedding/SitEmb-v1.5-Qwen3).

---

## 🚀 Usage

### Installation

```bash
pip install torch "transformers>=4.53.0"
```

---

### 1) Initialization

> MiA-Emb-0.6B is initialized from **`Qwen3-Embedding-0.6B`**.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Configuration
device = "cuda" if torch.cuda.is_available() else "cpu"

# Inference Parameters
residual = True                    # Enable residual connection logic
residual_factor = 0.5              # Balance between local and global signals
node_delimiter = "<|repo_name|>"   # Special token for Node tasks

# Load Tokenizer (base)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B",
    trust_remote_code=True,
    padding_side="left"
)

# Load Model
model = AutoModel.from_pretrained(
    "MindscapeRAG/MiA-Emb-0.6B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn; drop this argument if unavailable
    device_map={"": 0}
)
```

---
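The encoding functions in the next sections call three pooling helpers (`last_token_pool`, `extract_residual_token`, `extract_specific_token`) that are not defined in this card. The sketch below shows what they are assumed to do: `last_token_pool` is the standard last-token pooling used with Qwen3-Embedding, and the two extraction helpers read the hidden state at the last occurrence of a given token (the inserted PAD token, or the `<|repo_name|>` delimiter). The exact implementations in the official release may differ.

```python
import torch
from torch import Tensor


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Hidden state of the last non-padding token (standard Qwen3-Embedding pooling)."""
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[
        torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths
    ]


def extract_specific_token(outputs, batch, token_id: int) -> Tensor:
    """Hidden state at the LAST occurrence of `token_id` in each sequence.

    Assumed behaviour: the token is expected to be present in every sequence.
    """
    hidden = outputs.last_hidden_state
    input_ids = batch["input_ids"]
    positions = (
        (input_ids == token_id).long()
        * torch.arange(input_ids.shape[1], device=input_ids.device)
    ).argmax(dim=1)
    batch_idx = torch.arange(hidden.shape[0], device=hidden.device)
    return hidden[batch_idx, positions]


def extract_residual_token(outputs, batch, pad_token_id: int) -> Tensor:
    """Residual embedding: hidden state at the PAD token inserted before the summary."""
    return extract_specific_token(outputs, batch, pad_token_id)
```

---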
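The retrieval examples below pass a global summary (`global_summ`) as the Mindscape. As described in Key Features, it is produced by hierarchical bottom-up summarization (chunk summaries merged into a global summary). The sketch below only illustrates that procedure; the `build_mindscape` function, its `summarize` callable, and the grouping strategy are illustrative placeholders rather than part of this repository, and any instruction-tuned LLM can serve as the summarizer.

```python
from typing import Callable, List


def build_mindscape(chunks: List[str],
                    summarize: Callable[[str], str],
                    group_size: int = 8) -> str:
    """Hypothetical bottom-up summarization: chunk summaries -> merged -> global summary."""
    # Level 1: summarize each chunk independently
    summaries = [summarize(chunk) for chunk in chunks]
    # Higher levels: merge groups of summaries until a single global summary remains
    while len(summaries) > 1:
        summaries = [
            summarize("\n".join(summaries[i:i + group_size]))
            for i in range(0, len(summaries), group_size)
        ]
    return summaries[0]  # the Mindscape used as `global_summ` below


# Usage (hypothetical summarizer):
# global_summ = build_mindscape(book_chunks, summarize=my_llm_summarize)
```

---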
### 2) Chunk Retrieval

Use this mode to retrieve narrative text chunks. A **Global Summary** is injected into the prompt as the “Mindscape”.

```python
def get_query_prompt(query, summary="", residual=False):
    """Construct input prompt with global summary (Eq. 5 in paper)."""
    task_desc = "Given a search query with the book's summary, retrieve relevant chunks or helpful entities summaries from the given context that answer the query"
    summary_prefix = "\n\nHere is the summary providing possibly useful global information. Please encode the query based on the summary:\n"

    # Insert PAD token to capture residual embedding before the summary
    middle_token = tokenizer.pad_token if residual else ""

    return (
        f"Instruct: {task_desc}\n"
        f"Query: {query}{middle_token}{summary_prefix}{summary}{node_delimiter}"
    )


def encode_chunk(texts, is_query=False, residual=False):
    batch = tokenizer(
        texts,
        max_length=4096,
        padding=True,
        truncation=True,
        return_tensors="pt"
    ).to(model.device)

    outputs = model(**batch)

    # 1) Main Embedding (Last Token)
    emb_main = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])

    # 2) Residual Embedding (PAD Token)
    emb_res = None
    if residual and is_query:
        emb_res = extract_residual_token(outputs, batch, tokenizer.pad_token_id)

    emb_main = F.normalize(emb_main, p=2, dim=-1)
    emb_res = F.normalize(emb_res, p=2, dim=-1) if emb_res is not None else None
    return emb_main, emb_res


# --- Example ---
query = "Who is the protagonist?"
global_summ = "A summary of the entire book..."
chunk = "Harry looked at the scar on his forehead."

# Encode
q_emb, q_res = encode_chunk(
    [get_query_prompt(query, global_summ, residual=True)],
    is_query=True,
    residual=True
)
c_emb, _ = encode_chunk([chunk], is_query=False)

# Score Fusion
score = q_emb @ c_emb.T
if q_res is not None:
    score = (1 - residual_factor) * score + residual_factor * (q_res @ c_emb.T)

print(f"Chunk Similarity: {score.item():.4f}")
```

---

### 3) Node Retrieval

MiA-Emb can retrieve knowledge graph entities (**Nodes**). This mode extracts embeddings from the `<|repo_name|>` token position.

**Candidate format:** `Entity Name : Entity Description`

Example: `Mary Campbell Smith : Mary Campbell Smith is mentioned as the translator...`

```python
def encode_node_query(texts, residual=True, node_delimiter="<|repo_name|>"):
    batch = tokenizer(texts, padding=True, return_tensors="pt").to(model.device)
    outputs = model(**batch)

    # 1) Node Main Embedding: extract from <|repo_name|> position
    node_id = tokenizer.encode(node_delimiter, add_special_tokens=False)[0]
    q_emb_node = extract_specific_token(outputs, batch, node_id)

    # 2) Residual Embedding: extract from [PAD] position
    q_emb_res = extract_residual_token(outputs, batch, tokenizer.pad_token_id) if residual else None

    q_emb_node = F.normalize(q_emb_node, p=2, dim=-1)
    q_emb_res = F.normalize(q_emb_res, p=2, dim=-1) if q_emb_res is not None else None
    return q_emb_node, q_emb_res


# --- Example ---
query = "Who is the protagonist?"
global_summ = "A summary of the entire book..."

# 1) Encode Query (Node Token)
q_emb_node, q_emb_res = encode_node_query(
    [get_query_prompt(query, global_summ, residual=True)],
    residual=True
)

# 2) Encode Entity Candidate
entity_text = "Harry Potter : The main protagonist of the series..."
n_emb, _ = encode_chunk([entity_text], is_query=False)

# 3) Score Fusion
final_score = (1 - residual_factor) * (q_emb_node @ n_emb.T)
if q_emb_res is not None:
    final_score = final_score + residual_factor * (q_emb_res @ n_emb.T)

print(f"Node Similarity: {final_score.item():.4f}")
```

---

## 📜 Citation

If you find this work useful, please cite:

```bibtex
@misc{li2025mindscapeawareretrievalaugmentedgeneration,
  title={Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding},
  author={Yuqing Li and Jiangnan Li and Zheng Lin and Ziyan Zhou and Junjie Wu and Weiping Wang and Jie Zhou and Mo Yu},
  year={2025},
  eprint={2512.17220},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.17220},
}
```