---
language: en
tags:
- rag
- context-compression
- gemma
license: apache-2.0
datasets:
- hotpotqa
base_model:
- google/gemma-2b-it
---

# EXIT: Context-Aware Extractive Compression for RAG

EXIT is a context-aware extractive compression model for Retrieval-Augmented Generation (RAG). It improves RAG efficiency and effectiveness by keeping only the sentences relevant to the query, scoring each sentence against the full document so that contextual dependencies are preserved.

[[Paper]](https://arxiv.org/abs/2412.12559) [[GitHub]](https://github.com/ThisIsHwang/EXIT)

## Model Description

EXIT is designed to:
- Compress retrieved documents while preserving critical information
- Consider full document context when evaluating sentence importance
- Enable parallelizable, context-aware extraction
- Adapt dynamically to query complexity
- Balance compression ratio and answer accuracy

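Because each sentence is judged on its own but always alongside the full document, all prompts for a document can be built up front and scored together. The sketch below illustrates this under the assumption that the prompt layout matches the Quickstart example; `PROMPT_TEMPLATE` and `build_prompts` are hypothetical names, not part of the released code.

```python
from typing import List

# Prompt layout copied from the Quickstart example; the constant and the
# helper below are hypothetical names used only for illustration.
PROMPT_TEMPLATE = (
    "<start_of_turn>user\n"
    "Query:\n{query}\n"
    "Full context:\n{context}\n"
    "Sentence:\n{sentence}\n"
    'Is this sentence useful in answering the query? Answer only "Yes" or "No".<end_of_turn>\n'
    "<start_of_turn>model\n"
)


def build_prompts(query: str, context: str, sentences: List[str]) -> List[str]:
    """Return one prompt per sentence; every prompt carries the full context."""
    return [
        PROMPT_TEMPLATE.format(query=query, context=context, sentence=s)
        for s in sentences
    ]


sentences = [
    "SSDs have no moving parts.",
    "I bought my computer last week.",
]
prompts = build_prompts(
    query="Why are SSDs fast?",
    context=" ".join(sentences),
    sentences=sentences,
)
print(len(prompts))  # one prompt per sentence
```

No prompt depends on the decision made for another sentence, so the list can be tokenized and scored as one batch. If the batch is padded, padding on the left keeps the last position of every row aligned with the answer slot.
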
## Task and Intended Use

EXIT is trained to classify each sentence as relevant or irrelevant to answering a query, based on the sentence's content and its surrounding context. It is designed for:

- RAG context compression
- Open-domain question answering
- Both single-hop and multi-hop queries

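The relevance decision reduces to a two-way softmax over the logits of the "Yes" and "No" tokens, compared against a threshold. The following is a minimal sketch in plain Python of the same computation the Quickstart performs with `torch.softmax`; `yes_probability` and `is_relevant` are hypothetical names chosen for illustration.

```python
import math


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the "Yes"/"No" logits, returned as P("Yes")."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def is_relevant(yes_logit: float, no_logit: float, tau: float = 0.5) -> bool:
    """Keep the sentence when P("Yes") reaches the threshold tau."""
    return yes_probability(yes_logit, no_logit) >= tau


print(is_relevant(2.0, 0.5))   # "Yes" logit is higher, so the sentence is kept
print(is_relevant(-1.0, 1.0))  # "No" logit is higher, so the sentence is dropped
```

With `tau = 0.5`, the check is equivalent to asking whether the "Yes" logit is at least as large as the "No" logit.
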
## Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import spacy

# 1. Load models
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",
    torch_dtype=torch.float16
)
exit_model = PeftModel.from_pretrained(
    base_model,
    "doubleyyh/exit-gemma-2b"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# 2. Initialize sentence splitter (only the lightweight "senter" pipe is needed)
nlp = spacy.load("en_core_web_sm", disable=[
    "tok2vec", "tagger", "parser", "attribute_ruler",
    "lemmatizer", "ner"
])
nlp.enable_pipe("senter")

# 3. Input
query = "How do solid-state drives (SSDs) improve computer performance?"
context = """
Solid-state drives use flash memory to store data without moving parts.
Unlike traditional hard drives, SSDs have no mechanical components.
The absence of physical movement allows for much faster data access speeds.
I bought my computer last week.
SSDs significantly reduce boot times and application loading speeds.
They consume less power and are more reliable than mechanical drives.
The price of SSDs has decreased significantly in recent years.
"""

# 4. Process sentences
def get_relevance(query: str, context: str, sentence: str, tau: float = 0.5) -> bool:
    prompt = f'''<start_of_turn>user
Query:
{query}
Full context:
{context}
Sentence:
{sentence}
Is this sentence useful in answering the query? Answer only "Yes" or "No".<end_of_turn>
<start_of_turn>model
'''
    inputs = tokenizer(prompt, return_tensors="pt").to(exit_model.device)

    with torch.no_grad():
        outputs = exit_model(**inputs)
        # encode() returns a list of ids; take the single token id for each answer
        yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
        no_id = tokenizer.encode("No", add_special_tokens=False)[0]
        # Compare only the "Yes" and "No" logits at the final position
        logits = outputs.logits[0, -1, [yes_id, no_id]]
        prob = torch.softmax(logits, dim=0)[0].item()

    return prob >= tau

# 5. Compress document
sentences = [sent.text.strip() for sent in nlp(context).sents]
compressed = [sent for sent in sentences if get_relevance(query, context, sent)]
compressed_text = " ".join(compressed)

print(f"Compressed text ({len(compressed)}/{len(sentences)} sentences):", compressed_text)
```

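In a RAG pipeline the retriever usually returns several documents, and each one can be compressed against its own context. The sketch below shows one way to organize this, assuming a scorer that returns P("Yes") rather than a boolean (for example, a variant of `get_relevance` that returns `prob`). `compress_documents`, `split_sentences`, and `toy_score` are hypothetical helpers; the regex splitter and the keyword scorer are stand-ins so the example runs without the model, and dropping documents with no surviving sentence is a design assumption.

```python
import re
from typing import Callable, List

# A scorer maps (query, context, sentence) to P(relevant) in [0, 1].
Scorer = Callable[[str, str, str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter used here as a stand-in for the spaCy "senter" pipe."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def compress_documents(query: str, docs: List[str], score: Scorer, tau: float = 0.5) -> List[str]:
    """Compress each document separately, keeping sentence order within it."""
    compressed = []
    for doc in docs:
        kept = [s for s in split_sentences(doc) if score(query, doc, s) >= tau]
        if kept:  # assumption: drop documents in which no sentence passes tau
            compressed.append(" ".join(kept))
    return compressed


def toy_score(query: str, context: str, sentence: str) -> float:
    """Toy scorer for illustration only: 1.0 if the sentence mentions "SSD"."""
    return 1.0 if "SSD" in sentence else 0.0


docs = [
    "SSDs reduce boot times. I bought my computer last week.",
    "The weather was nice. We went for a walk.",
]
print(compress_documents("How do SSDs improve performance?", docs, toy_score))
# prints ['SSDs reduce boot times.']
```

Passing the scorer as a parameter keeps the compression loop independent of how the model is loaded or batched.
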
## Training Data

The model was trained on the HotpotQA dataset using:
- Positive examples: Sentences marked as supporting facts
- Hard negatives: Sentences from the same documents that are not supporting facts
- Random negatives: Sentences from unrelated documents

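The three example types above can be derived from a single HotpotQA-style record. The sketch below is illustrative only and is not the authors' data pipeline: it assumes the layout of the original HotpotQA JSON release (`context` as a list of `[title, sentences]`, `supporting_facts` as a list of `[title, sentence_index]`), uses an invented toy record, and treats paragraphs without any supporting fact as the "unrelated documents". `label_sentences` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Toy record in the layout of the original HotpotQA JSON release.
# The content is invented for illustration.
record = {
    "question": "Which drive type has no moving parts?",
    "context": [
        ["SSD", ["SSDs use flash memory.", "They have no moving parts."]],
        ["Weather", ["It rained on Monday."]],
    ],
    "supporting_facts": [["SSD", 1]],
}


def label_sentences(rec: Dict) -> List[Tuple[str, str]]:
    """Label every sentence as positive, hard_negative, or random_negative."""
    gold = {(title, idx) for title, idx in rec["supporting_facts"]}
    gold_titles = {title for title, _ in gold}
    labeled = []
    for title, sentences in rec["context"]:
        for idx, sentence in enumerate(sentences):
            if (title, idx) in gold:
                label = "positive"
            elif title in gold_titles:
                label = "hard_negative"  # same document as a supporting fact
            else:
                # Assumption: paragraphs without supporting facts stand in for
                # "unrelated documents"; the authors' sampling may differ.
                label = "random_negative"
            labeled.append((label, sentence))
    return labeled


for label, sentence in label_sentences(record):
    print(f"{label:16s} {sentence}")
```
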
## Parameters

- Base model: `google/gemma-2b-it`
- Training method: LoRA (via PEFT)
- Recommended relevance threshold (`tau`): 0.5

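The threshold controls how aggressively a document is compressed: a sentence is kept when its P("Yes") is at least `tau`, so raising `tau` keeps fewer sentences and lowering it keeps more. The sketch below illustrates this with made-up probabilities, not model output; `select` is a hypothetical helper.

```python
from typing import List, Tuple


def select(sentences: List[str], probs: List[float], tau: float) -> Tuple[List[str], float]:
    """Return kept sentences (in original order) and the fraction kept."""
    kept = [s for s, p in zip(sentences, probs) if p >= tau]
    ratio = len(kept) / len(sentences) if sentences else 0.0
    return kept, ratio


# Hypothetical P("Yes") scores, made up for illustration (not model output).
sentences = ["s1", "s2", "s3", "s4"]
probs = [0.92, 0.55, 0.31, 0.08]

for tau in (0.3, 0.5, 0.7):
    kept, ratio = select(sentences, probs, tau)
    print(f"tau={tau}: kept {len(kept)}/{len(sentences)} ({ratio:.0%})")
```
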
## Limitations

- Currently optimized for English text only
- No support for cross-lingual compression

## Citation

```bibtex
@article{hwang2024exit,
  title={EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation},
  author={Hwang, Taeho and Cho, Sukmin and Jeong, Soyeong and Song, Hoyun and Han, SeungYoon and Park, Jong C.},
  journal={arXiv preprint arXiv:2412.12559},
  year={2024}
}
```