---
title: RAG with Binary Quantization
emoji: 🚀
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
# RAG with Binary Quantization
A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.
## 🚀 Features
- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for intelligent question answering
- **Batch Processing**: Efficient document processing with configurable batch sizes
## 🏗️ Architecture
```
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│    Documents     │─────▶│  BGE Embeddings  │─────▶│  Binary Vectors  │
└──────────────────┘      └──────────────────┘      └──────────────────┘
                                                             │
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│    User Query    │─────▶│  Query Embedding │─────▶│  Milvus Search   │
└──────────────────┘      └──────────────────┘      └──────────────────┘
                                                             │
┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│  Retrieved Docs  │◀─────│  Context Fusion  │◀─────│    LLM Answer    │
└──────────────────┘      └──────────────────┘      └──────────────────┘
```
## 🛠️ Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd rag-w-binary-quant
```
2. **Install dependencies**:
```bash
uv sync
```
3. **Set up environment variables**:
Create a `.env` file with your OpenAI API key:
```env
OPENAI_API_KEY=your_openai_api_key_here
```
## 📖 Usage
### Starting the Application
Run the Gradio web interface:
```bash
uv run app.py
```
The application will be available at `http://localhost:7860`.
### Using the Interface
1. **Upload Documents**:
- Go to the "Upload & Index" tab
- Upload your documents (supports multiple file formats)
- Click "Update Index" to process and index the documents
2. **Chat with Documents**:
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get intelligent answers based on the document content
## 🔧 Configuration
Key configuration parameters in `src/config.py`:
- `EMBEDDING_MODEL_NAME`: "BAAI/bge-large-en-v1.5"
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
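Based on the parameters listed above, `src/config.py` presumably looks something like the following minimal sketch (the actual file may define additional settings):

```python
# Illustrative sketch of src/config.py, reconstructed from the
# parameters documented above; the real file may differ.
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # embedding model
COLLECTION_NAME = "fast_rag"                     # Milvus collection
MILVUS_DB_PATH = "milvus_binary_quantized.db"    # local Milvus Lite file
MODEL_NAME = "gpt-4.1"                           # OpenAI chat model
TEMPERATURE = 0.2                                # low temperature for factual answers
```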
## 📊 Performance Benefits
- **Memory Efficiency**: Packed binary vectors use 32x less memory than float32 embeddings (1 bit vs. 32 bits per dimension)
- **Fast Search**: Hamming distance computation is highly optimized
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
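The memory figure is easy to verify: float32 spends 32 bits per dimension while a packed binary vector spends one, so a 1,024-dimensional BGE-large embedding shrinks from 4,096 bytes to 128. A quick NumPy sketch (not the project's code):

```python
import numpy as np

DIM = 1024  # output dimension of BAAI/bge-large-en-v1.5

float_vec = np.random.randn(DIM).astype(np.float32)
binary_vec = np.packbits(float_vec > 0)  # 1 bit per dimension, 8 dims per byte

print(float_vec.nbytes)                        # 4096 bytes
print(binary_vec.nbytes)                       # 128 bytes
print(float_vec.nbytes // binary_vec.nbytes)   # 32x reduction
```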
## 🗂️ Project Structure
```
rag-w-binary-quant/
├── app.py                     # Gradio web interface
├── main.py                    # Main application entry point
├── src/
│   ├── config.py              # Configuration settings
│   ├── data_loader.py         # Document loading utilities
│   ├── embedding_generator.py # Binary embedding generation
│   ├── vector_store.py        # Milvus vector database operations
│   └── rag_pipeline.py        # RAG question answering pipeline
├── documents/                 # Uploaded document storage
└── README.md
```
## 🔍 Technical Details
### Binary Quantization Process
1. **Float32 Embeddings**: Generate embeddings using BGE model
2. **Binary Conversion**: Convert to binary using a zero threshold (positive values → 1, others → 0)
3. **Packing**: Pack binary vectors into bytes for efficient storage
4. **Hamming Distance**: Use Hamming distance for similarity search
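The four steps above can be sketched in a few lines of NumPy (illustrative only; the repository's `embedding_generator.py` may differ in detail):

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold float embeddings at zero and pack each vector into bytes."""
    bits = (embeddings > 0).astype(np.uint8)  # positive -> 1, others -> 0
    return np.packbits(bits, axis=-1)         # pack 8 dimensions per byte

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary vectors."""
    # XOR sets a bit wherever the vectors disagree; count those bits.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

emb = np.random.randn(2, 1024).astype(np.float32)  # e.g. two BGE-large vectors
packed = binarize(emb)
print(packed.shape)                   # (2, 128): 1024 bits -> 128 bytes each
print(hamming(packed[0], packed[1]))  # number of differing bits, 0..1024
```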
### Vector Search
- **Index Type**: BIN_FLAT (exact search for binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents
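BIN_FLAT is an exhaustive scan over the packed vectors, so the equivalent exact top-k search can be sketched outside Milvus (a NumPy illustration of the search semantics, not the project's `vector_store.py`):

```python
import numpy as np

def bin_flat_search(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Exact top-k search over packed binary vectors by Hamming distance."""
    # XOR flags differing bits; unpack and sum to get per-document distances.
    dists = np.unpackbits(np.bitwise_xor(index, query), axis=1).sum(axis=1)
    return np.argsort(dists)[:k]  # indices of the k nearest documents

# 100 documents as packed 1024-bit vectors (128 bytes each)
docs = np.packbits(np.random.rand(100, 1024) > 0.5, axis=1)
query = docs[42]  # querying with a stored vector should return it first
print(bin_flat_search(query, docs, k=3)[0])  # 42
```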
## 🙏 Acknowledgments
- [BAAI](https://github.com/FlagOpen/FlagEmbedding) for the BGE embedding model
- [Milvus](https://milvus.io/) for the vector database
- [Gradio](https://gradio.app/) for the web interface
- [OpenAI](https://openai.com/) for the language model