---
title: Rag with Binary Quantization
emoji: πŸ“œ
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
![CD to HF Space](https://github.com/serverdaun/rag-w-binary-quant/actions/workflows/cd-hf.yml/badge.svg)
[![View on Hugging Face Spaces](https://img.shields.io/badge/Hugging%20Face-Spaces-blue?logo=huggingface)](https://huggingface.co/spaces/serverdaun/rag-w-binary-quant)

# RAG with Binary Quantization

A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.

## πŸš€ Features

- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for intelligent question answering
- **Batch Processing**: Efficient document processing with configurable batch sizes

## πŸ—οΈ Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Documents     │───▢│  BGE Embeddings  │───▢│ Binary Vectors  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   User Query    │───▢│  Query Embedding │───▢│  Milvus Search  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LLM Answer     │◀───│  Context Fusion  │◀───│  Retrieved Docs β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## πŸ› οΈ Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd rag-w-binary-quant
   ```

2. **Install dependencies**:
   ```bash
   uv sync
   ```

3. **Set up environment variables**:
   Create a `.env` file with your OpenAI API key:
   ```env
   OPENAI_API_KEY=your_openai_api_key_here
   ```

## πŸš€ Usage

### Starting the Application

Run the Gradio web interface:
```bash
uv run app.py
```

The application will be available at `http://localhost:7860`.

### Using the Interface

1. **Upload Documents**: 
   - Go to the "Upload & Index" tab
   - Upload your documents (supports multiple file formats)
   - Click "Update Index" to process and index the documents

2. **Chat with Documents**:
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get intelligent answers based on the document content

## πŸ”§ Configuration

Key configuration parameters in `src/config.py`:

- `EMBEDDING_MODEL_NAME`: BAAI/bge-large-en-v1.5
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
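Based on the values listed above, `src/config.py` can be sketched roughly as follows (a reconstruction from this README, not the file's verbatim contents):

```python
# src/config.py -- sketch reconstructed from the documented values
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # BGE embedding model
COLLECTION_NAME = "fast_rag"                     # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"    # local Milvus database file
MODEL_NAME = "gpt-4.1"                           # OpenAI chat model
TEMPERATURE = 0.2                                # low temperature for grounded answers
```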

## πŸ“Š Performance Benefits

- **Memory Efficiency**: Packed binary vectors use 32x less memory than float32 embeddings (1 bit per dimension instead of 32)
- **Fast Search**: Hamming distance computation is highly optimized
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
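As a quick sanity check on the memory claim, compare a 1,024-dimensional float32 embedding (the output size of bge-large-en-v1.5) with its bit-packed binary form:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal(1024).astype(np.float32)  # one float32 embedding

bits = (emb > 0).astype(np.uint8)  # sign-based binarization: one bit per dimension
packed = np.packbits(bits)         # pack 8 bits into each byte

print(emb.nbytes)                   # 4096 bytes (1024 dims * 4 bytes)
print(packed.nbytes)                # 128 bytes  (1024 bits / 8)
print(emb.nbytes // packed.nbytes)  # 32x reduction
```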

## πŸ›οΈ Project Structure

```
rag-w-binary-quant/
β”œβ”€β”€ app.py                 # Gradio web interface
β”œβ”€β”€ main.py               # Main application entry point
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ config.py         # Configuration settings
β”‚   β”œβ”€β”€ data_loader.py    # Document loading utilities
β”‚   β”œβ”€β”€ embedding_generator.py  # Binary embedding generation
β”‚   β”œβ”€β”€ vector_store.py   # Milvus vector database operations
β”‚   └── rag_pipeline.py   # RAG question answering pipeline
β”œβ”€β”€ documents/            # Uploaded document storage
└── README.md
```

## πŸ” Technical Details

### Binary Quantization Process

1. **Float32 Embeddings**: Generate embeddings using BGE model
2. **Binary Conversion**: Convert to binary by sign thresholding (values greater than zero β†’ 1, otherwise 0)
3. **Packing**: Pack binary vectors into bytes for efficient storage
4. **Hamming Distance**: Use Hamming distance for similarity search
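The four steps above can be sketched with NumPy (a minimal illustration of the technique, not the project's actual `embedding_generator.py`):

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Steps 2-3: sign-threshold float32 embeddings and pack bits into bytes."""
    bits = (embeddings > 0).astype(np.uint8)  # values > 0 -> 1, otherwise 0
    return np.packbits(bits, axis=-1)         # 8 dimensions per byte

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Step 4: Hamming distance = number of differing bits between packed vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Step 1 stands in for the BGE model here: two random 1024-dim "embeddings"
emb = np.random.default_rng(0).standard_normal((2, 1024)).astype(np.float32)
packed = binarize(emb)
print(packed.shape)  # (2, 128)
print(hamming(packed[0], packed[1]))
```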

### Vector Search

- **Index Type**: BIN_FLAT (exact search for binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents
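BIN_FLAT is an exhaustive, exact scan; conceptually it does the following (a NumPy sketch of the retrieval step, standing in for the actual Milvus search call):

```python
import numpy as np

def topk_hamming(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Exact top-k search over packed binary vectors, like BIN_FLAT with HAMMING."""
    # XOR highlights differing bits; unpack and sum to count them per row
    dists = np.unpackbits(np.bitwise_xor(index, query), axis=1).sum(axis=1)
    return np.argsort(dists)[:k]  # indices of the k closest documents

rng = np.random.default_rng(0)
docs = rng.integers(0, 256, size=(100, 128), dtype=np.uint8)  # 100 packed vectors
query = docs[42].copy()           # query identical to document 42
print(topk_hamming(query, docs))  # document 42 should rank first (distance 0)
```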

## πŸ™ Acknowledgments

- [BAAI](https://github.com/FlagOpen/FlagEmbedding) for the BGE embedding model
- [Milvus](https://milvus.io/) for the vector database
- [Gradio](https://gradio.app/) for the web interface
- [OpenAI](https://openai.com/) for the language model