serverdaun committed · Commit d71e190 · 1 Parent(s): 0827021

Add detailed README.md

Files changed (1): README.md (+125 −0)
# RAG with Binary Quantization

A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval.
## 🚀 Features

- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for intelligent question answering
- **Batch Processing**: Efficient document processing with configurable batch sizes
## 🏗️ Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Documents    │───▶│  BGE Embeddings  │───▶│ Binary Vectors  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│ Query Embedding  │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   LLM Answer    │◀───│  Context Fusion  │◀───│ Retrieved Docs  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
29
+
30
+ ## πŸ› οΈ Installation
31
+
32
+ 1. **Clone the repository**:
33
+ ```bash
34
+ git clone <repository-url>
35
+ cd rag-w-binary-quant
36
+ ```
37
+
38
+ 2. **Install dependencies**:
39
+ ```bash
40
+ uv sync
41
+ ```
42
+
43
+ 3. **Set up environment variables**:
44
+ Create a `.env` file with your OpenAI API key:
45
+ ```env
46
+ OPENAI_API_KEY=your_openai_api_key_here
47
+ ```
48
+
## 🚀 Usage

### Starting the Application

Run the Gradio web interface:
```bash
uv run app.py
```

The application will be available at `http://localhost:7860`.

### Using the Interface

1. **Upload Documents**:
   - Go to the "Upload & Index" tab
   - Upload your documents (multiple file formats are supported)
   - Click "Update Index" to process and index the documents

2. **Chat with Documents**:
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get answers grounded in the document content
## 🔧 Configuration

Key configuration parameters in `src/config.py`:

- `EMBEDDING_MODEL_NAME`: "BAAI/bge-large-en-v1.5"
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
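
Pulling those values together, `src/config.py` plausibly reduces to a handful of module-level constants. This is a sketch reconstructed from the list above, not the actual file:

```python
# Hypothetical reconstruction of src/config.py from the values in this
# README; only these five parameters are documented, so anything else
# the real file contains is omitted here.

EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"
COLLECTION_NAME = "fast_rag"
MILVUS_DB_PATH = "milvus_binary_quantized.db"
MODEL_NAME = "gpt-4.1"
TEMPERATURE = 0.2
```
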
## 📊 Performance Benefits

- **Memory Efficiency**: Packed binary vectors use 32x less memory than float32 embeddings (1 bit vs. 32 bits per dimension)
- **Fast Search**: Hamming distance reduces to XOR plus popcount, which is highly optimized on modern CPUs
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
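
The memory saving follows directly from the bit widths: float32 spends 32 bits per dimension, a packed binary vector spends one. A quick NumPy check (1024 is the output dimension of bge-large-en-v1.5):

```python
import numpy as np

dim = 1024  # bge-large-en-v1.5 embedding dimension
float_vec = np.random.randn(dim).astype(np.float32)

# Binarize at zero and pack 8 bits into each byte.
binary_vec = np.packbits(float_vec > 0)

float_bytes = float_vec.nbytes    # 1024 dims * 4 bytes = 4096 bytes
binary_bytes = binary_vec.nbytes  # 1024 bits / 8     =  128 bytes
print(float_bytes // binary_bytes)  # 32
```
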
## 🏛️ Project Structure

```
rag-w-binary-quant/
├── app.py                     # Gradio web interface
├── main.py                    # Main application entry point
├── src/
│   ├── config.py              # Configuration settings
│   ├── data_loader.py         # Document loading utilities
│   ├── embedding_generator.py # Binary embedding generation
│   ├── vector_store.py        # Milvus vector database operations
│   └── rag_pipeline.py        # RAG question answering pipeline
├── documents/                 # Uploaded document storage
└── README.md
```
## 🔍 Technical Details

### Binary Quantization Process

1. **Float32 Embeddings**: Generate embeddings with the BGE model
2. **Binary Conversion**: Threshold at zero (positive values → 1, non-positive → 0)
3. **Packing**: Pack each group of 8 bits into one byte for compact storage
4. **Hamming Distance**: Compare vectors by counting differing bits
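
The four steps can be sketched in a few lines of NumPy. This is an illustrative stand-in for the project's `embedding_generator.py`, whose exact code is not shown here:

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Steps 2-3: threshold float32 embeddings at zero, then pack
    each group of 8 bits into one byte (uint8) for compact storage."""
    bits = (embeddings > 0).astype(np.uint8)  # positive -> 1, else 0
    return np.packbits(bits, axis=-1)         # shape (..., dim // 8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Step 4: Hamming distance = number of differing bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

emb = np.random.randn(2, 1024).astype(np.float32)  # step 1 stand-in
packed = binarize(emb)
print(packed.shape)                 # (2, 128)
print(hamming(packed[0], packed[1]))
```
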
### Vector Search

- **Index Type**: BIN_FLAT (exact, exhaustive search over binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents
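
Conceptually, BIN_FLAT is an exhaustive scan over the packed vectors ranked by Hamming distance. Milvus implements this natively; the NumPy equivalent below is illustrative only, to make the retrieval step concrete:

```python
import numpy as np

def topk_hamming(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """BIN_FLAT-style exact search: XOR the packed query against every
    stored vector, count differing bits, return the k closest row ids."""
    dists = np.unpackbits(np.bitwise_xor(index, query), axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
# 100 documents, 128 packed bytes each (= 1024-bit vectors)
index = rng.integers(0, 256, size=(100, 128), dtype=np.uint8)
query = index[42].copy()           # query identical to document 42
print(topk_hamming(query, index))  # document 42 ranks first (distance 0)
```
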
## 🙏 Acknowledgments

- [BAAI](https://github.com/FlagOpen/FlagEmbedding) for the BGE embedding model
- [Milvus](https://milvus.io/) for the vector database
- [Gradio](https://gradio.app/) for the web interface
- [OpenAI](https://openai.com/) for the language model