---
title: Rag with Binary Quantization
emoji: πŸ“œ
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
![CD to HF Space](https://github.com/serverdaun/rag-w-binary-quant/actions/workflows/cd-hf.yml/badge.svg)
[![View on Hugging Face Spaces](https://img.shields.io/badge/Hugging%20Face-Spaces-blue?logo=huggingface)](https://huggingface.co/spaces/serverdaun/rag-w-binary-quant)
# RAG with Binary Quantization
A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.
## πŸš€ Features
- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for intelligent question answering
- **Batch Processing**: Efficient document processing with configurable batch sizes
## πŸ—οΈ Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Documents    │───▶│  BGE Embeddings  │───▶│  Binary Vectors │
└─────────────────┘    └──────────────────┘    └────────┬────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌────────▼────────┐
│   User Query    │───▶│ Query Embedding  │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └────────┬────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌────────▼────────┐
│   LLM Answer    │◀───│  Context Fusion  │◀───│ Retrieved Docs  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
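In code, the query path above condenses to a few steps. Below is an illustrative sketch (not the project's actual `src/rag_pipeline.py`); `search_milvus` is a hypothetical stand-in for the vector-store call:
```python
# Illustrative query path: embed -> binarize -> Hamming search -> LLM answer.
# Assumes sentence-transformers, numpy, and the OpenAI SDK are installed;
# `search_milvus` is a hypothetical stand-in for the project's vector store.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, search_milvus) -> str:
    vec = model.encode(question)                      # float32 query embedding
    packed = np.packbits((vec > 0).astype(np.uint8))  # binarize and pack bits
    docs = search_milvus(packed.tobytes(), top_k=5)   # Hamming-distance search
    context = "\n\n".join(docs)                       # fuse retrieved chunks
    resp = llm.chat.completions.create(
        model="gpt-4.1",
        temperature=0.2,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```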
## πŸ› οΈ Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd rag-w-binary-quant
```
2. **Install dependencies** (requires [uv](https://github.com/astral-sh/uv)):
```bash
uv sync
```
3. **Set up environment variables**:
Create a `.env` file with your OpenAI API key (a quick check that it loads is shown below):
```env
OPENAI_API_KEY=your_openai_api_key_here
```
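If the app complains about a missing key, here is a quick sanity check (assuming the project reads `.env` via `python-dotenv`, which is common but not confirmed here):
```python
# Verify the .env file is picked up before launching the app
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is missing or empty"
print("OpenAI key loaded.")
```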
## πŸš€ Usage
### Starting the Application
Run the Gradio web interface:
```bash
uv run app.py
```
The application will be available at `http://localhost:7860`.
### Using the Interface
1. **Upload Documents**:
- Go to the "Upload & Index" tab
- Upload your documents (supports multiple file formats)
- Click "Update Index" to process and index the documents
2. **Chat with Documents**:
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get intelligent answers based on the document content
## πŸ”§ Configuration
Key configuration parameters in `src/config.py` (sketched below):
- `EMBEDDING_MODEL_NAME`: BAAI/bge-large-en-v1.5
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
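These map to plain module-level constants; an illustrative sketch of `src/config.py` (the actual file may define more):
```python
# src/config.py -- illustrative sketch of the parameters listed above
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # 1024-dim embeddings
COLLECTION_NAME = "fast_rag"                     # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"    # Milvus Lite local file
MODEL_NAME = "gpt-4.1"                           # OpenAI chat model
TEMPERATURE = 0.2                                # low temperature for grounded answers
```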
## πŸ“Š Performance Benefits
- **Memory Efficiency**: Binary vectors use 32x less memory than float32 embeddings (1 bit vs. 32 bits per dimension; see the arithmetic below)
- **Fast Search**: Hamming distance computation is highly optimized
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
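The memory figure is simple arithmetic for the 1024-dimensional BGE embeddings used here:
```python
# float32 vs. packed binary storage per 1024-dim vector
dim = 1024
float32_bytes = dim * 4   # 32 bits per dimension -> 4096 bytes
binary_bytes = dim // 8   # 1 bit per dimension   -> 128 bytes
print(float32_bytes // binary_bytes)  # 32
```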
## πŸ›οΈ Project Structure
```
rag-w-binary-quant/
β”œβ”€β”€ app.py # Gradio web interface
β”œβ”€β”€ main.py # Main application entry point
β”œβ”€β”€ src/
β”‚ β”œβ”€β”€ config.py # Configuration settings
β”‚ β”œβ”€β”€ data_loader.py # Document loading utilities
β”‚ β”œβ”€β”€ embedding_generator.py # Binary embedding generation
β”‚ β”œβ”€β”€ vector_store.py # Milvus vector database operations
β”‚ └── rag_pipeline.py # RAG question answering pipeline
β”œβ”€β”€ documents/ # Uploaded document storage
└── README.md
```
## πŸ” Technical Details
### Binary Quantization Process
1. **Float32 Embeddings**: Generate embeddings with the BGE model
2. **Binary Conversion**: Threshold the floats at zero (values > 0 → 1, otherwise 0)
3. **Packing**: Pack the binary vectors into bytes for efficient storage
4. **Hamming Distance**: Use Hamming distance for similarity search (see the sketch below)
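A minimal NumPy sketch of steps 2–4 (illustrative; the project's `src/embedding_generator.py` may differ in details):
```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold float32 embeddings at zero and pack the bits into bytes.

    (n, 1024) float32 in -> (n, 128) uint8 out, i.e. 32x smaller.
    """
    bits = (embeddings > 0).astype(np.uint8)  # values > 0 -> 1, otherwise 0
    return np.packbits(bits, axis=-1)         # pack 8 bits into each byte

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between packed vectors: XOR, then count set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```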
### Vector Search
- **Index Type**: BIN_FLAT (exact search for binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents (a minimal sketch follows)
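A minimal pymilvus `MilvusClient` sketch of this setup (assumed usage, not the project's actual `src/vector_store.py`; field names are illustrative):
```python
import numpy as np
from pymilvus import DataType, MilvusClient

client = MilvusClient("milvus_binary_quantized.db")  # Milvus Lite local file

# Schema with a 1024-bit binary vector field
schema = MilvusClient.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.BINARY_VECTOR, dim=1024)
schema.add_field("text", DataType.VARCHAR, max_length=65535)

# BIN_FLAT performs exact (brute-force) search over binary vectors
index_params = client.prepare_index_params()
index_params.add_index(field_name="embedding", index_type="BIN_FLAT",
                       metric_type="HAMMING")
client.create_collection("fast_rag", schema=schema, index_params=index_params)

# Search with a packed 128-byte query; lower Hamming distance = more similar
query = np.packbits(np.random.rand(1024) > 0.5).tobytes()
hits = client.search(collection_name="fast_rag", data=[query], limit=5,
                     search_params={"metric_type": "HAMMING"},
                     output_fields=["text"])
```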
## πŸ™ Acknowledgments
- [BAAI](https://github.com/FlagOpen/FlagEmbedding) for the BGE embedding model
- [Milvus](https://milvus.io/) for the vector database
- [Gradio](https://gradio.app/) for the web interface
- [OpenAI](https://openai.com/) for the language model