| # Hugging Face Implementation Plan | |
| ## Overview | |
| This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option. | |
| ## Repository Links | |
| - GitHub: https://github.com/Daanworg/cloud-rag-webhook | |
| - Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook | |
| ## Migration Strategy | |
| The key difference in our approach is to **replace all Google Cloud dependencies with Hugging Face models and tools**: | |
| 1. **Replace Google's DocumentAI** → Use Hugging Face document models (e.g. `microsoft/trocr-base-printed` for OCR; `microsoft/layoutlm-base-uncased` for layout-aware understanding of text that has already been OCR'd) | |
| 2. **Replace Vertex AI** → Use Hugging Face embedding models (like `sentence-transformers/all-MiniLM-L6-v2`) | |
| 3. **Replace BigQuery** → Use FAISS/Chroma vector store with local storage or Hugging Face Datasets | |
| 4. **Replace Cloud Storage** → Use Hugging Face's persistent storage | |
| 5. **Replace Cloud Run** → Use Hugging Face Spaces continuous execution | |
| ## Implementation Steps | |
| 1. **Set Up New Architecture**: | |
| - Create a revised Dockerfile for Hugging Face | |
| - Set up persistent storage (20GB purchased) | |
| - Configure the A100 GPU (available to Pro users) using `accelerate` | |
| 2. **Replace Text Processing Pipeline**: | |
| - Create a new OCR module using Transformers document models | |
| - Implement a chunking system using pure Python | |
| - Add text cleaning and processing without DocumentAI | |
| 3. **Replace Vector Database**: | |
| - Implement FAISS/Chroma for vector storage | |
| - Use Hugging Face Datasets for persistent indexed storage | |
| - Create migration utility to move data from BigQuery | |
| 4. **Replace Embedding System**: | |
| - Use `sentence-transformers` models for embeddings | |
| - Implement similarity search using FAISS/Chroma | |
| - Create a compatible API to replace Vertex AI functions | |
| 5. **Update Application Layer**: | |
| - Modify Flask app to run on Hugging Face | |
| - Update file handling to use local storage | |
| - Create model caching for better performance | |
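The pure-Python chunking step in item 2 could be sketched as a sliding window over words. This is a minimal sketch rather than the project's implementation: the name `chunk_words`, the word-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split `text` into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context around a
    chunk boundary is not lost entirely.
    (Hypothetical helper; the default sizes are illustrative only.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Splitting on whitespace keeps the sketch dependency-free. A token-based window (using the embedding model's tokenizer) would track the model's input limit more closely, at the cost of loading a tokenizer.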
| ## Key Components | |
| 1. **Text Processing**: | |
| ```python | |
| # New approach using Hugging Face libraries | |
| from datasets import Dataset | |
| def chunk_text(text_content): | |
|     """Placeholder chunker: split on blank lines (swap in the real chunking logic).""" | |
|     return [p.strip() for p in text_content.split("\n\n") if p.strip()] | |
| def process_text(text_content, output_dir="./data/chunks"): | |
|     """Chunk the text and persist the chunks as a Hugging Face dataset.""" | |
|     # The unused tokenizer and classification model were dropped: chunking needs neither | |
|     # Process and chunk the text | |
|     chunks = chunk_text(text_content) | |
|     # Store in persistent dataset | |
|     # (on a Space, point output_dir at the persistent volume, which is mounted at /data) | |
|     dataset = Dataset.from_dict({"text": chunks}) | |
|     dataset.save_to_disk(output_dir) | |
|     return dataset | |
| ``` | |
| 2. **Vector Storage**: | |
| ```python | |
| # New approach using FAISS | |
| import faiss | |
| import numpy as np | |
| from sentence_transformers import SentenceTransformer | |
| class FAISSVectorStore: | |
|     def __init__(self): | |
|         self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2') | |
|         self.dimension = self.model.get_sentence_embedding_dimension() | |
|         # Exact (brute-force) index ranked by squared L2 distance | |
|         self.index = faiss.IndexFlatL2(self.dimension) | |
|         self.texts = [] | |
|     def add_texts(self, texts): | |
|         embeddings = self.model.encode(texts) | |
|         self.index.add(np.array(embeddings, dtype=np.float32)) | |
|         self.texts.extend(texts) | |
|     def search(self, query, k=5): | |
|         query_embedding = self.model.encode([query])[0] | |
|         distances, indices = self.index.search( | |
|             np.array([query_embedding], dtype=np.float32), k | |
|         ) | |
|         # FAISS pads the result with -1 when fewer than k vectors are indexed | |
|         return [self.texts[i] for i in indices[0] if i != -1] | |
| ``` | |
| 3. **Hugging Face Space Configuration**: | |
| ```yaml | |
| title: RAG Document Processing | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| models: | |
| - sentence-transformers/all-MiniLM-L6-v2 | |
| - facebook/bart-large-cnn | |
| license: apache-2.0 | |
| ``` | |
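When unit-testing the vector store from item 2 without loading FAISS or a model, a tiny reference for what an exact L2 index returns can be useful. The sketch below is a plain-Python stand-in, not FAISS itself, and `l2_search` is a hypothetical name. Like `faiss.IndexFlatL2`, it ranks candidates by squared Euclidean distance.

```python
import heapq

def l2_search(vectors, query, k=5):
    """Return up to k (squared_distance, index) pairs, nearest first.

    Brute-force reference for small test fixtures only; ties are broken
    by the lower index. (Hypothetical helper, not part of FAISS.)
    """
    for vec in vectors:
        if len(vec) != len(query):
            raise ValueError("all vectors must have the query's dimension")
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), i)
        for i, vec in enumerate(vectors)
    )
    return heapq.nsmallest(k, scored)
```

Unlike FAISS, this returns fewer than `k` results when fewer vectors exist instead of padding with `-1`, so a test comparing the two should filter the padding first.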
| ## Automation Plan | |
| 1. **Background Processing**: | |
| - Implement a file watcher for the persistent storage directory | |
| - Process files automatically when added to upload directory | |
| - Use Gradio/Streamlit for UI with background task system | |
| 2. **Scheduled Tasks**: | |
| - Use scheduled GitHub Actions workflows in the linked GitHub repository for scheduling (Spaces do not run GitHub Actions themselves) | |
| - Run index maintenance tasks periodically | |
| - Implement file processing queue for batch operations | |
| 3. **GitHub Integration**: | |
| - Push processed data to GitHub repository as backup | |
| - Use GitHub to store model configuration | |
| - Implement version control for processed data | |
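The file watcher and processing queue described under Background Processing and Scheduled Tasks could be sketched with the standard library alone. This is an illustrative sketch under assumptions: the class name `UploadWatcher`, the polling approach, and the use of modification time to detect changes are not taken from the project.

```python
import os
import queue

class UploadWatcher:
    """Polls a directory and queues files that are new or have changed."""

    def __init__(self, upload_dir):
        self.upload_dir = upload_dir
        self._seen = {}               # path -> last observed mtime (ns)
        self.pending = queue.Queue()  # paths waiting to be processed

    def poll(self):
        """Scan the directory once and enqueue new or modified files.

        Returns the number of files enqueued by this scan.
        """
        enqueued = 0
        with os.scandir(self.upload_dir) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if not entry.is_file():
                    continue  # subdirectories are ignored in this sketch
                mtime = entry.stat().st_mtime_ns
                if self._seen.get(entry.path) != mtime:
                    self._seen[entry.path] = mtime
                    self.pending.put(entry.path)
                    enqueued += 1
        return enqueued
```

A background thread could call `poll()` every few seconds while a worker drains `pending` in batches. Note that a file still being written may be enqueued once early and again after its modification time changes, so the processing step should tolerate seeing the same path twice.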
| ## Required Libraries | |
| ``` | |
| transformers==4.40.0 | |
| datasets==2.17.1 | |
| sentence-transformers==2.3.1 | |
| faiss-cpu==1.7.4 # or faiss-gpu for CUDA support | |
| gradio==4.19.2 | |
| streamlit==1.32.0 | |
| langchain==0.1.5 | |
| torch==2.1.2 | |
| accelerate==0.28.0 | |
| ``` | |
| ## Hardware Requirements | |
| - Use Hugging Face Pro's free A100 tier (ZeroGPU) | |
| - Configure model inference for optimal performance on GPU | |
| - Set up model caching so each model is loaded once and reused across requests | |
| - Utilize Hugging Face's persistent storage (20GB) | |
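Model caching can be as simple as a process-wide registry that loads each model once. The sketch below is one possible shape, with assumptions stated: `ModelCache` is a hypothetical name, and the loader is passed in so the cache itself does not depend on any particular model library.

```python
import threading

class ModelCache:
    """Loads each named model at most once and reuses the instance."""

    def __init__(self, loader):
        self._loader = loader          # callable: model name -> model object
        self._models = {}              # model name -> loaded model
        self._lock = threading.Lock()  # guards against concurrent double loads

    def get(self, name):
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loader(name)
            return self._models[name]
```

With `sentence-transformers` installed, `ModelCache(SentenceTransformer).get("sentence-transformers/all-MiniLM-L6-v2")` would return the same model object on every call. Holding the lock during loading keeps the sketch simple, but it also means a slow load blocks requests for other models.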
| ## Project Goals | |
| Create a fully self-contained RAG system on Hugging Face: | |
| 1. Process text files automatically | |
| 2. Generate embeddings with Hugging Face models | |
| 3. Store vectors in FAISS/Chroma on persistent storage | |
| 4. Query the data with a simple API | |
| 5. Run continuously in the background | |
| 6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage) | |
| ## Implementation Files | |
| We'll create the following new files to implement the Hugging Face version: | |
| 1. `hf_process_text.py` - Text processing with HF models | |
| 2. `hf_embeddings.py` - Embedding generation with sentence-transformers | |
| 3. `hf_vector_store.py` - FAISS/Chroma implementation | |
| 4. `hf_app.py` - Gradio/Streamlit interface | |
| 5. `hf_rag_query.py` - Query interface for HF models | |
| 6. `requirements_hf.txt` - HF-specific dependencies | |
| This will allow us to maintain both implementations in parallel. |