Ultronprime committed on Feb 27, 2025

Commit b1a2e15 (verified) · 1 Parent(s): de48d1c

Upload CLAUDE_HF.md with huggingface_hub

CLAUDE_HF.md +166 -0

	@@ -0,0 +1,166 @@

+# Hugging Face Implementation Plan
+## Overview
+This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.
+## Repository Links
+- GitHub: https://github.com/Daanworg/cloud-rag-webhook
+- Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook
+## Migration Strategy
+The key difference in our approach is to **replace all Google Cloud dependencies with Hugging Face models and tools**:
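+1. **Replace Google's DocumentAI** → Use Hugging Face OCR models (like `microsoft/trocr-base-printed`); layout models such as `microsoft/layoutlm-base-uncased` consume OCR output (words and bounding boxes) rather than producing it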
+2. **Replace Vertex AI** → Use Hugging Face embeddings models (like `sentence-transformers/all-MiniLM-L6-v2`)
+3. **Replace BigQuery** → Use FAISS/Chroma vector store with local storage or Hugging Face Datasets
+4. **Replace Cloud Storage** → Use Hugging Face's persistent storage
+5. **Replace Cloud Run** → Use Hugging Face Spaces continuous execution
+## Implementation Steps
+1. **Set Up New Architecture**:
+   - Create a revised Dockerfile for Hugging Face
+   - Set up persistent storage (20GB purchased)
+   - Configure inference on the A100 GPU (available to Pro users) using `accelerate`
+2. **Replace Text Processing Pipeline**:
+   - Create a new OCR module using Transformers document models
+   - Implement a chunking system using pure Python
+   - Add text cleaning and processing without DocumentAI
+3. **Replace Vector Database**:
+   - Implement FAISS/Chroma for vector storage
+   - Use Hugging Face Datasets for persistent indexed storage
+   - Create migration utility to move data from BigQuery
+4. **Replace Embedding System**:
+   - Use `sentence-transformers` models for embeddings
+   - Implement similarity search using FAISS/Chroma
+   - Create a compatible API to replace Vertex AI functions
+5. **Update Application Layer**:
+   - Modify Flask app to run on Hugging Face
+   - Update file handling to use local storage
+   - Create model caching for better performance
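The text cleaning step in the pipeline above (cleaning without DocumentAI) could look roughly like the following. This is a minimal sketch using only the standard library; `clean_text` is a hypothetical helper name, and the specific rules (NFKC normalization, re-joining words hyphenated across line breaks, collapsing whitespace) are assumptions about what the cleaning should cover, not something the plan specifies.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize raw extracted text before chunking (hypothetical helper)."""
    # Fold compatibility characters (ligatures, full-width forms) to plain ones.
    text = unicodedata.normalize("NFKC", raw)
    # Re-join words hyphenated across a line break: "implemen-\ntation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop control characters except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces/tabs, then limit consecutive blank lines to one.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()
```

One known limitation of this sketch: a genuinely hyphenated compound that happens to break at a line end (for example `self-` / `contained`) is also joined, so the rule may need tuning for the actual documents.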
+## Key Components
+1. **Text Processing**:
+```python
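+# New approach using Hugging Face models
+from transformers import AutoTokenizer
+from datasets import Dataset
+def chunk_text(text, tokenizer, max_tokens=256):
+    """Split text into chunks of at most max_tokens tokens, keeping the original characters."""
+    # Character offsets of each token (requires a fast tokenizer, the AutoTokenizer default)
+    offsets = tokenizer(text, add_special_tokens=False, return_offsets_mapping=True)["offset_mapping"]
+    return [
+        text[offsets[i][0]:offsets[min(i + max_tokens, len(offsets)) - 1][1]]
+        for i in range(0, len(offsets), max_tokens)
+    ]
+def process_text(text_content):
+    """Chunk text with a Hugging Face tokenizer and persist the chunks."""
+    # Only the tokenizer is needed here; the unused classification model is not loaded
+    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+    # Process and chunk the text
+    chunks = chunk_text(text_content, tokenizer)
+    # Store in persistent dataset (Spaces persistent storage is mounted at /data)
+    dataset = Dataset.from_dict({"text": chunks})
+    dataset.save_to_disk("/data/chunks")
+    return dataset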
+```
+2. **Vector Storage**:
+```python
+# New approach using FAISS
+import faiss
+import numpy as np
+from sentence_transformers import SentenceTransformer
+class FAISSVectorStore:
+    def __init__(self):
+        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
+        self.dimension = self.model.get_sentence_embedding_dimension()
+        # Exact (brute-force) L2 index; no training step required
+        self.index = faiss.IndexFlatL2(self.dimension)
+        self.texts = []
+    def add_texts(self, texts):
+        texts = list(texts)
+        if not texts:
+            return
+        embeddings = self.model.encode(texts)
+        self.index.add(np.asarray(embeddings, dtype=np.float32))
+        self.texts.extend(texts)
+    def search(self, query, k=5):
+        # FAISS pads results with index -1 when fewer than k vectors are stored,
+        # which would silently return self.texts[-1]; cap k to avoid that
+        k = min(k, self.index.ntotal)
+        if k == 0:
+            return []
+        query_embedding = self.model.encode([query])
+        distances, indices = self.index.search(
+            np.asarray(query_embedding, dtype=np.float32), k
+        )
+        return [self.texts[i] for i in indices[0]]
+```
+3. **Hugging Face Space Configuration** (YAML front matter of the Space's `README.md`):
+```yaml
+title: RAG Document Processing
+emoji: 📄
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+pinned: false
+models:
+  - sentence-transformers/all-MiniLM-L6-v2
+  - facebook/bart-large-cnn
+license: apache-2.0
+```
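To make the vector storage component above less of a black box, here is a small NumPy sketch of what an exact L2 index such as `faiss.IndexFlatL2` computes: the squared Euclidean distance from the query to every stored vector, followed by a top-k selection. The function name `l2_search` is hypothetical and this is an illustration only, not a replacement for FAISS.

```python
import numpy as np


def l2_search(vectors: np.ndarray, query: np.ndarray, k: int = 5):
    """Return (squared distances, indices) of the k nearest rows to query."""
    # Never ask for more neighbours than there are stored vectors.
    k = min(k, len(vectors))
    # Squared Euclidean distance from the query to every stored vector.
    dists = np.sum((vectors - query) ** 2, axis=1)
    # Indices of the k smallest distances, nearest first.
    order = np.argsort(dists, kind="stable")[:k]
    return dists[order], order
```

The cost grows linearly with the number of stored vectors, which is usually acceptable for small corpora; approximate index types trade some accuracy for speed on larger ones.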
+## Automation Plan
+1. **Background Processing**:
+   - Implement a file watcher for the persistent storage directory
+   - Process files automatically when added to upload directory
+   - Use Gradio/Streamlit for UI with background task system
+2. **Scheduled Tasks**:
+   - Use GitHub Actions scheduled workflows in the linked GitHub repository to trigger periodic tasks (GitHub Actions run on GitHub, not inside the Space)
+   - Run index maintenance tasks periodically
+   - Implement file processing queue for batch operations
+3. **GitHub Integration**:
+   - Push processed data to GitHub repository as backup
+   - Use GitHub to store model configuration
+   - Implement version control for processed data
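The background processing idea above (watch the upload directory, queue new files for processing) can be sketched with the standard library alone. This is a polling-based sketch under stated assumptions: `scan_new_files` and `watch` are hypothetical names, the polling interval is arbitrary, and `max_scans` exists only so the loop can be stopped in tests. A dedicated file-watching library could replace the polling loop.

```python
import os
import queue
import time


def scan_new_files(upload_dir, seen):
    """Return paths in upload_dir that were not seen in earlier scans."""
    new_paths = []
    with os.scandir(upload_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                new_paths.append(entry.path)
    return sorted(new_paths)


def watch(upload_dir, work_queue, interval=5.0, max_scans=None):
    """Poll upload_dir and enqueue each new file exactly once."""
    seen = set()
    scans = 0
    while max_scans is None or scans < max_scans:
        for path in scan_new_files(upload_dir, seen):
            work_queue.put(path)
        scans += 1
        # Sleep only if another scan will follow.
        if max_scans is None or scans < max_scans:
            time.sleep(interval)
```

A worker thread would then call `work_queue.get()` in a loop and hand each path to the processing pipeline. Note that a file may be picked up while it is still being written; checking that the file size is stable across two scans is one way to guard against that.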
+## Required Libraries
+```
+transformers==4.40.0
+datasets==2.17.1
+sentence-transformers==2.3.1
+faiss-cpu==1.7.4  # or faiss-gpu for CUDA support
+gradio==4.19.2
+streamlit==1.32.0
+langchain==0.1.5
+torch==2.1.2
+accelerate==0.28.0
+```
+## Hardware Requirements
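+- Use the ZeroGPU tier (A100) included with Hugging Face Pro; per the ZeroGPU documentation it is supported only for Gradio SDK Spaces, so it would not apply to the `sdk: docker` configuration shown earlier
+- Configure model inference for optimal performance on GPU
+- Set up model caching so models are loaded once and reused instead of being reloaded on every request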
+- Utilize Hugging Face's persistent storage (20GB)
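For the model caching item above, one simple approach is to memoize the loader so each model is loaded once per process. The sketch below uses `functools.lru_cache`; `get_model` and `_load_model` are hypothetical names, and the placeholder loader stands in for a real call such as `SentenceTransformer(name)`.

```python
from functools import lru_cache


def _load_model(name):
    """Placeholder loader (hypothetical); swap in e.g. SentenceTransformer(name)."""
    return {"name": name}


@lru_cache(maxsize=2)
def get_model(name):
    """Load a model on first use and reuse the same instance afterwards."""
    return _load_model(name)
```

The `maxsize` bound matters here: each cached model stays in memory, so the cache trades memory for load time, and the least recently used model is dropped from the cache when the limit is exceeded.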
+## Project Goals
+Create a fully self-contained RAG system on Hugging Face:
+1. Process text files automatically
+2. Generate embeddings with Hugging Face models
+3. Store vectors in FAISS/Chroma on persistent storage
+4. Query the data with a simple API
+5. Run continuously "under the hood"
+6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)
+## Implementation Files
+We'll create the following new files to implement the Hugging Face version:
+1. `hf_process_text.py` - Text processing with HF models
+2. `hf_embeddings.py` - Embedding generation with sentence-transformers
+3. `hf_vector_store.py` - FAISS/Chroma implementation
+4. `hf_app.py` - Gradio/Streamlit interface
+5. `hf_rag_query.py` - Query interface for HF models
+6. `requirements_hf.txt` - HF-specific dependencies
+This will allow us to maintain both implementations in parallel.