	# Hugging Face Implementation Plan

	## Overview
	This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.

	## Repository Links
	- GitHub: https://github.com/Daanworg/cloud-rag-webhook
	- Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook

	## Migration Strategy
	The core of the approach is to replace every Google Cloud dependency with a Hugging Face model or tool:

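	1. Replace Google's DocumentAI → Use Hugging Face OCR models (like `microsoft/trocr-base-printed`); layout models such as `microsoft/layoutlm-base-uncased` consume OCR output (words and bounding boxes) rather than producing it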
	2. Replace Vertex AI → Use Hugging Face embeddings models (like `sentence-transformers/all-MiniLM-L6-v2`)
	3. Replace BigQuery → Use FAISS/Chroma vector store with local storage or Hugging Face Datasets
	4. Replace Cloud Storage → Use Hugging Face's persistent storage
	5. Replace Cloud Run → Use Hugging Face Spaces continuous execution

	## Implementation Steps

	1. Set Up New Architecture:
	- Create a revised Dockerfile for Hugging Face
	- Set up persistent storage (20GB purchased)
	- Configure A100 GPU using `accelerate` for pro users

	2. Replace Text Processing Pipeline:
	- Create a new OCR module using Transformers document models
	- Implement a chunking system using pure Python
	- Add text cleaning and processing without DocumentAI

	3. Replace Vector Database:
	- Implement FAISS/Chroma for vector storage
	- Use Hugging Face Datasets for persistent indexed storage
	- Create migration utility to move data from BigQuery

	4. Replace Embedding System:
	- Use `sentence-transformers` models for embeddings
	- Implement similarity search using FAISS/Chroma
	- Create a compatible API to replace Vertex AI functions

	5. Update Application Layer:
	- Modify Flask app to run on Hugging Face
	- Update file handling to use local storage
	- Create model caching for better performance
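Steps 4 and 5 above call for a compatible embedding API and for model caching. The sketch below shows one possible shape for both. The plan does not name the Vertex AI functions being replaced, so `Embedder`, `SentenceTransformerEmbedder`, and `EmbedderCache` are hypothetical names rather than part of the existing code base, and thread safety is not handled.

```python
from typing import Callable, Dict, List, Protocol, Sequence


class Embedder(Protocol):
    """Minimal interface the rest of the app would depend on (hypothetical)."""

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        ...


class SentenceTransformerEmbedder:
    """Embedder backed by a sentence-transformers model."""

    def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
        # Imported lazily so this module loads even when the library is absent
        from sentence_transformers import SentenceTransformer

        self._model = SentenceTransformer(model_name)

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # encode() returns a NumPy array by default; convert to plain lists
        return self._model.encode(list(texts)).tolist()


class EmbedderCache:
    """Create each embedder once per model name and reuse it afterwards."""

    def __init__(self, factory: Callable[[str], Embedder] = SentenceTransformerEmbedder):
        self._factory = factory
        self._instances: Dict[str, Embedder] = {}

    def get(self, model_name: str) -> Embedder:
        # Loading a model is slow, so only do it on the first request
        if model_name not in self._instances:
            self._instances[model_name] = self._factory(model_name)
        return self._instances[model_name]
```

Keeping the factory injectable means the Flask routes can be exercised with a lightweight fake embedder, without downloading model weights.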

	## Key Components

	1. Text Processing:
	```python
	# New approach using Hugging Face tooling
	from datasets import Dataset

	def chunk_text(text, chunk_size=1000, overlap=200):
	    """Split text into overlapping character windows (sizes are placeholders)."""
	    step = chunk_size - overlap
	    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

	def process_text(text_content):
	    """Chunk the text and store the chunks as a Hugging Face dataset."""
	    # Chunking needs no model; embeddings are computed by the vector store
	    chunks = chunk_text(text_content)

	    # Store in persistent dataset
	    dataset = Dataset.from_dict({"text": chunks})
	    dataset.save_to_disk("./data/chunks")

	    return dataset
	```
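The implementation steps also list text cleaning and a pure-Python chunking system, neither of which is spelled out. The following is a minimal sketch of both, using only the standard library. The function names `clean_text` and `chunk_paragraphs` and the 1000-character limit are assumptions for illustration. The hyphen rule is a heuristic: it also joins genuinely hyphenated words that happen to break at a line end.

```python
import re
import unicodedata
from typing import List


def clean_text(raw: str) -> str:
    """Normalize Unicode, rejoin hyphenated line breaks, and tidy whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # "implemen-\ntation" -> "implementation" (common in OCR/PDF output)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces/tabs, but keep newlines for paragraph detection
    text = re.sub(r"[ \t]+", " ", text)
    # Strip spaces around newlines, then limit consecutive blank lines to one
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def chunk_paragraphs(text: str, max_chars: int = 1000) -> List[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is split into fixed windows.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) > max_chars:
            # Flush what we have, then hard-split the oversized paragraph
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(
                paragraph[i:i + max_chars]
                for i in range(0, len(paragraph), max_chars)
            )
        elif current and len(current) + 2 + len(paragraph) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps related sentences together, which tends to give more coherent retrieval results than fixed character windows.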

	2. Vector Storage:
	```python
	# New approach using FAISS
	import faiss
	import numpy as np
	from sentence_transformers import SentenceTransformer

	class FAISSVectorStore:
	    def __init__(self):
	        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
	        self.dimension = self.model.get_sentence_embedding_dimension()
	        self.index = faiss.IndexFlatL2(self.dimension)
	        self.texts = []

	    def add_texts(self, texts):
	        if not texts:
	            return  # nothing to embed
	        embeddings = self.model.encode(texts)
	        self.index.add(np.array(embeddings, dtype=np.float32))
	        self.texts.extend(texts)

	    def search(self, query, k=5):
	        query_embedding = self.model.encode([query])[0]
	        distances, indices = self.index.search(
	            np.array([query_embedding], dtype=np.float32), k
	        )
	        # FAISS pads results with -1 when the index holds fewer than k vectors
	        return [self.texts[i] for i in indices[0] if i >= 0]
	```
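To populate a store like the one above with existing data, the migration utility mentioned in the implementation steps could start from a BigQuery export. The sketch below assumes the table was exported as newline-delimited JSON and that each row carries its content in a `text` field. The field name and both helper functions are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator, List


def iter_exported_rows(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a newline-delimited JSON export."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_exported_texts(path: Path, text_field: str = "text") -> List[str]:
    """Collect the text column, skipping rows where it is missing or empty."""
    texts = []
    for row in iter_exported_rows(path):
        value = row.get(text_field)
        if isinstance(value, str) and value.strip():
            texts.append(value)
    return texts
```

The resulting list can be passed to `FAISSVectorStore.add_texts`, which re-embeds the text. Re-embedding is needed because vectors produced by a different embedding model are not comparable with those from `all-MiniLM-L6-v2`.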

	3. Hugging Face Space Configuration (YAML front matter at the top of the Space's `README.md`):
	```yaml
	title: RAG Document Processing
	emoji: 📄
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 7860
	pinned: false
	models:
	- sentence-transformers/all-MiniLM-L6-v2
	- facebook/bart-large-cnn
	license: apache-2.0
	```

	## Automation Plan

	1. Background Processing:
	- Implement a file watcher for the persistent storage directory
	- Process files automatically when added to upload directory
	- Use Gradio/Streamlit for UI with background task system

	2. Scheduled Tasks:
	- Use scheduled GitHub Actions workflows in the GitHub repository to trigger periodic tasks for the Space
	- Run index maintenance tasks periodically
	- Implement file processing queue for batch operations

	3. GitHub Integration:
	- Push processed data to GitHub repository as backup
	- Use GitHub to store model configuration
	- Implement version control for processed data
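The file watcher and processing queue from the background-processing and scheduled-task items above could be sketched with the standard library alone. This is one possible arrangement, not the project's implementation: the function names, the polling approach, and the 5-second interval are assumptions, and a file that is still being written when it is first seen would be queued early.

```python
import queue
import threading
from pathlib import Path
from typing import Callable, List, Set


def scan_new_files(upload_dir: Path, seen: Set[Path]) -> List[Path]:
    """Return files in upload_dir not seen before, and mark them as seen."""
    new_files = sorted(
        p for p in upload_dir.iterdir() if p.is_file() and p not in seen
    )
    seen.update(new_files)
    return new_files


def watch_directory(upload_dir: Path, jobs: "queue.Queue[Path]",
                    stop: threading.Event, interval: float = 5.0) -> None:
    """Poll upload_dir and enqueue each new file until stop is set."""
    seen: Set[Path] = set()
    while not stop.is_set():
        for path in scan_new_files(upload_dir, seen):
            jobs.put(path)
        # wait() returns early if stop is set, so shutdown is prompt
        stop.wait(interval)


def process_jobs(jobs: "queue.Queue[Path]", handler: Callable[[Path], None],
                 stop: threading.Event) -> None:
    """Take files off the queue one at a time and pass them to handler."""
    while not stop.is_set():
        try:
            path = jobs.get(timeout=0.5)
        except queue.Empty:
            continue
        try:
            handler(path)
        finally:
            jobs.task_done()
```

Each loop would typically run in its own `threading.Thread(..., daemon=True)`, with the handler doing the chunking and indexing for one file.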

	## Required Libraries
	```
	transformers==4.40.0
	datasets==2.17.1
	sentence-transformers==2.3.1
	faiss-cpu==1.7.4 # or faiss-gpu for CUDA support
	gradio==4.19.2
	streamlit==1.32.0
	langchain==0.1.5
	torch==2.1.2
	accelerate==0.28.0
	```
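Because the versions above are pinned exactly, it can help to confirm that the running environment matches them. The sketch below is a hypothetical helper, not part of the plan: it parses `name==version` lines, including ones with trailing comments, and compares them with what is installed.

```python
from importlib import metadata
from typing import Dict, List, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Extract {package: version} from lines of the form `name==version`."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (package, pinned, installed) for every pin that is not satisfied."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

Running `find_mismatches(parse_pins(open("requirements_hf.txt").read()))` at startup would list any package whose installed version differs from its pin.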

	## Hardware Requirements
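	- Use the ZeroGPU hardware available to Hugging Face Pro accounts (A100); at the time of writing ZeroGPU requires the Gradio SDK, so it may not be usable with `sdk: docker`
	- Configure model inference for optimal performance on GPU
	- Set up model caching to avoid repeated downloads and shorten startup time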
	- Utilize Hugging Face's persistent storage (20GB)

	## Project Goals
	Create a fully self-contained RAG system on Hugging Face:
	1. Process text files automatically
	2. Generate embeddings with Hugging Face models
	3. Store vectors in FAISS/Chroma on persistent storage
	4. Query the data with a simple API
	5. Run continuously in the background
	6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)

	## Implementation Files
	We'll create the following new files to implement the Hugging Face version:

	1. `hf_process_text.py` - Text processing with HF models
	2. `hf_embeddings.py` - Embedding generation with sentence-transformers
	3. `hf_vector_store.py` - FAISS/Chroma implementation
	4. `hf_app.py` - Gradio/Streamlit interface
	5. `hf_rag_query.py` - Query interface for HF models
	6. `requirements_hf.txt` - HF-specific dependencies

	This will allow us to maintain both implementations in parallel.