# Development Guidelines

## Build & Test Commands

```
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-test.txt

# Run linting
python -m ruff check .

# Run formatting
python -m ruff format .

# Type checking
python -m mypy .

# Run a specific test file
python -m pytest test_e2e.py -v

# Run a specific test function
python -m pytest test_e2e.py::test_end_to_end -v

# Deploy to Cloud Run
./deploy_rag.sh --project=YOUR_PROJECT_ID --region=YOUR_REGION

# Local development
python app.py
```

## Code Style

- **Line Length**: 100 characters max (defined in `pyproject.toml`)
- **Docstrings**: Google-style docstrings required (follow existing patterns)
- **Type Hints**: Required for all function parameters and return values
- **Imports**: Group standard library, third-party, then local imports, with a blank line between groups
- **Error Handling**: Use specific exception types (no bare `except:`) and log errors
- **Linters**: Ruff for linting (F, E, W, D, N, C, B, Q, A rules)
- **Naming**: snake_case for variables/functions, CamelCase for classes
- **Environment Variables**: Use `os.environ.get()` with defaults when appropriate
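
A minimal sketch of a function written to these conventions (Google-style docstring, full type hints, grouped imports, a specific exception logged before it propagates, and an environment variable with a default). The names `read_input_file` and `MAX_FILE_BYTES`, and the 10 MB default, are hypothetical illustrations rather than code from the repository.

```python
"""Example module following the project's code style conventions."""

import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical setting: read from the environment, with a default.
MAX_FILE_BYTES = int(os.environ.get("MAX_FILE_BYTES", "10000000"))


def read_input_file(path: Path, max_bytes: int = MAX_FILE_BYTES) -> str:
    """Read a UTF-8 text file that is queued for processing.

    Args:
        path: Location of the file to read.
        max_bytes: Largest file size, in bytes, that will be accepted.

    Returns:
        The decoded contents of the file.

    Raises:
        FileNotFoundError: If ``path`` does not exist.
        ValueError: If the file is larger than ``max_bytes``.
    """
    try:
        size = path.stat().st_size
    except FileNotFoundError:
        logger.error("Input file not found: %s", path)
        raise
    if size > max_bytes:
        logger.warning("Skipping %s: %d bytes exceeds limit of %d", path, size, max_bytes)
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes}")
    return path.read_text(encoding="utf-8")
```
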

## Architecture

- Flask web application for serving RAG queries
- Google Cloud services: BigQuery, Vertex AI, Document AI, Cloud Storage
- Cloud Functions triggered by Cloud Storage (GCS) events
- Cloud Run for serving the web application
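
The GCS-triggered functions only need to act on text and PDF uploads. Below is a minimal sketch of that filtering step, assuming the event payload carries the `bucket` and `name` fields that Cloud Storage object notifications include; the function name `object_uri_to_process` is hypothetical and the repository's actual handler may be structured differently.

```python
"""Sketch: decide whether a Cloud Storage object event should trigger processing."""

from typing import Any, Mapping, Optional

# File types the pipeline processes (text and PDF, per the project goal).
SUPPORTED_SUFFIXES = (".txt", ".pdf")


def object_uri_to_process(event_data: Mapping[str, Any]) -> Optional[str]:
    """Return the gs:// URI of a newly written object if it should be processed.

    Args:
        event_data: Payload of a Cloud Storage object event; assumed to carry
            the ``bucket`` and ``name`` fields.

    Returns:
        The object's ``gs://bucket/name`` URI, or None for unsupported objects.
    """
    bucket = event_data.get("bucket")
    name = event_data.get("name")
    if not bucket or not name:
        return None
    # Match the extension case-insensitively so "report.PDF" is accepted.
    if not name.lower().endswith(SUPPORTED_SUFFIXES):
        return None
    return f"gs://{bucket}/{name}"
```

For example, `object_uri_to_process({"bucket": "my-bucket", "name": "docs/report.PDF"})` yields `"gs://my-bucket/docs/report.PDF"`, while an image upload yields `None`.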

## Hugging Face Implementation Plan

### Repository Link

- GitHub: https://github.com/YOUR_USERNAME/cloud-rag-webhook

### Migration Steps

1. Create a new Hugging Face Space with the Docker SDK
2. Enable Dev Mode for VS Code access
3. Clone the GitHub repository
4. Set up environment variables for secrets
5. Configure persistent storage (20 GB purchased)
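
Hugging Face documents Space persistent storage as mounted at `/data`; files written elsewhere in the container do not survive a restart. A minimal sketch for resolving working directories on that disk follows. The `DATA_DIR` override (for local development) and the `input`/`processed` subfolder names are hypothetical choices, not part of the repository.

```python
"""Sketch: resolve working directories on the Space's persistent disk."""

import os
from pathlib import Path
from typing import Dict


def storage_dirs(create: bool = True) -> Dict[str, Path]:
    """Return the directories used for input and processed files.

    Persistent storage on a Space is mounted at /data; the DATA_DIR variable
    (hypothetical) allows pointing at another location for local development.

    Args:
        create: Whether to create the directories if they are missing.

    Returns:
        A mapping with "input" and "processed" directory paths.
    """
    root = Path(os.environ.get("DATA_DIR", "/data"))
    dirs = {"input": root / "input", "processed": root / "processed"}
    if create:
        for path in dirs.values():
            path.mkdir(parents=True, exist_ok=True)
    return dirs
```
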

### Running on Hugging Face

1. Configure the Space to stay running continuously (persistent execution)
2. Use "Secrets" in the Space settings for API keys and credentials
3. Set up scheduled tasks with GitHub Actions for:
   - Processing files (daily)
   - Backing up code (every 6 hours)
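
In GitHub Actions cron syntax (evaluated in UTC), these cadences could be written as `0 0 * * *` (daily at midnight) and `0 */6 * * *` (every 6 hours). Scheduled workflow runs can be delayed, so a job started by one may want to confirm that work is actually due before it begins. The sketch below is a hypothetical helper for that check; the job names and the idea of tracking last-run times are assumptions, not part of the repository.

```python
"""Sketch: check which scheduled jobs are due, using the cadences above."""

from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical job names mapped to their intervals. Equivalent GitHub Actions
# cron schedules (UTC): "0 0 * * *" (daily) and "0 */6 * * *" (every 6 hours).
JOB_INTERVALS: Dict[str, timedelta] = {
    "process_files": timedelta(days=1),
    "backup_code": timedelta(hours=6),
}


def due_jobs(last_runs: Dict[str, datetime], now: Optional[datetime] = None) -> List[str]:
    """List the jobs whose interval has elapsed since their last run.

    Args:
        last_runs: Timezone-aware time of each job's last successful run.
            Jobs missing from the mapping are treated as never run.
        now: Current time; defaults to the current UTC time.

    Returns:
        Names of the jobs that should run now, in definition order.
    """
    now = now or datetime.now(timezone.utc)
    due = []
    for job, interval in JOB_INTERVALS.items():
        last = last_runs.get(job)
        if last is None or now - last >= interval:
            due.append(job)
    return due
```
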

### Implementation Details

1. **File Storage**:
   - Store input files in Hugging Face's persistent storage
   - Use Hugging Face Datasets for managing processed data
2. **Process Automation**:
   - For "under the hood" processing:
     - Configure the Space to run continuously
     - Set up GitHub Actions for scheduled tasks
     - Use Docker health checks to verify that the service stays alive
3. **Deployment Architecture**:
   - A Hugging Face Space takes the place of Cloud Run
   - The Space runs the server continuously
   - Select the Space hardware in the Space settings (a Dockerfile has no autoscaling settings)
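
Docker's `HEALTHCHECK` instruction runs a command inside the container and treats exit status 0 as healthy and 1 as unhealthy. A minimal probe sketch using only the standard library follows. It assumes the app serves a `/health` route on port 7860 (the default app port for Docker Spaces); both the route and the `HEALTH_URL` override are hypothetical and would need a matching endpoint in `app.py`.

```python
"""Sketch: health-check probe that a Docker HEALTHCHECK instruction could run."""

import os
import urllib.request

# 7860 is the default app port for Docker Spaces; the /health route is hypothetical.
HEALTH_URL = os.environ.get("HEALTH_URL", "http://localhost:7860/health")


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if ``url`` answers with an HTTP 2xx status within ``timeout``.

    Args:
        url: Health endpoint to probe.
        timeout: Seconds to wait for a response.

    Returns:
        Whether the service responded successfully.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts all subclass OSError.
        return False


def main() -> int:
    """Return a Docker HEALTHCHECK exit status: 0 for healthy, 1 for unhealthy."""
    return 0 if is_healthy() else 1
```

Saved as a hypothetical `healthcheck.py`, it could be wired in with `HEALTHCHECK CMD python -c "import sys, healthcheck; sys.exit(healthcheck.main())"`.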

### Key Files

- `auto_process_bucket.py`: Batch file processor
- `process_text.py`: Individual file processor
- `rag_query.py`: Query interface
- `app.py`: Web application
- `auto_backup.sh`: GitHub backup script
- `setup_all.sh`: Complete setup script
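
The split between a batch processor and a per-file processor can be sketched as a loop that hands each new text/PDF file to a per-file callback and records what succeeded, so reruns skip finished files. This is a minimal sketch under stated assumptions: it scans a local directory (such as one on the Space's persistent disk) rather than a GCS bucket, and `process_pending`, the JSON manifest, and the `process_one` callback standing in for `process_text.py` are all hypothetical.

```python
"""Sketch: batch-process new text/PDF files, skipping ones already handled."""

import json
import logging
from pathlib import Path
from typing import Callable, List

logger = logging.getLogger(__name__)

SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def process_pending(
    input_dir: Path, manifest_path: Path, process_one: Callable[[Path], None]
) -> List[str]:
    """Run ``process_one`` on each supported file not yet listed in the manifest.

    Args:
        input_dir: Directory holding the files to process.
        manifest_path: JSON file recording the names of already-processed files.
        process_one: Per-file processor (stand-in for process_text.py).

    Returns:
        Names of the files processed successfully in this run.
    """
    done = set(json.loads(manifest_path.read_text())) if manifest_path.exists() else set()
    processed: List[str] = []
    for path in sorted(input_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        if path.name in done:
            continue
        try:
            process_one(path)
        except (OSError, ValueError):
            # Leave the file out of the manifest so the next run retries it.
            logger.exception("Failed to process %s", path)
            continue
        done.add(path.name)
        processed.append(path.name)
        # Record progress after each file so an interrupted run does not redo work.
        manifest_path.write_text(json.dumps(sorted(done)))
    return processed
```
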

### Required Environment Variables

- `GOOGLE_APPLICATION_CREDENTIALS`: Path to the Google Cloud credentials JSON file
- `PROJECT_ID`: Google Cloud project ID
- `BUCKET_NAME`: GCS bucket name
- `GITHUB_TOKEN`: GitHub access token
- `HF_TOKEN`: Hugging Face API access token
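
Because Space secrets reach the app as environment variables, a missing one tends to surface only when the code that needs it runs. A minimal sketch of a fail-fast check at startup follows; `load_settings` and `MissingConfigError` are hypothetical names, and whether every process needs all five variables (for example, `GITHUB_TOKEN` may matter only for backups) is an assumption, so a narrower `names` list can be passed where appropriate.

```python
"""Sketch: fail fast at startup if required environment variables are missing."""

import logging
import os
from typing import Dict, Mapping, Optional, Sequence

logger = logging.getLogger(__name__)

REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "PROJECT_ID",
    "BUCKET_NAME",
    "GITHUB_TOKEN",
    "HF_TOKEN",
)


class MissingConfigError(RuntimeError):
    """Raised when required environment variables are unset or empty."""


def load_settings(
    names: Sequence[str] = REQUIRED_VARS, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Read the required settings, raising if any are missing or empty.

    Args:
        names: Environment variable names that must be set.
        environ: Mapping to read from; defaults to ``os.environ``.

    Returns:
        A mapping from each variable name to its value.

    Raises:
        MissingConfigError: If one or more variables are unset or empty.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Log only the variable names, never the secret values.
        logger.error("Missing required environment variables: %s", ", ".join(missing))
        raise MissingConfigError(f"Missing required environment variables: {missing}")
    return {name: env[name] for name in names}
```
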

### Hugging Face Specific Updates

- Update the Dockerfile for Hugging Face compatibility
- Create the Space UI in `app.py` using Gradio or Streamlit
- Use the Hugging Face Datasets API in addition to BigQuery

## Project Goal

Create an automated RAG system that:

1. Automatically processes text/PDF files
2. Runs continuously "under the hood"
3. Provides a simple query interface
4. Backs up all code and data
5. Requires minimal maintenance