# Timeout vs Memory Diagnostic Tools

## Overview

When working with heavy models in HF Spaces, you may encounter issues that could be caused by:

1. **Timeout**: The model takes too long to load (>5 minutes)
2. **Memory**: The system runs out of RAM
3. **Both**: A combination of the two

This toolkit helps you identify and fix the exact problem.
## Files Added

### 1. `diagnostic_tool.py`

**Purpose**: Identify whether the problem is a timeout or a memory issue

**Usage**:

```bash
python hf-spaces/diagnostic_tool.py
```

**What it does**:
- Monitors system memory in real time
- Tracks model loading time
- Detects the exact failure point
- Provides specific recommendations
**Output**:

```
MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B

INITIAL SYSTEM STATE:
- Available memory: 12.50 GB
- Used memory: 3.45 GB (21.6%)

⏳ Starting model loading (timeout: 300s)...

[1/2] Loading tokenizer...
✅ Tokenizer loaded in 2.31s

[2/2] Loading model...
✅ Model loaded in 45.67s

✅ LOADING SUCCESSFUL in 47.98s

💡 RECOMMENDATIONS
✅ Model loaded successfully.
```
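
For reference, the core pattern behind this diagnostic fits in a few lines. The sketch below is not the actual `diagnostic_tool.py` implementation, just the underlying idea; it assumes `psutil` is installed:

```python
# Minimal sketch of the timing + memory tracking pattern; not the actual
# diagnostic_tool.py implementation. Assumes `psutil` is installed.
import time

import psutil
from transformers import AutoModel, AutoTokenizer

def probe_model_load(model_name: str) -> None:
    # Snapshot system memory before loading anything.
    mem = psutil.virtual_memory()
    print(f"Available memory: {mem.available / 1e9:.2f} GB")
    print(f"Used memory: {mem.used / 1e9:.2f} GB ({mem.percent}%)")

    start = time.time()
    try:
        AutoTokenizer.from_pretrained(model_name)
        print(f"Tokenizer loaded in {time.time() - start:.2f}s")
        AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True)
        print(f"LOADING SUCCESSFUL in {time.time() - start:.2f}s")
    except MemoryError:
        # Note: on Linux a hard OOM may kill the process before this is
        # raised, which is why the real tool also monitors memory over time.
        print("MEMORY_ERROR: ran out of RAM while loading")

probe_model_load("meta-llama/Llama-3.2-1B")
```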
### 2. `config_optimized.py`

**Purpose**: Smart configuration based on model size

**Features**:
- Auto-detects the model size category (small/medium/large)
- Provides optimized timeout settings
- Recommends the appropriate HF Spaces tier
- Warns about memory issues before loading

**Usage**:

```python
import requests

from config_optimized import HFSpacesConfig, get_optimized_request_config

# Get the optimal timeout for a model
timeout = HFSpacesConfig.get_timeout_for_model("meta-llama/Llama-3.2-1B")

# Get a full request config
config = get_optimized_request_config("meta-llama/Llama-3.2-1B")
response = requests.post(url, json=payload, **config)

# Check whether the model is recommended for your tier
is_ok = HFSpacesConfig.is_model_recommended("meta-llama/Llama-3.2-1B", tier="free")
```
### 3. `DIAGNOSTIC_README.md`

**Purpose**: Complete guide with solutions

**Contents**:
- How to identify timeout vs memory issues
- Step-by-step solutions for each problem
- Model size comparison table
- Code examples for fixes
- Best practices

### 4. Improved Error Messages in `optipfair_frontend.py`

**What changed**:
- More informative timeout error messages
- Explicit memory error detection
- Actionable recommendations in errors
- All messages in English
**Example**:

```
❌ **Timeout Error:**
The request exceeded 5 minutes (300s).

**Possible causes:**
1. The model is very large and takes long to load
2. The server is processing many requests

**Solutions:**
• Use a smaller model (1B parameters)
• Wait and try again (the model may be caching)
• If it persists, run `diagnostic_tool.py` for more information
```
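
The pattern behind these messages is simple: catch the specific `requests` exceptions and map each one to an actionable message. A simplified sketch of that pattern (the real logic in `optipfair_frontend.py` is more detailed):

```python
# Simplified sketch of the error-classification pattern; the real logic in
# optipfair_frontend.py is more detailed.
import requests

TIMEOUT_S = 300

def post_with_diagnostics(url: str, payload: dict) -> str:
    try:
        response = requests.post(url, json=payload, timeout=TIMEOUT_S)
        response.raise_for_status()
        return response.text
    except requests.exceptions.Timeout:
        return (f"❌ Timeout Error: the request exceeded {TIMEOUT_S}s. "
                "Use a smaller model, retry, or run diagnostic_tool.py.")
    except requests.exceptions.HTTPError as exc:
        # A 5xx after a long wait on HF Spaces often means the backend
        # process was killed for running out of memory.
        return f"❌ Server error (possible out-of-memory): {exc}"
```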
## Quick Start Guide

### Step 1: Diagnose the Problem

```bash
cd hf-spaces
python diagnostic_tool.py
```

### Step 2: Read the Output

The tool will tell you:
- ✅ **Success**: the model loads fine
- ❌ **MEMORY_ERROR**: need more RAM or a smaller model
- ⏰ **TIMEOUT_ERROR**: need more time or a faster model
### Step 3: Apply the Solution

#### For TIMEOUT problems:

```python
import requests

# Option 1: Increase the timeout in optipfair_frontend.py
response = requests.post(
    url,
    json=payload,
    timeout=600,  # Change from 300 to 600 seconds
)

# Option 2: Use config_optimized.py
from config_optimized import get_optimized_request_config

config = get_optimized_request_config(model_name)
response = requests.post(url, json=payload, **config)
```
#### For MEMORY problems:

```python
# Option 1: Use a smaller model
AVAILABLE_MODELS = [
    "meta-llama/Llama-3.2-1B",       # ✅ Works on the free tier
    "oopere/pruned40-llama-3.2-1B",  # ✅ Works on the free tier
]

# Option 2: Use quantization (in the backend)
from transformers import AutoModel, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModel.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
)

# Option 3: Upgrade your HF Spaces tier
# Free: 16GB RAM → PRO: 32GB RAM → Enterprise: 64GB RAM
```
## Model Recommendations by Tier

### Free Tier (16GB RAM)

✅ **Recommended**:
- meta-llama/Llama-3.2-1B (~4 GB, ~30s load)
- oopere/pruned40-llama-3.2-1B (~4 GB, ~30s load)
- google/gemma-3-1b-pt (~4 GB, ~30s load)
- Qwen/Qwen3-1.7B (~6 GB, ~45s load)

⚠️ **May work with optimization**:
- meta-llama/Llama-3.2-3B (~12 GB, ~90s load)

❌ **Won't work**:
- meta-llama/Llama-3-8B (~32 GB)
- meta-llama/Llama-3-70B (~280 GB)

### PRO Tier (32GB RAM)

✅ **Additional models**:
- meta-llama/Llama-3.2-3B
- meta-llama/Llama-3-8B (with quantization)

### Enterprise Tier (64GB RAM)

✅ **Additional models**:
- meta-llama/Llama-3-8B (full precision)
- Larger models with quantization
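
These figures follow the usual back-of-the-envelope rule: full-precision (fp32) weights take about 4 bytes per parameter, which is where the ~4 GB for a 1B model and ~32 GB for an 8B model above come from. A quick sketch of that estimate (a rough guide only; real usage adds activations and framework overhead):

```python
# Rough RAM estimate behind the table above: bytes per parameter × parameter
# count. Real usage is higher (activations, framework overhead).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def estimate_ram_gb(params_billions: float, precision: str = "fp32") -> float:
    return params_billions * BYTES_PER_PARAM[precision]

print(estimate_ram_gb(1))          # ~4 GB  -> fits the free tier (16 GB)
print(estimate_ram_gb(8))          # ~32 GB -> too large for the free tier
print(estimate_ram_gb(8, "int8"))  # ~8 GB  -> feasible with quantization
```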
## Common Scenarios

### Scenario 1: "My model times out after 5 minutes"

**Diagnosis**: TIMEOUT_ERROR

**Solution**:
1. Check whether the model is too large for your tier
2. Increase the timeout to 600s (10 minutes)
3. Consider pre-loading models at startup (see the Advanced section below)

### Scenario 2: "The process crashes without a clear error"

**Diagnosis**: Likely MEMORY_ERROR (the out-of-memory killer terminates the process)

**Solution**:
1. Run `diagnostic_tool.py` to confirm
2. Use a smaller model (1B parameters)
3. Use int8 quantization
4. Upgrade to the PRO tier

### Scenario 3: "It sometimes works, sometimes doesn't"

**Diagnosis**: Memory pressure or concurrent requests

**Solution**:
1. Implement model caching
2. Add memory monitoring (a sketch follows this list)
3. Use a smaller default model
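
For step 2, a lightweight guard before each request can refuse work when free RAM is low, instead of letting the process get killed mid-load. A hypothetical sketch, assuming `psutil` and an illustrative threshold:

```python
# Hypothetical pre-request guard; assumes `psutil`. The threshold is
# illustrative and should be tuned to your model and tier.
import psutil

MIN_FREE_GB = 6

def has_memory_headroom() -> bool:
    free_gb = psutil.virtual_memory().available / 1e9
    if free_gb < MIN_FREE_GB:
        print(f"⚠️ Only {free_gb:.1f} GB free; rejecting request to avoid OOM")
        return False
    return True
```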
## Advanced: Pre-loading Models

To avoid a timeout on the first request, pre-load models at startup:

```python
# In hf-spaces/app.py
from transformers import AutoModel, AutoTokenizer

MODEL_CACHE = {}

def preload_models():
    """Pre-load common models at startup."""
    models = ["meta-llama/Llama-3.2-1B"]
    for model_name in models:
        try:
            print(f"Pre-loading {model_name}...")
            MODEL_CACHE[model_name] = {
                "model": AutoModel.from_pretrained(
                    model_name,
                    low_cpu_mem_usage=True,
                ),
                "tokenizer": AutoTokenizer.from_pretrained(model_name),
            }
            print(f"✅ {model_name} ready")
        except Exception as e:
            print(f"❌ Could not pre-load {model_name}: {e}")

def main():
    preload_models()  # Load models before starting services
    # ... rest of startup code
```
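
Request handlers can then read from the cache and fall back to an on-demand load for anything that was not pre-loaded. A sketch of that consumer side, continuing the snippet above (`get_model` is a hypothetical helper, not part of the files described here):

```python
# Hypothetical consumer of MODEL_CACHE; get_model is illustrative and not
# part of the files described above.
def get_model(model_name: str):
    if model_name not in MODEL_CACHE:
        # Cache miss: load on demand -- this is the slow path that
        # pre-loading is meant to avoid.
        MODEL_CACHE[model_name] = {
            "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
            "tokenizer": AutoTokenizer.from_pretrained(model_name),
        }
    entry = MODEL_CACHE[model_name]
    return entry["model"], entry["tokenizer"]
```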
## Support

If you still have issues after trying these solutions:

1. Check the full diagnostic output
2. Review the HF Spaces logs
3. Verify your HF Spaces tier and limits
4. Consider using a different model architecture
## Summary

| Issue | Symptom | Solution |
|-------|---------|----------|
| **Timeout** | Request > 5 min | Increase timeout, use cache |
| **Memory** | Process crashes or is killed | Smaller model, quantization, upgrade tier |
| **Both** | Slow + crashes | Smaller model + longer timeout |

All tools are designed to help you quickly identify and fix the exact problem without guessing.