Spaces:
Running
Running
File size: 5,138 Bytes
b1f0789 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 |
# π Diagnostic Guide: Timeout vs Memory
## How to identify the problem?
### 1οΈβ£ Run the diagnostic tool
In your HF Space, execute:
```bash
python hf-spaces/diagnostic_tool.py
```
This tool will tell you **exactly** if the problem is:
- β **MEMORY_ERROR**: The system ran out of RAM
- β° **TIMEOUT_ERROR**: The operation took too long
- β **OTHER_ERROR**: Another type of problem
### 2οΈβ£ Interpret the results
#### If you see "MEMORY_ERROR":
```
β PROBLEM DETECTED: OUT OF MEMORY
Memory used at failure: 15.8 GB (98.5%)
```
**Cause**: The model is too large for the available memory in HF Spaces.
**Solutions**:
1. **Use smaller models** (1B-1.7B parameters)
2. **Upgrade to HF Spaces PRO** (more RAM available)
3. **Use int8 quantization** (reduces memory usage ~50%)
4. **Load models with `low_cpu_mem_usage=True`**
#### If you see "TIMEOUT_ERROR":
```
β° TIMEOUT ERROR after 298.5s
Memory used: 8.2 GB (51.2%)
```
**Cause**: The model takes too long to load, but there is available memory.
**Solutions**:
1. **Increase timeout** from 300s to 600s or 900s
2. **Cache pre-loaded models** at startup
3. **Use faster models**
## π οΈ Implemented Solutions
### Solution 1: Increase Timeout (Easy)
Edit `hf-spaces/optipfair_frontend.py`:
```python
# Change from:
response = requests.post(url, json=payload, timeout=300)
# To:
response = requests.post(url, json=payload, timeout=600) # 10 minutes
```
### Solution 2: Use Quantization (For memory issues)
Edit model loading code in the backend:
```python
from transformers import AutoModel, BitsAndBytesConfig
# Configure int8 quantization (reduces memory usage ~50%)
quantization_config = BitsAndBytesConfig(
load_in_8bit=True,
llm_int8_threshold=6.0,
)
model = AutoModel.from_pretrained(
model_name,
quantization_config=quantization_config,
device_map="auto",
low_cpu_mem_usage=True,
)
```
### Solution 3: Model Cache (For timeout)
Pre-load models at startup in `hf-spaces/app.py`:
```python
from transformers import AutoModel, AutoTokenizer
import logging
logger = logging.getLogger(__name__)
# Global model cache
MODEL_CACHE = {}
def preload_models():
"""Pre-load common models at startup"""
common_models = [
"meta-llama/Llama-3.2-1B",
"oopere/pruned40-llama-3.2-1B",
]
logger.info("π Pre-loading common models...")
for model_name in common_models:
try:
logger.info(f" Loading {model_name}...")
MODEL_CACHE[model_name] = {
"model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
"tokenizer": AutoTokenizer.from_pretrained(model_name)
}
logger.info(f" β {model_name} loaded")
except Exception as e:
logger.warning(f" β Could not pre-load {model_name}: {e}")
logger.info("β
Pre-loading complete")
def main():
# Pre-load models before starting services
preload_models()
# Rest of the code...
fastapi_thread = threading.Thread(target=run_fastapi, daemon=True)
fastapi_thread.start()
# ...
```
### Solution 4: Improved Error Messages
Better error messages are already included to help you identify the problem:
```python
except requests.exceptions.Timeout:
return (
None,
"β **Timeout Error:**\nThe model took too long to load (>5min). "
"This is normal with large models. Options:\n"
"1. Try with a smaller model\n"
"2. Wait and try again (model may be caching)\n"
"3. Contact admin to increase timeout",
""
)
except MemoryError:
return (
None,
"β **Memory Error:**\nNot enough RAM for this model. Options:\n"
"1. Use a smaller model (1B parameters)\n"
"2. Model requires more memory than available in HF Spaces",
""
)
```
## π Model Size Comparison
| Model | Parameters | RAM Needed* | Load Time** |
|--------|-----------|----------------|----------------|
| Llama-3.2-1B | 1B | ~4 GB | ~30s |
| Llama-3.2-3B | 3B | ~12 GB | ~90s |
| Llama-3-8B | 8B | ~32 GB | ~240s |
| Llama-3-70B | 70B | ~280 GB | ~600s+ |
*Without quantization, FP32
**On typical HF Spaces hardware
## π― Recommended Action Plan
1. **Run the diagnostic**:
```bash
python hf-spaces/diagnostic_tool.py
```
2. **Read the results** and follow the specific recommendations
3. **Apply the appropriate solution**:
- If timeout β Increase timeout or use cache
- If memory β Use small models or quantization
4. **Test again** with the adjusted configuration
## π Useful Logs in HF Spaces
Check the logs in HF Spaces for messages like:
```
π MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B
π INITIAL SYSTEM STATE:
- Available memory: 12.50 GB
- Used memory: 3.45 GB (21.6%)
β³ Starting model loading (timeout: 300s)...
[1/2] Loading tokenizer...
β Tokenizer loaded in 2.31s
- Memory used: 3.48 GB (21.8%)
[2/2] Loading model...
β Model loaded in 45.67s
β
LOADING SUCCESSFUL in 47.98s
```
This tells you exactly how much memory and time each step uses.
|