# πŸ” Diagnostic Guide: Timeout vs Memory
## How to identify the problem
### 1️⃣ Run the diagnostic tool
In your HF Space, execute:
```bash
python hf-spaces/diagnostic_tool.py
```
This tool will tell you **exactly** whether the problem is:
- ❌ **MEMORY_ERROR**: The system ran out of RAM
- ⏰ **TIMEOUT_ERROR**: The operation took too long
- ❓ **OTHER_ERROR**: Another type of problem
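If you are curious how this classification can work, here is a minimal, illustrative sketch (not the actual `diagnostic_tool.py`): load the model in a worker thread, bound the wait, and inspect memory with `psutil` when it fails.

```python
# Illustrative sketch only -- NOT the actual diagnostic_tool.py.
# Load the model in a worker thread, bound the wait, and classify the failure.
import concurrent.futures

import psutil
from transformers import AutoModel

TIMEOUT_SECONDS = 300  # same budget the frontend request uses

def classify_load(model_name: str) -> str:
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(AutoModel.from_pretrained, model_name, low_cpu_mem_usage=True)
    try:
        future.result(timeout=TIMEOUT_SECONDS)
        return "OK"
    except concurrent.futures.TimeoutError:
        mem = psutil.virtual_memory()
        print(f"⏰ TIMEOUT_ERROR: memory at {mem.percent:.1f}%, so time is the bottleneck")
        return "TIMEOUT_ERROR"
    except MemoryError:
        mem = psutil.virtual_memory()
        print(f"❌ MEMORY_ERROR: {mem.used / 1e9:.1f} GB used ({mem.percent:.1f}%)")
        return "MEMORY_ERROR"
    except Exception as exc:
        print(f"❓ OTHER_ERROR: {exc}")
        return "OTHER_ERROR"
    finally:
        pool.shutdown(wait=False)
```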
### 2️⃣ Interpret the results
#### If you see "MEMORY_ERROR":
```
❌ PROBLEM DETECTED: OUT OF MEMORY
Memory used at failure: 15.8 GB (98.5%)
```
**Cause**: The model is too large for the available memory in HF Spaces.
**Solutions**:
1. **Use smaller models** (1B-1.7B parameters)
2. **Upgrade to HF Spaces PRO** (more RAM available)
3. **Use int8 quantization** (reduces memory usage by ~50%; see Solution 2 below)
4. **Load models with `low_cpu_mem_usage=True`**
#### If you see "TIMEOUT_ERROR":
```
⏰ TIMEOUT ERROR after 298.5s
Memory used: 8.2 GB (51.2%)
```
**Cause**: The model takes too long to load, even though memory is still available.
**Solutions**:
1. **Increase the timeout** from 300s to 600s or 900s (see Solution 1 below)
2. **Pre-load and cache models** at startup (see Solution 3 below)
3. **Use smaller, faster-loading models**
## πŸ› οΈ Implemented Solutions
### Solution 1: Increase Timeout (Easy)
Edit `hf-spaces/optipfair_frontend.py`:
```python
# Change from:
response = requests.post(url, json=payload, timeout=300)
# To:
response = requests.post(url, json=payload, timeout=600) # 10 minutes
```
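An optional refinement (hypothetical; `OPTIPFAIR_TIMEOUT` is a made-up variable name, not an existing setting) is to read the timeout from an environment variable, so it can be tuned from the Space settings without editing code:

```python
import os

import requests

# Hypothetical: make the timeout configurable via the Space's env vars.
REQUEST_TIMEOUT = int(os.environ.get("OPTIPFAIR_TIMEOUT", "600"))

response = requests.post(url, json=payload, timeout=REQUEST_TIMEOUT)
```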
### Solution 2: Use Quantization (For memory issues)
Edit model loading code in the backend:
```python
from transformers import AutoModel, BitsAndBytesConfig

# Configure int8 quantization (reduces memory usage by ~50%).
# Note: requires the bitsandbytes package and a GPU-backed runtime.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModel.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
    low_cpu_mem_usage=True,
)
```
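To confirm the savings after loading, you can print the model's in-memory size; `get_memory_footprint()` is a standard method on `transformers` models:

```python
# Report the in-memory size of the (quantized) model.
footprint_gb = model.get_memory_footprint() / 1e9
print(f"Model footprint: {footprint_gb:.2f} GB")
```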
### Solution 3: Model Cache (For timeout)
Pre-load models at startup in `hf-spaces/app.py`:
```python
import logging
import threading

from transformers import AutoModel, AutoTokenizer

logger = logging.getLogger(__name__)

# Global model cache
MODEL_CACHE = {}

def preload_models():
    """Pre-load common models at startup."""
    common_models = [
        "meta-llama/Llama-3.2-1B",
        "oopere/pruned40-llama-3.2-1B",
    ]
    logger.info("🔄 Pre-loading common models...")
    for model_name in common_models:
        try:
            logger.info(f"  Loading {model_name}...")
            MODEL_CACHE[model_name] = {
                "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
                "tokenizer": AutoTokenizer.from_pretrained(model_name),
            }
            logger.info(f"  ✓ {model_name} loaded")
        except Exception as e:
            logger.warning(f"  ✗ Could not pre-load {model_name}: {e}")
    logger.info("✅ Pre-loading complete")

def main():
    # Pre-load models before starting services
    preload_models()
    # Rest of the code...
    fastapi_thread = threading.Thread(target=run_fastapi, daemon=True)
    fastapi_thread.start()
    # ...
```
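The cache only helps if request handlers consult it before loading. A minimal lookup helper (a sketch; this exact function is not shown in the repo) would be:

```python
def get_model(model_name: str):
    """Return (model, tokenizer), loading into MODEL_CACHE on first use."""
    if model_name not in MODEL_CACHE:
        MODEL_CACHE[model_name] = {
            "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
            "tokenizer": AutoTokenizer.from_pretrained(model_name),
        }
    entry = MODEL_CACHE[model_name]
    return entry["model"], entry["tokenizer"]
```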
### Solution 4: Improved Error Messages
Better error messages are already included to help you identify the problem:
```python
except requests.exceptions.Timeout:
    return (
        None,
        "❌ **Timeout Error:**\nThe model took too long to load (>5 min). "
        "This is normal with large models. Options:\n"
        "1. Try with a smaller model\n"
        "2. Wait and try again (the model may be caching)\n"
        "3. Contact the admin to increase the timeout",
        "",
    )
except MemoryError:
    return (
        None,
        "❌ **Memory Error:**\nNot enough RAM for this model. Options:\n"
        "1. Use a smaller model (1B parameters)\n"
        "2. This model requires more memory than is available in HF Spaces",
        "",
    )
```
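Note that Python's `MemoryError` is not always raised before the OS kills the process, so a pre-flight check (a hypothetical addition using `psutil`, not part of the current code) can fail fast with the same friendly message:

```python
import psutil

def check_memory_before_load(required_gb: float):
    """Return an error string if free RAM looks insufficient, else None."""
    available_gb = psutil.virtual_memory().available / 1e9
    if available_gb < required_gb:
        return (
            f"❌ **Memory Error:**\nModel needs ~{required_gb:.0f} GB but only "
            f"{available_gb:.1f} GB is available. Try a smaller model (1B parameters)."
        )
    return None
```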
## πŸ“Š Model Size Comparison
| Model | Parameters | RAM Needed* | Load Time** |
|--------|-----------|----------------|----------------|
| Llama-3.2-1B | 1B | ~4 GB | ~30s |
| Llama-3.2-3B | 3B | ~12 GB | ~90s |
| Llama-3-8B | 8B | ~32 GB | ~240s |
| Llama-3-70B | 70B | ~280 GB | ~600s+ |
\*Without quantization (FP32 weights).

\*\*On typical HF Spaces hardware.
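The RAM column follows the rule of thumb *parameters × bytes per parameter* (4 bytes for FP32, 2 for FP16, 1 for int8), with real usage adding activations and overhead on top. A quick estimator:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def estimate_weights_gb(params_billion: float, dtype: str = "fp32") -> float:
    """Weights-only footprint in GB; activations and overhead come on top."""
    return params_billion * BYTES_PER_PARAM[dtype]

print(estimate_weights_gb(3))           # 12.0 -> matches the Llama-3.2-3B row
print(estimate_weights_gb(8, "int8"))   # 8.0  -> an 8B model fits in ~8 GB with int8
```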
## 🎯 Recommended Action Plan
1. **Run the diagnostic**:
   ```bash
   python hf-spaces/diagnostic_tool.py
   ```
2. **Read the results** and follow the specific recommendations
3. **Apply the appropriate solution**:
- If timeout β†’ Increase timeout or use cache
- If memory β†’ Use small models or quantization
4. **Test again** with the adjusted configuration
## πŸ“ Useful Logs in HF Spaces
Check the logs in HF Spaces for messages like:
```
πŸ” MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B
πŸ“Š INITIAL SYSTEM STATE:
- Available memory: 12.50 GB
- Used memory: 3.45 GB (21.6%)
⏳ Starting model loading (timeout: 300s)...
[1/2] Loading tokenizer...
βœ“ Tokenizer loaded in 2.31s
- Memory used: 3.48 GB (21.8%)
[2/2] Loading model...
βœ“ Model loaded in 45.67s
βœ… LOADING SUCCESSFUL in 47.98s
```
This tells you exactly how much memory and time each step uses.
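The gist of the instrumentation that produces these lines (a sketch, not the exact backend code) is timing each step and sampling `psutil.virtual_memory()` along the way:

```python
import time

import psutil
from transformers import AutoModel, AutoTokenizer

def log_memory(label: str = "  - Memory used") -> None:
    mem = psutil.virtual_memory()
    print(f"{label}: {mem.used / 1e9:.2f} GB ({mem.percent:.1f}%)")

def load_with_diagnostics(model_name: str):
    print(f"🔍 MODEL LOADING DIAGNOSTIC: {model_name}")
    print("📊 INITIAL SYSTEM STATE:")
    print(f"  - Available memory: {psutil.virtual_memory().available / 1e9:.2f} GB")
    log_memory("  - Used memory")
    start = time.time()
    print("[1/2] Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    print(f"✓ Tokenizer loaded in {time.time() - start:.2f}s")
    log_memory()
    step = time.time()
    print("[2/2] Loading model...")
    model = AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True)
    print(f"✓ Model loaded in {time.time() - step:.2f}s")
    print(f"✅ LOADING SUCCESSFUL in {time.time() - start:.2f}s")
    return model, tokenizer
```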