# Timeout vs Memory Diagnostic Tools

## Overview

When working with heavy models in HF Spaces, you may encounter issues that could be caused by:

1. **Timeout**: The model takes too long to load (>5 minutes)
2. **Memory**: The system runs out of RAM
3. **Both**: A combination of the two

This toolkit helps you identify and fix the exact problem.
## Files Added

### 1. `diagnostic_tool.py`

**Purpose**: Identify whether the problem is a timeout or a memory issue

**Usage**:

```bash
python hf-spaces/diagnostic_tool.py
```

**What it does**:
- Monitors system memory in real time
- Tracks model loading time
- Detects the exact failure point
- Provides specific recommendations
**Output**:

```
MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B

INITIAL SYSTEM STATE:
- Available memory: 12.50 GB
- Used memory: 3.45 GB (21.6%)

⏳ Starting model loading (timeout: 300s)...

[1/2] Loading tokenizer...
✅ Tokenizer loaded in 2.31s

[2/2] Loading model...
✅ Model loaded in 45.67s

✅ LOADING SUCCESSFUL in 47.98s

💡 RECOMMENDATIONS
✅ Model loaded successfully.
```
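
For reference, the core pattern behind this diagnostic fits in a few lines. The sketch below is not the actual `diagnostic_tool.py` implementation, just the underlying idea; it assumes `psutil` is installed:

```python
# Minimal sketch of the timing + memory tracking pattern; not the actual
# diagnostic_tool.py implementation. Assumes `psutil` is installed.
import time

import psutil
from transformers import AutoModel, AutoTokenizer

def probe_model_load(model_name: str) -> None:
    # Snapshot system memory before loading anything.
    mem = psutil.virtual_memory()
    print(f"Available memory: {mem.available / 1e9:.2f} GB")
    print(f"Used memory: {mem.used / 1e9:.2f} GB ({mem.percent}%)")

    start = time.time()
    try:
        AutoTokenizer.from_pretrained(model_name)
        print(f"Tokenizer loaded in {time.time() - start:.2f}s")
        AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True)
        print(f"LOADING SUCCESSFUL in {time.time() - start:.2f}s")
    except MemoryError:
        # Note: on Linux a hard OOM may kill the process before this is
        # raised, which is why the real tool also monitors memory over time.
        print("MEMORY_ERROR: ran out of RAM while loading")

probe_model_load("meta-llama/Llama-3.2-1B")
```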
### 2. `config_optimized.py`

**Purpose**: Smart configuration based on model size

**Features**:
- Auto-detects the model size category (small/medium/large)
- Provides optimized timeout settings
- Recommends the appropriate HF Spaces tier
- Warns about memory issues before loading

**Usage**:

```python
import requests

from config_optimized import HFSpacesConfig, get_optimized_request_config

# Get the optimal timeout for a model
timeout = HFSpacesConfig.get_timeout_for_model("meta-llama/Llama-3.2-1B")

# Get a full request config
config = get_optimized_request_config("meta-llama/Llama-3.2-1B")
response = requests.post(url, json=payload, **config)

# Check whether the model is recommended for your tier
is_ok = HFSpacesConfig.is_model_recommended("meta-llama/Llama-3.2-1B", tier="free")
```
### 3. `DIAGNOSTIC_README.md`

**Purpose**: Complete guide with solutions

**Contents**:
- How to identify timeout vs memory issues
- Step-by-step solutions for each problem
- Model size comparison table
- Code examples for fixes
- Best practices

### 4. Improved Error Messages in `optipfair_frontend.py`

**What changed**:
- More informative timeout error messages
- Explicit memory error detection
- Actionable recommendations in errors
- All messages in English
**Example**:

```
❌ **Timeout Error:**
The request exceeded 5 minutes (300s).

**Possible causes:**
1. The model is very large and takes long to load
2. The server is processing many requests

**Solutions:**
• Use a smaller model (1B parameters)
• Wait and try again (the model may be caching)
• If it persists, run `diagnostic_tool.py` for more information
```
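
The pattern behind these messages is simple: catch the specific `requests` exceptions and map each one to an actionable message. A simplified sketch of that pattern (the real logic in `optipfair_frontend.py` is more detailed):

```python
# Simplified sketch of the error-classification pattern; the real logic in
# optipfair_frontend.py is more detailed.
import requests

TIMEOUT_S = 300

def post_with_diagnostics(url: str, payload: dict) -> str:
    try:
        response = requests.post(url, json=payload, timeout=TIMEOUT_S)
        response.raise_for_status()
        return response.text
    except requests.exceptions.Timeout:
        return (f"❌ Timeout Error: the request exceeded {TIMEOUT_S}s. "
                "Use a smaller model, retry, or run diagnostic_tool.py.")
    except requests.exceptions.HTTPError as exc:
        # A 5xx after a long wait on HF Spaces often means the backend
        # process was killed for running out of memory.
        return f"❌ Server error (possible out-of-memory): {exc}"
```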
## Quick Start Guide

### Step 1: Diagnose the Problem

```bash
cd hf-spaces
python diagnostic_tool.py
```

### Step 2: Read the Output

The tool will tell you:
- ✅ **Success**: the model loads fine
- ❌ **MEMORY_ERROR**: need more RAM or a smaller model
- ⏰ **TIMEOUT_ERROR**: need more time or a faster model
### Step 3: Apply the Solution

#### For TIMEOUT problems:

```python
import requests

# Option 1: Increase the timeout in optipfair_frontend.py
response = requests.post(
    url,
    json=payload,
    timeout=600,  # Change from 300 to 600 seconds
)

# Option 2: Use config_optimized.py
from config_optimized import get_optimized_request_config

config = get_optimized_request_config(model_name)
response = requests.post(url, json=payload, **config)
```
#### For MEMORY problems:

```python
# Option 1: Use a smaller model
AVAILABLE_MODELS = [
    "meta-llama/Llama-3.2-1B",       # ✅ Works on the free tier
    "oopere/pruned40-llama-3.2-1B",  # ✅ Works on the free tier
]

# Option 2: Use quantization (in the backend)
from transformers import AutoModel, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModel.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
)

# Option 3: Upgrade your HF Spaces tier
# Free: 16GB RAM → PRO: 32GB RAM → Enterprise: 64GB RAM
```
## Model Recommendations by Tier

### Free Tier (16GB RAM)

✅ **Recommended**:
- meta-llama/Llama-3.2-1B (~4 GB, ~30s load)
- oopere/pruned40-llama-3.2-1B (~4 GB, ~30s load)
- google/gemma-3-1b-pt (~4 GB, ~30s load)
- Qwen/Qwen3-1.7B (~6 GB, ~45s load)

⚠️ **May work with optimization**:
- meta-llama/Llama-3.2-3B (~12 GB, ~90s load)

❌ **Won't work**:
- meta-llama/Llama-3-8B (~32 GB)
- meta-llama/Llama-3-70B (~280 GB)

### PRO Tier (32GB RAM)

✅ **Additional models**:
- meta-llama/Llama-3.2-3B
- meta-llama/Llama-3-8B (with quantization)

### Enterprise Tier (64GB RAM)

✅ **Additional models**:
- meta-llama/Llama-3-8B (full precision)
- Larger models with quantization
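
These figures follow the usual back-of-the-envelope rule: full-precision (fp32) weights take about 4 bytes per parameter, which is where the ~4 GB for a 1B model and ~32 GB for an 8B model above come from. A quick sketch of that estimate (a rough guide only; real usage adds activations and framework overhead):

```python
# Rough RAM estimate behind the table above: bytes per parameter × parameter
# count. Real usage is higher (activations, framework overhead).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def estimate_ram_gb(params_billions: float, precision: str = "fp32") -> float:
    return params_billions * BYTES_PER_PARAM[precision]

print(estimate_ram_gb(1))          # ~4 GB  -> fits the free tier (16 GB)
print(estimate_ram_gb(8))          # ~32 GB -> too large for the free tier
print(estimate_ram_gb(8, "int8"))  # ~8 GB  -> feasible with quantization
```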
## Common Scenarios

### Scenario 1: "My model times out after 5 minutes"

**Diagnosis**: TIMEOUT_ERROR

**Solution**:
1. Check whether the model is too large for your tier
2. Increase the timeout to 600s (10 minutes)
3. Consider pre-loading models at startup (see the Advanced section below)

### Scenario 2: "The process crashes without a clear error"

**Diagnosis**: Likely MEMORY_ERROR (the out-of-memory killer terminates the process)

**Solution**:
1. Run `diagnostic_tool.py` to confirm
2. Use a smaller model (1B parameters)
3. Use int8 quantization
4. Upgrade to the PRO tier

### Scenario 3: "It sometimes works, sometimes doesn't"

**Diagnosis**: Memory pressure or concurrent requests

**Solution**:
1. Implement model caching
2. Add memory monitoring (a sketch follows this list)
3. Use a smaller default model
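
For step 2, a lightweight guard before each request can refuse work when free RAM is low, instead of letting the process get killed mid-load. A hypothetical sketch, assuming `psutil` and an illustrative threshold:

```python
# Hypothetical pre-request guard; assumes `psutil`. The threshold is
# illustrative and should be tuned to your model and tier.
import psutil

MIN_FREE_GB = 6

def has_memory_headroom() -> bool:
    free_gb = psutil.virtual_memory().available / 1e9
    if free_gb < MIN_FREE_GB:
        print(f"⚠️ Only {free_gb:.1f} GB free; rejecting request to avoid OOM")
        return False
    return True
```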
## Advanced: Pre-loading Models

To avoid a timeout on the first request, pre-load models at startup:

```python
# In hf-spaces/app.py
from transformers import AutoModel, AutoTokenizer

MODEL_CACHE = {}

def preload_models():
    """Pre-load common models at startup."""
    models = ["meta-llama/Llama-3.2-1B"]
    for model_name in models:
        try:
            print(f"Pre-loading {model_name}...")
            MODEL_CACHE[model_name] = {
                "model": AutoModel.from_pretrained(
                    model_name,
                    low_cpu_mem_usage=True,
                ),
                "tokenizer": AutoTokenizer.from_pretrained(model_name),
            }
            print(f"✅ {model_name} ready")
        except Exception as e:
            print(f"❌ Could not pre-load {model_name}: {e}")

def main():
    preload_models()  # Load models before starting services
    # ... rest of startup code
```
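
Request handlers can then read from the cache and fall back to an on-demand load for anything that was not pre-loaded. A sketch of that consumer side, continuing the snippet above (`get_model` is a hypothetical helper, not part of the files described here):

```python
# Hypothetical consumer of MODEL_CACHE; get_model is illustrative and not
# part of the files described above.
def get_model(model_name: str):
    if model_name not in MODEL_CACHE:
        # Cache miss: load on demand -- this is the slow path that
        # pre-loading is meant to avoid.
        MODEL_CACHE[model_name] = {
            "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
            "tokenizer": AutoTokenizer.from_pretrained(model_name),
        }
    entry = MODEL_CACHE[model_name]
    return entry["model"], entry["tokenizer"]
```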
## Support

If you still have issues after trying these solutions:

1. Check the full diagnostic output
2. Review the HF Spaces logs
3. Verify your HF Spaces tier and limits
4. Consider using a different model architecture
## Summary

| Issue | Symptom | Solution |
|-------|---------|----------|
| **Timeout** | Request > 5 min | Increase timeout, use cache |
| **Memory** | Process crashes or is killed | Smaller model, quantization, upgrade tier |
| **Both** | Slow + crashes | Smaller model + longer timeout |

All tools are designed to help you quickly identify and fix the exact problem without guessing.