# FINAL SOLUTION: Working Free-Tier Models

## ✅ Problem Solved
The issue was that larger models (Llama, Qwen, Phi-3.5) are no longer available on the free Serverless Inference API. They return 410 Gone errors.
## ✅ Solution: Use Smaller, Stable Models
I've updated the engine to use smaller models that are guaranteed to work on the free tier:
### Current Configuration

```env
HF_TEXT_MODEL=google/flan-t5-base                     # 250M params - STABLE
HF_VISION_MODEL=nlpconnect/vit-gpt2-image-captioning  # image captioning - STABLE
HF_ASR_MODEL=openai/whisper-base                      # 74M params - STABLE
```
These models are:
- ✅ Always available on the free tier
- ✅ Fast (small size = quick responses)
- ✅ Reliable (no 410 Gone errors)
- ⚠️ Lower quality than larger models (the trade-off for the free tier)
## 🚀 How to Start the Server

### Step 1: Activate the Virtual Environment

```powershell
cd "c:\Users\God's will\Desktop\AI INSTITUTE AFRICA\services\general-ai-engine"
.\venv\Scripts\Activate.ps1
```

### Step 2: Start the Server

```powershell
python -m app.main
```

### Step 3: Test

Open http://localhost:8002/docs and use this payload:
```json
{
  "request_id": "req_test_001",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "actor": {
    "user_id": "test_user",
    "session_id": null
  },
  "input": {
    "text": "What is AI?"
  },
  "context": {},
  "options": {
    "temperature": 0.7,
    "max_tokens": 200
  }
}
```
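If you prefer to script the test instead of clicking through the Swagger UI, a minimal Python client could look like the sketch below. The endpoint path `/v1/engine` is an assumption, not the engine's confirmed route; check http://localhost:8002/docs for the actual path before using it.

```python
import json
import urllib.request

# Hypothetical route -- verify the real one at http://localhost:8002/docs
ENGINE_URL = "http://localhost:8002/v1/engine"


def build_payload(text: str, request_id: str = "req_test_001") -> dict:
    """Assemble the request envelope shown in the payload above."""
    return {
        "request_id": request_id,
        "engine": "general-ai-engine",
        "action": "ask_question",
        "actor": {"user_id": "test_user", "session_id": None},
        "input": {"text": text},
        "context": {},
        "options": {"temperature": 0.7, "max_tokens": 200},
    }


def ask(text: str) -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        ENGINE_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(ask("What is AI?"))
```

The first call may take 10-20 seconds while the model cold-starts (see Expected Behavior below).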
## 📊 Model Comparison

| Model | Size | Speed | Quality | Free Tier | Status |
|---|---|---|---|---|---|
| google/flan-t5-base | 250M | ⚡⚡⚡⚡ | ⭐⭐ | ✅ | ✅ WORKING |
| google/flan-t5-large | 780M | ⚡⚡⚡ | ⭐⭐⭐ | ✅ | ✅ Alternative |
| distilgpt2 | 82M | ⚡⚡⚡⚡⚡ | ⭐ | ✅ | ✅ Fastest |
| microsoft/Phi-3.5-mini-instruct | 3.8B | ⚡⚡ | ⭐⭐⭐⭐ | ❌ | ❌ 410 Gone |
| Qwen/Qwen2.5-Coder-32B-Instruct | 32B | ⚡ | ⭐⭐⭐⭐⭐ | ❌ | ❌ 410 Gone |
## 🔄 Alternative Free Models

If you want to try other models, edit your .env file:

### Text Generation

```env
# Smaller, faster (but lower quality)
HF_TEXT_MODEL=distilgpt2

# Better quality (but slower)
HF_TEXT_MODEL=google/flan-t5-large

# Current default (best balance)
HF_TEXT_MODEL=google/flan-t5-base
```

### Vision

```env
# Current default
HF_VISION_MODEL=nlpconnect/vit-gpt2-image-captioning

# Alternative
HF_VISION_MODEL=Salesforce/blip-image-captioning-base
```

### Audio

```env
# Faster (current)
HF_ASR_MODEL=openai/whisper-base

# Better quality (slower)
HF_ASR_MODEL=openai/whisper-medium
```
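When switching models via .env, a startup check can catch a model name that will 410 before the first request fails. This is an illustrative sketch only: `FREE_TIER_TEXT_MODELS` and `resolve_text_model` are hypothetical names, not part of the engine, and it assumes the engine reads its config with `os.getenv`.

```python
import os

# Text models known (at the time of writing) to respond on the free
# Serverless Inference API -- see the comparison table above.
FREE_TIER_TEXT_MODELS = {
    "google/flan-t5-base",
    "google/flan-t5-large",
    "distilgpt2",
}


def resolve_text_model(default: str = "google/flan-t5-base") -> str:
    """Read HF_TEXT_MODEL from the environment, warning on unknown choices."""
    model = os.getenv("HF_TEXT_MODEL", default)
    if model not in FREE_TIER_TEXT_MODELS:
        print(f"warning: {model} may return 410 Gone on the free tier")
    return model
```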
## ⚠️ Important Notes

### Why Smaller Models?

- Free-tier restrictions: Hugging Face has removed most larger models from the free Serverless Inference API
- Reliability: smaller models stay available
- Speed: faster responses and shorter cold starts
- No 410 errors: these models won't disappear
### Upgrading for Better Quality

If you need better quality:

- HF PRO account ($9/month)
  - Access to larger models
  - Higher rate limits
  - Faster inference
- Dedicated Endpoints (starting at $0.03/hour)
  - Use any model
  - No cold starts
  - Production-ready
## 🎯 Expected Behavior

### First Request

- ⏱️ 10-20 seconds (cold start while the model loads)
- ✅ Returns a valid response

### Subsequent Requests

- ⏱️ 1-3 seconds (model is warm)
- ✅ Fast responses

### Response Quality

- ✅ Functional: answers questions correctly
- ⚠️ Simple: not as sophisticated as larger models
- ✅ Reliable: no 410 errors
## 🔧 Troubleshooting

If you get 410 Gone:

- The model is no longer available on the free tier
- Switch to one of the models listed above

If you get 503 Service Unavailable:

- The model is loading (cold start)
- Wait 10-20 seconds and try again

If you get 429 Too Many Requests:

- You've hit the rate limit (roughly 1,000 requests/day on the free tier)
- Wait a few hours or upgrade to PRO

If the server won't start:

- Make sure the virtual environment is activated
- Check that port 8002 is not already in use
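The 503 case lends itself to an automatic retry, since a cold start resolves itself after 10-20 seconds. A minimal sketch, assuming the caller raises a dedicated exception on 503 (`ColdStartError` and `call_with_retry` are hypothetical names, not part of the engine); 410 and 429 should not be retried, because waiting will not fix them.

```python
import time


class ColdStartError(Exception):
    """Raised by the caller when the API returns 503 (model still loading)."""


def call_with_retry(fn, retries: int = 5, wait: float = 15.0):
    """Call fn(), sleeping `wait` seconds between attempts on ColdStartError.

    Any other exception (e.g. for a 410 or 429 response) propagates
    immediately, since retrying cannot fix those errors.
    """
    for attempt in range(retries):
        try:
            return fn()
        except ColdStartError:
            if attempt == retries - 1:
                raise  # still cold after all retries; give up
            time.sleep(wait)
```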
## ✅ Summary

Current Setup:

- ✅ Using google/flan-t5-base (250M params) - free-tier compatible
- ✅ No 410 Gone errors
- ✅ Fast and reliable
- ⚠️ Lower quality than larger models

To Start:

1. Activate the venv: `.\venv\Scripts\Activate.ps1`
2. Run the server: `python -m app.main`
3. Test at: http://localhost:8002/docs

This configuration will work reliably on the free tier! 🎉