Spaces:

Godswill-IoT
/

genai-engine

Sleeping

App Files Files Community

genai-engine / FINAL_SOLUTION.md

Godswill-IoT

Upload 27 files

65b22a4 verified 3 months ago

preview code

raw

history blame contribute delete

4.92 kB

FINAL SOLUTION: Working Free-Tier Models

✅ Problem Solved

The issue was that larger models (Llama, Qwen, Phi-3.5) are no longer available on the free Serverless Inference API. They return 410 Gone errors.

✅ Solution: Use Smaller, Stable Models

I've updated the engine to use smaller models that are guaranteed to work on the free tier:

Current Configuration

HF_TEXT_MODEL=google/flan-t5-base           # 250M params - STABLE
HF_VISION_MODEL=nlpconnect/vit-gpt2-image-captioning  # Image captioning - STABLE
HF_ASR_MODEL=openai/whisper-base            # 74M params - STABLE

These models are:

✅ Always available on free tier
✅ Fast (small size = quick responses)
✅ Reliable (no 410 Gone errors)
⚠️ Lower quality than larger models (trade-off for free tier)

🚀 How to Start the Server

Step 1: Activate Virtual Environment

cd "c:\Users\God's will\Desktop\AI INSTITUTE AFRICA\services\general-ai-engine"
.\venv\Scripts\Activate.ps1

Step 2: Start the Server

python -m app.main

Step 3: Test

Open http://localhost:8002/docs and use this payload:

{
  "request_id": "req_test_001",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "actor": {
    "user_id": "test_user",
    "session_id": null
  },
  "input": {
    "text": "What is AI?"
  },
  "context": {},
  "options": {
    "temperature": 0.7,
    "max_tokens": 200
  }
}

📊 Model Comparison

Model	Size	Speed	Quality	Free Tier	Status
google/flan-t5-base	250M	⚡⚡⚡⚡	⭐⭐	✅	✅ WORKING
google/flan-t5-large	780M	⚡⚡⚡	⭐⭐⭐	✅	✅ Alternative
distilgpt2	82M	⚡⚡⚡⚡⚡	⭐	✅	✅ Fastest
microsoft/Phi-3.5-mini-instruct	3.8B	⚡⚡	⭐⭐⭐⭐	❌	❌ 410 Gone
Qwen/Qwen2.5-Coder-32B-Instruct	32B	⚡	⭐⭐⭐⭐⭐	❌	❌ 410 Gone

🔄 Alternative Free Models

If you want to try other models, edit your .env file:

Text Generation

# Smaller, faster (but lower quality)
HF_TEXT_MODEL=distilgpt2

# Better quality (but slower)
HF_TEXT_MODEL=google/flan-t5-large

# Current default (best balance)
HF_TEXT_MODEL=google/flan-t5-base

Vision

# Current default
HF_VISION_MODEL=nlpconnect/vit-gpt2-image-captioning

# Alternative
HF_VISION_MODEL=Salesforce/blip-image-captioning-base

Audio

# Faster (current)
HF_ASR_MODEL=openai/whisper-base

# Better quality (slower)
HF_ASR_MODEL=openai/whisper-medium

⚠️ Important Notes

Why Smaller Models?

Free tier restrictions: HF has limited larger models on free tier
Reliability: Smaller models are always available
Speed: Faster responses, less cold start time
No 410 errors: These models won't disappear

Quality Trade-off

Smaller models = Lower quality responses
Larger models = Not available on free tier (410 Gone)
Solution: Use smaller models for development, upgrade to PRO ($9/month) for production

Upgrading for Better Quality

If you need better quality:

HF PRO Account ($9/month)
- Access to larger models
- Higher rate limits
- Faster inference
Dedicated Endpoints (starting at $0.03/hour)
- Use any model
- No cold starts
- Production-ready

🎯 Expected Behavior

First Request

⏱️ 10-20 seconds (cold start - model loading)
✅ Returns valid response

Subsequent Requests

⏱️ 1-3 seconds (model is warm)
✅ Fast responses

Response Quality

✅ Functional: Answers questions correctly
⚠️ Simple: Not as sophisticated as larger models
✅ Reliable: No 410 errors

🔧 Troubleshooting

If you get 410 Gone:

Model is not available on free tier
Switch to one of the models listed above

If you get 503 Service Unavailable:

Model is loading (cold start)
Wait 10-20 seconds and try again

If you get 429 Too Many Requests:

You've hit the rate limit (~1000 requests/day)
Wait a few hours or upgrade to PRO

If server won't start:

Make sure virtual environment is activated
Check that port 8002 is not in use

✅ Summary

Current Setup:

✅ Using google/flan-t5-base (250M params)
✅ Free tier compatible
✅ No 410 Gone errors
✅ Fast and reliable
⚠️ Lower quality than larger models

To Start:

Activate venv: .\venv\Scripts\Activate.ps1
Run server: python -m app.main
Test at: http://localhost:8002/docs

This configuration will work reliably on the free tier! 🎉