genai-engine / FINAL_SOLUTION.md
Godswill-IoT's picture
Upload 27 files
65b22a4 verified

FINAL SOLUTION: Working Free-Tier Models

βœ… Problem Solved

The issue was that larger models (Llama, Qwen, Phi-3.5) are no longer available on the free Serverless Inference API. They return 410 Gone errors.

βœ… Solution: Use Smaller, Stable Models

I've updated the engine to use smaller models that are guaranteed to work on the free tier:

Current Configuration

HF_TEXT_MODEL=google/flan-t5-base           # 250M params - STABLE
HF_VISION_MODEL=nlpconnect/vit-gpt2-image-captioning  # Image captioning - STABLE
HF_ASR_MODEL=openai/whisper-base            # 74M params - STABLE

These models are:

  • βœ… Always available on free tier
  • βœ… Fast (small size = quick responses)
  • βœ… Reliable (no 410 Gone errors)
  • ⚠️ Lower quality than larger models (trade-off for free tier)

πŸš€ How to Start the Server

Step 1: Activate Virtual Environment

cd "c:\Users\God's will\Desktop\AI INSTITUTE AFRICA\services\general-ai-engine"
.\venv\Scripts\Activate.ps1

Step 2: Start the Server

python -m app.main

Step 3: Test

Open http://localhost:8002/docs and use this payload:

{
  "request_id": "req_test_001",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "actor": {
    "user_id": "test_user",
    "session_id": null
  },
  "input": {
    "text": "What is AI?"
  },
  "context": {},
  "options": {
    "temperature": 0.7,
    "max_tokens": 200
  }
}

πŸ“Š Model Comparison

Model Size Speed Quality Free Tier Status
google/flan-t5-base 250M ⚑⚑⚑⚑ ⭐⭐ βœ… βœ… WORKING
google/flan-t5-large 780M ⚑⚑⚑ ⭐⭐⭐ βœ… βœ… Alternative
distilgpt2 82M ⚑⚑⚑⚑⚑ ⭐ βœ… βœ… Fastest
microsoft/Phi-3.5-mini-instruct 3.8B ⚑⚑ ⭐⭐⭐⭐ ❌ ❌ 410 Gone
Qwen/Qwen2.5-Coder-32B-Instruct 32B ⚑ ⭐⭐⭐⭐⭐ ❌ ❌ 410 Gone

πŸ”„ Alternative Free Models

If you want to try other models, edit your .env file:

Text Generation

# Smaller, faster (but lower quality)
HF_TEXT_MODEL=distilgpt2

# Better quality (but slower)
HF_TEXT_MODEL=google/flan-t5-large

# Current default (best balance)
HF_TEXT_MODEL=google/flan-t5-base

Vision

# Current default
HF_VISION_MODEL=nlpconnect/vit-gpt2-image-captioning

# Alternative
HF_VISION_MODEL=Salesforce/blip-image-captioning-base

Audio

# Faster (current)
HF_ASR_MODEL=openai/whisper-base

# Better quality (slower)
HF_ASR_MODEL=openai/whisper-medium

⚠️ Important Notes

Why Smaller Models?

  1. Free tier restrictions: HF has limited larger models on free tier
  2. Reliability: Smaller models are always available
  3. Speed: Faster responses, less cold start time
  4. No 410 errors: These models won't disappear

Quality Trade-off

  • Smaller models = Lower quality responses
  • Larger models = Not available on free tier (410 Gone)
  • Solution: Use smaller models for development, upgrade to PRO ($9/month) for production

Upgrading for Better Quality

If you need better quality:

  1. HF PRO Account ($9/month)
    • Access to larger models
    • Higher rate limits
    • Faster inference
  2. Dedicated Endpoints (starting at $0.03/hour)
    • Use any model
    • No cold starts
    • Production-ready

🎯 Expected Behavior

First Request

  • ⏱️ 10-20 seconds (cold start - model loading)
  • βœ… Returns valid response

Subsequent Requests

  • ⏱️ 1-3 seconds (model is warm)
  • βœ… Fast responses

Response Quality

  • βœ… Functional: Answers questions correctly
  • ⚠️ Simple: Not as sophisticated as larger models
  • βœ… Reliable: No 410 errors

πŸ”§ Troubleshooting

If you get 410 Gone:

  • Model is not available on free tier
  • Switch to one of the models listed above

If you get 503 Service Unavailable:

  • Model is loading (cold start)
  • Wait 10-20 seconds and try again

If you get 429 Too Many Requests:

  • You've hit the rate limit (~1000 requests/day)
  • Wait a few hours or upgrade to PRO

If server won't start:

  • Make sure virtual environment is activated
  • Check that port 8002 is not in use

βœ… Summary

Current Setup:

  • βœ… Using google/flan-t5-base (250M params)
  • βœ… Free tier compatible
  • βœ… No 410 Gone errors
  • βœ… Fast and reliable
  • ⚠️ Lower quality than larger models

To Start:

  1. Activate venv: .\venv\Scripts\Activate.ps1
  2. Run server: python -m app.main
  3. Test at: http://localhost:8002/docs

This configuration will work reliably on the free tier! πŸŽ‰