---
title: General AI Engine
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
---
# General AI Engine

## Overview
The General AI Engine is a pure intelligence service designed for open-ended question answering and multi-modal interaction. It uses various Hugging Face models to process text, images, and audio, providing a unified "ask anything" interface.
## What This Engine Does

- **Input:** Text, Image, Audio, or Video
- **Output:** Intelligent natural language responses
## Key Features

- ✅ **Multi-modal Chat:** Unified interface for text, image, and audio interaction.
- ✅ **Dynamic Model Routing:** Automatically selects an appropriate model based on input modality (see the sketch after this list).
- ✅ **Conversation History:** Supports multi-turn dialogue when provided in context.
- ✅ **Audio Support:** Transcribes spoken questions automatically.
- ✅ **Vision Support:** Understands and describes image/video content.
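How routing picks a model can be pictured in a few lines (a hypothetical illustration: the defaults come from the Configuration section below, and the real logic lives in `app/engine.py` and may differ):

```python
# Hypothetical routing sketch; the real logic lives in app/engine.py and
# may differ. Model defaults mirror the Configuration section below.
import os

TEXT_MODEL = os.getenv("HF_TEXT_MODEL", "google/flan-t5-base")
VISION_MODEL = os.getenv("HF_VISION_MODEL", "nlpconnect/vit-gpt2-image-captioning")
ASR_MODEL = os.getenv("HF_ASR_MODEL", "openai/whisper-base")

def route(items: list[dict]) -> tuple[str | None, str]:
    """Return (transcription_model, answer_model) for the attached items."""
    types = {item.get("type") for item in items}
    if types & {"image", "video"}:   # videos are analyzed as a single frame
        return None, VISION_MODEL
    if "audio" in types:             # transcribe first, then answer as text
        return ASR_MODEL, TEXT_MODEL
    return None, TEXT_MODEL
```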
## Architecture

This is a standalone intelligence engine: NOT a chatbot, NOT a UI, NOT orchestration. It is designed to be called by an AI Mentor, like the other engine services.
```
general-ai-engine/
├── app/
│   ├── __init__.py      # Package initialization
│   ├── main.py          # FastAPI app + routing
│   ├── contracts.py     # EngineRequest / EngineResponse
│   ├── config.py        # Environment variables
│   ├── hf_client.py     # Hugging Face API client
│   └── engine.py        # Core intelligence logic
├── requirements.txt     # Python dependencies
└── .env.example         # Environment template
```
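The wire contracts in `app/contracts.py` are not reproduced in this README; a minimal Pydantic sketch, with field names inferred from the JSON examples below (the actual models may differ), could look like:

```python
# Sketch of app/contracts.py inferred from the JSON examples in this
# README; field names and types are assumptions, not the actual models.
from typing import Any, Optional
from pydantic import BaseModel

class EngineRequest(BaseModel):
    request_id: str
    engine: str
    action: str                      # "ask_question" or "chat"
    actor: dict[str, Any]            # user_id, session_id
    input: dict[str, Any]            # {"text": ..., "items": [...]}
    context: dict[str, Any] = {}
    options: dict[str, Any] = {}     # temperature, max_tokens, ...

class EngineResponse(BaseModel):
    request_id: str
    ok: bool
    status: str
    engine: str
    action: str
    result: Optional[dict[str, Any]] = None
    messages: list[str] = []
    suggested_actions: list[str] = []
    citations: list[str] = []
    error: Optional[dict[str, str]] = None
```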
## Setup

### 1. Install Dependencies

```bash
cd general-ai-engine
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your HF_TOKEN
```

### 3. Start the Engine

```bash
python -m app.main
```
The engine will start on http://127.0.0.1:7860 (the address follows the `HOST`/`PORT` settings; see Configuration).
## API

### Single Entrypoint: `POST /run`

**Text-Only Request:**

```json
{
"request_id": "req_123",
"engine": "general-ai-engine",
"action": "ask_question",
"actor": {
"user_id": "user_456",
"session_id": "session_789"
},
"input": {
"text": "What is quantum computing?"
},
"context": {},
"options": {
"temperature": 0.7,
"max_tokens": 2048
}
}
```

**Response:**

```json
{
"request_id": "req_123",
"ok": true,
"status": "success",
"engine": "general-ai-engine",
"action": "ask_question",
"result": {
"answer": "Quantum computing is...",
"model": "meta-llama/Llama-3.3-70B-Instruct",
"question": "What is quantum computing?",
"modalities": ["text"]
},
"messages": ["Generated response using meta-llama/Llama-3.3-70B-Instruct"],
"suggested_actions": ["ask_followup", "clarify", "explore_topic"],
"citations": []
}
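```

For a quick smoke test, the same request can be sent from Python (a client-side sketch using `requests`; adjust the URL to your `HOST`/`PORT`):

```python
# Minimal client sketch: send the text-only example above to a locally
# running engine assumed to listen on 127.0.0.1:7860.
import requests

payload = {
    "request_id": "req_123",
    "engine": "general-ai-engine",
    "action": "ask_question",
    "actor": {"user_id": "user_456", "session_id": "session_789"},
    "input": {"text": "What is quantum computing?"},
    "context": {},
    "options": {"temperature": 0.7, "max_tokens": 2048},
}

resp = requests.post("http://127.0.0.1:7860/run", json=payload, timeout=90)
resp.raise_for_status()
print(resp.json()["result"]["answer"])
```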
**Image Understanding Request:**

```json
{
"request_id": "req_124",
"engine": "general-ai-engine",
"action": "ask_question",
"actor": {
"user_id": "user_456",
"session_id": "session_789"
},
"input": {
"text": "What's in this image?",
"items": [
{
"type": "image",
"text": "",
"ref": "https://example.com/image.jpg"
}
]
},
"context": {},
"options": {}
}
```

**Response:**

```json
{
"request_id": "req_124",
"ok": true,
"status": "success",
"engine": "general-ai-engine",
"action": "ask_question",
"result": {
"answer": "The image shows...",
"model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
"question": "What's in this image?",
"modalities": ["image"]
},
"messages": ["Generated response using meta-llama/Llama-3.2-11B-Vision-Instruct"],
"suggested_actions": ["ask_followup", "clarify", "explore_topic"]
}
```

**Audio Transcription + Question:**

```json
{
"request_id": "req_125",
"engine": "general-ai-engine",
"action": "ask_question",
"actor": {
"user_id": "user_456",
"session_id": "session_789"
},
"input": {
"text": "Summarize what was said",
"items": [
{
"type": "audio",
"text": "",
"ref": "https://example.com/audio.mp3"
}
]
},
"context": {},
"options": {}
}
```

**Response:**

```json
{
"request_id": "req_125",
"ok": true,
"status": "success",
"engine": "general-ai-engine",
"action": "ask_question",
"result": {
"answer": "The audio discusses...",
"model": "meta-llama/Llama-3.3-70B-Instruct",
"question": "Summarize what was said\n\n[Audio transcription]: Hello, this is a test...",
"modalities": ["audio"],
"audio_transcription": "Hello, this is a test..."
},
"messages": ["Generated response using meta-llama/Llama-3.3-70B-Instruct"],
"suggested_actions": ["ask_followup", "clarify", "explore_topic"]
}
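```

Audio does not have to live at a public URL; per Known Limitations it may also be base64-encoded. A hypothetical way to build such an item (carrying the bytes as a `data:` URI in `ref` is an assumption; check `app/engine.py` for the convention actually supported):

```python
# Hypothetical: attach a local audio file as a base64 data: URI. The
# README only states "URL or base64-encoded"; the exact field convention
# is an assumption.
import base64

with open("question.mp3", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

audio_item = {
    "type": "audio",
    "text": "",
    "ref": f"data:audio/mpeg;base64,{encoded}",
}
```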
## Supported Actions

- `ask_question` - Answer a single question
- `chat` - Conversational interaction (supports `context.conversation_history`; see the sketch below)
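A `chat` request carries prior turns in `context.conversation_history`. A sketch of such a payload (the role/content message shape is an assumption; only the key name is documented here):

```python
# Sketch of a multi-turn "chat" request. The conversation_history entry
# format (role/content messages) is an assumption.
chat_payload = {
    "request_id": "req_126",
    "engine": "general-ai-engine",
    "action": "chat",
    "actor": {"user_id": "user_456", "session_id": "session_789"},
    "input": {"text": "And how is that different from classical bits?"},
    "context": {
        "conversation_history": [
            {"role": "user", "content": "What is quantum computing?"},
            {"role": "assistant", "content": "Quantum computing is..."},
        ]
    },
    "options": {},
}
```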
## Configuration

All configuration is via environment variables:

- `HF_TOKEN` - Hugging Face API token (required; get a free token at hf.co/settings/tokens)
- `HF_TEXT_MODEL` - Text model (default: `google/flan-t5-base`; 250M params, stable on the free tier)
- `HF_VISION_MODEL` - Vision model (default: `nlpconnect/vit-gpt2-image-captioning`)
- `HF_ASR_MODEL` - Audio model (default: `openai/whisper-base`)
- `HOST` - Server host (default: `127.0.0.1`)
- `PORT` - Server port (default: `8002`)
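A sketch of how `app/config.py` might read these (the variable names match the list above; the actual module may be structured differently):

```python
# Sketch of app/config.py: environment-driven settings with the defaults
# listed above. The real module may differ.
import os

HF_TOKEN = os.environ["HF_TOKEN"]  # required: fail fast if it is missing
HF_TEXT_MODEL = os.getenv("HF_TEXT_MODEL", "google/flan-t5-base")
HF_VISION_MODEL = os.getenv("HF_VISION_MODEL", "nlpconnect/vit-gpt2-image-captioning")
HF_ASR_MODEL = os.getenv("HF_ASR_MODEL", "openai/whisper-base")
HOST = os.getenv("HOST", "127.0.0.1")
PORT = int(os.getenv("PORT", "8002"))
```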
## Error Handling

All errors return structured responses:

```json
{
"ok": false,
"status": "error",
"error": {
"code": "ENGINE_ERROR",
"detail": "Human-readable explanation"
}
}
```

No stack traces are exposed to clients.
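One way to produce that envelope in FastAPI while keeping tracebacks server-side (a hypothetical sketch; the actual handler lives in `app/main.py`):

```python
# Hypothetical catch-all handler: log the full traceback server-side and
# return only the structured error envelope. The real handler in
# app/main.py may differ.
import logging

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
logger = logging.getLogger("general-ai-engine")

@app.exception_handler(Exception)
async def engine_error_handler(request: Request, exc: Exception) -> JSONResponse:
    logger.exception("unhandled engine error")  # traceback stays in the logs
    return JSONResponse(
        status_code=500,
        content={
            "ok": False,
            "status": "error",
            "error": {
                "code": "ENGINE_ERROR",
                "detail": "An internal error occurred while processing the request.",
            },
        },
    )
```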
## Testing

Access the Swagger UI at http://127.0.0.1:7860/docs (the address follows your configured `HOST`/`PORT`).
## Known Limitations

- **Free tier limits** - Uses the HF Serverless Inference API, which is rate-limited (~1,000 requests/day for free users)
- **Stateless** - No conversation memory; context must be provided in each request
- **Model per modality** - Uses separate models for text/vision/audio (not a unified multimodal model)
- **No streaming** - Returns complete responses only
- **Cold starts** - The first request to a model may take 10-30 seconds while the model loads
- **Timeout** - 60-second timeout on HF API calls
- **Audio format** - Audio must be accessible via URL or base64-encoded
- **Video processing** - Videos are treated as images (single-frame analysis, not full video understanding)
- **No retry logic** - Single API call attempt; failures return immediately (see the client-side sketch after this list)
- **No caching** - Every request hits the HF API (no response caching)
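Because the engine itself does not retry, callers that need resilience against cold starts or transient HF errors can wrap `/run` themselves. A hypothetical client-side wrapper:

```python
# Hypothetical client-side retry with exponential backoff; the engine
# makes a single HF API attempt, so resilience belongs to the caller.
import time

import requests

def run_with_retries(payload: dict, url: str = "http://127.0.0.1:7860/run",
                     attempts: int = 3) -> dict:
    """POST to /run, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json=payload, timeout=90)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise                 # out of attempts: surface the error
            time.sleep(2 ** attempt)  # back off 1s, 2s, ... before retrying
```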