---
title: General AI Engine
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
---

# General AI Engine

## Overview

The General AI Engine is a pure intelligence service designed for open-ended question answering and multi-modal interaction. It uses various Hugging Face models to process text, images, and audio, providing a unified "ask anything" interface.

## What This Engine Does

- **Input:** Text, Image, Audio, or Video
- **Output:** Intelligent natural language responses

## Key Features

- ✅ **Multi-modal Chat:** Unified interface for text, image, and audio interaction.
- ✅ **Dynamic Model Routing:** Automatically selects appropriate models based on input modality.
- ✅ **Conversation History:** Supports multi-turn dialogue when provided in context.
- ✅ **Audio Support:** Transcribes spoken questions automatically.
- ✅ **Vision Support:** Understands and describes image/video content.

## Architecture

This is a standalone intelligence engine: not a chatbot, not a UI, and not an orchestrator. Like the other engine services, it is designed to be called by an AI Mentor.

```
general-ai-engine/
├── app/
│   ├── __init__.py       # Package initialization
│   ├── main.py           # FastAPI app + routing
│   ├── contracts.py      # EngineRequest / EngineResponse
│   ├── config.py         # Environment variables
│   ├── hf_client.py      # Hugging Face API client
│   └── engine.py         # Core intelligence logic
├── requirements.txt      # Python dependencies
└── .env.example          # Environment template
```
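
The request/response contracts in `contracts.py` are not shown here; a minimal sketch using dataclasses, with field names inferred from the API examples below (the real module may well use pydantic models instead):

```python
from dataclasses import dataclass, field, asdict
from typing import Any

# Hypothetical sketch of app/contracts.py. Field names are taken from
# the JSON examples in this README; everything else is an assumption.

@dataclass
class EngineRequest:
    request_id: str
    engine: str
    action: str
    actor: dict = field(default_factory=dict)
    input: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    options: dict = field(default_factory=dict)

@dataclass
class EngineResponse:
    request_id: str
    ok: bool
    status: str
    engine: str
    action: str
    result: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)
    suggested_actions: list = field(default_factory=list)
    citations: list = field(default_factory=list)

# Build a text-only request matching the first API example.
req = EngineRequest(
    request_id="req_123",
    engine="general-ai-engine",
    action="ask_question",
    input={"text": "What is quantum computing?"},
)
print(asdict(req)["input"]["text"])
```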

## Setup

### 1. Install Dependencies

```bash
cd general-ai-engine
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your HF_TOKEN
```

### 3. Start the Engine

```bash
python -m app.main
```

The engine listens on the configured `HOST` and `PORT` (default: http://127.0.0.1:8002; the Docker Space serves on port 7860).

## API

**Single Entrypoint:** `POST /run`

**Text-Only Request:**

```json
{
  "request_id": "req_123",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "actor": {
    "user_id": "user_456",
    "session_id": "session_789"
  },
  "input": {
    "text": "What is quantum computing?"
  },
  "context": {},
  "options": {
    "temperature": 0.7,
    "max_tokens": 2048
  }
}
```

**Response:**

```json
{
  "request_id": "req_123",
  "ok": true,
  "status": "success",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "result": {
    "answer": "Quantum computing is...",
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "question": "What is quantum computing?",
    "modalities": ["text"]
  },
  "messages": ["Generated response using meta-llama/Llama-3.3-70B-Instruct"],
  "suggested_actions": ["ask_followup", "clarify", "explore_topic"],
  "citations": []
}
```
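
The text-only request above can be sent from Python with the standard library alone; a minimal client sketch, assuming a local run on the default port 8002:

```python
import json
import urllib.request

def build_request(question: str, request_id: str = "req_123") -> dict:
    """Build an ask_question payload matching the contract above."""
    return {
        "request_id": request_id,
        "engine": "general-ai-engine",
        "action": "ask_question",
        "actor": {"user_id": "user_456", "session_id": "session_789"},
        "input": {"text": question},
        "context": {},
        "options": {"temperature": 0.7, "max_tokens": 2048},
    }

def ask(question: str, base_url: str = "http://127.0.0.1:8002") -> dict:
    """POST the payload to the engine's /run endpoint and parse the reply."""
    req = urllib.request.Request(
        f"{base_url}/run",
        data=json.dumps(build_request(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

payload = build_request("What is quantum computing?")
print(payload["input"]["text"])
```

`build_request` is side-effect free; calling `ask(...)` requires the engine to actually be running at the given base URL.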

**Image Understanding Request:**

```json
{
  "request_id": "req_124",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "actor": {
    "user_id": "user_456",
    "session_id": "session_789"
  },
  "input": {
    "text": "What's in this image?",
    "items": [
      {
        "type": "image",
        "text": "",
        "ref": "https://example.com/image.jpg"
      }
    ]
  },
  "context": {},
  "options": {}
}
```

**Response:**

```json
{
  "request_id": "req_124",
  "ok": true,
  "status": "success",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "result": {
    "answer": "The image shows...",
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "question": "What's in this image?",
    "modalities": ["image"]
  },
  "messages": ["Generated response using meta-llama/Llama-3.2-11B-Vision-Instruct"],
  "suggested_actions": ["ask_followup", "clarify", "explore_topic"]
}
```
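
Attaching media is just a matter of adding entries to `input.items`; a small helper sketch (the base64 variant is an assumption, since this README only documents base64 delivery for audio):

```python
import base64

def image_item(ref: str) -> dict:
    """Build an input item referencing an image by URL, as in the example above."""
    return {"type": "image", "text": "", "ref": ref}

def image_item_from_bytes(data: bytes, mime: str = "image/jpeg") -> dict:
    """Hypothetical variant embedding the image as a base64 data URI.

    Whether the engine accepts data URIs for images (rather than only
    for audio) is an assumption, not documented behavior.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image", "text": "", "ref": f"data:{mime};base64,{encoded}"}

item = image_item("https://example.com/image.jpg")
print(item["type"])
```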

**Audio Transcription + Question:**

```json
{
  "request_id": "req_125",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "actor": {
    "user_id": "user_456",
    "session_id": "session_789"
  },
  "input": {
    "text": "Summarize what was said",
    "items": [
      {
        "type": "audio",
        "text": "",
        "ref": "https://example.com/audio.mp3"
      }
    ]
  },
  "context": {},
  "options": {}
}
```

**Response:**

```json
{
  "request_id": "req_125",
  "ok": true,
  "status": "success",
  "engine": "general-ai-engine",
  "action": "ask_question",
  "result": {
    "answer": "The audio discusses...",
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "question": "Summarize what was said\n\n[Audio transcription]: Hello, this is a test...",
    "modalities": ["audio"],
    "audio_transcription": "Hello, this is a test..."
  },
  "messages": ["Generated response using meta-llama/Llama-3.3-70B-Instruct"],
  "suggested_actions": ["ask_followup", "clarify", "explore_topic"]
}
```

## Supported Actions

- `ask_question` - Answer a single question
- `chat` - Conversational interaction (supports `context.conversation_history`)
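
Since the engine is stateless, a `chat` caller must carry prior turns in `context.conversation_history` itself. The exact schema of that field is not documented here; the sketch below assumes a role/content message list:

```python
def build_chat_request(message: str, history: list) -> dict:
    """Build a chat request carrying prior turns in the context.

    The shape of conversation_history (role/content dicts) is an
    assumption; adjust to whatever the engine actually expects.
    """
    return {
        "request_id": "req_126",
        "engine": "general-ai-engine",
        "action": "chat",
        "actor": {"user_id": "user_456", "session_id": "session_789"},
        "input": {"text": message},
        "context": {"conversation_history": history},
        "options": {},
    }

history = [
    {"role": "user", "content": "What is quantum computing?"},
    {"role": "assistant", "content": "Quantum computing is..."},
]
req = build_chat_request("How does it differ from classical computing?", history)
print(req["action"])
```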

## Configuration

All configuration is via environment variables:

- `HF_TOKEN` - Hugging Face API token (required; get a free token at hf.co/settings/tokens)
- `HF_TEXT_MODEL` - Text model (default: `google/flan-t5-base`, 250M params, stable on the free tier)
- `HF_VISION_MODEL` - Vision model (default: `nlpconnect/vit-gpt2-image-captioning`)
- `HF_ASR_MODEL` - Audio model (default: `openai/whisper-base`)
- `HOST` - Server host (default: `127.0.0.1`)
- `PORT` - Server port (default: `8002`)

Note that the example responses above report larger Llama models; the model named in a response depends on what these variables are set to.
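
A plausible sketch of how `app/config.py` might read these variables, falling back to the documented defaults (the actual module may differ):

```python
import os

# Hypothetical config.py: each documented variable is read from the
# environment with the defaults listed above.
HF_TOKEN = os.environ.get("HF_TOKEN", "")  # required in practice
HF_TEXT_MODEL = os.environ.get("HF_TEXT_MODEL", "google/flan-t5-base")
HF_VISION_MODEL = os.environ.get(
    "HF_VISION_MODEL", "nlpconnect/vit-gpt2-image-captioning"
)
HF_ASR_MODEL = os.environ.get("HF_ASR_MODEL", "openai/whisper-base")
HOST = os.environ.get("HOST", "127.0.0.1")
PORT = int(os.environ.get("PORT", "8002"))
```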

## Error Handling

All errors return structured responses:

```json
{
  "ok": false,
  "status": "error",
  "error": {
    "code": "ENGINE_ERROR",
    "detail": "Human-readable explanation"
  }
}
```

No stack traces are exposed to clients.
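
Because success and failure share one envelope (`ok` plus either `result` or `error`), a caller can unwrap responses uniformly; a small sketch (`extract_answer` is a hypothetical helper, not part of the engine):

```python
def extract_answer(response: dict) -> str:
    """Return the answer, or raise using the engine's structured error."""
    if not response.get("ok"):
        err = response.get("error", {})
        raise RuntimeError(
            f"{err.get('code', 'UNKNOWN')}: {err.get('detail', '')}"
        )
    return response["result"]["answer"]

ok_resp = {"ok": True, "result": {"answer": "Quantum computing is..."}}
print(extract_answer(ok_resp))
```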

## Testing

Access the Swagger UI at http://localhost:8002/docs (adjust the port if you changed `PORT`).

## Known Limitations

1. **Free Tier Limits** - Uses the HF Serverless Inference API, which is rate-limited (roughly 1,000 requests/day for free accounts)
2. **Stateless** - No conversation memory; context must be provided in each request
3. **Model per modality** - Uses different models for text/vision/audio (not a unified multimodal model)
4. **No streaming** - Returns complete responses only
5. **Cold starts** - The first request to a model may take 10-30 seconds while the model loads
6. **Timeout** - 60-second timeout on HF API calls
7. **Audio format** - Audio must be accessible via URL or base64-encoded
8. **Video processing** - Videos are treated as images (single-frame analysis, not full video understanding)
9. **No retry logic** - Single API call attempt; failures return immediately
10. **No caching** - Every request hits the HF API (no response caching)
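
Given limitations 9 and 10, retries (and any caching) are the caller's responsibility; a minimal client-side retry sketch with exponential backoff:

```python
import time

def with_retries(call, attempts: int = 3, backoff: float = 0.5):
    """Retry a flaky engine call, backing off between attempts.

    The engine itself makes a single attempt per request, so transient
    failures (cold starts, rate limits) are best absorbed client-side.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * (2 ** attempt))

# Demo with a stand-in for an engine call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_retries(flaky, backoff=0.1))  # prints "ok" after two failures
```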