---
title: Avatar-based AI Tutor Engine
emoji: 🎭
colorFrom: purple
colorTo: red
sdk: docker
app_port: 7860
---

# Avatar-based AI Tutor Engine

## Overview

The Avatar-based AI Tutor Engine is a standalone intelligence service that creates personalized AI tutor clones from user-provided images and voice samples. The engine generates dynamic teaching videos with realistic facial movements, lip-sync, and natural expressions.

## What This Engine Does

**Input:** Image + Voice Sample + Course Content
**Output:** Animated avatar video teaching the course content

## Key Features

- ✅ **Dynamic Avatar Generation:** Creates talking-head videos with facial animations
- ✅ **Lip-Sync Accuracy:** Precise synchronization between audio and mouth movements
- ✅ **Natural Expressions:** Includes blinking, head movements, and micro-expressions
- ✅ **Personalized Teaching:** Generates engaging teaching scripts from course content
- ✅ **Voice Matching:** Analyzes and replicates voice characteristics

## Architecture

```
avatar-tutor-engine/
├── app/
│   ├── __init__.py       # Package initialization
│   ├── main.py           # FastAPI app + routing
│   ├── contracts.py      # EngineRequest / EngineResponse
│   ├── config.py         # Environment variables
│   ├── hf_client.py      # Hugging Face API client
│   └── engine.py         # Core intelligence logic
├── requirements.txt      # Python dependencies
└── .env.example          # Environment template
```

## Setup

### 1. Install Dependencies

```bash
cd avatar-tutor-engine
pip install -r requirements.txt
```

### 2. Configure Environment

Copy `.env.example` to `.env` and add your Hugging Face token:

```bash
cp .env.example .env
```

Edit `.env`:

```
HF_TOKEN=your_actual_token_here
```
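As a rough sketch of how `config.py` might pick this up (the variable name follows `.env.example`; the loader itself is illustrative, not the engine's actual code):

```python
import os

def load_settings() -> dict:
    """Read engine settings from the environment; fail fast if the token is missing."""
    token = os.getenv("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; copy .env.example to .env first")
    return {"hf_token": token}
```

Failing fast at startup surfaces a missing token immediately, instead of as an opaque 401 from the Hugging Face API later.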

### 3. Start the Engine

```bash
python -m app.main
```

The engine starts on http://127.0.0.1:7860

## API Usage

### Endpoint

`POST /run`

### Request Format

```json
{
  "request_id": "unique-id",
  "engine": "avatar-tutor-engine",
  "action": "create_avatar_tutor",
  "actor": {
    "user_id": "user123",
    "session_id": "session456"
  },
  "input": {
    "text": "Course content to teach...",
    "items": [
      {
        "type": "image",
        "ref": "https://example.com/user-photo.jpg"
      },
      {
        "type": "audio",
        "ref": "https://example.com/voice-sample.mp3"
      }
    ]
  },
  "context": {},
  "options": {
    "lesson_duration": "5 minutes",
    "temperature": 0.7
  }
}
```
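A minimal client-side sketch that assembles a payload in this shape (`build_avatar_request` is a hypothetical helper, not part of the engine; sending it requires the third-party `requests` package, shown commented out):

```python
import uuid

def build_avatar_request(course_text: str, image_url: str, audio_url: str,
                         user_id: str, session_id: str) -> dict:
    """Assemble an EngineRequest payload matching the documented format."""
    return {
        "request_id": str(uuid.uuid4()),
        "engine": "avatar-tutor-engine",
        "action": "create_avatar_tutor",
        "actor": {"user_id": user_id, "session_id": session_id},
        "input": {
            "text": course_text,
            "items": [
                {"type": "image", "ref": image_url},
                {"type": "audio", "ref": audio_url},
            ],
        },
        "context": {},
        "options": {"lesson_duration": "5 minutes", "temperature": 0.7},
    }

# To send it against a locally running engine:
# import requests
# resp = requests.post("http://127.0.0.1:7860/run",
#                      json=build_avatar_request("Intro to Python",
#                                                "https://example.com/user-photo.jpg",
#                                                "https://example.com/voice-sample.mp3",
#                                                "user123", "session456"))
```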

### Response Format

```json
{
  "request_id": "unique-id",
  "ok": true,
  "status": "success",
  "engine": "avatar-tutor-engine",
  "action": "create_avatar_tutor",
  "result": {
    "avatar_video_url": "https://...",
    "teaching_script": "Generated teaching content...",
    "voice_sample_transcription": "Transcribed voice...",
    "lesson_duration": "5 minutes",
    "avatar_features": {
      "facial_animations": true,
      "lip_sync": true,
      "natural_expressions": true,
      "head_movements": true
    }
  },
  "messages": [
    "Avatar tutor created successfully",
    "Generated 5 minutes teaching session"
  ],
  "suggested_actions": [
    "download_video",
    "generate_another_lesson"
  ]
}
```
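On the consuming side, a caller should branch on the `ok` flag before reaching into `result`. A small sketch (the helper name is ours, not part of the engine):

```python
from typing import Optional

def extract_video_url(response: dict) -> Optional[str]:
    """Return the avatar video URL from a successful EngineResponse,
    or None when the request failed or the field is absent."""
    if not response.get("ok"):
        return None
    return response.get("result", {}).get("avatar_video_url")
```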

## Supported Action

- `create_avatar_tutor`: Creates an animated avatar teaching video

## Models Used

| Purpose | Model | Description |
|---|---|---|
| Teaching script | `meta-llama/Meta-Llama-3-8B-Instruct` | Generates engaging teaching content |
| Voice transcription | `openai/whisper-base` | Analyzes voice characteristics |
| Text-to-speech | `facebook/fastspeech2-en-ljspeech` | Synthesizes teaching audio |
| Avatar animation | `vinthony/SadTalker` | Creates talking head with lip-sync |
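One way to capture this table in code is a simple registry, e.g. in `config.py` (a sketch of one possible layout; the stage names are ours):

```python
# Pipeline stage -> Hugging Face model ID, mirroring the table above.
MODELS = {
    "teaching_script": "meta-llama/Meta-Llama-3-8B-Instruct",
    "voice_transcription": "openai/whisper-base",
    "text_to_speech": "facebook/fastspeech2-en-ljspeech",
    "avatar_animation": "vinthony/SadTalker",
}

def model_for(stage: str) -> str:
    """Look up the model ID for a pipeline stage, failing loudly on typos."""
    try:
        return MODELS[stage]
    except KeyError:
        raise ValueError(f"Unknown pipeline stage: {stage!r}")
```

Keeping the IDs in one place makes it easy to swap a model (say, a larger Whisper checkpoint) without touching the engine logic.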

## Options

- `lesson_duration`: Target length (e.g., "5 minutes", "10 minutes")
- `temperature`: LLM creativity (0.0-1.0, default: 0.7)
- `max_tokens`: Maximum script length (default: 2048)

## Error Handling

All errors return structured responses:

```json
{
  "ok": false,
  "status": "error",
  "error": {
    "code": "ERROR_CODE",
    "detail": "Human-readable explanation"
  }
}
```

### Common Error Codes

- `INVALID_ACTION`: Unsupported action requested
- `MISSING_IMAGE`: No image provided
- `MISSING_AUDIO`: No voice sample provided
- `MISSING_CONTENT`: No course content provided
- `ENGINE_ERROR`: Processing failure
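The input-related codes map directly onto checks a validator could run before any model call. A sketch of that mapping (illustrative only, not the engine's actual `contracts.py` logic):

```python
def validate_request(payload: dict):
    """Return the first applicable error code for a payload, or None if valid."""
    if payload.get("action") != "create_avatar_tutor":
        return "INVALID_ACTION"
    items = payload.get("input", {}).get("items", [])
    types = {item.get("type") for item in items}
    if "image" not in types:
        return "MISSING_IMAGE"
    if "audio" not in types:
        return "MISSING_AUDIO"
    if not payload.get("input", {}).get("text"):
        return "MISSING_CONTENT"
    return None  # ENGINE_ERROR covers later processing failures, not validation
```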

## Limitations

1. **Video Generation:** The current implementation uses a placeholder for talking-head generation. Full integration requires HF Inference API support for video generation models.
2. **Voice Cloning:** TTS uses a default voice. Advanced voice cloning requires additional models not yet available via the free-tier HF Inference API.
3. **Processing Time:** Video generation can take 30-120 seconds, depending on lesson length and model availability.
4. **Free Tier Limits:** The Hugging Face free tier has rate limits. For production use, consider upgrading to the Pro tier.
5. **Model Availability:** Some models (especially talking-head generation) may require direct API access or specialized endpoints not available in the standard HF Inference API.

## Testing

Access the Swagger UI at http://127.0.0.1:7860/docs

See `SWAGGER_TESTS.md` for example payloads.

## Health Check

```bash
curl http://127.0.0.1:7860/health
```

## Design Principles

- **Stateless:** No session storage; each request is independent
- **Standalone:** No dependencies on other engines
- **Fail Gracefully:** Structured errors, no crashes
- **Single Responsibility:** Only creates avatar tutors, nothing else

## Future Enhancements

- Real-time voice cloning
- Multiple avatar styles
- Gesture and body language
- Multi-language support
- Custom teaching styles
- Interactive avatar responses