---
title: Language_Tutor
app_file: language_tutor.py
sdk: gradio
sdk_version: 5.49.1
---
# Language Tutor - AI-Powered Multilingual Learning
An AI-powered conversational chatbot designed to help users practice speaking foreign languages in an interactive, adaptive environment. Powered by the Apertus-70B-Instruct-2509 model, which is trained on 1000+ languages, this application provides immersive language learning with voice support.
## What's New

- **Auto-Transcription**: Voice recordings transcribe automatically when you stop - no extra button needed!
- **Translation Control**: Toggle native-language hints on/off for immersive or assisted learning
- **German Default**: The application now defaults to German language learning
- **Streamlined UI**: Organized settings (Language → Voice → Parameters) for better user flow
## Features

- **30+ Supported Languages** - Practice any of 30 major world languages
- **Personalized Learning** - Select your native language and target language
- **AI Tutor** - Powered by Apertus-70B, trained on 1000+ languages
- **Auto-Transcription** - Voice input transcribes automatically when you stop recording
- **Voice Output (TTS)** - Hear responses with native pronunciation
- **Multilingual Voices** - Automatic voice matching for the target language
- **Smart Prompts** - Auto-generated teaching prompts based on the language pair
- **Adaptive Teaching** - Provides translations, corrections, and cultural context
- **Translation Control** - Toggle native-language hints on/off for immersive learning
## Table of Contents
- Supported Languages
- Architecture
- Requirements
- Installation
- Configuration
- Usage
- Code Structure
- API Providers
- Troubleshooting
- Contributing
## Supported Languages

The application supports 30 languages with native TTS voices:
| Language | Code | TTS Voices Available |
|---|---|---|
| English | en | 6 voices (US, GB, AU) |
| Spanish | es | 5 voices (ES, MX, AR) |
| French | fr | 5 voices (FR, CA, BE) |
| German | de | 4 voices (DE, AT, CH) |
| Italian | it | 3 voices |
| Portuguese | pt | 4 voices (BR, PT) |
| Dutch | nl | 3 voices (NL, BE) |
| Russian | ru | 2 voices |
| Chinese (Mandarin) | zh | 4 voices (CN, TW, HK) |
| Japanese | ja | 2 voices |
| Korean | ko | 2 voices |
| Arabic | ar | 3 voices (SA, EG) |
| Hindi | hi | 2 voices |
| Turkish | tr | 2 voices |
| Polish | pl | 2 voices |
| Swedish | sv | 2 voices |
| Danish | da | 2 voices |
| Norwegian | no | 2 voices |
| Finnish | fi | 2 voices |
| Greek | el | 2 voices |
| Czech | cs | 2 voices |
| Romanian | ro | 2 voices |
| Hungarian | hu | 2 voices |
| Thai | th | 2 voices |
| Vietnamese | vi | 2 voices |
| Indonesian | id | 2 voices |
| Malay | ms | 2 voices |
| Filipino | fil | 2 voices |
| Hebrew | he | 2 voices |
| Ukrainian | uk | 2 voices |
## Architecture

The application consists of three main components:

```text
+---------------------------------------------------+
|           Gradio Web Interface (UI)               |
|   - Language Selection                            |
|   - Chat Interface                                |
|   - Voice Controls                                |
+------------------------+--------------------------+
                         |
+------------------------v--------------------------+
|     Language Tutor Core (language_tutor.py)       |
|   - Message Formatting                            |
|   - System Prompt Generation                      |
|   - Hugging Face API Integration                  |
+------------------------+--------------------------+
                         |
+------------------------v--------------------------+
|        Voice Handler (voice_handler.py)           |
|   - STT Providers (Whisper, Local)                |
|   - TTS Providers (Edge-TTS, OpenAI, gTTS)        |
|   - Language/Voice Mapping                        |
+---------------------------------------------------+
```
## Requirements

### System Requirements

- **UV** - Fast Python package manager (installation instructions below)
- **Python 3.8 or higher** (UV will manage this)
- **Internet connection** (for the Hugging Face API and cloud TTS/STT)
- **Microphone** (optional, for voice input)
- **Speakers/Headphones** (optional, for voice output)

### Python Dependencies

```text
gradio>=4.0.0
huggingface_hub>=0.19.0
python-dotenv>=1.0.0
openai>=1.0.0
edge-tts>=6.1.0
openai-whisper>=20231117
gtts>=2.5.0
```
## Installation

### Prerequisites

This project uses UV - an extremely fast Python package manager and project manager written in Rust.

**Why UV?**

- 10-100x faster than pip
- Automatic virtual environment management
- Reproducible builds with `uv.lock`
- Compatible with pip and PyPI

**Install UV:**

```bash
# macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip (if you already have Python):
pip install uv

# Verify installation:
uv --version
```

### Step 1: Clone or Download the Repository

```bash
cd /path/to/language_tutor
```

### Step 2: Set Up Environment with UV

UV automatically creates a virtual environment and installs all dependencies:

```bash
# Install dependencies from requirements.txt
uv pip install -r requirements.txt

# Or if using pyproject.toml (recommended):
uv sync
```

### Step 3: Set Up Environment Variables

Create a `.env` file in the project root:

```bash
# Hugging Face API Token (required)
HF_TOKEN=your_huggingface_token_here

# OpenAI API Key (optional, only if using OpenAI providers)
OPENAI_API_KEY=your_openai_api_key_here
```

**Getting a Hugging Face Token:**

1. Visit https://huggingface.co/settings/tokens
2. Create a new token with "Read" access
3. Copy the token into your `.env` file

Or use the CLI:

```bash
huggingface-cli login
```
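Once the token is in place, the app can read it from the environment. A minimal sketch of how that might look (python-dotenv is in the dependency list; `require_token` is an illustrative helper, not part of the app's code):

```python
import os

try:
    # python-dotenv loads KEY=VALUE pairs from .env into os.environ
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # fall back to whatever is already exported in the shell

def require_token(name: str) -> str:
    """Read an API token from the environment, failing fast with a clear hint."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it.")
    return value
```

`HF_TOKEN` is required; `OPENAI_API_KEY` only matters if you select an OpenAI provider.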
## Configuration

### Voice Provider Options

#### Speech-to-Text (STT) Providers

**OpenAI Whisper API** (Recommended)
- Cost: $0.006 per minute
- Quality: High
- Requires: OpenAI API key

**Local Whisper (Tiny)**
- Cost: Free
- Quality: Good for simple conversations
- Requires: Local processing

**Local Whisper (Base)**
- Cost: Free
- Quality: Better, but slower
- Requires: Local processing

#### Text-to-Speech (TTS) Providers

**Edge-TTS** (Recommended)
- Cost: Free
- Quality: High
- Languages: 30+ with native voices
- Requires: Internet connection

**OpenAI TTS**
- Cost: $0.015 per 1K characters
- Quality: Very high
- Languages: Limited (English-focused)
- Requires: OpenAI API key

**gTTS (Google TTS)**
- Cost: Free
- Quality: Basic
- Languages: 30+ supported
- Requires: Internet connection
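Under the hood, matching a voice to the target language amounts to a lookup. A sketch under stated assumptions (the helper and table are illustrative, not the app's actual code; the voice identifiers follow Edge-TTS's locale-based naming, and the full catalogue can be listed with `edge-tts --list-voices`):

```python
# Illustrative subset of a language-to-voice table for Edge-TTS.
DEFAULT_EDGE_VOICES = {
    "de": "de-DE-KatjaNeural",
    "en": "en-US-AriaNeural",
    "es": "es-ES-ElviraNeural",
    "fr": "fr-FR-DeniseNeural",
}

def pick_edge_voice(language_code: str, fallback: str = "en-US-AriaNeural") -> str:
    """Return a sensible default voice for the target language."""
    return DEFAULT_EDGE_VOICES.get(language_code, fallback)
```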
### Translation Control Feature

The "Enable Native Language Hints" checkbox controls how the AI tutor communicates with you:

**Enabled (Default - Recommended for Beginners)**
- AI provides translations in parentheses: "Guten Morgen! (Good morning!)"
- Offers explanations when you seem confused
- Helps you understand new vocabulary and grammar
- Builds confidence while learning

**Disabled (Full Immersion Mode)**
- AI responds entirely in the target language
- No automatic translations or hints
- Forces you to think in the target language
- Only translates when you explicitly ask
- Ideal for intermediate/advanced learners
### Customizing System Prompts
The system prompt is automatically generated based on:
- Your native language selection
- Your target language selection (defaults to German)
- Translation hints preference (enabled/disabled)
The prompt updates automatically when you change any of these settings. Advanced users can manually edit the system prompt (currently hidden in UI, but functional in backend).
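A simplified sketch of how such a prompt builder could look (the wording below is illustrative, not the app's actual prompt text):

```python
def create_language_tutor_prompt(native_language: str,
                                 target_language: str,
                                 enable_translations: bool) -> str:
    """Build a tutor system prompt (simplified illustration)."""
    prompt = (
        f"You are a friendly {target_language} tutor. "
        f"Respond primarily in {target_language}, correct the learner's "
        f"mistakes gently, and add brief cultural context where helpful. "
    )
    if enable_translations:
        # Hints enabled: translate each sentence for beginner support
        prompt += (
            f"After each {target_language} sentence, add a short "
            f"{native_language} translation in parentheses."
        )
    else:
        # Full immersion: only translate on request
        prompt += (
            f"Do not translate into {native_language} unless the learner "
            f"explicitly asks for a translation."
        )
    return prompt
```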
## Usage

### Starting the Application

```bash
# Using UV (recommended):
uv run python language_tutor.py

# Or if you've activated the virtual environment:
python language_tutor.py
```

The application launches a web interface at http://127.0.0.1:7860.
### Basic Workflow

1. **Configure Language Settings**
   - Choose your native language (for explanations)
   - Choose the language you want to practice (defaults to German)
   - Toggle "Enable Native Language Hints" for translations in parentheses
   - The system prompt auto-generates based on your selections
2. **Configure Voice Settings (Optional)**
   - Enable Voice Input (STT) - recordings auto-transcribe when you stop
   - Enable Voice Output (TTS) to hear responses
   - Select your preferred voice provider and voice style
   - The voice automatically matches your target language
3. **Start Learning**
   - Type your message or click the microphone to record
   - Voice recordings auto-transcribe when you stop (no extra button needed)
   - The AI tutor responds primarily in the target language
   - Receive corrections, explanations, and cultural context
4. **Customize Your Experience**
   - Toggle native-language hints for full immersion or beginner support
   - Adjust generation parameters (temperature, max tokens)
   - Change voice providers/styles for different accents
### Example Conversations

**Example 1: English → German (With Native Language Hints)**

**You:** "Hello! I want to practice ordering food at a restaurant."

**AI Tutor:** "Hallo! Perfekt, lass uns das üben. (Hello! Perfect, let's practice.)
Stell dir vor, du bist in einem deutschen Restaurant. Ich bin der Kellner. (Imagine you're in a German restaurant. I'm the waiter.)
Was möchtest du bestellen? (What would you like to order?)"

**You:** "Ich möchte eine Pizza, bitte."

**AI Tutor:** "Sehr gut! Dein Satz ist korrekt. (Very good! Your sentence is correct.)
In Deutschland würde man auch sagen: 'Ich hätte gerne eine Pizza.' (I would like a pizza - more polite.)
Und zum Trinken? Was möchtest du trinken? (And to drink? What would you like to drink?)"
**Example 2: English → German (Full Immersion Mode - Hints Disabled)**

**You:** "Hallo! Wie geht es dir?"

**AI Tutor:** "Hallo! Mir geht es sehr gut, danke! Und dir?
Möchtest du heute über ein bestimmtes Thema sprechen? Vielleicht Reisen, Essen oder Arbeit?"

**You:** "Can you translate that?"

**AI Tutor:** "Of course! I said: Hello! I'm doing very well, thanks! And you? Would you like to talk about a specific topic today? Maybe travel, food, or work?"
## Code Structure

### Main Files

#### language_tutor.py

Main application file containing the Gradio interface and core logic.

**Key Functions:**
- `create_language_tutor_prompt(native_language, target_language, enable_translations)` - Generates intelligent system prompts with optional translation hints
- `format_messages(message, chat_history, system_prompt)` - Formats the conversation for the LLM
- `transcribe_audio(audio_path, stt_provider_name)` - Converts speech to text (auto-triggered on recording stop)
- `synthesize_speech(text, tts_provider_name, tts_voice, target_language)` - Converts text to speech with language awareness
- `chat(...)` - Main chat function that handles LLM inference
- `update_voice_dropdown(tts_provider_name, target_language)` - Dynamically updates voice options
- `process_voice_input(audio, stt_provider_name)` - Processes and transcribes voice input automatically
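As a rough sketch of what `format_messages` likely does (the actual implementation may differ; this illustrates the standard chat-completion message shape built from Gradio-style history):

```python
def format_messages(message, chat_history, system_prompt):
    """Flatten (user, assistant) history pairs into chat-completion messages.

    Simplified illustration: system prompt first, then alternating turns,
    ending with the new user message.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in chat_history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": message})
    return messages
```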
**UI Organization:**
- **Language Settings** - Native language, target language (default: German), translation hints toggle
- **Voice Settings** - STT/TTS providers with auto-transcription and language-matched voices
- **Generation Parameters** - Advanced AI tuning options (temperature, tokens, top-p)
#### voice_handler.py

Voice processing module with the STT/TTS provider implementations.

**Key Classes:**
- `VoiceConfig` - Configuration for languages, voices, and providers
- `STTProvider` (abstract) - Base class for speech-to-text providers
- `TTSProvider` (abstract) - Base class for text-to-speech providers
- `OpenAIWhisperSTT` - OpenAI Whisper API implementation
- `LocalWhisperSTT` - Local Whisper model implementation
- `EdgeTTSProvider` - Microsoft Edge TTS (free, multilingual)
- `OpenAITTSProvider` - OpenAI TTS implementation
- `GTTSProvider` - Google TTS implementation

**Key Functions:**
- `get_available_languages()` - Returns the list of 30 supported languages
- `get_language_code(language_name)` - Converts a language name to its ISO code
- `get_voices_for_provider(provider_name, language)` - Gets the available voices for a language
- `get_default_voice_for_language(language_name, provider_name)` - Gets the default voice for a language
- `create_stt_provider(provider_name)` - Factory for STT providers
- `create_tts_provider(provider_name, voice, language)` - Factory for TTS providers
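For illustration, `get_language_code` presumably boils down to a table lookup like the sketch below (the table shown is a hypothetical subset; the app's actual table covers all 30 languages from the table above):

```python
# Illustrative subset of the display-name to ISO-code table.
LANGUAGE_CODES = {
    "English": "en",
    "German": "de",
    "Chinese (Mandarin)": "zh",
    "Filipino": "fil",
}

def get_language_code(language_name: str) -> str:
    """Convert a language display name to its ISO code, defaulting to English."""
    return LANGUAGE_CODES.get(language_name, "en")
```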
### Configuration Files

- `requirements.txt` - Python dependencies for pip/uv compatibility
- `pyproject.toml` - Project metadata and dependencies (UV format)
- `uv.lock` - Dependency lock file (managed by UV, ensures reproducible builds)
- `.env` - Environment variables (API keys, not committed to git)
## API Providers

### Hugging Face Inference API

The application uses the Hugging Face Inference API to access the Apertus-70B-Instruct-2509 model.

**Model Info:**
- Name: swiss-ai/Apertus-70B-Instruct-2509
- Size: 70B parameters
- Training: 1000+ languages
- Context: Supports long conversations
- Hosting: Hugging Face Inference Endpoints

**Rate Limits:**
- Free tier: Limited requests per hour
- Pro tier: Higher rate limits available
### Voice Provider APIs
- Edge-TTS: Free, no API key required
- OpenAI: Requires API key, pay-per-use
- gTTS: Free, no API key required
- Whisper API: Requires OpenAI API key
- Local Whisper: No API key, runs locally
## Troubleshooting

### Common Issues

**Issue: "UV not found" or "Command not found: uv"**

```bash
# Install UV first:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or via pip:
pip install uv

# Verify installation:
uv --version
```

**Issue: "Hugging Face token not found"**

```bash
# Solution 1: Set in .env file
echo "HF_TOKEN=your_token_here" > .env

# Solution 2: Use CLI login
huggingface-cli login
```

**Issue: "OpenAI API key required"**

```bash
# Only needed if using OpenAI providers
echo "OPENAI_API_KEY=your_key_here" >> .env
```

**Issue: "Model loading failed" or "Rate limit exceeded"**
- Wait a few minutes and try again
- Check that your Hugging Face token is valid
- Consider a Pro account for higher limits

**Issue: "Voice not working"**
- Check your internet connection (Edge-TTS and gTTS require internet)
- Verify microphone permissions for voice input
- Try switching to a different TTS provider

**Issue: "Local Whisper is slow"**
- Local Whisper models run on your CPU/GPU
- Use the "Tiny" model for faster performance
- Consider the OpenAI Whisper API for cloud-based processing
### Debug Mode

Enable debug output:

```python
# Add to the top of language_tutor.py
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Best Practices

### For Best Learning Results
- **Start Simple**: Begin with basic conversations before advanced topics
- **Use Translation Hints Strategically**:
  - Beginners: Keep hints enabled to build vocabulary confidence
  - Intermediate: Toggle on/off based on topic difficulty
  - Advanced: Disable for full immersion practice
- **Be Consistent**: Practice regularly, even if just 10-15 minutes daily
- **Use Voice Features**: Enable both STT and TTS to practice speaking and listening
  - Record yourself speaking in the target language
  - Auto-transcription helps you see what you said
  - Listen to native pronunciation with TTS
- **Ask Questions**: Request grammar explanations when confused
- **Progressive Learning**: Start with hints, then gradually disable them as you improve
### For Best Performance

- **Use Edge-TTS**: Free and high-quality for most languages, with regional variants
- **Auto-Transcription**: Recordings transcribe automatically - no extra button needed
- **Voice Settings**: STT and TTS automatically match your target language
- **Adjust Token Limits**: Lower max_tokens (256-512) for faster, more concise responses
- **Monitor Costs**: Track API usage if using paid providers (OpenAI Whisper/TTS)
## Cost Estimates

### Free Configuration (Recommended)

- STT: Local Whisper (Tiny or Base)
- TTS: Edge-TTS or gTTS
- LLM: Hugging Face (free tier)
- Total: $0/month
### Paid Configuration (Premium Quality)

- STT: OpenAI Whisper API (~$0.006/minute)
- TTS: OpenAI TTS (~$0.015/1K chars)
- LLM: Hugging Face (free or Pro)
- Example: 10 hours/month ≈ $3.60 STT + ~$5 TTS ≈ $8.60/month
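The example figure can be checked with a few lines of arithmetic (the ~330K monthly TTS character count is an assumption chosen to match the ~$5 TTS estimate above):

```python
# Reproducing the paid-configuration estimate at the listed rates.
stt_minutes = 10 * 60                 # 10 hours of speech per month
stt_cost = stt_minutes * 0.006        # OpenAI Whisper API: $0.006/minute -> $3.60
tts_chars = 330_000                   # assumed monthly TTS volume
tts_cost = tts_chars / 1_000 * 0.015  # OpenAI TTS: $0.015 per 1K chars -> $4.95
total = stt_cost + tts_cost           # ~ $8.55/month, i.e. roughly $8.60
```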
## Contributing
Contributions are welcome! Areas for improvement:
- Language Support: Add more languages and regional voice variants
- Learning Modes: Add grammar-focused, conversation-focused, or exam-prep modes
- Progress Tracking: Implement analytics and learning progress dashboards
- Conversation History: Export conversations for review and study
- Specialized Prompts: Create templates for business, travel, academic contexts
- Interactive Exercises: Add grammar quizzes, vocabulary drills, pronunciation practice
- Difficulty Levels: Auto-adjust based on user performance (A1-C2 CEFR levels)
- Translation Memory: Remember commonly confused words and revisit them
- Voice Recording Playback: Let users hear their own recordings for self-assessment
- Multi-modal Learning: Add image descriptions, video transcriptions, etc.
## License

This project is licensed under the MIT License.
## Acknowledgments
- Apertus-70B by Swiss AI Lab - Multilingual LLM
- UV by Astral - Fast Python package manager
- Gradio - Web interface framework
- Hugging Face - Model hosting and inference
- Edge-TTS - Free multilingual text-to-speech
- OpenAI Whisper - Speech recognition
## Support
For issues, questions, or feature requests, please open an issue on the repository.
**Happy Learning!**

Made with ❤️ for language learners worldwide.