
🚀 Getting Started - Voice-to-Voice Translator

Welcome!

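You now have the infrastructure for a voice-to-voice translator backend: the WebSocket server, room management, and message routing are working, while the speech and translation engines are still to be implemented. This guide will help you get started quickly.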

📋 What You Have

✅ Complete & Ready to Use

  • FastAPI WebSocket server
  • Room management system
  • User connection handling
  • Message routing & protocol
  • Heartbeat monitoring
  • Configuration management
  • Structured logging
  • Error handling
  • Docker deployment files
  • Complete documentation

📝 Needs Implementation

  • STT (Speech-to-Text) engine
  • Translation engine
  • TTS (Text-to-Speech) engine
  • Audio processing pipeline
  • Security layer (optional)
  • Worker pools (optional)

⚑ Quick Start (5 Minutes)

Step 1: Setup Environment

# Navigate to project
cd voice-to-voice-translator

# Create virtual environment
python -m venv venv

# Activate it
# On Windows:
venv\Scripts\activate
# On Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Step 2: Start the Server

# Run the server
python app/main.py

You should see:

INFO:     Started server process
INFO:     Uvicorn running on http://0.0.0.0:8000
INFO:     Application startup complete.

Step 3: Test It!

Open another terminal and run:

# Test health endpoint
curl http://localhost:8000/health

Expected response:

{
  "status": "healthy",
  "connections": 0,
  "rooms": 0,
  "total_users": 0
}
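
For scripted checks (for example in CI), the same request can be made from Python's standard library. This is a minimal sketch rather than part of the project: the function names `parse_health`, `is_healthy`, and `fetch_health` are illustrative, and the field names are taken from the expected response above.

```python
import json
import urllib.request

# Field names come from the expected /health response shown in this guide.
EXPECTED_FIELDS = ("status", "connections", "rooms", "total_users")


def parse_health(body: str) -> dict:
    """Parse a /health response body and check the documented fields exist."""
    data = json.loads(body)
    missing = [name for name in EXPECTED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"health response is missing fields: {missing}")
    return data


def is_healthy(data: dict) -> bool:
    """Return True when the server reports the "healthy" status."""
    return data["status"] == "healthy"


def fetch_health(url: str = "http://localhost:8000/health",
                 timeout: float = 5.0) -> dict:
    """Fetch and parse the health endpoint (the server must be running)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_health(response.read().decode("utf-8"))


# Offline demonstration using the sample response from this guide.
sample = '{"status": "healthy", "connections": 0, "rooms": 0, "total_users": 0}'
print(is_healthy(parse_health(sample)))
```

With the server running, `is_healthy(fetch_health())` performs the live check.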

🧪 Test WebSocket Connection

Using Python:

import asyncio
import websockets
import json

async def test():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as ws:
        # Join a room
        await ws.send(json.dumps({
            "type": "join_room",
            "payload": {
                "room_id": "test_room",
                "user_id": "user123",
                "username": "Test User",
                "source_lang": "en",
                "target_lang": "hi"
            }
        }))
        
        # Get response
        response = await ws.recv()
        print(json.loads(response))

asyncio.run(test())

Save as test_client.py and run:

python test_client.py

Using Browser Console:

const ws = new WebSocket('ws://localhost:8000/ws');

ws.onopen = () => {
    console.log('Connected!');
    ws.send(JSON.stringify({
        type: 'join_room',
        payload: {
            room_id: 'browser_test',
            user_id: 'user456',
            username: 'Browser User',
            source_lang: 'en',
            target_lang: 'hi'
        }
    }));
};

ws.onmessage = (event) => {
    console.log('Received:', JSON.parse(event.data));
};
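
Both clients above send the same `join_room` envelope. When writing more test clients, it can help to build that message in one place. The helper below is a sketch: `build_join_room` is a hypothetical name, and treating all five payload keys as required non-empty strings is an assumption based on the examples, not a rule confirmed by the protocol document.

```python
import json


def build_join_room(room_id: str, user_id: str, username: str,
                    source_lang: str, target_lang: str) -> str:
    """Return a JSON-encoded join_room message in the shape used above."""
    payload = {
        "room_id": room_id,
        "user_id": user_id,
        "username": username,
        "source_lang": source_lang,
        "target_lang": target_lang,
    }
    # Assumption: every field must be a non-empty string.
    invalid = [key for key, value in payload.items()
               if not isinstance(value, str) or not value.strip()]
    if invalid:
        raise ValueError(f"invalid join_room fields: {invalid}")
    return json.dumps({"type": "join_room", "payload": payload})


message = build_join_room("test_room", "user123", "Test User", "en", "hi")
print(message)
```

The resulting string can be passed directly to `ws.send(...)` in the Python client.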

📚 Project Structure

voice-to-voice-translator/
├── app/                    # Application source code
│   ├── main.py            # 👈 START HERE - Application entry point
│   ├── config/            # Settings and logging
│   ├── server/            # WebSocket server & connections
│   ├── rooms/             # Room management
│   ├── messaging/         # Message protocol & routing
│   └── utils/             # Utilities
│
├── docs/                  # 📖 Complete documentation
│   ├── architecture.md    # System design
│   ├── websocket-protocol.md  # Protocol spec
│   ├── latency-strategy.md    # Performance guide
│   └── deployment.md      # Deployment guide
│
├── scripts/               # Utility scripts
│   ├── download_models.py # Model downloader
│   ├── setup_env.sh       # Environment setup
│   └── health_check.py    # Health checker
│
├── docker/                # Docker files
│   ├── Dockerfile
│   └── docker-compose.yml
│
├── tests/                 # Test suite
├── models/                # ML models storage
├── .env                   # Configuration
├── requirements.txt       # Dependencies
├── README.md              # Main documentation
├── PROJECT_STATUS.md      # 📋 Implementation guide
└── IMPLEMENTATION_SUMMARY.md  # 📊 Project summary

🎯 What Works Right Now

✅ Server starts and accepts WebSocket connections
✅ Users can join and leave rooms
✅ Messages are routed correctly
✅ Heartbeat monitoring keeps connections alive
✅ Health check endpoint reports status
✅ Multiple users can be in same room
✅ Room membership is tracked
✅ User disconnections are handled gracefully
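
To make the room behaviour listed above concrete, here is a small in-memory sketch of membership tracking. It is not the project's `app/rooms` code: `RoomRegistry`, `RoomFullError`, and their methods are hypothetical, the default capacity of 10 mirrors `MAX_USERS_PER_ROOM` from the sample `.env`, and `stats()` only imitates the `rooms` and `total_users` counters of the health response.

```python
from collections import defaultdict
from typing import Dict, Set


class RoomFullError(Exception):
    """Raised when a room has reached its user limit."""


class RoomRegistry:
    """Hypothetical in-memory tracker of which users are in which room."""

    def __init__(self, max_users_per_room: int = 10) -> None:
        self._max = max_users_per_room
        self._rooms: Dict[str, Set[str]] = defaultdict(set)

    def join(self, room_id: str, user_id: str) -> None:
        members = self._rooms[room_id]
        # Re-joining an existing member is allowed even when the room is full.
        if user_id not in members and len(members) >= self._max:
            raise RoomFullError(f"room {room_id!r} is full")
        members.add(user_id)

    def leave(self, room_id: str, user_id: str) -> None:
        members = self._rooms.get(room_id)
        if members is None:
            return
        members.discard(user_id)
        if not members:
            # Drop empty rooms so they do not count in the stats.
            del self._rooms[room_id]

    def members(self, room_id: str) -> Set[str]:
        return set(self._rooms.get(room_id, ()))

    def stats(self) -> Dict[str, int]:
        return {
            "rooms": len(self._rooms),
            "total_users": sum(len(m) for m in self._rooms.values()),
        }


registry = RoomRegistry(max_users_per_room=2)
registry.join("test_room", "user123")
registry.join("test_room", "user456")
print(registry.stats())
```

Calling `leave` from the disconnect handler keeps the counters accurate when a client drops without sending a leave message.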

🔧 Configuration

Edit the .env file to customize:

# Server
HOST=0.0.0.0
PORT=8000

# Logging
LOG_LEVEL=INFO

# Audio
AUDIO_SAMPLE_RATE=16000
AUDIO_CHUNK_SIZE=4096

# Rooms
MAX_USERS_PER_ROOM=10
ROOM_TIMEOUT=3600

# And more...
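
As an illustration of how these values might be consumed, the sketch below reads them into a typed object with the defaults shown above. It is not the project's `app/config` module: `Settings` and `load_settings` are hypothetical names, and the code assumes the variables are already in the process environment (for example, exported by a `.env` loader). The chunk duration calculation also assumes `AUDIO_CHUNK_SIZE` counts samples, which this guide does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the sample .env values, with the same defaults."""
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "INFO"
    audio_sample_rate: int = 16000
    audio_chunk_size: int = 4096
    max_users_per_room: int = 10
    room_timeout: int = 3600

    @property
    def chunk_duration_ms(self) -> float:
        # Assumption: AUDIO_CHUNK_SIZE is a number of samples, not bytes.
        return 1000.0 * self.audio_chunk_size / self.audio_sample_rate


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        host=env.get("HOST", d.host),
        port=int(env.get("PORT", d.port)),
        log_level=env.get("LOG_LEVEL", d.log_level),
        audio_sample_rate=int(env.get("AUDIO_SAMPLE_RATE", d.audio_sample_rate)),
        audio_chunk_size=int(env.get("AUDIO_CHUNK_SIZE", d.audio_chunk_size)),
        max_users_per_room=int(env.get("MAX_USERS_PER_ROOM", d.max_users_per_room)),
        room_timeout=int(env.get("ROOM_TIMEOUT", d.room_timeout)),
    )


# With the defaults, one chunk is 4096 / 16000 s of audio.
print(load_settings({}).chunk_duration_ms)
```

Under the samples assumption, the default chunk covers 256 ms of audio; if the value is instead a byte count of 16-bit mono audio, each chunk would cover half that.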

📖 Key Documentation Files

  1. README.md - Project overview and features
  2. PROJECT_STATUS.md - Detailed implementation status with code examples
  3. IMPLEMENTATION_SUMMARY.md - Complete summary of what's done
  4. docs/websocket-protocol.md - Complete WebSocket protocol
  5. docs/architecture.md - System architecture and design

🔨 Next Steps

To Add Full Translation Capability:

  1. Download Models (when ready):

    python scripts/download_models.py
    
  2. Implement Pipeline Components:

    • app/pipeline/stt/vosk_engine.py - Speech recognition
    • app/pipeline/translate/argos_engine.py - Translation
    • app/pipeline/tts/coqui_engine.py - Speech synthesis
    • app/pipeline/pipeline_manager.py - Orchestration
  3. See PROJECT_STATUS.md for a detailed implementation guide with code examples
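
The orchestration step can be pictured as three engines chained behind small interfaces. The sketch below is one possible shape, not the project's design: the `SpeechToText`, `Translator`, and `TextToSpeech` protocols, the `PipelineManager` class, and the stub engines are all hypothetical, and no Vosk, Argos, or Coqui API is used. Real engines would replace the stubs.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class Translator(Protocol):
    def translate(self, text: str, source_lang: str, target_lang: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


@dataclass
class PipelineManager:
    """Chains STT -> translation -> TTS for one chunk of audio."""
    stt: SpeechToText
    translator: Translator
    tts: TextToSpeech

    def process(self, audio: bytes, source_lang: str, target_lang: str) -> bytes:
        text = self.stt.transcribe(audio, source_lang)
        if not text.strip():
            return b""  # nothing recognised, so nothing to speak
        translated = self.translator.translate(text, source_lang, target_lang)
        return self.tts.synthesize(translated, target_lang)


# Stub engines that stand in for real models so the flow can be exercised.
class EchoSTT:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


class TaggingTranslator:
    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        return f"[{source_lang}->{target_lang}] {text}"


class BytesTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")


pipeline = PipelineManager(EchoSTT(), TaggingTranslator(), BytesTTS())
print(pipeline.process(b"hello", "en", "hi"))
```

Keeping each engine behind a protocol lets the server and tests run with stubs until the models are downloaded.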

To Deploy with Docker:

# Build and run
docker-compose -f docker/docker-compose.yml up -d

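# View logs (pass the same compose file, since it is not in the project root)
docker-compose -f docker/docker-compose.yml logs -f

# Stop
docker-compose -f docker/docker-compose.yml down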

🧪 Running Tests

# Run all tests
pytest tests/

# Run specific test
pytest tests/test_websocket.py -v

# With coverage
pytest --cov=app tests/

🔍 Health Check

# Run comprehensive health check
python scripts/health_check.py

This checks:

  • Dependencies installed
  • Configuration files
  • Models present
  • Server running
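
The "dependencies installed" check in the list above can be approximated without importing the packages. This is a sketch, not the contents of `scripts/health_check.py`: `missing_modules` is a hypothetical helper, and the module names are inferred from this guide (FastAPI, Uvicorn, and the `websockets` test client) rather than read from `requirements.txt`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names inferred from this guide; adjust to match requirements.txt.
required = ["fastapi", "uvicorn", "websockets"]
missing = missing_modules(required)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All listed dependencies are importable")
```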

🐛 Troubleshooting

Server won't start

# Check if port is in use
netstat -ano | findstr :8000  # Windows
lsof -i :8000  # Linux/Mac

# Check dependencies
pip install -r requirements.txt
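
The port check above can also be done the same way on every platform from Python. This is a small sketch using only the standard library; `port_in_use` is an illustrative name, and it only tells you whether something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print("port 8000 in use:", port_in_use(8000))
```

If this reports the port as in use before the server has been started, another process is listening there; either stop it or change `PORT` in `.env`.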

WebSocket connection fails

# Verify server is running
curl http://localhost:8000/health

# Check firewall settings
# Ensure port 8000 is open

Import errors

# Make sure you're in the right directory
cd voice-to-voice-translator

# Activate virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
pip install -r requirements.txt

📞 API Endpoints

  • GET / - Root endpoint (service info)
  • GET /health - Health check
  • WS /ws - WebSocket endpoint for clients

💡 Pro Tips

  1. Use the logs: Check logs/app.log for detailed information
  2. Read the docs: docs/ folder has comprehensive guides
  3. Check examples: tests/test_websocket.py has working examples
  4. Monitor performance: Built-in performance tracking available
  5. Follow the protocol: See docs/websocket-protocol.md for message formats

📝 Code Quality

The codebase follows:

  • ✅ PEP 8 style guide
  • ✅ Type hints throughout
  • ✅ Comprehensive docstrings
  • ✅ Structured logging
  • ✅ Error handling best practices

🎉 You're Ready!

Your voice translator backend is set up and ready for development. The infrastructure is complete - now you can focus on implementing the ML pipeline components.

Happy Coding! 🚀

Need Help?

  • Check PROJECT_STATUS.md for implementation guidance
  • Review docs/ for architectural details
  • Run python scripts/health_check.py to verify setup
  • See IMPLEMENTATION_SUMMARY.md for complete overview