React Native Migration Guide

From HTML/JS to Mobile App with FastAPI Backend

Overview

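You'll migrate your voice translation app from the browser (HTML/JS) to a native mobile app built with React Native, while keeping the FastAPI backend's translation pipeline unchanged. Only the backend's network configuration needs adjusting (see Phase 5).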

Architecture:

React Native App (Mobile)
        ↓ WebSocket
FastAPI Backend (Same as before)
        ↓
  ASR → MT → TTS Pipeline

Phase 1: Setup React Native Project

1. Install Prerequisites

# Install Node.js (if not already installed)
# Download from: https://nodejs.org/

# React Native CLI
# The global react-native-cli package is deprecated, so no global install is needed.
# The npx commands below fetch the CLI on demand.

# For Android: Install Android Studio
# For iOS: Install Xcode (Mac only)

2. Create New Project

# Create React Native project
npx react-native init VoiceTranslationApp

cd VoiceTranslationApp

# Install required packages
npm install @react-native-community/netinfo
npm install react-native-webrtc
npm install @react-native-async-storage/async-storage

Phase 2: Audio Capture Setup

Install Audio Libraries

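# For microphone access and audio recording
npm install react-native-audio-recorder-player

# For real-time audio streaming
npm install react-native-live-audio-stream

# For writing received audio to a temp file (used by the playback service)
npm install react-native-fs

# Buffer polyfill (used by the audio streaming service) and the Picker component (used in App.js)
npm install buffer @react-native-picker/picker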

Configure Permissions

Android: android/app/src/main/AndroidManifest.xml

<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <!-- Add these permissions -->
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
    
    <application>
        ...
    </application>
</manifest>

iOS: ios/VoiceTranslationApp/Info.plist

<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access for voice translation</string>
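<!-- Development only: allows plain ws:// and http:// connections. Remove once the backend is served over wss:// -->
<key>NSAppTransportSecurity</key>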
<dict>
    <key>NSAllowsArbitraryLoads</key>
    <true/>
</dict>

Phase 3: Core Components

1. WebSocket Service (`src/services/WebSocketService.js`)

class WebSocketService {
  constructor() {
    this.ws = null;
    this.listeners = {};
  }

  connect(roomId, userId, nativeLanguage, onMessage, onError) {
    const wsUrl = `ws://YOUR_BACKEND_IP:8000/ws/call/${roomId}/${userId}`;
    
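    this.ws = new WebSocket(wsUrl);
    // Deliver binary frames as Blobs, which AudioPlaybackService.playAudioFromBlob expects
    this.ws.binaryType = 'blob';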
    
    this.ws.onopen = () => {
      console.log('WebSocket connected');
      // Send initial config
      this.ws.send(JSON.stringify({
        native_lang: nativeLanguage
      }));
      
      if (this.listeners.onConnect) {
        this.listeners.onConnect();
      }
    };
    
    this.ws.onmessage = (event) => {
      if (typeof event.data === 'string') {
        // JSON message
        const msg = JSON.parse(event.data);
        onMessage(msg);
      } else {
        // Binary audio data
        onMessage({ type: 'audio', data: event.data });
      }
    };
    
    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
      onError(error);
    };
    
    this.ws.onclose = () => {
      console.log('WebSocket disconnected');
      if (this.listeners.onDisconnect) {
        this.listeners.onDisconnect();
      }
    };
  }

  sendAudio(audioData) {
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.send(audioData);
    }
  }

  disconnect() {
    if (this.ws) {
      this.ws.close();
      this.ws = null;
    }
  }

  on(event, callback) {
    this.listeners[event] = callback;
  }
}

export default new WebSocketService();
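The `onmessage` handler above calls `JSON.parse` directly, so a single malformed text frame would throw inside the handler. A minimal sketch of a more defensive decoder follows. `decodeFrame` and the `'invalid'` message type are hypothetical names introduced for illustration; the `{ type: 'audio', data }` wrapper matches what the service already passes to `onMessage`.

```javascript
// Converts a raw WebSocket payload into the object passed to onMessage.
function decodeFrame(data) {
  if (typeof data !== 'string') {
    return { type: 'audio', data }; // Binary frame: translated speech
  }
  try {
    return JSON.parse(data);
  } catch (err) {
    // Malformed text should not crash the message handler.
    return { type: 'invalid', raw: data };
  }
}

console.log(decodeFrame('{"type":"caption","text":"Hola","original":"Hello"}'));
console.log(decodeFrame('not json').type); // invalid
```

Inside the service, the body of `onmessage` could then shrink to `onMessage(decodeFrame(event.data))`.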

2. Audio Streaming Service (`src/services/AudioStreamService.js`)

import LiveAudioStream from 'react-native-live-audio-stream';
import { Buffer } from 'buffer'; // Buffer is not a global in React Native

class AudioStreamService {
  constructor() {
    this.isStreaming = false;
  }

  async start(onAudioData) {
    const options = {
      sampleRate: 16000,  // Match your backend
      channels: 1,         // Mono
      bitsPerSample: 16,   // 16-bit PCM
      audioSource: 6,      // VOICE_RECOGNITION for Android
      bufferSize: 8192     // Chunk size
    };

    LiveAudioStream.init(options);
    
    LiveAudioStream.on('data', data => {
      // data is base64 encoded PCM audio
      if (onAudioData) {
        // Convert base64 to binary
        const audioBuffer = Buffer.from(data, 'base64');
        onAudioData(audioBuffer);
      }
    });

    LiveAudioStream.start();
    this.isStreaming = true;
    console.log('Audio streaming started');
  }

  stop() {
    if (this.isStreaming) {
      LiveAudioStream.stop();
      this.isStreaming = false;
      console.log('Audio streaming stopped');
    }
  }
}

export default new AudioStreamService();
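Each `data` event carries a base64 string of raw PCM. Assuming the 16-bit little-endian mono format configured above, a chunk can be decoded into samples and a rough input level computed, which is a quick way to confirm the microphone is actually capturing sound. `decodePcm16` and `rmsLevel` are illustrative names, not part of the library. The sketch relies on a global `Buffer`, which Node provides; in React Native, import it from the `buffer` package first.

```javascript
// Decode a base64 chunk of 16-bit little-endian PCM into an Int16Array.
function decodePcm16(base64Chunk) {
  const bytes = Buffer.from(base64Chunk, 'base64');
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples;
}

// Root-mean-square level normalised to the range 0..1.
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) {
    const v = s / 32768;
    sumSquares += v * v;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Example: two samples, 16384 and -16384 (half of full scale).
const demoChunk = Buffer.from([0x00, 0x40, 0x00, 0xc0]).toString('base64');
console.log(rmsLevel(decodePcm16(demoChunk))); // 0.5
```

A level that stays at or near zero while you speak usually points to a permission or audio-source problem rather than a network one.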

3. Audio Playback Service (`src/services/AudioPlaybackService.js`)

import AudioRecorderPlayer from 'react-native-audio-recorder-player';
import RNFS from 'react-native-fs';

class AudioPlaybackService {
  constructor() {
    this.audioRecorderPlayer = new AudioRecorderPlayer();
  }

  async playAudioFromBlob(audioBlob) {
    try {
      // Save blob to temp file
      const tempPath = `${RNFS.CachesDirectoryPath}/temp_audio.wav`;
      
      // Convert blob to base64
      const base64Audio = await this.blobToBase64(audioBlob);
      
      // Write to file
      await RNFS.writeFile(tempPath, base64Audio, 'base64');
      
      // Play audio
      await this.audioRecorderPlayer.startPlayer(tempPath);
      
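      this.audioRecorderPlayer.addPlayBackListener((e) => {
        // Use >= in case the final position is not reported as exactly equal to the duration
        if (e.currentPosition >= e.duration) {
          this.audioRecorderPlayer.stopPlayer();
          // Remove the listener so repeated playbacks do not stack callbacks
          this.audioRecorderPlayer.removePlayBackListener();
          // Clean up temp file; ignore the error if it is already gone
          RNFS.unlink(tempPath).catch(() => {});
        }
      });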
      
    } catch (error) {
      console.error('Playback error:', error);
    }
  }

  blobToBase64(blob) {
    return new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onloadend = () => {
        const base64 = reader.result.split(',')[1];
        resolve(base64);
      };
      reader.onerror = reject;
      reader.readAsDataURL(blob);
    });
  }

  async stop() {
    await this.audioRecorderPlayer.stopPlayer();
  }
}

export default new AudioPlaybackService();
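`playAudioFromBlob` writes whatever arrives to a `.wav` file, so it only works if the backend sends a complete WAV file per message rather than raw PCM. That is an assumption about the backend, not something this guide confirms. A small check on the first 12 bytes can catch a mismatch early; `looksLikeWav` is a hypothetical helper.

```javascript
// Returns true if the bytes start with a RIFF....WAVE header.
function looksLikeWav(bytes) {
  if (bytes.length < 12) return false;
  const tag = (offset) =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);
  return tag(0) === 'RIFF' && tag(8) === 'WAVE';
}

// Example: a bare 12-byte header (chunk size left as zero) versus silent raw PCM.
const wavHeader = Uint8Array.from('RIFF\0\0\0\0WAVE', (ch) => ch.charCodeAt(0));
console.log(looksLikeWav(wavHeader));          // true
console.log(looksLikeWav(new Uint8Array(64))); // false
```

If the check fails on real traffic, the payload would need a WAV header added before a file-based player can handle it.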

Phase 4: Main Screen Component

`App.js`

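import React, { useState, useEffect } from 'react';
import {
  SafeAreaView,
  StyleSheet,
  View,
  Text,
  TouchableOpacity,
  TextInput
} from 'react-native';
// Picker was removed from React Native core; it now lives in a community package
// (npm install @react-native-picker/picker)
import { Picker } from '@react-native-picker/picker';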

import WebSocketService from './src/services/WebSocketService';
import AudioStreamService from './src/services/AudioStreamService';
import AudioPlaybackService from './src/services/AudioPlaybackService';

const App = () => {
  const [isConnected, setIsConnected] = useState(false);
  const [isStreaming, setIsStreaming] = useState(false);
  const [roomId, setRoomId] = useState('room1');
  const [userId] = useState(`user_${Math.random().toString(36).slice(2, 11)}`);
  const [nativeLanguage, setNativeLanguage] = useState('en');
  const [translatedText, setTranslatedText] = useState('Waiting for translation...');
  const [originalText, setOriginalText] = useState('-');
  const [status, setStatus] = useState('Disconnected');

  useEffect(() => {
    // Setup WebSocket listeners
    WebSocketService.on('onConnect', () => {
      setIsConnected(true);
      setStatus('Connected');
    });

    WebSocketService.on('onDisconnect', () => {
      setIsConnected(false);
      setIsStreaming(false);
      setStatus('Disconnected');
      AudioStreamService.stop();
    });

    return () => {
      handleLeaveCall();
    };
  }, []);

  const handleJoinCall = async () => {
    try {
      setStatus('Connecting...');
      
      // Connect WebSocket
      WebSocketService.connect(
        roomId,
        userId,
        nativeLanguage,
        handleWebSocketMessage,
        (error) => {
          console.error('WebSocket error:', error);
          setStatus('Connection Error');
        }
      );

      // Start audio streaming
      await AudioStreamService.start((audioData) => {
        // Send audio chunks to backend
        WebSocketService.sendAudio(audioData);
      });

      setIsStreaming(true);
      
    } catch (error) {
      console.error('Failed to join call:', error);
      setStatus('Failed to connect');
    }
  };

  const handleLeaveCall = () => {
    AudioStreamService.stop();
    WebSocketService.disconnect();
    setIsConnected(false);
    setIsStreaming(false);
    setStatus('Disconnected');
  };

  const handleWebSocketMessage = async (message) => {
    if (message.type === 'audio') {
      // Play received audio
      await AudioPlaybackService.playAudioFromBlob(message.data);
    } else if (message.type === 'caption') {
      setTranslatedText(message.text);
      setOriginalText(message.original);
    } else if (message.info) {
      setStatus(message.info);
    }
  };

  return (
    <SafeAreaView style={styles.container}>
      <View style={styles.header}>
        <Text style={styles.title}>🌐 Voice Translation</Text>
      </View>

      {!isConnected && (
        <View style={styles.settingsContainer}>
          <Text style={styles.label}>Room ID:</Text>
          <TextInput
            style={styles.input}
            value={roomId}
            onChangeText={setRoomId}
            placeholder="Enter room ID"
          />

          <Text style={styles.label}>My Language:</Text>
          <Picker
            selectedValue={nativeLanguage}
            style={styles.picker}
            onValueChange={(itemValue) => setNativeLanguage(itemValue)}
          >
            <Picker.Item label="English" value="en" />
            <Picker.Item label="French" value="fr" />
            <Picker.Item label="German" value="de" />
            <Picker.Item label="Spanish" value="es" />
          </Picker>
        </View>
      )}

      <View style={styles.statusContainer}>
        <View style={[
          styles.statusIndicator,
          { backgroundColor: isConnected ? '#4CAF50' : '#F44336' }
        ]} />
        <Text style={styles.statusText}>{status}</Text>
      </View>

      {isStreaming && (
        <View style={styles.recordingIndicator}>
          <View style={styles.pulseCircle} />
          <Text style={styles.recordingText}>Microphone Active</Text>
        </View>
      )}

      <View style={styles.captionContainer}>
        <Text style={styles.captionLabel}>Incoming Translation:</Text>
        <Text style={styles.captionText}>{translatedText}</Text>
      </View>

      <View style={styles.captionContainer}>
        <Text style={styles.captionLabel}>Original:</Text>
        <Text style={styles.captionTextSmall}>{originalText}</Text>
      </View>

      <View style={styles.buttonContainer}>
        {!isConnected ? (
          <TouchableOpacity
            style={[styles.button, styles.joinButton]}
            onPress={handleJoinCall}
          >
            <Text style={styles.buttonText}>Join Call</Text>
          </TouchableOpacity>
        ) : (
          <TouchableOpacity
            style={[styles.button, styles.leaveButton]}
            onPress={handleLeaveCall}
          >
            <Text style={styles.buttonText}>Leave Call</Text>
          </TouchableOpacity>
        )}
      </View>

      <View style={styles.infoContainer}>
        <Text style={styles.infoText}>
          ℹ️ Just speak naturally - the app automatically detects when you finish speaking and translates to the other person.
        </Text>
      </View>
    </SafeAreaView>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#f5f5f5',
  },
  header: {
    padding: 20,
    backgroundColor: '#007bff',
  },
  title: {
    fontSize: 24,
    fontWeight: 'bold',
    color: 'white',
    textAlign: 'center',
  },
  settingsContainer: {
    padding: 20,
    backgroundColor: 'white',
    margin: 10,
    borderRadius: 10,
  },
  label: {
    fontSize: 16,
    fontWeight: 'bold',
    marginTop: 10,
    marginBottom: 5,
  },
  input: {
    borderWidth: 1,
    borderColor: '#ddd',
    borderRadius: 5,
    padding: 10,
    fontSize: 16,
  },
  picker: {
    borderWidth: 1,
    borderColor: '#ddd',
    borderRadius: 5,
  },
  statusContainer: {
    flexDirection: 'row',
    alignItems: 'center',
    padding: 15,
    backgroundColor: 'white',
    margin: 10,
    borderRadius: 10,
  },
  statusIndicator: {
    width: 12,
    height: 12,
    borderRadius: 6,
    marginRight: 10,
  },
  statusText: {
    fontSize: 16,
    fontWeight: 'bold',
  },
  recordingIndicator: {
    flexDirection: 'row',
    alignItems: 'center',
    justifyContent: 'center',
    padding: 10,
    backgroundColor: '#ffe6e6',
    margin: 10,
    borderRadius: 10,
  },
  pulseCircle: {
    width: 12,
    height: 12,
    borderRadius: 6,
    backgroundColor: '#F44336',
    marginRight: 10,
  },
  recordingText: {
    color: '#F44336',
    fontWeight: 'bold',
  },
  captionContainer: {
    padding: 15,
    backgroundColor: 'white',
    margin: 10,
    borderRadius: 10,
    borderLeftWidth: 4,
    borderLeftColor: '#007bff',
  },
  captionLabel: {
    fontSize: 12,
    color: '#666',
    marginBottom: 5,
  },
  captionText: {
    fontSize: 18,
    color: '#333',
  },
  captionTextSmall: {
    fontSize: 14,
    color: '#666',
  },
  buttonContainer: {
    padding: 20,
  },
  button: {
    padding: 15,
    borderRadius: 10,
    alignItems: 'center',
  },
  joinButton: {
    backgroundColor: '#007bff',
  },
  leaveButton: {
    backgroundColor: '#F44336',
  },
  buttonText: {
    color: 'white',
    fontSize: 18,
    fontWeight: 'bold',
  },
  infoContainer: {
    padding: 15,
    backgroundColor: '#e7f3ff',
    margin: 10,
    borderRadius: 10,
  },
  infoText: {
    fontSize: 14,
    color: '#333',
  },
});

export default App;
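`handleWebSocketMessage` distinguishes three message shapes: binary audio (wrapped as `{ type: 'audio', data }` by the WebSocket service), caption objects with `text` and `original`, and status objects with an `info` field. The sketch below pulls that branching into a pure function so it can be unit-tested without a device. `toUiUpdate` and the shape of its return value are illustrative; the message fields are only those used in the component above.

```javascript
// Maps an incoming message to a plain description of what the UI should do.
function toUiUpdate(message) {
  if (message.type === 'audio') {
    return { kind: 'play', data: message.data };
  }
  if (message.type === 'caption') {
    return { kind: 'caption', translated: message.text, original: message.original };
  }
  if (message.info) {
    return { kind: 'status', status: message.info };
  }
  return { kind: 'ignore' }; // Unknown messages are dropped instead of throwing.
}

console.log(toUiUpdate({ type: 'caption', text: 'Bonjour', original: 'Hello' }));
console.log(toUiUpdate({ info: 'Peer joined' }));
```

The component would then switch on `kind` and call `setTranslatedText`, `setOriginalText`, `setStatus`, or the playback service accordingly.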

Phase 5: Backend Changes

Update Your Backend IP Address

In your FastAPI backend, make sure it's accessible from your phone:

Option A: Same WiFi Network

# Find your computer's local IP
# Windows: ipconfig
# Mac/Linux: ifconfig (or `ip addr` on Linux systems without ifconfig)

# Run backend with:
uvicorn main:app --host 0.0.0.0 --port 8000

Option B: Ngrok Tunnel (for testing)

# Install ngrok
# Download from: https://ngrok.com/

# Tunnel to your backend
ngrok http 8000

# Use the ngrok URL in your React Native app
# ngrok forwards over HTTPS, so use the secure WebSocket scheme:
# wss://YOUR_NGROK_URL/ws/call/...
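The tunnel address is printed as an `http(s)://` URL, while the app needs a WebSocket URL. Converting one to the other means swapping the scheme and appending the call path. A minimal sketch, assuming the `/ws/call/{roomId}/{userId}` route used by the WebSocket service; `toCallSocketUrl` is a hypothetical helper and the hostnames are placeholders.

```javascript
// Turns an http(s) base address into the matching ws(s) call URL.
function toCallSocketUrl(baseUrl, roomId, userId) {
  const match = /^(https?):\/\/([^/]+)\/?$/.exec(baseUrl.trim());
  if (!match) {
    throw new Error(`Expected an http(s) base URL, got: ${baseUrl}`);
  }
  const scheme = match[1] === 'https' ? 'wss' : 'ws';
  const path = `/ws/call/${encodeURIComponent(roomId)}/${encodeURIComponent(userId)}`;
  return `${scheme}://${match[2]}${path}`;
}

console.log(toCallSocketUrl('https://example.ngrok.app', 'room1', 'user_1'));
// wss://example.ngrok.app/ws/call/room1/user_1
console.log(toCallSocketUrl('http://192.168.1.20:8000/', 'room1', 'user_1'));
// ws://192.168.1.20:8000/ws/call/room1/user_1
```

Encoding the room and user IDs also guards against IDs that contain spaces or slashes.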

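Update CORS (if needed)

CORS is enforced by browsers, so the React Native app itself is not affected by it. You only need this if the HTML frontend (or another browser client) still talks to the backend from a different origin. Add to your main.py:

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # For development - restrict in production
    allow_credentials=False,  # Credentials cannot be combined with a wildcard origin
    allow_methods=["*"],
    allow_headers=["*"],
)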

Phase 6: Build and Run

Android

# Connect Android device via USB with USB debugging enabled
# OR start Android emulator

# Run app
npx react-native run-android

iOS (Mac only)

# Install pods
cd ios && pod install && cd ..

# Run app
npx react-native run-ios

Common Issues and Solutions

Issue 1: Audio Not Streaming

Solution: Check microphone permissions

import { PermissionsAndroid, Platform } from 'react-native';

async function requestMicrophonePermission() {
  if (Platform.OS === 'android') {
    const granted = await PermissionsAndroid.request(
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
    );
    return granted === PermissionsAndroid.RESULTS.GRANTED;
  }
  return true;
}

// Call before starting audio, inside an async function such as handleJoinCall,
// and only start streaming if it returns true
await requestMicrophonePermission();

Issue 2: WebSocket Not Connecting

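Solution: Make sure the backend IP is correct and the server is reachable from the phone. The check below assumes your backend exposes a /health route; substitute any HTTP endpoint it actually serves.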

// Test backend accessibility
fetch('http://YOUR_BACKEND_IP:8000/health')
  .then(res => console.log('Backend reachable'))
  .catch(err => console.error('Cannot reach backend:', err));
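The check above reports success for any response, including a 500, and can hang for a long time on an unreachable host. A sketch of a stricter variant with a timeout follows. `isBackendReachable`, its options, and the `/health` route are assumptions for illustration; `fetchImpl` exists only so the function can be tested with a fake.

```javascript
// Resolves to true if GET {baseUrl}/health answers with a 2xx status in time.
async function isBackendReachable(baseUrl, { timeoutMs = 3000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${baseUrl}/health`, { signal: controller.signal });
    return res.ok;
  } catch (err) {
    return false; // Network error, DNS failure, or timeout
  } finally {
    clearTimeout(timer);
  }
}
```

In the app this might be called as `isBackendReachable('http://YOUR_BACKEND_IP:8000')` before opening the WebSocket, showing a connection error straight away if it resolves to `false`.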

Issue 3: Audio Quality Issues

Solution: Adjust audio parameters

const options = {
  sampleRate: 16000,   // Match backend
  channels: 1,
  bitsPerSample: 16,
  audioSource: 6,      // VOICE_RECOGNITION
  bufferSize: 4096     // Try different sizes: 2048, 4096, 8192
};
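Buffer size trades latency against overhead: larger chunks mean fewer WebSocket messages but a longer wait before each one is sent. The arithmetic can be sketched as below, assuming `bufferSize` is counted in bytes (check the library's documentation for your version, since this guide does not state the unit). `chunkDurationMs` is a hypothetical helper.

```javascript
// Duration in milliseconds of one PCM chunk of the given size in bytes.
function chunkDurationMs(bufferBytes, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const bytesPerFrame = channels * (bitsPerSample / 8);
  return (bufferBytes * 1000) / (bytesPerFrame * sampleRate);
}

for (const size of [2048, 4096, 8192]) {
  console.log(`${size} bytes -> ${chunkDurationMs(size)} ms`);
}
// 2048 bytes -> 64 ms
// 4096 bytes -> 128 ms
// 8192 bytes -> 256 ms
```

Under that assumption, the 8192 setting used in the streaming service adds roughly a quarter of a second of capture delay per chunk, before any network or pipeline latency.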

Testing Checklist

Microphone permission granted
WebSocket connects to backend
Audio streams continuously
VAD detects speech/silence
Translation received and played
Captions display correctly
Leave call cleans up resources
Works on both WiFi and mobile data (if using ngrok)

Production Deployment

Backend Hosting

Option 1: Cloud Server (AWS, DigitalOcean, etc.)

# Deploy FastAPI to cloud
# Use Nginx + Uvicorn for production
# Enable SSL (wss:// for WebSocket)

Option 2: Serverless (AWS Lambda + API Gateway)

More complex setup
WebSocket support via API Gateway
May have latency issues

App Store Deployment

Android (Google Play)

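# Generate a signed release build (requires a release keystore configured in android/app/build.gradle)
cd android
./gradlew bundleRelease   # Android App Bundle (.aab), the format Google Play requires for new apps
# ./gradlew assembleRelease builds an APK instead, useful for direct installs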

iOS (App Store)

# Build in Xcode
# Archive and submit to App Store Connect

Next Steps

1. Test with current HTML frontend first - Make sure VAD works
2. Set up React Native project - Get basic UI working
3. Implement audio streaming - Test microphone → WebSocket
4. Add playback - Test receiving and playing audio
5. Polish UI - Add animations, better error handling
6. Deploy - Cloud backend + app stores

Estimated Timeline

Week 1: VAD in HTML/JS (verify it works)
Week 2: React Native setup + basic UI
Week 3: Audio streaming + WebSocket integration
Week 4: Polish + testing + deployment

Good luck! 🚀