Spaces:

RSHVR
/

Command_RTC

Sleeping

App Files Files Community

RSHVR commited on Mar 30, 2025

Commit

4a58eca

verified ·

1 Parent(s): 3958987

Update README.md

Browse files

Files changed (1) hide show

README.md +26 -76

README.md CHANGED Viewed

@@ -17,91 +17,41 @@ tags:
   - fastapi
 ---
-# Tortoise TTS with Voice Cloning
-A powerful text-to-speech application with voice cloning capabilities, powered by Tortoise-TTS.
-## Description
-This application allows you to generate high-quality, natural-sounding speech from text. You can customize the voice by either:
-- Uploading your own voice sample for cloning
-- Recording your voice directly in the browser
-- Selecting from a variety of preset voices
-The app uses Tortoise-TTS, a high-quality text-to-speech model, and runs efficiently on Hugging Face Spaces with Zero-GPU optimization.
-## How to Use
-### Web Interface
-1. Enter the text you want to convert to speech
-2. Choose one of the following voice options:
-   - Upload a voice sample audio file (WAV format recommended)
-   - Record your voice using your microphone
-   - Select a preset voice from the dropdown menu
-3. Click "Generate Speech"
-4. Listen to or download the generated audio
-### API Endpoints
-The app also provides REST API endpoints for programmatic access:
-1. **Voice File TTS** - `/api/tts_with_voice_file/`
-   - POST request with:
-     - `text`: Text to convert to speech (required)
-     - `voice_file`: Audio file for voice cloning (optional)
-     - `preset_voice`: Name of preset voice (optional, defaults to "random")
-2. **Preset Voice TTS** - `/api/tts_with_preset/`
-   - POST request with:
-     - `text`: Text to convert to speech (required)
-     - `preset_voice`: Name of preset voice (required)
-### Python Example
-```python
-import requests
-# Using preset voice
-response = requests.post(
-    "https://your-space-name.hf.space/api/tts_with_preset/",
-    data={"text": "Hello, this is a test.", "preset_voice": "tom"}
-)
-# Save the audio file
-with open("output.wav", "wb") as f:
-    f.write(response.content)
-```
 ## Technical Details
-This app leverages:
-- **Tortoise-TTS**: State-of-the-art text-to-speech model
-- **Gradio**: For the intuitive user interface
-- **FastAPI**: For the API endpoints
-- **Zero-GPU**: For efficient GPU utilization on Hugging Face Spaces
-## Limitations
-- Text generation may take some time (30-60 seconds) depending on text length
-- Voice cloning quality depends on the clarity and length of the provided sample
-- For best results, provide voice samples with clear speech and minimal background noise
-## Credits
-This project uses the Tortoise-TTS model. If you use this app in your work, please consider citing:
-```
-@misc{tortoise-tts,
-  author = {James Betker},
-  title = {Tortoise-TTS: A Multi-Voice TTS System},
-  year = {2022},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/neonbjb/tortoise-tts}}
-}
-```
-## License
-This project is available under the Apache-2.0 License.

   - fastapi
 ---
+# Voice Chat Assistant
+A conversational voice assistant powered by AI that responds to your spoken queries with natural-sounding speech.
+## Features
+- Speech Recognition: Uses OpenAI's Whisper model to accurately transcribe your voice
+- Natural Language Understanding: Leverages Cohere's LLM API for intelligent responses
+- Text-to-Speech: Generates natural speech using Tortoise-TTS
+- Reply on Pause: Automatically responds when you finish speaking
+- Conversation History: Maintains context throughout your dialogue
+## Demo
+Speak into your microphone and the assistant will respond with voice!
+## How It Works
+- Your voice is transcribed to text using Whisper
+- The text is processed by Cohere's LLM to generate a response
+- The response is converted to speech using Tortoise-TTS
+- The conversation continues with full context retention
 ## Technical Details
+This project utilizes:
+- Zero-GPU: Efficient GPU memory usage with Hugging Face's Zero-GPU technology
+- FastRTC: Real-time communication for seamless voice interaction
+- Gradio: Simple and intuitive user interface
+## Setup
+To run this locally, you'll need a Cohere API key and Python 3.8+.
+## Acknowledgements
+OpenAI for the Whisper speech recognition model
+Cohere for the language model API
+Tortoise-TTS for the text-to-speech capabilities
+Hugging Face for the Spaces and Zero-GPU infrastructure