Upload folder using huggingface_hub
- scratch.py +24 -0
- script.md +11 -1
scratch.py
ADDED
@@ -0,0 +1,24 @@
+from fastrtc import Stream, ReplyOnPause
+import numpy as np
+
+
+def echo(audio: tuple[int, np.ndarray]):
+    # The function will be passed the audio until the user pauses
+    # Implement any iterator that yields audio
+    # See "LLM Voice Chat" for a more complete example
+    yield audio
+
+
+stream = Stream(
+    handler=ReplyOnPause(echo),
+    modality="audio",
+    mode="send-receive",
+    ui_args={
+        "icon": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Portrait-of-a-woman.jpg/960px-Portrait-of-a-woman.jpg?20200608215745",
+        "pulse_color": "rgb(35, 157, 225)",
+        "icon_button_color": "rgb(35, 157, 225)",
+        "title": "Gemini Audio Video Chat",
+    },
+)
+
+stream.ui.launch()
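The `echo` handler in scratch.py is an ordinary Python generator, so it can be exercised without a browser or a running stream. The sketch below assumes, as the type hint suggests, that the handler receives a `(sample_rate, samples)` tuple, and it further assumes the array is 16-bit audio shaped `(1, num_samples)`. The `synthetic_utterance` helper, the 24000 Hz rate, and the 440 Hz tone are hypothetical stand-ins for the audio a user spoke before pausing; none of them come from the commit.

```python
import numpy as np


def echo(audio: tuple[int, np.ndarray]):
    # Same generator shape as the handler in scratch.py: yield back the input.
    yield audio


def synthetic_utterance(sample_rate: int = 24000, seconds: float = 0.5):
    """Hypothetical stand-in for the audio captured before a pause."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    # A 440 Hz tone at 30% of full scale, stored as 16-bit samples (assumed format).
    tone = (0.3 * np.sin(2 * np.pi * 440.0 * t) * 32767).astype(np.int16)
    # Assumed layout: one channel, shape (1, num_samples).
    return sample_rate, tone.reshape(1, -1)


sample_rate, samples = synthetic_utterance()
# Drain the generator the way a caller would: each yielded item is one reply chunk.
replies = list(echo((sample_rate, samples)))
```

Because the handler is a generator, a fuller version could yield several chunks per turn; here it yields exactly one, the unchanged input.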
script.md
CHANGED
@@ -1,5 +1,15 @@
 Hi, I'm Freddy and I want to give a tour of FastRTC - the real-time communication library for Python.
-
+
+Why is this important? In the last few months, we've seen many advances in real-time speech and vision models coming from closed-source models, open-source models, and API providers.
+
+Despite these innovations, it's still difficult to build real-time AI applications that stream audio and video, especially in Python. This is because:
+
+- ML engineers may not have experience with the technologies needed to build real-time applications, such as WebRTC or WebSockets.
+- Implementing algorithms for voice detection and turn-taking is tricky!
+- Best practices are scattered across various sources, and even code assistant tools like Cursor and Copilot struggle to write Python code that supports real-time audio/video applications. I learned that the hard way!
+
+All this means that if you want to take advantage of the latest advances in AI, you have to spend a lot of time figuring out how to do real-time streaming.
+`FastRTC` solves this problem by automatically turning any Python function into a real-time audio and video stream over WebRTC or WebSockets with little additional code or overhead. Let's see how it works.
 
 Let's start with the basics - echoing audio.
 