import { useEffect, useCallback, useRef } from "react";
import { Microphone } from "@phosphor-icons/react";
import { Tooltip } from "react-tooltip";
import _regeneratorRuntime from "regenerator-runtime";
import SpeechRecognition, {
  useSpeechRecognition,
} from "react-speech-recognition";
import { PROMPT_INPUT_EVENT } from "../../PromptInput";
import { useTranslation } from "react-i18next";
import Appearance from "@/models/appearance";

let timeout;
const SILENCE_INTERVAL = 3_200; // wait in milliseconds of silence before closing.

/**
 * Speech-to-text input component for the chat window.
 * @param {Object} props - The component props
 * @param {(command: {text: string, autoSubmit?: boolean, writeMode?: string}) => void} props.sendCommand - The function to send the command
 * @returns {React.ReactElement|null} The SpeechToText component, or null if the browser does not support speech recognition
 */
export default function SpeechToText({ sendCommand }) {
  const previousTranscriptRef = useRef("");
  const {
    transcript,
    listening,
    resetTranscript,
    browserSupportsSpeechRecognition,
    browserSupportsContinuousListening,
    isMicrophoneAvailable,
  } = useSpeechRecognition({
    clearTranscriptOnListen: true,
  });
  const { t } = useTranslation();

  function startSTTSession() {
    if (!isMicrophoneAvailable) {
      alert(
        "AnythingLLM does not have access to microphone. Please enable for this site to use this feature."
      );
      return;
    }

    resetTranscript();
    previousTranscriptRef.current = "";
    SpeechRecognition.startListening({
      continuous: browserSupportsContinuousListening,
      language: window?.navigator?.language ?? "en-US",
    });
  }

  function endSTTSession() {
    SpeechRecognition.stopListening();

    // If auto submit is enabled, send an empty string to the chat window to submit the current transcript
    // since every chunk of text should have been streamed to the chat window by now.
    if (Appearance.get("autoSubmitSttInput")) {
      sendCommand({
        text: "",
        autoSubmit: true,
        writeMode: "append",
      });
    }

    resetTranscript();
    previousTranscriptRef.current = "";
    clearTimeout(timeout);
  }

  const handleKeyPress = useCallback(
    (event) => {
      // CTRL + m on Mac and Windows to toggle STT listening (keyCode 77 is "M")
      if (event.ctrlKey && event.keyCode === 77) {
        if (listening) {
          endSTTSession();
        } else {
          startSTTSession();
        }
      }
    },
    [listening, endSTTSession, startSTTSession]
  );

  // End the session when the prompt input reports an empty value while a silence timer exists.
  function handlePromptUpdate(e) {
    if (!e?.detail && timeout) {
      endSTTSession();
      clearTimeout(timeout);
    }
  }

  useEffect(() => {
    document.addEventListener("keydown", handleKeyPress);
    return () => {
      document.removeEventListener("keydown", handleKeyPress);
    };
  }, [handleKeyPress]);

  useEffect(() => {
    if (!!window)
      window.addEventListener(PROMPT_INPUT_EVENT, handlePromptUpdate);
    return () =>
      window?.removeEventListener(PROMPT_INPUT_EVENT, handlePromptUpdate);
  }, []);

  useEffect(() => {
    if (transcript?.length > 0 && listening) {
      const previousTranscript = previousTranscriptRef.current;
      const newContent = transcript.slice(previousTranscript.length);

      // Stream just the diff of the new content, since transcript is an accumulating string
      // and not just the new content transcribed.
      if (newContent.length > 0)
        sendCommand({ text: newContent, writeMode: "append" });

      previousTranscriptRef.current = transcript;

      // Restart the silence timer each time the transcript changes.
      clearTimeout(timeout);
      timeout = setTimeout(() => {
        endSTTSession();
      }, SILENCE_INTERVAL);
    }
  }, [transcript, listening]);

  if (!browserSupportsSpeechRecognition) return null;
  return (