---
license: mit
language:
- en
tags:
- speech-to-speech
- faster-whisper
- qwen
- gguf
- windows
- local-ai
- terminal
- sapi
library_name: llama-cpp-python
pipeline_tag: automatic-speech-recognition
---

# Local S2S Shell Starter

A simple local speech-to-speech assistant that runs from a Windows terminal.

## Stack

- STT: faster-whisper medium
- LLM: Qwen2.5 3B Instruct GGUF (Q4_K_M)
- TTS: Windows SAPI voice
- UI: terminal only

## Pipeline

microphone -> faster-whisper -> Qwen2.5 3B GGUF -> Windows SAPI speech

## Hardware Target

- CPU fallback supported
- NVIDIA GPU used automatically when available
- 8 GB+ VRAM recommended for smoother local use

## Setup

Run from PowerShell:

```powershell
py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe download_models.py
```

## Run

```powershell
.\run_shell_s2s.bat
```

## Shell Commands

- `Enter`: record from the microphone and run speech-to-speech
- `t`: type text and hear the reply
- `d`: list audio devices
- `q`: quit

## Model Download

The downloader fetches:

- Repo: `bartowski/Qwen2.5-3B-Instruct-GGUF`
- File: `Qwen2.5-3B-Instruct-Q4_K_M.gguf`

The GGUF model file is not committed to this repository.

## Scope

This is a local voice-chat starter. It does not control the computer, run tools, or perform system automation.
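
## Command Loop Sketch

The single-key commands listed under Shell Commands map naturally onto a small dispatch table. The code below is a minimal, hypothetical sketch of such a loop, not the code launched by `run_shell_s2s.bat`. The names `record_and_reply`, `type_and_reply`, `list_devices`, `COMMANDS`, and `dispatch` are illustrative placeholders; in a real shell the handlers would call faster-whisper, the Qwen GGUF model, and SAPI. It assumes a bare `Enter` arrives as an empty string from `input()`, and it also accepts uppercase keys, which the original command list does not specify.

```python
from typing import Callable, Dict, Optional


# Hypothetical handlers: placeholders for the real record/type/device-list logic.
def record_and_reply() -> str:
    return "record"


def type_and_reply() -> str:
    return "type"


def list_devices() -> str:
    return "devices"


# Map each shell input to its handler. input() returns "" when the user
# just presses Enter, so the empty string triggers mic recording.
COMMANDS: Dict[str, Callable[[], str]] = {
    "": record_and_reply,
    "t": type_and_reply,
    "d": list_devices,
}


def dispatch(raw: str) -> Optional[str]:
    """Run the handler for one line of input; return None when the user quits."""
    key = raw.strip().lower()  # assumption: tolerate whitespace and uppercase keys
    if key == "q":
        return None
    handler = COMMANDS.get(key)
    if handler is None:
        return f"unknown command: {key!r}"
    return handler()


def main() -> None:
    while True:
        try:
            raw = input("[Enter]=talk  t=type  d=devices  q=quit > ")
        except EOFError:  # stdin closed (e.g. piped input ran out): exit cleanly
            break
        result = dispatch(raw)
        if result is None:
            break
        print(result)


if __name__ == "__main__":
    main()
```

Keeping the key-to-handler mapping in one dictionary makes it easy to add a command without touching the loop, and keeping `q` outside the table lets `dispatch` signal "stop" with `None` rather than a sentinel string.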