s2s-complete-setup / README.md
raichemathew1
Add Hugging Face repo card metadata
48ca412
---
license: mit
language:
- en
tags:
- speech-to-speech
- faster-whisper
- qwen
- gguf
- windows
- local-ai
- terminal
- sapi
library_name: llama-cpp-python
pipeline_tag: automatic-speech-recognition
---
# Local S2S Shell Starter
A simple local speech-to-speech assistant that runs from a Windows terminal.
## Stack
- STT: faster-whisper medium
- LLM: Qwen2.5 3B Instruct GGUF Q4_K_M
- TTS: Windows SAPI voice
- UI: terminal only
## Pipeline
microphone -> faster-whisper -> Qwen2.5 3B GGUF -> Windows SAPI speech
## Hardware Target
- CPU fallback supported
- NVIDIA GPU auto-used when available
- 8GB+ VRAM recommended for smoother local use
## Setup
Run from PowerShell:
py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe download_models.py
## Run
.\run_shell_s2s.bat
## Shell Commands
Enter = record mic and run speech-to-speech
t = type text and hear reply
d = list audio devices
q = quit
## Model Download
The downloader fetches:
Repo: bartowski/Qwen2.5-3B-Instruct-GGUF
File: Qwen2.5-3B-Instruct-Q4_K_M.gguf
The GGUF model file is not committed to this repository.
## Scope
This is a local voice-chat starter. It does not control the computer, run tools, or perform system automation.