---
license: mit
language:
- en
tags:
- speech-to-speech
- faster-whisper
- qwen
- gguf
- windows
- local-ai
- terminal
- sapi
library_name: llama-cpp-python
pipeline_tag: automatic-speech-recognition
---
# Local S2S Shell Starter

A simple local speech-to-speech assistant that runs from a Windows terminal.

## Stack

- STT: faster-whisper medium
- LLM: Qwen2.5 3B Instruct GGUF Q4_K_M
- TTS: Windows SAPI voice
- UI: terminal only

## Pipeline

microphone -> faster-whisper -> Qwen2.5 3B GGUF -> Windows SAPI speech

## Hardware Target

- CPU fallback supported
- NVIDIA GPU auto-used when available
- 8GB+ VRAM recommended for smoother local use

## Setup

Run from PowerShell:

py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe download_models.py

## Run

.\run_shell_s2s.bat

## Shell Commands

Enter = record mic and run speech-to-speech
t     = type text and hear reply
d     = list audio devices
q     = quit

## Model Download

The downloader fetches:

Repo: bartowski/Qwen2.5-3B-Instruct-GGUF
File: Qwen2.5-3B-Instruct-Q4_K_M.gguf

The GGUF model file is not committed to this repository.

## Scope

This is a local voice-chat starter. It does not control the computer, run tools, or perform system automation.