File size: 1,478 Bytes
48ca412
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
de2ad9c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
---
license: mit
language:
- en
tags:
- speech-to-speech
- faster-whisper
- qwen
- gguf
- windows
- local-ai
- terminal
- sapi
library_name: llama-cpp-python
pipeline_tag: automatic-speech-recognition
---
# Local S2S Shell Starter

A simple local speech-to-speech assistant that runs from a Windows terminal.

## Stack

- STT: faster-whisper medium
- LLM: Qwen2.5 3B Instruct GGUF Q4_K_M
- TTS: Windows SAPI voice
- UI: terminal only

## Pipeline

microphone -> faster-whisper -> Qwen2.5 3B GGUF -> Windows SAPI speech

## Hardware Target

- CPU fallback supported
- NVIDIA GPU auto-used when available
- 8GB+ VRAM recommended for smoother local use

## Setup

Run from PowerShell:

py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe download_models.py

## Run

.\run_shell_s2s.bat

## Shell Commands

Enter = record mic and run speech-to-speech
t     = type text and hear reply
d     = list audio devices
q     = quit

## Model Download

The downloader fetches:

Repo: bartowski/Qwen2.5-3B-Instruct-GGUF
File: Qwen2.5-3B-Instruct-Q4_K_M.gguf

The GGUF model file is not committed to this repository.

## Scope

This is a local voice-chat starter. It does not control the computer, run tools, or perform system automation.