maytman12
/

s2s-complete-setup

Automatic Speech Recognition

llama-cpp-python

speech-to-speech

Model card Files Files and versions

s2s-complete-setup / README.md

raichemathew1

Add Hugging Face repo card metadata

48ca412 24 days ago

|

history blame contribute delete

1.48 kB

	---
	license: mit
	language:
	- en
	tags:
	- speech-to-speech
	- faster-whisper
	- qwen
	- gguf
	- windows
	- local-ai
	- terminal
	- sapi
	library_name: llama-cpp-python
	pipeline_tag: automatic-speech-recognition
	---
	# Local S2S Shell Starter

	A simple local speech-to-speech assistant that runs from a Windows terminal.

	## Stack

	- STT: faster-whisper medium
	- LLM: Qwen2.5 3B Instruct GGUF Q4_K_M
	- TTS: Windows SAPI voice
	- UI: terminal only

	## Pipeline

	microphone -> faster-whisper -> Qwen2.5 3B GGUF -> Windows SAPI speech

	## Hardware Target

	- CPU fallback supported
	- NVIDIA GPU auto-used when available
	- 8GB+ VRAM recommended for smoother local use

	## Setup

	Run from PowerShell:

	py -3.11 -m venv .venv
	.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
	.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
	.\.venv\Scripts\python.exe -m pip install -r requirements.txt
	.\.venv\Scripts\python.exe download_models.py

	## Run

	.\run_shell_s2s.bat

	## Shell Commands

	Enter = record mic and run speech-to-speech
	t = type text and hear reply
	d = list audio devices
	q = quit

	## Model Download

	The downloader fetches:

	Repo: bartowski/Qwen2.5-3B-Instruct-GGUF
	File: Qwen2.5-3B-Instruct-Q4_K_M.gguf

	The GGUF model file is not committed to this repository.

	## Scope

	This is a local voice-chat starter. It does not control the computer, run tools, or perform system automation.