---
title: Word Counter
emoji: 🎤
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.10.0
app_file: app.py
pinned: false
---

# Word Counter

A real-time speech recognition app that counts occurrences of a user-specified word from microphone input.

## How to Use

1. Enter the word you want to count in the **Target Word** field
2. Click the microphone button to start recording
3. Speak naturally; the app transcribes your speech in real time
4. Watch the counter increment each time it detects your target word
5. Click **Reset** to clear the counter and start over

## Features

- 🎤 Real-time speech recognition using OpenAI Whisper
- 🔢 Live counter updates
- 🎯 Case-insensitive word matching with proper word boundaries
- 🔄 Reset functionality
- 💻 Runs on CPU (no GPU required)

## Technical Details

- **Speech Recognition**: Whisper Tiny model for fast CPU inference
- **Framework**: Gradio for the user interface
- **Word Detection**: Regex-based matching with word boundaries to avoid false positives
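
The word-detection approach described above can be sketched as follows. This is a minimal illustration, not the code in `counter/`: the function name `count_word` is hypothetical, and it assumes the target is a single word or phrase made of ordinary word characters.

```python
import re


def count_word(transcript: str, target: str) -> int:
    """Count whole-word, case-insensitive occurrences of `target` in `transcript`.

    Hypothetical helper for illustration; the app's real function may differ.
    """
    target = target.strip()
    if not target:
        # An empty target would match at every word boundary, so treat it as "no matches".
        return 0
    # re.escape keeps any special characters in the target literal;
    # \b on both sides stops "cat" from matching inside "concatenate".
    pattern = re.compile(rf"\b{re.escape(target)}\b", re.IGNORECASE)
    return len(pattern.findall(transcript))


print(count_word("The cat sat. Cat! concatenate", "cat"))
```

Without the `\b` anchors, a plain substring search would also count the `cat` inside `concatenate`, which is the kind of false positive the word boundaries are meant to prevent.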

## Architecture

The app is built with separation of concerns:
- `counter/` - Counter state management and word detection logic
- `model/` - Speech recognition wrapper
- `ui/` - Gradio interface components (now integrated into `app.py`)
- `app.py` - Main application with all components wired together
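
As a rough sketch of how these pieces might fit together, the example below keeps counter state in one class and takes the speech recognizer as a plain callable. All names here (`WordCounter`, `process_audio`, `transcribe`) are assumptions made for illustration and are not taken from the repository; the real modules may be organized differently.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class WordCounter:
    """Hypothetical counter state: a target word and a running total."""

    target: str
    count: int = 0

    def update(self, transcript_chunk: str) -> int:
        """Add the matches found in one transcript chunk and return the new total."""
        word = self.target.strip()
        if word:
            pattern = rf"\b{re.escape(word)}\b"
            self.count += len(re.findall(pattern, transcript_chunk, re.IGNORECASE))
        return self.count

    def reset(self) -> None:
        """Clear the running total, as the Reset button would."""
        self.count = 0


def process_audio(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    counter: WordCounter,
) -> Tuple[str, int]:
    """Transcribe one audio chunk and update the counter.

    `transcribe` stands in for the speech recognition wrapper; passing it in
    as a callable keeps the counting logic testable without loading a model.
    """
    text = transcribe(audio)
    return text, counter.update(text)


# Usage with a stub transcriber instead of a real model.
counter = WordCounter(target="well")
print(process_audio(b"", lambda _audio: "Well, well then", counter))
```

Keeping the recognizer behind a callable is one way to preserve the separation of concerns listed above: the counter never needs to know which model produced the text.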

## Local Development

### Prerequisites

Install ffmpeg (required for audio processing):

**macOS:**
```bash
brew install ffmpeg
```

**Ubuntu/Debian:**
```bash
sudo apt-get install ffmpeg
```

**Windows:**
```bash
# Using chocolatey
choco install ffmpeg

# Or download from https://ffmpeg.org/download.html
```

### Setup

```bash
# Install dependencies with uv
uv sync

# (Optional) Configure Hugging Face token for faster downloads
# Copy .env.example to .env and add your token
cp .env.example .env
# Edit .env and add your HF token: HF_TOKEN="your_token_here"

# Run the app
uv run python app.py
```

**Note:** `HF_TOKEN` is optional but recommended: it avoids rate limits and warnings when downloading models from the Hugging Face Hub. Get your token at https://huggingface.co/settings/tokens
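
How `app.py` reads the `.env` file is not shown in this README; a library such as `python-dotenv` is a common choice. For illustration only, the sketch below does the same job with the standard library. The helper name `load_env_file` is hypothetical, and the parser handles only simple `KEY="value"` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=value pairs from a .env file into os.environ.

    Hypothetical helper: variables already set in the environment are kept,
    and a missing file is silently ignored because the token is optional.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. HF_TOKEN="your_token_here".
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


load_env_file()
if not os.environ.get("HF_TOKEN"):
    print("HF_TOKEN not set; model downloads may be rate limited.")
```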

## Testing

```bash
# Test counter logic
uv run python test_counter.py

# Test speech recognition
uv run python test_speech.py [optional-audio-file.wav]

# Test UI
uv run python test_ui.py
```