# English Accent Detection Tool

A practical AI tool that analyzes English accents from video content. Built for REM Waste's hiring automation system.

## 🚀 Live Demo

**Deployed App:** [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)

## Features

- **Video Processing**: Accepts public video URLs (MP4, Loom, etc.)
- **Audio Extraction**: Automatically extracts audio from video files
- **Speech Transcription**: Converts speech to text using Google Speech Recognition
- **Accent Analysis**: Detects English accents with confidence scoring
- **Web Interface**: Simple Streamlit UI for easy testing

## Supported Accents

- American English
- British English  
- Australian English
- Canadian English
- South African English

## Quick Start

### Method 1: Use the Deployed App (Recommended)

1. Visit: [https://accent-detector.streamlit.app](https://accent-detector.streamlit.app)
2. Paste a public video URL
3. Click "Analyze Accent"
4. View results with confidence scores

### Method 2: Local Installation

```bash
# Clone or download the script
git clone <repository-url>
cd accent-detector

# Install dependencies
pip install -r requirements.txt

# Install ffmpeg (required for video processing)
# On macOS:
brew install ffmpeg

# On Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg

# On Windows:
# Download from https://ffmpeg.org/download.html

# Run the app
streamlit run accent_detector.py
```

## Installation

1. Clone this repository and navigate to the project folder.
2. (Recommended) Create and activate a Python virtual environment:
   ```sh
   python3 -m venv ad_venv
   source ad_venv/bin/activate
   ```
3. Install all dependencies:
   ```sh
   pip install -r requirements.txt
   ```
4. (Optional, but recommended for better performance) Install Watchdog, which Streamlit uses for more efficient file-change detection:
   ```sh
   xcode-select --install  # macOS only, for build tools
   pip install watchdog
   ```

## Usage Examples

### Test URLs
```
# Direct MP4 link
https://sample-videos.com/zip/10/mp4/SampleVideo_1280x720_1mb.mp4

# Loom video (public)
https://www.loom.com/share/your-video-id

# Google Drive (public)
https://drive.google.com/file/d/your-file-id/view
```
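
Before submitting a URL, it can help to check which of the categories above it falls into. The following is a minimal sketch using only the standard library; `classify_video_url` is a hypothetical helper for illustration and is not part of `accent_detector.py`.

```python
from urllib.parse import urlparse


def classify_video_url(url: str) -> str:
    """Return 'direct', 'loom', 'gdrive', or 'unknown' for a candidate URL.

    Hypothetical helper: the categories mirror the test URLs listed above.
    """
    parsed = urlparse(url.strip())
    # Only absolute http(s) URLs can be downloaded.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "unknown"

    host = parsed.netloc.lower()
    if host == "loom.com" or host.endswith(".loom.com"):
        return "loom"
    if host == "drive.google.com":
        return "gdrive"
    # Treat common video file extensions as direct links.
    if parsed.path.lower().endswith((".mp4", ".mov", ".webm")):
        return "direct"
    return "unknown"


if __name__ == "__main__":
    print(classify_video_url("https://www.loom.com/share/your-video-id"))
```

A check like this only inspects the URL's shape. It does not confirm that the video is public or reachable.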

### Expected Output
```json
{
  "accent": "American",
  "confidence": 78.5,
  "explanation": "High confidence in American accent with strong linguistic indicators.",
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
```
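
The sample scores above sum to more than 100, so they appear to be independent per-accent scores rather than a probability distribution. A small sketch of how a caller might rank them and measure the gap between the top two candidates follows; `rank_accents` is a hypothetical helper, and the input is the sample output shown above.

```python
import json

SAMPLE_OUTPUT = """
{
  "accent": "American",
  "confidence": 78.5,
  "all_scores": {
    "American": 78.5,
    "British": 23.1,
    "Australian": 15.7,
    "Canadian": 19.2,
    "South African": 8.3
  }
}
"""


def rank_accents(result: dict) -> list:
    """Return (accent, score) pairs sorted from highest to lowest score."""
    return sorted(result["all_scores"].items(), key=lambda kv: kv[1], reverse=True)


result = json.loads(SAMPLE_OUTPUT)
ranked = rank_accents(result)
# Gap between the best and second-best accent; a small gap suggests an ambiguous result.
margin = round(ranked[0][1] - ranked[1][1], 1)

if __name__ == "__main__":
    for accent, score in ranked:
        print(f"{accent}: {score}")
    print(f"Margin over runner-up: {margin}")
```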

## Technical Architecture

### Core Components

1. **Video Downloader**: Downloads videos from public URLs
2. **Audio Extractor**: Uses ffmpeg to extract WAV audio
3. **Speech Recognizer**: Google Speech Recognition API
4. **Accent Analyzer**: Pattern matching for linguistic markers
5. **Web Interface**: Streamlit-based UI
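
As an illustration of the audio extraction step, the sketch below builds an ffmpeg command that drops the video stream and writes mono 16-bit PCM WAV. The function names, the 16 kHz sample rate, and the mono setting are assumptions chosen because they are common for speech recognition; the actual settings in `accent_detector.py` may differ.

```python
import subprocess


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg argument list that extracts mono PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, the usual WAV encoding
        "-ar", str(sample_rate), # output sample rate in Hz
        "-ac", "1",              # mono
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if extraction fails."""
    subprocess.run(
        build_ffmpeg_command(video_path, wav_path),
        check=True,
        capture_output=True,
    )


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("input.mp4", "output.wav")))
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.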

### Accent Detection Algorithm

The system analyzes the transcribed text for several kinds of linguistic features:

- **Vocabulary Patterns**: Accent-specific word choices
- **Phonetic Markers**: Pronunciation characteristics  
- **Spelling Patterns**: Regional spelling differences
- **Linguistic Markers**: Characteristic phrases and expressions
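
To make the pattern-matching idea concrete, here is a minimal sketch that counts accent-specific marker words in a transcript. The marker lists and the `score_accents` function are hypothetical and for illustration only; they are not the lists or the scoring used by `AccentDetector`.

```python
import re
from collections import Counter

# Hypothetical marker vocabulary, for illustration only.
ACCENT_MARKERS = {
    "American": ["gonna", "cookies", "elevator", "apartment", "truck"],
    "British": ["brilliant", "lovely", "quite", "lift", "lorry"],
    "Australian": ["mate", "arvo", "barbie", "reckon"],
}


def score_accents(text: str) -> dict:
    """Count how many marker words for each accent occur in the text."""
    # Lowercase and split into word tokens, keeping apostrophes (e.g. "i'm").
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        accent: sum(words[marker] for marker in markers)
        for accent, markers in ACCENT_MARKERS.items()
    }


if __name__ == "__main__":
    print(score_accents("I'm gonna grab some cookies from the elevator"))
```

Raw counts like these would still need to be normalized (for example by transcript length) before they could be reported as a percentage.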

### Confidence Scoring

- **0-20%**: Insufficient markers detected
- **21-50%**: Moderate confidence with limited indicators
- **51-75%**: Good confidence with multiple patterns
- **76-100%**: High confidence with strong linguistic evidence
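
The bands above can be expressed as a small lookup function. This is a sketch, not the project's code: `confidence_band` is a hypothetical name, and because the listed ranges leave gaps for fractional values (such as 20.5), the sketch assumes each upper bound is inclusive and anything above it falls into the next band.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0-100 confidence score to the descriptions in the table above."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be between 0 and 100, got {confidence}")
    if confidence <= 20:
        return "Insufficient markers detected"
    if confidence <= 50:
        return "Moderate confidence with limited indicators"
    if confidence <= 75:
        return "Good confidence with multiple patterns"
    return "High confidence with strong linguistic evidence"


if __name__ == "__main__":
    print(confidence_band(78.5))
```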

## API Integration

For programmatic access, use the core `AccentDetector` class:

```python
from accent_detector import AccentDetector

detector = AccentDetector()
result = detector.process_video("https://your-video-url.com/video.mp4")

print(f"Accent: {result['accent']}")
print(f"Confidence: {result['confidence']}%")
```
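
When calling the detector on several URLs, one failing video should not abort the rest. The sketch below wraps any processing callable and records failures per URL. It assumes that the callable raises an exception on failure, which this README does not confirm for `process_video`; `process_many` and `fake_process` are hypothetical names used so the example runs on its own.

```python
from typing import Callable, Iterable, List


def process_many(urls: Iterable[str], process: Callable[[str], dict]) -> List[dict]:
    """Run `process` on each URL, capturing errors instead of stopping."""
    results = []
    for url in urls:
        try:
            results.append({"url": url, "ok": True, "result": process(url)})
        except Exception as exc:  # broad on purpose: record any failure and continue
            results.append({"url": url, "ok": False, "error": str(exc)})
    return results


def fake_process(url: str) -> dict:
    """Stand-in for detector.process_video, used only for this demo."""
    if not url.endswith(".mp4"):
        raise ValueError("Failed to download video")
    return {"accent": "American", "confidence": 78.5}


if __name__ == "__main__":
    for entry in process_many(["https://example.com/a.mp4", "https://example.com/b"], fake_process):
        print(entry)
```

With the real class, `process` would be `AccentDetector().process_video`.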

## Deployment

### Streamlit Cloud (Recommended)

1. Fork this repository
2. Connect to Streamlit Cloud
3. Deploy from your GitHub repo
4. Share the public URL

### Docker Deployment

```dockerfile
FROM python:3.9-slim

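# Install system dependencies and clear the apt cache to keep the image small
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt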

COPY . .
EXPOSE 8501

CMD ["streamlit", "run", "accent_detector.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

## Limitations & Considerations

### Current Limitations
- Requires clear speech audio (background noise affects accuracy)
- Works best with 30+ seconds of speech
- The free Google Speech Recognition service has daily usage limits
- Accent detection is based on vocabulary and text patterns in the transcript, not on phonetic analysis of the audio

### Potential Improvements
- Integrate phonetic analysis libraries
- Add more accent varieties (Indian, Irish, etc.)
- Implement batch processing for multiple videos
- Add voice activity detection for better audio segmentation

## Testing

### Manual Testing
1. Test with different accent samples
2. Verify confidence scores are reasonable
3. Check error handling with invalid URLs
4. Test with various video formats

### Automated Testing
```python
from accent_detector import AccentDetector


def test_accent_detection():
    detector = AccentDetector()

    # Text with American-leaning vocabulary should score higher for American
    american_text = "I'm gonna grab some cookies from the elevator"
    scores = detector.analyze_accent_patterns(american_text)
    assert scores['American'] > scores['British']

    # Text with British-leaning vocabulary should score higher for British
    british_text = "That's brilliant, quite lovely indeed"
    scores = detector.analyze_accent_patterns(british_text)
    assert scores['British'] > scores['American']
```

## Performance Metrics

- **Video Download**: ~10-30 seconds (depends on file size)
- **Audio Extraction**: ~5-15 seconds
- **Speech Recognition**: ~10-30 seconds
- **Accent Analysis**: <1 second
- **Total Processing**: ~30-90 seconds per video
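
To check these figures on your own hardware and network, each stage can be wrapped in a timer. The following is a minimal sketch using the standard library; `timed` is a hypothetical helper, and the `time.sleep` call stands in for a real pipeline stage.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the duration even if the stage raised an exception.
        timings[stage] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with timed("accent_analysis", timings):
        time.sleep(0.05)  # placeholder for a real stage
    for stage, seconds in timings.items():
        print(f"{stage}: {seconds:.2f}s")
```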

## Troubleshooting

### Common Issues

**Error: "Could not understand the audio"**
- Solution: Ensure clear speech, minimal background noise

**Error: "Failed to download video"**  
- Solution: Verify URL is public and accessible

**Error: "ffmpeg not found"**
- Solution: Install ffmpeg system dependency

**Low confidence scores**
- Solution: Ensure longer speech samples (30+ seconds)
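
The "ffmpeg not found" case can be detected before any video is processed. A short sketch using `shutil.which` from the standard library is shown below; `find_missing_tools` is a hypothetical helper, not part of the app.

```python
import shutil


def find_missing_tools(tools=("ffmpeg",)) -> list:
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```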

### Support

For technical issues or feature requests:
1. Check the error messages in the Streamlit interface
2. Verify all dependencies are installed correctly  
3. Test with known working video URLs

## License

MIT License - Free for commercial and personal use.

---

**Built for REM Waste Interview Challenge**  
*Practical AI tools for automated hiring decisions*