gamlin committed on
Commit
71e9395
·
verified ·
0 Parent(s):

initial commit

Files changed (2)
  1. .gitattributes +35 -0
  2. README.md +614 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,614 @@
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - vicidial
5
+ - call-center
6
+ - speech
7
+ - analytics
8
+ - call
9
+ ---
10
+
11
+ # Speech Analytics for Call Centers: From Call Recordings to Automated QA Without a Six-Figure Platform
12
+
15
+ ## Overview
16
+
17
+ **Last updated: March 2026 | Reading time: ~26 minutes**
18
+
19
+ Here is the dirty secret of call center QA: most operations review 1-2% of their calls. A QA analyst listens to maybe 5-10 recordings per agent per month, fills out a scorecard, and hopes that sample is representative.
20
+
21
+ It is not. Two percent coverage means 98% of your calls -- including compliance violations, missed upsells, and the call where your best agent snapped at a customer -- go completely unreviewed.
22
+
23
+ Speech analytics changes that equation. Automated transcription and analysis can process 100% of your calls, flag the ones that matter, and hand your QA team a prioritized list instead of a random sample. McKinsey data shows contact centers using speech analytics see a 10% improvement in customer satisfaction scores. Sprinklr reports 20-30% cost savings and a 40% productivity boost when speech analytics is implemented properly.
24
+
25
+ The problem is price. Enterprise speech analytics platforms from CallMiner, Verint, and NICE run $50K-$200K per year for a mid-sized operation. That prices out most call centers under 100 agents.
26
+
27
+ But the underlying technology -- transcription via Whisper, sentiment analysis via open-source NLP models, keyword detection via pattern matching -- is available for the cost of a decent GPU server and some engineering time. If you already run [VICIdial call recording](/blog/vicidial-call-recording/), you are sitting on a goldmine of unanalyzed audio data.
28
+
29
+ This guide covers how to build a working speech analytics pipeline: from call recordings to transcripts to automated [QA scoring](/blog/vicidial-qa-scoring/), using tools you can actually afford.
30
+
31
+ ## What Speech Analytics Actually Does
32
+
33
+ Strip away the vendor marketing and speech analytics breaks down into four concrete capabilities:
34
+
35
+ ### 1. Transcription (Speech-to-Text)
36
+
37
+ Convert audio recordings into searchable text. This is the foundation. Without accurate transcripts, nothing else works.
38
+
39
+ Modern transcription accuracy with Whisper-class models runs 92-97% on clean call center audio (a word error rate of roughly 3-8%). That is good enough for keyword detection and sentiment analysis, though you will still want human review on compliance-critical calls.
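Word error rate (WER) is the usual way to check transcript quality on your own audio: count the word-level substitutions, insertions, and deletions needed to turn the machine transcript into a human-verified one, then divide by the number of words in the human version. Accuracy is simply `1 - WER`. The sketch below is a minimal, dependency-free version; the function name and the whitespace tokenization are assumptions for illustration, not part of any pipeline described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Hand-correcting a few dozen transcripts and running them through a check like this gives a measured accuracy figure for your own recordings rather than a vendor's benchmark number.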
40
+
41
+ ### 2. Keyword and Phrase Detection
42
+
43
+ Search transcripts for specific words and phrases. The two main use cases:
44
+
45
+ **Compliance monitoring** -- detect when agents skip required disclosures, make unauthorized promises, or use prohibited language.
46
+
47
+ **Sales intelligence** -- identify competitor mentions, objection patterns, buying signals, and pricing discussions.
48
+
49
+ ### 3. Sentiment Analysis
50
+
51
+ Score the emotional tone of the conversation. Most implementations track:
52
+
53
+ - **Agent sentiment** -- are they frustrated, bored, engaged, professional?
54
+ - **Customer sentiment** -- are they angry, confused, interested, ready to buy?
55
+ - **Sentiment trajectory** -- did the call start negative and end positive (good recovery) or start positive and end negative (lost sale)?
56
+
57
+ ### 4. Automated QA Scoring
58
+
59
+ Combine transcription, keywords, and sentiment into an automated quality score for each call. Instead of manually scoring 5 calls per agent per month, the system scores every call and surfaces the outliers for human review.
60
+
61
+ Opus Research found that 68% of companies using speech analytics saw it as a cost-saving tool, and 52% saw direct revenue improvement. The ROI comes from doing more with less -- not replacing QA analysts, but focusing their time on the 5% of calls that actually need human attention.
62
+
63
+ ## Setting Up the Recording Pipeline
64
+
65
+ Speech analytics starts with audio files. If your recordings are bad, your transcripts will be bad, and your analytics will be useless.
66
+
67
+ ### VICIdial Recording Configuration
68
+
69
+ Set recording at the campaign level for full coverage:
70
+
71
+ ```
72
+ Campaign > Detail > Recording: ALLFORCE
73
+ Campaign > Detail > Recording Method: STEREO
74
+ Campaign > Detail > Recording Filename: FULLDATE_AGENT_CUSTPHONE
75
+ ```
76
+
77
+ The key settings:
78
+
79
+ - **ALLFORCE** records every call regardless of agent action. ONDEMAND leaves recording to the agent (they will forget or skip it), and ALLCALLS starts recording automatically but still lets the agent stop it.
80
+ - **STEREO** records agent and customer on separate channels. This is critical for speech analytics because it lets you run sentiment analysis on each speaker independently. MONO mixes both channels, making speaker separation impossible.
81
+ - **Filename format** with date, agent, and customer phone makes it easy to correlate recordings with CDR data later.
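To correlate a recording with CDR data, the filename has to be split back into its fields. The sketch below assumes that `FULLDATE_AGENT_CUSTPHONE` produces names shaped like `YYYYMMDD-HHMMSS_<agent>_<phone>.wav`; that layout, the regular expression, and the function name are assumptions, so check them against a few real filenames on your system before relying on this.

```python
import re
from datetime import datetime

# Assumed layout: YYYYMMDD-HHMMSS_<agent>_<customer phone>.wav
FILENAME_RE = re.compile(
    r"^(?P<stamp>\d{8}-\d{6})_(?P<agent>[^_]+)_(?P<phone>\d+)\.wav$"
)

def parse_recording_name(filename):
    """Return call start time, agent ID and customer phone, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return {
        "started_at": datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S"),
        "agent_id": match.group("agent"),
        "customer_phone": match.group("phone"),
    }
```

Returning `None` for unexpected names, instead of raising, lets a batch job skip stray files and keep going.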
82
+
83
+ ### Recording Storage and Format
84
+
85
+ VICIdial stores recordings in `/var/spool/asterisk/monitor/` by default, organized by date. The default format is WAV (uncompressed).
86
+
87
+ For speech analytics processing, you want WAV files -- not MP3. Transcription models work better with uncompressed audio. If storage is a concern, compress after transcription:
88
+
89
+ ```bash
90
+ # Check recording storage usage
91
+ du -sh /var/spool/asterisk/monitor/
92
+ du -sh /var/spool/asterisk/monitor/$(date +%Y/%m/%d)/
93
+
94
+ # Count recordings per day
95
+ find /var/spool/asterisk/monitor/$(date +%Y/%m/%d)/ -name "*.wav" | wc -l
96
+ ```
97
+
98
+ A typical 50-agent operation generates 3-5 GB of WAV recordings per day. That is about 1.5 TB per year. Budget your storage accordingly.
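The storage figure can be sanity-checked from first principles: uncompressed WAV size is sample rate times bytes per sample times channels times duration. The sketch below assumes 8 kHz, 16-bit PCM (typical for telephony, but an assumption about your setup) and borrows the 500 calls per day at 4 minutes each used elsewhere in this guide; the function names are illustrative.

```python
def wav_bytes_per_minute(sample_rate_hz=8000, sample_width_bytes=2, channels=2):
    """Uncompressed PCM size of one minute of audio, ignoring the small WAV header."""
    return sample_rate_hz * sample_width_bytes * channels * 60

def daily_storage_gb(calls_per_day, avg_minutes, **wav_params):
    """Estimated recording volume per day in decimal gigabytes."""
    total_bytes = calls_per_day * avg_minutes * wav_bytes_per_minute(**wav_params)
    return total_bytes / 1e9

stereo_gb = daily_storage_gb(500, 4)            # two channels
mono_gb = daily_storage_gb(500, 4, channels=1)  # single mixed channel
print(f"stereo: {stereo_gb:.2f} GB/day, mono: {mono_gb:.2f} GB/day")
```

Under these assumptions stereo recording lands inside the 3-5 GB per day range quoted above, and mono takes half of that.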
99
+
100
+ ### Centralizing Recordings for Processing
101
+
102
+ If you are running a multi-server VICIdial cluster, recordings live on whichever server handled the call. You need to centralize them before processing.
103
+
104
+ Set up an rsync job to pull recordings to your analytics server:
105
+
106
+ ```bash
107
+ #!/bin/bash
+ # sync-recordings.sh - Pull recordings from VICIdial servers to analytics box
+ ANALYTICS_DIR="/data/recordings"
+ VICIDIAL_SERVERS=("vici-web1" "vici-tel1" "vici-tel2")
+ TODAY=$(date +%Y/%m/%d)
+
+ # rsync only creates the last path component, so create the dated directory first
+ mkdir -p "${ANALYTICS_DIR}/${TODAY}"
+
+ for server in "${VICIDIAL_SERVERS[@]}"; do
+     rsync -avz --include="*.wav" --exclude="*" \
+         "${server}:/var/spool/asterisk/monitor/${TODAY}/" \
+         "${ANALYTICS_DIR}/${TODAY}/"
+ done
118
+ ```
119
+
120
+ Run this via cron every hour during operating hours. Set it up on the analytics server (not the VICIdial servers -- you don't want rsync load on your telephony boxes during peak dialing).
121
+
122
+ ## Deploying the Transcription Engine
123
+
124
+ This is where the magic (and the compute cost) lives. You need a speech-to-text model that can process hundreds of recordings per day with acceptable accuracy.
125
+
126
+ ### Option 1: OpenAI Whisper (Open Source, Self-Hosted)
127
+
128
+ Whisper is OpenAI's open-source speech recognition model. The `large-v3` model delivers near-human accuracy on English call center audio. It runs on any NVIDIA GPU with 10+ GB VRAM.
129
+
130
+ Install Whisper on your GPU server:
131
+
132
+ ```bash
133
+ pip install openai-whisper
134
+
135
+ # Or for faster inference, use faster-whisper (CTranslate2 backend)
136
+ pip install faster-whisper
137
+ ```
138
+
139
+ faster-whisper is the practical choice for production. Its maintainers report up to 4x faster inference than vanilla Whisper at the same accuracy, with lower memory use.
140
+
141
+ ### Batch Transcription Script
142
+
143
+ ```python
144
+ #!/usr/bin/env python3
+ """batch_transcribe.py - Transcribe call recordings using faster-whisper"""
+
+ import os
+ import sys
+ import json
+ import glob
+ from datetime import datetime
+ from faster_whisper import WhisperModel
+
+ MODEL_SIZE = "large-v3"
+ DEVICE = "cuda"  # use "cpu" if no GPU (much slower)
+ COMPUTE_TYPE = "float16"  # use "int8" on older GPUs for speed
+ RECORDINGS_DIR = "/data/recordings"
+ TRANSCRIPTS_DIR = "/data/transcripts"
+ BATCH_SIZE = 100  # cap per run; rerun (or raise this) until nothing is pending
+
+ model = WhisperModel(MODEL_SIZE, device=DEVICE, compute_type=COMPUTE_TYPE)
+
+ def transcribe_file(wav_path):
+     """Transcribe a single WAV file and return segments with timestamps."""
+     segments, info = model.transcribe(wav_path, beam_size=5, language="en")
+     result = {
+         "file": os.path.basename(wav_path),
+         "language": info.language,
+         "language_probability": round(info.language_probability, 3),
+         "duration_seconds": round(info.duration, 1),
+         "segments": []
+     }
+     full_text = []
+     for segment in segments:
+         result["segments"].append({
+             "start": round(segment.start, 2),
+             "end": round(segment.end, 2),
+             "text": segment.text.strip()
+         })
+         full_text.append(segment.text.strip())
+     result["full_text"] = " ".join(full_text)
+     return result
+
+ def process_day(date_str):
+     """Process all recordings for a given date."""
+     day_dir = os.path.join(RECORDINGS_DIR, date_str.replace("-", "/"))
+     if not os.path.isdir(day_dir):
+         print(f"No recordings directory for {date_str}")
+         return
+
+     out_dir = os.path.join(TRANSCRIPTS_DIR, date_str.replace("-", "/"))
+     os.makedirs(out_dir, exist_ok=True)
+
+     wav_files = glob.glob(os.path.join(day_dir, "*.wav"))
+     # A transcript named <recording>.json means that recording is already done
+     already_done = set(
+         f.replace(".json", ".wav")
+         for f in os.listdir(out_dir) if f.endswith(".json")
+     )
+     pending = [f for f in wav_files if os.path.basename(f) not in already_done]
+     print(f"{date_str}: {len(pending)} new recordings to transcribe "
+           f"({len(already_done)} already done)")
+
+     for i, wav_path in enumerate(pending[:BATCH_SIZE]):
+         try:
+             result = transcribe_file(wav_path)
+             out_file = os.path.join(
+                 out_dir,
+                 os.path.basename(wav_path).replace(".wav", ".json")
+             )
+             with open(out_file, "w") as f:
+                 json.dump(result, f, indent=2)
+             if (i + 1) % 10 == 0:
+                 print(f"  Transcribed {i+1}/{len(pending[:BATCH_SIZE])}")
+         except Exception as e:
+             print(f"  ERROR transcribing {wav_path}: {e}")
+
+ if __name__ == "__main__":
+     target_date = sys.argv[1] if len(sys.argv) > 1 else datetime.now().strftime("%Y-%m-%d")
+     process_day(target_date)
220
+ ```
221
+
222
+ On an NVIDIA RTX 3090 with faster-whisper large-v3, expect about 30 seconds of processing per minute of audio. A 3-minute call takes ~90 seconds to transcribe. A 50-agent operation generating 500 calls per day (average 4 minutes each) needs about 17 hours of GPU time -- just barely doable in 24 hours on a single GPU.
223
+
224
+ For larger operations, run multiple GPUs or use the `medium` model (2x faster, ~3% lower accuracy). The accuracy tradeoff is worth it above 1,000 calls per day.
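The capacity math above generalizes to any call volume. This sketch takes the 30 seconds of processing per minute of audio quoted for the RTX 3090 as its default; that rate is the figure to replace with a measurement from your own hardware, and the function names are illustrative.

```python
import math

def gpu_hours_per_day(calls_per_day, avg_call_minutes, processing_sec_per_audio_min=30.0):
    """GPU time needed to transcribe one day of recordings on a single GPU."""
    audio_minutes = calls_per_day * avg_call_minutes
    return audio_minutes * processing_sec_per_audio_min / 3600

def gpus_needed(calls_per_day, avg_call_minutes, window_hours,
                processing_sec_per_audio_min=30.0):
    """Smallest number of GPUs that finishes the day's audio inside the window."""
    hours = gpu_hours_per_day(calls_per_day, avg_call_minutes,
                              processing_sec_per_audio_min)
    return math.ceil(hours / window_hours)

print(f"{gpu_hours_per_day(500, 4):.1f} GPU-hours per day")
print(f"{gpus_needed(500, 4, window_hours=8)} GPUs for an 8-hour overnight window")
```

The window matters as much as the total: a job that fits in 24 hours on one GPU can still miss a morning deadline if it only starts after the last call of the day.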
225
+
226
+ ### Option 2: Cloud Transcription APIs
227
+
228
+ If you do not want to manage GPU infrastructure:
229
+
230
+ | Provider | Cost per Minute | Accuracy | Latency |
231
+ |---|:---:|:---:|:---:|
232
+ | Deepgram | $0.0043 | 95%+ | Real-time |
233
+ | AssemblyAI | $0.0065 | 94%+ | Near real-time |
234
+ | Google Speech-to-Text | $0.009 | 93%+ | Near real-time |
235
+ | AWS Transcribe | $0.024 | 92%+ | Batch |
236
+
237
+ At $0.0043/minute (Deepgram), a 50-agent operation with 500 calls averaging 4 minutes (2,000 minutes of audio) costs about $8.60/day or ~$260/month. That is far cheaper than enterprise speech analytics platforms, but the bill grows linearly with call volume.
238
+
239
+ Self-hosted Whisper costs you only the GPU hardware (~$1,500 one-time for a used RTX 3090 or $300/month for a cloud GPU instance). At 500 calls per day that is comparable to the API bill, so self-hosting becomes the better choice as volume grows, provided you have the engineering capacity to maintain it.
240
+
241
+ ## Building Keyword and Compliance Detection
242
+
243
+ With transcripts in hand, the next layer is keyword detection. This is the simplest and highest-ROI part of the pipeline.
244
+
245
+ ### Compliance Keyword Lists
246
+
247
+ Define keyword groups based on what your QA team needs to catch:
248
+
249
+ ```json
250
+ {
+   "compliance_required": {
+     "description": "Phrases agents MUST say on every call",
+     "phrases": [
+       "this call may be recorded",
+       "this call is being recorded",
+       "for quality and training purposes",
+       "my name is"
+     ],
+     "alert_on": "missing"
+   },
+   "compliance_prohibited": {
+     "description": "Phrases agents must NEVER say",
+     "phrases": [
+       "i guarantee",
+       "guaranteed results",
+       "no risk",
+       "100 percent",
+       "i promise"
+     ],
+     "alert_on": "present"
+   },
+   "competitor_mentions": {
+     "description": "Competitor names for competitive intelligence",
+     "phrases": [
+       "five9", "convoso", "genesys", "talkdesk",
+       "ringcentral", "nice incontact", "dialpad"
+     ],
+     "alert_on": "present"
+   },
+   "buying_signals": {
+     "description": "Positive buying indicators",
+     "phrases": [
+       "how much does it cost",
+       "what are the terms",
+       "when can we start",
+       "send me the contract",
+       "sounds good"
+     ],
+     "alert_on": "present"
+   },
+   "objection_patterns": {
+     "description": "Common customer objections",
+     "phrases": [
+       "too expensive",
+       "not interested",
+       "already have",
+       "call me back",
+       "take me off your list"
+     ],
+     "alert_on": "present"
+   }
+ }
303
+ ```
304
+
305
+ ### Keyword Detection Script
306
+
307
+ ```python
308
+ #!/usr/bin/env python3
+ """keyword_scan.py - Scan transcripts for compliance and intelligence keywords"""
+
+ import json
+ import os
+ import sys
+ import glob
+ from datetime import datetime
+
+ KEYWORDS_FILE = "/data/analytics/keyword_config.json"
+ TRANSCRIPTS_DIR = "/data/transcripts"
+
+ def load_keywords():
+     with open(KEYWORDS_FILE) as f:
+         return json.load(f)
+
+ def scan_transcript(transcript, keywords):
+     """Scan a single transcript against all keyword groups."""
+     text_lower = transcript["full_text"].lower()
+     results = {"file": transcript["file"], "flags": [], "score": 100}
+
+     for group_name, group in keywords.items():
+         matched = [p for p in group["phrases"] if p.lower() in text_lower]
+
+         if group["alert_on"] == "missing":
+             # Every phrase in a "missing" group is treated as required
+             missing = [p for p in group["phrases"] if p.lower() not in text_lower]
+             if missing:
+                 results["flags"].append({
+                     "group": group_name,
+                     "type": "missing_required",
+                     "missing_phrases": missing,
+                     "severity": "high"
+                 })
+                 results["score"] -= 15 * len(missing)
+
+         elif group["alert_on"] == "present" and matched:
+             severity = "high" if "prohibited" in group_name else "info"
+             results["flags"].append({
+                 "group": group_name,
+                 "type": "detected",
+                 "matched_phrases": matched,
+                 "severity": severity
+             })
+             if "prohibited" in group_name:
+                 results["score"] -= 25 * len(matched)
+
+     results["score"] = max(0, results["score"])
+     return results
+
+ def scan_day(date_str):
+     """Scan all transcripts for a given date."""
+     keywords = load_keywords()
+     day_dir = os.path.join(TRANSCRIPTS_DIR, date_str.replace("-", "/"))
+     if not os.path.isdir(day_dir):
+         return []
+
+     results = []
+     for json_file in glob.glob(os.path.join(day_dir, "*.json")):
+         with open(json_file) as f:
+             transcript = json.load(f)
+         result = scan_transcript(transcript, keywords)
+         results.append(result)
+
+     flagged = [r for r in results if r["flags"]]
+     print(f"{date_str}: {len(results)} calls scanned, "
+           f"{len(flagged)} flagged ({len(flagged)/max(len(results),1)*100:.1f}%)")
+
+     high_severity = [r for r in results if any(
+         fl["severity"] == "high" for fl in r["flags"]
+     )]
+     if high_severity:
+         print(f"  HIGH SEVERITY FLAGS: {len(high_severity)} calls need immediate review")
+         for r in high_severity[:5]:
+             print(f"    {r['file']}: score={r['score']}, "
+                   f"flags={[fl['group'] for fl in r['flags']]}")
+
+     return results
+
+ if __name__ == "__main__":
+     # Entry point so the script does something when called with a date (YYYY-MM-DD)
+     target_date = sys.argv[1] if len(sys.argv) > 1 else datetime.now().strftime("%Y-%m-%d")
+     scan_day(target_date)
384
+ ```
385
+
386
+ This runs in seconds even on thousands of transcripts because it is just string matching. No GPU needed. Schedule it to run right after transcription completes.
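Two refinements are worth considering once the basic scan works. First, a "required" group often lists alternative wordings of the same disclosure, and an agent only needs to say one of them. Second, plain substring matching can miss "100%" when the list says "100 percent", or fire inside a longer word. The sketch below shows one way to handle both; the requirement names, the dictionary layout, and the helper functions are all hypothetical and are not part of the config format shown earlier.

```python
import re

def normalize(text):
    """Lowercase, spell out '%', and collapse punctuation into single spaces."""
    text = text.lower().replace("%", " percent")
    return re.sub(r"[^a-z0-9]+", " ", text).strip()

def phrase_present(phrase, normalized_text):
    """Whole-word match, so 'no risk' does not fire on 'casino risk'."""
    pattern = r"\b" + re.escape(normalize(phrase)) + r"\b"
    return re.search(pattern, normalized_text) is not None

def missing_requirements(text, requirements):
    """Return the requirement names for which none of the alternative phrases was said."""
    norm = normalize(text)
    return [
        name for name, alternatives in requirements.items()
        if not any(phrase_present(p, norm) for p in alternatives)
    ]

# Hypothetical requirement set: each entry is satisfied by ANY of its phrases
REQUIREMENTS = {
    "recording_disclosure": [
        "this call may be recorded",
        "this call is being recorded",
    ],
    "agent_introduction": ["my name is"],
}
```

With this shape, a call that includes "this call is being recorded" is not penalized for omitting "this call may be recorded".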
387
+
388
+ ## Adding Sentiment Analysis
389
+
390
+ Sentiment scoring tells you how calls *feel*, not just what was said. An agent who hits every compliance checkbox but sounds dead inside is still going to lose sales.
391
+
392
+ ### Lightweight Sentiment Scoring
393
+
394
+ You do not need a massive NLP model for call center sentiment. A fine-tuned transformer running on CPU handles it fine. The `cardiffnlp/twitter-roberta-base-sentiment-latest` model from Hugging Face is a solid starting point -- it is small, fast, and accurate enough for conversation segments.
395
+
396
+ ```python
397
+ #!/usr/bin/env python3
+ """sentiment_score.py - Score transcript segments for sentiment"""
+
+ from transformers import pipeline
+ import glob
+ import json
+ import os
+ import sys
+
+ TRANSCRIPTS_DIR = "/data/transcripts"
+
+ sentiment_model = pipeline(
+     "sentiment-analysis",
+     model="cardiffnlp/twitter-roberta-base-sentiment-latest",
+     device=-1  # CPU; use 0 for GPU
+ )
+
+ LABEL_MAP = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}
+
+ def score_transcript(transcript_path):
+     """Score each segment and compute call-level sentiment metrics."""
+     with open(transcript_path) as f:
+         data = json.load(f)
+
+     segments = data.get("segments", [])
+     if not segments:
+         return None
+
+     scored_segments = []
+     for seg in segments:
+         text = seg["text"].strip()
+         if len(text) < 10:
+             continue  # too short to carry a meaningful sentiment signal
+         try:
+             result = sentiment_model(text[:512])[0]
+             label = result["label"].lower()
+             scored_segments.append({
+                 "start": seg["start"],
+                 "end": seg["end"],
+                 "text": text,
+                 "sentiment": label,
+                 "confidence": round(result["score"], 3),
+                 "sentiment_value": LABEL_MAP.get(label, 0.0)
+             })
+         except Exception:
+             pass  # skip segments the model cannot score
+
+     if not scored_segments:
+         return None
+
+     values = [s["sentiment_value"] for s in scored_segments]
+     n = len(values)
+     first_third = values[:n//3] if n >= 3 else values
+     last_third = values[-(n//3):] if n >= 3 else values
+
+     return {
+         "file": data["file"],
+         "total_segments": len(scored_segments),
+         "average_sentiment": round(sum(values) / len(values), 3),
+         "opening_sentiment": round(sum(first_third) / max(len(first_third), 1), 3),
+         "closing_sentiment": round(sum(last_third) / max(len(last_third), 1), 3),
+         "sentiment_trajectory": round(
+             (sum(last_third) / max(len(last_third), 1)) -
+             (sum(first_third) / max(len(first_third), 1)), 3
+         ),
+         "negative_segment_count": sum(1 for s in scored_segments if s["sentiment"] == "negative"),
+         "positive_segment_count": sum(1 for s in scored_segments if s["sentiment"] == "positive"),
+         "segments": scored_segments
+     }
+
+ if __name__ == "__main__":
+     # Minimal entry point: score every transcript for a date (YYYY-MM-DD) and
+     # print one summary line per call. Persisting the results is left to you.
+     if len(sys.argv) != 2:
+         sys.exit("usage: sentiment_score.py YYYY-MM-DD")
+     day_dir = os.path.join(TRANSCRIPTS_DIR, sys.argv[1].replace("-", "/"))
+     for path in sorted(glob.glob(os.path.join(day_dir, "*.json"))):
+         scores = score_transcript(path)
+         if scores:
+             print(f"{scores['file']}: avg={scores['average_sentiment']}, "
+                   f"trajectory={scores['sentiment_trajectory']}")
462
+ ```
463
+
464
+ ### What the Sentiment Numbers Mean
465
+
466
+ | Metric | Good Range | Warning | Action Required |
467
+ |---|:---:|:---:|---|
468
+ | Average Sentiment | 0.1 to 0.5 | -0.1 to 0.1 | Below -0.1 |
469
+ | Opening Sentiment | 0.2+ | 0.0 to 0.2 | Below 0.0 |
470
+ | Closing Sentiment | 0.3+ | 0.0 to 0.3 | Below 0.0 |
471
+ | Sentiment Trajectory | Positive (going up) | Flat | Negative (going down) |
472
+ | Negative Segment % | Under 15% | 15-30% | Over 30% |
473
+
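The thresholds in the table above can be encoded directly so that every scored call gets a band per metric. This is a sketch under stated assumptions: the table does not say which band owns a boundary value, so the code puts each boundary in the better band, and it treats an average above 0.5 as good. The function and constant names are illustrative.

```python
# (action_required_below, good_from) per metric, taken from the table above
THRESHOLDS = {
    "average_sentiment": (-0.1, 0.1),
    "opening_sentiment": (0.0, 0.2),
    "closing_sentiment": (0.0, 0.3),
}

def classify(metric, value):
    """Map a sentiment metric to 'good', 'warning' or 'action_required'."""
    action_below, good_from = THRESHOLDS[metric]
    if value < action_below:
        return "action_required"
    if value < good_from:
        return "warning"
    return "good"

def classify_negative_share(negative_segments, total_segments):
    """Band the share of negative segments: under 15%, 15-30%, over 30%."""
    share = negative_segments / max(total_segments, 1)
    if share > 0.30:
        return "action_required"
    if share >= 0.15:
        return "warning"
    return "good"
```

Sentiment trajectory is left out on purpose: what counts as "flat" needs a tolerance that the table does not define.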
474
+ The most important metric is **sentiment trajectory**. A call that starts negative and ends positive means the agent recovered the situation. A call that starts positive and ends negative means the agent lost the customer during the conversation -- and that is where your coaching should focus.
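The metrics above are computed per call. To score agent and customer separately, which is the reason for recording in stereo, each channel has to be transcribed on its own. The sketch below splits a 16-bit stereo WAV into two mono files using only the standard library. Which channel holds the agent and which the customer is an assumption you need to verify by listening to a sample recording; the function name and output paths are illustrative.

```python
import wave
from array import array

def split_stereo(src_path, left_path, right_path):
    """Write the left and right channels of a 16-bit stereo WAV as two mono WAVs."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        samples = array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # Stereo frames are interleaved: left, right, left, right, ...
    for path, channel in ((left_path, samples[0::2]), (right_path, samples[1::2])):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

Each mono file can then go through the same transcription and sentiment steps, with the results tagged by speaker.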
475
+
476
+ ## Connecting Analytics to Your QA Workflow
477
+
478
+ The analytics pipeline produces three outputs: transcripts, keyword flags, and sentiment scores. The last step is turning those into an actionable QA workflow.
479
+
480
+ ### Priority-Based Review Queue
481
+
482
+ Instead of random sampling, build a review queue that prioritizes calls based on risk:
483
+
484
+ ```python
485
+ def build_review_queue(keyword_results, sentiment_results):
+     """Merge keyword and sentiment data into a prioritized review queue."""
+     queue = []
+     sentiment_lookup = {s["file"]: s for s in sentiment_results if s}
+
+     for kr in keyword_results:
+         sent = sentiment_lookup.get(kr["file"], {})
+         priority = 0
+
+         # High-severity keyword flags
+         high_flags = [f for f in kr["flags"] if f["severity"] == "high"]
+         priority += len(high_flags) * 30
+
+         # Low keyword QA score
+         if kr["score"] < 60:
+             priority += 20
+
+         # Negative sentiment trajectory
+         trajectory = sent.get("sentiment_trajectory", 0)
+         if trajectory < -0.3:
+             priority += 25
+
+         # High negative segment count
+         neg_pct = sent.get("negative_segment_count", 0) / max(
+             sent.get("total_segments", 1), 1)
+         if neg_pct > 0.3:
+             priority += 15
+
+         if priority > 0:
+             queue.append({
+                 "file": kr["file"],
+                 "priority": priority,
+                 "keyword_score": kr["score"],
+                 "flags": kr["flags"],
+                 "sentiment_avg": sent.get("average_sentiment"),
+                 "sentiment_trajectory": trajectory,
+                 "review_reasons": []
+             })
+             if high_flags:
+                 queue[-1]["review_reasons"].append("compliance_flag")
+             if trajectory < -0.3:
+                 queue[-1]["review_reasons"].append("negative_trajectory")
+             if kr["score"] < 60:
+                 queue[-1]["review_reasons"].append("low_qa_score")
+
+     queue.sort(key=lambda x: -x["priority"])
+     return queue
532
+ ```
533
+
534
+ This gives your QA team a sorted list where the worst calls float to the top. A compliance violation with negative sentiment trajectory gets reviewed first. A clean call with neutral sentiment gets skipped entirely.
535
+
536
+ ### Agent-Level Reporting
537
+
538
+ Aggregate the per-call data to agent-level metrics for coaching:
539
+
540
+ ```sql
541
+ SELECT
+     agent_id,
+     COUNT(*) AS total_calls,
+     AVG(keyword_score) AS avg_qa_score,
+     AVG(avg_sentiment) AS avg_sentiment,
+     AVG(sentiment_trajectory) AS avg_trajectory,
+     SUM(CASE WHEN keyword_score < 60 THEN 1 ELSE 0 END) AS low_score_calls,
+     SUM(CASE WHEN has_compliance_flag = 1 THEN 1 ELSE 0 END) AS compliance_flags
+ FROM call_analytics
+ WHERE call_date >= DATE_SUB(NOW(), INTERVAL 7 DAY)
+ GROUP BY agent_id
+ ORDER BY avg_qa_score ASC;
553
+ ```
554
+
555
+ Agents with consistently low QA scores or high compliance flag counts need targeted coaching. Agents with negative sentiment trajectories might be burning out -- talk to them about workload and [break scheduling](/blog/call-center-agent-burnout/).
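The query above reads from a `call_analytics` table that has to be populated from the per-call results. The sketch below shows one possible shape for that table and a loader function. It uses SQLite only to stay self-contained (the query above uses MySQL date functions), and the column types, the definition of `has_compliance_flag` as "any high-severity flag", and the `store_call` helper are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS call_analytics (
    file TEXT PRIMARY KEY,
    agent_id TEXT NOT NULL,
    call_date TEXT NOT NULL,           -- ISO date, e.g. 2026-03-15
    keyword_score INTEGER,
    avg_sentiment REAL,
    sentiment_trajectory REAL,
    has_compliance_flag INTEGER DEFAULT 0
)
"""

def store_call(conn, agent_id, call_date, keyword_result, sentiment_result):
    """Insert or update one call from its keyword-scan and sentiment outputs."""
    # Assumption: a call counts as flagged if any keyword flag is high severity
    has_flag = int(any(f["severity"] == "high" for f in keyword_result["flags"]))
    sent = sentiment_result or {}  # sentiment may be missing for very short calls
    conn.execute(
        "INSERT OR REPLACE INTO call_analytics VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            keyword_result["file"],
            agent_id,
            call_date,
            keyword_result["score"],
            sent.get("average_sentiment"),
            sent.get("sentiment_trajectory"),
            has_flag,
        ),
    )
```

Keying the table on the recording filename makes reruns of the nightly job idempotent: reprocessing a day overwrites rows instead of duplicating them.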
556
+
557
+ ### The Full Pipeline Schedule
558
+
559
+ Put it all together with a cron schedule on your analytics server:
560
+
561
+ ```bash
562
+ # crontab for speech analytics pipeline
+ # cron treats an unescaped % as a newline, so every % in a date format is written \%
+ # Sync recordings every hour during business hours
+ 0 8-20 * * 1-5 /data/scripts/sync-recordings.sh >> /var/log/analytics/sync.log 2>&1
+
+ # Transcribe the day's recordings overnight (starts before midnight, so use today's date)
+ 0 22 * * * python3 /data/scripts/batch_transcribe.py $(date +\%Y-\%m-\%d) >> /var/log/analytics/transcribe.log 2>&1
+
+ # Run keyword scan after transcription
+ 0 4 * * * python3 /data/scripts/keyword_scan.py $(date -d yesterday +\%Y-\%m-\%d) >> /var/log/analytics/keywords.log 2>&1
+
+ # Run sentiment scoring after keywords
+ 0 5 * * * python3 /data/scripts/sentiment_score.py $(date -d yesterday +\%Y-\%m-\%d) >> /var/log/analytics/sentiment.log 2>&1
+
+ # Generate review queue and agent reports
+ 0 6 * * * python3 /data/scripts/build_reports.py $(date -d yesterday +\%Y-\%m-\%d) >> /var/log/analytics/reports.log 2>&1
577
+ ```
578
+
579
+ By 6 AM, yesterday's calls are fully transcribed, scanned, scored, and prioritized. Your QA team starts their day with a ready-to-go review queue instead of randomly picking recordings.
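Fixed clock times assume each stage finishes before the next one starts, which will not hold if transcription runs long (the GPU estimate earlier in this guide is about 17 hours for 500 calls on one card). An alternative is a single runner that starts each stage only after the previous one exits cleanly. The sketch below is one way to do that; the script paths mirror the crontab above, while `run_pipeline` and `STEPS` are hypothetical names.

```python
import subprocess
import sys

def run_pipeline(steps, date_str):
    """Run each step in order with the date as its argument; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command + [date_str])
        if result.returncode != 0:
            print(f"{name} failed with exit code {result.returncode}; stopping",
                  file=sys.stderr)
            break
        completed.append(name)
    return completed

# Stage order and script paths as used in the crontab above
STEPS = [
    ("transcribe", ["python3", "/data/scripts/batch_transcribe.py"]),
    ("keywords", ["python3", "/data/scripts/keyword_scan.py"]),
    ("sentiment", ["python3", "/data/scripts/sentiment_score.py"]),
    ("reports", ["python3", "/data/scripts/build_reports.py"]),
]
```

A single cron entry can then call `run_pipeline(STEPS, "2026-03-15")` style logic from a small wrapper, and a failed or slow stage no longer lets later stages run against incomplete data.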
580
+
581
+ ## Cost Comparison: Build vs. Buy
582
+
583
+ Here is the honest math on building this yourself versus buying an enterprise platform:
584
+
585
+ | Component | Self-Hosted Cost | Enterprise Platform |
586
+ |---|:---:|:---:|
587
+ | Transcription (500 calls/day) | $300/mo (GPU) or ~$260/mo (API) | Included |
588
+ | Sentiment Analysis | $0 (open-source model, CPU) | Included |
589
+ | Keyword Detection | $0 (pattern matching) | Included |
590
+ | QA Dashboard | Engineering time (40-80 hours) | Included |
591
+ | Real-Time Alerts | Engineering time (20-40 hours) | Included |
592
+ | Agent Coaching Workflow | Manual process | Automated |
593
+ | Total Year 1 | $3K-$4K + 60-120 hrs engineering | $50K-$200K |
594
+ | Total Year 2+ | $3K-$4K/year | $50K-$200K/year |
595
+
596
+ The break-even point depends on your engineering capacity. If you have a developer who can build and maintain the pipeline, self-hosting saves most of the $50K-$200K annual platform cost. If you don't, the engineering cost might push you toward a mid-tier vendor like Observe.AI or Level AI that costs $30K-$60K.
597
+
598
+ For VICIdial operations specifically, the self-hosted route makes more sense because you already have the infrastructure -- the recordings, the database, the server environment. The teams at [ViciStack](https://vicistack.com/) have deployed this pipeline across multiple call centers, and the typical implementation takes 2-3 weeks from recording configuration to working QA dashboard.
599
+
600
+ ## What to Do First
601
+
602
+ Start with transcription. Just get your recordings turned into text. Even without keyword detection or sentiment analysis, searchable transcripts change how your QA team works.
603
+
604
+ The next step is the compliance keyword scan -- it is the highest-ROI piece because a single compliance violation can cost more than the entire analytics pipeline.
605
+
606
+ Sentiment scoring comes last. It is valuable for coaching and agent development, but it does not prevent fires the way compliance monitoring does.
607
+
608
+ If you want this built and running in two weeks instead of two months, [talk to us](https://vicistack.com/contact/). We have done this integration enough times that the architecture decisions are already made -- it is just configuration and deployment at this point.
609
+
610
+ ## Resources
611
+
612
+ - [Read the full article](https://vicistack.com/blog/speech-analytics-call-center/) on ViciStack
613
+ - [ViciStack](https://vicistack.com) - VICIdial hosting and optimization
614
+ - [Free VICIdial Audit](https://vicistack.com/free-audit/)