SreekarB committed on
Commit
4848a6a
·
verified ·
1 Parent(s): 5827742

Upload 31 files

README.md CHANGED
@@ -1,12 +1,123 @@
- ---
- title: CASLLiveKit
- emoji: 🏢
- colorFrom: red
- colorTo: pink
- sdk: gradio
- sdk_version: 5.23.3
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # CASL Voice Bot
+
+ A speech pathology assistant using AI to assess students' speaking abilities based on the CASL-2 framework. This application helps speech-language pathologists (SLPs) with speech assessment in school settings.
+
+ ## Implementations
+
+ This project provides multiple implementations:
+
+ 1. **LiveKit Implementation** - Uses LiveKit agents with OpenAI's real-time voice API for low-latency, high-quality audio streaming.
+ 2. **Direct API Implementation** - Uses OpenAI's API directly without LiveKit, for simpler deployment.
+ 3. **Hugging Face Spaces** - An adaptive implementation that works on Hugging Face Spaces, automatically detecting whether LiveKit is available.
+
+ ## Features
+
+ - Voice-to-voice interaction with an AI speech pathologist
+ - CASL-2 framework assessment
+ - Real-time assessment tracking
+ - Session recording and saving
+ - Custom note-taking for SLPs
+ - Gradio web interface for easy sharing and use in school settings
+
+ ## CASL-2 Assessment Areas
+
+ The AI speech pathologist assesses students in these key areas:
+
+ 1. **Lexical/Semantic Skills**: Vocabulary knowledge, word meanings, and contextual word use
+ 2. **Syntactic Skills**: Grammar and sentence structure understanding
+ 3. **Supralinguistic Skills**: Higher-level language skills beyond literal meanings
+ 4. **Pragmatic Skills**: Language use in social contexts (less emphasis for younger students)
+
+ ## Setup Instructions
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - OpenAI API key with access to GPT-4o and TTS models
+
+ ### Installation
+
+ 1. Clone the repository:
+    ```
+    git clone https://github.com/yourusername/CASLVoiceBot.git
+    cd CASLVoiceBot
+    ```
+
+ 2. Create a virtual environment and install dependencies:
+    ```
+    python -m venv venv
+    source venv/bin/activate # On Windows: venv\Scripts\activate
+    pip install -r requirements.txt
+    ```
+
+ 3. For LiveKit implementation, install LiveKit dependencies:
+    ```
+    # Edit requirements.txt to uncomment the livekit-agents line
+    pip install "livekit-agents>=0.7.0"
+    ```
+
+ 4. Set up environment variables:
+    ```
+    cp .env.example .env
+    ```
+    Then edit `.env` to add your OpenAI API key.
+
+ ### Running the Application
+
+ #### LiveKit Implementation (recommended for best performance)
+ ```
+ python run_livekit.py
+ ```
+
+ #### Direct API Implementation (simpler deployment)
+ ```
+ python run_direct.py
+ ```
+
+ #### Command Line Options
+ Both implementations support these options:
+ - `--share`: Share the app publicly (enabled by default)
+ - `--local`: Run the app locally without sharing
+
+ ## Deployment on Hugging Face Spaces
+
+ 1. Create a new Space on Hugging Face with the Gradio SDK
+ 2. Upload the repository contents to the Space
+ 3. Add your OPENAI_API_KEY as a secret in the Space settings
+
+ By default, the Hugging Face Spaces deployment will try to use LiveKit if available, and fall back to direct API if not.
+
+ ## Project Structure
+
+ ```
+ CASLVoiceBot/
+ ├── app.py                        # Hugging Face Spaces entry point
+ ├── run_direct.py                 # Direct API implementation runner
+ ├── run_livekit.py                # LiveKit implementation runner
+ ├── requirements.txt              # Common dependencies
+ ├── .env.example                  # Environment variables template
+ ├── implementations/
+ │   ├── common/                   # Shared utilities
+ │   │   ├── casl_utils.py         # CASL-2 assessment utilities
+ │   ├── direct/                   # Direct API implementation
+ │   │   ├── app.py                # Direct OpenAI API app
+ │   ├── livekit/                  # LiveKit implementation
+ │   │   ├── app.py                # LiveKit app
+ │   │   ├── livekit_gradio_hf.py  # HF-compatible LiveKit app
+ ├── session_data/                 # Saved session data
+ ```
+
+ ## Usage
+
+ 1. Optionally enter a Student ID to track sessions
+ 2. Select your preferred AI voice
+ 3. Click "Start Session" to begin a speech assessment
+ 4. Wait for the AI to introduce itself, then speak when prompted
+ 5. View real-time assessment in the interface
+ 6. SLPs can add notes throughout the session
+ 7. Save the session when finished
+ 8. Click "Stop Session" to end
+
+ ## License
+
+ [MIT License](LICENSE)
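The `--share` and `--local` options listed in the README could be parsed roughly as in the sketch below. This is an assumption about how `run_direct.py` and `run_livekit.py` might handle their flags, not code taken from this commit; `parse_args` and the `share` destination are hypothetical names.

```python
import argparse


def parse_args(argv=None):
    """Parse launcher flags; sharing stays on unless --local is given."""
    parser = argparse.ArgumentParser(description="CASL Voice Bot launcher (sketch)")
    # The two flags write to the same destination and cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--share", dest="share", action="store_true",
                       help="Share the app publicly (default)")
    group.add_argument("--local", dest="share", action="store_false",
                       help="Run the app locally without sharing")
    # Make the documented default explicit: share unless told otherwise.
    parser.set_defaults(share=True)
    return parser.parse_args(argv)
```

A runner script would then pass the result through to Gradio, for example `app.launch(share=parse_args().share)`.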
README_livekit.md ADDED
@@ -0,0 +1,88 @@
+ # CASL Voice Bot with LiveKit
+
+ A speech pathology assistant using LiveKit agents with OpenAI's real-time voice capabilities. This application helps speech-language pathologists (SLPs) assess students' speaking abilities based on the CASL-2 framework.
+
+ ## Features
+
+ - Real-time voice interaction with an AI speech pathologist using LiveKit
+ - OpenAI's GPT-4o for intelligent conversation
+ - CASL-2 framework assessment
+ - Real-time assessment tracking
+ - Session recording and saving
+ - Custom note-taking for SLPs
+ - Gradio web interface for easy sharing and use in school settings
+
+ ## CASL-2 Assessment Areas
+
+ The AI speech pathologist assesses students in these key areas:
+
+ 1. **Lexical/Semantic Skills**: Vocabulary knowledge, word meanings, and contextual word use
+ 2. **Syntactic Skills**: Grammar and sentence structure understanding
+ 3. **Supralinguistic Skills**: Higher-level language skills beyond literal meanings
+ 4. **Pragmatic Skills**: Language use in social contexts (less emphasis for younger students)
+
+ ## Setup Instructions
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - OpenAI API key with access to GPT-4o and TTS models
+ - Created using the LiveKit multimodal agent template
+
+ ### Installation
+
+ 1. Clone the repository:
+    ```
+    git clone https://github.com/yourusername/CASLVoiceBot.git
+    cd CASLVoiceBot
+    ```
+
+ 2. Create a virtual environment and install dependencies:
+    ```
+    python -m venv venv
+    source venv/bin/activate # On Windows: venv\Scripts\activate
+    pip install -r livekit_requirements.txt
+    ```
+
+ 3. Set up environment variables:
+    ```
+    cp .env.example .env
+    ```
+    Then edit `.env` to add your OpenAI API key.
+
+ ### Running the Application
+
+ 1. Start the application:
+    ```
+    python run_livekit.py
+    ```
+
+ 2. Access the application through the URL provided in the terminal.
+
+ ## Usage
+
+ 1. Optionally enter a Student ID to track sessions
+ 2. Select your preferred AI voice
+ 3. Click "Start Session" to begin a speech assessment
+ 4. Wait for the AI to introduce itself, then speak when prompted
+ 5. View real-time assessment in the interface
+ 6. SLPs can add notes throughout the session
+ 7. Save the session when finished
+ 8. Click "Stop Session" to end
+
+ ## Benefits of Using LiveKit
+
+ - **Real-time Audio Processing**: LiveKit provides robust real-time audio streaming capabilities
+ - **Low Latency**: Minimizes delay between student speech and AI response
+ - **WebRTC Infrastructure**: Built on the same technology used for video calls
+ - **Connection Management**: Automatically handles connection issues and reconnections
+ - **Scalability**: Can support multiple concurrent sessions if needed
+ - **Agent Integration**: LiveKit's agent system is designed specifically for AI assistants
+
+ ## Deployment on Hugging Face Spaces
+
+ For deployment on Hugging Face Spaces, additional configuration may be required due to LiveKit's WebRTC requirements. Please refer to LiveKit documentation for details on setting up appropriate server configurations.
+
+ ## License
+
+ [MIT License](LICENSE)
__pycache__/app_main.cpython-311.pyc ADDED
Binary file (27.2 kB).
app.py ADDED
@@ -0,0 +1,12 @@
+ #!/usr/bin/env python3
+
+ """
+ CASL Voice Bot - Hugging Face Spaces entry point
+ """
+
+ # Import the adaptive app that works in both LiveKit and direct modes
+ from implementations.livekit.livekit_gradio_hf import app
+
+ # This is the entry point that Hugging Face Spaces will use
+ if __name__ == "__main__":
+     app.launch()
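The "adaptive" behaviour mentioned in the comment above (use LiveKit when it is installed, otherwise fall back to the direct API) could be detected as in the sketch below. The contents of `livekit_gradio_hf.py` are not shown in this part of the diff, so this is an assumption about one reasonable approach; `livekit_available` and `choose_mode` are hypothetical names.

```python
import importlib.util


def livekit_available() -> bool:
    """Return True if a top-level `livekit` package can be found on sys.path."""
    return importlib.util.find_spec("livekit") is not None


def choose_mode(force_direct: bool = False) -> str:
    """Pick "livekit" when the package is installed, otherwise "direct"."""
    if force_direct or not livekit_available():
        return "direct"
    return "livekit"
```

Checking with `find_spec` avoids importing the package just to test for it, so the check stays cheap and has no side effects when LiveKit is missing.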
app/.env.example ADDED
@@ -0,0 +1,2 @@
+ # OpenAI API Key
+ OPENAI_API_KEY=your_openai_api_key
app/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """
+ CASL Voice Bot - AI Speech Therapist
+ """
+
+ __version__ = "0.1.0"
app/advanced_features.py ADDED
@@ -0,0 +1,169 @@
+ #!/usr/bin/env python3
+
+ """
+ Advanced features that can be added to the CASL Voice Bot application.
+ This module contains extensions that SLPs might want to add to the base system.
+ """
+
+ import os
+ import pandas as pd
+ import datetime
+ import json
+ from pathlib import Path
+
+ class SessionRecorder:
+     """Records session data for later analysis and progress tracking"""
+
+     def __init__(self, storage_dir="session_data"):
+         self.storage_dir = storage_dir
+         Path(storage_dir).mkdir(exist_ok=True)
+         self.current_session = {
+             "timestamp": datetime.datetime.now().isoformat(),
+             "student_id": None,
+             "transcript": [],
+             "assessment": {}
+         }
+
+     def set_student_id(self, student_id):
+         """Set the student ID for the current session"""
+         self.current_session["student_id"] = student_id
+
+     def add_transcript_entry(self, speaker, text):
+         """Add an entry to the transcript"""
+         self.current_session["transcript"].append({
+             "timestamp": datetime.datetime.now().isoformat(),
+             "speaker": speaker,
+             "text": text
+         })
+
+     def add_assessment_note(self, category, note):
+         """Add an assessment note for a CASL-2 category"""
+         if category not in self.current_session["assessment"]:
+             self.current_session["assessment"][category] = []
+
+         self.current_session["assessment"][category].append({
+             "timestamp": datetime.datetime.now().isoformat(),
+             "note": note
+         })
+
+     def save_session(self):
+         """Save the current session to a JSON file"""
+         student_id = self.current_session["student_id"] or "anonymous"
+         timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+         filename = f"{student_id}_{timestamp}.json"
+
+         with open(os.path.join(self.storage_dir, filename), 'w') as f:
+             json.dump(self.current_session, f, indent=2)
+
+         return filename
+
+
+ class CASLAnalyzer:
+     """Analyzes transcripts based on CASL-2 framework categories"""
+
+     def __init__(self):
+         self.categories = {
+             "lexical_semantic": {
+                 "description": "Vocabulary knowledge and word meanings",
+                 "keywords": ["synonym", "antonym", "vocabulary", "word choice", "meaning"]
+             },
+             "syntactic": {
+                 "description": "Grammar and sentence structure",
+                 "keywords": ["grammar", "sentence", "verb tense", "agreement", "structure"]
+             },
+             "supralinguistic": {
+                 "description": "Higher-level language skills",
+                 "keywords": ["inference", "figurative", "metaphor", "context", "implied"]
+             },
+             "pragmatic": {
+                 "description": "Social use of language",
+                 "keywords": ["conversation", "social", "turn-taking", "appropriate", "context"]
+             }
+         }
+
+     def categorize_text(self, text):
+         """Categorize text into CASL-2 framework categories"""
+         result = {}
+
+         for category, info in self.categories.items():
+             score = 0
+             for keyword in info["keywords"]:
+                 if keyword.lower() in text.lower():
+                     score += 1
+
+             if score > 0:
+                 result[category] = score
+
+         return result
+
+     def generate_summary(self, transcript):
+         """Generate a summary of the transcript based on CASL-2 categories"""
+         all_text = " ".join([entry["text"] for entry in transcript])
+         categorization = self.categorize_text(all_text)
+
+         summary = {
+             "categories_covered": list(categorization.keys()),
+             "focus_areas": sorted(categorization.items(), key=lambda x: x[1], reverse=True),
+             "recommendations": []
+         }
+
+         # Generate recommendations based on categories covered
+         for category in self.categories:
+             if category not in categorization:
+                 summary["recommendations"].append(
+                     f"Consider adding more {self.categories[category]['description']} exercises"
+                 )
+
+         return summary
+
+
+ class VoiceMetricsAnalyzer:
+     """Analyzes voice metrics for speech patterns"""
+
+     def __init__(self):
+         self.metrics = {
+             "word_count": 0,
+             "unique_words": set(),
+             "sentence_count": 0,
+             "average_words_per_sentence": 0,
+             "hesitations": 0,
+             "speech_rate": 0  # words per minute
+         }
+
+     def analyze_text(self, text, duration_seconds=None):
+         """Analyze text for speech metrics"""
+         # Count words
+         words = text.split()
+         self.metrics["word_count"] = len(words)
+         self.metrics["unique_words"] = set(word.lower() for word in words)
+
+         # Count sentences
+         sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
+         self.metrics["sentence_count"] = len(sentences)
+
+         # Calculate average words per sentence
+         if self.metrics["sentence_count"] > 0:
+             self.metrics["average_words_per_sentence"] = self.metrics["word_count"] / self.metrics["sentence_count"]
+
+         # Count hesitations ("um", "uh", "like", etc.)
+         # Strip punctuation so "um," still matches, and count the two-word
+         # marker "you know" as a pair, since it can never equal a single token.
+         hesitation_markers = {"um", "uh", "er", "like"}
+         tokens = [word.lower().strip(".,!?;:") for word in words]
+         self.metrics["hesitations"] = (
+             sum(1 for token in tokens if token in hesitation_markers)
+             + sum(1 for pair in zip(tokens, tokens[1:]) if pair == ("you", "know"))
+         )
+
+         # Calculate speech rate if duration is provided
+         if duration_seconds:
+             self.metrics["speech_rate"] = (self.metrics["word_count"] / duration_seconds) * 60
+
+         return self.metrics
+
+     def get_summary(self):
+         """Get a summary of the voice metrics analysis"""
+         return {
+             "word_count": self.metrics["word_count"],
+             "vocabulary_diversity": len(self.metrics["unique_words"]) / max(1, self.metrics["word_count"]),
+             "average_words_per_sentence": self.metrics["average_words_per_sentence"],
+             "hesitation_frequency": self.metrics["hesitations"] / max(1, self.metrics["word_count"]),
+             "speech_rate": self.metrics["speech_rate"]
+         }
+
+
+ # These classes can be imported and used to extend the base CASL Voice Bot with additional features
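The vocabulary-diversity and hesitation metrics in `VoiceMetricsAnalyzer` can be illustrated in isolation. The sketch below is a standalone approximation, not the module's own code: it assumes a simple regex tokenizer, and the helper names `tokenize`, `type_token_ratio`, and `count_hesitations` are hypothetical. It counts every "like" as a filler, which overcounts legitimate uses of the word.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Unique tokens divided by total tokens; 0.0 for empty input."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_hesitations(text, single=("um", "uh", "er", "like"),
                      phrases=(("you", "know"),)):
    """Count single-word fillers plus two-word filler phrases."""
    tokens = tokenize(text)
    singles = sum(1 for token in tokens if token in single)
    # Adjacent token pairs let multi-word markers such as "you know" match.
    pairs = sum(1 for pair in zip(tokens, tokens[1:]) if pair in phrases)
    return singles + pairs
```

Tokenizing with a regex rather than `str.split()` keeps punctuation out of the tokens, so "um," and "um" count as the same word in both metrics.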
app/gradio_app.py ADDED
@@ -0,0 +1,255 @@
+ #!/usr/bin/env python3
+
+ import os
+ import asyncio
+ import gradio as gr
+ import logging
+ import tempfile
+ import queue
+ import threading
+ import time
+ from dotenv import load_dotenv
+ from livekit import agents
+ from openai import AsyncOpenAI
+
+ # Load environment variables
+ load_dotenv()
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Initialize OpenAI client
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+
+ # Speech Pathologist Agent Prompt
+ SPEECH_PATHOLOGIST_PROMPT = """
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
+ You are working with a student with speech impediments, typically with ASD.
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
+ Each domain from the CASL-2 framework can be analyzed using the sample:
+ Lexical/Semantic Skills:
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
+ Key Subtests:
+ Antonyms: Identifying words with opposite meanings.
+ Synonyms: Identifying words with similar meanings.
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
+ Evaluate vocabulary diversity (type-token ratio).
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
+ Assess use of specific vs. vague language (e.g., "sedan" vs. "car").
+ Syntactic Skills:
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
+ Key Subtests:
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
+ Examine sentence structure for grammatical accuracy.
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
+ Note the use of clauses, conjunctions, and varied sentence types.
+ Supralinguistic Skills:
+ This category assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
+ Key Subtests:
+ Inferences: Understanding information that is not explicitly stated.
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
+ Look for use or understanding of figurative language, idioms, or humor.
+ Assess ability to handle ambiguous or implied meanings in context.
+ Identify advanced language use for abstract or hypothetical ideas.
+ Pragmatic Skills (focus less on this as it is not typically necessary for the age range you will be dealing with):
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
+ Key Subtests:
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
+
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
+ """
+
+ class GradioInputDevice(agents.InputDevice):
+     """Custom input device that works with Gradio"""
+
+     def __init__(self):
+         super().__init__()
+         self.audio_queue = queue.Queue()
+         self.is_active = True
+
+     async def receive(self) -> agents.AudioChunk:
+         """Receive audio data from the queue"""
+         while self.is_active:
+             try:
+                 # Non-blocking read so the event loop is never stalled
+                 audio_data = self.audio_queue.get_nowait()
+                 return audio_data
+             except queue.Empty:
+                 await asyncio.sleep(0.1)
+
+         return None
+
+     def add_audio(self, audio_data):
+         """Add audio data to the queue"""
+         # Convert gradio audio format to AudioChunk
+         sample_rate, audio_array = audio_data
+         audio_chunk = agents.AudioChunk(
+             samples=audio_array,
+             sample_rate=sample_rate,
+             is_last=False
+         )
+         self.audio_queue.put(audio_chunk)
+
+     def stop(self):
+         """Stop the input device"""
+         self.is_active = False
+
+
+ class GradioOutputDevice(agents.OutputDevice):
+     """Custom output device that works with Gradio"""
+
+     def __init__(self):
+         super().__init__()
+         self.output_queue = queue.Queue()
+
+     async def transmit(self, audio_chunk: agents.AudioChunk) -> None:
+         """Transmit audio chunk to the queue"""
+         if audio_chunk is not None:
+             self.output_queue.put((audio_chunk.samples, audio_chunk.sample_rate))
+
+     def get_latest_audio(self):
+         """Get the latest audio from the queue"""
+         try:
+             return self.output_queue.get(block=False)
+         except queue.Empty:
+             return None
+
+
+ class SpeechPathologistAssistant:
+     """Speech pathologist assistant using LiveKit agents and Gradio"""
+
+     def __init__(self):
+         self.input_device = GradioInputDevice()
+         self.output_device = GradioOutputDevice()
+         self.assistant = None
+         self.assistant_task = None
+         self.transcript = []
+         self.is_running = False
+
+     async def initialize_assistant(self):
+         """Initialize the speech assistant"""
+         self.assistant = agents.VoiceAssistant(
+             openai_client=openai_client,
+             model="gpt-4o",
+             voice="shimmer",  # Using a friendly, professional voice
+             input_device=self.input_device,
+             output_device=self.output_device,
+             initial_message=SPEECH_PATHOLOGIST_PROMPT,
+             real_time=True,  # Enable real-time processing
+         )
+
+         # Add transcript callback
+         self.assistant.on_transcript = self.on_transcript
+         self.assistant.on_response = self.on_response
+
+     def on_transcript(self, transcript):
+         """Handle transcript from user"""
+         self.transcript.append(f"Student: {transcript.text}")
+         return True
+
+     def on_response(self, response):
+         """Handle response from assistant"""
+         self.transcript.append(f"Speech Pathologist: {response.text}")
+         return True
+
+     async def start_assistant(self):
+         """Start the assistant in a background task"""
+         if not self.assistant:
+             await self.initialize_assistant()
+
+         self.is_running = True
+         self.assistant_task = asyncio.create_task(self.assistant.run())
+
+     def stop_assistant(self):
+         """Stop the assistant"""
+         if self.assistant_task and not self.assistant_task.done():
+             self.assistant_task.cancel()
+
+         self.input_device.stop()
+         self.is_running = False
+
+     def process_audio(self, audio):
+         """Process audio from Gradio interface"""
+         if not self.is_running or audio is None:
+             return None, self.get_transcript()
+
+         self.input_device.add_audio(audio)
+
+         # Check for assistant output
+         output_audio = self.output_device.get_latest_audio()
+
+         return output_audio, self.get_transcript()
+
+     def get_transcript(self):
+         """Get the current transcript"""
+         return "\n".join(self.transcript)
+
+
+ # Create the assistant instance
+ speech_assistant = SpeechPathologistAssistant()
+
+
+ def start_session():
+     """Start the speech pathology session"""
+     # NOTE: asyncio.run() cancels pending tasks and closes its event loop on
+     # return, so the task created in start_assistant() does not keep running;
+     # a long-lived event loop is needed for the session to stay active.
+     asyncio.run(speech_assistant.start_assistant())
+     return "Session started. Please speak to begin the assessment."
+
+
+ def stop_session():
+     """Stop the speech pathology session"""
+     speech_assistant.stop_assistant()
+     return "Session stopped."
+
+
+ def process_audio(audio):
+     """Process audio from microphone"""
+     if audio is None:
+         return None, speech_assistant.get_transcript()
+
+     return speech_assistant.process_audio(audio)
+
+
+ # Create Gradio Interface
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
+     gr.Markdown("# Speech Pathology Assistant")
+     gr.Markdown("This tool provides speech therapy assessment based on the CASL-2 framework")
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             start_button = gr.Button("Start Session")
+             stop_button = gr.Button("Stop Session")
+             status = gr.Textbox(label="Status", value="Ready to start")
+
+         with gr.Column(scale=2):
+             audio_input = gr.Audio(
+                 label="Speak",
+                 sources=["microphone"],  # Gradio 4+ selects the input source here
+                 type="numpy",  # handlers receive (sample_rate, ndarray)
+                 streaming=True,
+                 autoplay=True
+             )
+             audio_output = gr.Audio(label="Response")
+
+     with gr.Row():
+         transcript = gr.Textbox(label="Transcript", lines=10)
+
+     # Setup event handlers
+     start_button.click(fn=start_session, outputs=status)
+     stop_button.click(fn=stop_session, outputs=status)
+
+     # Setup continuous audio processing
+     audio_input.stream(
+         fn=process_audio,
+         inputs=audio_input,
+         outputs=[audio_output, transcript]
+     )
+
+
+ def main():
+     """Main function to launch the Gradio app"""
+     app.launch(share=True)
+
+
+ if __name__ == "__main__":
+     main()
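Because Gradio callbacks such as `start_session` are synchronous while the assistant is an asyncio task, one common way to keep the task alive between callbacks is a dedicated event loop running in a background thread. The sketch below shows that pattern under stated assumptions; it is not part of this commit, and `BackgroundLoop` and `fake_session` are hypothetical names (the latter stands in for `assistant.run()`).

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio event loop in a daemon thread until stopped."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def submit(self, coro):
        """Schedule a coroutine on the loop; returns a concurrent.futures.Future."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        """Ask the loop to stop and wait briefly for the thread to exit."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=1)


async def fake_session(ticks):
    """Stand-in for a long-running assistant task (hypothetical)."""
    for _ in range(ticks):
        await asyncio.sleep(0.01)
    return "done"
```

A synchronous callback could then call `background.submit(speech_assistant.start_assistant())` and return immediately, while the scheduled work continues on the loop's thread; the returned future can later be cancelled from a stop callback.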
app/huggingface_app.py ADDED
@@ -0,0 +1,297 @@
+ #!/usr/bin/env python3
+
+ """
+ Hugging Face Spaces deployment file for CASL Voice Bot
+ This file is specifically designed for deploying on Hugging Face Spaces
+ """
+
+ import os
+ import asyncio
+ import gradio as gr
+ import logging
+ import tempfile
+ import queue
+ import threading
+ import time
+ from dotenv import load_dotenv
+ from livekit import agents
+ from openai import AsyncOpenAI
+
+ # Load environment variables (will be set in HF Spaces secrets)
+ load_dotenv()
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Initialize OpenAI client with API key from environment
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+
+ # Speech Pathologist Agent Prompt
+ SPEECH_PATHOLOGIST_PROMPT = """
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
+ You are working with a student with speech impediments, typically with ASD.
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
+ Each domain from the CASL-2 framework can be analyzed using the sample:
+ Lexical/Semantic Skills:
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
+ Key Subtests:
+ Antonyms: Identifying words with opposite meanings.
+ Synonyms: Identifying words with similar meanings.
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
+ Evaluate vocabulary diversity (type-token ratio).
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
+ Assess use of specific vs. vague language (e.g., "sedan" vs. "car").
+ Syntactic Skills:
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
+ Key Subtests:
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
+ Examine sentence structure for grammatical accuracy.
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
+ Note the use of clauses, conjunctions, and varied sentence types.
+ Supralinguistic Skills:
+ This category assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
+ Key Subtests:
+ Inferences: Understanding information that is not explicitly stated.
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
+ Look for use or understanding of figurative language, idioms, or humor.
+ Assess ability to handle ambiguous or implied meanings in context.
+ Identify advanced language use for abstract or hypothetical ideas.
+ Pragmatic Skills (focus less on this as it is not typically necessary for the age range you will be dealing with):
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
+ Key Subtests:
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
+
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
+ """
+
+ # Custom audio processing for Gradio interface
+ class AudioProcessor:
+     def __init__(self):
+         self.transcript = []
+         self.is_active = False
+         self.voice_model = "shimmer"  # Default voice
+
+     async def process_speech(self, audio_data, openai_client):
+         """Process speech using OpenAI's API"""
+         if not self.is_active or audio_data is None:
+             return None, "\n".join(self.transcript)
+
+         # Prepare audio file for OpenAI
+         temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
+         temp_file.close()
+
+         try:
+             # Save audio data to temporary file
+             sample_rate, audio_array = audio_data
+             import scipy.io.wavfile
+             scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)
+
+             # Transcribe audio using OpenAI
+             with open(temp_file.name, "rb") as audio_file:
+                 transcript_response = await openai_client.audio.transcriptions.create(
+                     file=audio_file,
+                     model="whisper-1"
+                 )
+
+             user_text = transcript_response.text
+             if user_text.strip():
+                 self.transcript.append(f"Student: {user_text}")
+
+                 # Generate assistant response
+                 chat_response = await openai_client.chat.completions.create(
+                     model="gpt-4o",
+                     messages=[
+                         {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
+                         {"role": "user", "content": user_text}
+                     ]
+                 )
+
+                 assistant_text = chat_response.choices[0].message.content
+                 self.transcript.append(f"Speech Pathologist: {assistant_text}")
+
+                 # Generate speech from text
+                 speech_response = await openai_client.audio.speech.create(
+                     model="tts-1",
+                     voice=self.voice_model,
+                     input=assistant_text
+                 )
+
+                 # Save speech to temporary file
+                 response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
+                 response_temp_file.close()
+
+                 speech_response.stream_to_file(response_temp_file.name)
+
+                 # Load audio data for Gradio
+                 import soundfile as sf
+                 audio_data, sample_rate = sf.read(response_temp_file.name)
+
+                 # Clean up
+                 os.unlink(response_temp_file.name)
+
+                 return (sample_rate, audio_data), "\n".join(self.transcript)
+
+         except Exception as e:
+             logger.error(f"Error processing speech: {e}")
+             self.transcript.append(f"Error: {str(e)}")
+         finally:
+             # Clean up temp file
+             os.unlink(temp_file.name)
+
+         return None, "\n".join(self.transcript)
+
+     def start_session(self, voice_model):
+         """Start a new session"""
+         self.is_active = True
+         self.voice_model = voice_model if voice_model else "shimmer"
+         self.transcript = []
+         self.transcript.append("Session started. The AI Speech Pathologist will speak first.")
+         return "Session active. Please wait for the AI to introduce itself."
+
+     def stop_session(self):
+         """Stop the current session"""
+         self.is_active = False
+         return "Session stopped."
+
+
+ # Create the audio processor instance
+ audio_processor = AudioProcessor()
+
+
+ async def start_session(voice_model):
165
+ """Start the speech pathology session"""
166
+ status = audio_processor.start_session(voice_model)
167
+
168
+ # Generate initial AI introduction
169
+ try:
170
+ # Generate assistant response
171
+ chat_response = await openai_client.chat.completions.create(
172
+ model="gpt-4o",
173
+ messages=[
174
+ {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
175
+ {"role": "user", "content": "Hello"} # Initial trigger
176
+ ]
177
+ )
178
+
179
+ assistant_text = chat_response.choices[0].message.content
180
+ audio_processor.transcript.append(f"Speech Pathologist: {assistant_text}")
181
+
182
+ # Generate speech from text
183
+ speech_response = await openai_client.audio.speech.create(
184
+ model="tts-1",
185
+ voice=audio_processor.voice_model,
186
+ input=assistant_text
187
+ )
188
+
189
+ # Save speech to temporary file
190
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
191
+ response_temp_file.close()
192
+
193
+ speech_response.stream_to_file(response_temp_file.name)
194
+
195
+ # Load audio data for Gradio
196
+ import soundfile as sf
197
+ audio_data, sample_rate = sf.read(response_temp_file.name)
198
+
199
+ # Clean up
200
+ os.unlink(response_temp_file.name)
201
+
202
+ return status, (sample_rate, audio_data), "\n".join(audio_processor.transcript)
203
+
204
+ except Exception as e:
205
+ logger.error(f"Error starting session: {e}")
206
+ audio_processor.transcript.append(f"Error: {str(e)}")
207
+ return status, None, "\n".join(audio_processor.transcript)
208
+
209
+
210
+ def stop_session():
211
+ """Stop the speech pathology session"""
212
+ return audio_processor.stop_session(), None, "\n".join(audio_processor.transcript)
213
+
214
+
215
+ async def process_mic_input(audio, progress=gr.Progress()):
216
+ """Process microphone input"""
217
+ if audio is None or not audio_processor.is_active:
218
+ return None, "\n".join(audio_processor.transcript)
219
+
220
+ progress(0, desc="Processing speech...")
221
+ audio_output, transcript = await audio_processor.process_speech(audio, openai_client)
222
+ progress(1, desc="Done")
223
+ return audio_output, transcript
224
+
225
+
226
+ # Create Gradio Interface
227
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
228
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
229
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
230
+
231
+ with gr.Row():
232
+ with gr.Column(scale=1):
233
+ voice_select = gr.Dropdown(
234
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
235
+ value="shimmer",
236
+ label="Assistant Voice"
237
+ )
238
+ start_button = gr.Button("Start Session", variant="primary")
239
+ stop_button = gr.Button("Stop Session", variant="stop")
240
+ status = gr.Textbox(label="Status", value="Ready to start")
241
+
242
+ with gr.Column(scale=2):
243
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
244
+ audio_input = gr.Audio(
245
+ label="Speak to the AI",
246
+ type="numpy",
247
+ sources=["microphone"],
248
+ streaming=True
249
+ )
250
+
251
+ with gr.Row():
252
+ transcript = gr.Textbox(label="Transcript", lines=10)
253
+
254
+ with gr.Accordion("About This Application", open=False):
255
+ gr.Markdown("""
256
+ ### About CASL-2 Speech Pathology Assistant
257
+
258
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
259
+
260
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
261
+ - **Syntactic Skills**: Grammar and sentence structure
262
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
263
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
264
+
265
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
266
+
267
+ ### How to Use
268
+
269
+ 1. Select the AI voice you prefer
270
+ 2. Click "Start Session" to begin
271
+ 3. The AI will introduce itself and begin the assessment
272
+ 4. Speak into your microphone when it's your turn
273
+ 5. View the transcript to track the conversation
274
+ 6. Click "Stop Session" when finished
275
+
276
+ ### For Speech-Language Pathologists
277
+
278
+ This tool is designed to supplement, not replace, professional SLP services. The source code is available for customization to meet specific assessment needs.
279
+ """)
280
+
281
+ # Setup event handlers
282
+ start_button.click(
283
+ fn=start_session, # async handler; Gradio awaits it directly
284
+ inputs=voice_select,
285
+ outputs=[status, audio_output, transcript]
286
+ )
287
+ stop_button.click(fn=stop_session, outputs=[status, audio_output, transcript])
288
+
289
+ # Setup audio processing
290
+ audio_input.stream(
291
+ fn=process_mic_input, # async handler; lets Gradio inject gr.Progress
292
+ inputs=audio_input,
293
+ outputs=[audio_output, transcript]
294
+ )
295
+
296
+ # Launch the app
297
+ app.launch(share=True)
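
The prompt above asks the model to evaluate vocabulary diversity via the type-token ratio, and `AudioProcessor` stores each turn as a `"Student: ..."` or `"Speech Pathologist: ..."` string. A minimal sketch of computing that ratio from such a transcript could look like the following. The helper name `student_type_token_ratio` and the regex tokenization are assumptions made for illustration; neither is part of this commit.

```python
import re

STUDENT_PREFIX = "Student: "  # same prefix the app uses when appending student turns


def student_type_token_ratio(transcript):
    """Return (types, tokens, ratio) computed over all student turns."""
    tokens = []
    for line in transcript:
        if line.startswith(STUDENT_PREFIX):
            text = line[len(STUDENT_PREFIX):].lower()
            # Assumed tokenization: runs of letters and apostrophes count as words.
            tokens.extend(re.findall(r"[a-z']+", text))
    if not tokens:
        return 0, 0, 0.0
    types = len(set(tokens))
    return types, len(tokens), types / len(tokens)


# Invented sample transcript in the format AudioProcessor builds.
example = [
    "Session started. The AI Speech Pathologist will speak first.",
    "Speech Pathologist: What is the opposite of hot?",
    "Student: The opposite of hot is cold",
    "Student: Cold is the opposite",
]
print(student_type_token_ratio(example))
```

Only lines that begin with the student prefix are counted, so the session banner and the assistant's turns do not inflate the ratio.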
app/main.py ADDED
@@ -0,0 +1,100 @@
1
+ #!/usr/bin/env python3
2
+
3
+ import os
4
+ import asyncio
5
+ import logging
6
+ from dotenv import load_dotenv
7
+ from livekit import agents
8
+ from livekit.agents import InputDevice, OutputDevice
9
+ from openai import AsyncOpenAI
10
+
11
+ # Load environment variables
12
+ load_dotenv()
13
+
14
+ # Set up logging
15
+ logging.basicConfig(level=logging.INFO)
16
+ logger = logging.getLogger(__name__)
17
+
18
+ # Initialize OpenAI client
19
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
20
+
21
+ # Speech Pathologist Agent Prompt
22
+ SPEECH_PATHOLOGIST_PROMPT = """
23
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
24
+ You are working with a student with speech impediments, typically with ASD.
25
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
26
+ Each domain from the CASL-2 framework can be analyzed using the sample:
27
+ Lexical/Semantic Skills:
28
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
29
+ Key Subtests:
30
+ Antonyms: Identifying words with opposite meanings.
31
+ Synonyms: Identifying words with similar meanings.
32
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
33
+ Evaluate vocabulary diversity (type-token ratio).
34
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
35
+ Assess use of specific vs. vague language (e.g., "sedan" vs. "car").
36
+ Syntactic Skills:
37
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
38
+ Key Subtests:
39
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
40
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
41
+ Examine sentence structure for grammatical accuracy.
42
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
43
+ Note the use of clauses, conjunctions, and varied sentence types.
44
+ Supralinguistic Skills:
45
+ This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
46
+ Key Subtests:
47
+ Inferences: Understanding information that is not explicitly stated.
48
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
49
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
50
+ Look for use or understanding of figurative language, idioms, or humor.
51
+ Assess ability to handle ambiguous or implied meanings in context.
52
+ Identify advanced language use for abstract or hypothetical ideas.
53
+ Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be working with):
54
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
55
+ Key Subtests:
56
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
57
+
58
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
59
+ """
60
+
61
+ async def run_speech_pathology_session(input_device: InputDevice, output_device: OutputDevice):
62
+ """
63
+ Run the speech pathology session using LiveKit agents and OpenAI
64
+ """
65
+ logger.info("Starting speech pathology session")
66
+
67
+ # Create the speech assistant
68
+ assistant = agents.VoiceAssistant(
69
+ openai_client=openai_client,
70
+ model="gpt-4o",
71
+ voice="shimmer", # Using a friendly, professional voice
72
+ input_device=input_device,
73
+ output_device=output_device,
74
+ initial_message=SPEECH_PATHOLOGIST_PROMPT,
75
+ real_time=True, # Enable real-time processing
76
+ )
77
+
78
+ # Run the assistant
79
+ await assistant.run()
80
+
81
+ async def main():
82
+ """
83
+ Main function to run the voice assistant.
84
+ """
85
+ logger.info("Initializing speech pathology assistant")
86
+
87
+ # Create devices
88
+ input_device = agents.BasicInputDevice()
89
+ output_device = agents.BasicOutputDevice()
90
+
91
+ try:
92
+ # Run the session
93
+ await run_speech_pathology_session(input_device, output_device)
94
+ except Exception as e:
95
+ logger.error(f"Error in speech pathology session: {e}")
96
+ finally:
97
+ logger.info("Speech pathology session ended")
98
+
99
+ if __name__ == "__main__":
100
+ asyncio.run(main())
app/report_generator.py ADDED
@@ -0,0 +1,285 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ Report generator for CASL Voice Bot.
5
+ This module generates assessment reports based on session data.
6
+ """
7
+
8
+ import os
9
+ import json
10
+ import pandas as pd
11
+ import matplotlib.pyplot as plt
12
+ from pathlib import Path
13
+ from datetime import datetime
14
+ import jinja2
15
+
16
+
17
+ class CASLReportGenerator:
18
+ """Generates reports from session data"""
19
+
20
+ def __init__(self, session_data_dir="session_data", reports_dir="reports"):
21
+ """Initialize the report generator"""
22
+ self.session_data_dir = session_data_dir
23
+ self.reports_dir = reports_dir
24
+
25
+ # Create directories if they don't exist
26
+ Path(session_data_dir).mkdir(exist_ok=True)
27
+ Path(reports_dir).mkdir(exist_ok=True)
28
+
29
+ # Set up Jinja2 template environment
30
+ self.template_loader = jinja2.FileSystemLoader(searchpath="./templates")
31
+ self.template_env = jinja2.Environment(loader=self.template_loader, autoescape=True)
32
+
33
+ def load_session_data(self, filename=None, student_id=None):
34
+ """Load session data from file or by student ID"""
35
+ if filename:
36
+ with open(os.path.join(self.session_data_dir, filename), 'r') as f:
37
+ return json.load(f)
38
+
39
+ elif student_id:
40
+ # Find all files for this student
41
+ files = [f for f in os.listdir(self.session_data_dir)
42
+ if f.startswith(f"{student_id}_") and f.endswith(".json")]
43
+
44
+ if not files:
45
+ return None
46
+
47
+ # Sort by date (newest first) and load the most recent
48
+ files.sort(reverse=True)
49
+ with open(os.path.join(self.session_data_dir, files[0]), 'r') as f:
50
+ return json.load(f)
51
+
52
+ return None
53
+
54
+ def load_all_student_sessions(self, student_id):
55
+ """Load all sessions for a specific student"""
56
+ files = [f for f in os.listdir(self.session_data_dir)
57
+ if f.startswith(f"{student_id}_") and f.endswith(".json")]
58
+
59
+ sessions = []
60
+ for file in sorted(files):
61
+ with open(os.path.join(self.session_data_dir, file), 'r') as f:
62
+ sessions.append(json.load(f))
63
+
64
+ return sessions
65
+
66
+ def extract_casl_metrics(self, session_data):
67
+ """Extract CASL-2 metrics from session data"""
68
+ metrics = {
69
+ "lexical_semantic": 0,
70
+ "syntactic": 0,
71
+ "supralinguistic": 0,
72
+ "pragmatic": 0
73
+ }
74
+
75
+ # Count assessment notes per category
76
+ assessment = session_data.get("assessment", {})
77
+ for category, notes in assessment.items():
78
+ if category in metrics:
79
+ metrics[category] = len(notes)
80
+
81
+ return metrics
82
+
83
+ def generate_progress_chart(self, student_id, output_path=None):
84
+ """Generate a progress chart for a student"""
85
+ sessions = self.load_all_student_sessions(student_id)
86
+
87
+ if not sessions:
88
+ return None
89
+
90
+ # Extract dates and metrics
91
+ dates = []
92
+ metrics = {
93
+ "lexical_semantic": [],
94
+ "syntactic": [],
95
+ "supralinguistic": [],
96
+ "pragmatic": []
97
+ }
98
+
99
+ for session in sessions:
100
+ dates.append(datetime.fromisoformat(session["timestamp"]).strftime("%m/%d/%Y"))
101
+ session_metrics = self.extract_casl_metrics(session)
102
+
103
+ for category in metrics:
104
+ metrics[category].append(session_metrics.get(category, 0))
105
+
106
+ # Create chart
107
+ plt.figure(figsize=(10, 6))
108
+ for category, values in metrics.items():
109
+ plt.plot(dates, values, marker='o', label=category.replace('_', ' ').title())
110
+
111
+ plt.title(f"CASL-2 Assessment Progress for Student {student_id}")
112
+ plt.xlabel("Session Date")
113
+ plt.ylabel("Assessment Score")
114
+ plt.legend()
115
+ plt.xticks(rotation=45)
116
+ plt.tight_layout()
117
+
118
+ # Save or return
119
+ if output_path:
120
+ plt.savefig(output_path)
121
+ return output_path
122
+ else:
123
+ chart_path = os.path.join(self.reports_dir, f"{student_id}_progress.png")
124
+ plt.savefig(chart_path)
125
+ return chart_path
126
+
127
+ def generate_session_summary(self, session_data):
128
+ """Generate a summary of a single session"""
129
+ if not session_data:
130
+ return None
131
+
132
+ # Extract basic info
133
+ timestamp = datetime.fromisoformat(session_data["timestamp"])
134
+ student_id = session_data.get("student_id", "anonymous")
135
+
136
+ # Extract transcript
137
+ transcript = session_data.get("transcript", [])
138
+
139
+ # Calculate metrics
140
+ word_count = 0
141
+ student_turns = 0
142
+
143
+ for entry in transcript:
144
+ if entry.get("speaker") == "Student":
145
+ text = entry.get("text", "")
146
+ words = text.split()
147
+ word_count += len(words)
148
+ student_turns += 1
149
+
150
+ # Get CASL-2 metrics
151
+ casl_metrics = self.extract_casl_metrics(session_data)
152
+
153
+ # Create summary
154
+ summary = {
155
+ "date": timestamp.strftime("%m/%d/%Y"),
156
+ "time": timestamp.strftime("%H:%M"),
157
+ "student_id": student_id,
158
+ "duration_minutes": len(transcript) // 2, # Approximate based on turns
159
+ "student_turns": student_turns,
160
+ "total_words": word_count,
161
+ "average_words_per_turn": word_count / max(1, student_turns),
162
+ "casl_metrics": casl_metrics
163
+ }
164
+
165
+ return summary
166
+
167
+ def generate_html_report(self, student_id, output_path=None):
168
+ """Generate an HTML report for a student"""
169
+ # Load all sessions for the student
170
+ sessions = self.load_all_student_sessions(student_id)
171
+
172
+ if not sessions:
173
+ return None
174
+
175
+ # Generate progress chart
176
+ chart_path = self.generate_progress_chart(student_id)
177
+
178
+ # Get latest session data
179
+ latest_session = sessions[-1]
180
+ latest_summary = self.generate_session_summary(latest_session)
181
+
182
+ # Calculate overall progress
183
+ if len(sessions) > 1:
184
+ first_metrics = self.extract_casl_metrics(sessions[0])
185
+ latest_metrics = self.extract_casl_metrics(sessions[-1])
186
+
187
+ progress = {}
188
+ for category in first_metrics:
189
+ if first_metrics[category] > 0:
190
+ progress[category] = (latest_metrics[category] - first_metrics[category]) / first_metrics[category]
191
+ else:
192
+ progress[category] = 0 if latest_metrics[category] == 0 else 1
193
+ else:
194
+ progress = {category: 0 for category in latest_summary["casl_metrics"]}
195
+
196
+ # Prepare report data
197
+ report_data = {
198
+ "student_id": student_id,
199
+ "report_date": datetime.now().strftime("%m/%d/%Y"),
200
+ "session_count": len(sessions),
201
+ "latest_session": latest_summary,
202
+ "progress": progress,
203
+ "chart_path": os.path.basename(chart_path),
204
+ "recommendations": self.generate_recommendations(sessions)
205
+ }
206
+
207
+ # Load and render template
208
+ try:
209
+ template = self.template_env.get_template("report_template.html")
210
+ report_html = template.render(**report_data)
211
+
212
+ # Save report
213
+ if not output_path:
214
+ output_path = os.path.join(self.reports_dir, f"{student_id}_report.html")
215
+
216
+ with open(output_path, 'w', encoding='utf-8') as f:
217
+ f.write(report_html)
218
+
219
+ return output_path
220
+
221
+ except jinja2.exceptions.TemplateNotFound:
222
+ # Create a simple report if template is not found
223
+ report = f"CASL-2 Assessment Report for Student {student_id}\n"
224
+ report += f"Report Date: {report_data['report_date']}\n"
225
+ report += f"Total Sessions: {report_data['session_count']}\n\n"
226
+
227
+ report += "Latest Session Summary:\n"
228
+ for key, value in latest_summary.items():
229
+ if key != "casl_metrics":
230
+ report += f" {key}: {value}\n"
231
+
232
+ report += "\nCASL-2 Metrics:\n"
233
+ for category, value in latest_summary["casl_metrics"].items():
234
+ report += f" {category}: {value}\n"
235
+
236
+ report += "\nRecommendations:\n"
237
+ for rec in report_data["recommendations"]:
238
+ report += f" - {rec}\n"
239
+
240
+ # Save simple report
241
+ if not output_path:
242
+ output_path = os.path.join(self.reports_dir, f"{student_id}_report.txt")
243
+
244
+ with open(output_path, 'w', encoding='utf-8') as f:
245
+ f.write(report)
246
+
247
+ return output_path
248
+
249
+ def generate_recommendations(self, sessions):
250
+ """Generate recommendations based on session data"""
251
+ if not sessions:
252
+ return []
253
+
254
+ latest_session = sessions[-1]
255
+ metrics = self.extract_casl_metrics(latest_session)
256
+
257
+ recommendations = []
258
+
259
+ # Check for areas needing improvement
260
+ weak_areas = [category for category, value in metrics.items() if value < 2]
261
+ for area in weak_areas:
262
+ if area == "lexical_semantic":
263
+ recommendations.append("Focus on vocabulary building exercises such as synonyms, antonyms, and word associations")
264
+ elif area == "syntactic":
265
+ recommendations.append("Practice sentence formation and grammar through structured activities")
266
+ elif area == "supralinguistic":
267
+ recommendations.append("Work on understanding figurative language and making inferences from context")
268
+ elif area == "pragmatic":
269
+ recommendations.append("Engage in role-playing activities to practice social communication skills")
270
+
271
+ # Add general recommendations
272
+ if len(sessions) > 1:
273
+ recommendations.append("Continue regular assessment sessions to track progress")
274
+
275
+ if not recommendations:
276
+ recommendations.append("Continue current therapy approach as all areas show adequate progress")
277
+
278
+ return recommendations
279
+
280
+
281
+ # This module can be used to generate reports from the session data collected by the CASL Voice Bot
282
+ if __name__ == "__main__":
283
+ # Example usage
284
+ report_gen = CASLReportGenerator()
285
+ # report_gen.generate_html_report("student123")
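
`CASLReportGenerator` reads JSON files named `<student_id>_*.json` and looks up the keys `timestamp`, `student_id`, `transcript` (entries with `speaker` and `text`), and `assessment` (a mapping from category to a list of notes). The sketch below builds two such records and reproduces the note-count metric and the relative-progress rule from `generate_html_report`. The sample values and the standalone function names are assumptions made for illustration; only the keys come from the code above.

```python
CATEGORIES = ("lexical_semantic", "syntactic", "supralinguistic", "pragmatic")


def count_notes(session):
    """Count assessment notes per CASL-2 category, as extract_casl_metrics does."""
    assessment = session.get("assessment", {})
    return {c: len(assessment.get(c, [])) for c in CATEGORIES}


def relative_progress(first, latest):
    """Relative change per category; a zero baseline maps to 0 or 1."""
    progress = {}
    for c in first:
        if first[c] > 0:
            progress[c] = (latest[c] - first[c]) / first[c]
        else:
            progress[c] = 0 if latest[c] == 0 else 1
    return progress


# Hypothetical session records (all values invented for illustration).
first_session = {
    "timestamp": "2025-01-10T09:30:00",
    "student_id": "student123",
    "transcript": [{"speaker": "Student", "text": "The dog runned fast"}],
    "assessment": {
        "lexical_semantic": ["vague noun use"],
        "syntactic": ["past-tense error", "short sentences"],
    },
}
latest_session = {
    "timestamp": "2025-02-10T09:30:00",
    "student_id": "student123",
    "transcript": [{"speaker": "Student", "text": "The dog ran fast"}],
    "assessment": {
        "lexical_semantic": ["used synonym", "named category"],
        "syntactic": ["one agreement error"],
        "pragmatic": ["took turns"],
    },
}

progress = relative_progress(count_notes(first_session), count_notes(latest_session))
print(progress)
```

Because the metric is a count of recorded notes rather than a standardized score, a rising value here means more observations were logged for that category, which is worth keeping in mind when reading the chart's "Assessment Score" axis.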
app/requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ python-dotenv>=1.0.0
2
+ livekit-agents>=0.7.0
3
+ openai>=1.3.0
4
+ gradio>=4.0.0
5
+ asyncio>=3.4.3
6
+ numpy>=1.24.0
7
+ soundfile>=0.12.1
app/templates/report_template.html ADDED
@@ -0,0 +1,196 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>CASL-2 Assessment Report</title>
7
+ <style>
8
+ body {
9
+ font-family: Arial, sans-serif;
10
+ line-height: 1.6;
11
+ margin: 0;
12
+ padding: 20px;
13
+ color: #333;
14
+ }
15
+ .report-header {
16
+ text-align: center;
17
+ margin-bottom: 30px;
18
+ border-bottom: 2px solid #2c3e50;
19
+ padding-bottom: 20px;
20
+ }
21
+ .report-section {
22
+ margin-bottom: 30px;
23
+ padding: 20px;
24
+ background-color: #f8f9fa;
25
+ border-radius: 5px;
26
+ }
27
+ .metrics-grid {
28
+ display: grid;
29
+ grid-template-columns: 1fr 1fr;
30
+ gap: 15px;
31
+ }
32
+ .metric-card {
33
+ background-color: white;
34
+ padding: 15px;
35
+ border-radius: 5px;
36
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
37
+ }
38
+ .metric-title {
39
+ font-weight: bold;
40
+ color: #2c3e50;
41
+ margin-bottom: 5px;
42
+ }
43
+ .metric-value {
44
+ font-size: 1.2em;
45
+ color: #3498db;
46
+ }
47
+ .progress-chart {
48
+ width: 100%;
49
+ max-width: 800px;
50
+ margin: 20px auto;
51
+ text-align: center;
52
+ }
53
+ .recommendations {
54
+ background-color: #e8f4fd;
55
+ padding: 20px;
56
+ border-left: 4px solid #3498db;
57
+ margin-top: 20px;
58
+ }
59
+ .recommendation-item {
60
+ margin-bottom: 10px;
61
+ }
62
+ .footer {
63
+ text-align: center;
64
+ margin-top: 40px;
65
+ font-size: 0.9em;
66
+ color: #7f8c8d;
67
+ padding-top: 20px;
68
+ border-top: 1px solid #ecf0f1;
69
+ }
70
+ @media print {
71
+ body {
72
+ padding: 0;
73
+ }
74
+ .report-section {
75
+ break-inside: avoid;
76
+ }
77
+ }
78
+ </style>
79
+ </head>
80
+ <body>
81
+ <div class="report-header">
82
+ <h1>CASL-2 Assessment Report</h1>
83
+ <h2>Student ID: {{ student_id }}</h2>
84
+ <p>Report Generated: {{ report_date }}</p>
85
+ </div>
86
+
87
+ <div class="report-section">
88
+ <h2>Assessment Summary</h2>
89
+ <div class="metrics-grid">
90
+ <div class="metric-card">
91
+ <div class="metric-title">Total Sessions</div>
92
+ <div class="metric-value">{{ session_count }}</div>
93
+ </div>
94
+ <div class="metric-card">
95
+ <div class="metric-title">Latest Session Date</div>
96
+ <div class="metric-value">{{ latest_session.date }}</div>
97
+ </div>
98
+ <div class="metric-card">
99
+ <div class="metric-title">Student Turns</div>
100
+ <div class="metric-value">{{ latest_session.student_turns }}</div>
101
+ </div>
102
+ <div class="metric-card">
103
+ <div class="metric-title">Total Words</div>
104
+ <div class="metric-value">{{ latest_session.total_words }}</div>
105
+ </div>
106
+ <div class="metric-card">
107
+ <div class="metric-title">Words Per Turn</div>
108
+ <div class="metric-value">{{ "%.1f"|format(latest_session.average_words_per_turn) }}</div>
109
+ </div>
110
+ <div class="metric-card">
111
+ <div class="metric-title">Session Duration</div>
112
+ <div class="metric-value">{{ latest_session.duration_minutes }} minutes</div>
113
+ </div>
114
+ </div>
115
+ </div>
116
+
117
+ <div class="report-section">
118
+ <h2>CASL-2 Domain Assessment</h2>
119
+ <div class="metrics-grid">
120
+ <div class="metric-card">
121
+ <div class="metric-title">Lexical/Semantic Skills</div>
122
+ <div class="metric-value">{{ latest_session.casl_metrics.lexical_semantic }}</div>
123
+ <div>
124
+ {% if progress.lexical_semantic > 0 %}
125
+ <span style="color: green;">↑ {{ "%.1f"|format(progress.lexical_semantic * 100) }}% improvement</span>
126
+ {% elif progress.lexical_semantic < 0 %}
127
+ <span style="color: red;">↓ {{ "%.1f"|format(progress.lexical_semantic * -100) }}% decrease</span>
128
+ {% else %}
129
+ <span style="color: orange;">→ No change</span>
130
+ {% endif %}
131
+ </div>
132
+ </div>
133
+ <div class="metric-card">
134
+ <div class="metric-title">Syntactic Skills</div>
135
+ <div class="metric-value">{{ latest_session.casl_metrics.syntactic }}</div>
136
+ <div>
137
+ {% if progress.syntactic > 0 %}
138
+ <span style="color: green;">↑ {{ "%.1f"|format(progress.syntactic * 100) }}% improvement</span>
139
+ {% elif progress.syntactic < 0 %}
140
+ <span style="color: red;">↓ {{ "%.1f"|format(progress.syntactic * -100) }}% decrease</span>
141
+ {% else %}
142
+ <span style="color: orange;">→ No change</span>
143
+ {% endif %}
144
+ </div>
145
+ </div>
146
+ <div class="metric-card">
147
+ <div class="metric-title">Supralinguistic Skills</div>
148
+ <div class="metric-value">{{ latest_session.casl_metrics.supralinguistic }}</div>
149
+ <div>
150
+ {% if progress.supralinguistic > 0 %}
151
+ <span style="color: green;">↑ {{ "%.1f"|format(progress.supralinguistic * 100) }}% improvement</span>
152
+ {% elif progress.supralinguistic < 0 %}
153
+ <span style="color: red;">↓ {{ "%.1f"|format(progress.supralinguistic * -100) }}% decrease</span>
154
+ {% else %}
155
+ <span style="color: orange;">→ No change</span>
156
+ {% endif %}
157
+ </div>
158
+ </div>
159
+ <div class="metric-card">
160
+ <div class="metric-title">Pragmatic Skills</div>
161
+ <div class="metric-value">{{ latest_session.casl_metrics.pragmatic }}</div>
162
+ <div>
163
+ {% if progress.pragmatic > 0 %}
164
+ <span style="color: green;">↑ {{ "%.1f"|format(progress.pragmatic * 100) }}% improvement</span>
165
+ {% elif progress.pragmatic < 0 %}
166
+ <span style="color: red;">↓ {{ "%.1f"|format(progress.pragmatic * -100) }}% decrease</span>
167
+ {% else %}
168
+ <span style="color: orange;">→ No change</span>
169
+ {% endif %}
170
+ </div>
171
+ </div>
172
+ </div>
173
+ </div>
174
+
175
+ <div class="report-section">
176
+ <h2>Progress Chart</h2>
177
+ <div class="progress-chart">
178
+ <img src="{{ chart_path }}" alt="Progress Chart" style="max-width: 100%;">
179
+ </div>
180
+ </div>
181
+
182
+ <div class="report-section">
183
+ <h2>Recommendations</h2>
184
+ <div class="recommendations">
185
+ {% for recommendation in recommendations %}
186
+ <div class="recommendation-item">• {{ recommendation }}</div>
187
+ {% endfor %}
188
+ </div>
189
+ </div>
190
+
191
+ <div class="footer">
192
+ <p>Generated by CASL Voice Bot Speech Therapy Assessment Tool</p>
193
+ <p>© {{ report_date.split('/')[-1] }} Speech Therapy Assessment System</p>
194
+ </div>
195
+ </body>
196
+ </html>
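
The template repeats the same three-way branch for each domain card. For unit testing, or for reuse in the plain-text fallback report, that branch can be mirrored in plain Python. This is only a sketch: `progress_label` is a hypothetical helper, not something this commit defines.

```python
def progress_label(progress):
    """Format a relative change the way the template's domain cards do."""
    if progress > 0:
        return f"↑ {progress * 100:.1f}% improvement"
    if progress < 0:
        return f"↓ {progress * -100:.1f}% decrease"
    return "→ No change"


for value in (0.5, -0.25, 0):
    print(progress_label(value))
```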
app_hf.py ADDED
@@ -0,0 +1,12 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ Hugging Face Spaces entry point for CASL Voice Bot
5
+ """
6
+
7
+ # Import the app from app_main.py
8
+ from app_main import app
9
+
10
+ # This is the entry point that Hugging Face Spaces will use
11
+ if __name__ == "__main__":
12
+ app.launch()
app_livekit.py ADDED
@@ -0,0 +1,469 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ CASL Voice Bot - Speech Pathology Assistant
5
+ Using LiveKit agents with OpenAI's real-time capabilities
6
+ """
7
+
8
+ import os
9
+ import asyncio
10
+ import gradio as gr
11
+ import logging
12
+ import tempfile
13
+ import queue
14
+ import threading
15
+ import time
16
+ from dotenv import load_dotenv
17
+ from livekit import agents
18
+ from openai import AsyncOpenAI
19
+
20
+ # Load environment variables
21
+ load_dotenv()
22
+
23
+ # Set up logging
24
+ logging.basicConfig(level=logging.INFO)
25
+ logger = logging.getLogger(__name__)
26
+
27
+ # Initialize OpenAI client
28
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
29
+
30
+ # Speech Pathologist Agent Prompt
31
+ SPEECH_PATHOLOGIST_PROMPT = """
32
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
33
+ You are working with a student with speech impediments, typically with ASD.
34
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
35
+ Each domain from the CASL-2 framework can be analyzed using the sample:
36
+ Lexical/Semantic Skills:
37
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
38
+ Key Subtests:
39
+ Antonyms: Identifying words with opposite meanings.
40
+ Synonyms: Identifying words with similar meanings.
41
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
42
+ Evaluate vocabulary diversity (type-token ratio).
43
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
44
+ Assess use of specific vs. vague language (e.g., "sedan" vs. "car").
45
+ Syntactic Skills:
46
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
47
+ Key Subtests:
48
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
49
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
50
+ Examine sentence structure for grammatical accuracy.
51
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
52
+ Note the use of clauses, conjunctions, and varied sentence types.
53
+ Supralinguistic Skills:
54
+ This category assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
55
+ Key Subtests:
56
+ Inferences: Understanding information that is not explicitly stated.
57
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
58
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
59
+ Look for use or understanding of figurative language, idioms, or humor.
60
+ Assess ability to handle ambiguous or implied meanings in context.
61
+ Identify advanced language use for abstract or hypothetical ideas.
62
+ Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
63
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
64
+ Key Subtests:
65
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
66
+
67
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
68
+ """
69
+
70
+ class GradioInputDevice(agents.InputDevice):
71
+ """Custom input device that works with Gradio"""
72
+
73
+ def __init__(self):
74
+ super().__init__()
75
+ self.audio_queue = asyncio.Queue()
76
+ self.is_active = True
77
+
78
+ async def receive(self) -> agents.AudioChunk:
79
+ """Receive audio data from the queue"""
80
+ try:
81
+ audio_data = await asyncio.wait_for(self.audio_queue.get(), timeout=0.1)
82
+ return audio_data
83
+ except asyncio.TimeoutError:
84
+ return None
85
+
86
+ async def add_audio(self, audio_data):
87
+ """Add audio data to the queue"""
88
+ if audio_data is None:
89
+ return
90
+
91
+ # Convert gradio audio format to AudioChunk
92
+ sample_rate, audio_array = audio_data
93
+ audio_chunk = agents.AudioChunk(
94
+ samples=audio_array,
95
+ sample_rate=sample_rate,
96
+ is_last=False
97
+ )
98
+ await self.audio_queue.put(audio_chunk)
99
+
100
+ def stop(self):
101
+ """Stop the input device"""
102
+ self.is_active = False
103
+
104
+
105
+ class GradioOutputDevice(agents.OutputDevice):
106
+ """Custom output device that works with Gradio"""
107
+
108
+ def __init__(self):
109
+ super().__init__()
110
+ self.output_queue = asyncio.Queue()
111
+
112
+ async def transmit(self, audio_chunk: agents.AudioChunk) -> None:
113
+ """Transmit audio chunk to the queue"""
114
+ if audio_chunk is not None:
115
+ await self.output_queue.put((audio_chunk.samples, audio_chunk.sample_rate))
116
+
117
+ async def get_latest_audio(self):
118
+ """Get the latest audio from the queue"""
119
+ try:
120
+ return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
121
+ except asyncio.TimeoutError:
122
+ return None
123
+
124
+
125
+ class SpeechPathologistAssistant:
126
+ """Speech pathologist assistant using LiveKit agents"""
127
+
128
+ def __init__(self):
129
+ self.input_device = GradioInputDevice()
130
+ self.output_device = GradioOutputDevice()
131
+ self.assistant = None
132
+ self.assistant_task = None
133
+ self.transcript = []
134
+ self.is_running = False
135
+ self.notes = []
136
+ self.current_assessment = {
137
+ "lexical_semantic": 0,
138
+ "syntactic": 0,
139
+ "supralinguistic": 0,
140
+ "pragmatic": 0
141
+ }
142
+
143
+ async def initialize_assistant(self, voice="shimmer"):
144
+ """Initialize the speech assistant"""
145
+ self.assistant = agents.VoiceAssistant(
146
+ openai_client=openai_client,
147
+ model="gpt-4o",
148
+ voice=voice,
149
+ input_device=self.input_device,
150
+ output_device=self.output_device,
151
+ initial_message=SPEECH_PATHOLOGIST_PROMPT,
152
+ real_time=True, # Enable real-time processing
153
+ )
154
+
155
+ # Add transcript and response callbacks
156
+ self.assistant.on_transcript = self.on_transcript
157
+ self.assistant.on_response = self.on_response
158
+
159
+ def on_transcript(self, transcript):
160
+ """Handle transcript from user"""
161
+ self.transcript.append(f"Student: {transcript.text}")
162
+
163
+ # Basic analysis of speech for CASL-2 categories
164
+ self.analyze_speech(transcript.text)
165
+
166
+ return True
167
+
168
+ def on_response(self, response):
169
+ """Handle response from assistant"""
170
+ self.transcript.append(f"Speech Pathologist: {response.text}")
171
+ return True
172
+
173
+ def analyze_speech(self, text):
174
+ """Analyze speech for CASL-2 categories"""
175
+ # Simple heuristic analysis - in a real app, this would use more sophisticated NLP
176
+
177
+ # Lexical/Semantic: check vocabulary diversity
178
+ words = text.lower().split()
179
+ unique_words = set(words)
180
+ if len(unique_words) / max(1, len(words)) > 0.7:
181
+ self.current_assessment["lexical_semantic"] += 1
182
+ self.notes.append("Lexical/Semantic: Good vocabulary diversity")
183
+
184
+ # Syntactic: check for sentence complexity
185
+ sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
186
+ avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
187
+ if avg_words > 7:
188
+ self.current_assessment["syntactic"] += 1
189
+ self.notes.append("Syntactic: Complex sentence structures used")
190
+
191
+ # Supralinguistic: check for figurative language (very basic check)
192
+ figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
193
+ if any(word.strip(".,!?;:\"'") in figurative_markers for word in words):
194
+ self.current_assessment["supralinguistic"] += 1
195
+ self.notes.append("Supralinguistic: Potential figurative language detected")
196
+
197
+ # Pragmatic: basic check for conversational elements
198
+ pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
199
+ if any(marker in text.lower() for marker in pragmatic_markers):
200
+ self.current_assessment["pragmatic"] += 1
201
+ self.notes.append("Pragmatic: Appropriate social language detected")
202
+
203
+ async def start_assistant(self, voice_model, student_id):
204
+ """Start the assistant in a background task"""
205
+ await self.initialize_assistant(voice_model)
206
+
207
+ self.is_running = True
208
+
209
+ # Add student info to transcript
210
+ student_info = f" for {student_id}" if student_id else ""
211
+ self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
212
+
213
+ # Run the assistant in a background task
214
+ self.assistant_task = asyncio.create_task(self.assistant.run())
215
+
216
+ return "Session active. The AI will introduce itself."
217
+
218
+ def stop_assistant(self):
219
+ """Stop the assistant"""
220
+ if self.assistant_task and not self.assistant_task.done():
221
+ self.assistant_task.cancel()
222
+
223
+ self.input_device.stop()
224
+ self.is_running = False
225
+
226
+ # Add ending to transcript
227
+ self.transcript.append("Session ended.")
228
+ return "Session stopped."
229
+
230
+ async def process_audio(self, audio):
231
+ """Process audio from Gradio interface"""
232
+ if not self.is_running or audio is None:
233
+ return None, self.get_transcript(), self.get_assessment_html()
234
+
235
+ # Add audio to input device
236
+ await self.input_device.add_audio(audio)
237
+
238
+ # Check for assistant output
239
+ output_audio = await self.output_device.get_latest_audio()
240
+
241
+ return output_audio, self.get_transcript(), self.get_assessment_html()
242
+
243
+ def get_transcript(self):
244
+ """Get the current transcript"""
245
+ return "\n".join(self.transcript)
246
+
247
+ def get_assessment_html(self):
248
+ """Get HTML representation of the current assessment"""
249
+ html = """
250
+ <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
251
+ <h3>CASL-2 Assessment Progress</h3>
252
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
253
+ """
254
+
255
+ for category, value in self.current_assessment.items():
256
+ category_name = category.replace('_', ' ').title()
257
+ progress_width = min(100, value * 10)
258
+ html += f"""
259
+ <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
260
+ <div><strong>{category_name}</strong></div>
261
+ <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
262
+ <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
263
+ </div>
264
+ <div style="margin-top: 5px;">{value} points</div>
265
+ </div>
266
+ """
267
+
268
+ html += """
269
+ </div>
270
+ """
271
+
272
+ # Add recent notes
273
+ if self.notes:
274
+ html += """
275
+ <div style="margin-top: 15px;">
276
+ <h4>Recent Observations</h4>
277
+ <ul style="margin-top: 5px;">
278
+ """
279
+
280
+ for note in self.notes[-5:]:
281
+ html += f"<li>{note}</li>"
282
+
283
+ html += """
284
+ </ul>
285
+ </div>
286
+ """
287
+
288
+ html += "</div>"
289
+ return html
290
+
291
+ def add_note(self, note):
292
+ """Add a custom note"""
293
+ if note.strip():
294
+ self.notes.append(note)
295
+ return f"Note added: {note}"
296
+ return "Note was empty, not added."
297
+
298
+ def save_session(self, student_id):
299
+ """Save session to file"""
300
+ if not student_id:
301
+ student_id = "anonymous"
302
+
303
+ # Create session data directory if it doesn't exist
304
+ os.makedirs("session_data", exist_ok=True)
305
+
306
+ # Save transcript
307
+ timestamp = time.strftime("%Y%m%d-%H%M%S")
308
+ filename = f"session_data/{student_id}_{timestamp}.txt"
309
+
310
+ with open(filename, "w", encoding="utf-8") as f:
311
+ f.write("\n".join(self.transcript))
312
+ f.write("\n\n--- ASSESSMENT NOTES ---\n")
313
+ for note in self.notes:
314
+ f.write(f"- {note}\n")
315
+ f.write("\n--- CASL-2 SCORES ---\n")
316
+ for category, score in self.current_assessment.items():
317
+ f.write(f"{category.replace('_', ' ').title()}: {score}\n")
318
+
319
+ return f"Session saved to {filename}"
320
+
321
+
322
+ # Create the speech pathology assistant
323
+ speech_assistant = SpeechPathologistAssistant()
324
+
325
+
326
+ async def start_session(voice_model, student_id):
327
+ """Start the speech pathology session"""
328
+ status = await speech_assistant.start_assistant(voice_model, student_id)
329
+ return status, None, speech_assistant.get_transcript(), speech_assistant.get_assessment_html()
330
+
331
+
332
+ def stop_session():
333
+ """Stop the speech pathology session"""
334
+ return speech_assistant.stop_assistant(), None, speech_assistant.get_transcript(), speech_assistant.get_assessment_html()
335
+
336
+
337
+ async def process_mic_input(audio, progress=gr.Progress()):
338
+ """Process microphone input"""
339
+ progress(0, desc="Processing speech...")
340
+ audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
341
+ progress(1, desc="Done")
342
+ return audio_output, transcript, assessment
343
+
344
+
345
+ def add_note(note):
346
+ """Add a note to the session"""
347
+ result = speech_assistant.add_note(note)
348
+ return "", result, speech_assistant.get_assessment_html()
349
+
350
+
351
+ def save_session(student_id):
352
+ """Save the current session"""
353
+ return speech_assistant.save_session(student_id)
354
+
355
+
356
+ # Create Gradio Interface
357
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
358
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
359
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
360
+
361
+ with gr.Row():
362
+ with gr.Column(scale=1):
363
+ student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
364
+ voice_select = gr.Dropdown(
365
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
366
+ value="shimmer",
367
+ label="Assistant Voice"
368
+ )
369
+ start_button = gr.Button("Start Session", variant="primary")
370
+ stop_button = gr.Button("Stop Session", variant="stop")
371
+ status = gr.Textbox(label="Status", value="Ready to start")
372
+
373
+ with gr.Accordion("SLP Tools", open=True):
374
+ note_input = gr.Textbox(
375
+ label="Add Assessment Note",
376
+ placeholder="Enter observation or assessment note here..."
377
+ )
378
+ note_button = gr.Button("Add Note")
379
+ note_status = gr.Textbox(label="Note Status")
380
+ save_button = gr.Button("Save Session")
381
+ save_status = gr.Textbox(label="Save Status")
382
+
383
+ with gr.Column(scale=2):
384
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
385
+ audio_input = gr.Audio(
386
+ label="Speak to the AI",
387
+ type="numpy",
388
+ sources=["microphone"],
389
+ streaming=True
390
+ )
391
+
392
+ with gr.Row():
393
+ with gr.Column(scale=1):
394
+ assessment_html = gr.HTML(label="Assessment Progress")
395
+ with gr.Column(scale=1):
396
+ transcript = gr.Textbox(label="Transcript", lines=10)
397
+
398
+ with gr.Accordion("About This Application", open=False):
399
+ gr.Markdown("""
400
+ ### About CASL-2 Speech Pathology Assistant
401
+
402
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
403
+
404
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
405
+ - **Syntactic Skills**: Grammar and sentence structure
406
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
407
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
408
+
409
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
410
+
411
+ ### How to Use
412
+
413
+ 1. Optionally enter a Student ID to track sessions
414
+ 2. Select the AI voice you prefer
415
+ 3. Click "Start Session" to begin
416
+ 4. The AI will introduce itself and begin the assessment
417
+ 5. Speak into your microphone when it's your turn
418
+ 6. View the transcript to track the conversation
419
+ 7. SLPs can add assessment notes as needed
420
+ 8. Save the session when finished
421
+ 9. Click "Stop Session" when done
422
+
423
+ ### For Speech-Language Pathologists
424
+
425
+ This tool is designed to supplement, not replace, professional SLP services. SLPs can:
426
+
427
+ - Add custom notes during the session
428
+ - Save session data for later reference
429
+ - Track progress across multiple sessions
430
+ - Use the AI as a consistent assessment tool
431
+ """)
432
+
433
+ # Setup event handlers
434
+ start_button.click(
435
+ fn=start_session,
436
+ inputs=[voice_select, student_id],
437
+ outputs=[status, audio_output, transcript, assessment_html]
438
+ )
439
+ stop_button.click(
440
+ fn=stop_session,
441
+ outputs=[status, audio_output, transcript, assessment_html]
442
+ )
443
+ note_button.click(
444
+ fn=add_note,
445
+ inputs=note_input,
446
+ outputs=[note_input, note_status, assessment_html]
447
+ )
448
+ save_button.click(
449
+ fn=save_session,
450
+ inputs=student_id,
451
+ outputs=save_status
452
+ )
453
+
454
+ # Setup audio processing
455
+ audio_input.stream(
456
+ fn=process_mic_input,
457
+ inputs=audio_input,
458
+ outputs=[audio_output, transcript, assessment_html]
459
+ )
460
+
461
+
462
+ def main(share=True):
463
+ """Main function to launch the app"""
464
+ app.launch(share=share)
465
+
466
+
467
+ # Entry point for the application
468
+ if __name__ == "__main__":
469
+ main()
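
The keyword checks in `analyze_speech` in `app.py` rely on substring tests, so a marker such as "hi" also matches inside "this", and "as" matches inside "was". The following is a minimal sketch of whole-word matching, not part of the commit. The helper name `contains_marker` and the constant `PRAGMATIC_MARKERS` are hypothetical and chosen for illustration; the marker list itself is copied from the diff above.

```python
import re


def contains_marker(text: str, markers: list[str]) -> bool:
    """Return True if any marker occurs in text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        # \b anchors the match at word boundaries, so "hi" does not match "this";
        # re.escape keeps multi-word markers such as "thank you" literal.
        re.search(rf"\b{re.escape(marker)}\b", lowered) is not None
        for marker in markers
    )


# Marker list taken from analyze_speech; the constant name is an assumption.
PRAGMATIC_MARKERS = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]

if __name__ == "__main__":
    print(contains_marker("This is a thing", PRAGMATIC_MARKERS))
    print(contains_marker("Hi, thank you!", PRAGMATIC_MARKERS))
```

A helper along these lines could stand in for the `any(marker in text.lower() ...)` expressions. It handles trailing punctuation and multi-word markers without extra tokenization, though it remains a keyword heuristic rather than a real measure of pragmatic or figurative language.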
app_main.py ADDED
@@ -0,0 +1,469 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ CASL Voice Bot - Speech Pathology Assistant
5
+ Main application file that can be used for both local deployment and Hugging Face Spaces
6
+ """
7
+
8
+ import os
9
+ import asyncio
10
+ import gradio as gr
11
+ import logging
12
+ import tempfile
13
+ import queue
14
+ import threading
15
+ import time
16
+ from dotenv import load_dotenv
17
+ from openai import AsyncOpenAI
18
+
19
+ # Load environment variables
20
+ load_dotenv()
21
+
22
+ # Set up logging
23
+ logging.basicConfig(level=logging.INFO)
24
+ logger = logging.getLogger(__name__)
25
+
26
+ # Initialize OpenAI client with API key from environment
27
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
28
+
29
+ # Speech Pathologist Agent Prompt
30
+ SPEECH_PATHOLOGIST_PROMPT = """
31
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
32
+ You are working with a student with speech impediments, typically with ASD.
33
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
34
+ Each domain from the CASL-2 framework can be analyzed using the sample:
35
+ Lexical/Semantic Skills:
36
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
37
+ Key Subtests:
38
+ Antonyms: Identifying words with opposite meanings.
39
+ Synonyms: Identifying words with similar meanings.
40
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
41
+ Evaluate vocabulary diversity (type-token ratio).
42
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
43
+ Assess use of specific vs. vague language (e.g., "sedan" vs. "car").
44
+ Syntactic Skills:
45
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
46
+ Key Subtests:
47
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
48
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
49
+ Examine sentence structure for grammatical accuracy.
50
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
51
+ Note the use of clauses, conjunctions, and varied sentence types.
52
+ Supralinguistic Skills:
53
+ This category assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
54
+ Key Subtests:
55
+ Inferences: Understanding information that is not explicitly stated.
56
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
57
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
58
+ Look for use or understanding of figurative language, idioms, or humor.
59
+ Assess ability to handle ambiguous or implied meanings in context.
60
+ Identify advanced language use for abstract or hypothetical ideas.
61
+ Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
62
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
63
+ Key Subtests:
64
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
65
+
66
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
67
+ """
68
+
69
+ # Custom audio processing for Gradio interface
70
+ class AudioProcessor:
71
+ def __init__(self):
72
+ self.transcript = []
73
+ self.is_active = False
74
+ self.voice_model = "shimmer" # Default voice
75
+ self.notes = []
76
+ self.current_assessment = {
77
+ "lexical_semantic": 0,
78
+ "syntactic": 0,
79
+ "supralinguistic": 0,
80
+ "pragmatic": 0
81
+ }
82
+
83
+ async def process_speech(self, audio_data, openai_client):
84
+ """Process speech using OpenAI's API"""
85
+ if not self.is_active or audio_data is None:
86
+ return None, "\n".join(self.transcript), self.get_assessment_html()
87
+
88
+ # Prepare audio file for OpenAI
89
+ temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
90
+ temp_file.close()
91
+
92
+ try:
93
+ # Save audio data to temporary file
94
+ sample_rate, audio_array = audio_data
95
+ import scipy.io.wavfile
96
+ scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)
97
+
98
+ # Transcribe audio using OpenAI
99
+ with open(temp_file.name, "rb") as audio_file:
100
+ transcript_response = await openai_client.audio.transcriptions.create(
101
+ file=audio_file,
102
+ model="whisper-1"
103
+ )
104
+
105
+ user_text = transcript_response.text
106
+ if user_text.strip():
107
+ self.transcript.append(f"Student: {user_text}")
108
+
109
+ # Analyze speech for CASL-2 categories
110
+ self.analyze_speech(user_text)
111
+
112
+ # Generate assistant response
113
+ chat_response = await openai_client.chat.completions.create(
114
+ model="gpt-4o",
115
+ messages=[
116
+ {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
117
+ {"role": "user", "content": user_text}
118
+ ]
119
+ )
120
+
121
+ assistant_text = chat_response.choices[0].message.content
122
+ self.transcript.append(f"Speech Pathologist: {assistant_text}")
123
+
124
+ # Generate speech from text
125
+ speech_response = await openai_client.audio.speech.create(
126
+ model="tts-1",
127
+ voice=self.voice_model,
128
+ input=assistant_text
129
+ )
130
+
131
+ # Save speech to temporary file
132
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
133
+ response_temp_file.close()
134
+
135
+ speech_response.stream_to_file(response_temp_file.name)
136
+
137
+ # Load audio data for Gradio
138
+ import soundfile as sf
139
+ audio_data, sample_rate = sf.read(response_temp_file.name)
140
+
141
+ # Clean up
142
+ os.unlink(response_temp_file.name)
143
+
144
+ return (sample_rate, audio_data), "\n".join(self.transcript), self.get_assessment_html()
145
+
146
+ except Exception as e:
147
+ logger.error(f"Error processing speech: {e}")
148
+ self.transcript.append(f"Error: {str(e)}")
149
+ finally:
150
+ # Clean up temp file
151
+ os.unlink(temp_file.name)
152
+
153
+ return None, "\n".join(self.transcript), self.get_assessment_html()
154
+
155
+ def analyze_speech(self, text):
156
+ """Analyze speech for CASL-2 categories"""
157
+ # Simple heuristic analysis - in a real app, this would use more sophisticated NLP
158
+
159
+ # Lexical/Semantic: check vocabulary diversity
160
+ words = text.lower().split()
161
+ unique_words = set(words)
162
+ if len(unique_words) / max(1, len(words)) > 0.7:
163
+ self.current_assessment["lexical_semantic"] += 1
164
+ self.notes.append("Lexical/Semantic: Good vocabulary diversity")
165
+
166
+ # Syntactic: check for sentence complexity
167
+ sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
168
+ avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
169
+ if avg_words > 7:
170
+ self.current_assessment["syntactic"] += 1
171
+ self.notes.append("Syntactic: Complex sentence structures used")
172
+
173
+ # Supralinguistic: check for figurative language (very basic check)
174
+ figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
175
+ if any(word.strip(".,!?;:\"'") in figurative_markers for word in words):
176
+ self.current_assessment["supralinguistic"] += 1
177
+ self.notes.append("Supralinguistic: Potential figurative language detected")
178
+
179
+ # Pragmatic: basic check for conversational elements
180
+ pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
181
+ if any(marker in text.lower() for marker in pragmatic_markers):
182
+ self.current_assessment["pragmatic"] += 1
183
+ self.notes.append("Pragmatic: Appropriate social language detected")
184
+
185
+ def get_assessment_html(self):
186
+ """Get HTML representation of the current assessment"""
187
+ html = """
188
+ <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
189
+ <h3>CASL-2 Assessment Progress</h3>
190
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
191
+ """
192
+
193
+ for category, value in self.current_assessment.items():
194
+ category_name = category.replace('_', ' ').title()
195
+ progress_width = min(100, value * 10)
196
+ html += f"""
197
+ <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
198
+ <div><strong>{category_name}</strong></div>
199
+ <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
200
+ <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
201
+ </div>
202
+ <div style="margin-top: 5px;">{value} points</div>
203
+ </div>
204
+ """
205
+
206
+ html += """
207
+ </div>
208
+ """
209
+
210
+ # Add recent notes
211
+ if self.notes:
212
+ html += """
213
+ <div style="margin-top: 15px;">
214
+ <h4>Recent Observations</h4>
215
+ <ul style="margin-top: 5px;">
216
+ """
217
+
218
+ for note in self.notes[-5:]:
219
+ html += f"<li>{note}</li>"
220
+
221
+ html += """
222
+ </ul>
223
+ </div>
224
+ """
225
+
226
+ html += "</div>"
227
+ return html
228
+
229
+ def start_session(self, voice_model, student_id):
230
+ """Start a new session"""
231
+ self.is_active = True
232
+ self.voice_model = voice_model if voice_model else "shimmer"
233
+ self.transcript = []
234
+ self.notes = []
235
+ self.current_assessment = {
236
+ "lexical_semantic": 0,
237
+ "syntactic": 0,
238
+ "supralinguistic": 0,
239
+ "pragmatic": 0
240
+ }
241
+
242
+ student_info = f" for {student_id}" if student_id else ""
243
+ self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
244
+ return "Session active. Please wait for the AI to introduce itself."
245
+
246
+ def stop_session(self):
247
+ """Stop the current session"""
248
+ self.is_active = False
249
+ self.transcript.append("Session ended.")
250
+ return "Session stopped."
251
+
252
+ def add_note(self, note):
253
+ """Add a custom note"""
254
+ if note.strip():
255
+ self.notes.append(note)
256
+ return f"Note added: {note}"
257
+ return "Note was empty, not added."
258
+
259
+
260
+ # Create the audio processor instance
261
+ audio_processor = AudioProcessor()
262
+
263
+
264
+ async def start_session(voice_model, student_id):
265
+ """Start the speech pathology session"""
266
+ status = audio_processor.start_session(voice_model, student_id)
267
+
268
+ # Generate initial AI introduction
269
+ try:
270
+ # Generate assistant response
271
+ chat_response = await openai_client.chat.completions.create(
272
+ model="gpt-4o",
273
+ messages=[
274
+ {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
275
+ {"role": "user", "content": "Hello"} # Initial trigger
276
+ ]
277
+ )
278
+
279
+ assistant_text = chat_response.choices[0].message.content
280
+ audio_processor.transcript.append(f"Speech Pathologist: {assistant_text}")
281
+
282
+ # Generate speech from text
283
+ speech_response = await openai_client.audio.speech.create(
284
+ model="tts-1",
285
+ voice=audio_processor.voice_model,
286
+ input=assistant_text
287
+ )
288
+
289
+ # Save speech to temporary file
290
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
291
+ response_temp_file.close()
292
+
293
+ speech_response.stream_to_file(response_temp_file.name)
294
+
295
+ # Load audio data for Gradio
296
+ import soundfile as sf
297
+ audio_data, sample_rate = sf.read(response_temp_file.name)
298
+
299
+ # Clean up
300
+ os.unlink(response_temp_file.name)
301
+
302
+ return status, (sample_rate, audio_data), "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
303
+
304
+ except Exception as e:
305
+ logger.error(f"Error starting session: {e}")
306
+ audio_processor.transcript.append(f"Error: {str(e)}")
307
+ return status, None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
308
+
309
+
310
+ def stop_session():
311
+ """Stop the speech pathology session"""
312
+ return audio_processor.stop_session(), None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
313
+
314
+
315
+ async def process_mic_input(audio, progress=gr.Progress()):
316
+ """Process microphone input"""
317
+ if audio is None or not audio_processor.is_active:
318
+ return None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
319
+
320
+ progress(0, desc="Processing speech...")
321
+ audio_output, transcript, assessment = await audio_processor.process_speech(audio, openai_client)
322
+ progress(1, desc="Done")
323
+ return audio_output, transcript, assessment
324
+
325
+
326
+ def add_note(note):
327
+ """Add a note to the session"""
328
+ result = audio_processor.add_note(note)
329
+ return "", result, audio_processor.get_assessment_html()
330
+
331
+
332
+ def save_session(student_id):
333
+ """Save session to file"""
334
+ if not student_id:
335
+ student_id = "anonymous"
336
+
337
+ # Create session data directory if it doesn't exist
338
+ os.makedirs("session_data", exist_ok=True)
339
+
340
+ # Save transcript
341
+ timestamp = time.strftime("%Y%m%d-%H%M%S")
342
+ filename = f"session_data/{student_id}_{timestamp}.txt"
343
+
344
+ with open(filename, "w") as f:
345
+ f.write("\n".join(audio_processor.transcript))
346
+ f.write("\n\n--- ASSESSMENT NOTES ---\n")
347
+ for note in audio_processor.notes:
348
+ f.write(f"- {note}\n")
349
+ f.write("\n--- CASL-2 SCORES ---\n")
350
+ for category, score in audio_processor.current_assessment.items():
351
+ f.write(f"{category.replace('_', ' ').title()}: {score}\n")
352
+
353
+ return f"Session saved to {filename}"
354
+
355
+
356
+ # Create Gradio Interface
357
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
358
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
359
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
360
+
361
+ with gr.Row():
362
+ with gr.Column(scale=1):
363
+ student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
364
+ voice_select = gr.Dropdown(
365
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
366
+ value="shimmer",
367
+ label="Assistant Voice"
368
+ )
369
+ start_button = gr.Button("Start Session", variant="primary")
370
+ stop_button = gr.Button("Stop Session", variant="stop")
371
+ status = gr.Textbox(label="Status", value="Ready to start")
372
+
373
+ with gr.Accordion("SLP Tools", open=True):
374
+ note_input = gr.Textbox(
375
+ label="Add Assessment Note",
376
+ placeholder="Enter observation or assessment note here..."
377
+ )
378
+ note_button = gr.Button("Add Note")
379
+ note_status = gr.Textbox(label="Note Status")
380
+ save_button = gr.Button("Save Session")
381
+ save_status = gr.Textbox(label="Save Status")
382
+
383
+ with gr.Column(scale=2):
384
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
385
+ audio_input = gr.Audio(
386
+ label="Speak to the AI",
387
+ type="numpy",
388
+ sources=["microphone"],
389
+ streaming=True
390
+ )
391
+
392
+ with gr.Row():
393
+ with gr.Column(scale=1):
394
+ assessment_html = gr.HTML(label="Assessment Progress")
395
+ with gr.Column(scale=1):
396
+ transcript = gr.Textbox(label="Transcript", lines=10)
397
+
398
+ with gr.Accordion("About This Application", open=False):
399
+ gr.Markdown("""
400
+ ### About CASL-2 Speech Pathology Assistant
401
+
402
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
403
+
404
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
405
+ - **Syntactic Skills**: Grammar and sentence structure
406
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
407
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
408
+
409
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
410
+
411
+ ### How to Use
412
+
413
+ 1. Optionally enter a Student ID to track sessions
414
+ 2. Select the AI voice you prefer
415
+ 3. Click "Start Session" to begin
416
+ 4. The AI will introduce itself and begin the assessment
417
+ 5. Speak into your microphone when it's your turn
418
+ 6. View the transcript to track the conversation
419
+ 7. SLPs can add assessment notes as needed
420
+ 8. Save the session when finished
421
+ 9. Click "Stop Session" when done
422
+
423
+ ### For Speech-Language Pathologists
424
+
425
+ This tool is designed to supplement, not replace, professional SLP services. SLPs can:
426
+
427
+ - Add custom notes during the session
428
+ - Save session data for later reference
429
+ - Track progress across multiple sessions
430
+ - Use the AI as a consistent assessment tool
431
+ """)
432
+
433
+ # Setup event handlers
434
+ start_button.click(
435
+ fn=lambda voice, student: asyncio.run(start_session(voice, student)),
436
+ inputs=[voice_select, student_id],
437
+ outputs=[status, audio_output, transcript, assessment_html]
438
+ )
439
+ stop_button.click(
440
+ fn=stop_session,
441
+ outputs=[status, audio_output, transcript, assessment_html]
442
+ )
443
+ note_button.click(
444
+ fn=add_note,
445
+ inputs=note_input,
446
+ outputs=[note_input, note_status, assessment_html]
447
+ )
448
+ save_button.click(
449
+ fn=save_session,
450
+ inputs=student_id,
451
+ outputs=save_status
452
+ )
453
+
454
+ # Setup audio processing
455
+ audio_input.stream(
456
+ fn=lambda audio: asyncio.run(process_mic_input(audio)),
457
+ inputs=audio_input,
458
+ outputs=[audio_output, transcript, assessment_html]
459
+ )
460
+
461
+
462
+ def main(share=True):
463
+ """Main function to launch the app"""
464
+ app.launch(share=share)
465
+
466
+
467
+ # Entry point for the application
468
+ if __name__ == "__main__":
469
+ main()
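Review note on the temporary-file handling in `start_session` above: the response file is only unlinked on the success path, so an exception raised while decoding the audio would leave the `.mp3` behind. A minimal sketch of one way to guarantee cleanup follows. The helper name `temp_audio_path` is hypothetical and not part of this commit; it uses only the standard library.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_audio_path(suffix=".mp3"):
    """Yield the path of an empty temporary file and always delete it afterwards.

    Hypothetical helper for illustration; not part of the committed code.
    """
    handle = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    handle.close()  # close first so another library can reopen the path by name
    try:
        yield handle.name
    finally:
        # Runs on success and on error, so a failed decode cannot leak the file.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(handle.name)
```

Inside the handler, the write and the `sf.read(...)` call would both sit in a `with temp_audio_path() as path:` block, so the explicit `os.unlink` call is no longer needed.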
app_ui.py ADDED
@@ -0,0 +1,469 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ CASL Voice Bot - Speech Pathology Assistant
5
+ Unified UI application for both local deployment and Hugging Face Spaces
6
+ """
7
+
8
+ import os
9
+ import asyncio
10
+ import gradio as gr
11
+ import logging
12
+ import tempfile
13
+ import queue
14
+ import threading
15
+ import time
16
+ from dotenv import load_dotenv
17
+ from openai import AsyncOpenAI
18
+
19
+ # Load environment variables
20
+ load_dotenv()
21
+
22
+ # Set up logging
23
+ logging.basicConfig(level=logging.INFO)
24
+ logger = logging.getLogger(__name__)
25
+
26
+ # Initialize OpenAI client with API key from environment
27
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
28
+
29
+ # Speech Pathologist Agent Prompt
30
+ SPEECH_PATHOLOGIST_PROMPT = """
31
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
32
+ You are working with a student with speech impediments, typically with ASD.
33
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
34
+ Each domain from the CASL-2 framework can be analyzed using the sample:
35
+ Lexical/Semantic Skills:
36
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
37
+ Key Subtests:
38
+ Antonyms: Identifying words with opposite meanings.
39
+ Synonyms: Identifying words with similar meanings.
40
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
41
+ Evaluate vocabulary diversity (type-token ratio).
42
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
43
+ Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
44
+ Syntactic Skills:
45
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
46
+ Key Subtests:
47
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
48
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
49
+ Examine sentence structure for grammatical accuracy.
50
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
51
+ Note the use of clauses, conjunctions, and varied sentence types.
52
+ Supralinguistic Skills:
53
+ This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
54
+ Key Subtests:
55
+ Inferences: Understanding information that is not explicitly stated.
56
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
57
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
58
+ Look for use or understanding of figurative language, idioms, or humor.
59
+ Assess ability to handle ambiguous or implied meanings in context.
60
+ Identify advanced language use for abstract or hypothetical ideas.
61
+ Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
62
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
63
+ Key Subtests:
64
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
65
+
66
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
67
+ """
68
+
69
+ # Custom audio processing for Gradio interface
70
+ class AudioProcessor:
71
+ def __init__(self):
72
+ self.transcript = []
73
+ self.is_active = False
74
+ self.voice_model = "shimmer" # Default voice
75
+ self.notes = []
76
+ self.current_assessment = {
77
+ "lexical_semantic": 0,
78
+ "syntactic": 0,
79
+ "supralinguistic": 0,
80
+ "pragmatic": 0
81
+ }
82
+
83
+ async def process_speech(self, audio_data, openai_client):
84
+ """Process speech using OpenAI's API"""
85
+ if not self.is_active or audio_data is None:
86
+ return None, "\n".join(self.transcript), self.get_assessment_html()
87
+
88
+ # Prepare audio file for OpenAI
89
+ temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
90
+ temp_file.close()
91
+
92
+ try:
93
+ # Save audio data to temporary file
94
+ sample_rate, audio_array = audio_data
95
+ import scipy.io.wavfile
96
+ scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)
97
+
98
+ # Transcribe audio using OpenAI
99
+ with open(temp_file.name, "rb") as audio_file:
100
+ transcript_response = await openai_client.audio.transcriptions.create(
101
+ file=audio_file,
102
+ model="whisper-1"
103
+ )
104
+
105
+ user_text = transcript_response.text
106
+ if user_text.strip():
107
+ self.transcript.append(f"Student: {user_text}")
108
+
109
+ # Analyze speech for CASL-2 categories
110
+ self.analyze_speech(user_text)
111
+
112
+ # Generate assistant response
113
+ chat_response = await openai_client.chat.completions.create(
114
+ model="gpt-4o",
115
+ messages=[
116
+ {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
117
+ {"role": "user", "content": user_text}
118
+ ]
119
+ )
120
+
121
+ assistant_text = chat_response.choices[0].message.content
122
+ self.transcript.append(f"Speech Pathologist: {assistant_text}")
123
+
124
+ # Generate speech from text
125
+ speech_response = await openai_client.audio.speech.create(
126
+ model="tts-1",
127
+ voice=self.voice_model,
128
+ input=assistant_text
129
+ )
130
+
131
+ # Save speech to temporary file
132
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
133
+ response_temp_file.close()
134
+
135
+ speech_response.stream_to_file(response_temp_file.name)
136
+
137
+ # Load audio data for Gradio
138
+ import soundfile as sf
139
+ audio_data, sample_rate = sf.read(response_temp_file.name)
140
+
141
+ # Clean up
142
+ os.unlink(response_temp_file.name)
143
+
144
+ return (sample_rate, audio_data), "\n".join(self.transcript), self.get_assessment_html()
145
+
146
+ except Exception as e:
147
+ logger.error(f"Error processing speech: {e}")
148
+ self.transcript.append(f"Error: {str(e)}")
149
+ finally:
150
+ # Clean up temp file
151
+ os.unlink(temp_file.name)
152
+
153
+ return None, "\n".join(self.transcript), self.get_assessment_html()
154
+
155
+ def analyze_speech(self, text):
156
+ """Analyze speech for CASL-2 categories"""
157
+ # Simple heuristic analysis - in a real app, this would use more sophisticated NLP
158
+
159
+ # Lexical/Semantic: check vocabulary diversity
160
+ words = text.lower().split()
161
+ unique_words = set(words)
162
+ if len(unique_words) / max(1, len(words)) > 0.7:
163
+ self.current_assessment["lexical_semantic"] += 1
164
+ self.notes.append("Lexical/Semantic: Good vocabulary diversity")
165
+
166
+ # Syntactic: check for sentence complexity
167
+ sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
168
+ avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
169
+ if avg_words > 7:
170
+ self.current_assessment["syntactic"] += 1
171
+ self.notes.append("Syntactic: Complex sentence structures used")
172
+
173
+ # Supralinguistic: check for figurative language (very basic check)
174
+ figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
175
+ if any(marker in text.lower() for marker in figurative_markers):
176
+ self.current_assessment["supralinguistic"] += 1
177
+ self.notes.append("Supralinguistic: Potential figurative language detected")
178
+
179
+ # Pragmatic: basic check for conversational elements
180
+ pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
181
+ if any(marker in text.lower() for marker in pragmatic_markers):
182
+ self.current_assessment["pragmatic"] += 1
183
+ self.notes.append("Pragmatic: Appropriate social language detected")
184
+
185
+ def get_assessment_html(self):
186
+ """Get HTML representation of the current assessment"""
187
+ html = """
188
+ <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
189
+ <h3>CASL-2 Assessment Progress</h3>
190
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
191
+ """
192
+
193
+ for category, value in self.current_assessment.items():
194
+ category_name = category.replace('_', ' ').title()
195
+ progress_width = min(100, value * 10)
196
+ html += f"""
197
+ <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
198
+ <div><strong>{category_name}</strong></div>
199
+ <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
200
+ <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
201
+ </div>
202
+ <div style="margin-top: 5px;">{value} points</div>
203
+ </div>
204
+ """
205
+
206
+ html += """
207
+ </div>
208
+ """
209
+
210
+ # Add recent notes
211
+ if self.notes:
212
+ html += """
213
+ <div style="margin-top: 15px;">
214
+ <h4>Recent Observations</h4>
215
+ <ul style="margin-top: 5px;">
216
+ """
217
+
218
+ for note in self.notes[-5:]:
219
+ html += f"<li>{note}</li>"
220
+
221
+ html += """
222
+ </ul>
223
+ </div>
224
+ """
225
+
226
+ html += "</div>"
227
+ return html
228
+
229
+ def start_session(self, voice_model, student_id):
230
+ """Start a new session"""
231
+ self.is_active = True
232
+ self.voice_model = voice_model if voice_model else "shimmer"
233
+ self.transcript = []
234
+ self.notes = []
235
+ self.current_assessment = {
236
+ "lexical_semantic": 0,
237
+ "syntactic": 0,
238
+ "supralinguistic": 0,
239
+ "pragmatic": 0
240
+ }
241
+
242
+ student_info = f" for {student_id}" if student_id else ""
243
+ self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
244
+ return "Session active. Please wait for the AI to introduce itself."
245
+
246
+ def stop_session(self):
247
+ """Stop the current session"""
248
+ self.is_active = False
249
+ self.transcript.append("Session ended.")
250
+ return "Session stopped."
251
+
252
+ def add_note(self, note):
253
+ """Add a custom note"""
254
+ if note.strip():
255
+ self.notes.append(note)
256
+ return f"Note added: {note}"
257
+ return "Note was empty, not added."
258
+
259
+
260
+ # Create the audio processor instance
261
+ audio_processor = AudioProcessor()
262
+
263
+
264
+ async def start_session(voice_model, student_id):
265
+ """Start the speech pathology session"""
266
+ status = audio_processor.start_session(voice_model, student_id)
267
+
268
+ # Generate initial AI introduction
269
+ try:
270
+ # Generate assistant response
271
+ chat_response = await openai_client.chat.completions.create(
272
+ model="gpt-4o",
273
+ messages=[
274
+ {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
275
+ {"role": "user", "content": "Hello"} # Initial trigger
276
+ ]
277
+ )
278
+
279
+ assistant_text = chat_response.choices[0].message.content
280
+ audio_processor.transcript.append(f"Speech Pathologist: {assistant_text}")
281
+
282
+ # Generate speech from text
283
+ speech_response = await openai_client.audio.speech.create(
284
+ model="tts-1",
285
+ voice=audio_processor.voice_model,
286
+ input=assistant_text
287
+ )
288
+
289
+ # Save speech to temporary file
290
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
291
+ response_temp_file.close()
292
+
293
+ speech_response.stream_to_file(response_temp_file.name)
294
+
295
+ # Load audio data for Gradio
296
+ import soundfile as sf
297
+ audio_data, sample_rate = sf.read(response_temp_file.name)
298
+
299
+ # Clean up
300
+ os.unlink(response_temp_file.name)
301
+
302
+ return status, (sample_rate, audio_data), "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
303
+
304
+ except Exception as e:
305
+ logger.error(f"Error starting session: {e}")
306
+ audio_processor.transcript.append(f"Error: {str(e)}")
307
+ return status, None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
308
+
309
+
310
+ def stop_session():
311
+ """Stop the speech pathology session"""
312
+ return audio_processor.stop_session(), None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
313
+
314
+
315
+ async def process_mic_input(audio, progress=gr.Progress()):
316
+ """Process microphone input"""
317
+ if audio is None or not audio_processor.is_active:
318
+ return None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
319
+
320
+ progress(0, desc="Processing speech...")
321
+ audio_output, transcript, assessment = await audio_processor.process_speech(audio, openai_client)
322
+ progress(1, desc="Done")
323
+ return audio_output, transcript, assessment
324
+
325
+
326
+ def add_note(note):
327
+ """Add a note to the session"""
328
+ result = audio_processor.add_note(note)
329
+ return "", result, audio_processor.get_assessment_html()
330
+
331
+
332
+ def save_session(student_id):
333
+ """Save session to file"""
334
+ if not student_id:
335
+ student_id = "anonymous"
336
+
337
+ # Create session data directory if it doesn't exist
338
+ os.makedirs("session_data", exist_ok=True)
339
+
340
+ # Save transcript
341
+ timestamp = time.strftime("%Y%m%d-%H%M%S")
342
+ filename = f"session_data/{student_id}_{timestamp}.txt"
343
+
344
+ with open(filename, "w") as f:
345
+ f.write("\n".join(audio_processor.transcript))
346
+ f.write("\n\n--- ASSESSMENT NOTES ---\n")
347
+ for note in audio_processor.notes:
348
+ f.write(f"- {note}\n")
349
+ f.write("\n--- CASL-2 SCORES ---\n")
350
+ for category, score in audio_processor.current_assessment.items():
351
+ f.write(f"{category.replace('_', ' ').title()}: {score}\n")
352
+
353
+ return f"Session saved to {filename}"
354
+
355
+
356
+ # Create Gradio Interface
357
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
358
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
359
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
360
+
361
+ with gr.Row():
362
+ with gr.Column(scale=1):
363
+ student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
364
+ voice_select = gr.Dropdown(
365
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
366
+ value="shimmer",
367
+ label="Assistant Voice"
368
+ )
369
+ start_button = gr.Button("Start Session", variant="primary")
370
+ stop_button = gr.Button("Stop Session", variant="stop")
371
+ status = gr.Textbox(label="Status", value="Ready to start")
372
+
373
+ with gr.Accordion("SLP Tools", open=True):
374
+ note_input = gr.Textbox(
375
+ label="Add Assessment Note",
376
+ placeholder="Enter observation or assessment note here..."
377
+ )
378
+ note_button = gr.Button("Add Note")
379
+ note_status = gr.Textbox(label="Note Status")
380
+ save_button = gr.Button("Save Session")
381
+ save_status = gr.Textbox(label="Save Status")
382
+
383
+ with gr.Column(scale=2):
384
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
385
+ audio_input = gr.Audio(
386
+ label="Speak to the AI",
387
+ type="numpy",
388
+ sources=["microphone"],
389
+ streaming=True
390
+ )
391
+
392
+ with gr.Row():
393
+ with gr.Column(scale=1):
394
+ assessment_html = gr.HTML(label="Assessment Progress")
395
+ with gr.Column(scale=1):
396
+ transcript = gr.Textbox(label="Transcript", lines=10)
397
+
398
+ with gr.Accordion("About This Application", open=False):
399
+ gr.Markdown("""
400
+ ### About CASL-2 Speech Pathology Assistant
401
+
402
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
403
+
404
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
405
+ - **Syntactic Skills**: Grammar and sentence structure
406
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
407
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
408
+
409
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
410
+
411
+ ### How to Use
412
+
413
+ 1. Optionally enter a Student ID to track sessions
414
+ 2. Select the AI voice you prefer
415
+ 3. Click "Start Session" to begin
416
+ 4. The AI will introduce itself and begin the assessment
417
+ 5. Speak into your microphone when it's your turn
418
+ 6. View the transcript to track the conversation
419
+ 7. SLPs can add assessment notes as needed
420
+ 8. Save the session when finished
421
+ 9. Click "Stop Session" when done
422
+
423
+ ### For Speech-Language Pathologists
424
+
425
+ This tool is designed to supplement, not replace, professional SLP services. SLPs can:
426
+
427
+ - Add custom notes during the session
428
+ - Save session data for later reference
429
+ - Track progress across multiple sessions
430
+ - Use the AI as a consistent assessment tool
431
+ """)
432
+
433
+ # Setup event handlers
434
+ start_button.click(
435
+ fn=lambda voice, student: asyncio.run(start_session(voice, student)),
436
+ inputs=[voice_select, student_id],
437
+ outputs=[status, audio_output, transcript, assessment_html]
438
+ )
439
+ stop_button.click(
440
+ fn=stop_session,
441
+ outputs=[status, audio_output, transcript, assessment_html]
442
+ )
443
+ note_button.click(
444
+ fn=add_note,
445
+ inputs=note_input,
446
+ outputs=[note_input, note_status, assessment_html]
447
+ )
448
+ save_button.click(
449
+ fn=save_session,
450
+ inputs=student_id,
451
+ outputs=save_status
452
+ )
453
+
454
+ # Setup audio processing
455
+ audio_input.stream(
456
+ fn=lambda audio: asyncio.run(process_mic_input(audio)),
457
+ inputs=audio_input,
458
+ outputs=[audio_output, transcript, assessment_html]
459
+ )
460
+
461
+
462
+ def main(share=True):
463
+ """Main function to launch the app"""
464
+ app.launch(share=share)
465
+
466
+
467
+ # Entry point for the application
468
+ if __name__ == "__main__":
469
+ main()
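Review note on `analyze_speech` in `app_ui.py`: the marker checks use substring tests (`marker in text.lower()`), so `"as"` also fires on "was" and `"hi"` on "this", which inflates the supralinguistic and pragmatic counters. A possible alternative, sketched below, matches whole words and phrases only. The function name `contains_marker` is hypothetical and not part of this commit.

```python
import re


def contains_marker(text, markers):
    """Return True if any marker occurs in text as a whole word or phrase.

    Hypothetical helper for illustration; not part of the committed code.
    """
    lowered = text.lower()
    for marker in markers:
        # \b anchors the match at word boundaries, so "as" does not match "was".
        pattern = r"\b" + re.escape(marker.lower()) + r"\b"
        if re.search(pattern, lowered):
            return True
    return False
```

This remains a rough heuristic: a filler "like" still counts as a figurative marker, so the scores should be read as hints for the SLP rather than as measurements.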
huggingface_requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ python-dotenv>=1.0.0
2
+ openai>=1.3.0
3
+ gradio>=4.0.0
4
+ soundfile>=0.12.1
5
+ scipy>=1.10.0
6
+ asyncio>=3.4.3
7
+ numpy>=1.24.0
implementations/common/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ """
2
+ Common utilities for CASL Voice Bot
3
+ """
implementations/common/casl_utils.py ADDED
@@ -0,0 +1,167 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ Common utilities for CASL Voice Bot implementations
5
+ """
6
+
7
+ import os
8
+ import time
9
+
10
+ class CASLAssessment:
11
+ """CASL-2 assessment tracker"""
12
+
13
+ def __init__(self):
14
+ """Initialize the assessment tracker"""
15
+ self.notes = []
16
+ self.current_assessment = {
17
+ "lexical_semantic": 0,
18
+ "syntactic": 0,
19
+ "supralinguistic": 0,
20
+ "pragmatic": 0
21
+ }
22
+
23
+ def analyze_speech(self, text):
24
+ """Analyze speech for CASL-2 categories"""
25
+ # Simple heuristic analysis - in a real app, this would use more sophisticated NLP
26
+
27
+ # Lexical/Semantic: check vocabulary diversity
28
+ words = text.lower().split()
29
+ unique_words = set(words)
30
+ if len(unique_words) / max(1, len(words)) > 0.7:
31
+ self.current_assessment["lexical_semantic"] += 1
32
+ self.notes.append("Lexical/Semantic: Good vocabulary diversity")
33
+
34
+ # Syntactic: check for sentence complexity
35
+ sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
36
+ avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
37
+ if avg_words > 7:
38
+ self.current_assessment["syntactic"] += 1
39
+ self.notes.append("Syntactic: Complex sentence structures used")
40
+
41
+ # Supralinguistic: check for figurative language (very basic check)
42
+ figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
43
+ if any(marker in text.lower() for marker in figurative_markers):
44
+ self.current_assessment["supralinguistic"] += 1
45
+ self.notes.append("Supralinguistic: Potential figurative language detected")
46
+
47
+ # Pragmatic: basic check for conversational elements
48
+ pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
49
+ if any(marker in text.lower() for marker in pragmatic_markers):
50
+ self.current_assessment["pragmatic"] += 1
51
+ self.notes.append("Pragmatic: Appropriate social language detected")
52
+
53
+ def get_assessment_html(self):
54
+ """Get HTML representation of the current assessment"""
55
+ html = """
56
+ <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
57
+ <h3>CASL-2 Assessment Progress</h3>
58
+ <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
59
+ """
60
+
61
+ for category, value in self.current_assessment.items():
62
+ category_name = category.replace('_', ' ').title()
63
+ progress_width = min(100, value * 10)
64
+ html += f"""
65
+ <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
66
+ <div><strong>{category_name}</strong></div>
67
+ <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
68
+ <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
69
+ </div>
70
+ <div style="margin-top: 5px;">{value} points</div>
71
+ </div>
72
+ """
73
+
74
+ html += """
75
+ </div>
76
+ """
77
+
78
+ # Add recent notes
79
+ if self.notes:
80
+ html += """
81
+ <div style="margin-top: 15px;">
82
+ <h4>Recent Observations</h4>
83
+ <ul style="margin-top: 5px;">
84
+ """
85
+
86
+ for note in self.notes[-5:]:
87
+ html += f"<li>{note}</li>"
88
+
89
+ html += """
90
+ </ul>
91
+ </div>
92
+ """
93
+
94
+ html += "</div>"
95
+ return html
96
+
97
+ def add_note(self, note):
98
+ """Add a custom note"""
99
+ if note.strip():
100
+ self.notes.append(note)
101
+ return f"Note added: {note}"
102
+ return "Note was empty, not added."
103
+
104
+
105
+ def save_session_data(transcript, assessment, student_id=None):
106
+ """Save session data to a file"""
107
+ if not student_id:
108
+ student_id = "anonymous"
109
+
110
+ # Create session data directory if it doesn't exist
111
+ os.makedirs("session_data", exist_ok=True)
112
+
113
+ # Save transcript
114
+ timestamp = time.strftime("%Y%m%d-%H%M%S")
115
+ filename = f"session_data/{student_id}_{timestamp}.txt"
116
+
117
+ with open(filename, "w") as f:
118
+ f.write("\n".join(transcript))
119
+ f.write("\n\n--- ASSESSMENT NOTES ---\n")
120
+ for note in assessment.notes:
121
+ f.write(f"- {note}\n")
122
+ f.write("\n--- CASL-2 SCORES ---\n")
123
+ for category, score in assessment.current_assessment.items():
124
+ f.write(f"{category.replace('_', ' ').title()}: {score}\n")
125
+
126
+ return f"Session saved to {filename}"
127
+
128
+
129
+ # Common prompt used by all implementations
130
+ CASL_PROMPT = """
131
+ You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
132
+ You are working with a student with speech impediments, typically with ASD.
133
+ You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
134
+ Each domain from the CASL-2 framework can be analyzed using the sample:
135
+ Lexical/Semantic Skills:
136
+ This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
137
+ Key Subtests:
138
+ Antonyms: Identifying words with opposite meanings.
139
+ Synonyms: Identifying words with similar meanings.
140
+ Idiomatic Language: Understanding and interpreting idioms and figurative language.
141
+ Evaluate vocabulary diversity (type-token ratio).
142
+ Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
143
+ Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
144
+ Syntactic Skills:
145
+ This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
146
+ Key Subtests:
147
+ Sentence Expression: Producing grammatically correct sentences based on prompts.
148
+ Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
149
+ Examine sentence structure for grammatical accuracy.
150
+ Identify errors in verb tense, subject-verb agreement, or sentence complexity.
151
+ Note the use of clauses, conjunctions, and varied sentence types.
152
+ Supralinguistic Skills:
153
+ This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
154
+ Key Subtests:
155
+ Inferences: Understanding information that is not explicitly stated.
156
+ Meaning from Context: Deriving meaning from surrounding text or dialogue.
157
+ Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
158
+ Look for use or understanding of figurative language, idioms, or humor.
159
+ Assess ability to handle ambiguous or implied meanings in context.
160
+ Identify advanced language use for abstract or hypothetical ideas.
161
+ Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
162
+ This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
163
+ Key Subtests:
164
+ Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.
165
+
166
+ Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
167
+ """
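The prompt above asks the model to evaluate vocabulary diversity via the type-token ratio. As a rough illustration of that metric only, here is a minimal sketch that computes it from a transcript string. The function name `type_token_ratio` and the tokenization rule are assumptions made for this example; nothing in the commit shows that `CASLAssessment.analyze_speech` works this way.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique words / total words for a transcript (0.0 if empty)."""
    # Lowercase the text and treat alphabetic runs (optionally with one
    # internal apostrophe, e.g. "don't") as tokens. This is a simplification.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the dog saw the cat and the cat saw the dog"
print(round(type_token_ratio(sample), 2))
```

A ratio near 1.0 means almost every word is distinct, while a low ratio points to heavy repetition. The value depends strongly on sample length, so it is best compared across samples of similar size.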
implementations/direct/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ """
2
+ Direct OpenAI API implementation of CASL Voice Bot
3
+ """
implementations/direct/app.py ADDED
@@ -0,0 +1,334 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ CASL Voice Bot - Speech Pathology Assistant
5
+ Direct OpenAI API implementation (no LiveKit)
6
+ """
7
+
8
+ import os
9
+ import asyncio
10
+ import gradio as gr
11
+ import logging
12
+ import sys
13
+ import tempfile
14
+ import time
15
+ from dotenv import load_dotenv
16
+ from openai import AsyncOpenAI
17
+
18
+ # Add parent directory to path to import common utilities
19
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
20
+ from implementations.common.casl_utils import CASLAssessment, save_session_data, CASL_PROMPT
21
+
22
+ # Load environment variables
23
+ load_dotenv()
24
+
25
+ # Set up logging
26
+ logging.basicConfig(level=logging.INFO)
27
+ logger = logging.getLogger(__name__)
28
+
29
+ # Initialize OpenAI client
30
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
31
+
32
+
33
+ class SpeechPathologistAssistant:
34
+ """Speech pathologist assistant using direct OpenAI API"""
35
+
36
+ def __init__(self):
37
+ self.transcript = []
38
+ self.is_running = False
39
+ self.assessment = CASLAssessment()
40
+ self.voice_model = "shimmer"
41
+ self.student_id = None
42
+
43
+ async def start_session(self, voice_model, student_id):
44
+ """Start a new session"""
45
+ self.is_running = True
46
+ self.voice_model = voice_model if voice_model else "shimmer"
47
+ self.student_id = student_id
48
+ self.transcript = []
49
+ self.assessment = CASLAssessment()
50
+
51
+ # Add student info to transcript
52
+ student_info = f" for {student_id}" if student_id else ""
53
+ self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
54
+
55
+ # Generate initial AI message
56
+ initial_audio = await self.generate_initial_message()
57
+
58
+ return "Session active. The AI will introduce itself.", initial_audio, self.get_transcript(), self.assessment.get_assessment_html()
59
+
60
+ async def generate_initial_message(self):
61
+ """Generate initial AI message"""
62
+ # Generate assistant response
63
+ chat_response = await openai_client.chat.completions.create(
64
+ model="gpt-4o",
65
+ messages=[
66
+ {"role": "system", "content": CASL_PROMPT},
67
+ {"role": "user", "content": "Hello"} # Initial trigger
68
+ ]
69
+ )
70
+
71
+ assistant_text = chat_response.choices[0].message.content
72
+ self.transcript.append(f"Speech Pathologist: {assistant_text}")
73
+
74
+ # Generate speech from text
75
+ speech_response = await openai_client.audio.speech.create(
76
+ model="tts-1",
77
+ voice=self.voice_model,
78
+ input=assistant_text
79
+ )
80
+
81
+ # Save speech to temporary file
82
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
83
+ response_temp_file.close()
84
+
85
+ speech_response.stream_to_file(response_temp_file.name)
86
+
87
+ # Load audio data for Gradio
88
+ import soundfile as sf
89
+ audio_data, sample_rate = sf.read(response_temp_file.name)
90
+
91
+ # Clean up
92
+ os.unlink(response_temp_file.name)
93
+
94
+ return (sample_rate, audio_data)
95
+
96
+ def stop_session(self):
97
+ """Stop the current session"""
98
+ self.is_running = False
99
+ self.transcript.append("Session ended.")
100
+ return "Session stopped.", None, self.get_transcript(), self.assessment.get_assessment_html()
101
+
102
+ async def process_audio(self, audio):
103
+ """Process audio from Gradio interface"""
104
+ if not self.is_running or audio is None:
105
+ return None, self.get_transcript(), self.assessment.get_assessment_html()
106
+
107
+ # Prepare audio file for OpenAI
108
+ temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
109
+ temp_file.close()
110
+
111
+ try:
112
+ # Save audio data to temporary file
113
+ sample_rate, audio_array = audio
114
+ import scipy.io.wavfile
115
+ scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)
116
+
117
+ # Transcribe audio using OpenAI
118
+ with open(temp_file.name, "rb") as audio_file:
119
+ transcript_response = await openai_client.audio.transcriptions.create(
120
+ file=audio_file,
121
+ model="whisper-1"
122
+ )
123
+
124
+ user_text = transcript_response.text
125
+ if user_text.strip():
126
+ self.transcript.append(f"Student: {user_text}")
127
+
128
+ # Analyze speech for CASL-2 categories
129
+ self.assessment.analyze_speech(user_text)
130
+
131
+ # Generate assistant response
132
+ chat_response = await openai_client.chat.completions.create(
133
+ model="gpt-4o",
134
+ messages=[
135
+ {"role": "system", "content": CASL_PROMPT},
136
+ {"role": "user", "content": user_text}
137
+ ]
138
+ )
139
+
140
+ assistant_text = chat_response.choices[0].message.content
141
+ self.transcript.append(f"Speech Pathologist: {assistant_text}")
142
+
143
+ # Generate speech from text
144
+ speech_response = await openai_client.audio.speech.create(
145
+ model="tts-1",
146
+ voice=self.voice_model,
147
+ input=assistant_text
148
+ )
149
+
150
+ # Save speech to temporary file
151
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
152
+ response_temp_file.close()
153
+
154
+ speech_response.stream_to_file(response_temp_file.name)
155
+
156
+ # Load audio data for Gradio
157
+ import soundfile as sf
158
+ audio_data, sample_rate = sf.read(response_temp_file.name)
159
+
160
+ # Clean up
161
+ os.unlink(response_temp_file.name)
162
+
163
+ return (sample_rate, audio_data), self.get_transcript(), self.assessment.get_assessment_html()
164
+
165
+ except Exception as e:
166
+ logger.error(f"Error processing audio: {e}")
167
+ self.transcript.append(f"Error: {str(e)}")
168
+ finally:
169
+ # Clean up temp file
170
+ os.unlink(temp_file.name)
171
+
172
+ return None, self.get_transcript(), self.assessment.get_assessment_html()
173
+
174
+ def get_transcript(self):
175
+ """Get the current transcript"""
176
+ return "\n".join(self.transcript)
177
+
178
+ def add_note(self, note):
179
+ """Add a custom note"""
180
+ result = self.assessment.add_note(note)
181
+ return "", result, self.assessment.get_assessment_html()
182
+
183
+ def save_session(self, student_id=None):
184
+ """Save session to file"""
185
+ student_id = student_id or self.student_id
186
+ return save_session_data(self.transcript, self.assessment, student_id)
187
+
188
+
189
+ # Create the speech pathology assistant
190
+ speech_assistant = SpeechPathologistAssistant()
191
+
192
+
193
+ async def start_session(voice_model, student_id):
194
+ """Start the speech pathology session"""
195
+ return await speech_assistant.start_session(voice_model, student_id)
196
+
197
+
198
+ def stop_session():
199
+ """Stop the speech pathology session"""
200
+ return speech_assistant.stop_session()
201
+
202
+
203
+ async def process_mic_input(audio, progress=gr.Progress()):
204
+ """Process microphone input"""
205
+ progress(0, desc="Processing speech...")
206
+ audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
207
+ progress(1, desc="Done")
208
+ return audio_output, transcript, assessment
209
+
210
+
211
+ def add_note(note):
212
+ """Add a note to the session"""
213
+ return speech_assistant.add_note(note)
214
+
215
+
216
+ def save_session(student_id):
217
+ """Save the current session"""
218
+ return speech_assistant.save_session(student_id)
219
+
220
+
221
+ # Create Gradio Interface
222
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
223
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
224
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
225
+
226
+ with gr.Row():
227
+ with gr.Column(scale=1):
228
+ student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
229
+ voice_select = gr.Dropdown(
230
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
231
+ value="shimmer",
232
+ label="Assistant Voice"
233
+ )
234
+ start_button = gr.Button("Start Session", variant="primary")
235
+ stop_button = gr.Button("Stop Session", variant="stop")
236
+ status = gr.Textbox(label="Status", value="Ready to start")
237
+
238
+ with gr.Accordion("SLP Tools", open=True):
239
+ note_input = gr.Textbox(
240
+ label="Add Assessment Note",
241
+ placeholder="Enter observation or assessment note here..."
242
+ )
243
+ note_button = gr.Button("Add Note")
244
+ note_status = gr.Textbox(label="Note Status")
245
+ save_button = gr.Button("Save Session")
246
+ save_status = gr.Textbox(label="Save Status")
247
+
248
+ with gr.Column(scale=2):
249
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
250
+ audio_input = gr.Audio(
251
+ label="Speak to the AI",
252
+ type="numpy",
253
+ sources=["microphone"],
254
+ streaming=True
255
+ )
256
+
257
+ with gr.Row():
258
+ with gr.Column(scale=1):
259
+ assessment_html = gr.HTML(label="Assessment Progress")
260
+ with gr.Column(scale=1):
261
+ transcript = gr.Textbox(label="Transcript", lines=10)
262
+
263
+ with gr.Accordion("About This Application", open=False):
264
+ gr.Markdown("""
265
+ ### About CASL-2 Speech Pathology Assistant
266
+
267
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
268
+
269
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
270
+ - **Syntactic Skills**: Grammar and sentence structure
271
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
272
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
273
+
274
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
275
+
276
+ ### How to Use
277
+
278
+ 1. Optionally enter a Student ID to track sessions
279
+ 2. Select the AI voice you prefer
280
+ 3. Click "Start Session" to begin
281
+ 4. The AI will introduce itself and begin the assessment
282
+ 5. Speak into your microphone when it's your turn
283
+ 6. View the transcript to track the conversation
284
+ 7. SLPs can add notes throughout the session
285
+ 8. Save the session when finished
286
+ 9. Click "Stop Session" when done
287
+
288
+ ### For Speech-Language Pathologists
289
+
290
+ This tool is designed to supplement, not replace, professional SLP services. SLPs can:
291
+
292
+ - Add custom notes during the session
293
+ - Save session data for later reference
294
+ - Track progress across multiple sessions
295
+ - Use the AI as a consistent assessment tool
296
+ """)
297
+
298
+ # Setup event handlers
299
+ start_button.click(
300
+ fn=start_session, # Gradio awaits async handlers itself; wrapping in asyncio.run creates a new event loop per click
301
+ inputs=[voice_select, student_id],
302
+ outputs=[status, audio_output, transcript, assessment_html]
303
+ )
304
+ stop_button.click(
305
+ fn=stop_session,
306
+ outputs=[status, audio_output, transcript, assessment_html]
307
+ )
308
+ note_button.click(
309
+ fn=add_note,
310
+ inputs=note_input,
311
+ outputs=[note_input, note_status, assessment_html]
312
+ )
313
+ save_button.click(
314
+ fn=save_session,
315
+ inputs=student_id,
316
+ outputs=save_status
317
+ )
318
+
319
+ # Setup audio processing
320
+ audio_input.stream(
321
+ fn=process_mic_input, # async handler passed directly so gr.Progress is injected and one event loop is reused
322
+ inputs=audio_input,
323
+ outputs=[audio_output, transcript, assessment_html]
324
+ )
325
+
326
+
327
+ def main(share=True):
328
+ """Main function to launch the app"""
329
+ app.launch(share=share)
330
+
331
+
332
+ # Entry point for the application
333
+ if __name__ == "__main__":
334
+ main()
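Note that `process_audio` in the file above builds each chat request from only the system prompt and the latest student utterance, so the model sees no earlier turns of the session. One possible way to carry context is a small bounded history buffer, sketched below. The class name `ConversationHistory` and the `max_turns` parameter are hypothetical and not part of this commit; the message dictionaries follow the same `role`/`content` shape the file already passes to `chat.completions.create`.

```python
from collections import deque


class ConversationHistory:
    """Keep the system prompt plus the most recent chat messages."""

    def __init__(self, system_prompt: str, max_turns: int = 20):
        self.system_prompt = system_prompt
        # One turn is a user message plus an assistant message, so the
        # buffer holds 2 * max_turns entries; the oldest drop off first.
        self._messages = deque(maxlen=max_turns * 2)

    def add(self, role: str, content: str) -> None:
        """Record one message, e.g. role="user" or role="assistant"."""
        self._messages.append({"role": role, "content": content})

    def as_messages(self) -> list:
        """Return the list to pass as `messages`, system prompt first."""
        return [{"role": "system", "content": self.system_prompt}, *self._messages]


history = ConversationHistory("You are a speech pathologist.", max_turns=1)
history.add("user", "Hello")
history.add("assistant", "Hi! Let's start with some words.")
history.add("user", "Okay")
print(len(history.as_messages()))
```

With a buffer like this, the request could pass `history.as_messages()` in place of the two-element list. Bounding the buffer keeps request size predictable during long sessions, at the cost of forgetting the earliest exchanges.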
implementations/livekit/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ """
2
+ LiveKit implementation of CASL Voice Bot
3
+ """
implementations/livekit/app.py ADDED
@@ -0,0 +1,329 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ CASL Voice Bot - Speech Pathology Assistant
5
+ Using LiveKit agents with OpenAI's real-time capabilities
6
+ """
7
+
8
+ import os
9
+ import asyncio
10
+ import gradio as gr
11
+ import logging
12
+ import sys
13
+ import time
14
+ from dotenv import load_dotenv
15
+ from livekit import agents
16
+ from openai import AsyncOpenAI
17
+
18
+ # Add parent directory to path to import common utilities
19
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
20
+ from implementations.common.casl_utils import CASLAssessment, save_session_data, CASL_PROMPT
21
+
22
+ # Load environment variables
23
+ load_dotenv()
24
+
25
+ # Set up logging
26
+ logging.basicConfig(level=logging.INFO)
27
+ logger = logging.getLogger(__name__)
28
+
29
+ # Initialize OpenAI client
30
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
31
+
32
+ class GradioInputDevice(agents.InputDevice):
33
+ """Custom input device that works with Gradio"""
34
+
35
+ def __init__(self):
36
+ super().__init__()
37
+ self.audio_queue = asyncio.Queue()
38
+ self.is_active = True
39
+
40
+ async def receive(self) -> agents.AudioChunk:
41
+ """Receive audio data from the queue"""
42
+ try:
43
+ audio_data = await asyncio.wait_for(self.audio_queue.get(), timeout=0.1)
44
+ return audio_data
45
+ except asyncio.TimeoutError:
46
+ return None
47
+
48
+ async def add_audio(self, audio_data):
49
+ """Add audio data to the queue"""
50
+ if audio_data is None:
51
+ return
52
+
53
+ # Convert gradio audio format to AudioChunk
54
+ sample_rate, audio_array = audio_data
55
+ audio_chunk = agents.AudioChunk(
56
+ samples=audio_array,
57
+ sample_rate=sample_rate,
58
+ is_last=False
59
+ )
60
+ await self.audio_queue.put(audio_chunk)
61
+
62
+ def stop(self):
63
+ """Stop the input device"""
64
+ self.is_active = False
65
+
66
+
67
+ class GradioOutputDevice(agents.OutputDevice):
68
+ """Custom output device that works with Gradio"""
69
+
70
+ def __init__(self):
71
+ super().__init__()
72
+ self.output_queue = asyncio.Queue()
73
+
74
+ async def transmit(self, audio_chunk: agents.AudioChunk) -> None:
75
+ """Transmit audio chunk to the queue"""
76
+ if audio_chunk is not None:
77
+ await self.output_queue.put((audio_chunk.samples, audio_chunk.sample_rate))
78
+
79
+ async def get_latest_audio(self):
80
+ """Get the latest audio from the queue"""
81
+ try:
82
+ return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
83
+ except asyncio.TimeoutError:
84
+ return None
85
+
86
+
87
+ class SpeechPathologistAssistant:
88
+ """Speech pathologist assistant using LiveKit agents"""
89
+
90
+ def __init__(self):
91
+ self.input_device = GradioInputDevice()
92
+ self.output_device = GradioOutputDevice()
93
+ self.assistant = None
94
+ self.assistant_task = None
95
+ self.transcript = []
96
+ self.is_running = False
97
+ self.assessment = CASLAssessment()
98
+
99
+ async def initialize_assistant(self, voice="shimmer"):
100
+ """Initialize the speech assistant"""
101
+ self.assistant = agents.VoiceAssistant(
102
+ openai_client=openai_client,
103
+ model="gpt-4o",
104
+ voice=voice,
105
+ input_device=self.input_device,
106
+ output_device=self.output_device,
107
+ initial_message=CASL_PROMPT,
108
+ real_time=True, # Enable real-time processing
109
+ )
110
+
111
+ # Add transcript and response callbacks
112
+ self.assistant.on_transcript = self.on_transcript
113
+ self.assistant.on_response = self.on_response
114
+
115
+ def on_transcript(self, transcript):
116
+ """Handle transcript from user"""
117
+ self.transcript.append(f"Student: {transcript.text}")
118
+
119
+ # Basic analysis of speech for CASL-2 categories
120
+ self.assessment.analyze_speech(transcript.text)
121
+
122
+ return True
123
+
124
+ def on_response(self, response):
125
+ """Handle response from assistant"""
126
+ self.transcript.append(f"Speech Pathologist: {response.text}")
127
+ return True
128
+
129
+ async def start_assistant(self, voice_model, student_id):
130
+ """Start the assistant in a background task"""
131
+ await self.initialize_assistant(voice_model)
132
+
133
+ self.is_running = True
134
+
135
+ # Add student info to transcript
136
+ student_info = f" for {student_id}" if student_id else ""
137
+ self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
138
+
139
+ # Run the assistant in a background task
140
+ self.assistant_task = asyncio.create_task(self.assistant.run())
141
+
142
+ return "Session active. The AI will introduce itself."
143
+
144
+ def stop_assistant(self):
145
+ """Stop the assistant"""
146
+ if self.assistant_task and not self.assistant_task.done():
147
+ self.assistant_task.cancel()
148
+
149
+ self.input_device.stop()
150
+ self.is_running = False
151
+
152
+ # Add ending to transcript
153
+ self.transcript.append("Session ended.")
154
+ return "Session stopped."
155
+
156
+ async def process_audio(self, audio):
157
+ """Process audio from Gradio interface"""
158
+ if not self.is_running or audio is None:
159
+ return None, self.get_transcript(), self.assessment.get_assessment_html()
160
+
161
+ # Add audio to input device
162
+ await self.input_device.add_audio(audio)
163
+
164
+ # Check for assistant output
165
+ output_audio = await self.output_device.get_latest_audio()
166
+
167
+ return output_audio, self.get_transcript(), self.assessment.get_assessment_html()
168
+
169
+ def get_transcript(self):
170
+ """Get the current transcript"""
171
+ return "\n".join(self.transcript)
172
+
173
+ def add_note(self, note):
174
+ """Add a custom note"""
175
+ result = self.assessment.add_note(note)
176
+ return "", result, self.assessment.get_assessment_html()
177
+
178
+ def save_session(self, student_id):
179
+ """Save session to file"""
180
+ return save_session_data(self.transcript, self.assessment, student_id)
181
+
182
+
183
+ # Create the speech pathology assistant
184
+ speech_assistant = SpeechPathologistAssistant()
185
+
186
+
187
+ async def start_session(voice_model, student_id):
188
+ """Start the speech pathology session"""
189
+ status = await speech_assistant.start_assistant(voice_model, student_id)
190
+ return status, None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()
191
+
192
+
193
+ def stop_session():
194
+ """Stop the speech pathology session"""
195
+ return speech_assistant.stop_assistant(), None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()
196
+
197
+
198
+ async def process_mic_input(audio, progress=gr.Progress()):
199
+ """Process microphone input"""
200
+ progress(0, desc="Processing speech...")
201
+ audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
202
+ progress(1, desc="Done")
203
+ return audio_output, transcript, assessment
204
+
205
+
206
+ def add_note(note):
207
+ """Add a note to the session"""
208
+ return speech_assistant.add_note(note)
209
+
210
+
211
+ def save_session(student_id):
212
+ """Save the current session"""
213
+ return speech_assistant.save_session(student_id)
214
+
215
+
216
+ # Create Gradio Interface
217
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
218
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
219
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
220
+
221
+ with gr.Row():
222
+ with gr.Column(scale=1):
223
+ student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
224
+ voice_select = gr.Dropdown(
225
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
226
+ value="shimmer",
227
+ label="Assistant Voice"
228
+ )
229
+ start_button = gr.Button("Start Session", variant="primary")
230
+ stop_button = gr.Button("Stop Session", variant="stop")
231
+ status = gr.Textbox(label="Status", value="Ready to start")
232
+
233
+ with gr.Accordion("SLP Tools", open=True):
234
+ note_input = gr.Textbox(
235
+ label="Add Assessment Note",
236
+ placeholder="Enter observation or assessment note here..."
237
+ )
238
+ note_button = gr.Button("Add Note")
239
+ note_status = gr.Textbox(label="Note Status")
240
+ save_button = gr.Button("Save Session")
241
+ save_status = gr.Textbox(label="Save Status")
242
+
243
+ with gr.Column(scale=2):
244
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
245
+ audio_input = gr.Audio(
246
+ label="Speak to the AI",
247
+ type="numpy",
248
+ sources=["microphone"],
249
+ streaming=True
250
+ )
251
+
252
+ with gr.Row():
253
+ with gr.Column(scale=1):
254
+ assessment_html = gr.HTML(label="Assessment Progress")
255
+ with gr.Column(scale=1):
256
+ transcript = gr.Textbox(label="Transcript", lines=10)
257
+
258
+ with gr.Accordion("About This Application", open=False):
259
+ gr.Markdown("""
260
+ ### About CASL-2 Speech Pathology Assistant
261
+
262
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
263
+
264
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
265
+ - **Syntactic Skills**: Grammar and sentence structure
266
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
267
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
268
+
269
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
270
+
271
+ ### How to Use
272
+
273
+ 1. Optionally enter a Student ID to track sessions
274
+ 2. Select the AI voice you prefer
275
+ 3. Click "Start Session" to begin
276
+ 4. The AI will introduce itself and begin the assessment
277
+ 5. Speak into your microphone when it's your turn
278
+ 6. View the transcript to track the conversation
279
+ 7. SLPs can add notes throughout the session
280
+ 8. Save the session when finished
281
+ 9. Click "Stop Session" when done
282
+
283
+ ### For Speech-Language Pathologists
284
+
285
+ This tool is designed to supplement, not replace, professional SLP services. SLPs can:
286
+
287
+ - Add custom notes during the session
288
+ - Save session data for later reference
289
+ - Track progress across multiple sessions
290
+ - Use the AI as a consistent assessment tool
291
+ """)
292
+
293
+ # Setup event handlers
294
+ start_button.click(
295
+ fn=start_session, # async handler passed directly; a task created inside asyncio.run would be cancelled when that loop closes
296
+ inputs=[voice_select, student_id],
297
+ outputs=[status, audio_output, transcript, assessment_html]
298
+ )
299
+ stop_button.click(
300
+ fn=stop_session,
301
+ outputs=[status, audio_output, transcript, assessment_html]
302
+ )
303
+ note_button.click(
304
+ fn=add_note,
305
+ inputs=note_input,
306
+ outputs=[note_input, note_status, assessment_html]
307
+ )
308
+ save_button.click(
309
+ fn=save_session,
310
+ inputs=student_id,
311
+ outputs=save_status
312
+ )
313
+
314
+ # Setup audio processing
315
+ audio_input.stream(
316
+ fn=process_mic_input, # async handler passed directly so it shares the event loop with the background assistant task
317
+ inputs=audio_input,
318
+ outputs=[audio_output, transcript, assessment_html]
319
+ )
320
+
321
+
322
+ def main(share=True):
323
+ """Main function to launch the app"""
324
+ app.launch(share=share)
325
+
326
+
327
+ # Entry point for the application
328
+ if __name__ == "__main__":
329
+ main()
implementations/livekit/livekit_gradio_hf.py ADDED
@@ -0,0 +1,490 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ CASL Voice Bot - Hugging Face Spaces deployment version
5
+ Using LiveKit with Gradio for Hugging Face Spaces compatibility
6
+
7
+ This is a special version optimized for Hugging Face Spaces deployment
8
+ that works with LiveKit's WebRTC capabilities.
9
+ """
10
+
11
+ import os
12
+ import asyncio
13
+ import gradio as gr
14
+ import logging
15
+ import sys
16
+ import time
17
+ import importlib.util
18
+ from dotenv import load_dotenv
19
+ from openai import AsyncOpenAI
20
+
21
+ # Check if livekit-agents is installed
22
+ try:
23
+ from livekit import agents
24
+ LIVEKIT_AVAILABLE = True
25
+ except ImportError:
26
+ LIVEKIT_AVAILABLE = False
27
+ # Create dummy classes for type hinting
28
+ class agents:
29
+ class InputDevice: pass
30
+ class OutputDevice: pass
31
+ class AudioChunk: pass
32
+ class VoiceAssistant: pass
33
+
34
+ # Add parent directory to path to import common utilities
35
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
36
+ from implementations.common.casl_utils import CASLAssessment, save_session_data, CASL_PROMPT
37
+
38
+ # Load environment variables
39
+ load_dotenv()
40
+
41
+ # Set up logging
42
+ logging.basicConfig(level=logging.INFO)
43
+ logger = logging.getLogger(__name__)
44
+
45
+ # Initialize OpenAI client
46
+ openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
47
+
48
+ # Hugging Face Spaces compatibility check
49
+ HF_SPACES = os.environ.get("SPACE_ID") is not None
50
+ logger.info(f"Running in Hugging Face Spaces: {HF_SPACES}")
51
+ logger.info(f"LiveKit available: {LIVEKIT_AVAILABLE}")
52
+
53
+ class GradioInputDevice(agents.InputDevice if LIVEKIT_AVAILABLE else object):
54
+ """Custom input device that works with Gradio"""
55
+
56
+ def __init__(self):
57
+ if LIVEKIT_AVAILABLE:
58
+ super().__init__()
59
+ self.audio_queue = asyncio.Queue()
60
+ self.is_active = True
61
+
62
+ async def receive(self):
63
+ """Receive audio data from the queue"""
64
+ try:
65
+ audio_data = await asyncio.wait_for(self.audio_queue.get(), timeout=0.1)
66
+ return audio_data
67
+ except asyncio.TimeoutError:
68
+ return None
69
+
70
+ async def add_audio(self, audio_data):
71
+ """Add audio data to the queue"""
72
+ if audio_data is None:
73
+ return
74
+
75
+ # Convert gradio audio format to AudioChunk
76
+ if LIVEKIT_AVAILABLE:
77
+ sample_rate, audio_array = audio_data
78
+ audio_chunk = agents.AudioChunk(
79
+ samples=audio_array,
80
+ sample_rate=sample_rate,
81
+ is_last=False
82
+ )
83
+ await self.audio_queue.put(audio_chunk)
84
+ else:
85
+ # Store raw audio data if LiveKit is not available
86
+ await self.audio_queue.put(audio_data)
87
+
88
+ def stop(self):
89
+ """Stop the input device"""
90
+ self.is_active = False
91
+
92
+
93
+ class GradioOutputDevice(agents.OutputDevice if LIVEKIT_AVAILABLE else object):
94
+ """Custom output device that works with Gradio"""
95
+
96
+ def __init__(self):
97
+ if LIVEKIT_AVAILABLE:
98
+ super().__init__()
99
+ self.output_queue = asyncio.Queue()
100
+
101
+ async def transmit(self, audio_chunk):
102
+ """Transmit audio chunk to the queue"""
103
+ if audio_chunk is not None:
104
+ if LIVEKIT_AVAILABLE:
105
+ await self.output_queue.put((audio_chunk.samples, audio_chunk.sample_rate))
106
+ else:
107
+ # Handle raw audio data if LiveKit is not available
108
+ await self.output_queue.put(audio_chunk)
109
+
110
+ async def get_latest_audio(self):
111
+ """Get the latest audio from the queue"""
112
+ try:
113
+ return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
114
+ except asyncio.TimeoutError:
115
+ return None
116
+
117
+
118
+ class SpeechPathologistAssistant:
119
+ """Speech pathologist assistant using LiveKit agents or direct OpenAI API"""
120
+
121
+ def __init__(self):
122
+ self.input_device = GradioInputDevice()
123
+ self.output_device = GradioOutputDevice()
124
+ self.assistant = None
125
+ self.assistant_task = None
126
+ self.transcript = []
127
+ self.is_running = False
128
+ self.assessment = CASLAssessment()
129
+ self.student_id = None
130
+ self.voice_model = "shimmer"
131
+
132
+ async def initialize_assistant(self, voice="shimmer"):
133
+ """Initialize the speech assistant"""
134
+ self.voice_model = voice
135
+
136
+ if LIVEKIT_AVAILABLE:
137
+ # Use LiveKit VoiceAssistant if available
138
+ self.assistant = agents.VoiceAssistant(
139
+ openai_client=openai_client,
140
+ model="gpt-4o",
141
+ voice=voice,
142
+ input_device=self.input_device,
143
+ output_device=self.output_device,
144
+ initial_message=CASL_PROMPT,
145
+ real_time=True, # Enable real-time processing
146
+ )
147
+
148
+ # Add transcript and response callbacks
149
+ self.assistant.on_transcript = self.on_transcript
150
+ self.assistant.on_response = self.on_response
151
+ else:
152
+ # If LiveKit is not available, we'll use direct OpenAI API
153
+ logger.info("LiveKit not available, using direct OpenAI API")
154
+
155
+ def on_transcript(self, transcript):
156
+ """Handle transcript from user (for LiveKit)"""
157
+ self.transcript.append(f"Student: {transcript.text}")
158
+
159
+ # Basic analysis of speech for CASL-2 categories
160
+ self.assessment.analyze_speech(transcript.text)
161
+
162
+ return True
163
+
164
+ def on_response(self, response):
165
+ """Handle response from assistant (for LiveKit)"""
166
+ self.transcript.append(f"Speech Pathologist: {response.text}")
167
+ return True
168
+
169
+ async def start_assistant(self, voice_model, student_id):
170
+ """Start the assistant in a background task"""
171
+ self.student_id = student_id
172
+ await self.initialize_assistant(voice_model)
173
+
174
+ self.is_running = True
175
+
176
+ # Add student info to transcript
177
+ student_info = f" for {student_id}" if student_id else ""
178
+ self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
179
+
180
+ if LIVEKIT_AVAILABLE:
181
+ # Run the LiveKit assistant in a background task
182
+ self.assistant_task = asyncio.create_task(self.assistant.run())
183
+ else:
184
+ # For direct OpenAI API, generate initial message
185
+ await self.generate_initial_message()
186
+
187
+ return "Session active. The AI will introduce itself."
188
+
189
+ async def generate_initial_message(self):
190
+ """Generate initial AI message when not using LiveKit"""
191
+ # Generate assistant response using OpenAI API directly
192
+ chat_response = await openai_client.chat.completions.create(
193
+ model="gpt-4o",
194
+ messages=[
195
+ {"role": "system", "content": CASL_PROMPT},
196
+ {"role": "user", "content": "Hello"} # Initial trigger
197
+ ]
198
+ )
199
+
200
+ assistant_text = chat_response.choices[0].message.content
201
+ self.transcript.append(f"Speech Pathologist: {assistant_text}")
202
+
203
+ # Generate speech from text
204
+ speech_response = await openai_client.audio.speech.create(
205
+ model="tts-1",
206
+ voice=self.voice_model,
207
+ input=assistant_text
208
+ )
209
+
210
+ # Save speech to temporary file to get audio data
211
+ import tempfile
212
+ import soundfile as sf
213
+
214
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
215
+ response_temp_file.close()
216
+
217
+ speech_response.stream_to_file(response_temp_file.name)
218
+
219
+ # Load audio data for Gradio
220
+ audio_data, sample_rate = sf.read(response_temp_file.name)
221
+
222
+ # Send to output device
223
+ await self.output_device.transmit((audio_data, sample_rate))
224
+
225
+ # Clean up
226
+ os.unlink(response_temp_file.name)
227
+
228
+ def stop_assistant(self):
229
+ """Stop the assistant"""
230
+ if self.assistant_task and not self.assistant_task.done():
231
+ self.assistant_task.cancel()
232
+
233
+ self.input_device.stop()
234
+ self.is_running = False
235
+
236
+ # Add ending to transcript
237
+ self.transcript.append("Session ended.")
238
+ return "Session stopped."
239
+
240
+ async def process_audio(self, audio):
241
+ """Process audio from Gradio interface"""
242
+ if not self.is_running or audio is None:
243
+ return None, self.get_transcript(), self.assessment.get_assessment_html()
244
+
245
+ if LIVEKIT_AVAILABLE:
246
+ # Add audio to input device for LiveKit
247
+ await self.input_device.add_audio(audio)
248
+
249
+ # Check for assistant output
250
+ output_audio = await self.output_device.get_latest_audio()
251
+ else:
252
+ # For direct OpenAI API
253
+ output_audio = await self.process_with_direct_api(audio)
254
+
255
+ return output_audio, self.get_transcript(), self.assessment.get_assessment_html()
256
+
257
+ async def process_with_direct_api(self, audio):
258
+ """Process audio using direct OpenAI API when LiveKit is not available"""
259
+ # Prepare audio file for OpenAI
260
+ import tempfile
261
+ import scipy.io.wavfile
262
+
263
+ temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
264
+ temp_file.close()
265
+
266
+ try:
267
+ # Save audio data to temporary file
268
+ sample_rate, audio_array = audio
269
+ scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)
270
+
271
+ # Transcribe audio using OpenAI
272
+ with open(temp_file.name, "rb") as audio_file:
273
+ transcript_response = await openai_client.audio.transcriptions.create(
274
+ file=audio_file,
275
+ model="whisper-1"
276
+ )
277
+
278
+ user_text = transcript_response.text
279
+ if user_text.strip():
280
+ self.transcript.append(f"Student: {user_text}")
281
+
282
+ # Analyze speech for CASL-2 categories
283
+ self.assessment.analyze_speech(user_text)
284
+
285
+ # Generate assistant response
286
+ chat_response = await openai_client.chat.completions.create(
287
+ model="gpt-4o",
288
+ messages=[
289
+ {"role": "system", "content": CASL_PROMPT},
290
+ {"role": "user", "content": user_text}
291
+ ]
292
+ )
293
+
294
+ assistant_text = chat_response.choices[0].message.content
295
+ self.transcript.append(f"Speech Pathologist: {assistant_text}")
296
+
297
+ # Generate speech from text
298
+ speech_response = await openai_client.audio.speech.create(
299
+ model="tts-1",
300
+ voice=self.voice_model,
301
+ input=assistant_text
302
+ )
303
+
304
+ # Save speech to temporary file
305
+ import soundfile as sf
306
+
307
+ response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
308
+ response_temp_file.close()
309
+
310
+ speech_response.stream_to_file(response_temp_file.name)
311
+
312
+ # Load audio data for Gradio
313
+ audio_data, sample_rate = sf.read(response_temp_file.name)
314
+
315
+ # Clean up
316
+ os.unlink(response_temp_file.name)
317
+
318
+ return (sample_rate, audio_data)
319
+ except Exception as e:
320
+ logger.error(f"Error processing with direct API: {e}")
321
+ finally:
322
+ # Clean up temp file
323
+ os.unlink(temp_file.name)
324
+
325
+ return None
326
+
327
+ def get_transcript(self):
328
+ """Get the current transcript"""
329
+ return "\n".join(self.transcript)
330
+
331
+ def add_note(self, note):
332
+ """Add a custom note"""
333
+ result = self.assessment.add_note(note)
334
+ return "", result, self.assessment.get_assessment_html()
335
+
336
+ def save_session(self, student_id=None):
337
+ """Save session to file"""
338
+ student_id = student_id or self.student_id
339
+ return save_session_data(self.transcript, self.assessment, student_id)
340
+
341
+
342
+ # Create the speech pathology assistant
343
+ speech_assistant = SpeechPathologistAssistant()
344
+
345
+
346
+ async def start_session(voice_model, student_id):
347
+ """Start the speech pathology session"""
348
+ status = await speech_assistant.start_assistant(voice_model, student_id)
349
+ return status, None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()
350
+
351
+
352
+ def stop_session():
353
+ """Stop the speech pathology session"""
354
+ return speech_assistant.stop_assistant(), None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()
355
+
356
+
357
+ async def process_mic_input(audio, progress=gr.Progress()):
358
+ """Process microphone input"""
359
+ progress(0, desc="Processing speech...")
360
+ audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
361
+ progress(1, desc="Done")
362
+ return audio_output, transcript, assessment
363
+
364
+
365
+ def add_note(note):
366
+ """Add a note to the session"""
367
+ return speech_assistant.add_note(note)
368
+
369
+
370
+ def save_session(student_id):
371
+ """Save the current session"""
372
+ return speech_assistant.save_session(student_id)
373
+
374
+
375
+ # Create Gradio Interface
376
+ with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
377
+ gr.Markdown("# CASL-2 Speech Pathology Assistant")
378
+ gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
379
+
380
+ # Show LiveKit availability
381
+ if not LIVEKIT_AVAILABLE:
382
+ gr.Markdown(
383
+ "⚠️ **Notice:** Running without LiveKit agents. Using direct OpenAI API instead. "
384
+ "For best performance, install the livekit-agents package."
385
+ )
386
+
387
+ with gr.Row():
388
+ with gr.Column(scale=1):
389
+ student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
390
+ voice_select = gr.Dropdown(
391
+ ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
392
+ value="shimmer",
393
+ label="Assistant Voice"
394
+ )
395
+ start_button = gr.Button("Start Session", variant="primary")
396
+ stop_button = gr.Button("Stop Session", variant="stop")
397
+ status = gr.Textbox(label="Status", value="Ready to start")
398
+
399
+ with gr.Accordion("SLP Tools", open=True):
400
+ note_input = gr.Textbox(
401
+ label="Add Assessment Note",
402
+ placeholder="Enter observation or assessment note here..."
403
+ )
404
+ note_button = gr.Button("Add Note")
405
+ note_status = gr.Textbox(label="Note Status")
406
+ save_button = gr.Button("Save Session")
407
+ save_status = gr.Textbox(label="Save Status")
408
+
409
+ with gr.Column(scale=2):
410
+ audio_output = gr.Audio(label="AI Speech", autoplay=True)
411
+ audio_input = gr.Audio(
412
+ label="Speak to the AI",
413
+ type="numpy",
414
+ sources=["microphone"],
415
+ streaming=True
416
+ )
417
+
418
+ with gr.Row():
419
+ with gr.Column(scale=1):
420
+ assessment_html = gr.HTML(label="Assessment Progress")
421
+ with gr.Column(scale=1):
422
+ transcript = gr.Textbox(label="Transcript", lines=10)
423
+
424
+ with gr.Accordion("About This Application", open=False):
425
+ gr.Markdown("""
426
+ ### About CASL-2 Speech Pathology Assistant
427
+
428
+ This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
429
+
430
+ - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
431
+ - **Syntactic Skills**: Grammar and sentence structure
432
+ - **Supralinguistic Skills**: Higher-level language beyond literal meanings
433
+ - **Pragmatic Skills**: Social use of language (less emphasis for younger students)
434
+
435
+ The AI will provide structured assessments and exercises to help evaluate speech patterns.
436
+
437
+ ### How to Use
438
+
439
+ 1. Optionally enter a Student ID to track sessions
440
+ 2. Select the AI voice you prefer
441
+ 3. Click "Start Session" to begin
442
+ 4. The AI will introduce itself and begin the assessment
443
+ 5. Speak into your microphone when it's your turn
444
+ 6. View the transcript to track the conversation
445
+ 7. SLPs can add notes throughout the session
446
+ 8. Save the session when finished
447
+ 9. Click "Stop Session" when done
448
+
449
+ ### For Speech-Language Pathologists
450
+
451
+ This tool is designed to supplement, not replace, professional SLP services. SLPs can:
452
+
453
+ - Add custom notes during the session
454
+ - Save session data for later reference
455
+ - Track progress across multiple sessions
456
+ - Use the AI as a consistent assessment tool
457
+ """)
458
+
459
+ # Setup event handlers
460
+ start_button.click(
461
+ fn=start_session,
462
+ inputs=[voice_select, student_id],
463
+ outputs=[status, audio_output, transcript, assessment_html]
464
+ )
465
+ stop_button.click(
466
+ fn=stop_session,
467
+ outputs=[status, audio_output, transcript, assessment_html]
468
+ )
469
+ note_button.click(
470
+ fn=add_note,
471
+ inputs=note_input,
472
+ outputs=[note_input, note_status, assessment_html]
473
+ )
474
+ save_button.click(
475
+ fn=save_session,
476
+ inputs=student_id,
477
+ outputs=save_status
478
+ )
479
+
480
+ # Setup audio processing
481
+ audio_input.stream(
482
+ fn=process_mic_input,
483
+ inputs=audio_input,
484
+ outputs=[audio_output, transcript, assessment_html]
485
+ )
486
+
487
+
488
+ # Entry point for the application
489
+ if __name__ == "__main__":
490
+ app.launch(share=True)
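A note on event-loop lifetime: `start_assistant` schedules `self.assistant.run()` with `asyncio.create_task`, so that task only survives while the loop that created it keeps running. The following is a minimal standalone sketch, not code from this commit (`background_job` and `start` are hypothetical stand-ins), showing what happens to a task created inside a short-lived `asyncio.run` call:

```python
import asyncio


async def background_job():
    """Hypothetical stand-in for a long-running job such as assistant.run()."""
    await asyncio.sleep(3600)


async def start():
    """Schedule the job and return immediately, as a session-start handler might."""
    return asyncio.create_task(background_job())


# asyncio.run() creates a fresh event loop, runs start() to completion,
# then cancels any tasks that are still pending before closing the loop.
task = asyncio.run(start())
print(task.cancelled())
```

The background task ends up cancelled once `asyncio.run` returns. Handing the coroutine function itself to Gradio, which accepts async event handlers, should keep the task on a loop that stays alive for the lifetime of the app.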
livekit_requirements.txt ADDED
@@ -0,0 +1,7 @@
+ python-dotenv>=1.0.0
+ livekit-agents>=0.7.0
+ openai>=1.3.0
+ gradio>=4.0.0
+ soundfile>=0.12.1
+ scipy>=1.10.0
+ numpy>=1.24.0
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ python-dotenv>=1.0.0
+ openai>=1.3.0
+ gradio>=4.0.0
+ soundfile>=0.12.1
+ scipy>=1.10.0
+ numpy>=1.24.0
+
+ # Optional: LiveKit integration (uncomment to use)
+ # livekit-agents>=0.7.0
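Because `livekit-agents` is optional in this file, the application has to decide at runtime whether it is installed (the `LIVEKIT_AVAILABLE` flag used in the app). The commit's own detection code is not visible in this part of the diff; the following is one possible sketch using only the standard library. The helper name `is_package_available` is hypothetical, and `livekit.agents` is assumed to be the import path of the package.

```python
import importlib.util


def is_package_available(name):
    """Return True if the module `name` can be located without importing it fully.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError when the parent is missing, so that case is treated
    as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


# Assumed import path for the livekit-agents distribution.
LIVEKIT_AVAILABLE = is_package_available("livekit.agents")
print(f"LiveKit available: {LIVEKIT_AVAILABLE}")
```

Checking availability this way keeps the fallback decision in one place, so the rest of the code can branch on a single boolean.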
run.py ADDED
@@ -0,0 +1,16 @@
+ #!/usr/bin/env python3
+
+ """
+ Main entry point for the CASL Voice Bot application.
+ Launches the Gradio web interface.
+ """
+
+ import argparse
+ from app.gradio_app import main
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist")
+     parser.add_argument("--share", action="store_true", help="Share the app publicly (for HuggingFace)")
+     args = parser.parse_args()
+
+     main()
run_app.py ADDED
@@ -0,0 +1,17 @@
+ #!/usr/bin/env python3
+
+ """
+ Main entry point for the CASL Voice Bot application.
+ Launches the Gradio web interface.
+ """
+
+ import argparse
+ from app_main import main
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist")
+     parser.add_argument("--share", action="store_true", help="Share the app publicly")
+     parser.add_argument("--local", action="store_true", help="Run the app locally without sharing")
+     args = parser.parse_args()
+
+     main(share=not args.local)
run_direct.py ADDED
@@ -0,0 +1,16 @@
+ #!/usr/bin/env python3
+
+ """
+ Run the CASL Voice Bot using direct OpenAI API (no LiveKit)
+ """
+
+ import argparse
+ from implementations.direct.app import main
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist (Direct API)")
+     parser.add_argument("--share", action="store_true", help="Share the app publicly")
+     parser.add_argument("--local", action="store_true", help="Run the app locally without sharing")
+     args = parser.parse_args()
+
+     main(share=not args.local)
run_livekit.py ADDED
@@ -0,0 +1,16 @@
+ #!/usr/bin/env python3
+
+ """
+ Run the CASL Voice Bot using LiveKit and OpenAI real-time capabilities
+ """
+
+ import argparse
+ from implementations.livekit.app import main
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="CASL Voice Bot with LiveKit - AI Speech Therapist")
+     parser.add_argument("--share", action="store_true", help="Share the app publicly")
+     parser.add_argument("--local", action="store_true", help="Run the app locally without sharing")
+     args = parser.parse_args()
+
+     main(share=not args.local)
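The last three launchers accept both `--share` and `--local` but only consult `--local`, so `--share` has no effect and passing both flags is silently allowed. A hedged sketch of one way to make the two flags write to a single setting and reject contradictory input, using `argparse`'s mutually exclusive groups. `build_parser` is a hypothetical helper, and the default of sharing is an assumption that mirrors `share=not args.local`.

```python
import argparse


def build_parser():
    """Build a parser where --share and --local set the same `share` value."""
    parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist")
    group = parser.add_mutually_exclusive_group()
    # Both flags write to args.share; sharing stays on unless --local is given.
    group.add_argument("--share", dest="share", action="store_true", default=True,
                       help="Share the app publicly (default)")
    group.add_argument("--local", dest="share", action="store_false", default=True,
                       help="Run the app locally without sharing")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--local"])
print(f"share={args.share}")
```

With this layout a launcher could call `main(share=args.share)`, and `argparse` itself reports an error when both flags are supplied.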