Upload 31 files
- README.md +123 -12
- README_livekit.md +88 -0
- __pycache__/app_main.cpython-311.pyc +0 -0
- app.py +12 -0
- app/.env.example +2 -0
- app/__init__.py +5 -0
- app/advanced_features.py +169 -0
- app/gradio_app.py +255 -0
- app/huggingface_app.py +297 -0
- app/main.py +100 -0
- app/report_generator.py +285 -0
- app/requirements.txt +7 -0
- app/templates/report_template.html +196 -0
- app_hf.py +12 -0
- app_livekit.py +469 -0
- app_main.py +469 -0
- app_ui.py +469 -0
- huggingface_requirements.txt +7 -0
- implementations/common/__init__.py +3 -0
- implementations/common/casl_utils.py +167 -0
- implementations/direct/__init__.py +3 -0
- implementations/direct/app.py +334 -0
- implementations/livekit/__init__.py +3 -0
- implementations/livekit/app.py +329 -0
- implementations/livekit/livekit_gradio_hf.py +490 -0
- livekit_requirements.txt +8 -0
- requirements.txt +10 -0
- run.py +16 -0
- run_app.py +17 -0
- run_direct.py +16 -0
- run_livekit.py +16 -0
README.md
CHANGED
@@ -1,12 +1,123 @@

# CASL Voice Bot

A speech pathology assistant using AI to assess students' speaking abilities based on the CASL-2 framework. This application helps speech-language pathologists (SLPs) with speech assessment in school settings.

## Implementations

This project provides multiple implementations:

1. **LiveKit Implementation** - Uses LiveKit agents with OpenAI's real-time voice API for low-latency, high-quality audio streaming.
2. **Direct API Implementation** - Uses OpenAI's API directly without LiveKit, for simpler deployment.
3. **Hugging Face Spaces** - An adaptive implementation that works on Hugging Face Spaces, automatically detecting whether LiveKit is available.

## Features

- Voice-to-voice interaction with an AI speech pathologist
- CASL-2 framework assessment
- Real-time assessment tracking
- Session recording and saving
- Custom note-taking for SLPs
- Gradio web interface for easy sharing and use in school settings

## CASL-2 Assessment Areas

The AI speech pathologist assesses students in these key areas:

1. **Lexical/Semantic Skills**: Vocabulary knowledge, word meanings, and contextual word use
2. **Syntactic Skills**: Grammar and sentence structure understanding
3. **Supralinguistic Skills**: Higher-level language skills beyond literal meanings
4. **Pragmatic Skills**: Language use in social contexts (less emphasis for younger students)

## Setup Instructions

### Prerequisites

- Python 3.8+
- OpenAI API key with access to GPT-4o and TTS models

### Installation

1. Clone the repository:
```
git clone https://github.com/yourusername/CASLVoiceBot.git
cd CASLVoiceBot
```

2. Create a virtual environment and install dependencies:
```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

3. For the LiveKit implementation, install the LiveKit dependencies:
```
# Edit requirements.txt to uncomment the livekit-agents line
pip install "livekit-agents>=0.7.0"
```

4. Set up environment variables:
```
cp .env.example .env
```
Then edit `.env` to add your OpenAI API key.

### Running the Application

#### LiveKit Implementation (recommended for best performance)
```
python run_livekit.py
```

#### Direct API Implementation (simpler deployment)
```
python run_direct.py
```

#### Command Line Options
Both implementations support these options (a sketch of a matching runner follows the list):
- `--share`: Share the app publicly (enabled by default)
- `--local`: Run the app locally without sharing
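
The runner scripts (`run_direct.py`, `run_livekit.py`) are part of this upload but not shown in the diff, so the import path and flag wiring below are assumptions based only on the documented options; a plausible sketch:

```python
#!/usr/bin/env python3
"""Hypothetical runner sketch (the real run_direct.py may differ)."""
import argparse

# Assumption: the implementation module exports a Gradio Blocks object named `app`.
from implementations.direct.app import app

parser = argparse.ArgumentParser(description="CASL Voice Bot runner")
parser.add_argument("--share", action="store_true", default=True,
                    help="share the app publicly (enabled by default)")
parser.add_argument("--local", action="store_true",
                    help="run the app locally without sharing")
args = parser.parse_args()

# --local overrides the default public sharing.
app.launch(share=args.share and not args.local)
```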

## Deployment on Hugging Face Spaces

1. Create a new Space on Hugging Face with the Gradio SDK
2. Upload the repository contents to the Space
3. Add your OPENAI_API_KEY as a secret in the Space settings

By default, the Hugging Face Spaces deployment will try to use LiveKit if available and fall back to the direct API if not.
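
A minimal sketch of that availability check, assuming the adaptive module simply probes the `livekit` import (the actual detection lives in `implementations/livekit/livekit_gradio_hf.py`, which is not shown in full here, and may differ):

```python
# Hypothetical import probe; module paths follow the project structure below.
try:
    from livekit import agents  # noqa: F401
    LIVEKIT_AVAILABLE = True
except ImportError:
    LIVEKIT_AVAILABLE = False

if LIVEKIT_AVAILABLE:
    from implementations.livekit.app import app   # LiveKit-backed UI (assumed export)
else:
    from implementations.direct.app import app    # direct OpenAI API fallback
```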

## Project Structure

```
CASLVoiceBot/
├── app.py                        # Hugging Face Spaces entry point
├── run_direct.py                 # Direct API implementation runner
├── run_livekit.py                # LiveKit implementation runner
├── requirements.txt              # Common dependencies
├── .env.example                  # Environment variables template
├── implementations/
│   ├── common/                   # Shared utilities
│   │   └── casl_utils.py         # CASL-2 assessment utilities
│   ├── direct/                   # Direct API implementation
│   │   └── app.py                # Direct OpenAI API app
│   └── livekit/                  # LiveKit implementation
│       ├── app.py                # LiveKit app
│       └── livekit_gradio_hf.py  # HF-compatible LiveKit app
└── session_data/                 # Saved session data
```

## Usage

1. Optionally enter a Student ID to track sessions
2. Select your preferred AI voice
3. Click "Start Session" to begin a speech assessment
4. Wait for the AI to introduce itself, then speak when prompted
5. View real-time assessment in the interface
6. SLPs can add notes throughout the session
7. Save the session when finished
8. Click "Stop Session" to end

## License

[MIT License](LICENSE)
README_livekit.md
ADDED
@@ -0,0 +1,88 @@

# CASL Voice Bot with LiveKit

A speech pathology assistant using LiveKit agents with OpenAI's real-time voice capabilities. This application helps speech-language pathologists (SLPs) assess students' speaking abilities based on the CASL-2 framework.

## Features

- Real-time voice interaction with an AI speech pathologist using LiveKit
- OpenAI's GPT-4o for intelligent conversation
- CASL-2 framework assessment
- Real-time assessment tracking
- Session recording and saving
- Custom note-taking for SLPs
- Gradio web interface for easy sharing and use in school settings

## CASL-2 Assessment Areas

The AI speech pathologist assesses students in these key areas:

1. **Lexical/Semantic Skills**: Vocabulary knowledge, word meanings, and contextual word use
2. **Syntactic Skills**: Grammar and sentence structure understanding
3. **Supralinguistic Skills**: Higher-level language skills beyond literal meanings
4. **Pragmatic Skills**: Language use in social contexts (less emphasis for younger students)

## Setup Instructions

### Prerequisites

- Python 3.8+
- OpenAI API key with access to GPT-4o and TTS models

This implementation was created from the LiveKit multimodal agent template.

### Installation

1. Clone the repository:
```
git clone https://github.com/yourusername/CASLVoiceBot.git
cd CASLVoiceBot
```

2. Create a virtual environment and install dependencies:
```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r livekit_requirements.txt
```

3. Set up environment variables:
```
cp .env.example .env
```
Then edit `.env` to add your OpenAI API key.

### Running the Application

1. Start the application:
```
python run_livekit.py
```

2. Access the application through the URL provided in the terminal.

## Usage

1. Optionally enter a Student ID to track sessions
2. Select your preferred AI voice
3. Click "Start Session" to begin a speech assessment
4. Wait for the AI to introduce itself, then speak when prompted
5. View real-time assessment in the interface
6. SLPs can add notes throughout the session
7. Save the session when finished
8. Click "Stop Session" to end

## Benefits of Using LiveKit

- **Real-time Audio Processing**: LiveKit provides robust real-time audio streaming capabilities
- **Low Latency**: Minimizes delay between student speech and AI response
- **WebRTC Infrastructure**: Built on the same technology used for video calls
- **Connection Management**: Automatically handles connection issues and reconnections
- **Scalability**: Can support multiple concurrent sessions if needed
- **Agent Integration**: LiveKit's agent system is designed specifically for AI assistants

## Deployment on Hugging Face Spaces

For deployment on Hugging Face Spaces, additional configuration may be required due to LiveKit's WebRTC requirements. Please refer to the LiveKit documentation for details on setting up appropriate server configurations.

## License

[MIT License](LICENSE)
__pycache__/app_main.cpython-311.pyc
ADDED

Binary file (27.2 kB).
app.py
ADDED
@@ -0,0 +1,12 @@

```python
#!/usr/bin/env python3

"""
CASL Voice Bot - Hugging Face Spaces entry point
"""

# Import the adaptive app that works in both LiveKit and direct modes
from implementations.livekit.livekit_gradio_hf import app

# This is the entry point that Hugging Face Spaces will use
if __name__ == "__main__":
    app.launch()
```
app/.env.example
ADDED
@@ -0,0 +1,2 @@

```
# OpenAI API Key
OPENAI_API_KEY=your_openai_api_key
```
app/__init__.py
ADDED
@@ -0,0 +1,5 @@

```python
"""
CASL Voice Bot - AI Speech Therapist
"""

__version__ = "0.1.0"
```
app/advanced_features.py
ADDED
@@ -0,0 +1,169 @@

```python
#!/usr/bin/env python3

"""
Advanced features that can be added to the CASL Voice Bot application.
This module contains extensions that SLPs might want to add to the base system.
"""

import os
import pandas as pd
import datetime
import json
from pathlib import Path


class SessionRecorder:
    """Records session data for later analysis and progress tracking"""

    def __init__(self, storage_dir="session_data"):
        self.storage_dir = storage_dir
        Path(storage_dir).mkdir(exist_ok=True)
        self.current_session = {
            "timestamp": datetime.datetime.now().isoformat(),
            "student_id": None,
            "transcript": [],
            "assessment": {}
        }

    def set_student_id(self, student_id):
        """Set the student ID for the current session"""
        self.current_session["student_id"] = student_id

    def add_transcript_entry(self, speaker, text):
        """Add an entry to the transcript"""
        self.current_session["transcript"].append({
            "timestamp": datetime.datetime.now().isoformat(),
            "speaker": speaker,
            "text": text
        })

    def add_assessment_note(self, category, note):
        """Add an assessment note for a CASL-2 category"""
        if category not in self.current_session["assessment"]:
            self.current_session["assessment"][category] = []

        self.current_session["assessment"][category].append({
            "timestamp": datetime.datetime.now().isoformat(),
            "note": note
        })

    def save_session(self):
        """Save the current session to a JSON file"""
        student_id = self.current_session["student_id"] or "anonymous"
        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"{student_id}_{timestamp}.json"

        with open(os.path.join(self.storage_dir, filename), 'w') as f:
            json.dump(self.current_session, f, indent=2)

        return filename


class CASLAnalyzer:
    """Analyzes transcripts based on CASL-2 framework categories"""

    def __init__(self):
        self.categories = {
            "lexical_semantic": {
                "description": "Vocabulary knowledge and word meanings",
                "keywords": ["synonym", "antonym", "vocabulary", "word choice", "meaning"]
            },
            "syntactic": {
                "description": "Grammar and sentence structure",
                "keywords": ["grammar", "sentence", "verb tense", "agreement", "structure"]
            },
            "supralinguistic": {
                "description": "Higher-level language skills",
                "keywords": ["inference", "figurative", "metaphor", "context", "implied"]
            },
            "pragmatic": {
                "description": "Social use of language",
                "keywords": ["conversation", "social", "turn-taking", "appropriate", "context"]
            }
        }

    def categorize_text(self, text):
        """Categorize text into CASL-2 framework categories"""
        result = {}

        for category, info in self.categories.items():
            score = 0
            for keyword in info["keywords"]:
                if keyword.lower() in text.lower():
                    score += 1

            if score > 0:
                result[category] = score

        return result

    def generate_summary(self, transcript):
        """Generate a summary of the transcript based on CASL-2 categories"""
        all_text = " ".join([entry["text"] for entry in transcript])
        categorization = self.categorize_text(all_text)

        summary = {
            "categories_covered": list(categorization.keys()),
            "focus_areas": sorted(categorization.items(), key=lambda x: x[1], reverse=True),
            "recommendations": []
        }

        # Generate recommendations based on categories covered
        for category in self.categories:
            if category not in categorization:
                summary["recommendations"].append(
                    f"Consider adding more {self.categories[category]['description']} exercises"
                )

        return summary


class VoiceMetricsAnalyzer:
    """Analyzes voice metrics for speech patterns"""

    def __init__(self):
        self.metrics = {
            "word_count": 0,
            "unique_words": set(),
            "sentence_count": 0,
            "average_words_per_sentence": 0,
            "hesitations": 0,
            "speech_rate": 0  # words per minute
        }

    def analyze_text(self, text, duration_seconds=None):
        """Analyze text for speech metrics"""
        # Count words
        words = text.split()
        self.metrics["word_count"] = len(words)
        self.metrics["unique_words"] = set(word.lower() for word in words)

        # Count sentences
        sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        self.metrics["sentence_count"] = len(sentences)

        # Calculate average words per sentence
        if self.metrics["sentence_count"] > 0:
            self.metrics["average_words_per_sentence"] = self.metrics["word_count"] / self.metrics["sentence_count"]

        # Count hesitations ("um", "uh", "like", etc.)
        hesitation_markers = ["um", "uh", "er", "like", "you know"]
        self.metrics["hesitations"] = sum(1 for word in words if word.lower() in hesitation_markers)

        # Calculate speech rate if duration is provided
        if duration_seconds:
            self.metrics["speech_rate"] = (self.metrics["word_count"] / duration_seconds) * 60

        return self.metrics

    def get_summary(self):
        """Get a summary of the voice metrics analysis"""
        return {
            "word_count": self.metrics["word_count"],
            "vocabulary_diversity": len(self.metrics["unique_words"]) / max(1, self.metrics["word_count"]),
            "average_words_per_sentence": self.metrics["average_words_per_sentence"],
            "hesitation_frequency": self.metrics["hesitations"] / max(1, self.metrics["word_count"]),
            "speech_rate": self.metrics["speech_rate"]
        }


# These classes can be imported and used to extend the base CASL Voice Bot with additional features
```
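A short usage sketch of how these extension classes fit together; the student ID and transcript text are illustrative:

```python
from app.advanced_features import SessionRecorder, CASLAnalyzer, VoiceMetricsAnalyzer

recorder = SessionRecorder()
recorder.set_student_id("student_001")  # hypothetical ID
recorder.add_transcript_entry("Student", "Um, a synonym for happy is, like, glad.")
recorder.add_assessment_note("lexical_semantic", "Produced a correct synonym, with fillers.")

# Keyword-based categorization over the accumulated transcript.
analyzer = CASLAnalyzer()
summary = analyzer.generate_summary(recorder.current_session["transcript"])
print(summary["categories_covered"])   # e.g. ['lexical_semantic']
print(summary["recommendations"])      # exercises for categories not yet covered

# Surface-level speech metrics from the same utterance.
metrics = VoiceMetricsAnalyzer()
metrics.analyze_text("Um, a synonym for happy is, like, glad.", duration_seconds=4)
print(metrics.get_summary())

saved = recorder.save_session()  # writes session_data/student_001_<timestamp>.json
```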
app/gradio_app.py
ADDED
@@ -0,0 +1,255 @@

```python
#!/usr/bin/env python3

import os
import asyncio
import gradio as gr
import logging
import tempfile
import queue
import threading
import time
from dotenv import load_dotenv
from livekit import agents
from openai import AsyncOpenAI

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Speech Pathologist Agent Prompt
SPEECH_PATHOLOGIST_PROMPT = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
You are working with a student who has speech impediments, typically with ASD.
You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
Each domain from the CASL-2 framework can be analyzed using the sample:
Lexical/Semantic Skills:
This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
Key Subtests:
Antonyms: Identifying words with opposite meanings.
Synonyms: Identifying words with similar meanings.
Idiomatic Language: Understanding and interpreting idioms and figurative language.
Evaluate vocabulary diversity (type-token ratio).
Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
Syntactic Skills:
This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
Key Subtests:
Sentence Expression: Producing grammatically correct sentences based on prompts.
Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
Examine sentence structure for grammatical accuracy.
Identify errors in verb tense, subject-verb agreement, or sentence complexity.
Note the use of clauses, conjunctions, and varied sentence types.
Supralinguistic Skills:
This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
Key Subtests:
Inferences: Understanding information that is not explicitly stated.
Meaning from Context: Deriving meaning from surrounding text or dialogue.
Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
Look for use or understanding of figurative language, idioms, or humor.
Assess ability to handle ambiguous or implied meanings in context.
Identify advanced language use for abstract or hypothetical ideas.
Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
Key Subtests:
Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.

Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
"""


class GradioInputDevice(agents.InputDevice):
    """Custom input device that works with Gradio"""

    def __init__(self):
        super().__init__()
        self.audio_queue = queue.Queue()
        self.is_active = True

    async def receive(self) -> agents.AudioChunk:
        """Receive audio data from the queue"""
        while self.is_active:
            try:
                audio_data = self.audio_queue.get(block=True, timeout=0.1)
                return audio_data
            except queue.Empty:
                await asyncio.sleep(0.1)

        return None

    def add_audio(self, audio_data):
        """Add audio data to the queue"""
        # Convert Gradio audio format to AudioChunk
        sample_rate, audio_array = audio_data
        audio_chunk = agents.AudioChunk(
            samples=audio_array,
            sample_rate=sample_rate,
            is_last=False
        )
        self.audio_queue.put(audio_chunk)

    def stop(self):
        """Stop the input device"""
        self.is_active = False


class GradioOutputDevice(agents.OutputDevice):
    """Custom output device that works with Gradio"""

    def __init__(self):
        super().__init__()
        self.output_queue = queue.Queue()

    async def transmit(self, audio_chunk: agents.AudioChunk) -> None:
        """Transmit audio chunk to the queue"""
        if audio_chunk is not None:
            self.output_queue.put((audio_chunk.samples, audio_chunk.sample_rate))

    def get_latest_audio(self):
        """Get the latest audio from the queue"""
        try:
            return self.output_queue.get(block=False)
        except queue.Empty:
            return None


class SpeechPathologistAssistant:
    """Speech pathologist assistant using LiveKit agents and Gradio"""

    def __init__(self):
        self.input_device = GradioInputDevice()
        self.output_device = GradioOutputDevice()
        self.assistant = None
        self.assistant_task = None
        self.transcript = []
        self.is_running = False

    async def initialize_assistant(self):
        """Initialize the speech assistant"""
        self.assistant = agents.VoiceAssistant(
            openai_client=openai_client,
            model="gpt-4o",
            voice="shimmer",  # Using a friendly, professional voice
            input_device=self.input_device,
            output_device=self.output_device,
            initial_message=SPEECH_PATHOLOGIST_PROMPT,
            real_time=True,  # Enable real-time processing
        )

        # Add transcript callbacks
        self.assistant.on_transcript = self.on_transcript
        self.assistant.on_response = self.on_response

    def on_transcript(self, transcript):
        """Handle transcript from user"""
        self.transcript.append(f"Student: {transcript.text}")
        return True

    def on_response(self, response):
        """Handle response from assistant"""
        self.transcript.append(f"Speech Pathologist: {response.text}")
        return True

    async def start_assistant(self):
        """Start the assistant and run it to completion.

        The caller must run this coroutine on a long-lived event loop
        (see start_session below); creating the task and returning
        immediately would let the loop shut down and cancel it.
        """
        if not self.assistant:
            await self.initialize_assistant()

        self.is_running = True
        self.assistant_task = asyncio.create_task(self.assistant.run())
        await self.assistant_task

    def stop_assistant(self):
        """Stop the assistant"""
        if self.assistant_task and not self.assistant_task.done():
            self.assistant_task.cancel()

        self.input_device.stop()
        self.is_running = False

    def process_audio(self, audio):
        """Process audio from the Gradio interface"""
        if not self.is_running or audio is None:
            return None, self.get_transcript()

        self.input_device.add_audio(audio)

        # Check for assistant output
        output_audio = self.output_device.get_latest_audio()

        return output_audio, self.get_transcript()

    def get_transcript(self):
        """Get the current transcript"""
        return "\n".join(self.transcript)


# Create the assistant instance
speech_assistant = SpeechPathologistAssistant()


def start_session():
    """Start the speech pathology session on a background event loop.

    Gradio button callbacks are synchronous, so the assistant coroutine
    runs on its own daemon thread with its own event loop; calling
    asyncio.run directly in the callback would tear the loop down as
    soon as the callback returned.
    """
    threading.Thread(
        target=lambda: asyncio.run(speech_assistant.start_assistant()),
        daemon=True,
    ).start()
    return "Session started. Please speak to begin the assessment."


def stop_session():
    """Stop the speech pathology session"""
    speech_assistant.stop_assistant()
    return "Session stopped."


def process_audio(audio):
    """Process audio from the microphone"""
    if audio is None:
        return None, speech_assistant.get_transcript()

    return speech_assistant.process_audio(audio)


# Create Gradio Interface
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
    gr.Markdown("# Speech Pathology Assistant")
    gr.Markdown("This tool provides speech therapy assessment based on the CASL-2 framework")

    with gr.Row():
        with gr.Column(scale=1):
            start_button = gr.Button("Start Session")
            stop_button = gr.Button("Stop Session")
            status = gr.Textbox(label="Status", value="Ready to start")

        with gr.Column(scale=2):
            audio_input = gr.Audio(
                label="Speak",
                source="microphone",
                type="numpy",
                streaming=True,
            )
            audio_output = gr.Audio(label="Response", autoplay=True)

    with gr.Row():
        transcript = gr.Textbox(label="Transcript", lines=10)

    # Set up event handlers
    start_button.click(fn=start_session, outputs=status)
    stop_button.click(fn=stop_session, outputs=status)

    # Set up continuous audio processing
    audio_input.stream(
        fn=process_audio,
        inputs=audio_input,
        outputs=[audio_output, transcript]
    )


def main():
    """Main function to launch the Gradio app"""
    app.launch(share=True)


if __name__ == "__main__":
    main()
```
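The two device classes above bridge Gradio's synchronous callback threads to the assistant's asyncio loop through thread-safe queues. A self-contained sketch of that bridging pattern, stripped of LiveKit (stdlib only, illustrative names):

```python
import asyncio
import queue
import threading
import time

audio_queue: "queue.Queue[str]" = queue.Queue()

async def consumer():
    # Mirrors GradioInputDevice.receive: poll a thread-safe queue from
    # async code, yielding to the event loop while the queue is empty.
    for _ in range(3):
        while True:
            try:
                item = audio_queue.get(block=True, timeout=0.1)
                break
            except queue.Empty:
                await asyncio.sleep(0.1)
        print("consumed:", item)

def producer():
    # Mirrors the Gradio callback thread feeding chunks in.
    for i in range(3):
        time.sleep(0.2)
        audio_queue.put(f"chunk-{i}")

threading.Thread(target=producer, daemon=True).start()
asyncio.run(consumer())
```

Note the trade-off carried over from the original: the blocking `get(timeout=0.1)` briefly stalls the event loop thread on each poll, which is tolerable for short timeouts but is why the loop also sleeps asynchronously between attempts.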
app/huggingface_app.py
ADDED
@@ -0,0 +1,297 @@

```python
#!/usr/bin/env python3

"""
Hugging Face Spaces deployment file for CASL Voice Bot.
This file is specifically designed for deploying on Hugging Face Spaces.
"""

import os
import asyncio
import gradio as gr
import logging
import tempfile
import queue
import threading
import time
from dotenv import load_dotenv
from livekit import agents
from openai import AsyncOpenAI

# Load environment variables (will be set in HF Spaces secrets)
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client with API key from environment
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Speech Pathologist Agent Prompt
SPEECH_PATHOLOGIST_PROMPT = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
You are working with a student who has speech impediments, typically with ASD.
You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
Each domain from the CASL-2 framework can be analyzed using the sample:
Lexical/Semantic Skills:
This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
Key Subtests:
Antonyms: Identifying words with opposite meanings.
Synonyms: Identifying words with similar meanings.
Idiomatic Language: Understanding and interpreting idioms and figurative language.
Evaluate vocabulary diversity (type-token ratio).
Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
Syntactic Skills:
This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
Key Subtests:
Sentence Expression: Producing grammatically correct sentences based on prompts.
Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
Examine sentence structure for grammatical accuracy.
Identify errors in verb tense, subject-verb agreement, or sentence complexity.
Note the use of clauses, conjunctions, and varied sentence types.
Supralinguistic Skills:
This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
Key Subtests:
Inferences: Understanding information that is not explicitly stated.
Meaning from Context: Deriving meaning from surrounding text or dialogue.
Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
Look for use or understanding of figurative language, idioms, or humor.
Assess ability to handle ambiguous or implied meanings in context.
Identify advanced language use for abstract or hypothetical ideas.
Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
Key Subtests:
Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.

Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
"""

# Custom audio processing for the Gradio interface
class AudioProcessor:
    def __init__(self):
        self.transcript = []
        self.is_active = False
        self.voice_model = "shimmer"  # Default voice

    async def process_speech(self, audio_data, openai_client):
        """Process speech using OpenAI's API"""
        if not self.is_active or audio_data is None:
            return None, "\n".join(self.transcript)

        # Prepare audio file for OpenAI
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
        temp_file.close()

        try:
            # Save audio data to a temporary file
            sample_rate, audio_array = audio_data
            import scipy.io.wavfile
            scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)

            # Transcribe audio using OpenAI
            with open(temp_file.name, "rb") as audio_file:
                transcript_response = await openai_client.audio.transcriptions.create(
                    file=audio_file,
                    model="whisper-1"
                )

            user_text = transcript_response.text
            if user_text.strip():
                self.transcript.append(f"Student: {user_text}")

                # Generate assistant response
                chat_response = await openai_client.chat.completions.create(
                    model="gpt-4o",
                    messages=[
                        {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
                        {"role": "user", "content": user_text}
                    ]
                )

                assistant_text = chat_response.choices[0].message.content
                self.transcript.append(f"Speech Pathologist: {assistant_text}")

                # Generate speech from text
                speech_response = await openai_client.audio.speech.create(
                    model="tts-1",
                    voice=self.voice_model,
                    input=assistant_text
                )

                # Save speech to a temporary file
                response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
                response_temp_file.close()

                speech_response.stream_to_file(response_temp_file.name)

                # Load audio data for Gradio
                import soundfile as sf
                audio_data, sample_rate = sf.read(response_temp_file.name)

                # Clean up
                os.unlink(response_temp_file.name)

                return (sample_rate, audio_data), "\n".join(self.transcript)

        except Exception as e:
            logger.error(f"Error processing speech: {e}")
            self.transcript.append(f"Error: {str(e)}")
        finally:
            # Clean up temp file
            os.unlink(temp_file.name)

        return None, "\n".join(self.transcript)

    def start_session(self, voice_model):
        """Start a new session"""
        self.is_active = True
        self.voice_model = voice_model if voice_model else "shimmer"
        self.transcript = []
        self.transcript.append("Session started. The AI Speech Pathologist will speak first.")
        return "Session active. Please wait for the AI to introduce itself."

    def stop_session(self):
        """Stop the current session"""
        self.is_active = False
        return "Session stopped."


# Create the audio processor instance
audio_processor = AudioProcessor()


async def start_session(voice_model):
    """Start the speech pathology session"""
    status = audio_processor.start_session(voice_model)

    # Generate the initial AI introduction
    try:
        # Generate assistant response
        chat_response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
                {"role": "user", "content": "Hello"}  # Initial trigger
            ]
        )

        assistant_text = chat_response.choices[0].message.content
        audio_processor.transcript.append(f"Speech Pathologist: {assistant_text}")

        # Generate speech from text
        speech_response = await openai_client.audio.speech.create(
            model="tts-1",
            voice=audio_processor.voice_model,
            input=assistant_text
        )

        # Save speech to a temporary file
        response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
        response_temp_file.close()

        speech_response.stream_to_file(response_temp_file.name)

        # Load audio data for Gradio
        import soundfile as sf
        audio_data, sample_rate = sf.read(response_temp_file.name)

        # Clean up
        os.unlink(response_temp_file.name)

        return status, (sample_rate, audio_data), "\n".join(audio_processor.transcript)

    except Exception as e:
        logger.error(f"Error starting session: {e}")
        audio_processor.transcript.append(f"Error: {str(e)}")
        return status, None, "\n".join(audio_processor.transcript)


def stop_session():
    """Stop the speech pathology session"""
    return audio_processor.stop_session(), None, "\n".join(audio_processor.transcript)


async def process_mic_input(audio, progress=gr.Progress()):
    """Process microphone input"""
    if audio is None or not audio_processor.is_active:
        return None, "\n".join(audio_processor.transcript)

    progress(0, desc="Processing speech...")
    audio_output, transcript = await audio_processor.process_speech(audio, openai_client)
    progress(1, desc="Done")
    return audio_output, transcript


# Create Gradio Interface
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
    gr.Markdown("# CASL-2 Speech Pathology Assistant")
    gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")

    with gr.Row():
        with gr.Column(scale=1):
            voice_select = gr.Dropdown(
                ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
                value="shimmer",
                label="Assistant Voice"
            )
            start_button = gr.Button("Start Session", variant="primary")
            stop_button = gr.Button("Stop Session", variant="stop")
            status = gr.Textbox(label="Status", value="Ready to start")

        with gr.Column(scale=2):
            audio_output = gr.Audio(label="AI Speech", autoplay=True)
            audio_input = gr.Audio(
                label="Speak to the AI",
                source="microphone",
                type="numpy",
                streaming=True
            )

    with gr.Row():
        transcript = gr.Textbox(label="Transcript", lines=10)

    with gr.Accordion("About This Application", open=False):
        gr.Markdown("""
### About CASL-2 Speech Pathology Assistant

This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:

- **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
- **Syntactic Skills**: Grammar and sentence structure
- **Supralinguistic Skills**: Higher-level language beyond literal meanings
- **Pragmatic Skills**: Social use of language (less emphasis for younger students)

The AI will provide structured assessments and exercises to help evaluate speech patterns.

### How to Use

1. Select the AI voice you prefer
2. Click "Start Session" to begin
3. The AI will introduce itself and begin the assessment
4. Speak into your microphone when it's your turn
5. View the transcript to track the conversation
6. Click "Stop Session" when finished

### For Speech-Language Pathologists

This tool is designed to supplement, not replace, professional SLP services. The source code is available for customization to meet specific assessment needs.
""")

    # Set up event handlers
    start_button.click(
        fn=lambda voice: asyncio.run(start_session(voice)),
        inputs=voice_select,
        outputs=[status, audio_output, transcript]
    )
    stop_button.click(fn=stop_session, outputs=[status, audio_output, transcript])

    # Set up audio processing
    audio_input.stream(
        fn=lambda audio: asyncio.run(process_mic_input(audio)),
        inputs=audio_input,
        outputs=[audio_output, transcript]
    )

# Launch the app
app.launch(share=True)
```
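`process_speech` above converts between Gradio's in-memory audio (a `(sample_rate, numpy array)` tuple) and the file-based interfaces of the OpenAI endpoints. A minimal, self-contained round-trip of that conversion step, with no API calls; it assumes numpy, scipy, and soundfile are installed, as the file's lazy imports do, and the sine tone is purely illustrative:

```python
import os
import tempfile

import numpy as np
import scipy.io.wavfile
import soundfile as sf

# One second of a 440 Hz tone, shaped like audio Gradio would hand back.
sample_rate = 16000
tone = (0.1 * np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)).astype(np.float32)

tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
tmp.close()
scipy.io.wavfile.write(tmp.name, sample_rate, tone)  # array -> WAV file (for upload)
data, sr = sf.read(tmp.name)                         # file -> array (for playback)
os.unlink(tmp.name)                                  # same cleanup as process_speech

assert sr == sample_rate and len(data) == len(tone)
```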
app/main.py
ADDED
@@ -0,0 +1,100 @@

```python
#!/usr/bin/env python3

import os
import asyncio
import logging
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import InputDevice, OutputDevice
from openai import AsyncOpenAI

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Speech Pathologist Agent Prompt
SPEECH_PATHOLOGIST_PROMPT = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
You are working with a student who has speech impediments, typically with ASD.
You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
Each domain from the CASL-2 framework can be analyzed using the sample:
Lexical/Semantic Skills:
This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
Key Subtests:
Antonyms: Identifying words with opposite meanings.
Synonyms: Identifying words with similar meanings.
Idiomatic Language: Understanding and interpreting idioms and figurative language.
Evaluate vocabulary diversity (type-token ratio).
Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
Syntactic Skills:
This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
Key Subtests:
Sentence Expression: Producing grammatically correct sentences based on prompts.
Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
Examine sentence structure for grammatical accuracy.
Identify errors in verb tense, subject-verb agreement, or sentence complexity.
Note the use of clauses, conjunctions, and varied sentence types.
Supralinguistic Skills:
This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
Key Subtests:
Inferences: Understanding information that is not explicitly stated.
Meaning from Context: Deriving meaning from surrounding text or dialogue.
Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
Look for use or understanding of figurative language, idioms, or humor.
Assess ability to handle ambiguous or implied meanings in context.
Identify advanced language use for abstract or hypothetical ideas.
Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
Key Subtests:
Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.

Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
"""


async def run_speech_pathology_session(input_device: InputDevice, output_device: OutputDevice):
    """
    Run the speech pathology session using LiveKit agents and OpenAI.
    """
    logger.info("Starting speech pathology session")

    # Create the speech assistant
    assistant = agents.VoiceAssistant(
        openai_client=openai_client,
        model="gpt-4o",
        voice="shimmer",  # Using a friendly, professional voice
        input_device=input_device,
        output_device=output_device,
        initial_message=SPEECH_PATHOLOGIST_PROMPT,
        real_time=True,  # Enable real-time processing
    )

    # Run the assistant
    await assistant.run()


async def main():
    """
    Main function to run the voice assistant.
    """
    logger.info("Initializing speech pathology assistant")

    # Create devices
    input_device = agents.BasicInputDevice()
    output_device = agents.BasicOutputDevice()

    try:
        # Run the session
        await run_speech_pathology_session(input_device, output_device)
    except Exception as e:
        logger.error(f"Error in speech pathology session: {e}")
    finally:
        logger.info("Speech pathology session ended")


if __name__ == "__main__":
    asyncio.run(main())
```
app/report_generator.py
ADDED
@@ -0,0 +1,285 @@
#!/usr/bin/env python3

"""
Report generator for CASL Voice Bot.
This module generates assessment reports based on session data.
"""

import os
import json
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
from datetime import datetime
import jinja2


class CASLReportGenerator:
    """Generates reports from session data"""

    def __init__(self, session_data_dir="session_data", reports_dir="reports"):
        """Initialize the report generator"""
        self.session_data_dir = session_data_dir
        self.reports_dir = reports_dir

        # Create directories if they don't exist
        Path(session_data_dir).mkdir(exist_ok=True)
        Path(reports_dir).mkdir(exist_ok=True)

        # Set up Jinja2 template environment
        self.template_loader = jinja2.FileSystemLoader(searchpath="./templates")
        self.template_env = jinja2.Environment(loader=self.template_loader)

    def load_session_data(self, filename=None, student_id=None):
        """Load session data from file or by student ID"""
        if filename:
            with open(os.path.join(self.session_data_dir, filename), 'r') as f:
                return json.load(f)

        elif student_id:
            # Find all files for this student
            files = [f for f in os.listdir(self.session_data_dir)
                     if f.startswith(f"{student_id}_") and f.endswith(".json")]

            if not files:
                return None

            # Sort by date (newest first) and load the most recent
            files.sort(reverse=True)
            with open(os.path.join(self.session_data_dir, files[0]), 'r') as f:
                return json.load(f)

        return None

    def load_all_student_sessions(self, student_id):
        """Load all sessions for a specific student"""
        files = [f for f in os.listdir(self.session_data_dir)
                 if f.startswith(f"{student_id}_") and f.endswith(".json")]

        sessions = []
        for file in sorted(files):
            with open(os.path.join(self.session_data_dir, file), 'r') as f:
                sessions.append(json.load(f))

        return sessions

    def extract_casl_metrics(self, session_data):
        """Extract CASL-2 metrics from session data"""
        metrics = {
            "lexical_semantic": 0,
            "syntactic": 0,
            "supralinguistic": 0,
            "pragmatic": 0
        }

        # Count assessment notes per category
        assessment = session_data.get("assessment", {})
        for category, notes in assessment.items():
            if category in metrics:
                metrics[category] = len(notes)

        return metrics

    def generate_progress_chart(self, student_id, output_path=None):
        """Generate a progress chart for a student"""
        sessions = self.load_all_student_sessions(student_id)

        if not sessions:
            return None

        # Extract dates and metrics
        dates = []
        metrics = {
            "lexical_semantic": [],
            "syntactic": [],
            "supralinguistic": [],
            "pragmatic": []
        }

        for session in sessions:
            dates.append(datetime.fromisoformat(session["timestamp"]).strftime("%m/%d/%Y"))
            session_metrics = self.extract_casl_metrics(session)

            for category in metrics:
                metrics[category].append(session_metrics.get(category, 0))

        # Create chart
        plt.figure(figsize=(10, 6))
        for category, values in metrics.items():
            plt.plot(dates, values, marker='o', label=category.replace('_', ' ').title())

        plt.title(f"CASL-2 Assessment Progress for Student {student_id}")
        plt.xlabel("Session Date")
        plt.ylabel("Assessment Score")
        plt.legend()
        plt.xticks(rotation=45)
        plt.tight_layout()

        # Save or return
        if output_path:
            plt.savefig(output_path)
            return output_path
        else:
            chart_path = os.path.join(self.reports_dir, f"{student_id}_progress.png")
            plt.savefig(chart_path)
            return chart_path

    def generate_session_summary(self, session_data):
        """Generate a summary of a single session"""
        if not session_data:
            return None

        # Extract basic info
        timestamp = datetime.fromisoformat(session_data["timestamp"])
        student_id = session_data.get("student_id", "anonymous")

        # Extract transcript
        transcript = session_data.get("transcript", [])

        # Calculate metrics
        word_count = 0
        student_turns = 0

        for entry in transcript:
            if entry.get("speaker") == "Student":
                text = entry.get("text", "")
                words = text.split()
                word_count += len(words)
                student_turns += 1

        # Get CASL-2 metrics
        casl_metrics = self.extract_casl_metrics(session_data)

        # Create summary
        summary = {
            "date": timestamp.strftime("%m/%d/%Y"),
            "time": timestamp.strftime("%H:%M"),
            "student_id": student_id,
            "duration_minutes": len(transcript) // 2,  # Approximate based on turns
            "student_turns": student_turns,
            "total_words": word_count,
            "average_words_per_turn": word_count / max(1, student_turns),
            "casl_metrics": casl_metrics
        }

        return summary

    def generate_html_report(self, student_id, output_path=None):
        """Generate an HTML report for a student"""
        # Load all sessions for the student
        sessions = self.load_all_student_sessions(student_id)

        if not sessions:
            return None

        # Generate progress chart
        chart_path = self.generate_progress_chart(student_id)

        # Get latest session data
        latest_session = sessions[-1]
        latest_summary = self.generate_session_summary(latest_session)

        # Calculate overall progress
        if len(sessions) > 1:
            first_metrics = self.extract_casl_metrics(sessions[0])
            latest_metrics = self.extract_casl_metrics(sessions[-1])

            progress = {}
            for category in first_metrics:
                if first_metrics[category] > 0:
                    progress[category] = (latest_metrics[category] - first_metrics[category]) / first_metrics[category]
                else:
                    progress[category] = 0 if latest_metrics[category] == 0 else 1
        else:
            progress = {category: 0 for category in latest_summary["casl_metrics"]}

        # Prepare report data
        report_data = {
            "student_id": student_id,
            "report_date": datetime.now().strftime("%m/%d/%Y"),
            "session_count": len(sessions),
            "latest_session": latest_summary,
            "progress": progress,
            "chart_path": os.path.basename(chart_path),
            "recommendations": self.generate_recommendations(sessions)
        }

        # Load and render template
        try:
            template = self.template_env.get_template("report_template.html")
            report_html = template.render(**report_data)

            # Save report
            if not output_path:
                output_path = os.path.join(self.reports_dir, f"{student_id}_report.html")

            with open(output_path, 'w') as f:
                f.write(report_html)

            return output_path

        except jinja2.exceptions.TemplateNotFound:
            # Create a simple report if template is not found
            report = f"CASL-2 Assessment Report for Student {student_id}\n"
            report += f"Report Date: {report_data['report_date']}\n"
            report += f"Total Sessions: {report_data['session_count']}\n\n"

            report += "Latest Session Summary:\n"
            for key, value in latest_summary.items():
                if key != "casl_metrics":
                    report += f"  {key}: {value}\n"

            report += "\nCASL-2 Metrics:\n"
            for category, value in latest_summary["casl_metrics"].items():
                report += f"  {category}: {value}\n"

            report += "\nRecommendations:\n"
            for rec in report_data["recommendations"]:
                report += f"  - {rec}\n"

            # Save simple report
            if not output_path:
                output_path = os.path.join(self.reports_dir, f"{student_id}_report.txt")

            with open(output_path, 'w') as f:
                f.write(report)

            return output_path

    def generate_recommendations(self, sessions):
        """Generate recommendations based on session data"""
        if not sessions:
            return []

        latest_session = sessions[-1]
        metrics = self.extract_casl_metrics(latest_session)

        recommendations = []

        # Check for areas needing improvement
        weak_areas = [category for category, value in metrics.items() if value < 2]
        for area in weak_areas:
            if area == "lexical_semantic":
                recommendations.append("Focus on vocabulary building exercises such as synonyms, antonyms, and word associations")
            elif area == "syntactic":
                recommendations.append("Practice sentence formation and grammar through structured activities")
            elif area == "supralinguistic":
                recommendations.append("Work on understanding figurative language and making inferences from context")
            elif area == "pragmatic":
                recommendations.append("Engage in role-playing activities to practice social communication skills")

        # Add general recommendations
        if len(sessions) > 1:
            recommendations.append("Continue regular assessment sessions to track progress")

        if not recommendations:
            recommendations.append("Continue current therapy approach as all areas show adequate progress")

        return recommendations


# This module can be used to generate reports from the session data collected by the CASL Voice Bot
if __name__ == "__main__":
    # Example usage
    report_gen = CASLReportGenerator()
    # report_gen.generate_html_report("student123")
app/requirements.txt
ADDED
@@ -0,0 +1,7 @@
python-dotenv>=1.0.0
livekit-agents>=0.7.0
openai>=1.3.0
gradio>=4.0.0
# NOTE: asyncio is part of the Python 3 standard library; the PyPI "asyncio"
# package is an old backport, so this pin can likely be dropped.
asyncio>=3.4.3
numpy>=1.24.0
soundfile>=0.12.1
app/templates/report_template.html
ADDED
@@ -0,0 +1,196 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>CASL-2 Assessment Report</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            line-height: 1.6;
            margin: 0;
            padding: 20px;
            color: #333;
        }
        .report-header {
            text-align: center;
            margin-bottom: 30px;
            border-bottom: 2px solid #2c3e50;
            padding-bottom: 20px;
        }
        .report-section {
            margin-bottom: 30px;
            padding: 20px;
            background-color: #f8f9fa;
            border-radius: 5px;
        }
        .metrics-grid {
            display: grid;
            grid-template-columns: 1fr 1fr;
            gap: 15px;
        }
        .metric-card {
            background-color: white;
            padding: 15px;
            border-radius: 5px;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }
        .metric-title {
            font-weight: bold;
            color: #2c3e50;
            margin-bottom: 5px;
        }
        .metric-value {
            font-size: 1.2em;
            color: #3498db;
        }
        .progress-chart {
            width: 100%;
            max-width: 800px;
            margin: 20px auto;
            text-align: center;
        }
        .recommendations {
            background-color: #e8f4fd;
            padding: 20px;
            border-left: 4px solid #3498db;
            margin-top: 20px;
        }
        .recommendation-item {
            margin-bottom: 10px;
        }
        .footer {
            text-align: center;
            margin-top: 40px;
            font-size: 0.9em;
            color: #7f8c8d;
            padding-top: 20px;
            border-top: 1px solid #ecf0f1;
        }
        @media print {
            body {
                padding: 0;
            }
            .report-section {
                break-inside: avoid;
            }
        }
    </style>
</head>
<body>
    <div class="report-header">
        <h1>CASL-2 Assessment Report</h1>
        <h2>Student ID: {{ student_id }}</h2>
        <p>Report Generated: {{ report_date }}</p>
    </div>

    <div class="report-section">
        <h2>Assessment Summary</h2>
        <div class="metrics-grid">
            <div class="metric-card">
                <div class="metric-title">Total Sessions</div>
                <div class="metric-value">{{ session_count }}</div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Latest Session Date</div>
                <div class="metric-value">{{ latest_session.date }}</div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Student Turns</div>
                <div class="metric-value">{{ latest_session.student_turns }}</div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Total Words</div>
                <div class="metric-value">{{ latest_session.total_words }}</div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Words Per Turn</div>
                <div class="metric-value">{{ "%.1f"|format(latest_session.average_words_per_turn) }}</div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Session Duration</div>
                <div class="metric-value">{{ latest_session.duration_minutes }} minutes</div>
            </div>
        </div>
    </div>

    <div class="report-section">
        <h2>CASL-2 Domain Assessment</h2>
        <div class="metrics-grid">
            <div class="metric-card">
                <div class="metric-title">Lexical/Semantic Skills</div>
                <div class="metric-value">{{ latest_session.casl_metrics.lexical_semantic }}</div>
                <div>
                    {% if progress.lexical_semantic > 0 %}
                    <span style="color: green;">↑ {{ "%.1f"|format(progress.lexical_semantic * 100) }}% improvement</span>
                    {% elif progress.lexical_semantic < 0 %}
                    <span style="color: red;">↓ {{ "%.1f"|format(progress.lexical_semantic * -100) }}% decrease</span>
                    {% else %}
                    <span style="color: orange;">→ No change</span>
                    {% endif %}
                </div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Syntactic Skills</div>
                <div class="metric-value">{{ latest_session.casl_metrics.syntactic }}</div>
                <div>
                    {% if progress.syntactic > 0 %}
                    <span style="color: green;">↑ {{ "%.1f"|format(progress.syntactic * 100) }}% improvement</span>
                    {% elif progress.syntactic < 0 %}
                    <span style="color: red;">↓ {{ "%.1f"|format(progress.syntactic * -100) }}% decrease</span>
                    {% else %}
                    <span style="color: orange;">→ No change</span>
                    {% endif %}
                </div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Supralinguistic Skills</div>
                <div class="metric-value">{{ latest_session.casl_metrics.supralinguistic }}</div>
                <div>
                    {% if progress.supralinguistic > 0 %}
                    <span style="color: green;">↑ {{ "%.1f"|format(progress.supralinguistic * 100) }}% improvement</span>
                    {% elif progress.supralinguistic < 0 %}
                    <span style="color: red;">↓ {{ "%.1f"|format(progress.supralinguistic * -100) }}% decrease</span>
                    {% else %}
                    <span style="color: orange;">→ No change</span>
                    {% endif %}
                </div>
            </div>
            <div class="metric-card">
                <div class="metric-title">Pragmatic Skills</div>
                <div class="metric-value">{{ latest_session.casl_metrics.pragmatic }}</div>
                <div>
                    {% if progress.pragmatic > 0 %}
                    <span style="color: green;">↑ {{ "%.1f"|format(progress.pragmatic * 100) }}% improvement</span>
                    {% elif progress.pragmatic < 0 %}
                    <span style="color: red;">↓ {{ "%.1f"|format(progress.pragmatic * -100) }}% decrease</span>
                    {% else %}
                    <span style="color: orange;">→ No change</span>
                    {% endif %}
                </div>
            </div>
        </div>
    </div>

    <div class="report-section">
        <h2>Progress Chart</h2>
        <div class="progress-chart">
            <img src="{{ chart_path }}" alt="Progress Chart" style="max-width: 100%;">
        </div>
    </div>

    <div class="report-section">
        <h2>Recommendations</h2>
        <div class="recommendations">
            {% for recommendation in recommendations %}
            <div class="recommendation-item">• {{ recommendation }}</div>
            {% endfor %}
        </div>
    </div>

    <div class="footer">
        <p>Generated by CASL Voice Bot Speech Therapy Assessment Tool</p>
        <p>© {{ report_date.split('/')[-1] }} Speech Therapy Assessment System</p>
    </div>
</body>
</html>
app_hf.py
ADDED
@@ -0,0 +1,12 @@
#!/usr/bin/env python3

"""
Hugging Face Spaces entry point for CASL Voice Bot
"""

# Import the app from app_main.py
from app_main import app

# This is the entry point that Hugging Face Spaces will use
if __name__ == "__main__":
    app.launch()
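
For local testing, the same app object can be launched with explicit server options; server_name and server_port are standard Gradio launch parameters, and the values here are illustrative. Importing app_main also assumes OPENAI_API_KEY is set in the environment:

# Local smoke-test sketch for the shared Gradio app (values illustrative).
from app_main import app

if __name__ == "__main__":
    # Bind to all interfaces on a fixed port, e.g. for running inside a container.
    app.launch(server_name="0.0.0.0", server_port=7860)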
app_livekit.py
ADDED
@@ -0,0 +1,469 @@
#!/usr/bin/env python3

"""
CASL Voice Bot - Speech Pathology Assistant
Using LiveKit agents with OpenAI's real-time capabilities
"""

import os
import asyncio
import gradio as gr
import logging
import tempfile
import queue
import threading
import time
from dotenv import load_dotenv
from livekit import agents
from openai import AsyncOpenAI

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Speech Pathologist Agent Prompt
SPEECH_PATHOLOGIST_PROMPT = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
You are working with a student with speech impediments, typically associated with ASD.
You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
Each domain from the CASL-2 framework can be analyzed using the sample:
Lexical/Semantic Skills:
This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
Key Subtests:
Antonyms: Identifying words with opposite meanings.
Synonyms: Identifying words with similar meanings.
Idiomatic Language: Understanding and interpreting idioms and figurative language.
Evaluate vocabulary diversity (type-token ratio).
Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
Syntactic Skills:
This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
Key Subtests:
Sentence Expression: Producing grammatically correct sentences based on prompts.
Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
Examine sentence structure for grammatical accuracy.
Identify errors in verb tense, subject-verb agreement, or sentence complexity.
Note the use of clauses, conjunctions, and varied sentence types.
Supralinguistic Skills:
This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
Key Subtests:
Inferences: Understanding information that is not explicitly stated.
Meaning from Context: Deriving meaning from surrounding text or dialogue.
Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
Look for use or understanding of figurative language, idioms, or humor.
Assess ability to handle ambiguous or implied meanings in context.
Identify advanced language use for abstract or hypothetical ideas.
Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
Key Subtests:
Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.

Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
"""


class GradioInputDevice(agents.InputDevice):
    """Custom input device that works with Gradio"""

    def __init__(self):
        super().__init__()
        self.audio_queue = asyncio.Queue()
        self.is_active = True

    async def receive(self) -> agents.AudioChunk:
        """Receive audio data from the queue"""
        try:
            audio_data = await asyncio.wait_for(self.audio_queue.get(), timeout=0.1)
            return audio_data
        except asyncio.TimeoutError:
            return None

    async def add_audio(self, audio_data):
        """Add audio data to the queue"""
        if audio_data is None:
            return

        # Convert gradio audio format to AudioChunk
        sample_rate, audio_array = audio_data
        audio_chunk = agents.AudioChunk(
            samples=audio_array,
            sample_rate=sample_rate,
            is_last=False
        )
        await self.audio_queue.put(audio_chunk)

    def stop(self):
        """Stop the input device"""
        self.is_active = False


class GradioOutputDevice(agents.OutputDevice):
    """Custom output device that works with Gradio"""

    def __init__(self):
        super().__init__()
        self.output_queue = asyncio.Queue()

    async def transmit(self, audio_chunk: agents.AudioChunk) -> None:
        """Transmit audio chunk to the queue"""
        if audio_chunk is not None:
            await self.output_queue.put((audio_chunk.samples, audio_chunk.sample_rate))

    async def get_latest_audio(self):
        """Get the latest audio from the queue"""
        try:
            return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            return None


class SpeechPathologistAssistant:
    """Speech pathologist assistant using LiveKit agents"""

    def __init__(self):
        self.input_device = GradioInputDevice()
        self.output_device = GradioOutputDevice()
        self.assistant = None
        self.assistant_task = None
        self.transcript = []
        self.is_running = False
        self.notes = []
        self.current_assessment = {
            "lexical_semantic": 0,
            "syntactic": 0,
            "supralinguistic": 0,
            "pragmatic": 0
        }

    async def initialize_assistant(self, voice="shimmer"):
        """Initialize the speech assistant"""
        self.assistant = agents.VoiceAssistant(
            openai_client=openai_client,
            model="gpt-4o",
            voice=voice,
            input_device=self.input_device,
            output_device=self.output_device,
            initial_message=SPEECH_PATHOLOGIST_PROMPT,
            real_time=True,  # Enable real-time processing
        )

        # Add transcript and response callbacks
        self.assistant.on_transcript = self.on_transcript
        self.assistant.on_response = self.on_response

    def on_transcript(self, transcript):
        """Handle transcript from user"""
        self.transcript.append(f"Student: {transcript.text}")

        # Basic analysis of speech for CASL-2 categories
        self.analyze_speech(transcript.text)

        return True

    def on_response(self, response):
        """Handle response from assistant"""
        self.transcript.append(f"Speech Pathologist: {response.text}")
        return True

    def analyze_speech(self, text):
        """Analyze speech for CASL-2 categories"""
        # Simple heuristic analysis - in a real app, this would use more sophisticated NLP

        # Lexical/Semantic: check vocabulary diversity
        words = text.lower().split()
        unique_words = set(words)
        if len(unique_words) / max(1, len(words)) > 0.7:
            self.current_assessment["lexical_semantic"] += 1
            self.notes.append("Lexical/Semantic: Good vocabulary diversity")

        # Syntactic: check for sentence complexity
        sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
        if avg_words > 7:
            self.current_assessment["syntactic"] += 1
            self.notes.append("Syntactic: Complex sentence structures used")

        # Supralinguistic: check for figurative language (very basic check)
        figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
        if any(marker in text.lower() for marker in figurative_markers):
            self.current_assessment["supralinguistic"] += 1
            self.notes.append("Supralinguistic: Potential figurative language detected")

        # Pragmatic: basic check for conversational elements
        pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
        if any(marker in text.lower() for marker in pragmatic_markers):
            self.current_assessment["pragmatic"] += 1
            self.notes.append("Pragmatic: Appropriate social language detected")

    async def start_assistant(self, voice_model, student_id):
        """Start the assistant in a background task"""
        await self.initialize_assistant(voice_model)

        self.is_running = True

        # Add student info to transcript
        student_info = f" for {student_id}" if student_id else ""
        self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")

        # Run the assistant in a background task
        self.assistant_task = asyncio.create_task(self.assistant.run())

        return "Session active. The AI will introduce itself."

    def stop_assistant(self):
        """Stop the assistant"""
        if self.assistant_task and not self.assistant_task.done():
            self.assistant_task.cancel()

        self.input_device.stop()
        self.is_running = False

        # Add ending to transcript
        self.transcript.append("Session ended.")
        return "Session stopped."

    async def process_audio(self, audio):
        """Process audio from Gradio interface"""
        if not self.is_running or audio is None:
            return None, self.get_transcript(), self.get_assessment_html()

        # Add audio to input device
        await self.input_device.add_audio(audio)

        # Check for assistant output
        output_audio = await self.output_device.get_latest_audio()

        return output_audio, self.get_transcript(), self.get_assessment_html()

    def get_transcript(self):
        """Get the current transcript"""
        return "\n".join(self.transcript)

    def get_assessment_html(self):
        """Get HTML representation of the current assessment"""
        html = """
        <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
            <h3>CASL-2 Assessment Progress</h3>
            <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
        """

        for category, value in self.current_assessment.items():
            category_name = category.replace('_', ' ').title()
            progress_width = min(100, value * 10)
            html += f"""
            <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
                <div><strong>{category_name}</strong></div>
                <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
                    <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
                </div>
                <div style="margin-top: 5px;">{value} points</div>
            </div>
            """

        html += """
            </div>
        """

        # Add recent notes
        if self.notes:
            html += """
            <div style="margin-top: 15px;">
                <h4>Recent Observations</h4>
                <ul style="margin-top: 5px;">
            """

            for note in self.notes[-5:]:
                html += f"<li>{note}</li>"

            html += """
                </ul>
            </div>
            """

        html += "</div>"
        return html

    def add_note(self, note):
        """Add a custom note"""
        if note.strip():
            self.notes.append(note)
            return f"Note added: {note}"
        return "Note was empty, not added."

    def save_session(self, student_id):
        """Save session to file"""
        if not student_id:
            student_id = "anonymous"

        # Create session data directory if it doesn't exist
        os.makedirs("session_data", exist_ok=True)

        # Save transcript
        timestamp = time.strftime("%Y%m%d-%H%M%S")
        filename = f"session_data/{student_id}_{timestamp}.txt"

        with open(filename, "w") as f:
            f.write("\n".join(self.transcript))
            f.write("\n\n--- ASSESSMENT NOTES ---\n")
            for note in self.notes:
                f.write(f"- {note}\n")
            f.write("\n--- CASL-2 SCORES ---\n")
            for category, score in self.current_assessment.items():
                f.write(f"{category.replace('_', ' ').title()}: {score}\n")

        return f"Session saved to {filename}"


# Create the speech pathology assistant
speech_assistant = SpeechPathologistAssistant()


async def start_session(voice_model, student_id):
    """Start the speech pathology session"""
    status = await speech_assistant.start_assistant(voice_model, student_id)
    return status, None, speech_assistant.get_transcript(), speech_assistant.get_assessment_html()


def stop_session():
    """Stop the speech pathology session"""
    return speech_assistant.stop_assistant(), None, speech_assistant.get_transcript(), speech_assistant.get_assessment_html()


async def process_mic_input(audio, progress=gr.Progress()):
    """Process microphone input"""
    progress(0, desc="Processing speech...")
    audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
    progress(1, desc="Done")
    return audio_output, transcript, assessment


def add_note(note):
    """Add a note to the session"""
    result = speech_assistant.add_note(note)
    return "", result, speech_assistant.get_assessment_html()


def save_session(student_id):
    """Save the current session"""
    return speech_assistant.save_session(student_id)


# Create Gradio Interface
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
    gr.Markdown("# CASL-2 Speech Pathology Assistant")
    gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")

    with gr.Row():
        with gr.Column(scale=1):
            student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
            voice_select = gr.Dropdown(
                ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
                value="shimmer",
                label="Assistant Voice"
            )
            start_button = gr.Button("Start Session", variant="primary")
            stop_button = gr.Button("Stop Session", variant="stop")
            status = gr.Textbox(label="Status", value="Ready to start")

            with gr.Accordion("SLP Tools", open=True):
                note_input = gr.Textbox(
                    label="Add Assessment Note",
                    placeholder="Enter observation or assessment note here..."
                )
                note_button = gr.Button("Add Note")
                note_status = gr.Textbox(label="Note Status")
                save_button = gr.Button("Save Session")
                save_status = gr.Textbox(label="Save Status")

        with gr.Column(scale=2):
            audio_output = gr.Audio(label="AI Speech", autoplay=True)
            audio_input = gr.Audio(
                label="Speak to the AI",
                type="numpy",  # the handlers unpack (sample_rate, array); "microphone" is not a valid type value
                source="microphone",
                streaming=True
            )

    with gr.Row():
        with gr.Column(scale=1):
            assessment_html = gr.HTML(label="Assessment Progress")
        with gr.Column(scale=1):
            transcript = gr.Textbox(label="Transcript", lines=10)

    with gr.Accordion("About This Application", open=False):
        gr.Markdown("""
        ### About CASL-2 Speech Pathology Assistant

        This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:

        - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
        - **Syntactic Skills**: Grammar and sentence structure
        - **Supralinguistic Skills**: Higher-level language beyond literal meanings
        - **Pragmatic Skills**: Social use of language (less emphasis for younger students)

        The AI will provide structured assessments and exercises to help evaluate speech patterns.

        ### How to Use

        1. Optionally enter a Student ID to track sessions
        2. Select the AI voice you prefer
        3. Click "Start Session" to begin
        4. The AI will introduce itself and begin the assessment
        5. Speak into your microphone when it's your turn
        6. View the transcript to track the conversation
        7. SLPs can add assessment notes as needed
        8. Save the session when finished
        9. Click "Stop Session" when done

        ### For Speech-Language Pathologists

        This tool is designed to supplement, not replace, professional SLP services. SLPs can:

        - Add custom notes during the session
        - Save session data for later reference
        - Track progress across multiple sessions
        - Use the AI as a consistent assessment tool
        """)

    # Setup event handlers
    start_button.click(
        fn=lambda voice, student: asyncio.run(start_session(voice, student)),
        inputs=[voice_select, student_id],
        outputs=[status, audio_output, transcript, assessment_html]
    )
    stop_button.click(
        fn=stop_session,
        outputs=[status, audio_output, transcript, assessment_html]
    )
    note_button.click(
        fn=add_note,
        inputs=note_input,
        outputs=[note_input, note_status, assessment_html]
    )
    save_button.click(
        fn=save_session,
        inputs=student_id,
        outputs=save_status
    )

    # Setup audio processing
    audio_input.stream(
        fn=lambda audio: asyncio.run(process_mic_input(audio)),
        inputs=audio_input,
        outputs=[audio_output, transcript, assessment_html]
    )


def main(share=True):
    """Main function to launch the app"""
    app.launch(share=share)


# Entry point for the application
if __name__ == "__main__":
    main()
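
For intuition, a short worked example of the thresholds in analyze_speech; the sentence is made up, and the comments trace the arithmetic:

# Worked example of the analyze_speech heuristics (illustrative input).
text = "Yesterday I imagined the clouds looked like giant ships sailing over the school."

words = text.lower().split()                 # 13 tokens ("the" appears twice)
ttr = len(set(words)) / max(1, len(words))   # 12 unique / 13 total ≈ 0.92 > 0.7 -> lexical/semantic point

sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
# One sentence of 13 words -> 13 > 7 -> syntactic point

figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
has_figurative = any(m in text.lower() for m in figurative_markers)
# "like" and "imagine(d)" both match -> supralinguistic point
# (these are substring checks, so false positives are possible)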
app_main.py
ADDED
@@ -0,0 +1,469 @@
#!/usr/bin/env python3

"""
CASL Voice Bot - Speech Pathology Assistant
Main application file that can be used for both local deployment and Hugging Face Spaces
"""

import os
import asyncio
import gradio as gr
import logging
import tempfile
import queue
import threading
import time
from dotenv import load_dotenv
from openai import AsyncOpenAI

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client with API key from environment
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Speech Pathologist Agent Prompt
SPEECH_PATHOLOGIST_PROMPT = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
You are working with a student with speech impediments, typically associated with ASD.
You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
Each domain from the CASL-2 framework can be analyzed using the sample:
Lexical/Semantic Skills:
This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
Key Subtests:
Antonyms: Identifying words with opposite meanings.
Synonyms: Identifying words with similar meanings.
Idiomatic Language: Understanding and interpreting idioms and figurative language.
Evaluate vocabulary diversity (type-token ratio).
Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
Syntactic Skills:
This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
Key Subtests:
Sentence Expression: Producing grammatically correct sentences based on prompts.
Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
Examine sentence structure for grammatical accuracy.
Identify errors in verb tense, subject-verb agreement, or sentence complexity.
Note the use of clauses, conjunctions, and varied sentence types.
Supralinguistic Skills:
This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
Key Subtests:
Inferences: Understanding information that is not explicitly stated.
Meaning from Context: Deriving meaning from surrounding text or dialogue.
Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
Look for use or understanding of figurative language, idioms, or humor.
Assess ability to handle ambiguous or implied meanings in context.
Identify advanced language use for abstract or hypothetical ideas.
Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
Key Subtests:
Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.

Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
"""

# Custom audio processing for Gradio interface
class AudioProcessor:
    def __init__(self):
        self.transcript = []
        self.is_active = False
        self.voice_model = "shimmer"  # Default voice
        self.notes = []
        self.current_assessment = {
            "lexical_semantic": 0,
            "syntactic": 0,
            "supralinguistic": 0,
            "pragmatic": 0
        }

    async def process_speech(self, audio_data, openai_client):
        """Process speech using OpenAI's API"""
        if not self.is_active or audio_data is None:
            return None, "\n".join(self.transcript), self.get_assessment_html()

        # Prepare audio file for OpenAI
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
        temp_file.close()

        try:
            # Save audio data to temporary file
            sample_rate, audio_array = audio_data
            import scipy.io.wavfile
            scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)

            # Transcribe audio using OpenAI
            with open(temp_file.name, "rb") as audio_file:
                transcript_response = await openai_client.audio.transcriptions.create(
                    file=audio_file,
                    model="whisper-1"
                )

            user_text = transcript_response.text
            if user_text.strip():
                self.transcript.append(f"Student: {user_text}")

                # Analyze speech for CASL-2 categories
                self.analyze_speech(user_text)

                # Generate assistant response
                chat_response = await openai_client.chat.completions.create(
                    model="gpt-4o",
                    messages=[
                        {"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
                        {"role": "user", "content": user_text}
                    ]
                )

                assistant_text = chat_response.choices[0].message.content
                self.transcript.append(f"Speech Pathologist: {assistant_text}")

                # Generate speech from text
                speech_response = await openai_client.audio.speech.create(
                    model="tts-1",
                    voice=self.voice_model,
                    input=assistant_text
                )

                # Save speech to temporary file
                response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
                response_temp_file.close()

                speech_response.stream_to_file(response_temp_file.name)

                # Load audio data for Gradio
                import soundfile as sf
                audio_data, sample_rate = sf.read(response_temp_file.name)

                # Clean up
                os.unlink(response_temp_file.name)

                return (sample_rate, audio_data), "\n".join(self.transcript), self.get_assessment_html()

        except Exception as e:
            logger.error(f"Error processing speech: {e}")
            self.transcript.append(f"Error: {str(e)}")
        finally:
            # Clean up temp file
            os.unlink(temp_file.name)

        return None, "\n".join(self.transcript), self.get_assessment_html()

    def analyze_speech(self, text):
        """Analyze speech for CASL-2 categories"""
        # Simple heuristic analysis - in a real app, this would use more sophisticated NLP

        # Lexical/Semantic: check vocabulary diversity
        words = text.lower().split()
        unique_words = set(words)
        if len(unique_words) / max(1, len(words)) > 0.7:
            self.current_assessment["lexical_semantic"] += 1
            self.notes.append("Lexical/Semantic: Good vocabulary diversity")

        # Syntactic: check for sentence complexity
        sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
        if avg_words > 7:
            self.current_assessment["syntactic"] += 1
            self.notes.append("Syntactic: Complex sentence structures used")

        # Supralinguistic: check for figurative language (very basic check)
        figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
        if any(marker in text.lower() for marker in figurative_markers):
            self.current_assessment["supralinguistic"] += 1
            self.notes.append("Supralinguistic: Potential figurative language detected")

        # Pragmatic: basic check for conversational elements
        pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
        if any(marker in text.lower() for marker in pragmatic_markers):
            self.current_assessment["pragmatic"] += 1
            self.notes.append("Pragmatic: Appropriate social language detected")

    def get_assessment_html(self):
        """Get HTML representation of the current assessment"""
        html = """
        <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
            <h3>CASL-2 Assessment Progress</h3>
            <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
        """

        for category, value in self.current_assessment.items():
            category_name = category.replace('_', ' ').title()
            progress_width = min(100, value * 10)
            html += f"""
            <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
                <div><strong>{category_name}</strong></div>
                <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
                    <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
                </div>
                <div style="margin-top: 5px;">{value} points</div>
            </div>
            """

        html += """
            </div>
        """

        # Add recent notes
        if self.notes:
            html += """
            <div style="margin-top: 15px;">
                <h4>Recent Observations</h4>
                <ul style="margin-top: 5px;">
            """

            for note in self.notes[-5:]:
                html += f"<li>{note}</li>"

            html += """
                </ul>
            </div>
            """

        html += "</div>"
        return html

    def start_session(self, voice_model, student_id):
        """Start a new session"""
        self.is_active = True
        self.voice_model = voice_model if voice_model else "shimmer"
        self.transcript = []
        self.notes = []
        self.current_assessment = {
            "lexical_semantic": 0,
            "syntactic": 0,
            "supralinguistic": 0,
            "pragmatic": 0
        }

        student_info = f" for {student_id}" if student_id else ""
        self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")
        return "Session active. Please wait for the AI to introduce itself."

    def stop_session(self):
        """Stop the current session"""
| 248 |
+
self.is_active = False
|
| 249 |
+
self.transcript.append("Session ended.")
|
| 250 |
+
return "Session stopped."
|
| 251 |
+
|
| 252 |
+
def add_note(self, note):
|
| 253 |
+
"""Add a custom note"""
|
| 254 |
+
if note.strip():
|
| 255 |
+
self.notes.append(note)
|
| 256 |
+
return f"Note added: {note}"
|
| 257 |
+
return "Note was empty, not added."
|
| 258 |
+
|
| 259 |
+
|
| 260 |
+
# Create the audio processor instance
|
| 261 |
+
audio_processor = AudioProcessor()
|
| 262 |
+
|
| 263 |
+
|
| 264 |
+
async def start_session(voice_model, student_id):
|
| 265 |
+
"""Start the speech pathology session"""
|
| 266 |
+
status = audio_processor.start_session(voice_model, student_id)
|
| 267 |
+
|
| 268 |
+
# Generate initial AI introduction
|
| 269 |
+
try:
|
| 270 |
+
# Generate assistant response
|
| 271 |
+
chat_response = await openai_client.chat.completions.create(
|
| 272 |
+
model="gpt-4o",
|
| 273 |
+
messages=[
|
| 274 |
+
{"role": "system", "content": SPEECH_PATHOLOGIST_PROMPT},
|
| 275 |
+
{"role": "user", "content": "Hello"} # Initial trigger
|
| 276 |
+
]
|
| 277 |
+
)
|
| 278 |
+
|
| 279 |
+
assistant_text = chat_response.choices[0].message.content
|
| 280 |
+
audio_processor.transcript.append(f"Speech Pathologist: {assistant_text}")
|
| 281 |
+
|
| 282 |
+
# Generate speech from text
|
| 283 |
+
speech_response = await openai_client.audio.speech.create(
|
| 284 |
+
model="tts-1",
|
| 285 |
+
voice=audio_processor.voice_model,
|
| 286 |
+
input=assistant_text
|
| 287 |
+
)
|
| 288 |
+
|
| 289 |
+
# Save speech to temporary file
|
| 290 |
+
response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
|
| 291 |
+
response_temp_file.close()
|
| 292 |
+
|
| 293 |
+
speech_response.stream_to_file(response_temp_file.name)
|
| 294 |
+
|
| 295 |
+
# Load audio data for Gradio
|
| 296 |
+
import soundfile as sf
|
| 297 |
+
audio_data, sample_rate = sf.read(response_temp_file.name)
|
| 298 |
+
|
| 299 |
+
# Clean up
|
| 300 |
+
os.unlink(response_temp_file.name)
|
| 301 |
+
|
| 302 |
+
return status, (sample_rate, audio_data), "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
|
| 303 |
+
|
| 304 |
+
except Exception as e:
|
| 305 |
+
logger.error(f"Error starting session: {e}")
|
| 306 |
+
audio_processor.transcript.append(f"Error: {str(e)}")
|
| 307 |
+
return status, None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
|
| 308 |
+
|
| 309 |
+
|
| 310 |
+
def stop_session():
|
| 311 |
+
"""Stop the speech pathology session"""
|
| 312 |
+
return audio_processor.stop_session(), None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
|
| 313 |
+
|
| 314 |
+
|
| 315 |
+
async def process_mic_input(audio, progress=gr.Progress()):
|
| 316 |
+
"""Process microphone input"""
|
| 317 |
+
if audio is None or not audio_processor.is_active:
|
| 318 |
+
return None, "\n".join(audio_processor.transcript), audio_processor.get_assessment_html()
|
| 319 |
+
|
| 320 |
+
progress(0, desc="Processing speech...")
|
| 321 |
+
audio_output, transcript, assessment = await audio_processor.process_speech(audio, openai_client)
|
| 322 |
+
progress(1, desc="Done")
|
| 323 |
+
return audio_output, transcript, assessment
|
| 324 |
+
|
| 325 |
+
|
| 326 |
+
def add_note(note):
|
| 327 |
+
"""Add a note to the session"""
|
| 328 |
+
result = audio_processor.add_note(note)
|
| 329 |
+
return "", result, audio_processor.get_assessment_html()
|
| 330 |
+
|
| 331 |
+
|
| 332 |
+
def save_session(student_id):
|
| 333 |
+
"""Save session to file"""
|
| 334 |
+
if not student_id:
|
| 335 |
+
student_id = "anonymous"
|
| 336 |
+
|
| 337 |
+
# Create session data directory if it doesn't exist
|
| 338 |
+
os.makedirs("session_data", exist_ok=True)
|
| 339 |
+
|
| 340 |
+
# Save transcript
|
| 341 |
+
timestamp = time.strftime("%Y%m%d-%H%M%S")
|
| 342 |
+
filename = f"session_data/{student_id}_{timestamp}.txt"
|
| 343 |
+
|
| 344 |
+
with open(filename, "w") as f:
|
| 345 |
+
f.write("\n".join(audio_processor.transcript))
|
| 346 |
+
f.write("\n\n--- ASSESSMENT NOTES ---\n")
|
| 347 |
+
for note in audio_processor.notes:
|
| 348 |
+
f.write(f"- {note}\n")
|
| 349 |
+
f.write("\n--- CASL-2 SCORES ---\n")
|
| 350 |
+
for category, score in audio_processor.current_assessment.items():
|
| 351 |
+
f.write(f"{category.replace('_', ' ').title()}: {score}\n")
|
| 352 |
+
|
| 353 |
+
return f"Session saved to {filename}"
|
| 354 |
+
|
| 355 |
+
|
| 356 |
+
# Create Gradio Interface
|
| 357 |
+
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
|
| 358 |
+
gr.Markdown("# CASL-2 Speech Pathology Assistant")
|
| 359 |
+
gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")
|
| 360 |
+
|
| 361 |
+
with gr.Row():
|
| 362 |
+
with gr.Column(scale=1):
|
| 363 |
+
student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
|
| 364 |
+
voice_select = gr.Dropdown(
|
| 365 |
+
["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
|
| 366 |
+
value="shimmer",
|
| 367 |
+
label="Assistant Voice"
|
| 368 |
+
)
|
| 369 |
+
start_button = gr.Button("Start Session", variant="primary")
|
| 370 |
+
stop_button = gr.Button("Stop Session", variant="stop")
|
| 371 |
+
status = gr.Textbox(label="Status", value="Ready to start")
|
| 372 |
+
|
| 373 |
+
with gr.Accordion("SLP Tools", open=True):
|
| 374 |
+
note_input = gr.Textbox(
|
| 375 |
+
label="Add Assessment Note",
|
| 376 |
+
placeholder="Enter observation or assessment note here..."
|
| 377 |
+
)
|
| 378 |
+
note_button = gr.Button("Add Note")
|
| 379 |
+
note_status = gr.Textbox(label="Note Status")
|
| 380 |
+
save_button = gr.Button("Save Session")
|
| 381 |
+
save_status = gr.Textbox(label="Save Status")
|
| 382 |
+
|
| 383 |
+
with gr.Column(scale=2):
|
| 384 |
+
audio_output = gr.Audio(label="AI Speech", autoplay=True)
|
| 385 |
+
audio_input = gr.Audio(
|
| 386 |
+
label="Speak to the AI",
|
| 387 |
+
type="microphone",
|
| 388 |
+
source="microphone",
|
| 389 |
+
streaming=True
|
| 390 |
+
)
|
| 391 |
+
|
| 392 |
+
with gr.Row():
|
| 393 |
+
with gr.Column(scale=1):
|
| 394 |
+
assessment_html = gr.HTML(label="Assessment Progress")
|
| 395 |
+
with gr.Column(scale=1):
|
| 396 |
+
transcript = gr.Textbox(label="Transcript", lines=10)
|
| 397 |
+
|
| 398 |
+
with gr.Accordion("About This Application", open=False):
|
| 399 |
+
gr.Markdown("""
|
| 400 |
+
### About CASL-2 Speech Pathology Assistant
|
| 401 |
+
|
| 402 |
+
This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
|
| 403 |
+
|
| 404 |
+
- **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
|
| 405 |
+
- **Syntactic Skills**: Grammar and sentence structure
|
| 406 |
+
- **Supralinguistic Skills**: Higher-level language beyond literal meanings
|
| 407 |
+
- **Pragmatic Skills**: Social use of language (less emphasis for younger students)
|
| 408 |
+
|
| 409 |
+
The AI will provide structured assessments and exercises to help evaluate speech patterns.
|
| 410 |
+
|
| 411 |
+
### How to Use
|
| 412 |
+
|
| 413 |
+
1. Optionally enter a Student ID to track sessions
|
| 414 |
+
2. Select the AI voice you prefer
|
| 415 |
+
3. Click "Start Session" to begin
|
| 416 |
+
4. The AI will introduce itself and begin the assessment
|
| 417 |
+
5. Speak into your microphone when it's your turn
|
| 418 |
+
6. View the transcript to track the conversation
|
| 419 |
+
7. SLPs can add assessment notes as needed
|
| 420 |
+
8. Save the session when finished
|
| 421 |
+
9. Click "Stop Session" when done
|
| 422 |
+
|
| 423 |
+
### For Speech-Language Pathologists
|
| 424 |
+
|
| 425 |
+
This tool is designed to supplement, not replace, professional SLP services. SLPs can:
|
| 426 |
+
|
| 427 |
+
- Add custom notes during the session
|
| 428 |
+
- Save session data for later reference
|
| 429 |
+
- Track progress across multiple sessions
|
| 430 |
+
- Use the AI as a consistent assessment tool
|
| 431 |
+
""")
|
| 432 |
+
|
| 433 |
+
# Setup event handlers
|
| 434 |
+
start_button.click(
|
| 435 |
+
fn=lambda voice, student: asyncio.run(start_session(voice, student)),
|
| 436 |
+
inputs=[voice_select, student_id],
|
| 437 |
+
outputs=[status, audio_output, transcript, assessment_html]
|
| 438 |
+
)
|
| 439 |
+
stop_button.click(
|
| 440 |
+
fn=stop_session,
|
| 441 |
+
outputs=[status, audio_output, transcript, assessment_html]
|
| 442 |
+
)
|
| 443 |
+
note_button.click(
|
| 444 |
+
fn=add_note,
|
| 445 |
+
inputs=note_input,
|
| 446 |
+
outputs=[note_input, note_status, assessment_html]
|
| 447 |
+
)
|
| 448 |
+
save_button.click(
|
| 449 |
+
fn=save_session,
|
| 450 |
+
inputs=student_id,
|
| 451 |
+
outputs=save_status
|
| 452 |
+
)
|
| 453 |
+
|
| 454 |
+
# Setup audio processing
|
| 455 |
+
audio_input.stream(
|
| 456 |
+
fn=lambda audio: asyncio.run(process_mic_input(audio)),
|
| 457 |
+
inputs=audio_input,
|
| 458 |
+
outputs=[audio_output, transcript, assessment_html]
|
| 459 |
+
)
|
| 460 |
+
|
| 461 |
+
|
| 462 |
+
def main(share=True):
|
| 463 |
+
"""Main function to launch the app"""
|
| 464 |
+
app.launch(share=share)
|
| 465 |
+
|
| 466 |
+
|
| 467 |
+
# Entry point for the application
|
| 468 |
+
if __name__ == "__main__":
|
| 469 |
+
main()
|
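For reference, `process_speech` and `process_mic_input` assume the `(sample_rate, samples)` numpy tuple that Gradio's `Audio` component emits in numpy mode, and return the same shape for playback. A minimal sketch of that contract, with a synthesized tone standing in for microphone input (the 16 kHz rate and the sine wave are illustrative choices, not values taken from the app):

```python
import tempfile
import numpy as np
import scipy.io.wavfile

# Gradio's numpy-mode Audio component yields (sample_rate, samples); one
# second of a 440 Hz sine wave stands in for a microphone capture here.
sample_rate = 16000
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
samples = (np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
audio_data = (sample_rate, samples)

# The same round-trip process_speech performs before calling Whisper:
# unpack the tuple and serialize it as a temporary WAV file.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
tmp.close()
scipy.io.wavfile.write(tmp.name, *audio_data)
print(f"Wrote {len(samples) / sample_rate:.1f}s of audio to {tmp.name}")
```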
app_ui.py
ADDED
@@ -0,0 +1,469 @@
#!/usr/bin/env python3

"""
CASL Voice Bot - Speech Pathology Assistant
Unified UI application for both local deployment and Hugging Face Spaces
"""
huggingface_requirements.txt
ADDED
@@ -0,0 +1,7 @@
python-dotenv>=1.0.0
openai>=1.3.0
gradio>=4.0.0
soundfile>=0.12.1
scipy>=1.10.0
asyncio>=3.4.3
numpy>=1.24.0
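Since every implementation reads `OPENAI_API_KEY` at import time, a preflight check can catch missing dependencies or credentials before the Space launches. A hypothetical sketch of that check follows; the import names are assumptions mapped from the pip package names (only `python-dotenv` differs, importing as `dotenv`). Note that `asyncio` has shipped with the standard library since Python 3.4, so that pin pulls in a redundant PyPI backport and is omitted below.

```python
import importlib
import os

# Hypothetical preflight for the pins above; the import name differs from
# the pip package name only for python-dotenv ("dotenv").
for module in ("dotenv", "openai", "gradio", "soundfile", "scipy", "numpy"):
    try:
        importlib.import_module(module)
        print(f"ok: {module}")
    except ImportError as exc:
        print(f"missing: {module} ({exc})")

# Every implementation reads the key with os.getenv at import time,
# so fail fast here if it is unset.
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY before launching"
```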
implementations/common/__init__.py
ADDED
@@ -0,0 +1,3 @@
"""
Common utilities for CASL Voice Bot
"""
implementations/common/casl_utils.py
ADDED
@@ -0,0 +1,167 @@
#!/usr/bin/env python3

"""
Common utilities for CASL Voice Bot implementations
"""

import os
import time


class CASLAssessment:
    """CASL-2 assessment tracker"""

    def __init__(self):
        """Initialize the assessment tracker"""
        self.notes = []
        self.current_assessment = {
            "lexical_semantic": 0,
            "syntactic": 0,
            "supralinguistic": 0,
            "pragmatic": 0
        }

    def analyze_speech(self, text):
        """Analyze speech for CASL-2 categories"""
        # Simple heuristic analysis - in a real app, this would use more sophisticated NLP

        # Lexical/Semantic: check vocabulary diversity (type-token ratio)
        words = text.lower().split()
        unique_words = set(words)
        if len(unique_words) / max(1, len(words)) > 0.7:
            self.current_assessment["lexical_semantic"] += 1
            self.notes.append("Lexical/Semantic: Good vocabulary diversity")

        # Syntactic: check for sentence complexity
        sentences = [s.strip() for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        avg_words = sum(len(s.split()) for s in sentences) / max(1, len(sentences))
        if avg_words > 7:
            self.current_assessment["syntactic"] += 1
            self.notes.append("Syntactic: Complex sentence structures used")

        # Supralinguistic: check for figurative language (very basic check)
        figurative_markers = ["like", "as", "than", "seems", "appears", "metaphor", "imagine"]
        if any(marker in text.lower() for marker in figurative_markers):
            self.current_assessment["supralinguistic"] += 1
            self.notes.append("Supralinguistic: Potential figurative language detected")

        # Pragmatic: basic check for conversational elements
        pragmatic_markers = ["hello", "hi", "thanks", "thank you", "please", "excuse me", "sorry"]
        if any(marker in text.lower() for marker in pragmatic_markers):
            self.current_assessment["pragmatic"] += 1
            self.notes.append("Pragmatic: Appropriate social language detected")

    def get_assessment_html(self):
        """Get HTML representation of the current assessment"""
        html = """
        <div style="padding: 15px; background-color: #f8f9fa; border-radius: 5px;">
            <h3>CASL-2 Assessment Progress</h3>
            <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px;">
        """

        for category, value in self.current_assessment.items():
            category_name = category.replace('_', ' ').title()
            progress_width = min(100, value * 10)
            html += f"""
            <div style="padding: 10px; background-color: white; border-radius: 5px; box-shadow: 0 1px 3px rgba(0,0,0,0.1);">
                <div><strong>{category_name}</strong></div>
                <div style="margin-top: 5px; height: 15px; background-color: #eee; border-radius: 7px; overflow: hidden;">
                    <div style="width: {progress_width}%; height: 100%; background-color: #4CAF50;"></div>
                </div>
                <div style="margin-top: 5px;">{value} points</div>
            </div>
            """

        html += """
            </div>
        """

        # Add recent notes
        if self.notes:
            html += """
            <div style="margin-top: 15px;">
                <h4>Recent Observations</h4>
                <ul style="margin-top: 5px;">
            """

            for note in self.notes[-5:]:
                html += f"<li>{note}</li>"

            html += """
                </ul>
            </div>
            """

        html += "</div>"
        return html

    def add_note(self, note):
        """Add a custom note"""
        if note.strip():
            self.notes.append(note)
            return f"Note added: {note}"
        return "Note was empty, not added."


def save_session_data(transcript, assessment, student_id=None):
    """Save session data to a file"""
    if not student_id:
        student_id = "anonymous"

    # Create session data directory if it doesn't exist
    os.makedirs("session_data", exist_ok=True)

    # Save transcript, notes, and scores under a timestamped filename
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    filename = f"session_data/{student_id}_{timestamp}.txt"

    with open(filename, "w") as f:
        f.write("\n".join(transcript))
        f.write("\n\n--- ASSESSMENT NOTES ---\n")
        for note in assessment.notes:
            f.write(f"- {note}\n")
        f.write("\n--- CASL-2 SCORES ---\n")
        for category, score in assessment.current_assessment.items():
            f.write(f"{category.replace('_', ' ').title()}: {score}\n")

    return f"Session saved to {filename}"


# Common prompt used by all implementations
CASL_PROMPT = """
You are a speech pathologist, a healthcare professional who specializes in evaluating, diagnosing, and treating communication disorders, including speech, language, cognitive-communication, voice, and fluency disorders. Your role is to help patients improve their speech and communication skills through various therapeutic techniques and exercises.
You are working with a student with speech impediments, typically with ASD.
You have to be rigid to help them stay on the right track. You have to start with some sort of intro activity and cannot rely on the student at all to complete your thoughts. You pick a place to start and assess the speech from there.
Each domain from the CASL-2 framework can be analyzed using the sample:
Lexical/Semantic Skills:
This category focuses on vocabulary knowledge, word meanings, and the ability to use words contextually. It measures both receptive and expressive language abilities related to word use.
Key Subtests:
Antonyms: Identifying words with opposite meanings.
Synonyms: Identifying words with similar meanings.
Idiomatic Language: Understanding and interpreting idioms and figurative language.
Evaluate vocabulary diversity (type-token ratio).
Note word-finding difficulties, incorrect word choices, or over-reliance on fillers (e.g., "like," "stuff").
Assess use of specific vs. vague language (e.g., "car" vs. "sedan").
Syntactic Skills:
This category evaluates understanding and use of grammar and sentence structure. It focuses on the ability to comprehend and produce grammatically correct sentences.
Key Subtests:
Sentence Expression: Producing grammatically correct sentences based on prompts.
Grammaticality Judgment: Identifying whether a sentence is grammatically correct.
Examine sentence structure for grammatical accuracy.
Identify errors in verb tense, subject-verb agreement, or sentence complexity.
Note the use of clauses, conjunctions, and varied sentence types.
Supralinguistic Skills:
This subcategory assesses higher-level language skills that go beyond literal meanings, such as understanding implied meanings, sarcasm, and complex verbal reasoning.
Key Subtests:
Inferences: Understanding information that is not explicitly stated.
Meaning from Context: Deriving meaning from surrounding text or dialogue.
Nonliteral Language: Interpreting figurative language, such as metaphors or irony.
Look for use or understanding of figurative language, idioms, or humor.
Assess ability to handle ambiguous or implied meanings in context.
Identify advanced language use for abstract or hypothetical ideas.
Pragmatic Skills (focus less on this, as it is not typically necessary for the age range you will be dealing with):
This category measures the ability to use language effectively in social contexts. It evaluates understanding of conversational rules, turn-taking, and adapting communication to different social situations.
Key Subtests:
Pragmatic Language Test: Assessing appropriateness of responses in social scenarios.

Begin by introducing yourself as a speech pathologist and start with a simple vocabulary assessment activity. Be encouraging but structured in your approach.
"""
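A short usage sketch of the utilities above, assuming it runs from the repository root so the `implementations` package resolves. The sample sentence is chosen to trip all four heuristics; its type-token ratio works out to 11 unique tokens over 12, about 0.92, which clears the 0.7 threshold.

```python
from implementations.common.casl_utils import CASLAssessment, save_session_data

assessment = CASLAssessment()

# One utterance that trips all four heuristics: 11 unique tokens out of 12
# (type-token ratio ~0.92 > 0.7), a 12-word sentence (> 7), the figurative
# marker "imagine", and the pragmatic marker "hello".
assessment.analyze_speech(
    "Hello, I imagine the bright kite drifting over the quiet harbor today."
)
print(assessment.current_assessment)
# {'lexical_semantic': 1, 'syntactic': 1, 'supralinguistic': 1, 'pragmatic': 1}

assessment.add_note("Responded to an open-ended prompt without support.")

# Writes transcript, notes, and scores into session_data/ (created on demand).
transcript = ["Session started.", "Student: Hello, I imagine the bright kite..."]
print(save_session_data(transcript, assessment, student_id="demo"))
```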
implementations/direct/__init__.py
ADDED
@@ -0,0 +1,3 @@
"""
Direct OpenAI API implementation of CASL Voice Bot
"""
implementations/direct/app.py
ADDED
@@ -0,0 +1,334 @@
#!/usr/bin/env python3

"""
CASL Voice Bot - Speech Pathology Assistant
Direct OpenAI API implementation (no LiveKit)
"""

import os
import asyncio
import gradio as gr
import logging
import sys
import tempfile
import time
from dotenv import load_dotenv
from openai import AsyncOpenAI

# Add parent directory to path to import common utilities
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from implementations.common.casl_utils import CASLAssessment, save_session_data, CASL_PROMPT

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))


class SpeechPathologistAssistant:
    """Speech pathologist assistant using direct OpenAI API"""

    def __init__(self):
        self.transcript = []
        self.is_running = False
        self.assessment = CASLAssessment()
        self.voice_model = "shimmer"
        self.student_id = None

    async def start_session(self, voice_model, student_id):
        """Start a new session"""
        self.is_running = True
        self.voice_model = voice_model if voice_model else "shimmer"
        self.student_id = student_id
        self.transcript = []
        self.assessment = CASLAssessment()

        # Add student info to transcript
        student_info = f" for {student_id}" if student_id else ""
        self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")

        # Generate initial AI message
        initial_audio = await self.generate_initial_message()

        return "Session active. The AI will introduce itself.", initial_audio, self.get_transcript(), self.assessment.get_assessment_html()

    async def generate_initial_message(self):
        """Generate initial AI message"""
        # Generate assistant response
        chat_response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": CASL_PROMPT},
                {"role": "user", "content": "Hello"}  # Initial trigger
            ]
        )

        assistant_text = chat_response.choices[0].message.content
        self.transcript.append(f"Speech Pathologist: {assistant_text}")

        # Generate speech from text
        speech_response = await openai_client.audio.speech.create(
            model="tts-1",
            voice=self.voice_model,
            input=assistant_text
        )

        # Save speech to temporary file
        response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
        response_temp_file.close()

        speech_response.stream_to_file(response_temp_file.name)

        # Load audio data for Gradio
        import soundfile as sf
        audio_data, sample_rate = sf.read(response_temp_file.name)

        # Clean up
        os.unlink(response_temp_file.name)

        return (sample_rate, audio_data)

    def stop_session(self):
        """Stop the current session"""
        self.is_running = False
        self.transcript.append("Session ended.")
        return "Session stopped.", None, self.get_transcript(), self.assessment.get_assessment_html()

    async def process_audio(self, audio):
        """Process audio from Gradio interface"""
        if not self.is_running or audio is None:
            return None, self.get_transcript(), self.assessment.get_assessment_html()

        # Prepare audio file for OpenAI
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
        temp_file.close()

        try:
            # Save audio data to temporary file
            sample_rate, audio_array = audio
            import scipy.io.wavfile
            scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)

            # Transcribe audio using OpenAI
            with open(temp_file.name, "rb") as audio_file:
                transcript_response = await openai_client.audio.transcriptions.create(
                    file=audio_file,
                    model="whisper-1"
                )

            user_text = transcript_response.text
            if user_text.strip():
                self.transcript.append(f"Student: {user_text}")

                # Analyze speech for CASL-2 categories
                self.assessment.analyze_speech(user_text)

                # Generate assistant response
                chat_response = await openai_client.chat.completions.create(
                    model="gpt-4o",
                    messages=[
                        {"role": "system", "content": CASL_PROMPT},
                        {"role": "user", "content": user_text}
                    ]
                )

                assistant_text = chat_response.choices[0].message.content
                self.transcript.append(f"Speech Pathologist: {assistant_text}")

                # Generate speech from text
                speech_response = await openai_client.audio.speech.create(
                    model="tts-1",
                    voice=self.voice_model,
                    input=assistant_text
                )

                # Save speech to temporary file
                response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
                response_temp_file.close()

                speech_response.stream_to_file(response_temp_file.name)

                # Load audio data for Gradio
                import soundfile as sf
                audio_data, sample_rate = sf.read(response_temp_file.name)

                # Clean up
                os.unlink(response_temp_file.name)

                return (sample_rate, audio_data), self.get_transcript(), self.assessment.get_assessment_html()

        except Exception as e:
            logger.error(f"Error processing audio: {e}")
            self.transcript.append(f"Error: {str(e)}")
        finally:
            # Clean up temp file
            os.unlink(temp_file.name)

        return None, self.get_transcript(), self.assessment.get_assessment_html()

    def get_transcript(self):
        """Get the current transcript"""
        return "\n".join(self.transcript)

    def add_note(self, note):
        """Add a custom note"""
        result = self.assessment.add_note(note)
        return "", result, self.assessment.get_assessment_html()

    def save_session(self, student_id=None):
        """Save session to file"""
        student_id = student_id or self.student_id
        return save_session_data(self.transcript, self.assessment, student_id)


# Create the speech pathology assistant
speech_assistant = SpeechPathologistAssistant()


async def start_session(voice_model, student_id):
    """Start the speech pathology session"""
    return await speech_assistant.start_session(voice_model, student_id)


def stop_session():
    """Stop the speech pathology session"""
    return speech_assistant.stop_session()


async def process_mic_input(audio, progress=gr.Progress()):
    """Process microphone input"""
    progress(0, desc="Processing speech...")
    audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
    progress(1, desc="Done")
    return audio_output, transcript, assessment


def add_note(note):
    """Add a note to the session"""
    return speech_assistant.add_note(note)


def save_session(student_id):
    """Save the current session"""
    return speech_assistant.save_session(student_id)


# Create Gradio Interface
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
    gr.Markdown("# CASL-2 Speech Pathology Assistant")
    gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")

    with gr.Row():
        with gr.Column(scale=1):
            student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
            voice_select = gr.Dropdown(
                ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
                value="shimmer",
                label="Assistant Voice"
            )
            start_button = gr.Button("Start Session", variant="primary")
            stop_button = gr.Button("Stop Session", variant="stop")
            status = gr.Textbox(label="Status", value="Ready to start")

            with gr.Accordion("SLP Tools", open=True):
                note_input = gr.Textbox(
                    label="Add Assessment Note",
                    placeholder="Enter observation or assessment note here..."
                )
                note_button = gr.Button("Add Note")
                note_status = gr.Textbox(label="Note Status")
                save_button = gr.Button("Save Session")
                save_status = gr.Textbox(label="Save Status")

        with gr.Column(scale=2):
            audio_output = gr.Audio(label="AI Speech", autoplay=True)
            audio_input = gr.Audio(
                label="Speak to the AI",
                # Gradio 4.x microphone capture: sources=["microphone"] with
                # type="numpy" yields the (sample_rate, samples) tuple that
                # process_audio expects
                sources=["microphone"],
                type="numpy",
                streaming=True
            )

    with gr.Row():
        with gr.Column(scale=1):
            assessment_html = gr.HTML(label="Assessment Progress")
        with gr.Column(scale=1):
            transcript = gr.Textbox(label="Transcript", lines=10)

    with gr.Accordion("About This Application", open=False):
        gr.Markdown("""
        ### About CASL-2 Speech Pathology Assistant
|
| 266 |
+
|
| 267 |
+
This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:
|
| 268 |
+
|
| 269 |
+
- **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
|
| 270 |
+
- **Syntactic Skills**: Grammar and sentence structure
|
| 271 |
+
- **Supralinguistic Skills**: Higher-level language beyond literal meanings
|
| 272 |
+
- **Pragmatic Skills**: Social use of language (less emphasis for younger students)
|
| 273 |
+
|
| 274 |
+
The AI will provide structured assessments and exercises to help evaluate speech patterns.
|
| 275 |
+
|
| 276 |
+
### How to Use
|
| 277 |
+
|
| 278 |
+
1. Optionally enter a Student ID to track sessions
|
| 279 |
+
2. Select the AI voice you prefer
|
| 280 |
+
3. Click "Start Session" to begin
|
| 281 |
+
4. The AI will introduce itself and begin the assessment
|
| 282 |
+
5. Speak into your microphone when it's your turn
|
| 283 |
+
6. View the transcript to track the conversation
|
| 284 |
+
7. SLPs can add notes throughout the session
|
| 285 |
+
8. Save the session when finished
|
| 286 |
+
9. Click "Stop Session" when done
|
| 287 |
+
|
| 288 |
+
### For Speech-Language Pathologists
|
| 289 |
+
|
| 290 |
+
This tool is designed to supplement, not replace, professional SLP services. SLPs can:
|
| 291 |
+
|
| 292 |
+
- Add custom notes during the session
|
| 293 |
+
- Save session data for later reference
|
| 294 |
+
- Track progress across multiple sessions
|
| 295 |
+
- Use the AI as a consistent assessment tool
|
| 296 |
+
""")
|
| 297 |
+
|
| 298 |
+
# Setup event handlers
|
| 299 |
+
start_button.click(
|
| 300 |
+
fn=lambda voice, student: asyncio.run(start_session(voice, student)),
|
| 301 |
+
inputs=[voice_select, student_id],
|
| 302 |
+
outputs=[status, audio_output, transcript, assessment_html]
|
| 303 |
+
)
|
| 304 |
+
stop_button.click(
|
| 305 |
+
fn=stop_session,
|
| 306 |
+
outputs=[status, audio_output, transcript, assessment_html]
|
| 307 |
+
)
|
| 308 |
+
note_button.click(
|
| 309 |
+
fn=add_note,
|
| 310 |
+
inputs=note_input,
|
| 311 |
+
outputs=[note_input, note_status, assessment_html]
|
| 312 |
+
)
|
| 313 |
+
save_button.click(
|
| 314 |
+
fn=save_session,
|
| 315 |
+
inputs=student_id,
|
| 316 |
+
outputs=save_status
|
| 317 |
+
)
|
| 318 |
+
|
| 319 |
+
# Setup audio processing
|
| 320 |
+
audio_input.stream(
|
| 321 |
+
fn=lambda audio: asyncio.run(process_mic_input(audio)),
|
| 322 |
+
inputs=audio_input,
|
| 323 |
+
outputs=[audio_output, transcript, assessment_html]
|
| 324 |
+
)
|
| 325 |
+
|
| 326 |
+
|
| 327 |
+
def main(share=True):
|
| 328 |
+
"""Main function to launch the app"""
|
| 329 |
+
app.launch(share=share)
|
| 330 |
+
|
| 331 |
+
|
| 332 |
+
# Entry point for the application
|
| 333 |
+
if __name__ == "__main__":
|
| 334 |
+
main()
|
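The Whisper-to-GPT-4o-to-TTS round trip above can also be exercised headlessly, which is handy for smoke-testing the API pipeline without a browser. A minimal sketch, assuming a local `sample.wav` recording (the file name is illustrative) and an `OPENAI_API_KEY` in the environment:

import asyncio
import soundfile as sf
from implementations.direct.app import speech_assistant

async def smoke_test():
    speech_assistant.is_running = True          # skip the UI start flow
    speech_assistant.voice_model = "shimmer"    # normally set when a session starts
    data, rate = sf.read("sample.wav", dtype="int16")  # hypothetical input clip
    audio_out, transcript, _html = await speech_assistant.process_audio((rate, data))
    print(transcript)

asyncio.run(smoke_test())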
implementations/livekit/__init__.py
ADDED
@@ -0,0 +1,3 @@
"""
LiveKit implementation of CASL Voice Bot
"""
implementations/livekit/app.py
ADDED
@@ -0,0 +1,329 @@
#!/usr/bin/env python3

"""
CASL Voice Bot - Speech Pathology Assistant
Using LiveKit agents with OpenAI's real-time capabilities
"""

import os
import asyncio
import gradio as gr
import logging
import sys
import time
from dotenv import load_dotenv
from livekit import agents
from openai import AsyncOpenAI

# Add parent directory to path to import common utilities
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from implementations.common.casl_utils import CASLAssessment, save_session_data, CASL_PROMPT

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))


class GradioInputDevice(agents.InputDevice):
    """Custom input device that works with Gradio"""

    def __init__(self):
        super().__init__()
        self.audio_queue = asyncio.Queue()
        self.is_active = True

    async def receive(self) -> agents.AudioChunk:
        """Receive audio data from the queue"""
        try:
            audio_data = await asyncio.wait_for(self.audio_queue.get(), timeout=0.1)
            return audio_data
        except asyncio.TimeoutError:
            return None

    async def add_audio(self, audio_data):
        """Add audio data to the queue"""
        if audio_data is None:
            return

        # Convert gradio audio format to AudioChunk
        sample_rate, audio_array = audio_data
        audio_chunk = agents.AudioChunk(
            samples=audio_array,
            sample_rate=sample_rate,
            is_last=False
        )
        await self.audio_queue.put(audio_chunk)

    def stop(self):
        """Stop the input device"""
        self.is_active = False


class GradioOutputDevice(agents.OutputDevice):
    """Custom output device that works with Gradio"""

    def __init__(self):
        super().__init__()
        self.output_queue = asyncio.Queue()

    async def transmit(self, audio_chunk: agents.AudioChunk) -> None:
        """Transmit audio chunk to the queue"""
        if audio_chunk is not None:
            await self.output_queue.put((audio_chunk.sample_rate, audio_chunk.samples))  # (rate, samples): the order Gradio expects

    async def get_latest_audio(self):
        """Get the latest audio from the queue"""
        try:
            return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            return None


class SpeechPathologistAssistant:
    """Speech pathologist assistant using LiveKit agents"""

    def __init__(self):
        self.input_device = GradioInputDevice()
        self.output_device = GradioOutputDevice()
        self.assistant = None
        self.assistant_task = None
        self.transcript = []
        self.is_running = False
        self.assessment = CASLAssessment()

    async def initialize_assistant(self, voice="shimmer"):
        """Initialize the speech assistant"""
        self.assistant = agents.VoiceAssistant(
            openai_client=openai_client,
            model="gpt-4o",
            voice=voice,
            input_device=self.input_device,
            output_device=self.output_device,
            initial_message=CASL_PROMPT,
            real_time=True,  # Enable real-time processing
        )

        # Add transcript and response callbacks
        self.assistant.on_transcript = self.on_transcript
        self.assistant.on_response = self.on_response

    def on_transcript(self, transcript):
        """Handle transcript from user"""
        self.transcript.append(f"Student: {transcript.text}")

        # Basic analysis of speech for CASL-2 categories
        self.assessment.analyze_speech(transcript.text)

        return True

    def on_response(self, response):
        """Handle response from assistant"""
        self.transcript.append(f"Speech Pathologist: {response.text}")
        return True

    async def start_assistant(self, voice_model, student_id):
        """Start the assistant in a background task"""
        await self.initialize_assistant(voice_model)

        self.is_running = True

        # Add student info to transcript
        student_info = f" for {student_id}" if student_id else ""
        self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")

        # Run the assistant in a background task
        self.assistant_task = asyncio.create_task(self.assistant.run())

        return "Session active. The AI will introduce itself."

    def stop_assistant(self):
        """Stop the assistant"""
        if self.assistant_task and not self.assistant_task.done():
            self.assistant_task.cancel()

        self.input_device.stop()
        self.is_running = False

        # Add ending to transcript
        self.transcript.append("Session ended.")
        return "Session stopped."

    async def process_audio(self, audio):
        """Process audio from Gradio interface"""
        if not self.is_running or audio is None:
            return None, self.get_transcript(), self.assessment.get_assessment_html()

        # Add audio to input device
        await self.input_device.add_audio(audio)

        # Check for assistant output
        output_audio = await self.output_device.get_latest_audio()

        return output_audio, self.get_transcript(), self.assessment.get_assessment_html()

    def get_transcript(self):
        """Get the current transcript"""
        return "\n".join(self.transcript)

    def add_note(self, note):
        """Add a custom note"""
        result = self.assessment.add_note(note)
        return "", result, self.assessment.get_assessment_html()

    def save_session(self, student_id):
        """Save session to file"""
        return save_session_data(self.transcript, self.assessment, student_id)


# Create the speech pathology assistant
speech_assistant = SpeechPathologistAssistant()


async def start_session(voice_model, student_id):
    """Start the speech pathology session"""
    status = await speech_assistant.start_assistant(voice_model, student_id)
    return status, None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()


def stop_session():
    """Stop the speech pathology session"""
    return speech_assistant.stop_assistant(), None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()


async def process_mic_input(audio, progress=gr.Progress()):
    """Process microphone input"""
    progress(0, desc="Processing speech...")
    audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
    progress(1, desc="Done")
    return audio_output, transcript, assessment


def add_note(note):
    """Add a note to the session"""
    return speech_assistant.add_note(note)


def save_session(student_id):
    """Save the current session"""
    return speech_assistant.save_session(student_id)


# Create Gradio Interface
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
    gr.Markdown("# CASL-2 Speech Pathology Assistant")
    gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")

    with gr.Row():
        with gr.Column(scale=1):
            student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
            voice_select = gr.Dropdown(
                ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
                value="shimmer",
                label="Assistant Voice"
            )
            start_button = gr.Button("Start Session", variant="primary")
            stop_button = gr.Button("Stop Session", variant="stop")
            status = gr.Textbox(label="Status", value="Ready to start")

            with gr.Accordion("SLP Tools", open=True):
                note_input = gr.Textbox(
                    label="Add Assessment Note",
                    placeholder="Enter observation or assessment note here..."
                )
                note_button = gr.Button("Add Note")
                note_status = gr.Textbox(label="Note Status")
                save_button = gr.Button("Save Session")
                save_status = gr.Textbox(label="Save Status")

        with gr.Column(scale=2):
            audio_output = gr.Audio(label="AI Speech", autoplay=True)
            audio_input = gr.Audio(
                label="Speak to the AI",
                sources=["microphone"],  # Gradio 4.x kwarg; replaces the removed `source`
                type="numpy",            # hand the handler (sample_rate, ndarray) tuples
                streaming=True
            )

    with gr.Row():
        with gr.Column(scale=1):
            assessment_html = gr.HTML(label="Assessment Progress")
        with gr.Column(scale=1):
            transcript = gr.Textbox(label="Transcript", lines=10)

    with gr.Accordion("About This Application", open=False):
        gr.Markdown("""
        ### About CASL-2 Speech Pathology Assistant

        This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:

        - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
        - **Syntactic Skills**: Grammar and sentence structure
        - **Supralinguistic Skills**: Higher-level language beyond literal meanings
        - **Pragmatic Skills**: Social use of language (less emphasis for younger students)

        The AI will provide structured assessments and exercises to help evaluate speech patterns.

        ### How to Use

        1. Optionally enter a Student ID to track sessions
        2. Select the AI voice you prefer
        3. Click "Start Session" to begin
        4. The AI will introduce itself and begin the assessment
        5. Speak into your microphone when it's your turn
        6. View the transcript to track the conversation
        7. SLPs can add notes throughout the session
        8. Save the session when finished
        9. Click "Stop Session" when done

        ### For Speech-Language Pathologists

        This tool is designed to supplement, not replace, professional SLP services. SLPs can:

        - Add custom notes during the session
        - Save session data for later reference
        - Track progress across multiple sessions
        - Use the AI as a consistent assessment tool
        """)

    # Setup event handlers
    start_button.click(
        fn=lambda voice, student: asyncio.run(start_session(voice, student)),
        inputs=[voice_select, student_id],
        outputs=[status, audio_output, transcript, assessment_html]
    )
    stop_button.click(
        fn=stop_session,
        outputs=[status, audio_output, transcript, assessment_html]
    )
    note_button.click(
        fn=add_note,
        inputs=note_input,
        outputs=[note_input, note_status, assessment_html]
    )
    save_button.click(
        fn=save_session,
        inputs=student_id,
        outputs=save_status
    )

    # Setup audio processing
    audio_input.stream(
        fn=lambda audio: asyncio.run(process_mic_input(audio)),
        inputs=audio_input,
        outputs=[audio_output, transcript, assessment_html]
    )


def main(share=True):
    """Main function to launch the app"""
    app.launch(share=share)


# Entry point for the application
if __name__ == "__main__":
    main()
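GradioInputDevice and GradioOutputDevice bridge two flow models with asyncio.Queue: Gradio pushes microphone chunks in, the assistant pulls them out, and synthesized audio travels back the same way. The non-blocking poll both classes rely on is worth seeing in isolation; a minimal, self-contained sketch (names here are illustrative, not part of the app):

import asyncio

async def poll(queue: asyncio.Queue, timeout: float = 0.1):
    """Return the next queued item, or None if nothing arrives in time."""
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None

async def demo():
    q = asyncio.Queue()
    await q.put(b"audio-chunk-1")
    print(await poll(q))  # b'audio-chunk-1'
    print(await poll(q))  # None, after the 0.1 s timeout

asyncio.run(demo())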
implementations/livekit/livekit_gradio_hf.py
ADDED
@@ -0,0 +1,490 @@
#!/usr/bin/env python3

"""
CASL Voice Bot - Hugging Face Spaces deployment version
Using LiveKit with Gradio for Hugging Face Spaces compatibility

This is a special version optimized for Hugging Face Spaces deployment
that works with LiveKit's WebRTC capabilities.
"""

import os
import asyncio
import gradio as gr
import logging
import sys
import time
import importlib.util
from dotenv import load_dotenv
from openai import AsyncOpenAI

# Check if livekit-agents is installed
try:
    from livekit import agents
    LIVEKIT_AVAILABLE = True
except ImportError:
    LIVEKIT_AVAILABLE = False
    # Create dummy classes for type hinting
    class agents:
        class InputDevice: pass
        class OutputDevice: pass
        class AudioChunk: pass
        class VoiceAssistant: pass

# Add parent directory to path to import common utilities
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
from implementations.common.casl_utils import CASLAssessment, save_session_data, CASL_PROMPT

# Load environment variables
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize OpenAI client
openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Hugging Face Spaces compatibility check
HF_SPACES = os.environ.get("SPACE_ID") is not None
logger.info(f"Running in Hugging Face Spaces: {HF_SPACES}")
logger.info(f"LiveKit available: {LIVEKIT_AVAILABLE}")


class GradioInputDevice(agents.InputDevice if LIVEKIT_AVAILABLE else object):
    """Custom input device that works with Gradio"""

    def __init__(self):
        if LIVEKIT_AVAILABLE:
            super().__init__()
        self.audio_queue = asyncio.Queue()
        self.is_active = True

    async def receive(self):
        """Receive audio data from the queue"""
        try:
            audio_data = await asyncio.wait_for(self.audio_queue.get(), timeout=0.1)
            return audio_data
        except asyncio.TimeoutError:
            return None

    async def add_audio(self, audio_data):
        """Add audio data to the queue"""
        if audio_data is None:
            return

        # Convert gradio audio format to AudioChunk
        if LIVEKIT_AVAILABLE:
            sample_rate, audio_array = audio_data
            audio_chunk = agents.AudioChunk(
                samples=audio_array,
                sample_rate=sample_rate,
                is_last=False
            )
            await self.audio_queue.put(audio_chunk)
        else:
            # Store raw audio data if LiveKit is not available
            await self.audio_queue.put(audio_data)

    def stop(self):
        """Stop the input device"""
        self.is_active = False


class GradioOutputDevice(agents.OutputDevice if LIVEKIT_AVAILABLE else object):
    """Custom output device that works with Gradio"""

    def __init__(self):
        if LIVEKIT_AVAILABLE:
            super().__init__()
        self.output_queue = asyncio.Queue()

    async def transmit(self, audio_chunk):
        """Transmit audio chunk to the queue"""
        if audio_chunk is not None:
            if LIVEKIT_AVAILABLE:
                await self.output_queue.put((audio_chunk.sample_rate, audio_chunk.samples))  # (rate, samples) for Gradio
            else:
                # Handle raw audio data if LiveKit is not available
                await self.output_queue.put(audio_chunk)

    async def get_latest_audio(self):
        """Get the latest audio from the queue"""
        try:
            return await asyncio.wait_for(self.output_queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            return None


class SpeechPathologistAssistant:
    """Speech pathologist assistant using LiveKit agents or direct OpenAI API"""

    def __init__(self):
        self.input_device = GradioInputDevice()
        self.output_device = GradioOutputDevice()
        self.assistant = None
        self.assistant_task = None
        self.transcript = []
        self.is_running = False
        self.assessment = CASLAssessment()
        self.student_id = None
        self.voice_model = "shimmer"

    async def initialize_assistant(self, voice="shimmer"):
        """Initialize the speech assistant"""
        self.voice_model = voice

        if LIVEKIT_AVAILABLE:
            # Use LiveKit VoiceAssistant if available
            self.assistant = agents.VoiceAssistant(
                openai_client=openai_client,
                model="gpt-4o",
                voice=voice,
                input_device=self.input_device,
                output_device=self.output_device,
                initial_message=CASL_PROMPT,
                real_time=True,  # Enable real-time processing
            )

            # Add transcript and response callbacks
            self.assistant.on_transcript = self.on_transcript
            self.assistant.on_response = self.on_response
        else:
            # If LiveKit is not available, we'll use the direct OpenAI API
            logger.info("LiveKit not available, using direct OpenAI API")

    def on_transcript(self, transcript):
        """Handle transcript from user (for LiveKit)"""
        self.transcript.append(f"Student: {transcript.text}")

        # Basic analysis of speech for CASL-2 categories
        self.assessment.analyze_speech(transcript.text)

        return True

    def on_response(self, response):
        """Handle response from assistant (for LiveKit)"""
        self.transcript.append(f"Speech Pathologist: {response.text}")
        return True

    async def start_assistant(self, voice_model, student_id):
        """Start the assistant in a background task"""
        self.student_id = student_id
        await self.initialize_assistant(voice_model)

        self.is_running = True

        # Add student info to transcript
        student_info = f" for {student_id}" if student_id else ""
        self.transcript.append(f"Session started{student_info}. The AI Speech Pathologist will speak first.")

        if LIVEKIT_AVAILABLE:
            # Run the LiveKit assistant in a background task
            self.assistant_task = asyncio.create_task(self.assistant.run())
        else:
            # For direct OpenAI API, generate initial message
            await self.generate_initial_message()

        return "Session active. The AI will introduce itself."

    async def generate_initial_message(self):
        """Generate initial AI message when not using LiveKit"""
        # Generate assistant response using OpenAI API directly
        chat_response = await openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": CASL_PROMPT},
                {"role": "user", "content": "Hello"}  # Initial trigger
            ]
        )

        assistant_text = chat_response.choices[0].message.content
        self.transcript.append(f"Speech Pathologist: {assistant_text}")

        # Generate speech from text
        speech_response = await openai_client.audio.speech.create(
            model="tts-1",
            voice=self.voice_model,
            input=assistant_text
        )

        # Save speech to temporary file to get audio data
        import tempfile
        import soundfile as sf

        response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
        response_temp_file.close()

        speech_response.stream_to_file(response_temp_file.name)

        # Load audio data for Gradio
        audio_data, sample_rate = sf.read(response_temp_file.name)

        # Send to output device
        await self.output_device.transmit((sample_rate, audio_data))  # (rate, samples) for Gradio

        # Clean up
        os.unlink(response_temp_file.name)

    def stop_assistant(self):
        """Stop the assistant"""
        if self.assistant_task and not self.assistant_task.done():
            self.assistant_task.cancel()

        self.input_device.stop()
        self.is_running = False

        # Add ending to transcript
        self.transcript.append("Session ended.")
        return "Session stopped."

    async def process_audio(self, audio):
        """Process audio from Gradio interface"""
        if not self.is_running or audio is None:
            return None, self.get_transcript(), self.assessment.get_assessment_html()

        if LIVEKIT_AVAILABLE:
            # Add audio to input device for LiveKit
            await self.input_device.add_audio(audio)

            # Check for assistant output
            output_audio = await self.output_device.get_latest_audio()
        else:
            # For direct OpenAI API
            output_audio = await self.process_with_direct_api(audio)

        return output_audio, self.get_transcript(), self.assessment.get_assessment_html()

    async def process_with_direct_api(self, audio):
        """Process audio using direct OpenAI API when LiveKit is not available"""
        # Prepare audio file for OpenAI
        import tempfile
        import scipy.io.wavfile

        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
        temp_file.close()

        try:
            # Save audio data to temporary file
            sample_rate, audio_array = audio
            scipy.io.wavfile.write(temp_file.name, sample_rate, audio_array)

            # Transcribe audio using OpenAI
            with open(temp_file.name, "rb") as audio_file:
                transcript_response = await openai_client.audio.transcriptions.create(
                    file=audio_file,
                    model="whisper-1"
                )

            user_text = transcript_response.text
            if user_text.strip():
                self.transcript.append(f"Student: {user_text}")

                # Analyze speech for CASL-2 categories
                self.assessment.analyze_speech(user_text)

                # Generate assistant response
                chat_response = await openai_client.chat.completions.create(
                    model="gpt-4o",
                    messages=[
                        {"role": "system", "content": CASL_PROMPT},
                        {"role": "user", "content": user_text}
                    ]
                )

                assistant_text = chat_response.choices[0].message.content
                self.transcript.append(f"Speech Pathologist: {assistant_text}")

                # Generate speech from text
                speech_response = await openai_client.audio.speech.create(
                    model="tts-1",
                    voice=self.voice_model,
                    input=assistant_text
                )

                # Save speech to temporary file
                import soundfile as sf

                response_temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".mp3")
                response_temp_file.close()

                speech_response.stream_to_file(response_temp_file.name)

                # Load audio data for Gradio
                audio_data, sample_rate = sf.read(response_temp_file.name)

                # Clean up
                os.unlink(response_temp_file.name)

                return (sample_rate, audio_data)
        except Exception as e:
            logger.error(f"Error processing with direct API: {e}")
        finally:
            # Clean up temp file
            os.unlink(temp_file.name)

        return None

    def get_transcript(self):
        """Get the current transcript"""
        return "\n".join(self.transcript)

    def add_note(self, note):
        """Add a custom note"""
        result = self.assessment.add_note(note)
        return "", result, self.assessment.get_assessment_html()

    def save_session(self, student_id=None):
        """Save session to file"""
        student_id = student_id or self.student_id
        return save_session_data(self.transcript, self.assessment, student_id)


# Create the speech pathology assistant
speech_assistant = SpeechPathologistAssistant()


async def start_session(voice_model, student_id):
    """Start the speech pathology session"""
    status = await speech_assistant.start_assistant(voice_model, student_id)
    return status, None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()


def stop_session():
    """Stop the speech pathology session"""
    return speech_assistant.stop_assistant(), None, speech_assistant.get_transcript(), speech_assistant.assessment.get_assessment_html()


async def process_mic_input(audio, progress=gr.Progress()):
    """Process microphone input"""
    progress(0, desc="Processing speech...")
    audio_output, transcript, assessment = await speech_assistant.process_audio(audio)
    progress(1, desc="Done")
    return audio_output, transcript, assessment


def add_note(note):
    """Add a note to the session"""
    return speech_assistant.add_note(note)


def save_session(student_id):
    """Save the current session"""
    return speech_assistant.save_session(student_id)


# Create Gradio Interface
with gr.Blocks(title="CASL-2 Speech Pathology Assistant") as app:
    gr.Markdown("# CASL-2 Speech Pathology Assistant")
    gr.Markdown("### AI-powered speech therapy assessment based on the CASL-2 framework")

    # Show LiveKit availability
    if not LIVEKIT_AVAILABLE:
        gr.Markdown(
            "⚠️ **Notice:** Running without LiveKit agents. Using the direct OpenAI API instead. "
            "For best performance, install the livekit-agents package."
        )

    with gr.Row():
        with gr.Column(scale=1):
            student_id = gr.Textbox(label="Student ID (optional)", placeholder="Enter student ID")
            voice_select = gr.Dropdown(
                ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
                value="shimmer",
                label="Assistant Voice"
            )
            start_button = gr.Button("Start Session", variant="primary")
            stop_button = gr.Button("Stop Session", variant="stop")
            status = gr.Textbox(label="Status", value="Ready to start")

            with gr.Accordion("SLP Tools", open=True):
                note_input = gr.Textbox(
                    label="Add Assessment Note",
                    placeholder="Enter observation or assessment note here..."
                )
                note_button = gr.Button("Add Note")
                note_status = gr.Textbox(label="Note Status")
                save_button = gr.Button("Save Session")
                save_status = gr.Textbox(label="Save Status")

        with gr.Column(scale=2):
            audio_output = gr.Audio(label="AI Speech", autoplay=True)
            audio_input = gr.Audio(
                label="Speak to the AI",
                sources=["microphone"],  # Gradio 4.x kwarg; replaces the removed `source`
                type="numpy",            # hand the handler (sample_rate, ndarray) tuples
                streaming=True
            )

    with gr.Row():
        with gr.Column(scale=1):
            assessment_html = gr.HTML(label="Assessment Progress")
        with gr.Column(scale=1):
            transcript = gr.Textbox(label="Transcript", lines=10)

    with gr.Accordion("About This Application", open=False):
        gr.Markdown("""
        ### About CASL-2 Speech Pathology Assistant

        This application provides an AI speech pathologist that can assess students using the CASL-2 framework. It focuses on:

        - **Lexical/Semantic Skills**: Vocabulary knowledge and word usage
        - **Syntactic Skills**: Grammar and sentence structure
        - **Supralinguistic Skills**: Higher-level language beyond literal meanings
        - **Pragmatic Skills**: Social use of language (less emphasis for younger students)

        The AI will provide structured assessments and exercises to help evaluate speech patterns.

        ### How to Use

        1. Optionally enter a Student ID to track sessions
        2. Select the AI voice you prefer
        3. Click "Start Session" to begin
        4. The AI will introduce itself and begin the assessment
        5. Speak into your microphone when it's your turn
        6. View the transcript to track the conversation
        7. SLPs can add notes throughout the session
        8. Save the session when finished
        9. Click "Stop Session" when done

        ### For Speech-Language Pathologists

        This tool is designed to supplement, not replace, professional SLP services. SLPs can:

        - Add custom notes during the session
        - Save session data for later reference
        - Track progress across multiple sessions
        - Use the AI as a consistent assessment tool
        """)

    # Setup event handlers
    start_button.click(
        fn=lambda voice, student: asyncio.run(start_session(voice, student)),
        inputs=[voice_select, student_id],
        outputs=[status, audio_output, transcript, assessment_html]
    )
    stop_button.click(
        fn=stop_session,
        outputs=[status, audio_output, transcript, assessment_html]
    )
    note_button.click(
        fn=add_note,
        inputs=note_input,
        outputs=[note_input, note_status, assessment_html]
    )
    save_button.click(
        fn=save_session,
        inputs=student_id,
        outputs=save_status
    )

    # Setup audio processing
    audio_input.stream(
        fn=lambda audio: asyncio.run(process_mic_input(audio)),
        inputs=audio_input,
        outputs=[audio_output, transcript, assessment_html]
    )


# Entry point for the application
if __name__ == "__main__":
    app.launch(share=True)
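The conditional base class in this file (`agents.InputDevice if LIVEKIT_AVAILABLE else object`, backed by a dummy `agents` class when the import fails) is a general optional-dependency pattern, not something LiveKit-specific. A stripped-down, runnable sketch with illustrative names:

try:
    from some_optional_lib import Plugin  # hypothetical optional dependency
    HAS_LIB = True
except ImportError:
    HAS_LIB = False

    class Plugin:
        """Dummy stand-in so subclasses still define cleanly without the library."""

class Adapter(Plugin):
    def __init__(self):
        if HAS_LIB:
            super().__init__()  # only meaningful with the real base class
        self.ready = True

print(Adapter().ready, HAS_LIB)  # prints "True False" when the library is absent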
livekit_requirements.txt
ADDED
@@ -0,0 +1,8 @@
python-dotenv>=1.0.0
livekit-agents>=0.7.0
openai>=1.3.0
gradio>=4.0.0
soundfile>=0.12.1
scipy>=1.10.0
# asyncio is part of the Python standard library; no separate install is needed
numpy>=1.24.0
requirements.txt
ADDED
@@ -0,0 +1,10 @@
python-dotenv>=1.0.0
openai>=1.3.0
gradio>=4.0.0
soundfile>=0.12.1
scipy>=1.10.0
# asyncio is part of the Python standard library; no separate install is needed
numpy>=1.24.0

# Optional: LiveKit integration (uncomment to use)
# livekit-agents>=0.7.0
run.py
ADDED
@@ -0,0 +1,16 @@
#!/usr/bin/env python3

"""
Main entry point for the CASL Voice Bot application.
Launches the Gradio web interface.
"""

import argparse
from app.gradio_app import main

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist")
    parser.add_argument("--share", action="store_true", help="Share the app publicly (for HuggingFace)")
    args = parser.parse_args()

    main(share=args.share)  # forward the parsed flag; assumes gradio_app.main accepts `share` like the other launchers
run_app.py
ADDED
@@ -0,0 +1,17 @@
#!/usr/bin/env python3

"""
Main entry point for the CASL Voice Bot application.
Launches the Gradio web interface.
"""

import argparse
from app_main import main

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist")
    parser.add_argument("--share", action="store_true", help="Share the app publicly")
    parser.add_argument("--local", action="store_true", help="Run the app locally without sharing")
    args = parser.parse_args()

    main(share=not args.local)
run_direct.py
ADDED
@@ -0,0 +1,16 @@
#!/usr/bin/env python3

"""
Run the CASL Voice Bot using direct OpenAI API (no LiveKit)
"""

import argparse
from implementations.direct.app import main

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CASL Voice Bot - AI Speech Therapist (Direct API)")
    parser.add_argument("--share", action="store_true", help="Share the app publicly")
    parser.add_argument("--local", action="store_true", help="Run the app locally without sharing")
    args = parser.parse_args()

    main(share=not args.local)
run_livekit.py
ADDED
@@ -0,0 +1,16 @@
#!/usr/bin/env python3

"""
Run the CASL Voice Bot using LiveKit and OpenAI real-time capabilities
"""

import argparse
from implementations.livekit.app import main

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CASL Voice Bot with LiveKit - AI Speech Therapist")
    parser.add_argument("--share", action="store_true", help="Share the app publicly")
    parser.add_argument("--local", action="store_true", help="Run the app locally without sharing")
    args = parser.parse_args()

    main(share=not args.local)
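Each launcher above just forwards a share flag into the matching `main(share=...)`. The same thing works programmatically, for example from a notebook; a quick sketch:

from implementations.direct.app import main

# Equivalent to `python run_direct.py --local`: serve on localhost only,
# without creating a public Gradio share link.
main(share=False)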