Spaces: Build error

Commit 76ed5b1
Parent(s): 90b1256
updated instructions

Files changed:
- .env.example +6 -0
- README.md +50 -0
- instructions/Architecture.MD +83 -0
- instructions/Step_By_Step_Instructions.MD +93 -0
- instructions/instructions.MD +8 -4
- requirements.txt +9 -0
- streamlit_app/graph/workflow.py +86 -0
- streamlit_app/main.py +74 -0
- streamlit_app/state/dictionary_manager.py +50 -0
.env.example
ADDED
@@ -0,0 +1,6 @@
# Google API Key for Gemini
GOOGLE_API_KEY=your_google_api_key

# Optional: Coqui TTS model settings
COQUI_MODEL_NAME=tts_models/en/ljspeech/tacotron2-DDC
COQUI_VOCODER_NAME=vocoder_models/en/ljspeech/hifigan_v2
README.md
ADDED
@@ -0,0 +1,50 @@
# Polyglot - AI Language Learning Assistant

An AI-powered language learning assistant that helps users learn foreign languages through interactive chat. The application supports both text and voice interactions, leveraging state-of-the-art AI models for transcription, translation, and speech synthesis.

## Features

- Voice and text input support
- Real-time translation
- Interactive chat with AI language tutor
- Personal dictionary and phrase storage
- Text-to-speech for pronunciation practice

## Tech Stack

- Frontend: Streamlit
- Backend: LangGraph, LangChain
- Speech Processing: Whisper (STT), Coqui (TTS)
- LLM: Google Gemini via LangChain

## Setup

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Set up environment variables:
Create a `.env` file with the following:
```
GOOGLE_API_KEY=your_google_api_key
```

3. Run the application:
```bash
streamlit run streamlit_app/main.py
```

## Project Structure

```
streamlit_app/
├── graph/       # LangGraph implementation
├── state/       # Application state management
├── components/  # Streamlit components
└── utils/       # Utility functions
```

## Contributing

Feel free to submit issues and enhancement requests!
instructions/Architecture.MD
ADDED
@@ -0,0 +1,83 @@
**I. Overall Architecture**

Polyglot will adopt a modular architecture, separating the frontend (Streamlit app), backend (LangGraph workflows), and external services (Whisper, Coqui, LLM). This promotes maintainability, scalability, and testability.

**II. Component Breakdown**

1. **Frontend (Streamlit App):**
   * Responsible for user interaction (text input, audio recording, chat display).
   * Handles UI elements like the chat interface, sidebar (dictionary, phrases), and download option.
   * Communicates with the backend via API calls.
   * Key files: `streamlit_app/main.py` (entry point), UI components in dedicated modules.

2. **Backend (LangGraph Workflows):**
   * Orchestrates the overall workflow, managing dependencies between different components.
   * Handles intent detection (chat vs. translation).
   * Manages conversation context (multi-turn conversations).
   * Persists user dictionaries.
   * Key files: `streamlit_app/graph/workflow.py`, `streamlit_app/graph/nodes.py` (individual processing steps).

3. **Audio Processing:**
   * **Whisper:** Transcribes audio to text. Integrated as an external service called by the backend.
   * **Coqui:** Converts text to speech. Integrated as an external service called by the backend.

4. **LLM (LangChain's ChatGoogleGenerativeAI):**
   * Provides translation and conversational AI functionality.
   * Integrated via LangChain.

5. **Data Storage:**
   * User dictionaries (words/phrases categorized by language) will be persisted. The specific database to use is not defined, so a simple JSON file is assumed for now.

**III. Workflow**

The instructions.MD document describes two main workflows: Voice Input and Text Input. LangGraph will manage the state and transitions between steps in these workflows.

**Voice Input Flow:**

1. Streamlit captures audio.
2. Streamlit sends the audio to the backend.
3. The backend uses Whisper to transcribe the audio to text.
4. The backend determines intent (chat or translation).
5. If translation:
   * The backend detects the language.
   * The backend uses the LLM to translate the text.
   * The backend uses Coqui to convert the translated text to speech.
   * The backend stores phrases/words in the dictionary.
   * The backend sends the translated text and audio to Streamlit for display.
6. If chat:
   * The backend uses the LLM to generate a response.
   * The backend sends the text response to Streamlit for display.

**Text Input Flow:**

1. Streamlit captures text.
2. Streamlit sends the text to the backend.
3. The backend determines intent (chat or translation).
4. If translation:
   * The backend detects the language.
   * The backend uses the LLM to translate the text.
   * The backend uses Coqui to convert the translated text to speech.
   * The backend stores phrases/words in the dictionary.
   * The backend sends the translated text and audio to Streamlit for display.
5. If chat:
   * The backend uses the LLM to generate a response.
   * The backend sends the text response to Streamlit for display.

**IV. Key Considerations**

* **Error Handling:** Implement robust error handling for API failures in Whisper, Coqui, and LLM processing.
* **State Management:** LangGraph will be crucial for maintaining conversation context and managing workflow dependencies.
* **Persistence:** User dictionaries should be persisted to provide a personalized learning experience.
* **Scalability:** The architecture should be designed to handle a growing number of users and languages.

**V. Leveraging Documentation**

The instructions.MD file provides links to relevant documentation for Streamlit, Whisper, Coqui, LangGraph, and LangChain. These resources will be essential for implementing the project.

* [How to add cross-thread persistence to your graph](https://langchain-ai.github.io/langgraph/how-tos/cross-thread-persistence/)
* [LangChain's ChatGoogleGenerativeAI Documentation](https://docs.litellm.ai/docs/)
* [Streamlit Chat Guide](https://docs.streamlit.io/develop/tutorials/chat-and-llm-apps/build-conversational-apps)
* [Whisper (Speech-to-Text)](https://github.com/openai/whisper)
* [Coqui (Text-to-Speech)](https://docs.coqui.ai/en/latest/)
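Since cross-thread persistence is linked above, here is a minimal sketch of that pattern. This is an assumption, not part of the commit: it relies on the `BaseStore` API from langgraph releases newer than the `0.0.15` floor pinned in `requirements.txt`, and the node name and namespace layout are hypothetical.

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()  # swap for a persistent store in production

def save_word(state, config, *, store):
    # Namespacing by user id lets dictionary entries survive across chat threads.
    user_id = config["configurable"]["user_id"]
    namespace = ("dictionary", user_id)
    store.put(namespace, "good_morning", {"es": "buenos días"})
    return state

# The store is supplied when the graph is compiled:
# graph = workflow.compile(store=store)
```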
instructions/Step_By_Step_Instructions.MD
ADDED
@@ -0,0 +1,93 @@
**Phase 1: Project Setup and Core Dependencies**

1. **Create Directory Structure:**
   * Create the following directory structure inside `streamlit_app`:
   ```
   streamlit_app/
   ├── graph/       # LangGraph implementation
   ├── state/       # Application state management
   ├── components/  # Streamlit components
   └── utils/       # Utility functions
   ```

**Phase 2: Frontend (Streamlit App) Development**

1. **Create `streamlit_app/main.py`:**
   * Set up the basic Streamlit app structure.
   * Add a title and a simple chat interface with a text input field.

2. **Implement Chat Interface:**
   * Create a function to display chat messages.
   * Create a function to handle user input and send it to the backend.
   * Use Streamlit's `st.session_state` to manage chat history.

3. **Implement Audio Recording:**
   * Use Streamlit's audio recording component (or a custom component) to allow users to record audio (see the sketch after this list).
   * Handle audio data and send it to the backend.

4. **Implement Sidebar:**
   * Add a sidebar with "Dictionary" and "Phrases" buttons.
   * Create placeholder functions for dictionary and phrases views.
   * Implement the "Download" option with email prompt (basic implementation, can be refined later).
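A minimal sketch of the recording step from item 3, assuming a recent Streamlit release that ships `st.audio_input` (older releases would need a community audio-recorder component):

```python
import streamlit as st

# st.audio_input returns an UploadedFile-like object containing WAV bytes,
# or None until the user records something.
audio = st.audio_input("Record a voice message")

if audio is not None:
    wav_bytes = audio.read()  # raw audio to hand to the transcription node
    st.audio(wav_bytes)       # immediate playback for confirmation
```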
**Phase 3: Backend (LangGraph) Development**

1. **Create `streamlit_app/graph/workflow.py`:**
   * Set up the basic LangGraph workflow.
   * Define the nodes for transcription, intent detection, translation, speech synthesis, and response generation.

2. **Implement Transcription Node:**
   * Integrate Whisper to transcribe audio to text (see the Whisper sketch after this list).
   * Handle API calls to Whisper.

3. **Implement Intent Detection Node:**
   * Use a simple rule-based approach or a small LLM to determine whether the user's intent is chat or translation.

4. **Implement Translation Node:**
   * Integrate LangChain's ChatGoogleGenerativeAI to translate text (see the Gemini sketch after this list).
   * Handle API calls to the LLM.

5. **Implement Speech Synthesis Node:**
   * Integrate Coqui to convert translated text to speech (see the Coqui sketch after this list).
   * Handle API calls to Coqui.

6. **Implement Response Generation Node:**
   * Use LangChain's ChatGoogleGenerativeAI to generate chat responses.
   * Handle API calls to the LLM.

7. **Implement Dictionary Storage:**
   * Create a function to store phrases and words in a JSON file, categorized by language.
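For step 2, a sketch of what the transcription node might call, using the `openai-whisper` package from `requirements.txt` (the model size and file path are placeholders):

```python
import whisper

# Load once at startup; "base" trades accuracy for speed.
model = whisper.load_model("base")

def transcribe_audio(audio_path: str) -> str:
    """Transcribe an audio file to text with Whisper."""
    result = model.transcribe(audio_path)
    return result["text"].strip()
```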
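For step 4, a sketch of the translation call via LangChain's ChatGoogleGenerativeAI, assuming `GOOGLE_API_KEY` is set as in `.env.example` (the prompt wording is illustrative):

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

def translate_text(text: str, target_lang: str) -> str:
    """Ask Gemini to translate text into the target language."""
    prompt = (
        f"Translate the following text to {target_lang}. "
        f"Reply with only the translation:\n{text}"
    )
    return llm.invoke(prompt).content
```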
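For step 5, a sketch of the speech-synthesis node using Coqui's `TTS` package, reusing the model name from `.env.example`:

```python
import os
from TTS.api import TTS

# Loading a model is slow; do it once at startup.
tts = TTS(model_name=os.getenv(
    "COQUI_MODEL_NAME", "tts_models/en/ljspeech/tacotron2-DDC"))

def synthesize(text: str, out_path: str = "output.wav") -> str:
    """Render text to a WAV file and return its path."""
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```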
**Phase 4: Integration and Testing**

1. **Connect Frontend and Backend:**
   * Implement API endpoints in the backend to receive text and audio data from the frontend.
   * Send data from the frontend to the backend using HTTP requests.

2. **Test Voice Input Flow:**
   * Record audio in the frontend and verify that it is transcribed correctly, translated (if applicable), and played back.

3. **Test Text Input Flow:**
   * Enter text in the frontend and verify that it is translated (if applicable) and displayed correctly.

4. **Test Chat Mode:**
   * Engage in a conversation with the chatbot and verify that it generates appropriate responses.

5. **Test Dictionary Functionality:**
   * Add words and phrases to the dictionary and verify that they are stored correctly.
   * Test the "Download" option.

**Phase 5: Refinement and Enhancements**

1. **Implement Error Handling:**
   * Add error handling for API failures in Whisper, Coqui, and LLM processing.

2. **Optimize Performance:**
   * Optimize the performance of the transcription, translation, and speech synthesis processes.

3. **Improve UI:**
   * Refine the UI based on user feedback.

4. **Implement Future Enhancements:**
   * Implement multi-language support for the UI and translations.
   * Integrate with spaced repetition for better vocabulary retention.
   * Implement speech recognition feedback to correct pronunciation.
instructions/instructions.MD
CHANGED
@@ -65,21 +65,21 @@ streamlit_app/
 1. User records audio in Streamlit.
 2. Whisper transcribes audio to text.
 3. The system checks for intent (chat or translation request).
-
+If translation:
 - The system detects the language to translate text to.
 - Translate text via LLM (LangChain's ChatGoogleGenerativeAI - Gemini-2.0-Flash).
 - Convert translated text to speech using Coqui.
 - Store phrases/words categorized by language for learning.
-
+If chat:
 - Generate a response using LLM (LangChain's ChatGoogleGenerativeAI - Gemini-2.0-Flash).
 - Display text response.
 
 #### **3.2 Text Input Flow**
 
 1. User enters text manually.
-2. System detects the language of the input text.
 3. System determines intent (chat or translation).
 4. If translation:
+- The system detects the language to translate text to.
 - LLM translates text (LangChain's ChatGoogleGenerativeAI - Gemini-2.0-Flash).
 - Display translated text.
 - Convert to speech via Coqui.

@@ -120,6 +120,11 @@ streamlit_app/
 
 - Example: `examples/coqui-app.py`
 
+- **Langgraph**
+- [How to add cross-thread persistence to your graph](https://langchain-ai.github.io/langgraph/how-tos/cross-thread-persistence/)
+
+
 - **LLM Processing:**
 
 - [LangChain's ChatGoogleGenerativeAI Documentation](https://docs.litellm.ai/docs/)

@@ -128,7 +133,6 @@ streamlit_app/
 
 ## Developer Notes
 
-- Ensure **low latency** in voice-to-text and text-to-voice conversion.
 - Implement **error handling** for API failures in Whisper, Coqui, and LLM processing.
 - Use **LangGraph** to maintain conversation context and manage workflow dependencies.
 - **Persist user dictionary** (words/phrases) categorized by language for personalized learning.
requirements.txt
ADDED
@@ -0,0 +1,9 @@
streamlit>=1.30.0
langgraph>=0.0.15
langchain>=0.1.0
langchain-google-genai>=0.0.6
openai-whisper>=20231117
TTS>=0.22.0  # Coqui TTS
google-cloud-texttospeech>=2.14.1
python-dotenv>=1.0.0
requests>=2.31.0
streamlit_app/graph/workflow.py
ADDED
@@ -0,0 +1,86 @@
from typing import Dict, TypedDict
from langgraph.graph import StateGraph, END
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Define state schema
class State(TypedDict):
    input: str
    input_type: str  # "text" or "audio"
    intent: str  # "chat" or "translation"
    transcription: str
    translation: str
    response: str
    audio_response: bytes

def create_workflow() -> StateGraph:
    # Initialize workflow graph
    workflow = StateGraph(State)

    # Define nodes (placeholder implementations)

    def transcribe(state: State) -> State:
        # TODO: Implement Whisper transcription
        if state["input_type"] == "audio":
            state["transcription"] = state["input"]  # Placeholder
        return state

    def detect_intent(state: State) -> State:
        # TODO: Implement proper intent detection
        # For now, assume translation if input starts with "translate:"
        text = state["transcription"] if state["input_type"] == "audio" else state["input"]
        state["intent"] = "translation" if text.lower().startswith("translate:") else "chat"
        return state

    def route_intent(state: State) -> str:
        # Routing function for the conditional edge: returns the branch key
        return state["intent"]

    def translate(state: State) -> State:
        # TODO: Implement translation using Gemini
        text = state["transcription"] if state["input_type"] == "audio" else state["input"]
        text = text.replace("translate:", "").strip()
        state["translation"] = f"Translation placeholder: {text}"
        return state

    def chat_response(state: State) -> State:
        # TODO: Implement chat using Gemini
        text = state["transcription"] if state["input_type"] == "audio" else state["input"]
        state["response"] = f"Chat response placeholder: {text}"
        return state

    def synthesize_speech(state: State) -> State:
        # TODO: Implement Coqui TTS
        state["audio_response"] = b""  # Placeholder
        return state

    # Add nodes to graph
    workflow.add_node("transcribe", transcribe)
    workflow.add_node("detect_intent", detect_intent)
    workflow.add_node("translate", translate)
    workflow.add_node("chat_response", chat_response)
    workflow.add_node("synthesize_speech", synthesize_speech)

    # Define edges: nodes return state, and route_intent picks the branch
    workflow.add_edge("transcribe", "detect_intent")
    workflow.add_conditional_edges(
        "detect_intent",
        route_intent,
        {
            "translation": "translate",
            "chat": "chat_response",
        },
    )
    workflow.add_edge("translate", "synthesize_speech")
    workflow.add_edge("synthesize_speech", END)
    workflow.add_edge("chat_response", END)

    # Set entry point
    workflow.set_entry_point("transcribe")

    return workflow

# Create singleton instance
workflow = create_workflow()
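The module exports an uncompiled `StateGraph`, so a caller must compile it before use. A sketch of one turn through the placeholder graph (`compile()` and `invoke()` are standard LangGraph calls; the initial state values are illustrative):

```python
from streamlit_app.graph.workflow import workflow

app = workflow.compile()  # turn the StateGraph into a runnable graph

result = app.invoke({
    "input": "translate: good morning",
    "input_type": "text",
    "intent": "",
    "transcription": "",
    "translation": "",
    "response": "",
    "audio_response": b"",
})
print(result["translation"])  # Translation placeholder: good morning
```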
streamlit_app/main.py
ADDED
@@ -0,0 +1,74 @@
import streamlit as st
from pathlib import Path
import json
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize session state
if "messages" not in st.session_state:
    st.session_state.messages = []
if "dictionary" not in st.session_state:
    st.session_state.dictionary = {}

def initialize_app():
    st.set_page_config(
        page_title="Polyglot - AI Language Learning Assistant",
        page_icon="🗣️",
        layout="wide"
    )
    st.title("🗣️ Polyglot - AI Language Learning Assistant")

def create_sidebar():
    with st.sidebar:
        st.title("Learning Tools")

        if st.button("📚 Dictionary"):
            # TODO: Implement dictionary view
            st.session_state.current_view = "dictionary"

        if st.button("📝 Phrases"):
            # TODO: Implement phrases view
            st.session_state.current_view = "phrases"

        st.markdown("---")
        if st.button("⬇️ Download Progress"):
            # TODO: Implement download functionality
            st.text_input("Enter your email:", key="download_email")

def display_chat_interface():
    # Display chat messages
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # Chat input
    if prompt := st.chat_input("Type your message here..."):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)

        # TODO: Process user input through LangGraph workflow
        # For now, just echo the input
        with st.chat_message("assistant"):
            response = f"Echo: {prompt}"
            st.markdown(response)
            st.session_state.messages.append({"role": "assistant", "content": response})

def main():
    initialize_app()
    create_sidebar()

    # Main chat interface
    display_chat_interface()

    # Audio input placeholder
    with st.expander("🎤 Voice Input"):
        st.write("Audio recording functionality coming soon...")
        # TODO: Implement audio recording

if __name__ == "__main__":
    main()
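One plausible way to fill in the workflow TODO in `display_chat_interface` (a sketch, not part of the commit: `st.cache_resource` keeps the compiled graph alive across Streamlit reruns, and the routing mirrors the placeholder nodes in `workflow.py`):

```python
import streamlit as st
from streamlit_app.graph.workflow import workflow

@st.cache_resource
def get_app():
    # Compile once; Streamlit reruns this script on every interaction.
    return workflow.compile()

def process_prompt(prompt: str) -> str:
    """Route one user message through the LangGraph workflow."""
    state = get_app().invoke({
        "input": prompt,
        "input_type": "text",
        "intent": "",
        "transcription": "",
        "translation": "",
        "response": "",
        "audio_response": b"",
    })
    if state["intent"] == "translation":
        return state["translation"]
    return state["response"]
```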
streamlit_app/state/dictionary_manager.py
ADDED
@@ -0,0 +1,50 @@
import json
from pathlib import Path
from typing import Dict, List, Optional

class DictionaryManager:
    def __init__(self, storage_path: Optional[str] = None):
        self.storage_path = Path(storage_path or "data/dictionary.json")
        self.storage_path.parent.mkdir(parents=True, exist_ok=True)
        self.dictionary: Dict[str, Dict[str, List[str]]] = self._load_dictionary()

    def _load_dictionary(self) -> Dict[str, Dict[str, List[str]]]:
        """Load dictionary from file or create new if doesn't exist"""
        if self.storage_path.exists():
            with open(self.storage_path, 'r', encoding='utf-8') as f:
                return json.load(f)
        return {}

    def _save_dictionary(self):
        """Save dictionary to file"""
        with open(self.storage_path, 'w', encoding='utf-8') as f:
            json.dump(self.dictionary, f, ensure_ascii=False, indent=2)

    def add_entry(self, source_lang: str, target_lang: str, word: str, translation: str):
        """Add a word and its translation to the dictionary"""
        if source_lang not in self.dictionary:
            self.dictionary[source_lang] = {}

        if target_lang not in self.dictionary[source_lang]:
            self.dictionary[source_lang][target_lang] = []

        entry = f"{word} → {translation}"
        if entry not in self.dictionary[source_lang][target_lang]:
            self.dictionary[source_lang][target_lang].append(entry)
            self._save_dictionary()

    def get_entries(self, source_lang: str, target_lang: str) -> List[str]:
        """Get all entries for a language pair"""
        return self.dictionary.get(source_lang, {}).get(target_lang, [])

    def export_dictionary(self) -> str:
        """Export dictionary as formatted string"""
        output = []
        for source_lang, translations in self.dictionary.items():
            output.append(f"# {source_lang.upper()} Dictionary")
            for target_lang, entries in translations.items():
                output.append(f"\n## Translations to {target_lang.upper()}")
                for entry in entries:
                    output.append(f"- {entry}")
            output.append("\n")
        return "\n".join(output)
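Example usage of `DictionaryManager` (the words are illustrative; the default storage path comes from the constructor):

```python
manager = DictionaryManager()  # persists to data/dictionary.json
manager.add_entry("en", "es", "good morning", "buenos días")

print(manager.get_entries("en", "es"))  # ['good morning → buenos días']
print(manager.export_dictionary())      # markdown-style export for download
```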