Spaces: Build error

Commit 76ed5b1
Parent(s): 90b1256
updated instructions

Files changed:
- .env.example +6 -0
- README.md +50 -0
- instructions/Architecture.MD +83 -0
- instructions/Step_By_Step_Instructions.MD +93 -0
- instructions/instructions.MD +8 -4
- requirements.txt +9 -0
- streamlit_app/graph/workflow.py +86 -0
- streamlit_app/main.py +74 -0
- streamlit_app/state/dictionary_manager.py +50 -0
.env.example
ADDED
@@ -0,0 +1,6 @@
# Google API Key for Gemini
GOOGLE_API_KEY=your_google_api_key

# Optional: Coqui TTS model settings
COQUI_MODEL_NAME=tts_models/en/ljspeech/tacotron2-DDC
COQUI_VOCODER_NAME=vocoder_models/en/ljspeech/hifigan_v2
README.md
ADDED
@@ -0,0 +1,50 @@
# Polyglot - AI Language Learning Assistant

An AI-powered language learning assistant that helps users learn foreign languages through interactive chat. The application supports both text and voice interactions, leveraging state-of-the-art AI models for transcription, translation, and speech synthesis.

## Features

- Voice and text input support
- Real-time translation
- Interactive chat with AI language tutor
- Personal dictionary and phrase storage
- Text-to-speech for pronunciation practice

## Tech Stack

- Frontend: Streamlit
- Backend: LangGraph, LangChain
- Speech Processing: Whisper (STT), Coqui (TTS)
- LLM: Google Gemini via LangChain

## Setup

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Set up environment variables:
Create a `.env` file with the following:
```
GOOGLE_API_KEY=your_google_api_key
```

3. Run the application:
```bash
streamlit run streamlit_app/main.py
```

## Project Structure

```
streamlit_app/
├── graph/       # LangGraph implementation
├── state/       # Application state management
├── components/  # Streamlit components
└── utils/       # Utility functions
```

## Contributing

Feel free to submit issues and enhancement requests!
instructions/Architecture.MD
ADDED
@@ -0,0 +1,83 @@
**I. Overall Architecture**

Polyglot will adopt a modular architecture, separating the frontend (Streamlit app), backend (LangGraph workflows), and external services (Whisper, Coqui, LLM). This promotes maintainability, scalability, and testability.

**II. Component Breakdown**

1. **Frontend (Streamlit App):**
   * Responsible for user interaction (text input, audio recording, chat display).
   * Handles UI elements like the chat interface, sidebar (dictionary, phrases), and download option.
   * Communicates with the backend via API calls.
   * Key files: `streamlit_app/main.py` (entry point), UI components in dedicated modules.

2. **Backend (LangGraph Workflows):**
   * Orchestrates the overall workflow, managing dependencies between different components.
   * Handles intent detection (chat vs. translation).
   * Manages conversation context (multi-turn conversations).
   * Persists user dictionaries.
   * Key files: `streamlit_app/graph/workflow.py`, `streamlit_app/graph/nodes.py` (individual processing steps).

3. **Audio Processing:**
   * **Whisper:** Transcribes audio to text. Integrated as an external service called by the backend.
   * **Coqui:** Converts text to speech. Integrated as an external service called by the backend.

4. **LLM (LangChain's ChatGoogleGenerativeAI):**
   * Provides translation and conversational AI functionality.
   * Integrated via LangChain.

5. **Data Storage:**
   * User dictionaries (words/phrases categorized by language) will be persisted. The specific database to use is not defined, so a simple JSON file is assumed for now.

**III. Workflow**

The instructions.MD document describes two main workflows: Voice Input and Text Input. LangGraph will manage the state and transitions between steps in these workflows.

**Voice Input Flow:**

1. Streamlit captures audio.
2. Streamlit sends the audio to the backend.
3. The backend uses Whisper to transcribe the audio to text.
4. The backend determines intent (chat or translation).
5. If translation:
   * The backend detects the language.
   * The backend uses the LLM to translate the text.
   * The backend uses Coqui to convert the translated text to speech.
   * The backend stores phrases/words in the dictionary.
   * The backend sends the translated text and audio to Streamlit for display.
6. If chat:
   * The backend uses the LLM to generate a response.
   * The backend sends the text response to Streamlit for display.

**Text Input Flow:**

1. Streamlit captures text.
2. Streamlit sends the text to the backend.
3. The backend determines intent (chat or translation).
4. If translation:
   * The backend detects the language.
   * The backend uses the LLM to translate the text.
   * The backend uses Coqui to convert the translated text to speech.
   * The backend stores phrases/words in the dictionary.
   * The backend sends the translated text and audio to Streamlit for display.
5. If chat:
   * The backend uses the LLM to generate a response.
   * The backend sends the text response to Streamlit for display.

**IV. Key Considerations**

* **Error Handling:** Implement robust error handling for API failures in Whisper, Coqui, and LLM processing.
* **State Management:** LangGraph will be crucial for maintaining conversation context and managing workflow dependencies.
* **Persistence:** User dictionaries should be persisted to provide a personalized learning experience.
* **Scalability:** The architecture should be designed to handle a growing number of users and languages.

**V. Leveraging Documentation**

The instructions.MD file provides links to relevant documentation for Streamlit, Whisper, Coqui, LangGraph, and LangChain. These resources will be essential for implementing the project.

* [How to add cross-thread persistence to your graph](https://langchain-ai.github.io/langgraph/how-tos/cross-thread-persistence/)
* [LangChain's ChatGoogleGenerativeAI Documentation](https://docs.litellm.ai/docs/)
* [Streamlit Chat Guide](https://docs.streamlit.io/develop/tutorials/chat-and-llm-apps/build-conversational-apps)
* [Whisper (Speech-to-Text)](https://github.com/openai/whisper)
* [Coqui (Text-to-Speech)](https://docs.coqui.ai/en/latest/)
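Since cross-thread persistence is linked above, here is a minimal sketch of that pattern. This is an assumption, not part of the commit: it relies on the `BaseStore` API from langgraph releases newer than the `0.0.15` floor pinned in `requirements.txt`, and the node name and namespace layout are hypothetical.

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()  # swap for a persistent store in production

def save_word(state, config, *, store):
    # Namespacing by user id lets dictionary entries survive across chat threads.
    user_id = config["configurable"]["user_id"]
    namespace = ("dictionary", user_id)
    store.put(namespace, "good_morning", {"es": "buenos días"})
    return state

# The store is supplied when the graph is compiled:
# graph = workflow.compile(store=store)
```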
instructions/Step_By_Step_Instructions.MD
ADDED
@@ -0,0 +1,93 @@
**Phase 1: Project Setup and Core Dependencies**

1. **Create Directory Structure:**
   * Create the following directory structure inside `streamlit_app`:
   ```
   streamlit_app/
   ├── graph/       # LangGraph implementation
   ├── state/       # Application state management
   ├── components/  # Streamlit components
   └── utils/       # Utility functions
   ```

**Phase 2: Frontend (Streamlit App) Development**

1. **Create `streamlit_app/main.py`:**
   * Set up the basic Streamlit app structure.
   * Add a title and a simple chat interface with a text input field.

2. **Implement Chat Interface:**
   * Create a function to display chat messages.
   * Create a function to handle user input and send it to the backend.
   * Use Streamlit's `st.session_state` to manage chat history.

3. **Implement Audio Recording:**
   * Use Streamlit's audio recording component (or a custom component) to allow users to record audio (see the sketch after this list).
   * Handle audio data and send it to the backend.

4. **Implement Sidebar:**
   * Add a sidebar with "Dictionary" and "Phrases" buttons.
   * Create placeholder functions for dictionary and phrases views.
   * Implement the "Download" option with email prompt (basic implementation, can be refined later).
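A minimal sketch of the recording step from item 3, assuming a recent Streamlit release that ships `st.audio_input` (older releases would need a community audio-recorder component):

```python
import streamlit as st

# st.audio_input returns an UploadedFile-like object containing WAV bytes,
# or None until the user records something.
audio = st.audio_input("Record a voice message")

if audio is not None:
    wav_bytes = audio.read()  # raw audio to hand to the transcription node
    st.audio(wav_bytes)       # immediate playback for confirmation
```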
**Phase 3: Backend (LangGraph) Development**

1. **Create `streamlit_app/graph/workflow.py`:**
   * Set up the basic LangGraph workflow.
   * Define the nodes for transcription, intent detection, translation, speech synthesis, and response generation.

2. **Implement Transcription Node:**
   * Integrate Whisper to transcribe audio to text (see the Whisper sketch after this list).
   * Handle API calls to Whisper.

3. **Implement Intent Detection Node:**
   * Use a simple rule-based approach or a small LLM to determine whether the user's intent is chat or translation.

4. **Implement Translation Node:**
   * Integrate LangChain's ChatGoogleGenerativeAI to translate text (see the Gemini sketch after this list).
   * Handle API calls to the LLM.

5. **Implement Speech Synthesis Node:**
   * Integrate Coqui to convert translated text to speech (see the Coqui sketch after this list).
   * Handle API calls to Coqui.

6. **Implement Response Generation Node:**
   * Use LangChain's ChatGoogleGenerativeAI to generate chat responses.
   * Handle API calls to the LLM.

7. **Implement Dictionary Storage:**
   * Create a function to store phrases and words in a JSON file, categorized by language.
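For step 2, a sketch of what the transcription node might call, using the `openai-whisper` package from `requirements.txt` (the model size and file path are placeholders):

```python
import whisper

# Load once at startup; "base" trades accuracy for speed.
model = whisper.load_model("base")

def transcribe_audio(audio_path: str) -> str:
    """Transcribe an audio file to text with Whisper."""
    result = model.transcribe(audio_path)
    return result["text"].strip()
```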
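For step 4, a sketch of the translation call via LangChain's ChatGoogleGenerativeAI, assuming `GOOGLE_API_KEY` is set as in `.env.example` (the prompt wording is illustrative):

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

def translate_text(text: str, target_lang: str) -> str:
    """Ask Gemini to translate text into the target language."""
    prompt = (
        f"Translate the following text to {target_lang}. "
        f"Reply with only the translation:\n{text}"
    )
    return llm.invoke(prompt).content
```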
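For step 5, a sketch of the speech-synthesis node using Coqui's `TTS` package, reusing the model name from `.env.example`:

```python
import os
from TTS.api import TTS

# Loading a model is slow; do it once at startup.
tts = TTS(model_name=os.getenv(
    "COQUI_MODEL_NAME", "tts_models/en/ljspeech/tacotron2-DDC"))

def synthesize(text: str, out_path: str = "output.wav") -> str:
    """Render text to a WAV file and return its path."""
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```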
**Phase 4: Integration and Testing**

1. **Connect Frontend and Backend:**
   * Implement API endpoints in the backend to receive text and audio data from the frontend.
   * Send data from the frontend to the backend using HTTP requests.

2. **Test Voice Input Flow:**
   * Record audio in the frontend and verify that it is transcribed correctly, translated (if applicable), and played back.

3. **Test Text Input Flow:**
   * Enter text in the frontend and verify that it is translated (if applicable) and displayed correctly.

4. **Test Chat Mode:**
   * Engage in a conversation with the chatbot and verify that it generates appropriate responses.

5. **Test Dictionary Functionality:**
   * Add words and phrases to the dictionary and verify that they are stored correctly.
   * Test the "Download" option.

**Phase 5: Refinement and Enhancements**

1. **Implement Error Handling:**
   * Add error handling for API failures in Whisper, Coqui, and LLM processing.

2. **Optimize Performance:**
   * Optimize the performance of the transcription, translation, and speech synthesis processes.

3. **Improve UI:**
   * Refine the UI based on user feedback.

4. **Implement Future Enhancements:**
   * Implement multi-language support for the UI and translations.
   * Integrate with spaced repetition for better vocabulary retention.
   * Implement speech recognition feedback to correct pronunciation.
instructions/instructions.MD
CHANGED
@@ -65,21 +65,21 @@ streamlit_app/
 1. User records audio in Streamlit.
 2. Whisper transcribes audio to text.
 3. The system checks for intent (chat or translation request).
-
+If translation:
 - The system detects the language to translate text to.
 - Translate text via LLM (LangChain's ChatGoogleGenerativeAI - Gemini-2.0-Flash).
 - Convert translated text to speech using Coqui.
 - Store phrases/words categorized by language for learning.
-
+If chat:
 - Generate a response using LLM (LangChain's ChatGoogleGenerativeAI - Gemini-2.0-Flash).
 - Display text response.
 
 #### **3.2 Text Input Flow**
 
 1. User enters text manually.
-2. System detects the language of the input text.
 3. System determines intent (chat or translation).
 4. If translation:
+- The system detects the language to translate text to.
 - LLM translates text (LangChain's ChatGoogleGenerativeAI - Gemini-2.0-Flash).
 - Display translated text.
 - Convert to speech via Coqui.

@@ -120,6 +120,11 @@ streamlit_app/
 
 - Example: `examples/coqui-app.py`
 
+- **Langgraph**
+- [How to add cross-thread persistence to your graph](https://langchain-ai.github.io/langgraph/how-tos/cross-thread-persistence/)
+
+
 - **LLM Processing:**
 
 - [LangChain's ChatGoogleGenerativeAI Documentation](https://docs.litellm.ai/docs/)

@@ -128,7 +133,6 @@ streamlit_app/
 
 ## Developer Notes
 
-- Ensure **low latency** in voice-to-text and text-to-voice conversion.
 - Implement **error handling** for API failures in Whisper, Coqui, and LLM processing.
 - Use **LangGraph** to maintain conversation context and manage workflow dependencies.
 - **Persist user dictionary** (words/phrases) categorized by language for personalized learning.
requirements.txt
ADDED
@@ -0,0 +1,9 @@
streamlit>=1.30.0
langgraph>=0.0.15
langchain>=0.1.0
langchain-google-genai>=0.0.6
openai-whisper>=20231117
TTS>=0.22.0  # Coqui TTS
google-cloud-texttospeech>=2.14.1
python-dotenv>=1.0.0
requests>=2.31.0
streamlit_app/graph/workflow.py
ADDED
@@ -0,0 +1,86 @@
from typing import Dict, TypedDict
from langgraph.graph import StateGraph, END
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Define state schema
class State(TypedDict):
    input: str
    input_type: str  # "text" or "audio"
    intent: str  # "chat" or "translation"
    transcription: str
    translation: str
    response: str
    audio_response: bytes

def create_workflow() -> StateGraph:
    # Initialize workflow graph
    workflow = StateGraph(State)

    # Define nodes (placeholder implementations)

    def transcribe(state: State) -> State:
        # TODO: Implement Whisper transcription
        if state["input_type"] == "audio":
            state["transcription"] = state["input"]  # Placeholder
        return state

    def detect_intent(state: State) -> State:
        # TODO: Implement proper intent detection
        # For now, assume translation if input starts with "translate:"
        text = state["transcription"] if state["input_type"] == "audio" else state["input"]
        state["intent"] = "translation" if text.lower().startswith("translate:") else "chat"
        return state

    def route_intent(state: State) -> str:
        # Routing function for the conditional edge: returns the branch key
        return state["intent"]

    def translate(state: State) -> State:
        # TODO: Implement translation using Gemini
        text = state["transcription"] if state["input_type"] == "audio" else state["input"]
        text = text.replace("translate:", "").strip()
        state["translation"] = f"Translation placeholder: {text}"
        return state

    def chat_response(state: State) -> State:
        # TODO: Implement chat using Gemini
        text = state["transcription"] if state["input_type"] == "audio" else state["input"]
        state["response"] = f"Chat response placeholder: {text}"
        return state

    def synthesize_speech(state: State) -> State:
        # TODO: Implement Coqui TTS
        state["audio_response"] = b""  # Placeholder
        return state

    # Add nodes to graph
    workflow.add_node("transcribe", transcribe)
    workflow.add_node("detect_intent", detect_intent)
    workflow.add_node("translate", translate)
    workflow.add_node("chat_response", chat_response)
    workflow.add_node("synthesize_speech", synthesize_speech)

    # Define edges: nodes return state, and route_intent picks the branch
    workflow.add_edge("transcribe", "detect_intent")
    workflow.add_conditional_edges(
        "detect_intent",
        route_intent,
        {
            "translation": "translate",
            "chat": "chat_response",
        },
    )
    workflow.add_edge("translate", "synthesize_speech")
    workflow.add_edge("synthesize_speech", END)
    workflow.add_edge("chat_response", END)

    # Set entry point
    workflow.set_entry_point("transcribe")

    return workflow

# Create singleton instance
workflow = create_workflow()
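The module exports an uncompiled `StateGraph`, so a caller must compile it before use. A sketch of one turn through the placeholder graph (`compile()` and `invoke()` are standard LangGraph calls; the initial state values are illustrative):

```python
from streamlit_app.graph.workflow import workflow

app = workflow.compile()  # turn the StateGraph into a runnable graph

result = app.invoke({
    "input": "translate: good morning",
    "input_type": "text",
    "intent": "",
    "transcription": "",
    "translation": "",
    "response": "",
    "audio_response": b"",
})
print(result["translation"])  # Translation placeholder: good morning
```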
streamlit_app/main.py
ADDED
@@ -0,0 +1,74 @@
import streamlit as st
from pathlib import Path
import json
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize session state
if "messages" not in st.session_state:
    st.session_state.messages = []
if "dictionary" not in st.session_state:
    st.session_state.dictionary = {}

def initialize_app():
    st.set_page_config(
        page_title="Polyglot - AI Language Learning Assistant",
        page_icon="🗣️",
        layout="wide"
    )
    st.title("🗣️ Polyglot - AI Language Learning Assistant")

def create_sidebar():
    with st.sidebar:
        st.title("Learning Tools")

        if st.button("📚 Dictionary"):
            # TODO: Implement dictionary view
            st.session_state.current_view = "dictionary"

        if st.button("📝 Phrases"):
            # TODO: Implement phrases view
            st.session_state.current_view = "phrases"

        st.markdown("---")
        if st.button("⬇️ Download Progress"):
            # TODO: Implement download functionality
            st.text_input("Enter your email:", key="download_email")

def display_chat_interface():
    # Display chat messages
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # Chat input
    if prompt := st.chat_input("Type your message here..."):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        with st.chat_message("user"):
            st.markdown(prompt)

        # TODO: Process user input through LangGraph workflow
        # For now, just echo the input
        with st.chat_message("assistant"):
            response = f"Echo: {prompt}"
            st.markdown(response)
            st.session_state.messages.append({"role": "assistant", "content": response})

def main():
    initialize_app()
    create_sidebar()

    # Main chat interface
    display_chat_interface()

    # Audio input placeholder
    with st.expander("🎤 Voice Input"):
        st.write("Audio recording functionality coming soon...")
        # TODO: Implement audio recording

if __name__ == "__main__":
    main()
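One plausible way to fill in the workflow TODO in `display_chat_interface` (a sketch, not part of the commit: `st.cache_resource` keeps the compiled graph alive across Streamlit reruns, and the routing mirrors the placeholder nodes in `workflow.py`):

```python
import streamlit as st
from streamlit_app.graph.workflow import workflow

@st.cache_resource
def get_app():
    # Compile once; Streamlit reruns this script on every interaction.
    return workflow.compile()

def process_prompt(prompt: str) -> str:
    """Route one user message through the LangGraph workflow."""
    state = get_app().invoke({
        "input": prompt,
        "input_type": "text",
        "intent": "",
        "transcription": "",
        "translation": "",
        "response": "",
        "audio_response": b"",
    })
    if state["intent"] == "translation":
        return state["translation"]
    return state["response"]
```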
streamlit_app/state/dictionary_manager.py
ADDED
@@ -0,0 +1,50 @@
import json
from pathlib import Path
from typing import Dict, List, Optional

class DictionaryManager:
    def __init__(self, storage_path: Optional[str] = None):
        self.storage_path = Path(storage_path or "data/dictionary.json")
        self.storage_path.parent.mkdir(parents=True, exist_ok=True)
        self.dictionary: Dict[str, Dict[str, List[str]]] = self._load_dictionary()

    def _load_dictionary(self) -> Dict[str, Dict[str, List[str]]]:
        """Load dictionary from file or create new if doesn't exist"""
        if self.storage_path.exists():
            with open(self.storage_path, 'r', encoding='utf-8') as f:
                return json.load(f)
        return {}

    def _save_dictionary(self):
        """Save dictionary to file"""
        with open(self.storage_path, 'w', encoding='utf-8') as f:
            json.dump(self.dictionary, f, ensure_ascii=False, indent=2)

    def add_entry(self, source_lang: str, target_lang: str, word: str, translation: str):
        """Add a word and its translation to the dictionary"""
        if source_lang not in self.dictionary:
            self.dictionary[source_lang] = {}

        if target_lang not in self.dictionary[source_lang]:
            self.dictionary[source_lang][target_lang] = []

        entry = f"{word} → {translation}"
        if entry not in self.dictionary[source_lang][target_lang]:
            self.dictionary[source_lang][target_lang].append(entry)
            self._save_dictionary()

    def get_entries(self, source_lang: str, target_lang: str) -> List[str]:
        """Get all entries for a language pair"""
        return self.dictionary.get(source_lang, {}).get(target_lang, [])

    def export_dictionary(self) -> str:
        """Export dictionary as formatted string"""
        output = []
        for source_lang, translations in self.dictionary.items():
            output.append(f"# {source_lang.upper()} Dictionary")
            for target_lang, entries in translations.items():
                output.append(f"\n## Translations to {target_lang.upper()}")
                for entry in entries:
                    output.append(f"- {entry}")
            output.append("\n")
        return "\n".join(output)
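Example usage of `DictionaryManager` (the words are illustrative; the default storage path comes from the constructor):

```python
manager = DictionaryManager()  # persists to data/dictionary.json
manager.add_entry("en", "es", "good morning", "buenos días")

print(manager.get_entries("en", "es"))  # ['good morning → buenos días']
print(manager.export_dictionary())      # markdown-style export for download
```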