yomitalk

KyosukeIchikawa committed on Jun 6, 2025

Commit

5e96c67

1 Parent(s): 03fd559

feat: Implement comprehensive session persistence system

- Add session state serialization and deserialization methods to UserSession
- Implement file-based session persistence with JSON storage
- Update app.py to restore existing sessions on browser reconnection
- Add automatic session saving on all state changes
- Exclude API keys from persistence for security
- Add comprehensive session persistence test suite
- Update documentation with session management architecture
- Add CLAUDE.md reference to design.md
- Add development rule prohibiting --no-verify
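
The serialization idea in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in `yomitalk/user_session.py`: `SketchSession` and its fields are hypothetical names chosen for the example, and the only point it makes is that the API key never enters the serialized dictionary.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchSession:
    """Hypothetical stand-in for UserSession; field names are illustrative."""

    session_id: str
    api_type: Optional[str] = None
    max_tokens: int = 1000
    api_key: Optional[str] = field(default=None, repr=False)  # never persisted

    def to_dict(self) -> Dict[str, Any]:
        # Only non-secret settings are written out; api_key is left out on purpose.
        return {
            "session_id": self.session_id,
            "api_type": self.api_type,
            "max_tokens": self.max_tokens,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchSession":
        # A restored session starts without an API key, so the user must re-enter it.
        return cls(
            session_id=data["session_id"],
            api_type=data.get("api_type"),
            max_tokens=data.get("max_tokens", 1000),
        )


original = SketchSession("abc123", api_type="openai", max_tokens=2000, api_key="secret")
payload = json.dumps(original.to_dict())
restored = SketchSession.from_dict(json.loads(payload))
```

Keeping the exclusion inside `to_dict` means every caller that persists the session gets the same guarantee without having to filter fields itself.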

Files changed (7)

CLAUDE.md +165 -0
docs/design.md +33 -0
tests/unit/test_cleanup_old_sessions.py +2 -2
tests/unit/test_session_persistence.py +319 -0
tests/unit/test_text_processor.py +8 -3
yomitalk/app.py +23 -2
yomitalk/user_session.py +189 -1

CLAUDE.md ADDED

	@@ -0,0 +1,165 @@

+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## 📖 **Important**: Read the Design Document First
+Before working on this codebase, **read [docs/design.md](docs/design.md)** for comprehensive architectural overview, including:
+- Session management system design
+- Component architecture and patterns
+- Multi-user session isolation
+- State persistence implementation
+- Testing strategies and patterns
+## Essential Commands
+### Setup and Environment
+```bash
+make setup              # Complete setup: deps, VOICEVOX, lint tools, pre-commit
+make venv              # Create virtual environment only
+make install           # Install Python packages only
+make download-voicevox-core  # Download VOICEVOX Core for audio generation
+```
+### Development
+```bash
+make run               # Start the Gradio application on port 7860
+make lint              # Run flake8 and mypy static analysis
+make format            # Auto-format code with black, isort, autoflake, autopep8
+```
+### Testing
+```bash
+make test              # Run all tests (unit + E2E)
+make test-unit         # Run unit tests only
+make test-e2e          # Run E2E tests (sets E2E_TEST_MODE=true)
+make test-staged       # Run tests only for staged files
+```
+### Pre-commit Hooks
+```bash
+make pre-commit-install  # Install pre-commit hooks
+make pre-commit-run     # Run pre-commit hooks manually
+```
+## Architecture Overview
+**📋 For detailed architecture information, see [docs/design.md](docs/design.md)**
+### Session-Based Multi-User Design
+- **UserSession**: Each user gets isolated state with unique session directories
+- **Global Resources**: VOICEVOX Core manager shared across users for performance
+- **Session Cleanup**: Automatic cleanup of sessions older than 1 day
+- **File Isolation**: Per-session temp/output directories under `data/{temp,output}/{session_id}/`
+- **State Persistence**: Automatic save/restore of session state via JSON serialization
+### Component Architecture
+The codebase follows a clean component separation:
+- **TextProcessor** (`yomitalk/components/text_processor.py`): LLM integration and script generation
+- **AudioGenerator** (`yomitalk/components/audio_generator.py`): VOICEVOX integration and audio synthesis
+- **ContentExtractor** (`yomitalk/components/content_extractor.py`): File/URL content extraction
+- **PromptManager** (`yomitalk/prompt_manager.py`): Template-based prompt generation
+### Dual LLM Support
+- **Unified Interface**: Both OpenAI and Gemini models implement the same interface
+- **Runtime Switching**: Users can switch between APIs during their session
+- **Template System**: Jinja2 templates in `yomitalk/templates/` for different document types
+- **Character Mapping**: Dynamic character assignment for dialogue generation
+### Streaming Audio Pipeline
+The audio generation follows a streaming pattern:
+1. **Script Generation**: LLM creates character dialogue
+2. **Character Extraction**: Parse dialogue into character-specific segments
+3. **Streaming Synthesis**: VOICEVOX generates audio chunks yielded immediately
+4. **Final Combination**: In-memory WAV combination for complete audio file
+### Session Persistence System
+- **Automatic Save/Restore**: All user settings persist across browser sessions
+- **Security**: API keys excluded from persistence for security reasons
+- **File Storage**: Session state saved to `data/temp/{session_id}/session_state.json`
+- **Auto-Save Triggers**: Every setting change automatically saves session state
+- **Restoration Info**: Methods to detect missing API keys and session status
+### Key Design Patterns
+- **Session Dependency Injection**: UserSession owns and manages component instances
+- **Enum-Driven Configuration**: Type-safe configuration via Character, DocumentType, PodcastMode enums
+- **Global Singleton**: VOICEVOX Core manager initialized once at startup
+- **Template-Based Generation**: Jinja2 templates for flexible content generation
+## Testing Structure
+### Test Organization
+- **Unit Tests** (`tests/unit/`): Component isolation with mocking
+- **E2E Tests** (`tests/e2e/`): Full user workflows with BDD (Gherkin features)
+- **Playwright Integration**: Browser automation for E2E testing
+- **Test Data**: Isolated test data directories per test type
+### BDD Features
+Located in `tests/e2e/features/`, written in Gherkin syntax:
+- `audio_generation.feature`
+- `file_upload.feature`
+- `script_generation.feature`
+- `text_management.feature`
+- `url_extraction.feature`
+- `voicevox_sharing.feature`
+## Important Implementation Notes
+### VOICEVOX Integration
+- **Global Manager**: One instance shared across all users (expensive to initialize)
+- **Character Support**: Zundamon, Shikoku Metan, Kyushu Sora, Chugoku Usagi, Chubu Tsurugi
+- **English Handling**: Automatic katakana conversion for technical terms
+- **Natural Speech**: Smart word splitting to avoid robotic delivery
+### Session Management
+- **Isolation**: Each user gets completely isolated file system and state
+- **Cleanup**: Automatic cleanup prevents disk space issues
+- **State Persistence**: Audio generation state, LLM configuration maintained per session
+### Error Handling Patterns
+- **Graceful Degradation**: Components fail gracefully with user-friendly messages
+- **Resource Cleanup**: Proper cleanup of session files and temporary data
+- **API Resilience**: Handle LLM API failures and VOICEVOX errors appropriately
+### Development Workflow
+- **TDD Approach**: Write tests before implementation (per project rules)
+- **Trunk-Based Development**: Direct commits to main branch
+- **No --no-verify**: Pre-commit hooks must always run
+- **English Comments**: Code comments and logs in English
+- **Small Commits**: Frequent, small commits preferred
+## Working with This Codebase
+### When Adding Features
+1. **Start with Tests**: Write unit tests first (TDD approach)
+2. **Respect Session Boundaries**: Work within UserSession context
+3. **Use Components**: Leverage existing TextProcessor, AudioGenerator, ContentExtractor
+4. **Follow Templates**: Use PromptManager for any LLM interactions
+5. **Handle Both APIs**: Ensure new features work with both OpenAI and Gemini
+6. **Add Auto-Save**: If your feature modifies session state, add `user_session.auto_save()` calls
+### When Debugging
+1. **Check Session State**: User issues often relate to session-specific state
+2. **Component Boundaries**: Verify component interactions work correctly
+3. **VOICEVOX Status**: Audio issues usually relate to VOICEVOX Core availability
+4. **Template Rendering**: Script generation issues often in Jinja2 templates
+### Development Rules
+- **NEVER use `--no-verify`**: All commits must pass pre-commit hooks
+- **Fix issues properly**: Don't bypass linting, formatting, or type checking
+- **Test before commit**: Ensure all tests pass before committing
+### Performance Considerations
+- **VOICEVOX Shared**: Don't reinitialize VOICEVOX Core per user
+- **Session Cleanup**: Old sessions auto-cleanup, but manual cleanup may be needed
+- **Memory Usage**: Audio generation can be memory-intensive with long content
+- **Streaming**: Use streaming patterns for better user experience
+### File Structure Key Points
+- **Session Directories**: `data/temp/{session_id}/` and `data/output/{session_id}/`
+- **Session State**: `data/temp/{session_id}/session_state.json` for persistence
+- **Templates**: `yomitalk/templates/*.j2` for prompt generation
+- **Components**: `yomitalk/components/` for core functionality
+- **Models**: `yomitalk/models/` for LLM integrations
+- **Common**: `yomitalk/common/` for enums and shared types
+- **Session Management**: `yomitalk/user_session.py` for state persistence
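
The file layout described in CLAUDE.md (`data/temp/{session_id}/session_state.json`) suggests a save/load pair along these lines. The helper names `save_state` and `load_state` are assumptions made for this sketch, and a temporary directory stands in for `data/temp/`; the real methods on `UserSession` may be structured differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

STATE_FILE_NAME = "session_state.json"


def save_state(base_dir: Path, session_id: str, state: Dict[str, Any]) -> bool:
    """Write state to <base_dir>/<session_id>/session_state.json; return success."""
    try:
        session_dir = base_dir / session_id
        session_dir.mkdir(parents=True, exist_ok=True)
        with open(session_dir / STATE_FILE_NAME, "w", encoding="utf-8") as f:
            json.dump(state, f, ensure_ascii=False, indent=2)
        return True
    except OSError:
        return False


def load_state(base_dir: Path, session_id: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if the file is missing or unreadable."""
    state_file = base_dir / session_id / STATE_FILE_NAME
    try:
        with open(state_file, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    saved = save_state(base, "demo_session", {"session_id": "demo_session", "status": "idle"})
    loaded = load_state(base, "demo_session")
    missing = load_state(base, "nonexistent_session")
```

Returning `None` for a missing or corrupt file lets the caller fall back to creating a fresh session instead of handling exceptions.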

docs/design.md CHANGED

@@ -16,6 +16,28 @@
 - pytest/pytest-bdd: test automation and BDD-based E2E tests
 - playwright: browser automation for E2E tests
+## Architecture Overview
+### Session Management System
+- **Multi-user support**: each user keeps an independent session state keyed by the Gradio session hash
+- **State persistence**: user settings and session state are automatically saved to and restored from a JSON file
+- **Security**: API keys are not saved; users must re-enter them when a session is restored
+- **Automatic cleanup**: session directories older than one day are deleted automatically
+### Component Design
+- **UserSession**: core class for session management
+  - Manages independent TextProcessor and AudioGenerator instances for each user
+  - Serializes and deserializes session state
+  - Tracks and restores audio generation progress
+- **Global resource management**: the VOICEVOX Core manager is shared across all users
+- **File isolation**: separate temp and output directory structures for each user
+### State Management Patterns
+- **Gradio State**: session state is managed with `gr.State()`
+- **Auto-save**: the session is saved automatically whenever a setting changes
+- **Restoration**: existing sessions are detected and restored when the application starts
+- **Error handling**: a new session is created if restoration fails
 ## Folder Structure
 - yomitalk/ - main application code
   - common/ - shared data models and definitions
@@ -30,6 +52,7 @@
   - utils/ - utility functions
   - app.py - Gradio app construction
   - prompt_manager.py - prompt management and generation
+  - user_session.py - user session management and state persistence
   - templates/ - template files
     - common_podcast_utils.j2 - shared podcast generation utilities
     - paper_to_podcast.j2 - template for paper explanations
@@ -43,7 +66,11 @@
   - favicon.ico - favicon
 - data/ - temporary data storage
   - temp/ - temporary storage for uploaded files
+    - {session_id}/ - temporary directory for each user session
+      - session_state.json - persisted session state
+      - talks/ - temporary storage for generated audio parts
   - output/ - generated audio files
+    - {session_id}/ - output directory for each user session
   - logs/ - log file storage
 - tests/ - test code
   - data/ - test data
@@ -82,6 +109,11 @@
    - Switching between the OpenAI API and the Google Gemini API
    - Model selection and parameter tuning for each API
    - Token usage display
+9. Session state persistence
+   - Automatic save and restore of user settings and session state
+   - State continuity after a browser refresh or a dropped connection
+   - Retention of all settings except API keys (document type, model settings, characters, etc.)
+   - Restoration of audio generation progress
 ## Coding Rules
 - PEP 8 compliant Python code
@@ -110,6 +142,7 @@
 - Individual verification of each component with unit tests
   - Test files live in the `tests/unit/` directory
   - A separate test file is created for each class or module
+  - Tests for session persistence (`test_session_persistence.py`)
 - API tests using mocks (OpenAI API, Gemini API)
 - Automated tests using sample PDF and text data prepared for testing
 - Automatic CI runs via GitHub workflows
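
The automatic cleanup of session directories older than one day, mentioned in the design document, could look something like the sketch below. It assumes age is judged by each directory's modification time; the project's `cleanup_old_sessions` may use a different criterion, and `remove_stale_session_dirs` is a hypothetical name.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List

ONE_DAY_SECONDS = 24 * 60 * 60


def remove_stale_session_dirs(
    base_dir: Path, max_age_seconds: float = ONE_DAY_SECONDS
) -> List[str]:
    """Delete sub-directories whose mtime is older than max_age_seconds."""
    removed = []
    now = time.time()
    for entry in base_dir.iterdir():
        if entry.is_dir() and now - entry.stat().st_mtime > max_age_seconds:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "fresh_session").mkdir()
    (base / "stale_session").mkdir()
    # Backdate one directory by two days so it qualifies for removal.
    two_days_ago = time.time() - 2 * ONE_DAY_SECONDS
    os.utime(base / "stale_session", (two_days_ago, two_days_ago))
    removed = remove_stale_session_dirs(base)
    remaining = sorted(p.name for p in base.iterdir())
```

Because the persisted `session_state.json` lives inside the per-session temp directory, removing the directory also discards the saved state for that session.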

tests/unit/test_cleanup_old_sessions.py CHANGED

@@ -47,8 +47,8 @@ class TestSessionCleanup:
         user_session = UserSession("test_session_cleanup")
         # Patch the global variables so the test directories are used
-        with patch("yomitalk.session.BASE_TEMP_DIR", test_temp_dir), patch(
-            "yomitalk.session.BASE_OUTPUT_DIR", test_output_dir
+        with patch("yomitalk.user_session.BASE_TEMP_DIR", test_temp_dir), patch(
+            "yomitalk.user_session.BASE_OUTPUT_DIR", test_output_dir
         ):
             # Get the current time
             current_time = int(time.time())

tests/unit/test_session_persistence.py ADDED

	@@ -0,0 +1,319 @@

+"""Test session persistence functionality."""
+import json
+import os
+import shutil
+import tempfile
+from pathlib import Path
+from unittest.mock import patch
+import pytest
+from yomitalk.common import APIType
+from yomitalk.prompt_manager import DocumentType, PodcastMode
+from yomitalk.user_session import UserSession
+class TestSessionPersistence:
+    """Test session persistence functionality."""
+    @pytest.fixture
+    def temp_session_dir(self):
+        """Create a temporary directory for session testing."""
+        temp_dir = Path(tempfile.mkdtemp())
+        yield temp_dir
+        if temp_dir.exists():
+            shutil.rmtree(temp_dir)
+    def test_session_serialization_to_dict(self):
+        """Test session serialization to dictionary."""
+        session = UserSession("test_serialization")
+        # Configure some settings - set API key first to enable API type setting
+        session.text_processor.set_openai_api_key("test_key")
+        session.text_processor.set_api_type(APIType.OPENAI)
+        session.text_processor.set_document_type(DocumentType.PAPER)
+        session.text_processor.set_podcast_mode("section_by_section")
+        session.text_processor.openai_model.set_max_tokens(2000)
+        session.text_processor.openai_model.set_model_name("gpt-4.1-mini")
+        # Serialize to dict
+        session_dict = session.to_dict()
+        # Verify basic structure
+        assert "session_id" in session_dict
+        assert "audio_generation_state" in session_dict
+        assert "text_processor_state" in session_dict
+        assert "last_save_time" in session_dict
+        # Verify session ID
+        assert session_dict["session_id"] == "test_serialization"
+        # Verify text processor state
+        text_state = session_dict["text_processor_state"]
+        assert text_state["current_api_type"] == APIType.OPENAI.value
+        assert text_state["openai_max_tokens"] == 2000
+        assert text_state["openai_model_name"] == "gpt-4.1-mini"
+        # Verify prompt manager state
+        pm_state = text_state["prompt_manager_state"]
+        assert pm_state["current_document_type"] == DocumentType.PAPER.value
+        assert pm_state["current_mode"] == PodcastMode.SECTION_BY_SECTION.value
+    def test_session_deserialization_from_dict(self):
+        """Test session restoration from dictionary."""
+        # Create original session
+        original_session = UserSession("test_deserialization")
+        original_session.text_processor.set_gemini_api_key("test_key")
+        original_session.text_processor.set_api_type(APIType.GEMINI)
+        original_session.text_processor.set_document_type(DocumentType.MANUAL)
+        original_session.text_processor.set_podcast_mode("standard")
+        original_session.text_processor.gemini_model.set_max_tokens(1500)
+        original_session.text_processor.gemini_model.set_model_name(
+            "gemini-2.5-pro-preview-05-06"
+        )
+        # Serialize to dict
+        session_dict = original_session.to_dict()
+        # Restore from dict
+        restored_session = UserSession.from_dict(session_dict)
+        # Verify session was restored correctly
+        assert restored_session.session_id == "test_deserialization"
+        assert restored_session.text_processor.current_api_type == APIType.GEMINI
+        assert (
+            restored_session.text_processor.prompt_manager.current_document_type
+            == DocumentType.MANUAL
+        )
+        assert (
+            restored_session.text_processor.prompt_manager.current_mode
+            == PodcastMode.STANDARD
+        )
+        assert restored_session.text_processor.gemini_model.get_max_tokens() == 1500
+        assert (
+            restored_session.text_processor.gemini_model.model_name
+            == "gemini-2.5-pro-preview-05-06"
+        )
+    def test_session_file_save_and_load(self, temp_session_dir):
+        """Test session save/load to/from file."""
+        # Patch the base directories to use temp directory
+        with patch("yomitalk.user_session.BASE_TEMP_DIR", temp_session_dir):
+            # Create and configure session
+            session = UserSession("test_file_persistence")
+            session.text_processor.set_openai_api_key("test_key")
+            session.text_processor.set_api_type(APIType.OPENAI)
+            session.text_processor.set_document_type(DocumentType.BLOG)
+            session.text_processor.openai_model.set_max_tokens(3000)
+            # Update audio state
+            session.update_audio_generation_state(
+                status="generating", progress=0.5, current_script="Test script content"
+            )
+            # Save to file
+            success = session.save_to_file()
+            assert success is True
+            # Verify file was created
+            session_file = (
+                temp_session_dir / "test_file_persistence" / "session_state.json"
+            )
+            assert session_file.exists()
+            # Verify file content is valid JSON
+            with open(session_file, "r") as f:
+                saved_data = json.load(f)
+            assert saved_data["session_id"] == "test_file_persistence"
+            # Load session from file
+            loaded_session = UserSession.load_from_file("test_file_persistence")
+            assert loaded_session is not None
+            assert loaded_session.session_id == "test_file_persistence"
+            assert loaded_session.text_processor.current_api_type == APIType.OPENAI
+            assert (
+                loaded_session.text_processor.prompt_manager.current_document_type
+                == DocumentType.BLOG
+            )
+            assert loaded_session.text_processor.openai_model.get_max_tokens() == 3000
+            # Verify audio state was restored
+            audio_state = loaded_session.get_audio_generation_status()
+            assert audio_state["status"] == "generating"
+            assert audio_state["progress"] == 0.5
+            assert audio_state["current_script"] == "Test script content"
+    def test_session_load_nonexistent_file(self, temp_session_dir):
+        """Test loading a session that doesn't exist."""
+        with patch("yomitalk.user_session.BASE_TEMP_DIR", temp_session_dir):
+            loaded_session = UserSession.load_from_file("nonexistent_session")
+            assert loaded_session is None
+    def test_session_auto_save(self, temp_session_dir):
+        """Test automatic session saving."""
+        with patch("yomitalk.user_session.BASE_TEMP_DIR", temp_session_dir):
+            session = UserSession("test_auto_save")
+            # Expected location of the persisted state file
+            session_file = temp_session_dir / "test_auto_save" / "session_state.json"
+            # Updating state triggers auto-save, which should write the file
+            session.update_audio_generation_state(status="completed")
+            # File should exist after auto-save
+            assert session_file.exists()
+            # Verify content was saved
+            with open(session_file, "r") as f:
+                saved_data = json.load(f)
+            assert saved_data["audio_generation_state"]["status"] == "completed"
+    def test_session_restoration_info(self):
+        """Test session restoration information."""
+        # Clear any environment API keys for this test
+        with patch.dict(
+            os.environ, {"OPENAI_API_KEY": "", "GOOGLE_API_KEY": ""}, clear=False
+        ):
+            session = UserSession("test_restoration_info")
+            # Get restoration info
+            info = session.get_session_restoration_info()
+            # Verify structure
+            assert "session_id" in info
+            assert "missing_api_keys" in info
+            assert "current_api_type" in info
+            assert "has_generated_audio" in info
+            assert "last_save_time" in info
+            # Verify values (initially no API keys or audio)
+            assert info["session_id"] == "test_restoration_info"
+            assert info["missing_api_keys"]["openai"] is True  # No API key set
+            assert info["missing_api_keys"]["gemini"] is True  # No API key set
+            assert info["has_generated_audio"] is False  # No audio generated
+            # Set an API key and check that it's no longer missing
+            session.text_processor.set_openai_api_key("test_key")
+            session.text_processor.set_api_type(APIType.OPENAI)
+            info = session.get_session_restoration_info()
+            assert info["current_api_type"] == APIType.OPENAI.value
+            assert info["missing_api_keys"]["openai"] is False  # Now has key
+            assert info["missing_api_keys"]["gemini"] is True  # Still missing
+    def test_session_needs_api_key_restoration(self):
+        """Test API key restoration detection."""
+        # Clear any environment API keys for this test
+        with patch.dict(
+            os.environ, {"OPENAI_API_KEY": "", "GOOGLE_API_KEY": ""}, clear=False
+        ):
+            session = UserSession("test_api_key_restoration")
+            # Initially both API keys should be missing
+            missing_keys = session.needs_api_key_restoration()
+            assert missing_keys["openai"] is True
+            assert missing_keys["gemini"] is True
+            # Set OpenAI API key
+            session.text_processor.set_openai_api_key("test_openai_key")
+            missing_keys = session.needs_api_key_restoration()
+            assert missing_keys["openai"] is False  # Now has key
+            assert missing_keys["gemini"] is True  # Still missing
+            # Set Gemini API key
+            session.text_processor.set_gemini_api_key("test_gemini_key")
+            missing_keys = session.needs_api_key_restoration()
+            assert missing_keys["openai"] is False  # Has key
+            assert missing_keys["gemini"] is False  # Now has key
+    def test_session_character_mapping_persistence(self, temp_session_dir):
+        """Test character mapping persistence."""
+        with patch("yomitalk.user_session.BASE_TEMP_DIR", temp_session_dir):
+            # Create session and set character mapping
+            session = UserSession("test_character_mapping")
+            session.text_processor.set_character_mapping("ずんだもん", "四国めたん")
+            # Save and reload
+            session.save_to_file()
+            loaded_session = UserSession.load_from_file("test_character_mapping")
+            # Verify character mapping was preserved
+            assert loaded_session is not None
+            char1, char2 = loaded_session.current_character_mapping
+            assert char1 == "ずんだもん"
+            assert char2 == "四国めたん"
+    def test_session_roundtrip_persistence(self, temp_session_dir):
+        """Test complete roundtrip session persistence."""
+        with patch("yomitalk.user_session.BASE_TEMP_DIR", temp_session_dir):
+            # Create session with comprehensive settings
+            original_session = UserSession("test_roundtrip")
+            # Configure all major settings
+            original_session.text_processor.set_gemini_api_key("test_key")
+            original_session.text_processor.set_api_type(APIType.GEMINI)
+            original_session.text_processor.set_document_type(DocumentType.MINUTES)
+            original_session.text_processor.set_podcast_mode("section_by_section")
+            original_session.text_processor.set_character_mapping(
+                "九州そら", "中国うさぎ"
+            )
+            original_session.text_processor.openai_model.set_max_tokens(4000)
+            original_session.text_processor.openai_model.set_model_name("gpt-4.1")
+            original_session.text_processor.gemini_model.set_max_tokens(2500)
+            original_session.text_processor.gemini_model.set_model_name(
+                "gemini-2.5-pro-preview-05-06"
+            )
+            # Update audio generation state
+            original_session.update_audio_generation_state(
+                status="completed",
+                progress=1.0,
+                current_script="完全なスクリプト内容",
+                final_audio_path="/path/to/final/audio.wav",
+            )
+            # Save session
+            save_success = original_session.save_to_file()
+            assert save_success is True
+            # Load session
+            loaded_session = UserSession.load_from_file("test_roundtrip")
+            assert loaded_session is not None
+            # Verify all settings were preserved
+            assert loaded_session.text_processor.current_api_type == APIType.GEMINI
+            assert (
+                loaded_session.text_processor.prompt_manager.current_document_type
+                == DocumentType.MINUTES
+            )
+            assert (
+                loaded_session.text_processor.prompt_manager.current_mode
+                == PodcastMode.SECTION_BY_SECTION
+            )
+            char1, char2 = loaded_session.current_character_mapping
+            assert char1 == "九州そら"
+            assert char2 == "中国うさぎ"
+            assert loaded_session.text_processor.openai_model.get_max_tokens() == 4000
+            assert loaded_session.text_processor.openai_model.model_name == "gpt-4.1"
+            assert loaded_session.text_processor.gemini_model.get_max_tokens() == 2500
+            assert (
+                loaded_session.text_processor.gemini_model.model_name
+                == "gemini-2.5-pro-preview-05-06"
+            )
+            # Verify audio state was preserved
+            audio_state = loaded_session.get_audio_generation_status()
+            assert audio_state["status"] == "completed"
+            assert audio_state["progress"] == 1.0
+            assert audio_state["current_script"] == "完全なスクリプト内容"
+            assert audio_state["final_audio_path"] == "/path/to/final/audio.wav"
+            # Verify restoration info shows correct state
+            restoration_info = loaded_session.get_session_restoration_info()
+            assert restoration_info["current_api_type"] == APIType.GEMINI.value
+            assert restoration_info["has_generated_audio"] is True
+            # Note: API keys are not persisted for security, but the test environment might have them
+            # So we don't assert their absence in this comprehensive test
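
The restoration checks exercised by these tests boil down to asking each model whether it currently holds a key. A minimal sketch of that idea follows; `SketchModel` and `missing_api_keys` are hypothetical names, and the real `needs_api_key_restoration` may also consult environment variables, which is why the tests above clear them.

```python
from typing import Dict, Optional


class SketchModel:
    """Hypothetical model wrapper that only tracks whether a key is present."""

    def __init__(self) -> None:
        self._api_key: Optional[str] = None

    def set_api_key(self, key: str) -> bool:
        # Reject empty or whitespace-only keys.
        if not key or not key.strip():
            return False
        self._api_key = key.strip()
        return True

    def has_api_key(self) -> bool:
        return self._api_key is not None


def missing_api_keys(models: Dict[str, SketchModel]) -> Dict[str, bool]:
    """Map each provider name to True when its key still has to be entered."""
    return {name: not model.has_api_key() for name, model in models.items()}


models = {"openai": SketchModel(), "gemini": SketchModel()}
before = missing_api_keys(models)
models["openai"].set_api_key("test_openai_key")
after = missing_api_keys(models)
```

A UI layer could use such a mapping after a restore to prompt only for the keys that are actually missing.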

tests/unit/test_text_processor.py CHANGED

@@ -70,9 +70,14 @@ class TestTextProcessor:
     def test_set_api_type(self):
         """Test setting API type."""
-        # When no API key is set
-        assert self.text_processor.set_api_type(APIType.OPENAI) is False
-        assert self.text_processor.set_api_type(APIType.GEMINI) is False
+        # Simulate the case where no API key is set, ignoring any keys in the environment
+        with patch.object(
+            self.text_processor.openai_model, "has_api_key", return_value=False
+        ), patch.object(
+            self.text_processor.gemini_model, "has_api_key", return_value=False
+        ):
+            assert self.text_processor.set_api_type(APIType.OPENAI) is False
+            assert self.text_processor.set_api_type(APIType.GEMINI) is False
         # Simulate the case where an API key is set
         with patch.object(
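
The reason for patching `has_api_key` in this test is presumably that a key exported in the developer's shell would otherwise make the "no key" assertions fail. The sketch below reproduces that situation with a hypothetical `SketchModel` that reads its key from an environment variable; the variable name `SKETCH_API_KEY` and the simplified `set_api_type` are assumptions for illustration only.

```python
import os
from unittest.mock import patch


class SketchModel:
    """Hypothetical model that falls back to an environment variable for its key."""

    def __init__(self, env_var: str) -> None:
        self.env_var = env_var

    def has_api_key(self) -> bool:
        return bool(os.environ.get(self.env_var))


def set_api_type(model: SketchModel) -> bool:
    """Switching succeeds only when the target model reports a key."""
    return model.has_api_key()


model = SketchModel("SKETCH_API_KEY")

# Simulate a developer machine where the key happens to be exported.
with patch.dict(os.environ, {"SKETCH_API_KEY": "leaked-from-shell"}):
    unpatched_result = set_api_type(model)
    # Forcing has_api_key to False makes the outcome independent of the environment.
    with patch.object(model, "has_api_key", return_value=False):
        patched_result = set_api_type(model)
```

Patching the method rather than the environment keeps the test valid even if the model later looks for keys somewhere other than environment variables.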

yomitalk/app.py CHANGED

@@ -50,9 +50,20 @@ class PaperPodcastApp:
         dummy_session.cleanup_old_sessions()
     def create_user_session(self, request: gr.Request) -> UserSession:
-        """Create a new user session with unique session ID."""
         session_id = request.session_hash
-        return UserSession(session_id)
     def clear_extracted_text(self) -> str:
         """Clear the extracted text area."""
@@ -68,6 +79,7 @@ class PaperPodcastApp:
         logger.debug(
             f"OpenAI API key set for session {user_session.session_id}: {success}"
         )
         return user_session
     def set_gemini_api_key(self, api_key: str, user_session: UserSession):
@@ -80,6 +92,7 @@ class PaperPodcastApp:
         logger.debug(
             f"Gemini API key set for session {user_session.session_id}: {success}"
         )
         return user_session
     def switch_llm_type(
@@ -95,6 +108,7 @@ class PaperPodcastApp:
             logger.debug(
                 f"{api_type.display_name} API key not set for session {user_session.session_id}"
             )
         return user_session
     def extract_file_text(
@@ -1154,6 +1168,7 @@ class PaperPodcastApp:
         """
         success = user_session.text_processor.openai_model.set_model_name(model_name)
         logger.debug(f"OpenAI model set to {model_name}: {success}")
         return user_session
     def set_gemini_model_name(
@@ -1167,6 +1182,7 @@ class PaperPodcastApp:
         """
         success = user_session.text_processor.gemini_model.set_model_name(model_name)
         logger.debug(f"Gemini model set to {model_name}: {success}")
         return user_session
     def get_openai_max_tokens(self, user_session: UserSession) -> int:
@@ -1198,6 +1214,7 @@ class PaperPodcastApp:
         """
         success = user_session.text_processor.openai_model.set_max_tokens(max_tokens)
         logger.debug(f"OpenAI max tokens set to {max_tokens}: {success}")
         return user_session
     def set_gemini_max_tokens(
@@ -1211,6 +1228,7 @@ class PaperPodcastApp:
         """
         success = user_session.text_processor.gemini_model.set_max_tokens(max_tokens)
         logger.debug(f"Gemini max tokens set to {max_tokens}: {success}")
         return user_session
     def get_available_characters(self) -> List[str]:
@@ -1234,6 +1252,7 @@ class PaperPodcastApp:
             character1, character2
         )
         logger.debug(f"Character mapping set: {character1}, {character2}: {success}")
         return user_session
     def update_process_button_state(
@@ -1287,6 +1306,7 @@ class PaperPodcastApp:
             success = user_session.text_processor.set_podcast_mode(podcast_mode.value)
             logger.debug(f"Podcast mode set to {mode}: {success}")
         except ValueError as e:
             logger.error(f"Error setting podcast mode: {str(e)}")
@@ -1379,6 +1399,7 @@ class PaperPodcastApp:
             success = user_session.text_processor.set_document_type(document_type)
             logger.debug(f"Document type set to {doc_type}: {success}")
         except ValueError as e:
             logger.error(f"Error setting document type: {str(e)}")

         dummy_session.cleanup_old_sessions()
     def create_user_session(self, request: gr.Request) -> UserSession:
+        """Create a new user session with unique session ID or restore from saved state."""
         session_id = request.session_hash
+        # Try to load existing session state first
+        existing_session = UserSession.load_from_file(session_id)
+        if existing_session:
+            logger.info(f"Restored existing session: {session_id}")
+            return existing_session
+        # Create new session if no saved state found
+        logger.info(f"Created new session: {session_id}")
+        new_session = UserSession(session_id)
+        new_session.auto_save()  # Save initial state
+        return new_session
     def clear_extracted_text(self) -> str:
         """Clear the extracted text area."""
         logger.debug(
             f"OpenAI API key set for session {user_session.session_id}: {success}"
         )
+        user_session.auto_save()  # Persist session state; only the key-set flag is saved, never the key itself
         return user_session
     def set_gemini_api_key(self, api_key: str, user_session: UserSession):
         logger.debug(
             f"Gemini API key set for session {user_session.session_id}: {success}"
         )
+        user_session.auto_save()  # Persist session state; only the key-set flag is saved, never the key itself
         return user_session
     def switch_llm_type(
             logger.debug(
                 f"{api_type.display_name} API key not set for session {user_session.session_id}"
             )
+        user_session.auto_save()  # Save session state after API type change
         return user_session
     def extract_file_text(
         """
         success = user_session.text_processor.openai_model.set_model_name(model_name)
         logger.debug(f"OpenAI model set to {model_name}: {success}")
+        user_session.auto_save()  # Save session state after model name change
         return user_session
     def set_gemini_model_name(
         """
         success = user_session.text_processor.gemini_model.set_model_name(model_name)
         logger.debug(f"Gemini model set to {model_name}: {success}")
+        user_session.auto_save()  # Save session state after model name change
         return user_session
     def get_openai_max_tokens(self, user_session: UserSession) -> int:
         """
         success = user_session.text_processor.openai_model.set_max_tokens(max_tokens)
         logger.debug(f"OpenAI max tokens set to {max_tokens}: {success}")
+        user_session.auto_save()  # Save session state after max tokens change
         return user_session
     def set_gemini_max_tokens(
         """
         success = user_session.text_processor.gemini_model.set_max_tokens(max_tokens)
         logger.debug(f"Gemini max tokens set to {max_tokens}: {success}")
+        user_session.auto_save()  # Save session state after max tokens change
         return user_session
     def get_available_characters(self) -> List[str]:
             character1, character2
         )
         logger.debug(f"Character mapping set: {character1}, {character2}: {success}")
+        user_session.auto_save()  # Save session state after character mapping change
         return user_session
     def update_process_button_state(
             success = user_session.text_processor.set_podcast_mode(podcast_mode.value)
             logger.debug(f"Podcast mode set to {mode}: {success}")
+            user_session.auto_save()  # Save session state after podcast mode change
         except ValueError as e:
             logger.error(f"Error setting podcast mode: {str(e)}")
             success = user_session.text_processor.set_document_type(document_type)
             logger.debug(f"Document type set to {doc_type}: {success}")
+            user_session.auto_save()  # Save session state after document type change
         except ValueError as e:
             logger.error(f"Error setting document type: {str(e)}")
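
The restore-or-create flow that `create_user_session` follows can be illustrated in isolation. The following is a minimal sketch, not the project's code: `BASE_DIR`, `state_path`, `load_state`, `save_state`, and `get_or_create_state` are hypothetical names, and a plain dict stands in for `UserSession`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical stand-in for the per-session temp directory root.
BASE_DIR = Path(tempfile.mkdtemp())


def state_path(session_id: str) -> Path:
    """Location of the JSON state file for one session."""
    return BASE_DIR / session_id / "session_state.json"


def load_state(session_id: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if it is missing or unreadable."""
    path = state_path(session_id)
    if not path.exists():
        return None
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        return None


def save_state(state: Dict[str, Any]) -> None:
    """Write the state as JSON, creating the session directory if needed."""
    path = state_path(state["session_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, ensure_ascii=False)


def get_or_create_state(session_id: str) -> Dict[str, Any]:
    """Prefer a saved state; otherwise create one and save it immediately."""
    existing = load_state(session_id)
    if existing is not None:
        return existing
    new_state: Dict[str, Any] = {"session_id": session_id, "settings": {}}
    save_state(new_state)
    return new_state
```

Saving the initial state right after creation means a reconnect with the same session ID finds a file even if the user changed nothing in between.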

yomitalk/user_session.py CHANGED

@@ -3,15 +3,17 @@
 This module contains the UserSession class for managing per-user session data.
 """
 import re
 import shutil
 import time
 from pathlib import Path
-from typing import Any, Dict, Tuple
 from yomitalk.common import APIType
 from yomitalk.components.audio_generator import AudioGenerator
 from yomitalk.components.text_processor import TextProcessor
 from yomitalk.utils.logger import logger
 # Global base directories for all users
@@ -349,6 +351,9 @@ class UserSession:
         # Update last update time
         self.audio_generation_state["last_update"] = time.time()
     def reset_audio_generation_state(self) -> None:
         """Reset audio generation state to initial values."""
         self.audio_generation_state = {
@@ -365,6 +370,9 @@ class UserSession:
         }
         logger.debug("Audio generation state reset")
     def is_audio_generation_active(self) -> bool:
         """Check if audio generation is currently active.
@@ -394,3 +402,183 @@ class UserSession:
             self.audio_generation_state["final_audio_path"] is not None
             or len(list(self.audio_generation_state["streaming_parts"])) > 0
         )

 This module contains the UserSession class for managing per-user session data.
 """
+import json
 import re
 import shutil
 import time
 from pathlib import Path
+from typing import Any, Dict, Optional, Tuple
 from yomitalk.common import APIType
 from yomitalk.components.audio_generator import AudioGenerator
 from yomitalk.components.text_processor import TextProcessor
+from yomitalk.prompt_manager import DocumentType, PodcastMode
 from yomitalk.utils.logger import logger
 # Global base directories for all users
         # Update last update time
         self.audio_generation_state["last_update"] = time.time()
+        # Auto-save session state
+        self.auto_save()
     def reset_audio_generation_state(self) -> None:
         """Reset audio generation state to initial values."""
         self.audio_generation_state = {
         }
         logger.debug("Audio generation state reset")
+        # Auto-save session state
+        self.auto_save()
     def is_audio_generation_active(self) -> bool:
         """Check if audio generation is currently active.
             self.audio_generation_state["final_audio_path"] is not None
             or len(list(self.audio_generation_state["streaming_parts"])) > 0
         )
+    def to_dict(self) -> Dict[str, Any]:
+        """Serialize session state to dictionary for persistence.
+        Returns:
+            Dict[str, Any]: Serializable session state
+        """
+        return {
+            "session_id": self.session_id,
+            "audio_generation_state": self.audio_generation_state.copy(),
+            "text_processor_state": {
+                "current_api_type": (
+                    self.text_processor.current_api_type.value
+                    if self.text_processor.current_api_type
+                    else None
+                ),
+                "openai_api_key_set": bool(self.text_processor.openai_model.api_key),
+                "gemini_api_key_set": bool(self.text_processor.gemini_model.api_key),
+                "openai_max_tokens": self.text_processor.openai_model.get_max_tokens(),
+                "gemini_max_tokens": self.text_processor.gemini_model.get_max_tokens(),
+                "openai_model_name": self.text_processor.openai_model.model_name,
+                "gemini_model_name": self.text_processor.gemini_model.model_name,
+                "prompt_manager_state": {
+                    "current_document_type": self.text_processor.prompt_manager.current_document_type.value,
+                    "current_mode": self.text_processor.prompt_manager.current_mode.value,
+                    "char_mapping": self.text_processor.prompt_manager.char_mapping.copy(),
+                },
+            },
+            "last_save_time": time.time(),
+        }
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "UserSession":
+        """Restore session state from dictionary.
+        Args:
+            data: Serialized session state
+        Returns:
+            UserSession: Restored session instance
+        """
+        session = cls(data["session_id"])
+        # Restore audio generation state
+        if "audio_generation_state" in data:
+            session.audio_generation_state.update(data["audio_generation_state"])
+        # Restore text processor state
+        if "text_processor_state" in data:
+            text_state = data["text_processor_state"]
+            # Restore API type
+            if text_state.get("current_api_type"):
+                # Find APIType by value and set directly (bypass API key validation)
+                for api_type in APIType:
+                    if api_type.value == text_state["current_api_type"]:
+                        session.text_processor.current_api_type = api_type
+                        break
+            # Restore model settings
+            if "openai_max_tokens" in text_state:
+                session.text_processor.openai_model.set_max_tokens(
+                    text_state["openai_max_tokens"]
+                )
+            if "gemini_max_tokens" in text_state:
+                session.text_processor.gemini_model.set_max_tokens(
+                    text_state["gemini_max_tokens"]
+                )
+            if "openai_model_name" in text_state:
+                session.text_processor.openai_model.set_model_name(
+                    text_state["openai_model_name"]
+                )
+            if "gemini_model_name" in text_state:
+                session.text_processor.gemini_model.set_model_name(
+                    text_state["gemini_model_name"]
+                )
+            # Restore prompt manager state
+            if "prompt_manager_state" in text_state:
+                pm_state = text_state["prompt_manager_state"]
+                if "current_document_type" in pm_state:
+                    # Find DocumentType by value
+                    for doc_type in DocumentType:
+                        if doc_type.value == pm_state["current_document_type"]:
+                            session.text_processor.prompt_manager.set_document_type(
+                                doc_type
+                            )
+                            break
+                if "current_mode" in pm_state:
+                    # Find PodcastMode by value
+                    for mode in PodcastMode:
+                        if mode.value == pm_state["current_mode"]:
+                            session.text_processor.prompt_manager.set_podcast_mode(mode)
+                            break
+                if "char_mapping" in pm_state:
+                    session.text_processor.prompt_manager.char_mapping = pm_state[
+                        "char_mapping"
+                    ].copy()
+        logger.info(f"Session restored from saved state: {session.session_id}")
+        return session
+    def save_to_file(self) -> bool:
+        """Save session state to file.
+        Returns:
+            bool: True if save was successful
+        """
+        try:
+            session_file = self.get_temp_dir() / "session_state.json"
+            with open(session_file, "w", encoding="utf-8") as f:
+                json.dump(self.to_dict(), f, indent=2, ensure_ascii=False)
+            logger.debug(f"Session state saved to file: {session_file}")
+            return True
+        except Exception as e:
+            logger.error(f"Failed to save session state: {str(e)}")
+            return False
+    @classmethod
+    def load_from_file(cls, session_id: str) -> Optional["UserSession"]:
+        """Load session state from file.
+        Args:
+            session_id: Session ID to load
+        Returns:
+            Optional[UserSession]: Restored session, or None if no saved state exists or loading fails
+        """
+        try:
+            session_file = BASE_TEMP_DIR / session_id / "session_state.json"
+            if not session_file.exists():
+                logger.debug(f"No saved session state found: {session_file}")
+                return None
+            with open(session_file, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            session = cls.from_dict(data)
+            logger.info(f"Session state loaded from file: {session_file}")
+            return session
+        except Exception as e:
+            logger.error(f"Failed to load session state: {str(e)}")
+            return None
+    def auto_save(self) -> None:
+        """Save the current session state to file; failures are logged, not raised."""
+        try:
+            self.save_to_file()
+        except Exception as e:
+            logger.error(f"Auto-save failed for session {self.session_id}: {str(e)}")
+    def needs_api_key_restoration(self) -> Dict[str, bool]:
+        """Check which API keys need to be restored after session reload.
+        Returns:
+            Dict[str, bool]: Dictionary indicating which API keys are missing
+        """
+        return {
+            "openai": not self.text_processor.openai_model.has_api_key(),
+            "gemini": not self.text_processor.gemini_model.has_api_key(),
+        }
+    def get_session_restoration_info(self) -> Dict[str, Any]:
+        """Get information about session restoration status.
+        Returns:
+            Dict[str, Any]: Session restoration information
+        """
+        missing_keys = self.needs_api_key_restoration()
+        return {
+            "session_id": self.session_id,
+            "missing_api_keys": missing_keys,
+            "current_api_type": (
+                self.text_processor.current_api_type.value
+                if self.text_processor.current_api_type
+                else None
+            ),
+            "has_generated_audio": self.has_generated_audio(),
+            "last_save_time": time.time(),  # Current time; the actual time of the last save is not tracked here
+        }
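
The serialization in this diff records only whether each API key is set, never the key itself, so a restored session has to ask for its keys again. A minimal sketch of that idea follows, assuming a hypothetical `MiniSession` dataclass rather than the real `UserSession`; the field names and default values are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class MiniSession:
    """Hypothetical, simplified stand-in for UserSession."""

    session_id: str
    model_name: str = "default-model"
    api_key: Optional[str] = None  # Secret: must never be written to disk.

    def to_dict(self) -> Dict[str, Any]:
        # Persist a boolean flag instead of the secret itself.
        return {
            "session_id": self.session_id,
            "model_name": self.model_name,
            "api_key_set": bool(self.api_key),
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniSession":
        # The key is intentionally left as None after restoration.
        return cls(
            session_id=data["session_id"],
            model_name=data.get("model_name", "default-model"),
        )

    def needs_api_key(self) -> bool:
        """True when the user must re-enter the key, e.g. after a reload."""
        return self.api_key is None


original = MiniSession("abc123", model_name="some-model", api_key="secret-value")
payload = json.dumps(original.to_dict())
restored = MiniSession.from_dict(json.loads(payload))
```

Here `restored` keeps the non-secret settings while `restored.needs_api_key()` reports that the key must be supplied again, which is the role `needs_api_key_restoration` plays in the diff.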