Spaces:
Running
Running
| # NeuroAnim Improvements Guide | |
| This document outlines improvements made and further recommendations for enhancing the NeuroAnim system's code generation, script writing, and overall quality. | |
| --- | |
| ## β Issues Fixed | |
| ### 1. Audio Generation Problem - RESOLVED | |
| **Problem:** Narration text contained prefixes like "Narration Script:\n\n" which were being sent to TTS, causing poor audio quality or failures. | |
| **Solution Implemented:** | |
| - Added `_clean_narration_text()` method in `orchestrator.py` that strips prefixes and formatting artifacts | |
| - Updated `generate_narration()` in `mcp_servers/creative.py` to return clean text without prefixes | |
| - Improved prompt to explicitly instruct the model not to add labels | |
| **Location:** | |
| - `orchestrator.py` lines 353-389 (new method) | |
| - `mcp_servers/creative.py` lines 558-608 (improved prompt and cleaning) | |
| --- | |
| ## π― Recommendations for Further Improvements | |
| ### 2. Manim Code Generation Quality | |
| #### Current Issues: | |
| - Syntax errors (unclosed parentheses, brackets) | |
| - Invalid color names (DARK_GREEN, LIGHT_BLUE don't exist in Manim) | |
| - Incorrect animation method names (using lowercase instead of capitalized) | |
| - Missing imports or incomplete code blocks | |
| - Using deprecated Manim classes or methods | |
| #### Improvements Made: | |
| β Enhanced prompts in `mcp_servers/creative.py` with explicit requirements: | |
| - List of valid color constants | |
| - Correct animation method names with capitalization | |
| - Use of `MovingCameraScene` for better flexibility | |
| - Syntax validation requirements | |
| #### Additional Recommendations: | |
| **A. Add Code Post-Processing Pipeline** | |
| Create `utils/code_validator.py`: | |
| ```python | |
| import ast | |
| import re | |
| from typing import Dict, List, Optional | |
| class ManimCodeValidator: | |
| """Validate and fix common Manim code issues.""" | |
| VALID_COLORS = { | |
| 'WHITE', 'BLACK', 'GRAY', 'GREY', 'LIGHT_GRAY', 'DARK_GRAY', | |
| 'RED', 'GREEN', 'BLUE', 'YELLOW', 'ORANGE', 'PINK', 'PURPLE', | |
| 'TEAL', 'GOLD', 'MAROON', 'RED_A', 'RED_B', 'RED_C', 'RED_D', | |
| 'RED_E', 'GREEN_A', 'GREEN_B', 'GREEN_C', 'GREEN_D', 'GREEN_E', | |
| 'BLUE_A', 'BLUE_B', 'BLUE_C', 'BLUE_D', 'BLUE_E' | |
| } | |
| INVALID_COLOR_REPLACEMENTS = { | |
| 'DARK_GREEN': 'GREEN_D', | |
| 'LIGHT_GREEN': 'GREEN_A', | |
| 'DARK_BLUE': 'BLUE_D', | |
| 'LIGHT_BLUE': 'BLUE_A', | |
| 'DARK_RED': 'RED_D', | |
| 'LIGHT_RED': 'RED_A', | |
| } | |
| @staticmethod | |
| def validate_syntax(code: str) -> Dict[str, any]: | |
| """Check if code has valid Python syntax.""" | |
| try: | |
| ast.parse(code) | |
| return {"valid": True, "errors": []} | |
| except SyntaxError as e: | |
| return { | |
| "valid": False, | |
| "errors": [f"Syntax error at line {e.lineno}: {e.msg}"] | |
| } | |
| @staticmethod | |
| def fix_colors(code: str) -> str: | |
| """Replace invalid color names with valid ones.""" | |
| for invalid, valid in ManimCodeValidator.INVALID_COLOR_REPLACEMENTS.items(): | |
| code = re.sub(rf'\b{invalid}\b', valid, code) | |
| return code | |
| @staticmethod | |
| def ensure_imports(code: str) -> str: | |
| """Ensure proper Manim imports exist.""" | |
| if 'from manim import' not in code and 'import manim' not in code: | |
| code = 'from manim import *\n\n' + code | |
| return code | |
| @staticmethod | |
| def fix_common_issues(code: str) -> str: | |
| """Apply common fixes to generated code.""" | |
| # Fix colors | |
| code = ManimCodeValidator.fix_colors(code) | |
| # Ensure imports | |
| code = ManimCodeValidator.ensure_imports(code) | |
| # Fix common typos in animation methods | |
| typo_fixes = { | |
| r'\.fadein\(': '.FadeIn(', | |
| r'\.fadeout\(': '.FadeOut(', | |
| r'\.write\(': '.Write(', | |
| r'\.create\(': '.Create(', | |
| r'self\.play\(flash\(': 'self.play(Flash(', | |
| r'self\.play\(indicate\(': 'self.play(Indicate(', | |
| } | |
| for pattern, replacement in typo_fixes.items(): | |
| code = re.sub(pattern, replacement, code, flags=re.IGNORECASE) | |
| return code | |
| ``` | |
| **B. Implement Multi-Stage Validation** | |
| In `orchestrator.py`, enhance `_generate_and_validate_code()`: | |
| ```python | |
| async def _generate_and_validate_code( | |
| self, topic: str, concept_plan: str, max_retries: int = 3 | |
| ) -> str: | |
| """Generate and validate Manim code with multiple checks.""" | |
| from utils.code_validator import ManimCodeValidator | |
| validator = ManimCodeValidator() | |
| for attempt in range(max_retries): | |
| # Generate code | |
| code_result = await self.call_tool(...) | |
| raw_code = self._extract_python_code(code_result["text"]) | |
| # Stage 1: Fix common issues | |
| fixed_code = validator.fix_common_issues(raw_code) | |
| # Stage 2: Syntax validation | |
| syntax_check = validator.validate_syntax(fixed_code) | |
| if not syntax_check["valid"]: | |
| logger.warning(f"Syntax error in attempt {attempt + 1}") | |
| # Retry with error feedback | |
| continue | |
| # Stage 3: Test import (optional, quick check) | |
| try: | |
| compile(fixed_code, '<string>', 'exec') | |
| except Exception as e: | |
| logger.warning(f"Compilation error: {e}") | |
| continue | |
| return fixed_code | |
| raise Exception("Failed to generate valid code after retries") | |
| ``` | |
| **C. Use Few-Shot Examples in Prompts** | |
| Add working examples to the code generation prompt: | |
| ```python | |
| EXAMPLE_CODE = ''' | |
| from manim import * | |
| class ExampleScene(MovingCameraScene): | |
| def construct(self): | |
| # Title | |
| title = Text("Example Animation", font_size=48) | |
| title.to_edge(UP) | |
| self.play(Write(title)) | |
| self.wait(1) | |
| # Create objects | |
| circle = Circle(radius=1, color=BLUE) | |
| square = Square(side_length=2, color=RED) | |
| square.next_to(circle, RIGHT, buff=1) | |
| # Animate | |
| self.play(Create(circle), Create(square)) | |
| self.wait(1) | |
| self.play(circle.animate.shift(RIGHT * 2)) | |
| self.wait(1) | |
| ''' | |
| # Include in prompt: | |
| prompt = f""" | |
| Here's an example of proper Manim code structure: | |
| {EXAMPLE_CODE} | |
| Now generate similar code for: {concept} | |
| ... | |
| """ | |
| ``` | |
| --- | |
| ### 3. Script Writing (Narration) Quality | |
| #### Current Issues: | |
| - Sometimes too technical or too simple for the audience | |
| - Inconsistent pacing | |
| - May include unnecessary conversational elements | |
| - Duration mismatch with actual content | |
| #### Improvements Made: | |
| β Completely rewritten prompt in `mcp_servers/creative.py`: | |
| - Clear instruction to output only spoken text | |
| - Word count guidance based on duration | |
| - Explicit formatting requirements | |
| - Post-processing to remove prefixes | |
| #### Additional Recommendations: | |
| **A. Add Narration Quality Scoring** | |
| Create `utils/narration_analyzer.py`: | |
| ```python | |
| class NarrationAnalyzer: | |
| """Analyze and score narration quality.""" | |
| @staticmethod | |
| def estimate_duration(text: str, wpm: int = 150) -> float: | |
| """Estimate speaking duration in seconds.""" | |
| word_count = len(text.split()) | |
| return (word_count / wpm) * 60 | |
| @staticmethod | |
| def check_reading_level(text: str) -> Dict: | |
| """Analyze text complexity.""" | |
| # Could use textstat library | |
| import textstat | |
| return { | |
| "flesch_reading_ease": textstat.flesch_reading_ease(text), | |
| "grade_level": textstat.flesch_kincaid_grade(text), | |
| "syllable_count": textstat.syllable_count(text), | |
| } | |
| @staticmethod | |
| def validate_audience_match(text: str, audience: str) -> bool: | |
| """Check if text matches target audience.""" | |
| grade_map = { | |
| "elementary": (3, 5), | |
| "middle_school": (6, 8), | |
| "high_school": (9, 12), | |
| "undergraduate": (13, 16), | |
| } | |
| if audience not in grade_map: | |
| return True | |
| min_grade, max_grade = grade_map[audience] | |
| actual_grade = textstat.flesch_kincaid_grade(text) | |
| return min_grade <= actual_grade <= max_grade + 2 | |
| ``` | |
| **B. Implement Iterative Refinement** | |
| ```python | |
| async def generate_refined_narration(self, topic, audience, duration, max_attempts=2): | |
| """Generate narration with quality checks and refinement.""" | |
| analyzer = NarrationAnalyzer() | |
| for attempt in range(max_attempts): | |
| # Generate narration | |
| narration = await self.generate_narration(...) | |
| # Check duration match | |
| estimated_duration = analyzer.estimate_duration(narration) | |
| target_duration = duration * 60 | |
| if abs(estimated_duration - target_duration) > 15: # 15 sec tolerance | |
| feedback = f"Duration mismatch: got {estimated_duration}s, need {target_duration}s" | |
| # Regenerate with feedback | |
| continue | |
| # Check audience match | |
| if not analyzer.validate_audience_match(narration, audience): | |
| feedback = f"Complexity doesn't match {audience} level" | |
| continue | |
| return narration | |
| # Return best attempt even if not perfect | |
| return narration | |
| ``` | |
| **C. Use Structured Output Format** | |
| Modify prompt to request JSON structure: | |
| ```python | |
| prompt = f""" | |
| Generate narration in JSON format: | |
| {{ | |
| "narration": "The actual spoken text...", | |
| "key_points": ["point 1", "point 2"], | |
| "transitions": ["0:00 - Introduction", "0:30 - Main concept"], | |
| "emphasis_words": ["important", "theorem", "result"] | |
| }} | |
| Topic: {concept} | |
| Audience: {target_audience} | |
| Duration: {duration} seconds | |
| """ | |
| # Parse and extract just the narration part | |
| result = json.loads(response) | |
| narration_text = result["narration"] | |
| ``` | |
| --- | |
| ### 4. Overall System Improvements | |
| #### A. Add Caching Layer | |
| Save generated components to avoid regeneration: | |
| ```python | |
| import hashlib | |
| import json | |
| from pathlib import Path | |
| class GenerationCache: | |
| """Cache generated content.""" | |
| def __init__(self, cache_dir: Path = Path("cache")): | |
| self.cache_dir = cache_dir | |
| self.cache_dir.mkdir(exist_ok=True) | |
| def _get_hash(self, topic: str, params: Dict) -> str: | |
| """Generate cache key.""" | |
| key = f"{topic}_{json.dumps(params, sort_keys=True)}" | |
| return hashlib.md5(key.encode()).hexdigest() | |
| def get_narration(self, topic: str, audience: str) -> Optional[str]: | |
| """Retrieve cached narration.""" | |
| key = self._get_hash(topic, {"audience": audience, "type": "narration"}) | |
| cache_file = self.cache_dir / f"{key}.txt" | |
| if cache_file.exists(): | |
| return cache_file.read_text() | |
| return None | |
| def save_narration(self, topic: str, audience: str, content: str): | |
| """Save narration to cache.""" | |
| key = self._get_hash(topic, {"audience": audience, "type": "narration"}) | |
| cache_file = self.cache_dir / f"{key}.txt" | |
| cache_file.write_text(content) | |
| ``` | |
| #### B. Implement Quality Metrics Dashboard | |
| Track generation success rates, error types, average durations: | |
| ```python | |
| class MetricsCollector: | |
| """Collect and report system metrics.""" | |
| def __init__(self): | |
| self.metrics = { | |
| "total_generations": 0, | |
| "successful_generations": 0, | |
| "failed_generations": 0, | |
| "errors": {}, | |
| "average_duration": 0, | |
| } | |
| def record_success(self, duration: float): | |
| self.metrics["total_generations"] += 1 | |
| self.metrics["successful_generations"] += 1 | |
| self._update_average_duration(duration) | |
| def record_failure(self, error_type: str): | |
| self.metrics["total_generations"] += 1 | |
| self.metrics["failed_generations"] += 1 | |
| self.metrics["errors"][error_type] = self.metrics["errors"].get(error_type, 0) + 1 | |
| def get_report(self) -> Dict: | |
| """Get metrics report.""" | |
| success_rate = ( | |
| self.metrics["successful_generations"] / self.metrics["total_generations"] | |
| if self.metrics["total_generations"] > 0 | |
| else 0 | |
| ) | |
| return { | |
| **self.metrics, | |
| "success_rate": success_rate, | |
| } | |
| ``` | |
| #### C. Add Preview Mode | |
| Generate low-quality preview before full render: | |
| ```python | |
| async def generate_preview(self, topic: str, audience: str) -> Dict: | |
| """Generate quick preview without full rendering.""" | |
| # Generate only concept plan and narration | |
| concept = await self.generate_concept(topic, audience) | |
| narration = await self.generate_narration(topic, concept, audience) | |
| # Generate code but don't render | |
| code = await self.generate_code(topic, concept) | |
| return { | |
| "concept": concept, | |
| "narration": narration, | |
| "code": code, | |
| "estimated_duration": len(narration.split()) / 150 * 60, | |
| } | |
| ``` | |
| #### D. Error Recovery Strategies | |
| Implement better fallback mechanisms: | |
| ```python | |
| class GenerationStrategy: | |
| """Handle generation with multiple fallback strategies.""" | |
| async def generate_with_fallback(self, primary_fn, fallback_fn, *args): | |
| """Try primary method, fall back if it fails.""" | |
| try: | |
| return await primary_fn(*args) | |
| except Exception as e: | |
| logger.warning(f"Primary method failed: {e}, trying fallback") | |
| return await fallback_fn(*args) | |
| async def generate_code_resilient(self, topic: str, concept: str): | |
| """Generate code with multiple strategies.""" | |
| strategies = [ | |
| ("Complex with camera", lambda: self.generate_with_camera_scene(topic)), | |
| ("Simple Scene", lambda: self.generate_simple_scene(topic)), | |
| ("Template-based", lambda: self.use_code_template(topic)), | |
| ] | |
| for strategy_name, strategy_fn in strategies: | |
| try: | |
| logger.info(f"Trying strategy: {strategy_name}") | |
| return await strategy_fn() | |
| except Exception as e: | |
| logger.warning(f"Strategy {strategy_name} failed: {e}") | |
| continue | |
| raise Exception("All code generation strategies failed") | |
| ``` | |
| --- | |
| ## π Implementation Priority | |
| ### High Priority (Immediate) | |
| 1. β Fix audio generation (DONE) | |
| 2. β Improve narration prompts (DONE) | |
| 3. β Add Gradio frontend (DONE) | |
| 4. π² Implement code validator with post-processing | |
| 5. π² Add syntax validation before rendering | |
| ### Medium Priority (Next Sprint) | |
| 6. π² Add narration quality analyzer | |
| 7. π² Implement caching layer | |
| 8. π² Add preview mode | |
| 9. π² Enhance error recovery | |
| ### Low Priority (Future) | |
| 10. π² Metrics dashboard | |
| 11. π² Advanced code templates | |
| 12. π² Multi-model ensemble for better quality | |
| 13. π² User feedback loop for iterative improvement | |
| --- | |
| ## π§ͺ Testing Recommendations | |
| ### Unit Tests | |
| ```python | |
| def test_narration_cleaning(): | |
| """Test narration text cleaning.""" | |
| dirty = "Narration Script:\n\nThis is the actual text" | |
| clean = orchestrator._clean_narration_text(dirty) | |
| assert clean == "This is the actual text" | |
| def test_code_validation(): | |
| """Test Manim code validation.""" | |
| invalid_code = "circle = Circle(color=DARK_GREEN)" | |
| fixed = validator.fix_colors(invalid_code) | |
| assert "GREEN_D" in fixed | |
| def test_duration_estimation(): | |
| """Test narration duration estimation.""" | |
| text = "This is a test " * 150 # 150 words | |
| duration = analyzer.estimate_duration(text, wpm=150) | |
| assert 59 <= duration <= 61 # Should be ~60 seconds | |
| ``` | |
| ### Integration Tests | |
| ```python | |
| async def test_full_pipeline(): | |
| """Test complete generation pipeline.""" | |
| orchestrator = NeuroAnimOrchestrator() | |
| await orchestrator.initialize() | |
| result = await orchestrator.generate_animation( | |
| topic="Test Topic", | |
| target_audience="high_school", | |
| animation_length_minutes=1.0 | |
| ) | |
| assert result["success"] | |
| assert Path(result["output_file"]).exists() | |
| assert len(result["narration"]) > 50 | |
| assert "from manim import" in result["manim_code"] | |
| ``` | |
| --- | |
| ## π Success Metrics | |
| Track these to measure improvement: | |
| 1. **Code Generation Success Rate**: % of generated code that renders without errors | |
| 2. **Audio Quality Score**: User ratings or automated speech quality metrics | |
| 3. **Narration Accuracy**: Duration match, audience level match | |
| 4. **End-to-End Success**: % of complete generations without manual intervention | |
| 5. **User Satisfaction**: Feedback scores from Gradio interface | |
| Target Goals: | |
| - Code success rate: >85% | |
| - Audio quality: >4/5 | |
| - Duration accuracy: Β±10 seconds | |
| - End-to-end success: >75% | |
| --- | |
| ## π§ Configuration Best Practices | |
| Create `config.yaml` for easy tuning: | |
| ```yaml | |
| generation: | |
| max_retries: 3 | |
| timeout_seconds: 300 | |
| narration: | |
| words_per_minute: 150 | |
| min_words: 50 | |
| max_words: 1000 | |
| code_generation: | |
| temperature: 0.3 | |
| max_tokens: 2048 | |
| default_scene_class: "MovingCameraScene" | |
| rendering: | |
| quality: "medium" | |
| frame_rate: 30 | |
| format: "mp4" | |
| audio: | |
| primary_provider: "elevenlabs" | |
| fallback_providers: ["huggingface", "gtts"] | |
| default_voice: "rachel" | |
| ``` | |
| --- | |
| ## π Educational Content Guidelines | |
| To maximize educational value: | |
| 1. **Clear Learning Objectives**: Start narration with "In this video, you'll learn..." | |
| 2. **Progressive Complexity**: Build from simple to complex | |
| 3. **Visual-Audio Sync**: Time narration with visual reveals | |
| 4. **Repetition**: Reinforce key concepts 2-3 times | |
| 5. **Real-World Connections**: Include practical applications | |
| 6. **Assessment**: Quiz questions that test understanding, not memorization | |
| --- | |
| ## π Future Enhancements | |
| 1. **Multi-Language Support**: Generate narration in multiple languages | |
| 2. **Custom Voice Cloning**: Use teacher's voice with ElevenLabs | |
| 3. **Interactive Elements**: Clickable annotations in video | |
| 4. **Series Generation**: Create multi-video curriculum | |
| 5. **Adaptive Learning**: Adjust complexity based on quiz results | |
| 6. **Collaborative Editing**: Allow teachers to refine generated content | |
| --- | |
| **Document Version:** 1.0 | |
| **Last Updated:** 2024 | |
| **Status:** Living document - update as improvements are implemented |