Manim MCP Server

A comprehensive Model Context Protocol (MCP) server for creating educational STEM animations using Manim. This server combines AI-powered creative tools with rendering and video processing capabilities to streamline the animation creation workflow.

Features

🎨 Creative Tools

  • Concept Planning: AI-powered STEM concept planning with learning objectives and scene flow
  • Code Generation: Intelligent Manim code generation with syntax validation
  • Code Refinement: Automatic code improvement based on errors and feedback
  • Narration Generation: Educational script writing tailored to target audiences
  • Quiz Generation: Automated assessment question creation

🎬 Rendering & Processing

  • Manim Rendering: Full Manim animation rendering with quality controls
  • Video Processing: FFmpeg-based video manipulation and conversion
  • Audio/Video Merging: Seamless integration of narration with animations
  • File Management: Comprehensive file system operations

🤖 AI Integration

  • Vision Analysis: Frame-by-frame quality assessment using vision models
  • Text-to-Speech: Natural voice synthesis for narration
  • Multi-Model Support: Flexible model selection for different tasks

Installation

Prerequisites

  • Python 3.12+
  • Manim Community Edition (manim>=0.18.1)
  • FFmpeg (for video processing)
  • HuggingFace API key (for AI features)

Setup

  1. Install the package and dependencies:

pip install mcp huggingface_hub manim pydantic aiohttp httpx numpy Pillow

  2. Set up your environment variables:

export HUGGINGFACE_API_KEY="your_api_key_here"

  3. Run the MCP server:

python manim_mcp/server.py

Usage

As an MCP Server

The server can be integrated into any MCP-compatible client (like Claude Desktop):

{
  "mcpServers": {
    "manim": {
      "command": "python",
      "args": ["path/to/manim_mcp/server.py"],
      "env": {
        "HUGGINGFACE_API_KEY": "your_key"
      }
    }
  }
}

Programmatic Usage

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Describe how to launch the server over stdio
params = StdioServerParameters(
    command="python",
    args=["manim_mcp/server.py"],
    env={"HUGGINGFACE_API_KEY": "your_key"}
)

async def main():
    async with stdio_client(params) as (read, write):
        # ClientSession is an async context manager in the MCP SDK
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Plan a concept
            result = await session.call_tool("plan_concept", {
                "topic": "Pythagorean Theorem",
                "target_audience": "high_school",
                "animation_length_minutes": 2.0
            })

asyncio.run(main())

Available Tools

Planning & Creative

plan_concept

Plan a STEM concept for animation with learning objectives and scene flow.

Parameters:

  • topic (string, required): The STEM topic to animate
  • target_audience (enum, required): elementary | middle_school | high_school | college | general
  • animation_length_minutes (number, optional): Desired length in minutes
  • model (string, optional): HuggingFace model to use

generate_manim_code

Generate complete, runnable Manim Python code.

Parameters:

  • concept (string, required): Animation concept
  • scene_description (string, required): Detailed scene description
  • visual_elements (array, optional): List of visual elements to include
  • previous_code (string, optional): For retry attempts
  • error_message (string, optional): Error from previous attempt

refine_animation

Refine existing Manim code based on feedback.

Parameters:

  • original_code (string, required): Code to refine
  • feedback (string, required): Feedback or error message
  • improvement_goals (array, optional): Specific improvements to make
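
Example call inside an active session (the code and feedback values are illustrative):

refined = await session.call_tool("refine_animation", {
    "original_code": code_text,  # Manim source from an earlier generate_manim_code call
    "feedback": "The title overlaps the triangle in the opening shot",
    "improvement_goals": ["fix layout", "slow down the first transition"]
})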

generate_narration

Generate educational narration scripts.

Parameters:

  • concept (string, required): Animation concept
  • scene_description (string, required): Scene details
  • target_audience (string, required): Target audience level
  • duration_seconds (integer, optional): Script duration

generate_quiz

Generate educational quiz questions.

Parameters:

  • concept (string, required): STEM concept
  • difficulty (enum, required): easy | medium | hard
  • num_questions (integer, required): Number of questions
  • question_types (array, optional): Types of questions
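
Example call (the question type names are illustrative; check the server for the accepted values):

quiz = await session.call_tool("generate_quiz", {
    "concept": "Pythagorean Theorem",
    "difficulty": "medium",
    "num_questions": 5,
    "question_types": ["multiple_choice", "short_answer"]  # illustrative values
})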

Rendering & Processing

write_manim_file

Write Manim code to a file.

Parameters:

  • filepath (string, required): Destination path
  • code (string, required): Manim code to write
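
Example call (the path is illustrative):

await session.call_tool("write_manim_file", {
    "filepath": "scenes/pythagorean.py",
    "code": code_text  # Manim source from generate_manim_code
})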

render_manim_animation

Render a Manim animation from a Python file.

Parameters:

  • scene_name (string, required): Scene class name
  • file_path (string, required): Path to Python file
  • output_dir (string, required): Output directory
  • quality (enum, optional): low | medium | high | production_quality
  • format (enum, optional): mp4 | gif | png
  • frame_rate (integer, optional): Frame rate (default: 30)
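
Example call (scene name and paths are illustrative; scene_name must match the Scene class in the file):

result = await session.call_tool("render_manim_animation", {
    "scene_name": "PythagoreanScene",
    "file_path": "scenes/pythagorean.py",
    "output_dir": "renders/",
    "quality": "medium",
    "format": "mp4",
    "frame_rate": 30
})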

merge_video_audio

Merge video and audio files.

Parameters:

  • video_file (string, required): Path to video
  • audio_file (string, required): Path to audio
  • output_file (string, required): Output path
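
Example call (paths are illustrative):

await session.call_tool("merge_video_audio", {
    "video_file": "renders/PythagoreanScene.mp4",
    "audio_file": "audio/narration.mp3",
    "output_file": "final/pythagorean.mp4"
})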

process_video_with_ffmpeg

Process videos with custom FFmpeg arguments.

Parameters:

  • input_files (array, required): Input file paths
  • output_file (string, required): Output path
  • ffmpeg_args (array, optional): Additional FFmpeg arguments
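
Example call converting a render to GIF (paths are illustrative; the filter string is standard FFmpeg syntax):

await session.call_tool("process_video_with_ffmpeg", {
    "input_files": ["final/pythagorean.mp4"],
    "output_file": "final/pythagorean.gif",
    "ffmpeg_args": ["-vf", "fps=12,scale=480:-1"]  # 12 fps, 480 px wide
})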

check_file_exists

Check file existence and get metadata.

Parameters:

  • filepath (string, required): File path to check
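
Example call (the path is illustrative):

info = await session.call_tool("check_file_exists", {
    "filepath": "renders/PythagoreanScene.mp4"
})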

Analysis

analyze_frame

Analyze animation frames using vision models.

Parameters:

  • image_path (string, required): Path to image
  • analysis_type (string, required): Type of analysis
  • context (string, optional): Additional context
  • model (string, optional): Vision model to use
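
Example call (the analysis_type value is illustrative; check the server for the accepted types):

analysis = await session.call_tool("analyze_frame", {
    "image_path": "renders/frame_0042.png",
    "analysis_type": "quality_assessment",
    "context": "Opening shot: title and right triangle should both be fully visible"
})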

generate_speech

Convert text to speech audio.

Parameters:

  • text (string, required): Text to convert
  • output_path (string, required): Audio output path
  • voice (string, optional): Voice to use
  • model (string, optional): TTS model to use
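
Example call (the output path is illustrative):

await session.call_tool("generate_speech", {
    "text": narration_text,  # script produced by generate_narration
    "output_path": "audio/narration.mp3"
})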

Complete Workflow Example

Here's a typical animation generation workflow:

  1. Plan the concept
  2. Generate narration script
  3. Generate Manim code
  4. Write code to file
  5. Render the animation
  6. Generate speech audio
  7. Merge video and audio
  8. Generate quiz questions

# 1. Plan concept
plan = await session.call_tool("plan_concept", {
    "topic": "Newton's Laws of Motion",
    "target_audience": "high_school"
})

# call_tool returns a list of MCP content items; take the first text block
plan_text = plan.content[0].text

# 2. Generate narration
narration = await session.call_tool("generate_narration", {
    "concept": "Newton's Laws",
    "scene_description": plan_text,
    "target_audience": "high_school",
    "duration_seconds": 120
})

# 3. Generate code
code = await session.call_tool("generate_manim_code", {
    "concept": "Newton's Laws",
    "scene_description": plan_text,
    "visual_elements": ["text", "shapes", "arrows"]
})

# 4-7. Continue workflow...
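
The remaining steps might look like this (a hedged sketch: the scene name, file paths, and result handling are illustrative):

# 4. Write code to file
code_text = code.content[0].text
await session.call_tool("write_manim_file", {
    "filepath": "scenes/newton.py",
    "code": code_text
})

# 5. Render the animation
await session.call_tool("render_manim_animation", {
    "scene_name": "NewtonScene",  # must match the Scene class in the generated code
    "file_path": "scenes/newton.py",
    "output_dir": "renders/",
    "quality": "medium"
})

# 6. Generate speech audio
await session.call_tool("generate_speech", {
    "text": narration.content[0].text,
    "output_path": "audio/newton.mp3"
})

# 7. Merge video and audio
await session.call_tool("merge_video_audio", {
    "video_file": "renders/NewtonScene.mp4",  # illustrative; use the actual render path
    "audio_file": "audio/newton.mp3",
    "output_file": "final/newton.mp4"
})

# 8. Generate quiz questions
quiz = await session.call_tool("generate_quiz", {
    "concept": "Newton's Laws",
    "difficulty": "medium",
    "num_questions": 3
})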

Configuration

Environment Variables

  • HUGGINGFACE_API_KEY: Required for AI-powered tools
  • ELEVENLABS_API_KEY: Optional for premium TTS (falls back to free alternatives)

Model Selection

The server uses sensible defaults for each task, but you can specify custom models:

await session.call_tool("generate_manim_code", {
    "concept": "topic",
    "scene_description": "description",
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct"  # Custom model
})

Quality Settings

Rendering quality options:

  • low: 480p15 - Fast, good for testing
  • medium: 720p30 - Balanced quality/speed (default)
  • high: 1080p60 - High quality, slower
  • production_quality: 2160p60 - 4K, very slow

Error Handling

The server includes comprehensive error handling:

  • Syntax validation for generated code
  • Retry logic for code generation failures (see the sketch after this list)
  • Graceful fallbacks for AI services
  • Detailed error messages for debugging
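
For instance, the retry path can be driven through generate_manim_code's previous_code and error_message parameters; a hedged sketch (the error string is illustrative):

retry = await session.call_tool("generate_manim_code", {
    "concept": "Newton's Laws",
    "scene_description": plan_text,
    "previous_code": code_text,  # the code that failed to render
    "error_message": "NameError: name 'Sqaure' is not defined"  # error from the failed attempt
})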

Architecture

The server is organized into modular tool categories:

manim_mcp/
├── server.py           # Main MCP server
├── tools/
│   ├── planning.py     # Concept planning
│   ├── code_generation.py  # Code generation & refinement
│   ├── rendering.py    # Manim rendering
│   ├── vision.py       # Frame analysis
│   ├── audio.py        # TTS & narration
│   ├── video.py        # Video processing
│   └── quiz.py         # Quiz generation

Requirements

  • Python >= 3.12
  • mcp >= 1.0.0
  • huggingface_hub >= 0.25.0
  • manim >= 0.18.1
  • pydantic >= 2.0.0
  • aiohttp >= 3.8.0
  • FFmpeg (system dependency)

Contributing

Contributions are welcome! Areas for improvement:

  • Additional AI model integrations
  • More video processing tools
  • Enhanced error recovery
  • Performance optimizations

License

MIT License - see LICENSE file for details

Support

For issues, questions, or feature requests, please open an issue on the repository.

Credits

Built with:

  • Manim Community Edition
  • Model Context Protocol (MCP)
  • HuggingFace Hub
  • FFmpeg

Version: 0.1.0
Author: NeuroAnim Team
Status: Beta, under active development