---
pinned: false
short_description: Streamlit template space
---

# CineGen AI Ultra+ 🎬✨

**Visionary Cinematic Pre-Production Powered by AI**

CineGen AI Ultra+ is a Streamlit web application designed to accelerate the creative pre-production process for films, animations, and other visual storytelling projects. It leverages the power of Large Language Models (Google's Gemini) and other AI tools to transform a simple story idea into a rich cinematic treatment, complete with scene breakdowns, visual style suggestions, AI-generated concept art/video clips, and a narrated animatic.

## Features

* **AI Creative Director:** Input a core story idea, genre, and mood.
* **Cinematic Treatment Generation:**
    * Gemini generates a detailed multi-scene treatment.
    * Each scene includes:
        * Title, Emotional Beat, Setting Description
        * Characters Involved, Character Focus Moment
        * Key Plot Beat, Suggested Dialogue Hook
    * **Proactive Director's Suggestions (감독 - Gamdok/Director):** Visual Style, Camera Work, Sound Design.
    * **Asset Generation Aids:** Suggested Asset Type (Image/Video Clip), Video Motion Description & Duration, Image/Video Keywords, Pexels Search Queries.
* **Visual Asset Generation:**
    * **Image Generation:** Uses DALL-E 3 (via the OpenAI API) to generate concept art for scenes based on derived prompts.
    * **Stock Footage Fallback:** Uses the Pexels API for relevant stock images/videos if AI generation is disabled or fails.
    * **Video Clip Generation (Placeholder):** Integrated structure for text-to-video/image-to-video generation using the RunwayML API (requires the user to implement the actual API calls in `core/visual_engine.py`). The placeholder generates dummy video clips.
* **Character Definition:** Define key characters with visual descriptions for more consistent AI-generated visuals.
* **Global Style Overrides:** Apply global stylistic keywords (e.g., "Hyper-Realistic Gritty Noir," "Vintage Analog Sci-Fi") to influence visual generation.
* **AI-Powered Narration:**
    * Gemini crafts a narration script based on the generated treatment.
    * The ElevenLabs API synthesizes the narration into natural-sounding audio.
    * Customizable voice ID and narration style.
* **Iterative Refinement:**
    * Edit scene treatments and regenerate them with AI assistance.
    * Refine DALL-E prompts based on feedback and regenerate visuals.
* **Cinematic Animatic Assembly:**
    * Combines generated visual assets (images/video clips) and the synthesized narration into a downloadable `.mp4` animatic.
    * Customizable per-scene duration for pacing control.
    * Ken Burns effect on still images and text overlays for scene context.
* **Secrets Management:** Securely loads API keys from Streamlit secrets or environment variables.
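For illustration, one scene of such a treatment could be represented as JSON along these lines. All field names and values below are hypothetical examples matching the feature list, not the app's exact schema:

```python
import json

# Hypothetical shape of one scene in a Gemini-generated treatment.
# Field names mirror the feature list above; the real schema produced
# by the app's prompts may differ.
example_scene = {
    "scene_number": 1,
    "scene_title": "The Last Transmission",
    "emotional_beat": "Dread giving way to resolve",
    "setting_description": "A derelict radio outpost at dusk",
    "characters_involved": ["Mara", "The Operator"],
    "character_focus_moment": "Mara's hand hesitates over the transmit key",
    "key_plot_beat": "The signal is coming from inside the outpost",
    "suggested_dialogue_hook": "It's been broadcasting our names for days.",
    "directors_suggestions": {
        "visual_style": "low-key lighting, desaturated palette",
        "camera_work": "slow push-in, handheld inserts",
        "sound_design": "static drones, distant wind",
    },
    "asset_generation": {
        "suggested_asset_type": "image",  # or "video_clip"
        "image_keywords": ["radio outpost", "dusk", "noir"],
        "pexels_search_query": "abandoned radio station dusk",
    },
}

# The structure round-trips cleanly through JSON.
serialized = json.dumps(example_scene, indent=2)
print(json.loads(serialized)["scene_title"])  # → The Last Transmission
```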
## Project Structure

```
CineGenAI/
├── .streamlit/
│   └── secrets.toml          # API keys and configuration (DO NOT COMMIT if public)
├── assets/
│   └── fonts/
│       └── arial.ttf         # Example font file (ensure it's available or update the path)
├── core/
│   ├── __init__.py
│   ├── gemini_handler.py     # Manages interactions with the Gemini API
│   ├── visual_engine.py      # Image/video generation (DALL-E, Pexels, RunwayML placeholder) and video assembly
│   └── prompt_engineering.py # Functions to craft detailed prompts for Gemini
├── temp_cinegen_media/       # Temporary directory for generated media (add to .gitignore)
├── app.py                    # Main Streamlit application script
├── Dockerfile                # For containerizing the application
├── Dockerfile.test           # (Optional) For testing
├── requirements.txt          # Python dependencies
├── README.md                 # This file
└── .gitattributes            # For Git LFS if handling large font files
```

## Setup and Installation

1. **Clone the Repository:**
   ```bash
   git clone <repository_url>
   cd CineGenAI
   ```

2. **Create a Virtual Environment (Recommended):**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
   *Note: `MoviePy` might require `ffmpeg` to be installed on your system. On Debian/Ubuntu: `sudo apt-get install ffmpeg`. On macOS with Homebrew: `brew install ffmpeg`.*

4. **Set Up API Keys:**
   You need API keys for the following services:
   * Google Gemini API
   * OpenAI API (for DALL-E)
   * ElevenLabs API (and optionally a specific Voice ID)
   * Pexels API
   * RunwayML API (if implementing full video generation)

   Store these keys securely. You have two primary options:

   * **Streamlit Secrets (Recommended for Hugging Face Spaces / Streamlit Cloud):**
     Create a file `.streamlit/secrets.toml` (make sure this file is in your `.gitignore` if your repository is public!) with the following format:
     ```toml
     GEMINI_API_KEY = "your_gemini_api_key"
     OPENAI_API_KEY = "your_openai_api_key"
     ELEVENLABS_API_KEY = "your_elevenlabs_api_key"
     PEXELS_API_KEY = "your_pexels_api_key"
     ELEVENLABS_VOICE_ID = "your_elevenlabs_voice_id" # e.g., "Rachel" or a custom ID
     RUNWAY_API_KEY = "your_runwayml_api_key"
     ```

   * **Environment Variables (for local development):**
     Set the environment variables directly in your terminal, or in a `.env` file (using a library such as `python-dotenv`, which is not included by default). The application falls back to these if Streamlit secrets are not found.
     ```bash
     export GEMINI_API_KEY="your_gemini_api_key"
     export OPENAI_API_KEY="your_openai_api_key"
     # ... and so on for other keys
     ```
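   As a sketch of the fallback order described above (Streamlit secrets first, then environment variables), a lookup helper might look like the following. This is illustrative only; the app's actual lookup code may differ:

   ```python
   import os
   from typing import Optional

   def get_api_key(name: str, default: Optional[str] = None) -> Optional[str]:
       """Return a key from Streamlit secrets if available, else the environment.

       Illustrative sketch of the fallback order; not the app's exact code.
       """
       try:
           import streamlit as st  # only useful when running under Streamlit
           if name in st.secrets:
               return st.secrets[name]
       except Exception:
           pass  # streamlit not installed, or no secrets.toml present
       return os.environ.get(name, default)

   os.environ["GEMINI_API_KEY"] = "dummy-key-for-demo"
   print(get_api_key("GEMINI_API_KEY"))
   print(get_api_key("NO_SUCH_KEY", "fallback"))  # → fallback
   ```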
5. **Font:**
   Ensure the font file specified in `core/visual_engine.py` (e.g., `arial.ttf`) is accessible. The script tries common system paths, but you can place it in `assets/fonts/` and adjust the path in `visual_engine.py` if needed. If using Docker, ensure the font is copied into the image (see `Dockerfile`).

6. **RunwayML Implementation (Important):**
   The current integration for RunwayML in `core/visual_engine.py` (method `_generate_video_clip_with_runwayml`) is a **placeholder**. You will need to:
   * Install the official RunwayML SDK, if available.
   * Implement the actual API calls to RunwayML for text-to-video or image-to-video generation.
   * The placeholder currently creates a dummy video clip using MoviePy.

## Running the Application

1. **Local Development:**
   ```bash
   streamlit run app.py
   ```
   The application should open in your web browser.

2. **Using Docker (Optional):**
   * Build the Docker image:
     ```bash
     docker build -t cinegen-ai .
     ```
   * Run the Docker container (ensure API keys are passed as environment variables, or handled via mounted secrets, if they are not baked into the image for local testing):
     ```bash
     docker run -p 8501:8501 \
       -e GEMINI_API_KEY="your_key" \
       -e OPENAI_API_KEY="your_key" \
       cinegen-ai  # add further -e flags for the other keys
     ```
   Access the app at `http://localhost:8501`.

## How to Use

1. **Input Creative Seed:** Provide your core story idea, then select a genre, mood, number of scenes, and AI director style in the sidebar.
2. **Generate Treatment:** Click "🌌 Generate Cinematic Treatment". The AI will produce a multi-scene breakdown.
3. **Review & Refine:**
   * Examine each scene's details, including AI-generated visuals (or placeholders).
   * Use the "✏️ Edit Scene X Treatment" and "🎨 Edit Scene X Visual Prompt" popovers to provide feedback and regenerate specific parts of the treatment or visuals.
   * Adjust the per-scene "Dominant Shot Type" and "Scene Duration" for the animatic.
4. **Fine-Tuning (Sidebar):**
   * Define characters with visual descriptions.
   * Apply global style overrides.
   * Set the narrator voice ID and narration style.
5. **Assemble Animatic:** Once you're satisfied with the treatment, visuals, and narration script (generated automatically), click "🎬 Assemble Narrated Cinematic Animatic".
6. **View & Download:** The generated animatic video will appear, and you can download it.

## Key Technologies

* **Python**
* **Streamlit:** Web application framework.
* **Google Gemini API:** Core text generation (treatment, narration script, prompt refinement).
* **OpenAI API (DALL-E 3):** AI image generation.
* **ElevenLabs API:** Text-to-speech narration.
* **Pexels API:** Stock image/video fallbacks.
* **RunwayML API (placeholder):** AI video clip generation.
* **MoviePy:** Video processing and animatic assembly.
* **Pillow (PIL):** Image manipulation.

## Future Enhancements / To-Do

* Implement full, robust RunwayML API integration.
* Option to upload custom seed images for image-to-video generation.
* More sophisticated control over the Ken Burns effect (pan direction, zoom intensity).
* Allow users to upload their own audio for narration or background music.
* Advanced shot list generation and export.
* Integration with other AI video/image models.
* User accounts and project saving.
* More granular error handling and user feedback in the UI.
* Refine the JSON cleaning of Gemini responses to be even more robust.

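On the JSON-cleaning item above: Gemini often wraps JSON responses in Markdown code fences or surrounds them with prose. A minimal cleaning helper could take the following shape. This is a hypothetical sketch, not the app's actual implementation:

```python
import json
import re

def clean_gemini_json(raw: str) -> dict:
    """Extract and parse a JSON object from a raw LLM response.

    Hypothetical sketch: strips Markdown code fences, then falls back
    to the outermost {...} block, before parsing with json.loads.
    """
    # Drop ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Trim any prose around the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(raw[start : end + 1])

messy = 'Sure! Here is the treatment:\n```json\n{"scene_title": "Dawn"}\n```'
print(clean_gemini_json(messy))  # → {'scene_title': 'Dawn'}
```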
## Contributing

Contributions, bug reports, and feature requests are welcome! Please open an issue or submit a pull request.

## License

This project is currently under [Specify License Here - e.g., MIT, Apache 2.0, or state as private/proprietary].

---

*This README provides a comprehensive overview. Ensure all paths, commands, and API key instructions match your specific project setup.*