---
title: "Scriptura"
short_description: "MultiAgent System for Screenplay Creation and Editing"
emoji: 🎞️
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: mit
tag: agent-demo-track
---
# Scriptura: A Multi-Agent System for Screenplay Creation and Editing

The explanation **video** is available at: https://www.youtube.com/watch?v=I0201ruB1Uo&ab_channel=3DLabFactory.

The sample screenplay used in the video is available at: https://www.studiobinder.com/blog/best-free-movie-scripts-online/
## Introduction

**Scriptura** is a multi-agent AI framework built on Hugging Face smolagents. It streamlines the creation of screenplays, storyboards, and soundtracks by automating analysis, summarization, and multimodal enrichment, freeing authors to focus on the creative work.

At its heart:

- **Qwen3-32B** serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system.
- **Gemma-3-27B-IT** acts as a specialized assistant for multimodal tasks, supporting both text and audio inputs to refine narrative elements and prepare them for downstream generation.

For media generation, Scriptura integrates:

- **MusicGen** models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, which let the agent produce original soundtracks and sound effects from text prompts or from combined text and audio samples.
- **FLUX (black-forest-labs/FLUX.1-dev)** for on-the-fly image creation, suited to storyboards, concept art, and visual references that tie into the narrative flow.

Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, so that drafts are informed by relevant context.
## Agent Capabilities

**Input File Parsing**
: - **Formats accepted**: `TXT`, `PDF`, `DOCX`, `JPEG/PNG`, `MP3/WAV`
  - **Process**: PDF/DOCX → plain text; OCR on images; speech-to-text on audio.
  - **Why it matters**: Provides structured input for all downstream modules.
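
The routing implied by the accepted formats can be sketched as a lookup from file extension to preprocessing step. This is a minimal illustration, not the app's actual code: the `ROUTES` table, the step names, and `route_input` are hypothetical names introduced here.

```python
from pathlib import Path

# Hypothetical routing table: file extension -> preprocessing step
# described in the "Process" bullet above.
ROUTES = {
    ".txt": "plain_text",
    ".pdf": "document_to_text",
    ".docx": "document_to_text",
    ".jpg": "ocr",
    ".jpeg": "ocr",
    ".png": "ocr",
    ".mp3": "speech_to_text",
    ".wav": "speech_to_text",
}


def route_input(filename: str) -> str:
    """Return the preprocessing step for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return ROUTES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported input format: {suffix or filename!r}") from None
```

Rejecting unknown extensions early keeps unsupported files from reaching the downstream modules.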

**Overall Plot Summary**
: - **Model**: `DeepSeek-R1`
  - **Output**: A 4–6 sentence summary of the main narrative threads (timeframe, tone).
  - **Mechanics**: API calls to DeepSeek with retry logic for improved coherence.
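
One common way to add retry logic around an API call is exponential backoff. The sketch below assumes the retry is triggered by exceptions; the source does not say whether the app also retries on low-quality output. `call_with_retry` and its parameters are hypothetical, and the wrapped function stands in for the real DeepSeek request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter makes the helper easy to exercise without real waiting.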

**Entity & Theme Extraction**
: - **Technique**: Named Entity Recognition (via **DeepSeek**)
  - **Extracts**: Characters, locations, key events, recurring themes, narrative tone.
  - **Output**: JSON/CSV + ~5-sentence abstract.
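
To illustrate the JSON/CSV output, the sketch below serializes extracted entities with the standard library. The data shape (a mapping from category to a list of values) and the CSV column names are assumptions made for this example; the source does not specify the actual schema.

```python
import csv
import io
import json


def entities_to_json(entities: dict[str, list[str]]) -> str:
    """Serialize extracted entities as pretty-printed JSON."""
    return json.dumps(entities, ensure_ascii=False, indent=2)


def entities_to_csv(entities: dict[str, list[str]]) -> str:
    """Flatten extracted entities into CSV rows of (category, value)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "value"])
    for category, values in entities.items():
        for value in values:
            writer.writerow([category, value])
    return buffer.getvalue()
```

For example, `entities_to_csv({"characters": ["Ana"], "locations": ["Rome"]})` yields a header row followed by one row per entity.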

**Rights & Licensing Verification**
: - **Web Search ON**: Queries the DuckDuckGo API and fetches license information when a match is found.
  - **Web Search OFF**: May recognize very famous works from internal knowledge (e.g., “Harry Potter”), but this is not guaranteed.
  - **No match and Web Search OFF**: No licensing check is performed.

**Image Generation (Storyboard & Concept Art)**
: - **Model**: `FLUX (black-forest-labs/FLUX.1-dev)`
  - **Trigger**: “Generate Image” / storyboard phase.
  - **Process**: DeepSeek crafts a cinematic prompt → FLUX returns PNG/JPEG + caption.
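
The prompt-then-image step could look roughly like the sketch below. It assumes the `huggingface_hub` package and access to a hosted FLUX endpoint, which may differ from how the Space actually calls the model. `build_storyboard_prompt` is a hypothetical template standing in for the DeepSeek-written prompt, and `generate_storyboard_image` is likewise an illustrative name.

```python
def build_storyboard_prompt(scene: str, style: str = "cinematic storyboard frame") -> str:
    """Hypothetical prompt template; in the app, DeepSeek writes this prompt."""
    return f"{style}, {scene.strip()}, dramatic lighting, wide shot"


def generate_storyboard_image(scene: str, out_path: str = "storyboard.png") -> str:
    """Request an image from FLUX.1-dev and save it to out_path."""
    # Imported lazily so the prompt helper works without the dependency installed.
    from huggingface_hub import InferenceClient

    client = InferenceClient()  # uses the locally configured Hugging Face token, if any
    image = client.text_to_image(
        build_storyboard_prompt(scene),
        model="black-forest-labs/FLUX.1-dev",
    )
    image.save(out_path)  # text_to_image returns a PIL image
    return out_path
```

Keeping prompt construction separate from the network call makes the prompt logic testable offline.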

**Audio Generation (Music & Sound Effects)**
: - **Model**: `MusicGen (facebook/musicgen-melody)`
  - **Trigger**: “Generate Audio.”
  - **Process**: Send prompt → receive MP3/WAV (standalone audio, no text/images).

**In-Depth Analysis of Key Points**
: - **Extracts**:
    - Characters (role, gender, description)
    - Locations (interior/exterior, period, geography)
    - Plot Points (crucial narrative beats via Story Understanding models)
  - **Extras**: Semantic toponym extraction → internal scene maps; detection of transitions (“Suddenly,” “Meanwhile”).
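
Transition detection of the kind mentioned above can be approximated with a keyword search. This is a simple regex sketch under the assumption that transitions are marked by a fixed word list; the `TRANSITIONS` tuple and `find_transitions` are hypothetical, and the app may use a model-based approach instead.

```python
import re

# Hypothetical cue list, seeded with the two examples given above.
TRANSITIONS = ("Suddenly", "Meanwhile")
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, TRANSITIONS)) + r")\b",
    re.IGNORECASE,
)


def find_transitions(text: str) -> list[tuple[int, str]]:
    """Return (line_number, cue) pairs for each transition cue found."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_no, match.group(1).capitalize()))
    return hits
```

The line numbers give a rough index of where scene changes occur, which could feed a scene map.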

**Optional Web Search**
: - **Checkbox**: Toggles DuckDuckGo API lookups.
  - **If enabled**: Searches preconfigured sites (free and paid) for scripts and sound effects.
  - **Output**: List of links with short summaries.

---

## Agent Flow
```mermaid
flowchart LR
    A[Start Agent] --> B["Load Input (text, image, audio)"]
    B --> C["Preprocessing: PDF/DOCX → text, OCR, audio transcription"]
    C --> D["Generate Plot Summary (DeepSeek)"]
    D --> E["Extract Entities & Themes (DeepSeek)"]
    E --> F{"Web Search Enabled?"}
    F -->|Yes| G[Web Search via DuckDuckGo API]
    F -->|No| H[Continue Offline Analysis]
    G --> I["Rights & Licensing Check"]
    H --> I
    I --> J["Deep Analysis: characters, locations, plot points"]
    J --> K{"Image Generation Requested?"}
    K -->|Yes| L["API Call to FLUX for storyboard/concept art"]
    K -->|No| M[Skip Image Generation]
    L --> N{"Audio Generation Requested?"}
    M --> N
    N -->|Yes| O[API Call to MusicGen for audio tracks]
    N -->|No| P[Skip Audio Generation]
    O --> Q["Final Output: text, JSON/CSV, images, audio"]
    P --> Q
```
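
The same flow can be expressed as plain control logic. The sketch below assumes that both branches of each decision rejoin the main path, so that every run reaches the final output. The stage names and `plan_stages` are hypothetical labels for the boxes in the diagram, not functions from the app.

```python
def plan_stages(web_search: bool, want_image: bool, want_audio: bool) -> list[str]:
    """Return the ordered stages a run would execute for the given options."""
    stages = ["load_input", "preprocess", "plot_summary", "extract_entities"]
    # Decision 1: online lookup or offline analysis.
    stages.append("web_search" if web_search else "offline_analysis")
    stages += ["rights_check", "deep_analysis"]
    # Decisions 2 and 3: optional media generation.
    if want_image:
        stages.append("generate_image")
    if want_audio:
        stages.append("generate_audio")
    stages.append("final_output")
    return stages
```

For instance, `plan_stages(web_search=True, want_image=True, want_audio=False)` includes `web_search` and `generate_image` but skips `generate_audio`.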

---

## Deployment & Access and the Code Overview

---

## Use Cases

**Independent Writer**
: - Upload a screenplay and quickly get a summary plus lists of characters and locations.
  - Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).
  - Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).

**Film Production Company**
: - Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
  - Use the web search feature to find reference scripts or specific sound effects from free/paid sources.
  - Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

**Translation and Adaptation Agency**
: - Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
  - Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).
  - Produce reference audio via MusicGen to test culturally appropriate music for the target audience.

**Digital Humanities Course**
: - Demonstrate how to build a text-mining tool for the performing arts that combines NLP, image, and audio pipelines.
  - Let students analyze real scripts and generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
  - Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.

---

## Credits

---

## Acknowledgements

---

### Contributors:

- Code development and implementation by **luke9705**.
- Ideation, testing, and video production by **OrianIce**.
- Research and testing by **Loren1214**.
- Code revisions by **DDPM**.