---
title: "Scriptura"
short_description: "MultiAgent System for Screenplay Creation and Editing"
emoji: 🎞️
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: mit
tag: agent-demo-track
---
# Scriptura: A Multi-Agent System for Screenplay Creation and Editing
The explanation **video** is available at: https://www.youtube.com/watch?v=I0201ruB1Uo&ab_channel=3DLabFactory.
The sample screenplay used in the video is available at: https://www.studiobinder.com/blog/best-free-movie-scripts-online/
## Introduction
**Scriptura** is a multi-agent AI framework built on Hugging Face smolagents that streamlines the creation of screenplays, storyboards, and soundtracks. It automates the stages of analysis, summarization, and multimodal enrichment so that authors can focus on the creative work.
At its heart:
- **Qwen3-32B** serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system.
- **Gemma-3-27B-IT** acts as a specialized assistant for multimodal tasks, supporting both text and audio inputs to refine narrative elements and prepare them for downstream generation.
For media generation, Scriptura integrates:
- **MusicGen** models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, enabling the agent to produce original soundtracks and sound effects from text prompts or combined text + audio samples.
- **FLUX (black-forest-labs/FLUX.1-dev)** for on-the-fly image creation, ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow.
Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, ensuring that every draft is not only creatively rich but also contextually informed.
## Agent Capabilities
**Input File Parsing**
- **Formats accepted**: `TXT`, `PDF`, `DOCX`, `JPEG/PNG`, `MP3/WAV`
- **Process**: PDF/DOCX → plain text; OCR on images; speech-to-text on audio.
- **Why it matters**: Provides structured input for all downstream modules.
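The routing step above can be sketched as a lookup from file extension to preprocessing route. This is a minimal illustration, not the Space's actual code: `ROUTES`, `route_for`, and the route names are hypothetical, and the real converters (PDF/DOCX extraction, OCR, speech-to-text) are left out.

```python
from pathlib import Path

# Hypothetical mapping from accepted extensions to a preprocessing route.
ROUTES = {
    ".txt": "plain_text",
    ".pdf": "document_to_text",
    ".docx": "document_to_text",
    ".jpg": "ocr",
    ".jpeg": "ocr",
    ".png": "ocr",
    ".mp3": "speech_to_text",
    ".wav": "speech_to_text",
}


def route_for(filename: str) -> str:
    """Return the preprocessing route for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return ROUTES[suffix]
    except KeyError:
        # Reject anything outside the accepted formats early.
        raise ValueError(f"Unsupported input format: {suffix or filename!r}") from None
```

Failing early on unsupported formats keeps the downstream modules from receiving input they cannot interpret.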
**Overall Plot Summary**
- **Model**: `DeepSeek-R1`
- **Output**: 4–6 sentence summary of main narrative threads (timeframe, tone).
- **Mechanics**: API calls to DeepSeek with retry logic for improved coherence.
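The README does not say how the retry logic is implemented. One common pattern is exponential backoff, sketched below under that assumption; `call_with_retry` and its parameters are hypothetical names, and `request` stands in for whatever function performs the actual API call.

```python
import time


def call_with_retry(request, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `request()` until it succeeds or `attempts` tries are used up.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between failed tries.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except Exception as error:  # in practice, catch the client's specific errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Request failed after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter makes the helper easy to exercise without real delays.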
**Entity & Theme Extraction**
- **Technique**: Named Entity Recognition (via **DeepSeek**)
- **Extracts**: Characters, locations, key events, recurring themes, narrative tone.
- **Output**: JSON/CSV + ~5-sentence abstract.
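As a rough illustration of the JSON/CSV output, the sketch below serializes a dictionary of extracted entities with the standard library. The `{category: [names]}` shape and both function names are assumptions made for the example; the schema Scriptura actually emits is not documented here.

```python
import csv
import io
import json


def entities_to_json(entities):
    """Serialize the entity dictionary as pretty-printed JSON."""
    return json.dumps(entities, ensure_ascii=False, indent=2)


def entities_to_csv(entities):
    """Flatten {category: [names]} into rows of (category, name)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "name"])
    for category, names in entities.items():
        for name in names:
            writer.writerow([category, name])
    return buffer.getvalue()
```

The CSV form is convenient for spreadsheets, while the JSON form preserves the grouping by category.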
**Rights & Licensing Verification**
- **Web Search ON**: Queries DuckDuckGo API → fetch license info if match.
- **Web Search OFF**: May recognize very famous works internally (e.g. “Harry Potter”) but not guaranteed.
- **If no match & search OFF**: No licensing check.
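The three cases above can be written as a small decision function. This is only a sketch: `licensing_check`, the status strings, and the query wording are hypothetical, `search` is any callable that returns a list of results, and `known_works` is a stand-in for whatever the model happens to recognize internally.

```python
def licensing_check(title, web_search_enabled, search=None, known_works=()):
    """Return a (status, details) tuple following the three cases above."""
    if web_search_enabled and search is not None:
        # Search ON: look the title up and report any license information found.
        results = search(f'"{title}" screenplay copyright license')
        if results:
            return "found", results
        return "no_match", []
    # Search OFF: only best-effort internal recognition is possible.
    if title.lower() in {work.lower() for work in known_works}:
        return "recognized_internally", []
    return "not_checked", []
```

Returning an explicit `not_checked` status makes it clear to the user that no verification took place, rather than implying the work is free to use.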
**Image Generation (Storyboard & Concept Art)**
- **Model**: `FLUX (black-forest-labs/FLUX.1-dev)`
- **Trigger**: “Generate Image” / storyboard phase.
- **Process**: DeepSeek crafts cinematic prompt → FLUX returns PNG/JPEG + caption.
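A hedged sketch of the prompt-then-generate step follows. In Scriptura the prompt is written by the LLM; here a fixed template (`build_storyboard_prompt`, hypothetical) stands in for it. `generate_storyboard_frame` is also hypothetical and accepts any client object exposing `text_to_image(prompt, model=...)`, which is the shape of `huggingface_hub.InferenceClient.text_to_image`.

```python
def build_storyboard_prompt(scene, style="cinematic still, 35mm, dramatic lighting"):
    """Hypothetical template; the real system lets the LLM write the prompt."""
    return f"{scene.strip().rstrip('.')}. {style}"


def generate_storyboard_frame(client, scene, model="black-forest-labs/FLUX.1-dev"):
    """Request one image from `client` and return it with its prompt.

    `client` is assumed to expose text_to_image(prompt, model=...),
    for example an instance of huggingface_hub.InferenceClient.
    """
    prompt = build_storyboard_prompt(scene)
    image = client.text_to_image(prompt, model=model)
    return image, prompt  # the prompt doubles as a simple caption
```

With `huggingface_hub` installed and a token that has access to the model, `generate_storyboard_frame(InferenceClient(), "A rainy alley at night")` would return a PIL image together with the prompt used.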
**Audio Generation (Music & Sound Effects)**
- **Model**: `MusicGen (facebook/musicgen-melody)`
- **Trigger**: “Generate Audio.”
- **Process**: Send prompt → receive MP3/WAV (standalone audio, no text/images).
**In-Depth Analysis of Key Points**
- **Extracts**:
  - Characters (role, gender, description)
  - Locations (interior/exterior, period, geography)
  - Plot Points (crucial narrative beats via Story Understanding models)
- **Extras**: Semantic toponym extraction → internal scene maps; detect transitions (“Suddenly,” “Meanwhile”).
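Transition detection can be approximated with a keyword scan. The sketch below is a simplified stand-in, not the method Scriptura uses: the `TRANSITIONS` list is illustrative (only "Suddenly" and "Meanwhile" come from the text above), and `find_transitions` is a hypothetical helper.

```python
import re

# Illustrative cue words; only the first two appear in the README.
TRANSITIONS = ("suddenly", "meanwhile", "later", "moments later")

# Longer phrases come first so "moments later" wins over plain "later".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(TRANSITIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_transitions(text):
    """Return (line_number, matched_phrase) pairs; line numbers start at 1."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits
```

A keyword scan misses implicit transitions, so it is best treated as a cheap first pass ahead of any model-based analysis.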
**Optional Web Search**
- **Checkbox**: Toggles DuckDuckGo API lookups.
- **If Enabled**: Searches preconfigured sites (free & paid) for scripts and sound effects.
- **Output**: List of links + short summaries.
---
## Agent Flow
```mermaid
flowchart LR
    A[Start Agent] --> B["Load Input (text, image, audio)"]
    B --> C["Preprocessing: PDF/DOCX → text, OCR, audio transcription"]
    C --> D["Generate Plot Summary (DeepSeek)"]
    D --> E["Extract Entities & Themes (DeepSeek)"]
    E --> F{Web Search Enabled?}
    F -->|Yes| G[Web Search via DuckDuckGo API]
    F -->|No| H[Continue Offline Analysis]
    G --> I["Rights & Licensing Check"]
    H --> I
    I --> J["Deep Analysis: characters, locations, plot points"]
    J --> K{Image Generation Requested?}
    K -->|Yes| L["API Call to FLUX for storyboard/concept art"]
    K -->|No| M[Skip Image Generation]
    L --> N{Audio Generation Requested?}
    M --> N
    N -->|Yes| O[API Call to MusicGen for audio tracks]
    N -->|No| P[Skip Audio Generation]
    O & P --> Q["Final Output: text, JSON/CSV, images, audio"]
```
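The same flow can be expressed as a function that lists which stages run for a given set of options. This is a sketch under two assumptions that the diagram leaves open: both web-search branches rejoin at the rights check, and image and audio generation are independent options. `planned_stages` and the stage names are hypothetical.

```python
def planned_stages(web_search=False, generate_image=False, generate_audio=False):
    """Return the ordered list of stages the agent would run."""
    stages = ["load_input", "preprocess", "plot_summary", "extract_entities"]
    # The web-search checkbox selects one of two branches.
    stages.append("web_search" if web_search else "offline_analysis")
    stages += ["rights_check", "deep_analysis"]
    # Media generation steps are optional and independent of each other.
    if generate_image:
        stages.append("flux_image")
    if generate_audio:
        stages.append("musicgen_audio")
    stages.append("final_output")
    return stages
```

For example, `planned_stages(generate_audio=True)` skips the web search and image steps but still ends with the audio call and the final output.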
---
## Deployment, Access, and Code Overview
---
## Use Cases
**Independent Writer**
- Upload a screenplay and quickly get a summary, a list of characters, and locations.
- Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).
- Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).
**Film Production Company**
- Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
- Use the web search feature to find reference scripts or specific sound effects from free/paid sources.
- Develop visual storyboards and audio prototypes to share with directors, artists, and investors.
**Translation and Adaptation Agency**
- Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
- Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).
- Produce reference audio via MusicGen to test culturally appropriate music for the target audience.
**Digital Humanities Course**
- Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines.
- Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
- Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.
---
## Credits
---
## Acknowledgements
---
### Contributors:
- Code development and implementation by **luke9705**;
- Ideation, testing, and video production by **OrianIce**;
- Research and testing by **Loren1214**;
- Code revisions by **DDPM**.
---
### Sources
- Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson.
- Cambria, E., & White, B. (2014). *Jumping NLP Curves: A Review of Natural Language Processing Research*. IEEE Computational Intelligence Magazine, 9(2), 48–57.
- Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). *Hierarchical Text-Conditional Image Generation with CLIP Latents*. arXiv preprint arXiv:2204.06125.