tag: agent-demo-track
---

# Scriptura: A Multi-Agent System for Screenplay Creation and Editing

The explanation video is available [here](https://www.youtube.com/watch?v=I0201ruB1Uo).

The screenplay used as a sample in the video is available [here](https://www.studiobinder.com/blog/best-free-movie-scripts-online/).

## Introduction

**Scriptura** is a multi-agent AI framework built on Hugging Face smolagents. It streamlines the creation of screenplays, storyboards, and soundtracks by automating analysis, summarization, and multimodal enrichment, which frees authors to focus on the creative work itself.

At its heart:

* Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and handling high-level reasoning across the system.
* Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, accepting text and image inputs to refine narrative elements and prepare them for downstream generation.

For media generation, Scriptura integrates:

* MusicGen models (from Meta's AudioCraft library), deployed via Hugging Face Spaces, which let the agent produce original soundtracks and sound effects from text prompts or from text combined with audio samples.
* FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, suited to storyboards, concept art, and visual references that tie directly into the narrative flow.
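
The FLUX integration listed above can be sketched as a small helper that converts a shot description into a prompt and requests one frame. This is a minimal illustration, not Scriptura's actual tool. It assumes a client that behaves like `huggingface_hub.InferenceClient`, whose `text_to_image(prompt, model=...)` returns a `PIL.Image.Image`. The `Shot` class, `build_storyboard_prompt`, and the prompt wording are hypothetical. The client is injected so the logic can be exercised without network access.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Shot:
    """Hypothetical description of a single storyboard shot."""
    scene_heading: str  # e.g. "INT. KITCHEN - NIGHT"
    action: str         # what happens in the frame
    style: str = "cinematic storyboard sketch, black and white, wide shot"


def build_storyboard_prompt(shot: Shot) -> str:
    """Flatten a shot into one text-to-image prompt (hypothetical wording)."""
    return f"{shot.style}. Setting: {shot.scene_heading}. Action: {shot.action}."


def generate_storyboard_frame(
    shot: Shot, client: Any, model: str = "black-forest-labs/FLUX.1-dev"
) -> Tuple[str, Any]:
    """Request one frame and return (prompt, image).

    `client` is expected to behave like huggingface_hub.InferenceClient, whose
    text_to_image(prompt, model=...) returns a PIL.Image.Image.
    """
    prompt = build_storyboard_prompt(shot)
    image = client.text_to_image(prompt, model=model)
    return prompt, image
```

With network access and a token, the real client could be created as `InferenceClient(token=os.environ["HF_TOKEN"])`, and the returned image saved with `image.save("frame_01.png")`.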

Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo search integration) to pull in reference scripts, sound samples, or research materials, so that each draft draws on relevant context as well as creative input.
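
One possible implementation of the optional lookup is sketched below. It runs a search only when the toggle is on and returns links with short summaries. It assumes the `duckduckgo-search` package listed under Sources, whose `DDGS().text(query, max_results=...)` yields dicts with `title`, `href`, and `body` keys. `web_research`, `ddgs_search`, and the Markdown output format are hypothetical. The search backend can be swapped out so the formatting logic can be tested offline.

```python
from typing import Callable, Dict, List, Optional

SearchFn = Callable[[str, int], List[Dict[str, str]]]


def ddgs_search(query: str, max_results: int) -> List[Dict[str, str]]:
    """Default backend: the duckduckgo-search package (imported lazily)."""
    from duckduckgo_search import DDGS  # pip install duckduckgo-search
    with DDGS() as ddgs:
        # Each hit is a dict with "title", "href" and "body" keys.
        return list(ddgs.text(query, max_results=max_results))


def web_research(query: str, enabled: bool, search: Optional[SearchFn] = None,
                 max_results: int = 5, summary_chars: int = 160) -> List[str]:
    """Return Markdown bullet lines (link + short summary), or [] when search is off."""
    if not enabled:
        return []  # honor the "Enable web search" toggle: no network calls at all
    search = search or ddgs_search
    lines = []
    for hit in search(query, max_results):
        summary = hit.get("body", "").strip()
        if len(summary) > summary_chars:
            # Truncate long snippets so the final summary is at most summary_chars long.
            summary = summary[: summary_chars - 3].rstrip() + "..."
        lines.append(f"- [{hit.get('title', hit['href'])}]({hit['href']}): {summary}")
    return lines
```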

---

## Agent Capabilities

Scriptura provides a set of agents and tools that cover the full screenplay production and enrichment pipeline:

- **Text Analysis & Summarization**
  - Automatically extracts key themes, character arcs, and plot points
  - Segments and summarizes scenes for rapid iteration

- **Multimodal Ingestion**
  - Supports PDF, DOCX, ODT, TXT, and image uploads
  - Transcribes audio files using OpenAI Whisper

- **Image Generation**
  - On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev)

- **Audio Generation**
  - Produces original soundtracks and SFX with MusicGen (AudioCraft)
  - Supports audio generation conditioned on an uploaded sample

- **Captioning & Metadata**
  - Auto-generates captions and descriptions for images using Gemma-3-27B-IT

- **Optional Web Research**
  - Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references
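
The ingestion step can be sketched as a dispatcher keyed on file extension. This is a minimal sketch that assumes the parsers listed under Sources: pdfplumber for PDF, docx2txt for DOCX, odfpy for ODT, and the `openai` client's Whisper endpoint (`whisper-1`) for audio. The function names and the extension table are hypothetical. Image uploads, which Scriptura passes to Gemma-3-27B-IT, are out of scope here. Third-party imports are deferred inside each reader so that plain-text files work even without the optional dependencies.

```python
from pathlib import Path
from typing import Callable, Dict


def read_pdf(path: Path) -> str:
    import pdfplumber  # pip install pdfplumber
    with pdfplumber.open(str(path)) as pdf:
        # extract_text() returns None for pages without a text layer (e.g. scans).
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def read_docx(path: Path) -> str:
    import docx2txt  # pip install docx2txt
    return docx2txt.process(str(path))


def read_odt(path: Path) -> str:
    from odf import teletype, text  # pip install odfpy
    from odf.opendocument import load
    doc = load(str(path))
    # Collects text:p paragraphs only; headings (text:h) are skipped in this sketch.
    return "\n".join(teletype.extractText(p) for p in doc.getElementsByType(text.P))


def read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def transcribe_audio(path: Path) -> str:
    from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment
    client = OpenAI()
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(model="whisper-1", file=audio).text


READERS: Dict[str, Callable[[Path], str]] = {
    ".pdf": read_pdf,
    ".docx": read_docx,
    ".odt": read_odt,
    ".txt": read_txt,
    ".mp3": transcribe_audio,
    ".wav": transcribe_audio,
}


def extract_text(path) -> str:
    """Pick a reader by file extension (case-insensitive); reject anything else."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(no extension)'}")
    return reader(path)
```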

---

## Agent Flow

Here’s an example flow demonstrating how you could use the agent.

<img alt="Flowchart" src="https://www.canva.com/design/DAGphLlng2I/MZ2cOAnS520rFtnhTP5H6A/view?utm_content=DAGphLlng2I&utm_campaign=designshare&utm_medium=link2&utm_source=uniquelinks&utlId=hca1222039d" width="600"/>

---

## Code Overview

```text
.
├── app.py              # Entry point: defines the Gradio interface and routing logic
├── system_prompt.txt   # System-level prompt template for the CodeAgent
├── requirements.txt    # Python dependencies (Gradio, smolagents, OpenAI, etc.)
└── README.md           # Project documentation
```

* **app.py**
  * **Agent** class: loads the Qwen3-32B model and registers all tools
  * **respond()**: orchestrates between the Gradio inputs and the CodeAgent
  * Decorated `@tool` functions for image download, media generation, transcription, and captioning
  * Gradio `ChatInterface` setup with text/file support and an “Enable web search” toggle
* **system_prompt.txt**
  * Injects the agent’s “way of thinking,” including its reasoning structure and error handling
* **requirements.txt**
  * Lists all required libraries (Gradio, smolagents, OpenAI, huggingface_hub, pdfplumber, etc.)
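
The bullets above describe `respond()` only at a high level. The sketch below shows one way such a callback could be connected to Gradio's `ChatInterface` in multimodal mode. In that mode the message arrives as a dict with `"text"` and `"files"` keys, and each component in `additional_inputs` (here, the web-search checkbox) is passed to the callback as an extra argument. `build_task`, `make_respond`, and the task wording are hypothetical. The agent can be any object with a `run(str)` method, such as a smolagents `CodeAgent`.

```python
from typing import Any, Dict, List


def _file_path(item: Any) -> str:
    # Depending on the Gradio version, uploads arrive as plain paths or as dicts with a "path" key.
    return item if isinstance(item, str) else item["path"]


def build_task(message: Dict[str, Any], web_search: bool) -> str:
    """Compose the task string handed to the agent (hypothetical wording)."""
    parts = [message.get("text", "").strip()]
    files = [_file_path(f) for f in message.get("files", [])]
    if files:
        parts.append("Attached files:\n" + "\n".join(f"- {p}" for p in files))
    parts.append("Web search is " + ("ENABLED." if web_search else "DISABLED; do not call search tools."))
    return "\n\n".join(p for p in parts if p)


def make_respond(agent: Any):
    """Return a ChatInterface callback bound to `agent` (anything with a .run(str) method)."""
    def respond(message: Dict[str, Any], history: List[Any], web_search: bool) -> str:
        return str(agent.run(build_task(message, web_search)))
    return respond


# Wiring it up (requires gradio and an agent instance):
#   import gradio as gr
#   demo = gr.ChatInterface(
#       fn=make_respond(agent),
#       multimodal=True,
#       additional_inputs=[gr.Checkbox(label="Enable web search", value=False)],
#   )
#   demo.launch()
```

Keeping the prompt-building logic separate from the Gradio callback makes it testable without starting the UI or loading a model.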

---

## Deployment & Access

### Hugging Face Spaces

1. Include `app.py`, `system_prompt.txt`, and `requirements.txt` in the root of your Space.
2. Configure `OPENAI_API_KEY` and `HF_TOKEN` as Secrets in your Space’s settings.
3. Make sure the Space uses **Python 3.10 or higher**.
4. Select **Gradio** as the SDK (version 5.32.1).
5. Pin or share the Space link to collaborate with your team.

> **Note:** If you clone this repository and run it locally, set your own `OPENAI_API_KEY` and `HF_TOKEN` environment variables before launching.
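
Secrets configured in a Space are exposed to the app as environment variables, so the same check works on Spaces and locally. The following is a minimal sketch of a startup check that fails early with a clear message if either key is missing. `load_secrets` and its error text are hypothetical and are not part of `app.py`.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")


def load_secrets(names: Iterable[str] = REQUIRED_SECRETS,
                 env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read required secrets from the environment and fail early if any is missing or empty."""
    names = tuple(names)  # allow generators; we iterate more than once
    env = os.environ if env is None else env
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError(
            "Missing required environment variable(s): " + ", ".join(missing)
            + ". On Hugging Face Spaces, add them as Secrets; locally, export them before launching."
        )
    return {n: env[n] for n in names}
```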

---

## Use Cases

**Independent Writer**

* Upload a screenplay and quickly get a summary and a list of its characters and locations.
* Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).
* Generate short soundtracks or sound effects to accompany script presentations (MP3/WAV).
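
Scriptura extracts characters and locations with its LLM agents. As a rough substitute that is useful for sanity-checking the agent's output, the sketch below applies standard screenplay formatting conventions instead: `INT.`/`EXT.` scene headings, and all-caps character cues followed directly by dialogue. This is a heuristic, not Scriptura's method. It will miss non-standard layouts and may count all-caps action lines as characters. `extract_entities` and the returned dict layout are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Scene headings ("sluglines") such as "INT. KITCHEN - NIGHT" or "EXT. HARBOR".
SCENE_RE = re.compile(
    r"^(?P<setting>INT\./EXT\.|EXT\./INT\.|INT\.|EXT\.)\s+"
    r"(?P<place>.+?)(?:\s+-\s+(?P<time>[^-]+))?$"
)
# Trailing character extensions such as "(V.O.)", "(O.S.)" or "(CONT'D)".
EXTENSIONS_RE = re.compile(r"(?:\s*\([^)]*\))+\s*$")


def extract_entities(script: str) -> Dict[str, object]:
    """Heuristically list characters (by number of speeches) and scene locations."""
    lines = [line.strip() for line in script.splitlines()]
    characters: Counter = Counter()
    locations: List[Dict[str, Optional[str]]] = []
    for i, line in enumerate(lines):
        scene = SCENE_RE.match(line)
        if scene:
            locations.append({
                "setting": scene.group("setting").rstrip("."),  # "INT", "EXT" or "INT./EXT"
                "place": scene.group("place"),
                "time": scene.group("time"),  # None when the heading has no " - TIME" part
            })
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # A character cue is an all-caps line that is not a transition ("CUT TO:")
        # and is immediately followed by dialogue or a parenthetical.
        if line.isupper() and not line.endswith(":") and next_line:
            characters[EXTENSIONS_RE.sub("", line)] += 1
    return {"characters": dict(characters.most_common()), "locations": locations}
```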

**Film Production Company**

* Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
* Use the web search feature to find reference scripts or specific sound effects from free and paid sources.
* Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

**Translation and Adaptation Agency**

* Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
* Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).
* Produce reference audio via MusicGen to test culturally appropriate music for the target audience.
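
The README does not document the exact JSON/CSV layout Scriptura produces. The following is a minimal sketch of one possible export that uses only the standard library. It assumes a hypothetical entity dict of the form `{"characters": {name: count}, "locations": [{"setting", "place", "time"}]}`. JSON output keeps non-ASCII names readable, which matters for foreign-language scripts. The CSV output flattens both entity kinds into one table with an `entity_type` column.

```python
import csv
import io
import json
from typing import Dict


def entities_to_json(entities: Dict[str, object]) -> str:
    """Serialize extracted entities; ensure_ascii=False keeps accented names readable."""
    return json.dumps(entities, ensure_ascii=False, indent=2)


def entities_to_csv(entities: Dict[str, object]) -> str:
    """Flatten characters and locations into one CSV table with an entity_type column."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["entity_type", "name", "count", "setting", "time"], lineterminator="\n"
    )
    writer.writeheader()
    for name, count in entities.get("characters", {}).items():
        # Columns not supplied (setting, time) are written as empty strings.
        writer.writerow({"entity_type": "character", "name": name, "count": count})
    for loc in entities.get("locations", []):
        writer.writerow({"entity_type": "location", "name": loc["place"],
                         "setting": loc.get("setting") or "", "time": loc.get("time") or ""})
    return buf.getvalue()
```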

**Digital Humanities Course**

* Demonstrate how to build a text-mining tool for the performing arts that combines NLP, image, and audio pipelines.
* Let students analyze real scripts and generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
* Explore Transformer models (Qwen3-32B, Gemma-3-27B-IT), OCR, speech-to-text (Whisper), and AI-driven media generation as part of the curriculum.

---

## Contributors

* Code development and implementation by luke9705
* Idea generation, testing, and video production by OrianIce
* Research and testing by Loren1214
* Code revisions by DDPM

---

## Sources

The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities:

- **Qwen3-32B** – primary orchestrating LLM for high-level reasoning and workflow management
- **Gradio** – interactive web UI framework
- **smolagents** – lightweight multi-agent orchestrator from Hugging Face
- **huggingface_hub** – model & dataset management
- **duckduckgo-search** – optional web research integration
- **openai** – Whisper transcription, GPT-based reasoning
- **anthropic** – client for Anthropic's Claude models
- **pdfplumber** – PDF text extraction
- **docx2txt** – DOCX parsing
- **odfpy** – ODT parsing
- **pandas** – data handling
- **Pillow (PIL)** – image processing
- **requests** – HTTP client for external APIs
- **numpy** – numerical operations
- **MusicGen (AudioCraft)** – soundtrack and SFX generation
- **FLUX (black-forest-labs/FLUX.1-dev)** – on-the-fly image generation
- **Gemma-3-27B-IT** – multimodal captioning and metadata