---
title: "Scriptura"
short_description: "MultiAgent System for Screenplay Creation and Editing"
emoji: 🎞️
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: mit
tag: agent-demo-track
---

# Scriptura: A Multi-Agent System for Screenplay Creation and Editing

An explanatory video is available at: https://www.youtube.com/watch?v=I0201ruB1Uo&ab_channel=3DLabFactory.

The sample screenplay used in the video is available at: https://www.studiobinder.com/blog/best-free-movie-scripts-online/

## Introduction

Scriptura is a multi-agent AI framework built on Hugging Face smolagents. It streamlines the creation of screenplays, storyboards, and soundtracks by automating analysis, summarization, and multimodal enrichment, which frees authors to focus on the creative work itself.

At its heart:
- Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and handling high-level reasoning across the system.
- Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, supporting both text and audio inputs to refine narrative elements and prepare them for downstream generation.
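
The README does not show how the orchestrator hands work to the assistant. In smolagents this is usually done by passing a manager agent a `managed_agents` list. The stdlib-only sketch below illustrates just the routing idea. It assumes each specialist declares the input kinds it accepts. The `Specialist` and `Orchestrator` classes and the lambda that stands in for a model call are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Specialist:
    """Hypothetical worker agent, e.g. a wrapper around Gemma-3-27B-IT."""
    name: str
    handles: set[str]            # input kinds this agent accepts
    run: Callable[[str], str]    # stand-in for the real model call


@dataclass
class Orchestrator:
    """Routes each task to the first specialist that declares support for its input kind."""
    specialists: list[Specialist] = field(default_factory=list)

    def delegate(self, task: str, input_kind: str) -> str:
        for agent in self.specialists:
            if input_kind in agent.handles:
                return agent.run(task)
        raise LookupError(f"No specialist registered for input kind: {input_kind}")


# Stand-in specialist: a real system would call a hosted model here.
assistant = Specialist(
    name="multimodal_assistant",
    handles={"text", "audio"},
    run=lambda task: f"[multimodal_assistant] {task}",
)
orchestrator = Orchestrator(specialists=[assistant])
```

With routing keyed on declared capabilities, the orchestrator needs no model-specific logic. Adding another assistant only requires registering another `Specialist`.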

For media generation, Scriptura integrates:
- MusicGen models (from Meta's AudioCraft library), deployed via Hugging Face Spaces. These let the agent produce original soundtracks and sound effects from text prompts, or from text combined with audio samples.
- FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation. It suits storyboards, concept art, and visual references that tie into the narrative flow.

Optionally, Scriptura can query external sources (e.g., through a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research material. This keeps each draft informed by context as well as creatively rich.

## Agent Capabilities

Input File Parsing
: - Formats accepted: `TXT`, `PDF`, `DOCX`, `JPEG/PNG`, `MP3/WAV`
  - Process: PDF/DOCX → plain text; OCR on images; speech-to-text on audio.
  - Why it matters: Provides structured input for all downstream modules.
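
In practice, the parsing step comes down to choosing a preprocessing route for each file type. The minimal sketch below assumes the route is picked from the file extension alone. The route names are hypothetical labels, not functions from the Scriptura code.

```python
from pathlib import Path

# Hypothetical mapping from file extension to preprocessing route.
PREPROCESSING_ROUTES = {
    ".txt": "read_plain_text",
    ".pdf": "extract_pdf_text",
    ".docx": "extract_docx_text",
    ".jpeg": "run_ocr",
    ".jpg": "run_ocr",
    ".png": "run_ocr",
    ".mp3": "transcribe_audio",
    ".wav": "transcribe_audio",
}


def route_input(filename: str) -> str:
    """Return the preprocessing route for a file, based on its (case-insensitive) extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return PREPROCESSING_ROUTES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported input format: {suffix or '(none)'}") from None
```

The function lowercases the extension, so `Draft.PDF` and `draft.pdf` take the same route. Unsupported formats fail early rather than reaching a downstream model.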

Overall Plot Summary
: - Model: `DeepSeek-R1`
  - Output: a 4–6 sentence summary of the main narrative threads, including timeframe and tone.
  - Mechanics: API calls to DeepSeek, with retry logic to make the output more reliable and coherent.
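
The README mentions retry logic around the DeepSeek calls but does not describe it. One common pattern is exponential backoff, and the sketch below assumes that pattern. `call_with_retry` is a hypothetical helper, not part of any DeepSeek SDK. The `sleep` parameter can be swapped out, so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**(attempt - 1) seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ... the base delay
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In use, `fn` would be a zero-argument wrapper such as `lambda: summarize(script_text)`, where `summarize` is a hypothetical function that issues the API request. Catching every `Exception` keeps the sketch short. A production version would retry only transient errors such as timeouts or rate limits.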

Entity & Theme Extraction
: - Technique: Named Entity Recognition (via DeepSeek)
  - Extracts: characters, locations, key events, recurring themes, and narrative tone.
  - Output: JSON/CSV plus a ~5-sentence abstract.
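
The extracted entities are exported as JSON or CSV. The sketch below assumes each entity is a flat record with `type`, `name`, and `description` fields. This schema is hypothetical, because the README does not specify one. Under that assumption, the standard library can produce both formats:

```python
import csv
import io
import json

# Hypothetical flat schema for one extracted entity.
FIELDS = ["type", "name", "description"]


def entities_to_json(entities: list[dict]) -> str:
    """Serialize entities as pretty-printed JSON, keeping non-ASCII characters."""
    return json.dumps(entities, ensure_ascii=False, indent=2)


def entities_to_csv(entities: list[dict]) -> str:
    """Serialize entities as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entities)
    return buffer.getvalue()


sample = [
    {"type": "character", "name": "ALEX", "description": "Protagonist"},
    {"type": "location", "name": "DINER", "description": "Interior, night"},
]
```

`csv.DictWriter` quotes any field that contains a comma, so a description such as `Interior, night` stays in a single column.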

Rights & Licensing Verification
: - Web search ON: queries the DuckDuckGo API and fetches license information if a match is found.
  - Web search OFF: the model may recognize very famous works (e.g., “Harry Potter”) from its own knowledge, but this is not guaranteed.
  - No match and web search OFF: no licensing check is performed.
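
The licensing cases above form a small decision table, and the sketch below encodes them. It makes one labeled assumption: the README does not say what happens when web search is ON but finds no match, so this sketch reports that outcome separately.

```python
from enum import Enum


class LicenseCheck(Enum):
    WEB_MATCH = "license information fetched via web search"
    WEB_NO_MATCH = "web search found no license information"  # assumed outcome, not in the README
    MODEL_KNOWLEDGE = "work recognized from model knowledge (not guaranteed)"
    SKIPPED = "no licensing check performed"


def check_licensing(search_enabled: bool, web_match: bool, recognized: bool) -> LicenseCheck:
    """Map the README's licensing cases to an outcome.

    web_match: whether the DuckDuckGo lookup returned license information.
    recognized: whether the model recognized the work on its own.
    """
    if search_enabled:
        return LicenseCheck.WEB_MATCH if web_match else LicenseCheck.WEB_NO_MATCH
    if recognized:
        return LicenseCheck.MODEL_KNOWLEDGE
    return LicenseCheck.SKIPPED
```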

Image Generation (Storyboard & Concept Art)
: - Model: `FLUX (black-forest-labs/FLUX.1-dev)`
  - Trigger: “Generate Image” / storyboard phase.
  - Process: DeepSeek crafts a cinematic prompt → FLUX returns a PNG/JPEG image plus a caption.
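
In Scriptura, an LLM writes the cinematic prompt. The sketch below is a deterministic stand-in for that step: it assembles a prompt from explicit shot parameters. The `Shot` fields and phrasing are hypothetical. They only illustrate what a storyboard prompt might contain.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical description of one storyboard frame."""
    scene: str
    subject: str
    shot_type: str = "wide shot"
    lighting: str = "natural light"
    mood: str = "neutral"


def build_storyboard_prompt(shot: Shot) -> str:
    """Join the shot parameters into a single comma-separated text-to-image prompt."""
    parts = [
        f"Cinematic storyboard frame, {shot.shot_type}",
        shot.subject,
        f"setting: {shot.scene}",
        shot.lighting,
        f"{shot.mood} mood",
    ]
    return ", ".join(parts)
```

The resulting string could be sent to an image endpoint, for example with `huggingface_hub`'s `InferenceClient.text_to_image(prompt, model="black-forest-labs/FLUX.1-dev")`. The README does not say which client Scriptura uses.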

Audio Generation (Music & Sound Effects)
: - Model: `MusicGen (facebook/musicgen-melody)`
  - Trigger: “Generate Audio”.
  - Process: send a prompt → receive an MP3/WAV file (standalone audio, no text or images).
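
The audio endpoint may return either MP3 or WAV, so a caller may need to tell them apart before saving the file. The minimal sketch below inspects the leading bytes. This is a common heuristic, not something the README describes. WAV files are RIFF containers tagged `WAVE`. MP3 files usually start with an ID3v2 tag or an MPEG Layer III frame header.

```python
def detect_audio_format(data: bytes) -> str:
    """Guess 'wav' or 'mp3' from the leading bytes; return 'unknown' otherwise."""
    # WAV: a RIFF container whose form type (bytes 8-11) is "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 metadata tag at the start of the file.
    if data[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set, and layer bits (bits 2-1 of byte 1) == 01,
    # which means Layer III. This rejects e.g. AAC ADTS headers, whose layer bits are 00.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and ((data[1] >> 1) & 0b11) == 0b01:
        return "mp3"
    return "unknown"
```

An `"unknown"` result signals that the response should be inspected, rather than saved under a guessed extension.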

In-Depth Analysis of Key Points
: - Extracts:
    - Characters (role, gender, description)
    - Locations (interior/exterior, period, geography)
    - Plot points (crucial narrative beats, identified via story-understanding models)
  - Extras: semantic toponym extraction → internal scene maps; detection of transitions (“Suddenly”, “Meanwhile”).
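
Two of the signals above can be approximated with plain pattern matching. The sketch below assumes that standard screenplay scene headings (`INT.`, `EXT.`, `INT./EXT.`, `I/E.`) supply the interior/exterior label. It also searches for the transition words quoted above. The README does not say whether Scriptura uses rules like these or leaves the task entirely to the model.

```python
import re
from typing import Optional

# Longer prefixes come first so "INT./EXT." is not matched as plain "INT.".
SCENE_HEADING = re.compile(r"^(INT\./EXT\.|I/E\.|INT\.|EXT\.)\s+(.+?)(?:\s+-\s+(.+))?$")
SETTINGS = {
    "INT.": "interior",
    "EXT.": "exterior",
    "INT./EXT.": "interior/exterior",
    "I/E.": "interior/exterior",
}
# Transition markers named in the README; extend the tuple as needed.
TRANSITION_MARKERS = ("Suddenly", "Meanwhile")
TRANSITION = re.compile(r"\b(" + "|".join(map(re.escape, TRANSITION_MARKERS)) + r")\b")


def parse_scene_heading(line: str) -> Optional[dict]:
    """Split a heading like 'EXT. DESERT ROAD - DAY' into setting, location, and time of day."""
    match = SCENE_HEADING.match(line.strip())
    if match is None:
        return None
    prefix, location, time_of_day = match.groups()
    return {"setting": SETTINGS[prefix], "location": location, "time": time_of_day}


def find_transitions(text: str) -> list[str]:
    """Return the transition markers in the order they appear."""
    return TRANSITION.findall(text)
```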

Optional Web Search
: - A checkbox toggles DuckDuckGo API lookups.
  - If enabled: searches preconfigured sites (free and paid) for scripts and sound effects.
  - Output: a list of links with short summaries.
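
The `site:` operator, which DuckDuckGo supports, can restrict a search to preconfigured sites. The sketch below builds one query per site and merges the hits into the "links + short summaries" output. The site list in the test is illustrative. Each hit is assumed to be a dict with `href` and `body` keys; that is an assumption about how results arrive, not something the README states.

```python
from typing import Iterable


def build_site_queries(topic: str, sites: Iterable[str]) -> list[str]:
    """One query per site, using the `site:` operator to restrict results."""
    return [f"{topic} site:{site}" for site in sites]


def merge_results(result_lists: Iterable[list[dict]], max_summary: int = 160) -> list[tuple[str, str]]:
    """Merge per-site hits into (link, short summary) pairs, dropping duplicate links.

    Assumes each hit is a dict with 'href' and (optionally) 'body' keys.
    """
    seen: set[str] = set()
    merged: list[tuple[str, str]] = []
    for results in result_lists:
        for hit in results:
            link = hit["href"]
            if link in seen:
                continue
            seen.add(link)
            summary = hit.get("body", "")
            if len(summary) > max_summary:
                summary = summary[: max_summary - 3].rstrip() + "..."
            merged.append((link, summary))
    return merged
```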


---

## Agent Flow

```mermaid
flowchart LR
A[Start Agent] --> B["Load Input (text, image, audio)"]
B --> C["Preprocessing: PDF/DOCX → text, OCR, audio transcription"]
C --> D["Generate Plot Summary (DeepSeek)"]
D --> E["Extract Entities & Themes (DeepSeek)"]
E --> F{Web Search Enabled?}
F -->|Yes| G[Web Search via DuckDuckGo API]
F -->|No| H[Continue Offline Analysis]
G --> I["Rights & Licensing Check"]
H --> I
I --> J["Deep Analysis: characters, locations, plot points"]
J --> K{Image Generation Requested?}
K -->|Yes| L["API Call to FLUX for storyboard/concept art"]
K -->|No| M[Skip Image Generation]
L --> N{Audio Generation Requested?}
M --> N
N -->|Yes| O[API Call to MusicGen for audio tracks]
N -->|No| P[Skip Audio Generation]
O --> Q["Final Output: text, JSON/CSV, images, audio"]
P --> Q
```
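
The flowchart's branches can be read as a plan that depends on three user toggles. The sketch below turns those toggles into an ordered list of step names. It assumes each optional branch rejoins the main path; the README lists the steps but not the code behind them. `RunOptions` and the step names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunOptions:
    """Hypothetical user toggles corresponding to the flowchart's decision nodes."""
    web_search: bool = False
    generate_image: bool = False
    generate_audio: bool = False


def plan_steps(options: RunOptions) -> list[str]:
    """Return the ordered step names for one run of the agent flow."""
    steps = ["load_input", "preprocess", "plot_summary", "extract_entities"]
    # Node F: web search or offline analysis; both continue to the licensing check.
    steps.append("web_search" if options.web_search else "offline_analysis")
    steps += ["licensing_check", "deep_analysis"]
    # Nodes K and N: optional media generation; a skipped branch adds no step.
    if options.generate_image:
        steps.append("flux_image")
    if options.generate_audio:
        steps.append("musicgen_audio")
    steps.append("final_output")
    return steps
```

Separating the plan from its execution means the app can show which steps a run will perform before any model is called.
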

---
## Deployment, Access, and Code Overview

---
## Use Cases

Independent Writer
: - Upload a screenplay and quickly get a summary plus a list of characters and locations.
  - Create visual storyboards of key narrative moments with FLUX (PNG/JPEG outputs).
  - Generate short soundtracks or sound effects to accompany script presentations (MP3/WAV).

Film Production Company
: - Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
  - Use the web search feature to find reference scripts or specific sound effects from free or paid sources.
  - Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

Translation and Adaptation Agency
: - Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
  - Generate contextual images for cultural adaptation (e.g., images that match the original setting) with FLUX.
  - Produce reference audio with MusicGen to test culturally appropriate music for the target audience.

Digital Humanities Course
: - Demonstrate how to build a text-mining tool for the performing arts that combines NLP, image, and audio pipelines.
  - Let students analyze real scripts and generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
  - Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.

---
## Credits


---
## Acknowledgements


---
### Contributors
- Code development and implementation: luke9705
- Idea generation, testing, and video production: OrianIce
- Research and testing: Loren1214
- Code revisions: DDPM
