File size: 6,617 Bytes
f16cdd3
3ee00d0
f16cdd3
 
 
 
 
 
 
 
3ee00d0
f16cdd3
 
3ee00d0
945ea8f
3ee00d0
945ea8f
3ee00d0
945ea8f
3ee00d0
 
 
 
 
bcc38c8
3ee00d0
 
 
 
945ea8f
3ee00d0
945ea8f
3ee00d0
bcc38c8
3ee00d0
bcc38c8
3ee00d0
 
 
 
bcc38c8
3ee00d0
 
 
 
bcc38c8
3ee00d0
 
 
 
bcc38c8
3ee00d0
 
 
 
0601340
3ee00d0
 
 
 
0601340
3ee00d0
 
 
 
0601340
3ee00d0
 
 
 
 
 
0601340
3ee00d0
 
 
 
0601340
 
3ee00d0
0601340
3ee00d0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0601340
3ee00d0
 
bcc38c8
3ee00d0
 
 
 
bcc38c8
3ee00d0
 
 
 
945ea8f
3ee00d0
 
 
 
945ea8f
3ee00d0
 
 
 
945ea8f
3ee00d0
 
945ea8f
 
3ee00d0
 
945ea8f
0601340
3ee00d0
e06b8fa
 
 
0601340
3ee00d0
0601340
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
---
title: Scriptura
emoji: 🏆
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: mit
tag: agent-demo-track
---

# Scriptura

## Introduction

**Scriptura** is a multi-agent AI system designed to assist authors in creating screenplays, storyboards, and soundtracks. Its main goal is to automate and accelerate the stages of analysis, summarization, and enrichment of narrative text, allowing screenwriters to focus on the creative aspects.

The core stack includes:
- **DeepSeek (deepseek-ai/DeepSeek-R1)** as the base model for all text operations (analysis, summarization, generation) via APIs managed by Nebius AI.
- **FLUX (black-forest-labs/FLUX.1-dev)** for image generation (storyboards, concept art) integrated into the narrative flow.
- **MusicGen (facebook/musicgen-melody)** to create short audio tracks or sound effects, useful for prototyping or presenting.
- Optional web search (integrated with DuckDuckGo API) to fetch external resources (original scripts, sound effects, reference materials).

**Scriptura** supports inputs in various formats:
- **Text**: TXT, PDF, DOCX (automatically converted to structured plain text)  
- **Images**: JPEG, PNG (for analyzing existing storyboards or screenshots)  
- **Audio**: MP3, WAV (for transcribing dialogue or analyzing uploaded soundtracks)  

There are size and duration checks on uploaded files to prevent excessively large inputs.

---

## Agent Capabilities

**Input File Parsing**  
:  - **Formats accepted**: `TXT`, `PDF`, `DOCX`, `JPEG/PNG`, `MP3/WAV`  
   - **Process**: PDF/DOCX → plain text; OCR on images; speech-to-text on audio.  
   - **Why it matters**: Provides structured input for all downstream modules.

**Overall Plot Summary**  
:  - **Model**: `DeepSeek-R1`  
   - **Output**: 4–6 sentence summary of main narrative threads (timeframe, tone).  
   - **Mechanics**: API calls to DeepSeek with retry logic for improved coherence.

**Entity & Theme Extraction**  
:  - **Technique**: Named Entity Recognition (via **DeepSeek**)  
   - **Extracts**: Characters, locations, key events, recurring themes, narrative tone.  
   - **Output**: JSON/CSV + ~5-sentence abstract.

**Rights & Licensing Verification**  
:  - **Web Search ON**: Queries DuckDuckGo API → fetch license info if match.  
   - **Web Search OFF**: May recognize very famous works internally (e.g. “Harry Potter”) but not guaranteed.  
   - **If no match & search OFF**: No licensing check.

**Image Generation (Storyboard & Concept Art)**  
:  - **Model**: `FLUX (black-forest-labs/FLUX.1-dev)`  
   - **Trigger**: “Generate Image” / storyboard phase.  
   - **Process**: DeepSeek crafts cinematic prompt → FLUX returns PNG/JPEG + caption.

**Audio Generation (Music & Sound Effects)**  
:  - **Model**: `MusicGen (facebook/musicgen-melody)`  
   - **Trigger**: “Generate Audio.”  
   - **Process**: Send prompt → receive MP3/WAV (standalone audio, no text/images).

**In-Depth Analysis of Key Points**  
:  - **Extracts**:  
     - Characters (role, gender, description)  
     - Locations (interior/exterior, period, geography)  
     - Plot Points (crucial narrative beats via Story Understanding models)  
   - **Extras**: Semantic toponym extraction → internal scene maps; detect transitions (“Suddenly,” “Meanwhile”).

**Optional Web Search**  
:  - **Checkbox** toggles DuckDuckGo API lookups.  
   - **If Enabled**: search preconfigured sites (free & paid) for scripts, sound effects.  
   - **Output**: List of links + short summaries.


---

## Agent Flow

```mermaid
flowchart LR
    A[Start Agent] --> B[Load Input (text, image, audio)]
    B --> C[Preprocessing: PDF/DOCX → text, OCR, audio transcription]
    C --> D[Generate Plot Summary (DeepSeek)]
    D --> E[Extract Entities & Themes (DeepSeek)]
    E --> F {Web Search Enabled?}
    F -->|Yes| G[Web Search via DuckDuckGo API]
    F -->|No| H[Continue Offline Analysis]
    H --> I[Rights & Licensing Check]
    I --> J[Deep Analysis: characters, locations, plot points]
    J --> K {Image Generation Requested?}
    K -->|Yes| L[API Call to FLUX for storyboard/concept art]
    K -->|No| M[Skip Image Generation]
    M --> N {Audio Generation Requested?}
    N -->|Yes| O[API Call to MusicGen for audio tracks]
    N -->|No| P[Skip Audio Generation]
    L & O --> Q[Final Output: text, JSON/CSV, images, audio]
```
---
## Deployment & Access and the Code Overview

---
## Use Cases

**Independent Writer**  
:  - Upload a screenplay and quickly get a summary, a list of characters, and locations.  
   - Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).  
   - Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).

**Film Production Company**  
:  - Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.  
   - Use the web search feature to find reference scripts or specific sound effects from free/paid sources.  
   - Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

**Translation and Adaptation Agency**  
:  - Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).  
   - Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).  
   - Produce reference audio via MusicGen to test culturally appropriate music for the target audience.

**Digital Humanities Course**  
:  - Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines.  
   - Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.  
   - Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.

---
## Credits


---
## Acknowledgements


---
### Contributors: 
- Code Implementation made by luke9705 and DDPM;
- Ideas creation and testing conducted by OrianIce and Loren1214.

---
### Sources

- Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (3rd ed.). Pearson.
- Cambria, E., & White, B. (2014). *Jumping NLP Curves: A Review of Natural Language Processing Research*. IEEE Computational Intelligence Magazine, 9(2), 48–57.
- Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … & Sutskever, I. (2022). *Hierarchical Text-Conditional Image Generation with CLIP Latents*. arXiv preprint arXiv:2204.06125.
-