---
title: MultiAgent System For Screenplay Creation
emoji: 🏆
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: mit
---

# TODO NAME OF THE AGENT

## Agent capabilities

TODO: BETTER INTRO
The aim of our agent is to support authors in their creative process for scenarios and storyboards.

### Agent Flow

![image/png](https://cdn-uploads.huggingface.co/production/uploads/683ed65c9471bc9e3db5e4be/UkHpAimZUl8wIOP1qDuGB.png)

**A**

Starting the agent

**B**
  
The agent receives the script as an input file, 
either in plain text or in a structured format (e.g. PDF, DOCX), 
and converts it into plain text for processing.
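
The conversion step could look roughly like the sketch below. It is an illustration, not the project's actual code: the helper name `load_script_text` is hypothetical, and the PDF and DOCX branches assume the third-party packages `pypdf` and `python-docx` are installed.

```python
from pathlib import Path


def load_script_text(path: str) -> str:
    """Return the script stored at `path` as plain text (hypothetical helper)."""
    suffix = Path(path).suffix.lower()

    # Plain text needs no conversion.
    if suffix in {".txt", ".md", ""}:
        return Path(path).read_text(encoding="utf-8")

    if suffix == ".pdf":
        # Assumption: pypdf is installed; imported lazily so that plain-text
        # input works without it.
        from pypdf import PdfReader

        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if suffix == ".docx":
        # Assumption: python-docx is installed (import name is `docx`).
        import docx

        document = docx.Document(path)
        return "\n".join(paragraph.text for paragraph in document.paragraphs)

    raise ValueError(f"Unsupported script format: {suffix!r}")
```

Dispatching on the file extension keeps the rest of the flow independent of the input format: every later step only sees a single string.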

**C**

The agent extracts a summary of the overall content of the scenario, 
identifying the main narrative lines and the time frame.
  
This helps build a big-picture view of the draft for the next steps.

**D**

The agent will identify the main entities (characters, locations, events) and key themes in the script.

It will also generate a small abstract (~5 sentences) 
with enough details to understand the overall plot and tone.

**E**
  
The agent checks whether the input text matches a known or published script.
  
If it does, 
it checks the license and the availability of rights to determine whether the script can be worked on.
  
In case of any limitations, the agent will warn the user about restrictions.
  
**F**

The agent will perform an analysis of the main points of the script:

- Characters: extract and catalog the names of the characters, 
classifying them by role (protagonist, antagonist, secondary characters), 
gender and age/physical description.

- Locations: Detect the places where the scenes take place 
(interiors, exteriors, historical periods, geographical location) and catalogue them.

- Plot points: isolate the key plot points.

- Vibes (Look and Feel): Understand the style (dramatic, comic, thriller, horror) 
and the overall sensation (suspense, irony, melancholy).
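
As one illustration of the location analysis, the sketch below catalogues locations from scene headings. It assumes the script uses conventional sluglines such as `INT. KITCHEN - NIGHT`, which the description above does not guarantee; the names `Location` and `extract_locations` are hypothetical. Scripts without sluglines would need a model-based approach instead.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches headings like "INT. KITCHEN - NIGHT" or "EXT. STREET".
SLUGLINE = re.compile(r"^(INT\./EXT\.|INT\.|EXT\.)\s+(.+?)(?:\s+-\s+(.+))?$")


@dataclass(frozen=True)
class Location:
    setting: str                # "INT.", "EXT." or "INT./EXT."
    place: str                  # e.g. "KITCHEN"
    time_of_day: Optional[str]  # e.g. "NIGHT", or None if not stated


def extract_locations(script_text: str) -> List[Location]:
    """Return the distinct locations in order of first appearance."""
    catalogue: List[Location] = []
    for line in script_text.splitlines():
        match = SLUGLINE.match(line.strip())
        if match is None:
            continue
        location = Location(match.group(1), match.group(2).strip(), match.group(3))
        if location not in catalogue:
            catalogue.append(location)
    return catalogue
```

For example, `extract_locations("INT. KITCHEN - NIGHT\nAnna cooks.\nEXT. STREET")` yields two entries, the second with `time_of_day` set to `None`.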


**G**

Define the agent goal.

Having achieved a comprehensive summary, the agent will ask for the final goal:

- Remake / Rewrite
- Change of medium (movie, tv series, ...)
- Other purposes (Workshop, Interactive presentation, Didactic analysis, ...)
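
The three goal categories can be represented as a small enumeration. This is a sketch only: `AgentGoal`, `parse_goal` and the keyword lists are hypothetical, and a real agent would more likely let the language model interpret the user's answer.

```python
from enum import Enum


class AgentGoal(Enum):
    REMAKE = "remake / rewrite"
    CHANGE_OF_MEDIUM = "change of medium"
    OTHER = "other purposes"


# Assumed keywords; anything that matches none of them falls back to OTHER.
_KEYWORDS = {
    AgentGoal.REMAKE: ("remake", "rewrite"),
    AgentGoal.CHANGE_OF_MEDIUM: ("medium", "movie", "tv series"),
}


def parse_goal(answer: str) -> AgentGoal:
    """Map a free-text answer to one of the goal categories."""
    text = answer.lower()
    for goal, keywords in _KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return goal
    return AgentGoal.OTHER
```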


**H**

Structural proposal.

In line with the chosen goal, 
the agent will split the narrative structure into acts and scenes, 
each pointing back to the corresponding passage of the reference text.
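
One possible shape for this structure is sketched below. Scenes store the line range they cover, so each one points back to the reference text. The names `Scene`, `Act`, `split_scenes` and `group_into_acts` are hypothetical, scenes are assumed to start at `INT.`/`EXT.` headings, and the act boundaries are supplied by the caller (for instance, proposed by the model) rather than computed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    heading: str
    start_line: int  # 1-based, inclusive, in the reference text
    end_line: int    # 1-based, inclusive


@dataclass
class Act:
    number: int
    scenes: List[Scene] = field(default_factory=list)


def split_scenes(script_text: str) -> List[Scene]:
    """Split the text into scenes, assuming each starts at an INT./EXT. heading."""
    lines = script_text.splitlines()
    starts = [
        number
        for number, line in enumerate(lines, start=1)
        if line.strip().startswith(("INT.", "EXT."))
    ]
    scenes = []
    for index, start in enumerate(starts):
        # A scene runs until the line before the next heading, or to the end.
        end = starts[index + 1] - 1 if index + 1 < len(starts) else len(lines)
        scenes.append(Scene(lines[start - 1].strip(), start, end))
    return scenes


def group_into_acts(scenes: List[Scene], act_starts: List[int]) -> List[Act]:
    """Group scenes into acts; `act_starts` holds the 0-based index of the
    first scene of each act, in ascending order."""
    acts = []
    for number, begin in enumerate(act_starts):
        stop = act_starts[number + 1] if number + 1 < len(act_starts) else len(scenes)
        acts.append(Act(number + 1, scenes[begin:stop]))
    return acts
```

Text that precedes the first heading (a title page, for example) is not assigned to any scene in this sketch.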

**I**

Media generation.

This phase consists of a series of steps focused on creating additional content 
to support the textual part of the script:

- Concept art
- Storyboard for narrative key points
- Images for plot points


**TODO: add sound and bias analysis?**


**J**

Final deliverable



### Main Techniques

- Transformer-based NLP architectures (BERT, GPT-4) to produce a coherent text synthesis
- Named Entity Recognition (NER) and context analysis, to identify human characters and their roles
- Semantic analysis of textual descriptions, toponym extraction, creation of an internal scene map
- Detection of text patterns (turning expressions such as “Suddenly”, “In the meantime”) 
and classification using a Story Understanding model
- Tone analysis and Sentiment analysis for understanding vibes
- Image generation models (Stable Diffusion, DALL·E 3), with prompts generated by the model


### Code overview


### Use cases

### Contributors
- Code implementation by luke9705 and DDPM.
- Idea creation and testing by OrianIce and Loren1214.

### Sources

- Russell, S., & Norvig, P. (2021). *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson.
- Cambria, E., & White, B. (2014). *Jumping NLP Curves: A Review of Natural Language Processing Research*. IEEE Computational Intelligence Magazine, 9(2), 48–57.
- Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … & Sutskever, I. (2022). *Hierarchical Text-Conditional Image Generation with CLIP Latents*. arXiv preprint arXiv:2204.06125.