Aduc-sdr committed on
Commit e3d97f4 · verified · 1 Parent(s): 844d8b0

Delete engineers

engineers/LICENSE DELETED
@@ -1,23 +0,0 @@
- # AducSdr: Uma implementação aberta e funcional da arquitetura ADUC-SDR para geração de vídeo coerente.
- # Copyright (C) 4 de Agosto de 2025 Carlos Rodrigues dos Santos
- #
- # Contato:
- # Carlos Rodrigues dos Santos
- # carlex22@gmail.com
- # Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
- #
- # Repositórios e Projetos Relacionados:
- # GitHub: https://github.com/carlex22/Aduc-sdr
- #
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Affero General Public License as published by
- # the Free Software Foundation, either version 3 of the License, or
- # (at your option) any later version.
- #
- # This program is distributed in the hope that it will be useful,
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- # GNU Affero General Public License for more details.
- #
- # You should have received a copy of the GNU Affero General Public License
- # along with this program. If not, see <https://www.gnu.org/licenses/>.
engineers/NOTICE.md DELETED
@@ -1,76 +0,0 @@
- # NOTICE
-
- Copyright (C) 2025 Carlos Rodrigues dos Santos. All rights reserved.
-
- ---
-
- ## Aviso de Propriedade Intelectual e Licenciamento
-
- ### **Processo de Patenteamento em Andamento (EM PORTUGUÊS):**
-
- O método e o sistema de orquestração de prompts denominados **ADUC (Automated Discovery and Orchestration of Complex tasks)**, conforme descritos neste documento e implementados neste software, estão atualmente em processo de patenteamento.
-
- O titular dos direitos, Carlos Rodrigues dos Santos, está buscando proteção legal para as inovações chave da arquitetura ADUC, incluindo, mas não se limitando a:
-
- * Fragmentação e escalonamento de solicitações que excedem limites de contexto de modelos de IA.
- * Distribuição inteligente de sub-tarefas para especialistas heterogêneos.
- * Gerenciamento de estado persistido com avaliação iterativa e realimentação para o planejamento de próximas etapas.
- * Planejamento e roteamento sensível a custo, latência e requisitos de qualidade.
- * O uso de "tokens universais" para comunicação agnóstica a modelos.
-
- ### **Reconhecimento e Implicações (EM PORTUGUÊS):**
-
- Ao acessar ou utilizar este software e a arquitetura ADUC aqui implementada, você reconhece:
-
- 1. A natureza inovadora e a importância da arquitetura ADUC no campo da orquestração de prompts para IA.
- 2. Que a essência desta arquitetura, ou suas implementações derivadas, podem estar sujeitas a direitos de propriedade intelectual, incluindo patentes.
- 3. Que o uso comercial, a reprodução da lógica central da ADUC em sistemas independentes, ou a exploração direta da invenção sem o devido licenciamento podem infringir os direitos de patente pendente.
-
- ---
-
- ### **Patent Pending (IN ENGLISH):**
-
- The method and system for prompt orchestration named **ADUC (Automated Discovery and Orchestration of Complex tasks)**, as described herein and implemented in this software, are currently in the process of being patented.
-
- The rights holder, Carlos Rodrigues dos Santos, is seeking legal protection for the key innovations of the ADUC architecture, including, but not limited to:
-
- * Fragmentation and scaling of requests exceeding AI model context limits.
- * Intelligent distribution of sub-tasks to heterogeneous specialists.
- * Persistent state management with iterative evaluation and feedback for planning subsequent steps.
- * Cost, latency, and quality-aware planning and routing.
- * The use of "universal tokens" for model-agnostic communication.
-
- ### **Acknowledgement and Implications (IN ENGLISH):**
-
- By accessing or using this software and the ADUC architecture implemented herein, you acknowledge:
-
- 1. The innovative nature and significance of the ADUC architecture in the field of AI prompt orchestration.
- 2. That the essence of this architecture, or its derivative implementations, may be subject to intellectual property rights, including patents.
- 3. That commercial use, reproduction of ADUC's core logic in independent systems, or direct exploitation of the invention without proper licensing may infringe upon pending patent rights.
-
- ---
-
- ## Licença AGPLv3
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU Affero General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Affero General Public License for more details.
-
- You should have received a copy of the GNU Affero General Public License
- along with this program. If not, see <https://www.gnu.org/licenses/>.
-
- ---
-
- **Contato para Consultas:**
-
- Para mais informações sobre a arquitetura ADUC, o status do patenteamento, ou para discutir licenciamento para usos comerciais ou não conformes com a AGPLv3, por favor, entre em contato:
-
- Carlos Rodrigues dos Santos
- carlex22@gmail.com
- Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
engineers/README.md DELETED
@@ -1,211 +0,0 @@
- ---
- title: Euia-AducSdr
- emoji: 🎥
- colorFrom: indigo
- colorTo: purple
- sdk: gradio
- app_file: app.py
- pinned: true
- license: agpl-3.0
- short_description: Uma implementação aberta e funcional da arquitetura ADUC-SDR
- ---
-
- ### 🇧🇷 Português
-
- Uma implementação aberta e funcional da arquitetura ADUC-SDR (Arquitetura de Unificação Compositiva - Escala Dinâmica e Resiliente), projetada para a geração de vídeo coerente de longa duração. Este projeto materializa os princípios de fragmentação, navegação geométrica e um mecanismo de "eco causal 4bits memoria" para garantir a continuidade física e narrativa em sequências de vídeo geradas por múltiplos modelos de IA.
-
- **Licença:** Este projeto é licenciado sob os termos da **GNU Affero General Public License v3.0**. Isto significa que se você usar este software (ou qualquer trabalho derivado) para fornecer um serviço através de uma rede, você é **obrigado a disponibilizar o código-fonte completo** da sua versão para os usuários desse serviço.
-
- - **Copyright (C) 4 de Agosto de 2025, Carlos Rodrigues dos Santos**
- - Uma cópia completa da licença pode ser encontrada no arquivo [LICENSE](LICENSE).
-
- ---
-
- ### 🇬🇧 English
-
- An open and functional implementation of the ADUC-SDR (Architecture for Compositive Unification - Dynamic and Resilient Scaling) architecture, designed for long-form coherent video generation. This project materializes the principles of fragmentation, geometric navigation, and a "causal echo 4-bit memory" mechanism to ensure physical and narrative continuity in video sequences generated by multiple AI models.
-
- **License:** This project is licensed under the terms of the **GNU Affero General Public License v3.0**. This means that if you use this software (or any derivative work) to provide a service over a network, you are **required to make the complete source code** of your version available to the users of that service.
-
- - **Copyright (C) August 4, 2025, Carlos Rodrigues dos Santos**
- - A full copy of the license can be found in the [LICENSE](LICENSE) file.
-
- ---
-
- ## **Aviso de Propriedade Intelectual e Patenteamento**
-
- ### **Processo de Patenteamento em Andamento (EM PORTUGUÊS):**
-
- A arquitetura e o método **ADUC (Automated Discovery and Orchestration of Complex tasks)**, conforme descritos neste projeto e nas reivindicações associadas, estão **atualmente em processo de patenteamento**.
-
- O titular dos direitos, Carlos Rodrigues dos Santos, está buscando proteção legal para as inovações chave da arquitetura ADUC, que incluem, mas não se limitam a:
-
- * Fragmentação e escalonamento de solicitações que excedem limites de contexto de modelos de IA.
- * Distribuição inteligente de sub-tarefas para especialistas heterogêneos.
- * Gerenciamento de estado persistido com avaliação iterativa e realimentação para o planejamento de próximas etapas.
- * Planejamento e roteamento sensível a custo, latência e requisitos de qualidade.
- * O uso de "tokens universais" para comunicação agnóstica a modelos.
-
- Ao utilizar este software e a arquitetura ADUC aqui implementada, você reconhece a natureza inovadora desta arquitetura e que a **reprodução ou exploração da lógica central da ADUC em sistemas independentes pode infringir direitos de patente pendente.**
-
- ---
-
- ### **Patent Pending (IN ENGLISH):**
-
- The **ADUC (Automated Discovery and Orchestration of Complex tasks)** architecture and method, as described in this project and its associated claims, are **currently in the process of being patented.**
-
- The rights holder, Carlos Rodrigues dos Santos, is seeking legal protection for the key innovations of the ADUC architecture, including, but not limited to:
-
- * Fragmentation and scaling of requests exceeding AI model context limits.
- * Intelligent distribution of sub-tasks to heterogeneous specialists.
- * Persistent state management with iterative evaluation and feedback for planning subsequent steps.
- * Cost, latency, and quality-aware planning and routing.
- * The use of "universal tokens" for model-agnostic communication.
-
- By using this software and the ADUC architecture implemented herein, you acknowledge the innovative nature of this architecture and that **the reproduction or exploitation of ADUC's core logic in independent systems may infringe upon pending patent rights.**
-
- ---
-
- ### Detalhes Técnicos e Reivindicações da ADUC
-
- #### 🇧🇷 Definição Curta (para Tese e Patente)
-
- **ADUC** é um *framework pré-input* e *intermediário* de **gerenciamento de prompts** que:
-
- 1. **fragmenta** solicitações acima do limite de contexto de qualquer modelo,
- 2. **escala linearmente** (processo sequencial com memória persistida),
- 3. **distribui** sub-tarefas a **especialistas** (modelos/ferramentas heterogêneos), e
- 4. **realimenta** a próxima etapa com avaliação do que foi feito/esperado (LLM diretor).
-
- Não é um modelo; é uma **camada orquestradora** plugável antes do input de modelos existentes (texto, imagem, áudio, vídeo), usando *tokens universais* e a tecnologia atual.
-
- #### 🇬🇧 Short Definition (for Thesis and Patent)
-
- **ADUC** is a *pre-input* and *intermediate* **prompt management framework** that:
-
- 1. **fragments** requests exceeding any model's context limit,
- 2. **scales linearly** (sequential process with persisted memory),
- 3. **distributes** sub-tasks to **specialists** (heterogeneous models/tools), and
- 4. **feeds back** to the next step with an evaluation of what was done/expected (director LLM).
-
- It is not a model; it is a pluggable **orchestration layer** before the input of existing models (text, image, audio, video), using *universal tokens* and current technology.
-
- ---
-
- #### 🇧🇷 Elementos Essenciais (Telegráfico)
-
- * **Agnóstico a modelos:** opera com qualquer LLM/difusor/API.
- * **Pré-input manager:** recebe pedido do usuário, **divide** em blocos ≤ limite de tokens, **prioriza**, **agenda** e **roteia**.
- * **Memória persistida:** resultados/latentes/“eco” viram **estado compartilhado** para o próximo bloco (nada é ignorado).
- * **Especialistas:** *routers* decidem quem faz o quê (ex.: “descrição → LLM-A”, “keyframe → Img-B”, “vídeo → Vid-C”).
- * **Controle de qualidade:** LLM diretor compara *o que fez* × *o que deveria* × *o que falta* e **regenera objetivos** do próximo fragmento.
- * **Custo/latência-aware:** planeja pela **VRAM/tempo/custo**, não tenta “abraçar tudo de uma vez”.
-
- #### 🇬🇧 Essential Elements (Telegraphic)
-
- * **Model-agnostic:** operates with any LLM/diffuser/API.
- * **Pre-input manager:** receives user request, **divides** into blocks ≤ token limit, **prioritizes**, **schedules**, and **routes**.
- * **Persisted memory:** results/latents/“echo” become **shared state** for the next block (nothing is ignored).
- * **Specialists:** *routers* decide who does what (e.g., “description → LLM-A”, “keyframe → Img-B”, “video → Vid-C”).
- * **Quality control:** director LLM compares *what was done* × *what should be done* × *what is missing* and **regenerates objectives** for the next fragment.
- * **Cost/latency-aware:** plans by **VRAM/time/cost**, does not try to “embrace everything at once”.
-
- ---
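[Editorial note] The pre-input behavior the deleted README describes above (divide into token-bounded blocks, discard nothing, keep order) can be sketched in a few lines. This is an illustrative sketch, not code from this repository; whitespace-separated words stand in for real tokenizer tokens:

```python
def fragment_request(text: str, token_limit: int) -> list[str]:
    """Split a request into ordered blocks of at most token_limit 'tokens'.

    Words stand in for tokens here; a real orchestrator would count with
    the target model's own tokenizer instead.
    """
    words = text.split()
    return [
        " ".join(words[start:start + token_limit])
        for start in range(0, len(words), token_limit)
    ]

# No content is discarded: joining the blocks restores the full request.
blocks = fragment_request("one two three four five six seven", token_limit=3)
# → ["one two three", "four five six", "seven"]
```

The point of the sketch is the invariant, not the splitting strategy: every input token appears in exactly one block, in the original order.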
-
- #### 🇧🇷 Reivindicações Independentes (Método e Sistema)
-
- **Reivindicação Independente (Método) — Versão Enxuta:**
-
- 1. **Método** de **orquestração de prompts** para execução de tarefas acima do limite de contexto de modelos de IA, compreendendo:
- (a) **receber** uma solicitação que excede um limite de tokens;
- (b) **analisar** a solicitação por um **LLM diretor** e **fragmentá-la** em sub-tarefas ≤ limite;
- (c) **selecionar** especialistas de execução para cada sub-tarefa com base em capacidades declaradas;
- (d) **gerar** prompts específicos por sub-tarefa em **tokens universais**, incluindo referências ao **estado persistido** de execuções anteriores;
- (e) **executar sequencialmente** as sub-tarefas e **persistir** suas saídas como memória (incluindo latentes/eco/artefatos);
- (f) **avaliar** automaticamente a saída versus metas declaradas e **regenerar objetivos** do próximo fragmento;
- (g) **iterar** (b)–(f) até que os critérios de completude sejam atendidos, produzindo o resultado agregado;
- em que o framework **escala linearmente** no tempo e armazenamento físico, **independente** da janela de contexto dos modelos subjacentes.
-
- **Reivindicação Independente (Sistema):**
-
- 2. **Sistema** de orquestração de prompts, compreendendo: um **planejador LLM diretor**; um **roteador de especialistas**; um **banco de estado persistido** (incl. memória cinética para vídeo); um **gerador de prompts universais**; e um **módulo de avaliação/realimentação**, acoplados por uma **API pré-input** a modelos heterogêneos.
-
- #### 🇬🇧 Independent Claims (Method and System)
-
- **Independent Claim (Method) — Concise Version:**
-
- 1. A **method** for **prompt orchestration** for executing tasks exceeding AI model context limits, comprising:
- (a) **receiving** a request that exceeds a token limit;
- (b) **analyzing** the request by a **director LLM** and **fragmenting it** into sub-tasks ≤ the limit;
- (c) **selecting** execution specialists for each sub-task based on declared capabilities;
- (d) **generating** specific prompts per sub-task in **universal tokens**, including references to the **persisted state** of previous executions;
- (e) **sequentially executing** the sub-tasks and **persisting** their outputs as memory (including latents/echo/artifacts);
- (f) **automatically evaluating** the output against declared goals and **regenerating objectives** for the next fragment;
- (g) **iterating** (b)–(f) until completion criteria are met, producing the aggregated result;
- wherein the framework **scales linearly** in time and physical storage, **independent** of the context window of the underlying models.
-
- **Independent Claim (System):**
-
- 2. A prompt orchestration **system**, comprising: a **director LLM planner**; a **specialist router**; a **persisted state bank** (incl. kinetic memory for video); a **universal prompt generator**; and an **evaluation/feedback module**, coupled via a **pre-input API** to heterogeneous models.
-
- ---
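[Editorial note] The claimed loop, steps (a)–(g), can be illustrated with a minimal, self-contained Python sketch. The names (`StateBank`, `orchestrate`) and the toy specialist are invented for illustration and do not appear in the deleted files; evaluation and re-planning (steps f–g) are elided:

```python
from dataclasses import dataclass, field

@dataclass
class StateBank:
    """Persisted shared state (the 'echo') carried between fragments."""
    artifacts: list = field(default_factory=list)

def orchestrate(subtasks: list, specialists: dict, state: StateBank) -> list:
    """Route each fragment to a specialist, persist its output,
    and feed the accumulated state forward to the next fragment."""
    results = []
    for task in subtasks:
        handler = specialists[task["kind"]]      # (c) select a specialist
        output = handler(task["prompt"], state)  # (e) execute with state
        state.artifacts.append(output)           # (e) persist as memory
        results.append(output)
    return results

# Toy specialist that can observe how much state preceded it.
specialists = {"text": lambda p, s: f"{p}|seen={len(s.artifacts)}"}
out = orchestrate(
    [{"kind": "text", "prompt": "a"}, {"kind": "text", "prompt": "b"}],
    specialists,
    StateBank(),
)
# → ["a|seen=0", "b|seen=1"]
```

The second fragment sees the first fragment's artifact, which is the mechanism the claims call persisted-state feedback.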
-
- #### 🇧🇷 Dependentes Úteis
-
- * (3) Onde o roteamento considera **custo/latência/VRAM** e metas de qualidade.
- * (4) Onde o banco de estado inclui **eco cinético** para vídeo (últimos *n* frames/latentes/fluxo).
- * (5) Onde a avaliação usa métricas específicas por domínio (Lflow, consistência semântica, etc.).
- * (6) Onde *tokens universais* padronizam instruções entre especialistas.
- * (7) Onde a orquestração decide **cut vs continuous** e **corte regenerativo** (Déjà-Vu) ao editar vídeo.
- * (8) Onde o sistema **nunca descarta** conteúdo excedente: **reagenda** em novos fragmentos.
-
- #### 🇬🇧 Useful Dependents
-
- * (3) Wherein routing considers **cost/latency/VRAM** and quality goals.
- * (4) Wherein the state bank includes **kinetic echo** for video (last *n* frames/latents/flow).
- * (5) Wherein evaluation uses domain-specific metrics (Lflow, semantic consistency, etc.).
- * (6) Wherein *universal tokens* standardize instructions between specialists.
- * (7) Wherein orchestration decides **cut vs continuous** and **regenerative cut** (Déjà-Vu) when editing video.
- * (8) Wherein the system **never discards** excess content: it **reschedules** it in new fragments.
-
- ---
-
- #### 🇧🇷 Como isso conversa com SDR (Vídeo)
-
- * **Eco Cinético**: é um **tipo de estado persistido** consumido pelo próximo passo.
- * **Déjà-Vu (Corte Regenerativo)**: é **uma política de orquestração** aplicada quando há edição; ADUC decide, monta os prompts certos e chama o especialista de vídeo.
- * **Cut vs Continuous**: decisão do **diretor** com base em estado + metas; ADUC roteia e garante a sobreposição/remoção final.
-
- #### 🇬🇧 How this Converses with SDR (Video)
-
- * **Kinetic Echo**: is a **type of persisted state** consumed by the next step.
- * **Déjà-Vu (Regenerative Cut)**: is an **orchestration policy** applied during editing; ADUC decides, crafts the right prompts, and calls the video specialist.
- * **Cut vs Continuous**: decision made by the **director** based on state + goals; ADUC routes and ensures the final overlap/removal.
-
- ---
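[Editorial note] The "kinetic echo" form of persisted state described above (keep only the last *n* frames for the next video step) reduces to a bounded buffer. The class below and its names are an illustrative sketch, not repository code:

```python
from collections import deque

class KineticEcho:
    """Bounded persisted state: only the last n frames survive, and the
    next generation step consumes them to maintain continuity."""

    def __init__(self, n: int):
        # deque with maxlen silently drops the oldest entry when full.
        self._frames = deque(maxlen=n)

    def push(self, frame) -> None:
        self._frames.append(frame)

    def snapshot(self) -> list:
        """The state handed to the next video-generation step."""
        return list(self._frames)

echo = KineticEcho(n=2)
for frame in ["f1", "f2", "f3"]:
    echo.push(frame)
# Only the two most recent frames remain: ["f2", "f3"]
```

A real implementation would store latents or optical-flow tensors rather than strings, but the eviction policy is the same.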
-
- #### 🇧🇷 Mensagem Clara ao Usuário (Experiência)
-
- > “Seu pedido excede o limite X do modelo Y. Em vez de truncar silenciosamente, o **ADUC** dividirá e **entregará 100%** do conteúdo por etapas coordenadas.”
-
- Isso é diferencial prático e jurídico: **não-obviedade** por transformar limite de contexto em **pipeline controlado**, com **persistência de estado** e **avaliação iterativa**.
-
- #### 🇬🇧 Clear User Message (Experience)
-
- > "Your request exceeds model Y's limit X. Instead of silently truncating, **ADUC** will divide and **deliver 100%** of the content through coordinated steps."
-
- This is a practical and legal differentiator: **non-obviousness** by transforming context limits into a **controlled pipeline**, with **state persistence** and **iterative evaluation**.
-
- ---
-
- ### Contact / Contato / Contacto
-
- - **Author / Autor:** Carlos Rodrigues dos Santos
- - **Email:** carlex22@gmail.com
- - **GitHub:** [https://github.com/carlex22/Aduc-sdr](https://github.com/carlex22/Aduc-sdr)
- - **Hugging Face Spaces:**
-   - [Ltx-SuperTime-60Secondos](https://huggingface.co/spaces/Carlexx/Ltx-SuperTime-60Secondos/)
-   - [Novinho](https://huggingface.co/spaces/Carlexxx/Novinho/)
-
- ---
engineers/__init__.py DELETED
File without changes
engineers/deformes2D_thinker.py DELETED
@@ -1,171 +0,0 @@
- # engineers/deformes2D_thinker.py
- # AducSdr: Uma implementação aberta e funcional da arquitetura ADUC-SDR
- # Copyright (C) 4 de Agosto de 2025 Carlos Rodrigues dos Santos
- #
- # Contato:
- # Carlos Rodrigues dos Santos
- # carlex22@gmail.com
- # Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
- #
- # Repositórios e Projetos Relacionados:
- # GitHub: https://github.com/carlex22/Aduc-sdr
- #
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Affero General Public License as published by
- # the Free Software Foundation, either version 3 of the License, or
- # (at your option) any later version.
- #
- # This program is distributed in the hope that it will be useful,
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- # GNU Affero General Public License for more details.
- #
- # You should have received a copy of the GNU Affero General Public License
- # along with this program. If not, see <https://www.gnu.org/licenses/>.
- #
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Affero General Public License...
- # PENDING PATENT NOTICE: Please see NOTICE.md.
- #
- # Version 1.0.1
-
- import logging
- from pathlib import Path
- from PIL import Image
- import gradio as gr
- from typing import List
-
- # It imports the communication layer, not the API directly
- from managers.gemini_manager import gemini_manager_singleton
-
- logger = logging.getLogger(__name__)
-
- class Deformes2DThinker:
-     """
-     The cognitive specialist that handles prompt engineering and creative logic.
-     """
-     def _read_prompt_template(self, filename: str) -> str:
-         """Reads a prompt template file from the 'prompts' directory."""
-         try:
-             prompts_dir = Path(__file__).resolve().parent.parent / "prompts"
-             with open(prompts_dir / filename, "r", encoding="utf-8") as f:
-                 return f.read()
-         except FileNotFoundError:
-             raise gr.Error(f"Prompt template file not found: prompts/{filename}")
-
-     def generate_storyboard(self, prompt: str, num_keyframes: int, ref_image_paths: List[str]) -> List[str]:
-         """Acts as a Scriptwriter to generate a storyboard."""
-         try:
-             template = self._read_prompt_template("unified_storyboard_prompt.txt")
-             storyboard_prompt = template.format(user_prompt=prompt, num_fragments=num_keyframes)
-             images = [Image.open(p) for p in ref_image_paths]
-
-             # Assemble all parts into a single list for the manager
-             prompt_parts = [storyboard_prompt] + images
-             storyboard_data = gemini_manager_singleton.get_json_object(prompt_parts)
-
-             storyboard = storyboard_data.get("scene_storyboard", [])
-             if not storyboard or len(storyboard) != num_keyframes:
-                 raise ValueError(f"Incorrect number of scenes generated. Expected {num_keyframes}, got {len(storyboard)}.")
-             return storyboard
-         except Exception as e:
-             raise gr.Error(f"The Scriptwriter (Deformes2D Thinker) failed: {e}")
-
-     def select_keyframes_from_pool(self, storyboard: list, base_image_paths: list[str], pool_image_paths: list[str]) -> list[str]:
-         """Acts as a Photographer/Editor to select keyframes."""
-         if not pool_image_paths:
-             raise gr.Error("The 'image pool' (Additional Images) is empty.")
-
-         try:
-             template = self._read_prompt_template("keyframe_selection_prompt.txt")
-
-             image_map = {f"IMG-{i+1}": path for i, path in enumerate(pool_image_paths)}
-
-             prompt_parts = ["# Reference Images (Story Base)"]
-             prompt_parts.extend([Image.open(p) for p in base_image_paths])
-             prompt_parts.append("\n# Image Pool (Scene Bank)")
-             prompt_parts.extend([Image.open(p) for p in pool_image_paths])
-
-             storyboard_str = "\n".join([f"- Scene {i+1}: {s}" for i, s in enumerate(storyboard)])
-             selection_prompt = template.format(storyboard_str=storyboard_str, image_identifiers=list(image_map.keys()))
-             prompt_parts.append(selection_prompt)
-
-             selection_data = gemini_manager_singleton.get_json_object(prompt_parts)
-
-             selected_identifiers = selection_data.get("selected_image_identifiers", [])
-
-             if len(selected_identifiers) != len(storyboard):
-                 raise ValueError("The AI did not select the correct number of images for the scenes.")
-
-             selected_paths = [image_map[identifier] for identifier in selected_identifiers]
-             return selected_paths
-
-         except Exception as e:
-             raise gr.Error(f"The Photographer (Deformes2D Thinker) failed to select images: {e}")
-
-     def get_anticipatory_keyframe_prompt(self, global_prompt: str, scene_history: str, current_scene_desc: str, future_scene_desc: str, last_image_path: str, fixed_ref_paths: list[str]) -> str:
-         """Acts as an Art Director to generate an image prompt."""
-         try:
-             template = self._read_prompt_template("anticipatory_keyframe_prompt.txt")
-
-             director_prompt = template.format(
-                 historico_prompt=scene_history,
-                 cena_atual=current_scene_desc,
-                 cena_futura=future_scene_desc
-             )
-
-             prompt_parts = [
-                 f"# CONTEXT:\n- Global Story Goal: {global_prompt}\n# VISUAL ASSETS:",
-                 "Current Base Image [IMG-BASE]:",
-                 Image.open(last_image_path)
-             ]
-
-             ref_counter = 1
-             for path in fixed_ref_paths:
-                 if path != last_image_path:
-                     prompt_parts.extend([f"General Reference Image [IMG-REF-{ref_counter}]:", Image.open(path)])
-                     ref_counter += 1
-
-             prompt_parts.append(director_prompt)
-
-             final_flux_prompt = gemini_manager_singleton.get_raw_text(prompt_parts)
-
-             return final_flux_prompt.strip().replace("`", "").replace("\"", "")
-         except Exception as e:
-             raise gr.Error(f"The Art Director (Deformes2D Thinker) failed: {e}")
-
-     def get_cinematic_decision(self, global_prompt: str, story_history: str,
-                                past_keyframe_path: str, present_keyframe_path: str, future_keyframe_path: str,
-                                past_scene_desc: str, present_scene_desc: str, future_scene_desc: str) -> dict:
-         """Acts as a Film Director to make editing decisions and generate motion prompts."""
-         try:
-             template = self._read_prompt_template("cinematic_director_prompt.txt")
-             prompt_text = template.format(
-                 global_prompt=global_prompt,
-                 story_history=story_history,
-                 past_scene_desc=past_scene_desc,
-                 present_scene_desc=present_scene_desc,
-                 future_scene_desc=future_scene_desc
-             )
-
-             prompt_parts = [
-                 prompt_text,
-                 "[PAST_IMAGE]:", Image.open(past_keyframe_path),
-                 "[PRESENT_IMAGE]:", Image.open(present_keyframe_path),
-                 "[FUTURE_IMAGE]:", Image.open(future_keyframe_path)
-             ]
-
-             decision_data = gemini_manager_singleton.get_json_object(prompt_parts)
-
-             if "transition_type" not in decision_data or "motion_prompt" not in decision_data:
-                 raise ValueError("AI response (Cinematographer) is malformed. Missing 'transition_type' or 'motion_prompt'.")
-             return decision_data
-         except Exception as e:
-             logger.error(f"The Film Director (Deformes2D Thinker) failed: {e}. Using fallback to 'continuous'.", exc_info=True)
-             return {
-                 "transition_type": "continuous",
-                 "motion_prompt": f"A smooth, continuous cinematic transition from '{present_scene_desc}' to '{future_scene_desc}'."
-             }
-
- # --- Singleton Instance ---
- deformes2d_thinker_singleton = Deformes2DThinker()
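[Editorial note] The prompt-assembly pattern in `generate_storyboard` (fill a text template, then prepend it to the reference images) can be reproduced standalone. The template string below is invented for illustration; the real one lived in `prompts/unified_storyboard_prompt.txt` and the manager call is omitted:

```python
# Stand-in for the contents of prompts/unified_storyboard_prompt.txt.
TEMPLATE = "Write {num_fragments} scene descriptions for: {user_prompt}"

def build_prompt_parts(user_prompt: str, num_keyframes: int, images: list) -> list:
    """Mirror the ordering used by generate_storyboard:
    the formatted instruction text first, then every reference image."""
    storyboard_prompt = TEMPLATE.format(
        user_prompt=user_prompt, num_fragments=num_keyframes
    )
    return [storyboard_prompt] + images

parts = build_prompt_parts("a day at the beach", 3, ["<img-1>", "<img-2>"])
# parts[0] is the formatted instruction; the images follow in order.
```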
engineers/deformes3D.py DELETED
@@ -1,194 +0,0 @@
1
- # engineers/deformes3D.py
2
- #
3
- # AducSdr: Uma implementação aberta e funcional da arquitetura ADUC-SDR
4
- # Copyright (C) 4 de Agosto de 2025 Carlos Rodrigues dos Santos
5
- #
6
- # Contato:
7
- # Carlos Rodrigues dos Santos
8
- # carlex22@gmail.com
9
- # Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
10
- #
11
- # Repositórios e Projetos Relacionados:
12
- # GitHub: https://github.com/carlex22/Aduc-sdr
13
- #
14
- # This program is free software: you can redistribute it and/or modify
15
- # it under the terms of the GNU Affero General Public License as published by
16
- # the Free Software Foundation, either version 3 of the License, or
17
- # (at your option) any later version.
18
- #
19
- # This program is distributed in the hope that it will be useful,
20
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
21
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
22
- # GNU Affero General Public License for more details.
23
- #
24
- # You should have received a copy of the GNU Affero General Public License
25
- # along with this program. If not, see <https://www.gnu.org/licenses/>.
26
- #
27
- # This program is free software: you can redistribute it and/or modify
28
- # it under the terms of the GNU Affero General Public License...
29
- # PENDING PATENT NOTICE: Please see NOTICE.md.
30
- #
31
- # Version 2.0.1
32
-
33
- from PIL import Image, ImageOps
34
- import os
35
- import time
36
- import logging
37
- import gradio as gr
38
- import yaml
39
- import torch
40
- import numpy as np
41
-
42
- from managers.flux_kontext_manager import flux_kontext_singleton
43
- from engineers.deformes2D_thinker import deformes2d_thinker_singleton
44
- from aduc_types import LatentConditioningItem
45
- from managers.ltx_manager import ltx_manager_singleton
46
- from managers.vae_manager import vae_manager_singleton
47
- from managers.latent_enhancer_manager import latent_enhancer_specialist_singleton
48
-
49
- logger = logging.getLogger(__name__)
50
-
51
- class Deformes3DEngine:
52
- """
53
- ADUC Specialist for static image (keyframe) generation.
54
- """
55
- def __init__(self, workspace_dir):
56
- self.workspace_dir = workspace_dir
57
- self.image_generation_helper = flux_kontext_singleton
58
- logger.info("3D Engine (Image Specialist) ready to receive orders from the Maestro.")
59
-
60
- def _generate_single_keyframe(self, prompt: str, reference_images: list[Image.Image], output_filename: str, width: int, height: int, callback: callable = None) -> str:
61
- """
62
- Low-level function that generates a single image using the LTX helper.
63
- """
64
- logger.info(f"Generating keyframe '{output_filename}' with prompt: '{prompt}'")
65
- generated_image = self.image_generation_helper.generate_image(
66
- reference_images=reference_images, prompt=prompt, width=width,
67
- height=height, seed=int(time.time()), callback=callback
68
- )
69
- final_path = os.path.join(self.workspace_dir, output_filename)
70
- generated_image.save(final_path)
71
- logger.info(f"Keyframe successfully saved to: {final_path}")
72
- return final_path
73
-
-     def generate_keyframes_from_storyboard(self, storyboard: list, initial_ref_path: str, global_prompt: str, keyframe_resolution: int, general_ref_paths: list, progress_callback_factory: callable = None):
-         """
-         Orchestrates the generation of all keyframes.
-         """
-         current_base_image_path = initial_ref_path
-         previous_prompt = "N/A (initial reference image)"
-         final_keyframes_gallery = []  # [current_base_image_path]
-         width, height = keyframe_resolution, keyframe_resolution
-         target_resolution_tuple = (width, height)
-
-         num_keyframes_to_generate = len(storyboard) - 1
-         logger.info(f"IMAGE SPECIALIST: Received order to generate {num_keyframes_to_generate} keyframes (LTX versions).")
-
-         for i in range(num_keyframes_to_generate):
-             scene_index = i + 1
-             current_scene = storyboard[i]
-             future_scene = storyboard[i+1]
-             progress_callback_flux = progress_callback_factory(scene_index, num_keyframes_to_generate) if progress_callback_factory else None
-
-             logger.info(f"--> Generating Keyframe {scene_index}/{num_keyframes_to_generate}...")
-
-             # --- STEP A: Generate the anticipatory keyframe prompt (the FLUX generation path below is currently disabled) ---
-             logger.info(f"  - Step A: Generating anticipatory keyframe prompt...")
-
-             img_prompt = deformes2d_thinker_singleton.get_anticipatory_keyframe_prompt(
-                 global_prompt=global_prompt, scene_history=previous_prompt,
-                 current_scene_desc=current_scene, future_scene_desc=future_scene,
-                 last_image_path=current_base_image_path, fixed_ref_paths=general_ref_paths
-             )
-
-             # flux_ref_paths = list(set([current_base_image_path] + general_ref_paths))
-             # flux_ref_images = [Image.open(p) for p in flux_ref_paths]
-
-             # flux_keyframe_path = self._generate_single_keyframe(
-             #     prompt=img_prompt, reference_images=flux_ref_images,
-             #     output_filename=f"keyframe_{scene_index}_flux.png", width=width, height=height,
-             #     callback=progress_callback_flux
-             # )
-             # final_keyframes_gallery.append(flux_keyframe_path)
-
-             # --- STEP B: LTX Enrichment Experiment ---
-             # logger.info(f"  - Step B: Generating enrichment with LTX...")
-
-             context_paths = [current_base_image_path] + [p for p in general_ref_paths if p != current_base_image_path][:3]
-             ltx_context_paths = list(reversed(context_paths))
-             logger.info(f"  - LTX Context Order (Reversed): {[os.path.basename(p) for p in ltx_context_paths]}")
-
-             ltx_conditioning_items = []
-
-             # Each reference receives a progressively lower conditioning weight (0.6, 0.5, 0.4, ...).
-             weight = 0.6
-             for idx, path in enumerate(ltx_context_paths):
-                 img_pil = Image.open(path).convert("RGB")
-                 img_processed = self._preprocess_image_for_latent_conversion(img_pil, target_resolution_tuple)
-                 pixel_tensor = self._pil_to_pixel_tensor(img_processed)
-                 latent_tensor = vae_manager_singleton.encode(pixel_tensor)
-
-                 ltx_conditioning_items.append(LatentConditioningItem(latent_tensor, 0, weight))
-                 weight -= 0.1
-
-             ltx_base_params = {"guidance_scale": 1.0, "stg_scale": 0.001, "num_inference_steps": 25}
-             generated_latents, _ = ltx_manager_singleton.generate_latent_fragment(
-                 callback_on_step_end=progress_callback_flux,
-                 height=height, width=width,
-                 conditioning_items_data=ltx_conditioning_items,
-                 motion_prompt=img_prompt,
-                 video_total_frames=48,
-                 video_fps=24,
-                 **ltx_base_params
-             )
-
-             final_latent = generated_latents[:, :, -1:, :, :]
-             upscaled_latent = latent_enhancer_specialist_singleton.upscale(final_latent)
-             enriched_pixel_tensor = vae_manager_singleton.decode(upscaled_latent)
-
-             ltx_keyframe_path = os.path.join(self.workspace_dir, f"keyframe_{scene_index}_ltx.png")
-             self.save_image_from_tensor(enriched_pixel_tensor, ltx_keyframe_path)
-             final_keyframes_gallery.append(ltx_keyframe_path)
-
-             # Use the LTX-enriched keyframe as the base for the next iteration to maintain the primary narrative path
-             current_base_image_path = ltx_keyframe_path
-             previous_prompt = img_prompt
-
-         logger.info(f"IMAGE SPECIALIST: Generation of all keyframe versions (LTX) complete.")
-         return final_keyframes_gallery
-
-     # --- HELPER FUNCTIONS ---
-
-     def _preprocess_image_for_latent_conversion(self, image: Image.Image, target_resolution: tuple) -> Image.Image:
-         """Resizes and fits an image to the target resolution for VAE encoding."""
-         if image.size != target_resolution:
-             return ImageOps.fit(image, target_resolution, Image.Resampling.LANCZOS)
-         return image
-
-     def _pil_to_pixel_tensor(self, pil_image: Image.Image) -> torch.Tensor:
-         """Helper to convert PIL to the 5D pixel tensor the VAE expects."""
-         image_np = np.array(pil_image).astype(np.float32) / 255.0
-         tensor = torch.from_numpy(image_np).permute(2, 0, 1).unsqueeze(0).unsqueeze(2)
-         return (tensor * 2.0) - 1.0
-
-     def save_image_from_tensor(self, pixel_tensor: torch.Tensor, path: str):
-         """Helper to save a 1-frame pixel tensor as an image."""
-         tensor_chw = pixel_tensor.squeeze(0).squeeze(1)
-         tensor_hwc = tensor_chw.permute(1, 2, 0)
-         tensor_hwc = (tensor_hwc.clamp(-1, 1) + 1) / 2.0
-         image_np = (tensor_hwc.cpu().float().numpy() * 255).astype(np.uint8)
-         Image.fromarray(image_np).save(path)
-
- # --- Singleton Instantiation ---
- try:
-     with open("config.yaml", 'r') as f:
-         config = yaml.safe_load(f)
-     WORKSPACE_DIR = config['application']['workspace_dir']
-     deformes3d_engine_singleton = Deformes3DEngine(workspace_dir=WORKSPACE_DIR)
- except Exception as e:
-     logger.error(f"Could not initialize Deformes3DEngine: {e}", exc_info=True)
-     deformes3d_engine_singleton = None
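The loop in `generate_keyframes_from_storyboard` gives the freshest reference image the strongest conditioning weight and decays each older reference by 0.1. A standalone sketch of that schedule (the function name is ours, not part of the repository):

```python
def conditioning_weights(num_refs: int, start: float = 0.6, step: float = 0.1) -> list[float]:
    """Return one conditioning weight per reference image, strongest first."""
    return [round(start - i * step, 2) for i in range(num_refs)]

# With the base image plus up to three general references (four items total):
print(conditioning_weights(4))  # [0.6, 0.5, 0.4, 0.3]
```

Because the reference list is reversed before conditioning, the last-generated keyframe always receives the 0.6 weight.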
engineers/deformes3D_thinker.py DELETED
@@ -1,136 +0,0 @@
- # engineers/deformes3D_thinker.py
- #
- # Copyright (C) 2025 Carlos Rodrigues dos Santos
- #
- # Version: 4.0.0 (Definitive)
- #
- # This is the definitive, robust implementation. It directly contains the prompt
- # enhancement logic copied from the LTX pipeline's utils. It accesses the
- # enhancement models loaded by the LTX Manager and performs the captioning
- # and LLM generation steps locally, ensuring full control and compatibility.
-
- import logging
- from PIL import Image
- import torch
-
- # Import the LTX singleton to access its pipeline and the models inside it
- from managers.ltx_manager import ltx_manager_singleton
-
- # Import the LTX system prompt to guarantee consistency
- from ltx_video.utils.prompt_enhance_utils import I2V_CINEMATIC_PROMPT
-
- logger = logging.getLogger(__name__)
-
- class Deformes3DThinker:
-     """
-     The tactical specialist that now directly implements the prompt enhancement
-     logic, using the models provided by the LTX pipeline.
-     """
-
-     def __init__(self):
-         # Access the exposed pipeline to obtain the required models
-         pipeline = ltx_manager_singleton.prompt_enhancement_pipeline
-         if not pipeline:
-             raise RuntimeError("Deformes3DThinker could not access the LTX pipeline.")
-
-         # Store the models and processors as direct attributes
-         self.caption_model = pipeline.prompt_enhancer_image_caption_model
-         self.caption_processor = pipeline.prompt_enhancer_image_caption_processor
-         self.llm_model = pipeline.prompt_enhancer_llm_model
-         self.llm_tokenizer = pipeline.prompt_enhancer_llm_tokenizer
-
-         # Check whether the models were actually loaded
-         if not all([self.caption_model, self.caption_processor, self.llm_model, self.llm_tokenizer]):
-             logger.warning("Deformes3DThinker initialized, but one or more enhancement models were not loaded by the LTX pipeline. Fallback will be used.")
-         else:
-             logger.info("Deformes3DThinker initialized and successfully linked to LTX enhancement models.")
-
-     @torch.no_grad()
-     def get_enhanced_motion_prompt(self, global_prompt: str, story_history: str,
-                                    past_keyframe_path: str, present_keyframe_path: str, future_keyframe_path: str,
-                                    past_scene_desc: str, present_scene_desc: str, future_scene_desc: str) -> str:
-         """
-         Generates a refined motion prompt by directly executing the enhancement pipeline logic.
-         """
-         # Check that the models are available before trying to use them
-         if not all([self.caption_model, self.caption_processor, self.llm_model, self.llm_tokenizer]):
-             logger.warning("Enhancement models not available. Using fallback prompt.")
-             return f"A cinematic transition from '{present_scene_desc}' to '{future_scene_desc}'."
-
-         try:
-             present_image = Image.open(present_keyframe_path).convert("RGB")
-
-             # --- BEGIN LOGIC COPIED AND ADAPTED FROM LTX ---
-
-             # 1. Generate the caption for the reference (present) image
-             image_captions = self._generate_image_captions([present_image])
-
-             # 2. Build the prompt for the LLM
-             # The future scene description is used as the "user prompt"
-             messages = [
-                 {"role": "system", "content": I2V_CINEMATIC_PROMPT},
-                 {"role": "user", "content": f"user_prompt: {future_scene_desc}\nimage_caption: {image_captions[0]}"},
-             ]
-
-             # 3. Generate and decode the final prompt with the LLM
-             enhanced_prompt = self._generate_and_decode_prompts(messages)
-
-             # --- END OF COPIED AND ADAPTED LOGIC ---
-
-             logger.info(f"Deformes3DThinker received enhanced prompt: '{enhanced_prompt}'")
-             return enhanced_prompt
-
-         except Exception as e:
-             logger.error(f"The Film Director (Deformes3D Thinker) failed during enhancement: {e}. Using fallback.", exc_info=True)
-             return f"A smooth, continuous cinematic transition from '{present_scene_desc}' to '{future_scene_desc}'."
-
-     def _generate_image_captions(self, images: list[Image.Image]) -> list[str]:
-         """
-         Internal captioning logic, copied from the LTX utils.
-         """
-         # LTX's Florence-2 model does not take a system_prompt here, but a task_prompt
-         task_prompt = "<MORE_DETAILED_CAPTION>"
-         inputs = self.caption_processor(
-             text=[task_prompt] * len(images), images=images, return_tensors="pt"
-         ).to(self.caption_model.device)
-
-         generated_ids = self.caption_model.generate(
-             input_ids=inputs["input_ids"],
-             pixel_values=inputs["pixel_values"],
-             max_new_tokens=1024,
-             num_beams=3,
-         )
-
-         # Use post_process_generation to extract the clean answer
-         generated_text = self.caption_processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
-         processed_result = self.caption_processor.post_process_generation(
-             generated_text,
-             task=task_prompt,
-             image_size=(images[0].width, images[0].height)
-         )
-         return [processed_result[task_prompt]]
-
-     def _generate_and_decode_prompts(self, messages: list[dict]) -> str:
-         """
-         Internal logic to generate a prompt with the LLM, copied from the LTX utils.
-         """
-         text = self.llm_tokenizer.apply_chat_template(
-             messages, tokenize=False, add_generation_prompt=True
-         )
-         model_inputs = self.llm_tokenizer([text], return_tensors="pt").to(self.llm_model.device)
-
-         output_ids = self.llm_model.generate(**model_inputs, max_new_tokens=256)
-
-         input_ids_len = model_inputs.input_ids.shape[1]
-         decoded_prompts = self.llm_tokenizer.batch_decode(
-             output_ids[:, input_ids_len:], skip_special_tokens=True
-         )
-         return decoded_prompts[0].strip()
-
- # --- Singleton Instantiation ---
- try:
-     deformes3d_thinker_singleton = Deformes3DThinker()
- except Exception as e:
-     # The failure will already have been logged inside __init__
-     deformes3d_thinker_singleton = None
-     raise e
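The two-stage enhancement flow above captions the "present" keyframe, then combines that caption with the future scene description into a chat-template message list for the enhancer LLM. A minimal sketch of that message construction (the builder function is ours; the layout mirrors the f-string in `get_enhanced_motion_prompt`):

```python
def build_enhancer_messages(system_prompt: str, future_scene_desc: str, image_caption: str) -> list[dict]:
    """Build the system/user message pair fed to the prompt-enhancer LLM."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"user_prompt: {future_scene_desc}\nimage_caption: {image_caption}"},
    ]

example = build_enhancer_messages(
    "You write cinematic motion prompts.",
    "The rider gallops into the storm.",
    "A lone rider on a dark plain.",
)
print(example[1]["content"])
```

The resulting list is what `apply_chat_template` serializes before generation.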
engineers/deformes4D.py DELETED
@@ -1,338 +0,0 @@
- # engineers/deformes4D.py
- #
- # AducSdr: An open, functional implementation of the ADUC-SDR architecture
- # Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos
- #
- # Contact:
- # Carlos Rodrigues dos Santos
- # carlex22@gmail.com
- # Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
- #
- # Related repositories and projects:
- # GitHub: https://github.com/carlex22/Aduc-sdr
- #
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Affero General Public License as published by
- # the Free Software Foundation, either version 3 of the License, or
- # (at your option) any later version.
- #
- # This program is distributed in the hope that it will be useful,
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- # GNU Affero General Public License for more details.
- #
- # You should have received a copy of the GNU Affero General Public License
- # along with this program. If not, see <https://www.gnu.org/licenses/>.
- #
- # PENDING PATENT NOTICE: Please see NOTICE.md.
- #
- # Version 2.0.1
-
- import os
- import time
- import imageio
- import numpy as np
- import torch
- import logging
- from PIL import Image, ImageOps
- from dataclasses import dataclass
- import gradio as gr
- import subprocess
- import gc
- import shutil
- from pathlib import Path
- from typing import List, Tuple, Generator, Dict, Any
-
- from aduc_types import LatentConditioningItem
- from managers.ltx_manager import ltx_manager_singleton
- from managers.latent_enhancer_manager import latent_enhancer_specialist_singleton
- from managers.vae_manager import vae_manager_singleton
- from engineers.deformes2D_thinker import deformes2d_thinker_singleton
- from managers.seedvr_manager import seedvr_manager_singleton
- from managers.mmaudio_manager import mmaudio_manager_singleton
- from tools.video_encode_tool import video_encode_tool_singleton
-
- logger = logging.getLogger(__name__)
-
- class Deformes4DEngine:
-     """
-     Implements the Camera (Ψ) and Distiller (Δ) of the ADUC-SDR architecture.
-     Orchestrates the generation, latent post-production, and final rendering of video fragments.
-     """
-     def __init__(self, workspace_dir="deformes_workspace"):
-         self.workspace_dir = workspace_dir
-         self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
-         logger.info("Deformes4D Specialist (ADUC-SDR Executor) initialized.")
-         os.makedirs(self.workspace_dir, exist_ok=True)
-
-     # --- HELPER METHODS ---
-
-     def save_video_from_tensor(self, video_tensor: torch.Tensor, path: str, fps: int = 24):
-         """Saves a pixel-space tensor as an MP4 video file."""
-         if video_tensor is None or video_tensor.ndim != 5 or video_tensor.shape[2] == 0: return
-         video_tensor = video_tensor.squeeze(0).permute(1, 2, 3, 0)
-         video_tensor = (video_tensor.clamp(-1, 1) + 1) / 2.0
-         video_np = (video_tensor.detach().cpu().float().numpy() * 255).astype(np.uint8)
-         with imageio.get_writer(path, fps=fps, codec='libx264', quality=8, output_params=['-pix_fmt', 'yuv420p']) as writer:
-             for frame in video_np: writer.append_data(frame)
-
-     def read_video_to_tensor(self, video_path: str) -> torch.Tensor:
-         """Reads a video file and converts it into a pixel-space tensor."""
-         with imageio.get_reader(video_path, 'ffmpeg') as reader:
-             frames = [frame for frame in reader]
-
-         frames_np = np.stack(frames, axis=0).astype(np.float32) / 255.0
-         # (F, H, W, C) -> (C, F, H, W)
-         tensor = torch.from_numpy(frames_np).permute(3, 0, 1, 2)
-         tensor = tensor.unsqueeze(0)  # (B, C, F, H, W)
-         tensor = (tensor * 2.0) - 1.0  # Normalize to [-1, 1]
-         return tensor.to(self.device)
-
-     def _preprocess_image_for_latent_conversion(self, image: Image.Image, target_resolution: tuple) -> Image.Image:
-         """Resizes and fits an image to the target resolution for VAE encoding."""
-         if image.size != target_resolution:
-             return ImageOps.fit(image, target_resolution, Image.Resampling.LANCZOS)
-         return image
-
-     def pil_to_latent(self, pil_image: Image.Image) -> torch.Tensor:
-         """Converts a PIL Image to a latent tensor by calling the VaeManager."""
-         image_np = np.array(pil_image).astype(np.float32) / 255.0
-         tensor = torch.from_numpy(image_np).permute(2, 0, 1).unsqueeze(0).unsqueeze(2)
-         tensor = (tensor * 2.0) - 1.0
-         return vae_manager_singleton.encode(tensor)
-
-     # --- CORE ADUC-SDR LOGIC ---
-
-     def generate_original_movie(self, keyframes: list, global_prompt: str, storyboard: list,
-                                 seconds_per_fragment: float, trim_percent: int,
-                                 handler_strength: float, destination_convergence_strength: float,
-                                 video_resolution: int, use_continuity_director: bool,
-                                 guidance_scale: float, stg_scale: float, num_inference_steps: int,
-                                 progress: gr.Progress = gr.Progress()):
-         FPS = 24
-         FRAMES_PER_LATENT_CHUNK = 8
-         LATENT_PROCESSING_CHUNK_SIZE = 4
-
-         run_timestamp = int(time.time())
-         temp_latent_dir = os.path.join(self.workspace_dir, f"temp_latents_{run_timestamp}")
-         temp_video_clips_dir = os.path.join(self.workspace_dir, f"temp_clips_{run_timestamp}")
-         os.makedirs(temp_latent_dir, exist_ok=True)
-         os.makedirs(temp_video_clips_dir, exist_ok=True)
-
-         total_frames_brutos = self._quantize_to_multiple(int(seconds_per_fragment * FPS), FRAMES_PER_LATENT_CHUNK)
-         frames_a_podar = self._quantize_to_multiple(int(total_frames_brutos * (trim_percent / 100)), FRAMES_PER_LATENT_CHUNK)
-         latents_a_podar = frames_a_podar // FRAMES_PER_LATENT_CHUNK
-
-         # if frames_a_podar % 2 == 0:
-         #     frames_a_podar = frames_a_podar - 1
-
-         total_latent_frames = total_frames_brutos // FRAMES_PER_LATENT_CHUNK
-
-         DEJAVU_FRAME_TARGET = frames_a_podar - 1 if frames_a_podar > 0 else 0
-         DESTINATION_FRAME_TARGET = total_frames_brutos - 1
-
-         base_ltx_params = {"guidance_scale": guidance_scale, "stg_scale": stg_scale, "num_inference_steps": num_inference_steps, "rescaling_scale": 0.15, "image_cond_noise_scale": 0.00}
-         keyframe_paths = [item[0] if isinstance(item, tuple) else item for item in keyframes]
-         story_history = ""
-         target_resolution_tuple = (video_resolution, video_resolution)
-         eco_latent_for_next_loop, dejavu_latent_for_next_loop = None, None
-         latent_fragment_paths = []
-
-         if len(keyframe_paths) < 2: raise gr.Error(f"Generation requires at least 2 keyframes. You provided {len(keyframe_paths)}.")
-         num_transitions_to_generate = len(keyframe_paths) - 1
-
146
- logger.info("--- STARTING STAGE 1: Latent Fragment Generation ---")
147
- for i in range(num_transitions_to_generate):
148
- fragment_index = i + 1
149
- progress(i / num_transitions_to_generate, desc=f"Generating Latent {fragment_index}/{num_transitions_to_generate}")
150
- past_keyframe_path = keyframe_paths[i - 1] if i > 0 else keyframe_paths[i]
151
- start_keyframe_path = keyframe_paths[i]
152
- destination_keyframe_path = keyframe_paths[i + 1]
153
- future_story_prompt = storyboard[i + 1] if (i + 1) < len(storyboard) else "The final scene."
154
- logger.info(f"Calling deformes2D_thinker to generate cinematic decision for fragment {fragment_index}...")
155
- decision = deformes2d_thinker_singleton.get_cinematic_decision(global_prompt, story_history, past_keyframe_path, start_keyframe_path, destination_keyframe_path, storyboard[i - 1] if i > 0 else "The beginning.", storyboard[i], future_story_prompt)
156
- transition_type, motion_prompt = decision["transition_type"], decision["motion_prompt"]
157
- story_history += f"\n- Act {fragment_index}: {motion_prompt}"
158
-
159
- conditioning_items = []
160
- if eco_latent_for_next_loop is None:
161
- img_start = self._preprocess_image_for_latent_conversion(Image.open(start_keyframe_path).convert("RGB"), target_resolution_tuple)
162
- conditioning_items.append(LatentConditioningItem(self.pil_to_latent(img_start), 0, 1.0))
163
- else:
164
- conditioning_items.append(LatentConditioningItem(eco_latent_for_next_loop, 0, 1.0))
165
- conditioning_items.append(LatentConditioningItem(dejavu_latent_for_next_loop, DEJAVU_FRAME_TARGET, handler_strength))
166
-
167
- if transition_type == "cutx":
168
- logger.info(f"Cinematic Director chose a 'cut'. Creating FFmpeg transition bridge...")
169
- bridge_duration_seconds = FRAMES_PER_LATENT_CHUNK / FPS
170
- bridge_video_path = video_encode_tool_singleton.create_transition_bridge(
171
- start_image_path=start_keyframe_path, end_image_path=destination_keyframe_path,
172
- duration=bridge_duration_seconds, fps=FPS, target_resolution=target_resolution_tuple,
173
- workspace_dir=self.workspace_dir
174
- )
175
- bridge_pixel_tensor = self.read_video_to_tensor(bridge_video_path)
176
- bridge_latent_tensor = vae_manager_singleton.encode(bridge_pixel_tensor)
177
- final_fade_latent = bridge_latent_tensor[:, :, -2:, :, :]
178
- conditioning_items.append(LatentConditioningItem(final_fade_latent, total_latent_frames - 16, 0.95))
179
- #img_dest = self._preprocess_image_for_latent_conversion(Image.open(destination_keyframe_path).convert("RGB"), target_resolution_tuple)
180
- #conditioning_items.append(LatentConditioningItem(self.pil_to_latent(img_dest), DESTINATION_FRAME_TARGET, destination_convergence_strength * 0.5))
181
- del bridge_pixel_tensor, bridge_latent_tensor, final_fade_latent
182
- if os.path.exists(bridge_video_path): os.remove(bridge_video_path)
183
- else:
184
- img_dest = self._preprocess_image_for_latent_conversion(Image.open(destination_keyframe_path).convert("RGB"), target_resolution_tuple)
185
- conditioning_items.append(LatentConditioningItem(self.pil_to_latent(img_dest), DESTINATION_FRAME_TARGET, destination_convergence_strength))
186
-
187
- current_ltx_params = {**base_ltx_params, "motion_prompt": motion_prompt}
188
- logger.info(f"Calling LTX to generate video latents for fragment {fragment_index} ({total_frames_brutos} frames)...")
189
- latents_brutos, _ = self._generate_latent_tensor_internal(conditioning_items, current_ltx_params, target_resolution_tuple, total_frames_brutos)
190
- num_latent_frames = latents_brutos.shape[2]
191
- logger.info(f"LTX responded with a latent tensor of shape {latents_brutos.shape}, representing ~{num_latent_frames * 8 + 1} video frames at {FPS} FPS.")
192
-
193
- last_trim = latents_brutos[:, :, -(latents_a_podar+1):, :, :].clone()
194
- eco_latent_for_next_loop = last_trim[:, :, :2, :, :].clone()
195
- dejavu_latent_for_next_loop = last_trim[:, :, -1:, :, :].clone()
196
- latents_video = latents_brutos[:, :, :-(latents_a_podar-1), :, :].clone()
197
- latents_video = latents_video[:, :, 1:, :, :]
198
- del last_trim, latents_brutos; gc.collect(); torch.cuda.empty_cache()
199
-
200
- if transition_type == "cutx":
201
- eco_latent_for_next_loop, dejavu_latent_for_next_loop = None, None
202
-
203
-
204
- cpu_latent = latents_video.cpu()
205
- latent_path = os.path.join(temp_latent_dir, f"latent_fragment_{i:04d}.pt")
206
- torch.save(cpu_latent, latent_path)
207
- latent_fragment_paths.append(latent_path)
208
- del latents_video, cpu_latent; gc.collect()
209
- del eco_latent_for_next_loop, dejavu_latent_for_next_loop; gc.collect(); torch.cuda.empty_cache()
210
-
-         logger.info(f"--- STARTING STAGE 2: Processing {len(latent_fragment_paths)} latents in chunks of {LATENT_PROCESSING_CHUNK_SIZE} ---")
-         final_video_clip_paths = []
-         num_chunks = -(-len(latent_fragment_paths) // LATENT_PROCESSING_CHUNK_SIZE)
-         for i in range(num_chunks):
-             chunk_start_index = i * LATENT_PROCESSING_CHUNK_SIZE
-             chunk_end_index = chunk_start_index + LATENT_PROCESSING_CHUNK_SIZE
-             chunk_paths = latent_fragment_paths[chunk_start_index:chunk_end_index]
-             progress(i / num_chunks, desc=f"Processing & Decoding Batch {i+1}/{num_chunks}")
-             tensors_in_chunk = [torch.load(p, map_location=self.device) for p in chunk_paths]
-             tensors_para_concatenar = [frag[:, :, :-1, :, :] if j < len(tensors_in_chunk) - 1 else frag for j, frag in enumerate(tensors_in_chunk)]
-             sub_group_latent = torch.cat(tensors_para_concatenar, dim=2)
-             del tensors_in_chunk, tensors_para_concatenar; gc.collect(); torch.cuda.empty_cache()
-             logger.info(f"Batch {i+1} concatenated. Latent shape: {sub_group_latent.shape}")
-             base_name = f"clip_{i:04d}_{run_timestamp}"
-             current_clip_path = os.path.join(temp_video_clips_dir, f"{base_name}.mp4")
-             pixel_tensor = vae_manager_singleton.decode(sub_group_latent)
-             self.save_video_from_tensor(pixel_tensor, current_clip_path, fps=FPS)
-             del pixel_tensor, sub_group_latent; gc.collect(); torch.cuda.empty_cache()
-             final_video_clip_paths.append(current_clip_path)
-
-         progress(0.98, desc="Final assembly of clips...")
-         final_video_path = os.path.join(self.workspace_dir, f"original_movie_{run_timestamp}.mp4")
-         video_encode_tool_singleton.concatenate_videos(video_paths=final_video_clip_paths, output_path=final_video_path, workspace_dir=self.workspace_dir)
-         logger.info("Cleaning up temporary clip files...")
-         try:
-             shutil.rmtree(temp_video_clips_dir)
-         except OSError as e:
-             logger.warning(f"Could not remove temporary clip directory: {e}")
-         logger.info(f"Process complete! Original video saved to: {final_video_path}")
-         return {"final_path": final_video_path, "latent_paths": latent_fragment_paths}
-
-     def upscale_latents_and_create_video(self, latent_paths: list, chunk_size: int, progress: gr.Progress):
-         if not latent_paths:
-             raise gr.Error("Cannot perform upscaling: no latent paths were provided.")
-         logger.info("--- STARTING POST-PRODUCTION: Latent Upscaling ---")
-         run_timestamp = int(time.time())
-         temp_upscaled_clips_dir = os.path.join(self.workspace_dir, f"temp_upscaled_clips_{run_timestamp}")
-         os.makedirs(temp_upscaled_clips_dir, exist_ok=True)
-         final_upscaled_clip_paths = []
-         num_chunks = -(-len(latent_paths) // chunk_size)
-         for i in range(num_chunks):
-             chunk_start_index = i * chunk_size
-             chunk_end_index = chunk_start_index + chunk_size
-             chunk_paths = latent_paths[chunk_start_index:chunk_end_index]
-             progress(i / num_chunks, desc=f"Upscaling & Decoding Batch {i+1}/{num_chunks}")
-             tensors_in_chunk = [torch.load(p, map_location=self.device) for p in chunk_paths]
-             tensors_para_concatenar = [frag[:, :, :-1, :, :] if j < len(tensors_in_chunk) - 1 else frag for j, frag in enumerate(tensors_in_chunk)]
-             sub_group_latent = torch.cat(tensors_para_concatenar, dim=2)
-             del tensors_in_chunk, tensors_para_concatenar; gc.collect(); torch.cuda.empty_cache()
-             logger.info(f"Batch {i+1} loaded. Original latent shape: {sub_group_latent.shape}")
-             upscaled_latent_chunk = latent_enhancer_specialist_singleton.upscale(sub_group_latent)
-             del sub_group_latent; gc.collect(); torch.cuda.empty_cache()
-             logger.info(f"Batch {i+1} upscaled. New latent shape: {upscaled_latent_chunk.shape}")
-             pixel_tensor = vae_manager_singleton.decode(upscaled_latent_chunk)
-             del upscaled_latent_chunk; gc.collect(); torch.cuda.empty_cache()
-             base_name = f"upscaled_clip_{i:04d}_{run_timestamp}"
-             current_clip_path = os.path.join(temp_upscaled_clips_dir, f"{base_name}.mp4")
-             self.save_video_from_tensor(pixel_tensor, current_clip_path, fps=24)
-             final_upscaled_clip_paths.append(current_clip_path)
-             del pixel_tensor; gc.collect(); torch.cuda.empty_cache()
-             logger.info(f"Saved upscaled clip: {Path(current_clip_path).name}")
-         progress(0.98, desc="Assembling upscaled clips...")
-         final_video_path = os.path.join(self.workspace_dir, f"upscaled_movie_{run_timestamp}.mp4")
-         video_encode_tool_singleton.concatenate_videos(video_paths=final_upscaled_clip_paths, output_path=final_video_path, workspace_dir=self.workspace_dir)
-         logger.info("Cleaning up temporary upscaled clip files...")
-         try:
-             shutil.rmtree(temp_upscaled_clips_dir)
-         except OSError as e:
-             logger.warning(f"Could not remove temporary upscaled clip directory: {e}")
-         logger.info(f"Latent upscaling complete! Final video at: {final_video_path}")
-         yield {"final_path": final_video_path}
-
-     def master_video_hd(self, source_video_path: str, model_version: str, steps: int, prompt: str, progress: gr.Progress):
-         logger.info(f"--- STARTING POST-PRODUCTION: HD Mastering with SeedVR {model_version} ---")
-         progress(0.1, desc=f"Preparing for HD Mastering with SeedVR {model_version}...")
-         run_timestamp = int(time.time())
-         output_path = os.path.join(self.workspace_dir, f"hd_mastered_movie_{model_version}_{run_timestamp}.mp4")
-         try:
-             final_path = seedvr_manager_singleton.process_video(
-                 input_video_path=source_video_path,
-                 output_video_path=output_path,
-                 prompt=prompt,
-                 model_version=model_version,
-                 steps=steps,
-                 progress=progress
-             )
-             logger.info(f"HD Mastering complete! Final video at: {final_path}")
-             yield {"final_path": final_path}
-         except Exception as e:
-             logger.error(f"HD Mastering failed: {e}", exc_info=True)
-             raise gr.Error(f"HD Mastering failed. Details: {e}")
-
-     def generate_audio_for_final_video(self, source_video_path: str, audio_prompt: str, progress: gr.Progress):
-         logger.info(f"--- STARTING POST-PRODUCTION: Audio Generation ---")
-         progress(0.1, desc="Preparing for audio generation...")
-         run_timestamp = int(time.time())
-         source_name = Path(source_video_path).stem
-         output_path = os.path.join(self.workspace_dir, f"{source_name}_with_audio_{run_timestamp}.mp4")
-         try:
-             result = subprocess.run(
-                 ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", source_video_path],
-                 capture_output=True, text=True, check=True)
-             duration = float(result.stdout.strip())
-             logger.info(f"Source video duration: {duration:.2f} seconds.")
-             progress(0.5, desc="Generating audio track...")
-             final_path = mmaudio_manager_singleton.generate_audio_for_video(
-                 video_path=source_video_path,
-                 prompt=audio_prompt,
-                 duration_seconds=duration,
-                 output_path_override=output_path
-             )
-             logger.info(f"Audio generation complete! Final video with audio at: {final_path}")
-             progress(1.0, desc="Audio generation complete!")
-             yield {"final_path": final_path}
-         except Exception as e:
-             logger.error(f"Audio generation failed: {e}", exc_info=True)
-             raise gr.Error(f"Audio generation failed. Details: {e}")
-
-     def _generate_latent_tensor_internal(self, conditioning_items, ltx_params, target_resolution, total_frames_to_generate):
-         """Internal helper to call the LTX manager."""
-         final_ltx_params = {**ltx_params, 'width': target_resolution[0], 'height': target_resolution[1], 'video_total_frames': total_frames_to_generate, 'video_fps': 24, 'current_fragment_index': int(time.time()), 'conditioning_items_data': conditioning_items}
-         return ltx_manager_singleton.generate_latent_fragment(**final_ltx_params)
-
-     def _quantize_to_multiple(self, n, m):
-         """Helper to round n to the nearest multiple of m."""
-         if m == 0: return n
-         quantized = int(round(n / m) * m)
-         return m if n > 0 and quantized == 0 else quantized
 
engineers/deformes7D.py DELETED
@@ -1,316 +0,0 @@
- # engineers/deformes7D.py
- #
- # AducSdr: An open and functional implementation of the ADUC-SDR architecture
- # Copyright (C) August 4, 2025 Carlos Rodrigues dos Santos
- #
- # Contact:
- # Carlos Rodrigues dos Santos
- # carlex22@gmail.com
- # Rua Eduardo Carlos Pereira, 4125, B1 Ap32, Curitiba, PR, Brazil, CEP 8102025
- #
- # Related Repositories and Projects:
- # GitHub: https://github.com/carlex22/Aduc-sdr
- #
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Affero General Public License as published by
- # the Free Software Foundation, either version 3 of the License, or
- # (at your option) any later version.
- #
- # This program is distributed in the hope that it will be useful,
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- # GNU Affero General Public License for more details.
- #
- # You should have received a copy of the GNU Affero General Public License
- # along with this program. If not, see <https://www.gnu.org/licenses/>.
- #
- # This program is free software: you can redistribute it and/or modify
- # it under the terms of the GNU Affero General Public License...
- # PENDING PATENT NOTICE: Please see NOTICE.md.
- #
- # Version 3.2.1
-
- import os
- import time
- import imageio
- import numpy as np
- import torch
- import logging
- from PIL import Image, ImageOps
- import gradio as gr
- import subprocess
- import gc
- import yaml
- import shutil
- from pathlib import Path
- from typing import List, Tuple, Dict, Generator
-
- from aduc_types import LatentConditioningItem
- from managers.ltx_manager import ltx_manager_singleton
- from managers.latent_enhancer_manager import latent_enhancer_specialist_singleton
- from managers.vae_manager import vae_manager_singleton
- from engineers.deformes2D_thinker import deformes2d_thinker_singleton
- from engineers.deformes3D_thinker import deformes3d_thinker_singleton
- from managers.seedvr_manager import seedvr_manager_singleton
- from managers.mmaudio_manager import mmaudio_manager_singleton
- from tools.video_encode_tool import video_encode_tool_singleton
-
- logger = logging.getLogger(__name__)
-
- class Deformes7DEngine:
-     # ... (the entire class body remains exactly the same as in our last version) ...
-     """
-     Unified 3D/4D engine for continuous, interleaved generation of keyframes and video fragments.
-     """
-     def __init__(self, workspace_dir="deformes_workspace"):
-         self.workspace_dir = workspace_dir
-         self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
-         logger.info("Deformes7D Unified Engine initialized.")
-         os.makedirs(self.workspace_dir, exist_ok=True)
-
-     # --- HELPER METHODS ---
-     def save_video_from_tensor(self, video_tensor: torch.Tensor, path: str, fps: int = 24):
-         """Saves a pixel-space tensor as an MP4 video file."""
-         if video_tensor is None or video_tensor.ndim != 5 or video_tensor.shape[2] == 0: return
-         video_tensor = video_tensor.squeeze(0).permute(1, 2, 3, 0)
-         video_tensor = (video_tensor.clamp(-1, 1) + 1) / 2.0
-         video_np = (video_tensor.detach().cpu().float().numpy() * 255).astype(np.uint8)
-         with imageio.get_writer(path, fps=fps, codec='libx264', quality=8, output_params=['-pix_fmt', 'yuv420p']) as writer:
-             for frame in video_np: writer.append_data(frame)
-
-     def read_video_to_tensor(self, video_path: str) -> torch.Tensor:
-         """Reads a video file and converts it into a pixel-space tensor."""
-         with imageio.get_reader(video_path, 'ffmpeg') as reader:
-             frames = [frame for frame in reader]
-         frames_np = np.stack(frames, axis=0).astype(np.float32) / 255.0
-         tensor = torch.from_numpy(frames_np).permute(3, 0, 1, 2)
-         tensor = tensor.unsqueeze(0)
-         tensor = (tensor * 2.0) - 1.0
-         return tensor.to(self.device)
-
-     def _preprocess_image(self, image: Image.Image, target_resolution: tuple) -> Image.Image:
-         if image.size != target_resolution:
-             return ImageOps.fit(image, target_resolution, Image.Resampling.LANCZOS)
-         return image
-
-     def _pil_to_pixel_tensor(self, pil_image: Image.Image) -> torch.Tensor:
-         image_np = np.array(pil_image).astype(np.float32) / 255.0
-         tensor = torch.from_numpy(image_np).permute(2, 0, 1).unsqueeze(0).unsqueeze(2)
-         return (tensor * 2.0) - 1.0
-
-     def _save_image_from_tensor(self, pixel_tensor: torch.Tensor, path: str):
-         tensor_chw = pixel_tensor.squeeze(0).squeeze(1)
-         tensor_hwc = tensor_chw.permute(1, 2, 0)
-         tensor_hwc = (tensor_hwc.clamp(-1, 1) + 1) / 2.0
-         image_np = (tensor_hwc.cpu().float().numpy() * 255).astype(np.uint8)
-         Image.fromarray(image_np).save(path)
-
-     def _quantize_to_multiple(self, n, m):
-         if m == 0: return n
-         quantized = int(round(n / m) * m)
-         return m if n > 0 and quantized == 0 else quantized
-
-     # --- CORE GENERATION LOGIC ---
-     def _generate_next_causal_keyframe(self, base_keyframe_path: str, all_ref_paths: list,
-                                        prompt: str, resolution_tuple: tuple) -> Tuple[str, torch.Tensor]:
-         # (internal code of this method remains the same)
-         ltx_context_paths = [base_keyframe_path] + [p for p in all_ref_paths if p != base_keyframe_path][:3]
-         ltx_conditioning_items = []
-         weight = 1.0
-         for path in ltx_context_paths:
-             img_pil = Image.open(path).convert("RGB")
-             img_processed = self._preprocess_image(img_pil, resolution_tuple)
-             pixel_tensor = self._pil_to_pixel_tensor(img_processed)
-             latent_tensor = vae_manager_singleton.encode(pixel_tensor)
-             ltx_conditioning_items.append(LatentConditioningItem(latent_tensor, 0, weight))
-             if weight == 1.0: weight = -0.2
-             else: weight -= 0.2
-         ltx_base_params = {"guidance_scale": 3.0, "stg_scale": 0.1, "num_inference_steps": 25}
-         generated_latents, _ = ltx_manager_singleton.generate_latent_fragment(
-             height=resolution_tuple[0], width=resolution_tuple[1],
-             conditioning_items_data=ltx_conditioning_items, motion_prompt=prompt,
-             video_total_frames=48, video_fps=24, **ltx_base_params
-         )
-         final_latent = generated_latents[:, :, -1:, :, :]
-         upscaled_latent = latent_enhancer_specialist_singleton.upscale(final_latent)
-         pixel_tensor_out = vae_manager_singleton.decode(upscaled_latent)
-         timestamp = int(time.time() * 1000)
-         output_path = os.path.join(self.workspace_dir, f"keyframe_{timestamp}.png")
-         self._save_image_from_tensor(pixel_tensor_out, output_path)
-         return output_path, final_latent
-
-     def generate_full_movie_interleaved(self, initial_ref_paths: list, storyboard: list, global_prompt: str,
-                                         video_resolution: int, seconds_per_fragment: float, trim_percent: int,
-                                         handler_strength: float, dest_strength: float, ltx_params: dict,
-                                         progress=gr.Progress()):
-         # (internal code of this method remains the same)
-         logger.info("--- DEFORMES 7D: INITIATING INTERLEAVED RENDERING PIPELINE ---")
-         run_timestamp = int(time.time())
-         temp_video_clips_dir = os.path.join(self.workspace_dir, f"temp_clips_{run_timestamp}")
-         os.makedirs(temp_video_clips_dir, exist_ok=True)
-         FPS = 24
-         FRAMES_PER_LATENT_CHUNK = 8
-         resolution_tuple = (video_resolution, video_resolution)
-         generated_keyframe_paths, generated_keyframe_latents, generated_video_fragment_paths = [], [], []
-         progress(0, desc="Bootstrap: Processing K0...")
-         k0_path = initial_ref_paths[0]
-         k0_pil = Image.open(k0_path).convert("RGB")
-         k0_processed_pil = self._preprocess_image(k0_pil, resolution_tuple)
-         k0_pixel_tensor = self._pil_to_pixel_tensor(k0_processed_pil)
-         k0_latent = vae_manager_singleton.encode(k0_pixel_tensor)
-         generated_keyframe_paths.append(k0_path)
-         generated_keyframe_latents.append(k0_latent)
-         progress(0.01, desc="Bootstrap: Generating K1...")
-         prompt_k1 = deformes2d_thinker_singleton.get_anticipatory_keyframe_prompt(
-             global_prompt, "Initial scene.", storyboard[0], storyboard[1], k0_path, initial_ref_paths
-         )
-         k1_path, k1_latent = self._generate_next_causal_keyframe(k0_path, initial_ref_paths, prompt_k1, resolution_tuple)
-         generated_keyframe_paths.append(k1_path)
-         generated_keyframe_latents.append(k1_latent)
-         story_history = ""
-         eco_latent_for_next_loop, dejavu_latent_for_next_loop = None, None
-         num_transitions = len(storyboard) - 1
-         base_4d_ltx_params = {"rescaling_scale": 0.15, "image_cond_noise_scale": 0.00, **ltx_params}
-
-         for i in range(1, num_transitions):
-             act_progress = i / num_transitions
-             progress(act_progress, desc=f"Processing Act {i+1}/{num_transitions} (Keyframe Gen)...")
-             logger.info(f"--> Step 3D: Generating Keyframe K{i+1}")
-             kx_path = generated_keyframe_paths[i]
-             prompt_ky = deformes2d_thinker_singleton.get_anticipatory_keyframe_prompt(
-                 global_prompt, story_history, storyboard[i], storyboard[i+1], kx_path, initial_ref_paths
-             )
-             ky_path, ky_latent = self._generate_next_causal_keyframe(kx_path, initial_ref_paths, prompt_ky, resolution_tuple)
-             generated_keyframe_paths.append(ky_path)
-             generated_keyframe_latents.append(ky_latent)
-             progress(act_progress, desc=f"Processing Act {i+1}/{num_transitions} (Video Gen)...")
-             logger.info(f"--> Step 4D: Generating Video Fragment V{i-1}")
-             kb_path, kx_path, ky_path = generated_keyframe_paths[i-1], generated_keyframe_paths[i], generated_keyframe_paths[i+1]
-             motion_prompt = deformes3d_thinker_singleton.get_enhanced_motion_prompt(
-                 global_prompt, story_history, kb_path, kx_path, ky_path,
-                 storyboard[i-1], storyboard[i], storyboard[i+1]
-             )
-             transition_type = "continuous"
-             story_history += f"\n- Act {i}: {motion_prompt}"
-             total_frames_brutos = self._quantize_to_multiple(int(seconds_per_fragment * FPS), FRAMES_PER_LATENT_CHUNK)
-             frames_a_podar = self._quantize_to_multiple(int(total_frames_brutos * (trim_percent / 100)), FRAMES_PER_LATENT_CHUNK)
-             latents_a_podar = frames_a_podar // FRAMES_PER_LATENT_CHUNK
-             DEJAVU_FRAME_TARGET = frames_a_podar - 1 if frames_a_podar > 0 else 0
-             DESTINATION_FRAME_TARGET = total_frames_brutos - 1
-             conditioning_items = []
-             if eco_latent_for_next_loop is None:
-                 conditioning_items.append(LatentConditioningItem(generated_keyframe_latents[i], 0, 1.0))
-             else:
-                 conditioning_items.append(LatentConditioningItem(eco_latent_for_next_loop, 0, 1.0))
-                 conditioning_items.append(LatentConditioningItem(dejavu_latent_for_next_loop, DEJAVU_FRAME_TARGET, handler_strength))
-             if transition_type != "cut":
-                 conditioning_items.append(LatentConditioningItem(ky_latent, DESTINATION_FRAME_TARGET, dest_strength))
-             fragment_latents_brutos, _ = ltx_manager_singleton.generate_latent_fragment(
-                 height=video_resolution, width=video_resolution,
-                 conditioning_items_data=conditioning_items, motion_prompt=motion_prompt,
-                 video_total_frames=total_frames_brutos, video_fps=FPS, **base_4d_ltx_params
-             )
-             last_trim = fragment_latents_brutos[:, :, -(latents_a_podar+1):, :, :].clone()
-             eco_latent_for_next_loop = last_trim[:, :, :2, :, :].clone()
-             dejavu_latent_for_next_loop = last_trim[:, :, -1:, :, :].clone()
-             final_fragment_latents = fragment_latents_brutos[:, :, :-(latents_a_podar-1), :, :].clone()
-             final_fragment_latents = final_fragment_latents[:, :, 1:, :, :]
-             pixel_tensor = vae_manager_singleton.decode(final_fragment_latents)
-             fragment_path = os.path.join(temp_video_clips_dir, f"fragment_{i-1}.mp4")
-             self.save_video_from_tensor(pixel_tensor, fragment_path, fps=FPS)
-             generated_video_fragment_paths.append(fragment_path)
-             logger.info(f"Video Fragment V{i-1} saved to {fragment_path}")
-
-         logger.info("--- Final Assembly of Video Fragments ---")
-         final_video_path = os.path.join(self.workspace_dir, f"movie_7D_{run_timestamp}.mp4")
-         video_encode_tool_singleton.concatenate_videos(generated_video_fragment_paths, final_video_path, self.workspace_dir)
-         shutil.rmtree(temp_video_clips_dir)
-         logger.info(f"Full movie generated at: {final_video_path}")
-         return {"final_path": final_video_path, "all_keyframes": generated_keyframe_paths, "latent_paths": "NOT_IMPLEMENTED_YET"}
-
-     # --- POST-PRODUCTION METHODS ---
-     def task_run_latent_upscaling(self, latent_paths: list, chunk_size: int, progress: gr.Progress) -> Generator[Dict[str, any], None, None]:
-         # (internal code of this method remains the same)
-         if not latent_paths:
-             raise gr.Error("Cannot perform upscaling: no latent paths were provided from the main generation.")
-         logger.info("--- POST-PRODUCTION: Latent Upscaling ---")
-         run_timestamp = int(time.time())
-         temp_upscaled_clips_dir = os.path.join(self.workspace_dir, f"temp_upscaled_clips_{run_timestamp}")
-         os.makedirs(temp_upscaled_clips_dir, exist_ok=True)
-         final_upscaled_clip_paths = []
-         num_chunks = -(-len(latent_paths) // chunk_size)
-         for i in range(num_chunks):
-             chunk_start_index = i * chunk_size
-             chunk_end_index = chunk_start_index + chunk_size
-             chunk_paths = latent_paths[chunk_start_index:chunk_end_index]
-             progress(i / num_chunks, desc=f"Upscaling & Decoding Batch {i+1}/{num_chunks}")
-             tensors_in_chunk = [torch.load(p, map_location=self.device) for p in chunk_paths]
-             tensors_para_concatenar = [frag[:, :, :-1, :, :] if j < len(tensors_in_chunk) - 1 else frag for j, frag in enumerate(tensors_in_chunk)]
-             sub_group_latent = torch.cat(tensors_para_concatenar, dim=2)
-             del tensors_in_chunk, tensors_para_concatenar; gc.collect(); torch.cuda.empty_cache()
-             upscaled_latent_chunk = latent_enhancer_specialist_singleton.upscale(sub_group_latent)
-             del sub_group_latent; gc.collect(); torch.cuda.empty_cache()
-             pixel_tensor = vae_manager_singleton.decode(upscaled_latent_chunk)
-             del upscaled_latent_chunk; gc.collect(); torch.cuda.empty_cache()
-             base_name = f"upscaled_clip_{i:04d}_{run_timestamp}"
-             current_clip_path = os.path.join(temp_upscaled_clips_dir, f"{base_name}.mp4")
-             self.save_video_from_tensor(pixel_tensor, current_clip_path, fps=24)
-             final_upscaled_clip_paths.append(current_clip_path)
-             del pixel_tensor; gc.collect(); torch.cuda.empty_cache()
-         progress(0.98, desc="Assembling upscaled clips...")
-         final_video_path = os.path.join(self.workspace_dir, f"upscaled_movie_{run_timestamp}.mp4")
-         video_encode_tool_singleton.concatenate_videos(video_paths=final_upscaled_clip_paths, output_path=final_video_path, workspace_dir=self.workspace_dir)
-         shutil.rmtree(temp_upscaled_clips_dir)
-         logger.info(f"Latent upscaling complete! Final video at: {final_video_path}")
-         yield {"final_path": final_video_path}
-
-     def master_video_hd(self, source_video_path: str, model_version: str, steps: int, prompt: str, progress: gr.Progress):
-         # (internal code of this method remains the same)
-         logger.info(f"--- POST-PRODUCTION: HD Mastering with SeedVR {model_version} ---")
-         run_timestamp = int(time.time())
-         output_path = os.path.join(self.workspace_dir, f"{Path(source_video_path).stem}_hd.mp4")
-         try:
-             final_path = seedvr_manager_singleton.process_video(
-                 input_video_path=source_video_path, output_video_path=output_path,
-                 prompt=prompt, model_version=model_version, steps=steps, progress=progress
-             )
-             yield {"final_path": final_path}
-         except Exception as e:
-             logger.error(f"HD Mastering failed: {e}", exc_info=True)
-             raise gr.Error(f"HD Mastering failed. Details: {e}")
-
-     def generate_audio(self, source_video_path: str, audio_prompt: str, progress: gr.Progress):
-         # (internal code of this method remains the same)
-         logger.info(f"--- POST-PRODUCTION: Audio Generation ---")
-         run_timestamp = int(time.time())
-         output_path = os.path.join(self.workspace_dir, f"{Path(source_video_path).stem}_audio.mp4")
-         try:
-             result = subprocess.run(
-                 ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", source_video_path],
-                 capture_output=True, text=True, check=True)
-             duration = float(result.stdout.strip())
-             progress(0.5, desc="Generating audio track...")
-             final_path = mmaudio_manager_singleton.generate_audio_for_video(
-                 video_path=source_video_path, prompt=audio_prompt,
-                 duration_seconds=duration, output_path_override=output_path
-             )
-             yield {"final_path": final_path}
-         except Exception as e:
-             logger.error(f"Audio generation failed: {e}", exc_info=True)
-             raise gr.Error(f"Audio generation failed. Details: {e}")
-
- # --- Singleton Instantiation ---
- try:
-     config_path = Path(__file__).resolve().parent.parent / "config.yaml"
-     with open(config_path, 'r') as f:
-         config = yaml.safe_load(f)
-     WORKSPACE_DIR = config['application']['workspace_dir']
-     deformes7d_engine_singleton = Deformes7DEngine(workspace_dir=WORKSPACE_DIR)
- # <--- START OF FIX --->
- except Exception as e:
-     # Log the error as CRITICAL, since the application cannot run without this engine.
-     logger.critical(f"CRITICAL: Failed to initialize the Deformes7DEngine singleton from {config_path}: {e}", exc_info=True)
-     # Re-raise the exception to stop the application immediately.
-     # This avoids the 'NoneType' error later and provides a clear point of failure.
-     raise
- # <--- END OF FIX --->
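
The deleted engine snapped every fragment's frame count to the 8-frame latent chunk size via `_quantize_to_multiple` before generation. For reference, its rounding rule can be reproduced as a standalone helper (a minimal sketch mirroring the deleted method; the free function name and the 8-frame chunk example are illustrative):

```python
def quantize_to_multiple(n: int, m: int) -> int:
    """Round n to the nearest multiple of m, never collapsing a positive n to 0."""
    if m == 0:
        return n  # no chunking requested; pass n through unchanged
    quantized = int(round(n / m) * m)
    # A positive n that rounds down to 0 still yields one full multiple,
    # so a short fragment never becomes a zero-frame request.
    return m if n > 0 and quantized == 0 else quantized

# Example: 2.5 s at 24 fps is 60 frames, snapped to the 8-frame latent chunk.
print(quantize_to_multiple(60, 8))  # → 64
print(quantize_to_multiple(3, 8))   # → 8 (minimum one chunk)
```

Note that this rounds to the *nearest* multiple (60 → 64, not 56), which matches how `total_frames_brutos` and `frames_a_podar` were derived in `generate_full_movie_interleaved`.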