Upload folder using huggingface_hub
- README.md +32 -10
- requirements.txt +3 -2
- src/app.py +39 -3
README.md
CHANGED
|
@@ -4,7 +4,7 @@ app_file: src/app.py
|
|
| 4 |
sdk: gradio
|
| 5 |
sdk_version: 5.13.1
|
| 6 |
---
|
| 7 |
-
# 🎓 AI Script
|
| 8 |
|
| 9 |
[](https://opensource.org/licenses/MIT)
|
| 10 |
[](https://www.python.org/downloads/)
|
|
@@ -12,13 +12,31 @@ sdk_version: 5.13.1
|
|
| 12 |
[](http://makeapullrequest.com)
|
| 13 |
[](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
|
| 14 |
|
| 15 |
-
> Transform transcripts and PDFs into timed, structured teaching scripts using AI
|
| 16 |
|
| 17 |
-
AI Script
|
| 18 |
|
| 19 |
## 🔗 Live Demo
|
| 20 |
|
| 21 |
-
Try it out: [AI Script
|
| 22 |
|
| 23 |
## ✨ Features
|
| 24 |
|
|
@@ -30,6 +48,8 @@ Try it out: [AI Script Generator on Hugging Face Spaces](https://huggingface.co/
|
|
| 30 |
- ⏱️ Time-marked sections for pacing
|
| 31 |
- 🌐 Multilingual interface (English/Spanish) with flag selector
|
| 32 |
- 🌍 Generation in ANY language through the guiding prompt (not limited to UI languages)
|
| 33 |
|
| 34 |
## Output Format
|
| 35 |
|
|
@@ -189,12 +209,15 @@ Our system uses a sophisticated multi-stage prompting approach:
|
|
| 189 |
|
| 190 |
### Architecture
|
| 191 |
|
| 192 |
-
The system follows a modular design:
|
| 193 |
|
| 194 |
-
|
| 195 |
-
- 🔍 Text analysis component
|
| 196 |
-
- 🤖 AI integration layer
|
| 197 |
-
- 📝 Output formatting system
|
| 198 |
|
| 199 |
## 🤝 Contributing
|
| 200 |
|
|
@@ -214,7 +237,6 @@ Distributed under the MIT License. See `LICENSE` for more information.
|
|
| 214 |
|
| 215 |
## 🌟 Acknowledgments
|
| 216 |
|
| 217 |
-
- Thanks to all contributors who have helped shape AI Script Generator
|
| 218 |
- Special thanks to the Gemini and OpenAI teams for their amazing APIs
|
| 219 |
- Inspired by educators and communicators worldwide who make learning engaging
|
| 220 |
|
|
|
|
| 4 |
sdk: gradio
|
| 5 |
sdk_version: 5.13.1
|
| 6 |
---
|
| 7 |
+
# 🎓 AI Agent Script Builder
|
| 8 |
|
| 9 |
[](https://opensource.org/licenses/MIT)
|
| 10 |
[](https://www.python.org/downloads/)
|
|
|
|
| 12 |
[](http://makeapullrequest.com)
|
| 13 |
[](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
|
| 14 |
|
| 15 |
+
> Transform transcripts and PDFs into timed, structured teaching scripts using an autonomous AI agent
|
| 16 |
|
| 17 |
+
AI Agent Script Builder is an autonomous agent that converts PDF transcripts, raw text, and conversational content into well-structured teaching scripts. It extracts and analyzes the input, then produces organized, pedagogically sound scripts with time markers. It is designed for educators, students, content creators, and anyone who wants to turn information into clear explanations.
|
| 18 |
+
|
| 19 |
+
## 🤖 AI Agent Architecture
|
| 20 |
+
|
| 21 |
+
AI Agent Script Builder functions as a **specialized AI agent** that autonomously processes and transforms content with minimal human intervention:
|
| 22 |
+
|
| 23 |
+
### Agent Capabilities
|
| 24 |
+
- **Autonomous Processing**: Independently analyzes content, determines structure, and generates complete scripts
|
| 25 |
+
- **Decision Making**: Intelligently allocates time, prioritizes topics, and structures content based on input analysis
|
| 26 |
+
- **Contextual Adaptation**: Adjusts to different languages, styles, and requirements through guiding prompts
|
| 27 |
+
- **Obstacle Management**: Implements progressive retry strategies when facing API quota limitations
|
| 28 |
+
- **Goal-Oriented Operation**: Consistently works toward transforming unstructured information into coherent educational scripts
|
| 29 |
+
|
| 30 |
+
### Agent Limitations
|
| 31 |
+
- **Domain Specificity**: Specialized for educational script generation rather than general-purpose tasks
|
| 32 |
+
- **External API Dependency**: Relies on third-party language models (Gemini/OpenAI) for core reasoning
|
| 33 |
+
- **No Continuous Learning**: Does not improve through experience or previous interactions
|
| 34 |
+
|
| 35 |
+
This architecture enables the system to function autonomously within its specialized domain while maintaining high-quality output and resilience to common obstacles.
|
| 36 |
|
| 37 |
## 🔗 Live Demo
|
| 38 |
|
| 39 |
+
Try it out: [AI Agent Script Builder on Hugging Face Spaces](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
|
| 40 |
|
| 41 |
## ✨ Features
|
| 42 |
|
|
|
|
| 48 |
- ⏱️ Time-marked sections for pacing
|
| 49 |
- 🌐 Multilingual interface (English/Spanish) with flag selector
|
| 50 |
- 🌍 Generation in ANY language through the guiding prompt (not limited to UI languages)
|
| 51 |
+
- 🧠 Autonomous decision-making for content organization and pacing
|
| 52 |
+
- 🛡️ Self-healing capabilities with progressive retry strategies for API limitations
|
| 53 |
|
| 54 |
## Output Format
|
| 55 |
|
|
|
|
| 209 |
|
| 210 |
### Architecture
|
| 211 |
|
| 212 |
+
The system follows a modular agent-based design:
|
| 213 |
+
|
| 214 |
+
- 📄 PDF/text processing module (Perception)
|
| 215 |
+
- 🔍 Text analysis component (Cognition)
|
| 216 |
+
- 🤖 AI integration layer (Decision-making)
|
| 217 |
+
- 📝 Output formatting system (Action)
|
| 218 |
+
- 🔄 Error handling system (Self-correction)
|
| 219 |
|
| 220 |
+
This agent architecture enables autonomous processing from raw input to final output with built-in adaptation to errors and limitations.
|
| 221 |
|
| 222 |
## 🤝 Contributing
|
| 223 |
|
|
|
|
| 237 |
|
| 238 |
## 🌟 Acknowledgments
|
| 239 |
|
|
|
|
| 240 |
- Special thanks to the Gemini and OpenAI teams for their amazing APIs
|
| 241 |
- Inspired by educators and communicators worldwide who make learning engaging
|
| 242 |
|
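The README mentions progressive retry strategies for API quota limits, but this diff does not show how they are implemented. The following is a minimal sketch of one common approach (exponential backoff), not the repository's actual code. `QuotaExceededError`, `call_with_backoff`, and all parameter names are hypothetical and chosen only for illustration.

```python
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a provider's quota or rate-limit error."""


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), waiting progressively longer after each quota error."""
    for attempt in range(max_attempts):
        try:
            return call()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles each round: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real waiting; a production version would likely also add jitter and catch the specific exception types raised by the Gemini or OpenAI client in use.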
requirements.txt
CHANGED
|
@@ -1,4 +1,4 @@
|
|
| 1 |
-
gradio
|
| 2 |
transformers>=4.30.0
|
| 3 |
torch>=2.0.0
|
| 4 |
pypdf2>=3.0.0
|
|
@@ -6,4 +6,5 @@ python-dotenv>=0.19.0
|
|
| 6 |
numpy>=1.21.0
|
| 7 |
tqdm>=4.65.0
|
| 8 |
openai>=1.0.0
|
| 9 |
-
tiktoken>=0.5.0
|
|
|
|
|
|
| 1 |
+
gradio==4.26.0
|
| 2 |
transformers>=4.30.0
|
| 3 |
torch>=2.0.0
|
| 4 |
pypdf2>=3.0.0
|
|
|
|
| 6 |
numpy>=1.21.0
|
| 7 |
tqdm>=4.65.0
|
| 8 |
openai>=1.0.0
|
| 9 |
+
tiktoken>=0.5.0
|
| 10 |
+
fastapi<0.110.0
|
src/app.py
CHANGED
|
@@ -1,12 +1,17 @@
|
|
| 1 |
import os
|
| 2 |
import gradio as gr
|
| 3 |
import re
|
|
|
|
| 4 |
from dotenv import load_dotenv
|
| 5 |
from src.core.transformer import TranscriptTransformer
|
| 6 |
from src.utils.pdf_processor import PDFProcessor
|
| 7 |
from src.utils.text_processor import TextProcessor
|
| 8 |
|
| 9 |
load_dotenv()
|
|
|
|
| 10 |
|
| 11 |
# Translations dictionary for UI elements
|
| 12 |
TRANSLATIONS = {
|
|
@@ -66,12 +71,14 @@ LANGUAGE_PROMPTS = {
|
|
| 66 |
|
| 67 |
class TranscriptTransformerApp:
|
| 68 |
def __init__(self):
|
|
|
|
| 69 |
self.pdf_processor = PDFProcessor()
|
| 70 |
self.text_processor = TextProcessor()
|
| 71 |
self.current_language = "en" # Default language
|
| 72 |
self.last_generated_content = "" # Store the last generated content
|
| 73 |
self.content_with_timestamps = "" # Store content with timestamps
|
| 74 |
self.content_without_timestamps = "" # Store content without timestamps
|
|
|
|
| 75 |
|
| 76 |
def process_transcript(self,
|
| 77 |
language: str,
|
|
@@ -100,9 +107,11 @@ class TranscriptTransformerApp:
|
|
| 100 |
Returns:
|
| 101 |
str: Generated teaching transcript
|
| 102 |
"""
|
|
|
|
| 103 |
try:
|
| 104 |
# Force enable Gemini if thinking model is selected
|
| 105 |
if use_thinking_model:
|
|
|
|
| 106 |
use_gemini = True
|
| 107 |
|
| 108 |
self.transformer = TranscriptTransformer(
|
|
@@ -144,32 +153,43 @@ class TranscriptTransformerApp:
|
|
| 144 |
|
| 145 |
# Store the generated content
|
| 146 |
self.content_with_timestamps = lecture_transcript
|
|
|
|
| 147 |
|
| 148 |
# Create a version without timestamps
|
| 149 |
self.content_without_timestamps = self.remove_timestamps(lecture_transcript)
|
|
|
|
| 150 |
|
| 151 |
# Default: show content with timestamps
|
| 152 |
self.last_generated_content = lecture_transcript
|
| 153 |
|
|
|
|
| 154 |
return lecture_transcript
|
| 155 |
|
| 156 |
except Exception as e:
|
|
|
|
| 157 |
return f"{TRANSLATIONS[language]['error_prefix']}{str(e)}"
|
| 158 |
|
| 159 |
def remove_timestamps(self, text):
|
| 160 |
"""Remove all timestamps (e.g., [00:00]) from the text"""
|
|
|
|
| 161 |
# Regex to match the timestamp pattern [MM:SS] or [HH:MM:SS]
|
| 162 |
-
|
| 163 |
|
| 164 |
def toggle_timestamps(self, show_timestamps):
|
| 165 |
"""Toggle visibility of timestamps in output"""
|
|
|
|
| 166 |
if show_timestamps:
|
|
|
|
| 167 |
return self.content_with_timestamps
|
| 168 |
else:
|
|
|
|
| 169 |
return self.content_without_timestamps
|
| 170 |
|
| 171 |
def update_ui_language(self, language):
|
| 172 |
"""Update UI elements based on selected language"""
|
|
|
|
| 173 |
self.current_language = language
|
| 174 |
|
| 175 |
translations = TRANSLATIONS[language]
|
|
@@ -191,11 +211,14 @@ class TranscriptTransformerApp:
|
|
| 191 |
translations["submit_button"],
|
| 192 |
translations["output_label"]
|
| 193 |
]
|
|
|
|
| 194 |
|
| 195 |
def launch(self):
|
| 196 |
"""Launch the Gradio interface"""
|
|
|
|
| 197 |
# Get the path to the example PDF
|
| 198 |
example_pdf = os.path.join(os.path.dirname(os.path.dirname(__file__)), "data", "sample2.pdf")
|
|
|
|
| 199 |
|
| 200 |
with gr.Blocks(title=TRANSLATIONS["en"]["title"]) as interface:
|
| 201 |
# Header with title and language selector side by side
|
|
@@ -301,10 +324,14 @@ class TranscriptTransformerApp:
|
|
| 301 |
|
| 302 |
# Get language code from display value
|
| 303 |
def get_language_code(language_display):
|
| 304 |
-
|
| 305 |
|
| 306 |
# Update UI elements when language changes
|
| 307 |
def update_ui_with_display(language_display):
|
|
|
|
| 308 |
language = get_language_code(language_display)
|
| 309 |
self.current_language = language
|
| 310 |
|
|
@@ -325,6 +352,7 @@ class TranscriptTransformerApp:
|
|
| 325 |
gr.update(label=translations["output_label"]),
|
| 326 |
gr.update(label=translations["show_timestamps"])
|
| 327 |
]
|
|
|
|
| 328 |
|
| 329 |
input_type.change(
|
| 330 |
fn=lambda lang_display, choice: update_input_visibility(lang_display, choice),
|
|
@@ -371,13 +399,21 @@ class TranscriptTransformerApp:
|
|
| 371 |
)
|
| 372 |
|
| 373 |
# Example for PDF input
|
|
|
|
| 374 |
gr.Examples(
|
| 375 |
examples=[[example_pdf, "", "", 30, True, True]],
|
| 376 |
inputs=[file_input, text_input, initial_prompt, target_duration, include_examples, use_thinking_model]
|
| 377 |
)
|
|
|
|
| 378 |
|
| 379 |
interface.launch(share=True)
|
|
|
|
| 380 |
|
| 381 |
if __name__ == "__main__":
|
|
|
|
| 382 |
app = TranscriptTransformerApp()
|
| 383 |
-
app.launch()
|
|
|
|
|
|
| 1 |
import os
|
| 2 |
import gradio as gr
|
| 3 |
import re
|
| 4 |
+
import logging # Added for debugging
|
| 5 |
from dotenv import load_dotenv
|
| 6 |
from src.core.transformer import TranscriptTransformer
|
| 7 |
from src.utils.pdf_processor import PDFProcessor
|
| 8 |
from src.utils.text_processor import TextProcessor
|
| 9 |
|
| 10 |
+
# Set up basic logging
|
| 11 |
+
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
|
| 12 |
+
|
| 13 |
load_dotenv()
|
| 14 |
+
logging.info("Environment variables loaded.")
|
| 15 |
|
| 16 |
# Translations dictionary for UI elements
|
| 17 |
TRANSLATIONS = {
|
|
|
|
| 71 |
|
| 72 |
class TranscriptTransformerApp:
|
| 73 |
def __init__(self):
|
| 74 |
+
logging.info("Initializing TranscriptTransformerApp...")
|
| 75 |
self.pdf_processor = PDFProcessor()
|
| 76 |
self.text_processor = TextProcessor()
|
| 77 |
self.current_language = "en" # Default language
|
| 78 |
self.last_generated_content = "" # Store the last generated content
|
| 79 |
self.content_with_timestamps = "" # Store content with timestamps
|
| 80 |
self.content_without_timestamps = "" # Store content without timestamps
|
| 81 |
+
logging.info("TranscriptTransformerApp initialized.")
|
| 82 |
|
| 83 |
def process_transcript(self,
|
| 84 |
language: str,
|
|
|
|
| 107 |
Returns:
|
| 108 |
str: Generated teaching transcript
|
| 109 |
"""
|
| 110 |
+
logging.info(f"Processing transcript. Language: {language}, InputType: {input_type}, HasFile: {file_obj is not None}, HasText: {bool(raw_text_input)}, Duration: {target_duration}, Examples: {include_examples}, Gemini: {use_gemini}, ThinkingModel: {use_thinking_model}")
|
| 111 |
try:
|
| 112 |
# Force enable Gemini if thinking model is selected
|
| 113 |
if use_thinking_model:
|
| 114 |
+
logging.info("Thinking model selected, forcing use_gemini=True")
|
| 115 |
use_gemini = True
|
| 116 |
|
| 117 |
self.transformer = TranscriptTransformer(
|
|
|
|
| 153 |
|
| 154 |
# Store the generated content
|
| 155 |
self.content_with_timestamps = lecture_transcript
|
| 156 |
+
logging.info("Generated content stored (with timestamps).")
|
| 157 |
|
| 158 |
# Create a version without timestamps
|
| 159 |
self.content_without_timestamps = self.remove_timestamps(lecture_transcript)
|
| 160 |
+
logging.info("Generated content stored (without timestamps).")
|
| 161 |
|
| 162 |
# Default: show content with timestamps
|
| 163 |
self.last_generated_content = lecture_transcript
|
| 164 |
|
| 165 |
+
logging.info("Transcript processing successful.")
|
| 166 |
return lecture_transcript
|
| 167 |
|
| 168 |
except Exception as e:
|
| 169 |
+
logging.error(f"Error processing transcript: {e}", exc_info=True) # Log exception info
|
| 170 |
return f"{TRANSLATIONS[language]['error_prefix']}{str(e)}"
|
| 171 |
|
| 172 |
def remove_timestamps(self, text):
|
| 173 |
"""Remove all timestamps (e.g., [00:00]) from the text"""
|
| 174 |
+
logging.info("Removing timestamps...")
|
| 175 |
# Regex to match the timestamp pattern [MM:SS] or [HH:MM:SS]
|
| 176 |
+
result = re.sub(r'\[\d{1,2}:\d{2}(:\d{2})?\]', '', text)
|
| 177 |
+
logging.info("Timestamps removed.")
|
| 178 |
+
return result
|
| 179 |
|
| 180 |
def toggle_timestamps(self, show_timestamps):
|
| 181 |
"""Toggle visibility of timestamps in output"""
|
| 182 |
+
logging.info(f"Toggling timestamps visibility. Show: {show_timestamps}")
|
| 183 |
if show_timestamps:
|
| 184 |
+
logging.info("Returning content WITH timestamps.")
|
| 185 |
return self.content_with_timestamps
|
| 186 |
else:
|
| 187 |
+
logging.info("Returning content WITHOUT timestamps.")
|
| 188 |
return self.content_without_timestamps
|
| 189 |
|
| 190 |
def update_ui_language(self, language):
|
| 191 |
"""Update UI elements based on selected language"""
|
| 192 |
+
logging.info(f"Updating UI language to: {language}")
|
| 193 |
self.current_language = language
|
| 194 |
|
| 195 |
translations = TRANSLATIONS[language]
|
|
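To make the behavior of the `remove_timestamps` regex in this hunk concrete, here is a standalone sketch that uses the same pattern outside the class. The `remove_timestamps_tidy` variant is hypothetical and not part of the commit; it only illustrates that the committed pattern leaves behind the space that followed each marker.

```python
import re

# Same pattern as in the diff: [MM:SS] or [HH:MM:SS], with 1-2 leading digits.
TIMESTAMP_RE = re.compile(r"\[\d{1,2}:\d{2}(?::\d{2})?\]")


def remove_timestamps(text: str) -> str:
    """Strip timestamp markers, leaving all other characters untouched."""
    return TIMESTAMP_RE.sub("", text)


def remove_timestamps_tidy(text: str) -> str:
    """Hypothetical variant: also drop spaces or tabs right after a marker."""
    return re.sub(r"\[\d{1,2}:\d{2}(?::\d{2})?\][ \t]*", "", text)


sample = "[00:00] Intro\n[01:30:05] Deep dive"
print(repr(remove_timestamps(sample)))
print(repr(remove_timestamps_tidy(sample)))
```

Whether the leftover leading space matters depends on how the output is rendered; in a Markdown or plain-text box it is usually invisible.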
|
|
| 211 |
translations["submit_button"],
|
| 212 |
translations["output_label"]
|
| 213 |
]
|
| 214 |
+
logging.info("UI language updated.")
|
| 215 |
|
| 216 |
def launch(self):
|
| 217 |
"""Launch the Gradio interface"""
|
| 218 |
+
logging.info("Configuring Gradio interface...")
|
| 219 |
# Get the path to the example PDF
|
| 220 |
example_pdf = os.path.join(os.path.dirname(os.path.dirname(__file__)), "data", "sample2.pdf")
|
| 221 |
+
logging.info(f"Example PDF path: {example_pdf}")
|
| 222 |
|
| 223 |
with gr.Blocks(title=TRANSLATIONS["en"]["title"]) as interface:
|
| 224 |
# Header with title and language selector side by side
|
|
|
|
| 324 |
|
| 325 |
# Get language code from display value
|
| 326 |
def get_language_code(language_display):
|
| 327 |
+
logging.info(f"Getting language code for display value: {language_display}")
|
| 328 |
+
code = lang_map.get(language_display, "en")
|
| 329 |
+
logging.info(f"Language code: {code}")
|
| 330 |
+
return code
|
| 331 |
|
| 332 |
# Update UI elements when language changes
|
| 333 |
def update_ui_with_display(language_display):
|
| 334 |
+
logging.info(f"Update UI triggered for language: {language_display}")
|
| 335 |
language = get_language_code(language_display)
|
| 336 |
self.current_language = language
|
| 337 |
|
|
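The restored `get_language_code` body relies on a `lang_map` dictionary that is defined outside the visible hunks. A minimal sketch of the lookup-with-fallback pattern follows; the mapping contents below are hypothetical, since the real display labels are not shown in this diff.

```python
# Hypothetical mapping from dropdown labels to language codes.
LANG_MAP = {"English": "en", "Español": "es"}


def get_language_code(language_display, lang_map=LANG_MAP, default="en"):
    """Return the code for a display label, falling back to the default."""
    return lang_map.get(language_display, default)
```

Using `dict.get` with a default means an unexpected label degrades to English instead of raising `KeyError` inside a Gradio callback.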
|
|
| 352 |
gr.update(label=translations["output_label"]),
|
| 353 |
gr.update(label=translations["show_timestamps"])
|
| 354 |
]
|
| 355 |
+
logging.info("UI elements update values prepared.")
|
| 356 |
|
| 357 |
input_type.change(
|
| 358 |
fn=lambda lang_display, choice: update_input_visibility(lang_display, choice),
|
|
|
|
| 399 |
)
|
| 400 |
|
| 401 |
# Example for PDF input
|
| 402 |
+
logging.info("Setting up Gradio Examples...")
|
| 403 |
gr.Examples(
|
| 404 |
examples=[[example_pdf, "", "", 30, True, True]],
|
| 405 |
inputs=[file_input, text_input, initial_prompt, target_duration, include_examples, use_thinking_model]
|
| 406 |
)
|
| 407 |
+
logging.info("Gradio Examples configured.")
|
| 408 |
|
| 409 |
+
logging.info("Launching Gradio interface...")
|
| 410 |
+
# Note: Setting share=True is not recommended/supported in Spaces, but kept for consistency with original code
|
| 411 |
+
# It might generate a warning, which is expected.
|
| 412 |
interface.launch(share=True)
|
| 413 |
+
logging.info("Gradio interface launched.")
|
| 414 |
|
| 415 |
if __name__ == "__main__":
|
| 416 |
+
logging.info("Starting application...")
|
| 417 |
app = TranscriptTransformerApp()
|
| 418 |
+
app.launch()
|
| 419 |
+
logging.info("Application finished.")
|
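The comment added next to `interface.launch(share=True)` notes that `share=True` is not supported on Spaces. One way to avoid the warning, sketched here under stated assumptions, is to request a share link only when the app is not running inside a Space. The sketch assumes that the Space container sets a `SPACE_ID` environment variable; `should_share` is a hypothetical helper and is not part of this commit.

```python
import os


def should_share(env=None) -> bool:
    """Return True only when no SPACE_ID is present (assumed Spaces marker)."""
    env = os.environ if env is None else env
    return not env.get("SPACE_ID")
```

With such a helper the launch call could become `interface.launch(share=should_share())`, so local runs still get a public link while the hosted Space uses its own URL.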