rogeliorichman committed on
Commit
4403ebb
·
verified ·
1 Parent(s): d4af98c

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +32 -10
  2. requirements.txt +3 -2
  3. src/app.py +39 -3
README.md CHANGED
@@ -4,7 +4,7 @@ app_file: src/app.py
4
  sdk: gradio
5
  sdk_version: 5.13.1
6
  ---
7
- # 🎓 AI Script Generator
8
 
9
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
10
  [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
@@ -12,13 +12,31 @@ sdk_version: 5.13.1
12
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)
13
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
14
 
15
- > Transform transcripts and PDFs into timed, structured teaching scripts using AI
16
 
17
- AI Script Generator is an advanced AI system that converts PDF transcripts, raw text, and conversational content into well-structured teaching scripts. It seamlessly processes inputs, extracting and analyzing the content to create organized, pedagogically sound scripts with time markers. Designed for educators, students, content creators, and anyone looking to transform information into clear explanations.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
 
19
  ## 🔗 Live Demo
20
 
21
- Try it out: [AI Script Generator on Hugging Face Spaces](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
22
 
23
  ## ✨ Features
24
 
@@ -30,6 +48,8 @@ Try it out: [AI Script Generator on Hugging Face Spaces](https://huggingface.co/
30
  - ⏱️ Time-marked sections for pacing
31
  - 🌐 Multilingual interface (English/Spanish) with flag selector
32
  - 🌍 Generation in ANY language through the guiding prompt (not limited to UI languages)
 
 
33
 
34
  ## Output Format
35
 
@@ -189,12 +209,15 @@ Our system uses a sophisticated multi-stage prompting approach:
189
 
190
  ### Architecture
191
 
192
- The system follows a modular design:
 
 
 
 
 
 
193
 
194
- - 📄 PDF/text processing module
195
- - 🔍 Text analysis component
196
- - 🤖 AI integration layer
197
- - 📝 Output formatting system
198
 
199
  ## 🤝 Contributing
200
 
@@ -214,7 +237,6 @@ Distributed under the MIT License. See `LICENSE` for more information.
214
 
215
  ## 🌟 Acknowledgments
216
 
217
- - Thanks to all contributors who have helped shape AI Script Generator
218
  - Special thanks to the Gemini and OpenAI teams for their amazing APIs
219
  - Inspired by educators and communicators worldwide who make learning engaging
220
 
 
4
  sdk: gradio
5
  sdk_version: 5.13.1
6
  ---
7
+ # 🎓 AI Agent Script Builder
8
 
9
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
10
  [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
 
12
  [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)
13
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
14
 
15
+ > Transform transcripts and PDFs into timed, structured teaching scripts using an autonomous AI agent
16
 
17
+ AI Agent Script Builder is an advanced autonomous agent that converts PDF transcripts, raw text, and conversational content into well-structured teaching scripts. It seamlessly processes inputs, extracting and analyzing the content to create organized, pedagogically sound scripts with time markers. Designed for educators, students, content creators, and anyone looking to transform information into clear explanations.
18
+
19
+ ## 🤖 AI Agent Architecture
20
+
21
+ AI Agent Script Builder functions as a **specialized AI agent** that autonomously processes and transforms content with minimal human intervention:
22
+
23
+ ### Agent Capabilities
24
+ - **Autonomous Processing**: Independently analyzes content, determines structure, and generates complete scripts
25
+ - **Decision Making**: Intelligently allocates time, prioritizes topics, and structures content based on input analysis
26
+ - **Contextual Adaptation**: Adjusts to different languages, styles, and requirements through guiding prompts
27
+ - **Obstacle Management**: Implements progressive retry strategies when facing API quota limitations
28
+ - **Goal-Oriented Operation**: Consistently works toward transforming unstructured information into coherent educational scripts
29
+
30
+ ### Agent Limitations
31
+ - **Domain Specificity**: Specialized for educational script generation rather than general-purpose tasks
32
+ - **External API Dependency**: Relies on third-party language models (Gemini/OpenAI) for core reasoning
33
+ - **No Continuous Learning**: Does not improve through experience or previous interactions
34
+
35
+ This architecture enables the system to function autonomously within its specialized domain while maintaining high-quality output and resilience to common obstacles.
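
Review note: the "Obstacle Management" bullet above mentions progressive retry strategies for API quota limits, but this commit does not show how they work. The following is a minimal sketch of one common approach (retrying with growing delays), not the project's actual implementation. `QuotaExceededError`, `call_with_progressive_retry`, and the delay schedule are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a provider's quota / rate-limit error."""


def call_with_progressive_retry(
    request: Callable[[], T],
    delays: Sequence[float] = (1.0, 2.0, 4.0, 8.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, waiting progressively longer after each quota error."""
    for delay in delays:
        try:
            return request()
        except QuotaExceededError:
            # Back off before the next attempt; later delays are longer.
            sleep(delay)
    # Final attempt: any quota error now propagates to the caller.
    return request()


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request() -> str:
        # Assumed behavior for the demo: fail twice, then succeed.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise QuotaExceededError("quota exhausted")
        return "script text"

    # Record the waits instead of sleeping so the demo finishes instantly.
    waits = []
    print(call_with_progressive_retry(flaky_request, sleep=waits.append))
    print(waits)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A production version would likely catch the specific exception types raised by the Gemini or OpenAI client in use.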
36
 
37
  ## 🔗 Live Demo
38
 
39
+ Try it out: [AI Agent Script Builder on Hugging Face Spaces](https://huggingface.co/spaces/rogeliorichman/AI_Script_Generator)
40
 
41
  ## ✨ Features
42
 
 
48
  - ⏱️ Time-marked sections for pacing
49
  - 🌐 Multilingual interface (English/Spanish) with flag selector
50
  - 🌍 Generation in ANY language through the guiding prompt (not limited to UI languages)
51
+ - 🧠 Autonomous decision-making for content organization and pacing
52
+ - 🛡️ Self-healing capabilities with progressive retry strategies for API limitations
53
 
54
  ## Output Format
55
 
 
209
 
210
  ### Architecture
211
 
212
+ The system follows a modular agent-based design:
213
+
214
+ - 📄 PDF/text processing module (Perception)
215
+ - 🔍 Text analysis component (Cognition)
216
+ - 🤖 AI integration layer (Decision-making)
217
+ - 📝 Output formatting system (Action)
218
+ - 🔄 Error handling system (Self-correction)
219
 
220
+ This agent architecture enables autonomous processing from raw input to final output with built-in adaptation to errors and limitations.
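
Review note: the five modules listed above can be read as stages of a pipeline. The sketch below illustrates that flow with toy stand-ins. Every function name and the even time split are assumptions made for illustration. In the real project the decision-making stage calls an external language model, which is replaced here by simple arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    title: str
    minutes: int


def perceive(raw_text: str) -> List[str]:
    """Perception: split raw input into non-empty paragraphs."""
    return [p.strip() for p in raw_text.split("\n\n") if p.strip()]


def analyze(paragraphs: List[str]) -> List[str]:
    """Cognition: use each paragraph's first sentence as its topic label."""
    return [p.split(".")[0] for p in paragraphs]


def decide(topics: List[str], total_minutes: int) -> List[Section]:
    """Decision-making: split the target duration evenly across topics."""
    if not topics:
        raise ValueError("no content to structure")
    per_topic = total_minutes // len(topics)
    return [Section(topic, per_topic) for topic in topics]


def act(sections: List[Section]) -> str:
    """Action: render each section with a [MM:SS] start marker."""
    lines, elapsed = [], 0
    for section in sections:
        lines.append(f"[{elapsed:02d}:00] {section.title}")
        elapsed += section.minutes
    return "\n".join(lines)


def run_pipeline(raw_text: str, total_minutes: int) -> str:
    """Run all stages; report failures as text instead of raising."""
    try:
        return act(decide(analyze(perceive(raw_text)), total_minutes))
    except ValueError as exc:
        # Self-correction stand-in: surface the problem as an error message.
        return f"Error: {exc}"


if __name__ == "__main__":
    text = "Intro to cells. More detail.\n\nMitosis explained. More detail."
    print(run_pipeline(text, total_minutes=30))
```

Returning an error string from `run_pipeline` mirrors how `process_transcript` in `src/app.py` returns an error-prefixed message rather than raising.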
 
 
 
221
 
222
  ## 🤝 Contributing
223
 
 
237
 
238
  ## 🌟 Acknowledgments
239
 
 
240
  - Special thanks to the Gemini and OpenAI teams for their amazing APIs
241
  - Inspired by educators and communicators worldwide who make learning engaging
242
 
requirements.txt CHANGED
@@ -1,4 +1,4 @@
1
- gradio>=4.0.0
2
  transformers>=4.30.0
3
  torch>=2.0.0
4
  pypdf2>=3.0.0
@@ -6,4 +6,5 @@ python-dotenv>=0.19.0
6
  numpy>=1.21.0
7
  tqdm>=4.65.0
8
  openai>=1.0.0
9
- tiktoken>=0.5.0
 
 
1
+ gradio==4.26.0
2
  transformers>=4.30.0
3
  torch>=2.0.0
4
  pypdf2>=3.0.0
 
6
  numpy>=1.21.0
7
  tqdm>=4.65.0
8
  openai>=1.0.0
9
+ tiktoken>=0.5.0
10
+ fastapi<0.110.0
src/app.py CHANGED
@@ -1,12 +1,17 @@
1
  import os
2
  import gradio as gr
3
  import re
 
4
  from dotenv import load_dotenv
5
  from src.core.transformer import TranscriptTransformer
6
  from src.utils.pdf_processor import PDFProcessor
7
  from src.utils.text_processor import TextProcessor
8
 
 
 
 
9
  load_dotenv()
 
10
 
11
  # Translations dictionary for UI elements
12
  TRANSLATIONS = {
@@ -66,12 +71,14 @@ LANGUAGE_PROMPTS = {
66
 
67
  class TranscriptTransformerApp:
68
  def __init__(self):
 
69
  self.pdf_processor = PDFProcessor()
70
  self.text_processor = TextProcessor()
71
  self.current_language = "en" # Default language
72
  self.last_generated_content = "" # Store the last generated content
73
  self.content_with_timestamps = "" # Store content with timestamps
74
  self.content_without_timestamps = "" # Store content without timestamps
 
75
 
76
  def process_transcript(self,
77
  language: str,
@@ -100,9 +107,11 @@ class TranscriptTransformerApp:
100
  Returns:
101
  str: Generated teaching transcript
102
  """
 
103
  try:
104
  # Force enable Gemini if thinking model is selected
105
  if use_thinking_model:
 
106
  use_gemini = True
107
 
108
  self.transformer = TranscriptTransformer(
@@ -144,32 +153,43 @@ class TranscriptTransformerApp:
144
 
145
  # Store the generated content
146
  self.content_with_timestamps = lecture_transcript
 
147
 
148
  # Create a version without timestamps
149
  self.content_without_timestamps = self.remove_timestamps(lecture_transcript)
 
150
 
151
  # Default: show content with timestamps
152
  self.last_generated_content = lecture_transcript
153
 
 
154
  return lecture_transcript
155
 
156
  except Exception as e:
 
157
  return f"{TRANSLATIONS[language]['error_prefix']}{str(e)}"
158
 
159
  def remove_timestamps(self, text):
160
  """Remove all timestamps (e.g., [00:00]) from the text"""
 
161
  # Regex to match the timestamp pattern [MM:SS] or [HH:MM:SS]
162
- return re.sub(r'\[\d{1,2}:\d{2}(:\d{2})?\]', '', text)
 
 
163
 
164
  def toggle_timestamps(self, show_timestamps):
165
  """Toggle visibility of timestamps in output"""
 
166
  if show_timestamps:
 
167
  return self.content_with_timestamps
168
  else:
 
169
  return self.content_without_timestamps
170
 
171
  def update_ui_language(self, language):
172
  """Update UI elements based on selected language"""
 
173
  self.current_language = language
174
 
175
  translations = TRANSLATIONS[language]
@@ -191,11 +211,14 @@ class TranscriptTransformerApp:
191
  translations["submit_button"],
192
  translations["output_label"]
193
  ]
 
194
 
195
  def launch(self):
196
  """Launch the Gradio interface"""
 
197
  # Get the path to the example PDF
198
  example_pdf = os.path.join(os.path.dirname(os.path.dirname(__file__)), "data", "sample2.pdf")
 
199
 
200
  with gr.Blocks(title=TRANSLATIONS["en"]["title"]) as interface:
201
  # Header with title and language selector side by side
@@ -301,10 +324,14 @@ class TranscriptTransformerApp:
301
 
302
  # Get language code from display value
303
  def get_language_code(language_display):
304
- return lang_map.get(language_display, "en")
 
 
 
305
 
306
  # Update UI elements when language changes
307
  def update_ui_with_display(language_display):
 
308
  language = get_language_code(language_display)
309
  self.current_language = language
310
 
@@ -325,6 +352,7 @@ class TranscriptTransformerApp:
325
  gr.update(label=translations["output_label"]),
326
  gr.update(label=translations["show_timestamps"])
327
  ]
 
328
 
329
  input_type.change(
330
  fn=lambda lang_display, choice: update_input_visibility(lang_display, choice),
@@ -371,13 +399,21 @@ class TranscriptTransformerApp:
371
  )
372
 
373
  # Example for PDF input
 
374
  gr.Examples(
375
  examples=[[example_pdf, "", "", 30, True, True]],
376
  inputs=[file_input, text_input, initial_prompt, target_duration, include_examples, use_thinking_model]
377
  )
 
378
 
 
 
 
379
  interface.launch(share=True)
 
380
 
381
  if __name__ == "__main__":
 
382
  app = TranscriptTransformerApp()
383
- app.launch()
 
 
1
  import os
2
  import gradio as gr
3
  import re
4
+ import logging # Added for debugging
5
  from dotenv import load_dotenv
6
  from src.core.transformer import TranscriptTransformer
7
  from src.utils.pdf_processor import PDFProcessor
8
  from src.utils.text_processor import TextProcessor
9
 
10
+ # Set up basic logging
11
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
12
+
13
  load_dotenv()
14
+ logging.info("Environment variables loaded.")
15
 
16
  # Translations dictionary for UI elements
17
  TRANSLATIONS = {
 
71
 
72
  class TranscriptTransformerApp:
73
  def __init__(self):
74
+ logging.info("Initializing TranscriptTransformerApp...")
75
  self.pdf_processor = PDFProcessor()
76
  self.text_processor = TextProcessor()
77
  self.current_language = "en" # Default language
78
  self.last_generated_content = "" # Store the last generated content
79
  self.content_with_timestamps = "" # Store content with timestamps
80
  self.content_without_timestamps = "" # Store content without timestamps
81
+ logging.info("TranscriptTransformerApp initialized.")
82
 
83
  def process_transcript(self,
84
  language: str,
 
107
  Returns:
108
  str: Generated teaching transcript
109
  """
110
+ logging.info(f"Processing transcript. Language: {language}, InputType: {input_type}, HasFile: {file_obj is not None}, HasText: {bool(raw_text_input)}, Duration: {target_duration}, Examples: {include_examples}, Gemini: {use_gemini}, ThinkingModel: {use_thinking_model}")
111
  try:
112
  # Force enable Gemini if thinking model is selected
113
  if use_thinking_model:
114
+ logging.info("Thinking model selected, forcing use_gemini=True")
115
  use_gemini = True
116
 
117
  self.transformer = TranscriptTransformer(
 
153
 
154
  # Store the generated content
155
  self.content_with_timestamps = lecture_transcript
156
+ logging.info("Generated content stored (with timestamps).")
157
 
158
  # Create a version without timestamps
159
  self.content_without_timestamps = self.remove_timestamps(lecture_transcript)
160
+ logging.info("Generated content stored (without timestamps).")
161
 
162
  # Default: show content with timestamps
163
  self.last_generated_content = lecture_transcript
164
 
165
+ logging.info("Transcript processing successful.")
166
  return lecture_transcript
167
 
168
  except Exception as e:
169
+ logging.error(f"Error processing transcript: {e}", exc_info=True) # Log exception info
170
  return f"{TRANSLATIONS[language]['error_prefix']}{str(e)}"
171
 
172
  def remove_timestamps(self, text):
173
  """Remove all timestamps (e.g., [00:00]) from the text"""
174
+ logging.info("Removing timestamps...")
175
  # Regex to match the timestamp pattern [MM:SS] or [HH:MM:SS]
176
+ result = re.sub(r'\[\d{1,2}:\d{2}(:\d{2})?\]', '', text)
177
+ logging.info("Timestamps removed.")
178
+ return result
179
 
180
  def toggle_timestamps(self, show_timestamps):
181
  """Toggle visibility of timestamps in output"""
182
+ logging.info(f"Toggling timestamps visibility. Show: {show_timestamps}")
183
  if show_timestamps:
184
+ logging.info("Returning content WITH timestamps.")
185
  return self.content_with_timestamps
186
  else:
187
+ logging.info("Returning content WITHOUT timestamps.")
188
  return self.content_without_timestamps
189
 
190
  def update_ui_language(self, language):
191
  """Update UI elements based on selected language"""
192
+ logging.info(f"Updating UI language to: {language}")
193
  self.current_language = language
194
 
195
  translations = TRANSLATIONS[language]
 
211
  translations["submit_button"],
212
  translations["output_label"]
213
  ]
214
+ logging.info("UI language updated.")
215
 
216
  def launch(self):
217
  """Launch the Gradio interface"""
218
+ logging.info("Configuring Gradio interface...")
219
  # Get the path to the example PDF
220
  example_pdf = os.path.join(os.path.dirname(os.path.dirname(__file__)), "data", "sample2.pdf")
221
+ logging.info(f"Example PDF path: {example_pdf}")
222
 
223
  with gr.Blocks(title=TRANSLATIONS["en"]["title"]) as interface:
224
  # Header with title and language selector side by side
 
324
 
325
  # Get language code from display value
326
  def get_language_code(language_display):
327
+ logging.info(f"Getting language code for display value: {language_display}")
328
+ code = lang_map.get(language_display, "en")
329
+ logging.info(f"Language code: {code}")
330
+ return code
331
 
332
  # Update UI elements when language changes
333
  def update_ui_with_display(language_display):
334
+ logging.info(f"Update UI triggered for language: {language_display}")
335
  language = get_language_code(language_display)
336
  self.current_language = language
337
 
 
352
  gr.update(label=translations["output_label"]),
353
  gr.update(label=translations["show_timestamps"])
354
  ]
355
+ logging.info("UI elements update values prepared.")
356
 
357
  input_type.change(
358
  fn=lambda lang_display, choice: update_input_visibility(lang_display, choice),
 
399
  )
400
 
401
  # Example for PDF input
402
+ logging.info("Setting up Gradio Examples...")
403
  gr.Examples(
404
  examples=[[example_pdf, "", "", 30, True, True]],
405
  inputs=[file_input, text_input, initial_prompt, target_duration, include_examples, use_thinking_model]
406
  )
407
+ logging.info("Gradio Examples configured.")
408
 
409
+ logging.info("Launching Gradio interface...")
410
+ # Note: Setting share=True is not recommended/supported in Spaces, but kept for consistency with original code
411
+ # It might generate a warning, which is expected.
412
  interface.launch(share=True)
413
+ logging.info("Gradio interface launched.")
414
 
415
  if __name__ == "__main__":
416
+ logging.info("Starting application...")
417
  app = TranscriptTransformerApp()
418
+ app.launch()
419
+ logging.info("Application finished.")
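
Review note: the `remove_timestamps` method in `src/app.py` relies on a single regular expression. The standalone sketch below reuses that exact pattern to show what it matches and what it leaves behind. The module-level `TIMESTAMP_PATTERN` name and the sample strings are illustrative additions, not part of the commit.

```python
import re

# Same pattern as in src/app.py: [MM:SS] or [HH:MM:SS], with 1-2 leading digits.
TIMESTAMP_PATTERN = re.compile(r"\[\d{1,2}:\d{2}(:\d{2})?\]")


def remove_timestamps(text: str) -> str:
    """Delete every bracketed timestamp; surrounding whitespace is untouched."""
    return TIMESTAMP_PATTERN.sub("", text)


if __name__ == "__main__":
    sample = "[00:00] Welcome\n[01:02:30] Wrap-up\n[note] kept as-is"
    print(repr(remove_timestamps(sample)))
```

Because only the bracketed marker is removed, the space that followed each timestamp remains at the start of the line. If that matters for display, the caller may want to strip each line afterwards.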