Space: marcosremar2/docker_mineru

marcosremar2 committed on May 3, 2025

Commit ed4cfc9 (1 parent: 41ee299)

Update PDF to Markdown converter API with NVIDIA L4 support

Files changed (2)

Dockerfile CHANGED

@@ -76,5 +76,5 @@ ENV MARKER_FONT_PATH=/home/user/.cache/marker_fonts
 EXPOSE 7860
 # Command to run the application with Gunicorn and Uvicorn workers
-# Start with 4 workers. Adjust based on monitoring L40S resources.
-CMD ["gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "app.main:app", "--bind", "0.0.0.0:7860"]
+# Increased workers to 16 for L40S. Adjust based on monitoring.
+CMD ["gunicorn", "-w", "16", "-k", "uvicorn.workers.UvicornWorker", "app.main:app", "--bind", "0.0.0.0:7860"]
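
The worker count in the CMD line is hard-coded, so changing it means rebuilding the image. A minimal sketch of one alternative follows, assuming a hypothetical `gunicorn.conf.py` and a hypothetical `WEB_WORKERS` environment variable (neither is part of this commit). The `bind`, `worker_class`, and `workers` names are standard Gunicorn settings that mirror the flags in the diff.

```python
# gunicorn.conf.py (hypothetical file, not part of this commit)
import os


def worker_count(default: int = 16) -> int:
    """Return the worker count from WEB_WORKERS, falling back to `default`.

    WEB_WORKERS is an assumed variable name used for illustration only.
    """
    raw = os.environ.get("WEB_WORKERS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric value: keep the default from the Dockerfile.
        return default
    return value if value > 0 else default


# Standard Gunicorn settings, mirroring the flags in the CMD line.
bind = "0.0.0.0:7860"
worker_class = "uvicorn.workers.UvicornWorker"
workers = worker_count()
```

With such a file, the command could become `gunicorn -c gunicorn.conf.py app.main:app`. Each Gunicorn worker is a separate process, so if every worker loads its own copy of the models, GPU memory use would grow with the worker count.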

pdf_converter/convert_pdf_to_md.py CHANGED

@@ -28,9 +28,12 @@ def initialize_converter():
                 except Exception as e:
                     print(f"Error setting custom font path: {e}", file=sys.stderr)
-            # Create configuration, explicitly setting output format
-            # Potential optimization: Check if batch_multiplier or similar exists
-            config_parser = ConfigParser({'output_format': 'markdown'}) # Add batch_multiplier here if applicable
+            # Create configuration, explicitly setting output format and batch multiplier
+            # Increased batch_multiplier for potentially faster processing on L40S
+            config_parser = ConfigParser({
+                'output_format': 'markdown',
+                'batch_multiplier': 4 # Increased from default 2
+            })
             # Load models
             # Potential optimization: Check if device mapping/multi-GPU is possible
@@ -45,7 +48,7 @@ def initialize_converter():
                 renderer=config_parser.get_renderer(),
                 llm_service=config_parser.get_llm_service()
             )
-            print("Marker models initialized successfully.")
+            print("Marker models initialized successfully with batch_multiplier=4.")
         except Exception as e:
             print(f"Failed to initialize marker models: {e}", file=sys.stderr)
             _converter = None # Ensure it's None if init fails
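
The diff fixes `batch_multiplier` at 4 in two places: the config dict and the log message. As a rough sketch, the options dict could be built once and the log line derived from it, so the two cannot drift apart. The helper `build_marker_config` and the `MARKER_BATCH_MULTIPLIER` environment variable are hypothetical names introduced for illustration. The dict keys are the ones passed to `ConfigParser` in the diff.

```python
import os


def build_marker_config(default_multiplier: int = 4) -> dict:
    """Build the options dict that the diff passes to ConfigParser.

    MARKER_BATCH_MULTIPLIER is an assumed variable name for illustration;
    it is not defined by this commit.
    """
    raw = os.environ.get("MARKER_BATCH_MULTIPLIER", "")
    try:
        multiplier = int(raw)
    except ValueError:
        # Unset or non-numeric value: fall back to the value from the diff.
        multiplier = default_multiplier
    if multiplier < 1:
        multiplier = default_multiplier
    return {"output_format": "markdown", "batch_multiplier": multiplier}


config = build_marker_config()
# Report the value actually in use instead of a hard-coded "4".
print(f"Marker config: batch_multiplier={config['batch_multiplier']}")
```

The resulting dict would then be handed to `ConfigParser(...)` in `initialize_converter()` in place of the literal shown in the diff.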