Spaces: tachiwin/multilingual_ocr

Luis J Camargo committed 26 days ago

Commit 9c692ff

1 Parent(s): f9963b2

warning and text display attempt

Files changed (1)

app.py +26 -6

@@ -109,14 +109,32 @@ def inference(img):
         if not result or len(result) == 0:
             return "No text detected in the image."
-        # Extract text from parsing_res_list
         extracted_texts = []
-        for page in result:
-            if hasattr(page, 'parsing_res_list'):
-                for block in page.parsing_res_list:
-                    if hasattr(block, 'content') and block.content:
-                        extracted_texts.append(block.content)
         if not extracted_texts:
             return "No text could be extracted from the image."
@@ -142,6 +160,8 @@ the diverse character and glyph repertoire of Mexico's 68 indigenous languages.
 **How to use:** Simply upload an image containing text in any Mexican indigenous language, and the model will
 detect and recognize the text.
 🔗 [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
 '''

         if not result or len(result) == 0:
             return "No text detected in the image."
+        # Serialize to JSON first (this worked before)
+        import json
+        def serialize_for_json(obj):
+            """Convert non-serializable objects to strings"""
+            if isinstance(obj, dict):
+                return {k: serialize_for_json(v) for k, v in obj.items()}
+            elif isinstance(obj, list):
+                return [serialize_for_json(item) for item in obj]
+            elif hasattr(obj, '__dict__'):
+                return serialize_for_json(obj.__dict__)
+            elif isinstance(obj, (str, int, float, bool, type(None))):
+                return obj
+            else:
+                return str(type(obj))
+        serialized_result = serialize_for_json(result)
+        # Now extract text from the serialized structure
         extracted_texts = []
+        for page in serialized_result:
+            if isinstance(page, dict) and 'parsing_res_list' in page:
+                for block in page['parsing_res_list']:
+                    if isinstance(block, dict) and 'content' in block and block['content']:
+                        extracted_texts.append(block['content'])
         if not extracted_texts:
             return "No text could be extracted from the image."
 **How to use:** Simply upload an image containing text in any Mexican indigenous language, and the model will
 detect and recognize the text.
+### Warning: as this free demonstrator space uses only CPU, a small image could take up to 5 minutes, so be patient.
 🔗 [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
 '''
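
The added hunk reduces the OCR result to plain dicts and lists, then looks up `parsing_res_list` and `content` by key instead of by attribute. Below is a minimal sketch of that flow, not the Space's actual code path: `FakePage`, `FakeBlock`, and `extract_texts` are hypothetical stand-ins introduced for illustration, and the real PaddleOCR result objects (whose layout is not shown in this diff) may behave differently.

```python
from dataclasses import dataclass, field


def serialize_for_json(obj):
    """Recursively reduce an object to dicts, lists, and JSON scalars."""
    if isinstance(obj, dict):
        return {k: serialize_for_json(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [serialize_for_json(item) for item in obj]
    elif hasattr(obj, '__dict__'):
        # Plain Python objects are replaced by their attribute dict.
        return serialize_for_json(obj.__dict__)
    elif isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    else:
        # Anything else (tuples, sets, ...) becomes the *type name*,
        # so its value is lost.
        return str(type(obj))


# Hypothetical stand-ins for the OCR result objects (assumption).
@dataclass
class FakeBlock:
    content: str
    bbox: tuple = (0, 0, 10, 10)


@dataclass
class FakePage:
    parsing_res_list: list = field(default_factory=list)


def extract_texts(result):
    """Collect non-empty block contents from the serialized result."""
    extracted = []
    for page in serialize_for_json(result):
        if isinstance(page, dict) and 'parsing_res_list' in page:
            for block in page['parsing_res_list']:
                if isinstance(block, dict) and block.get('content'):
                    extracted.append(block['content'])
    return extracted


result = [FakePage([FakeBlock("sample text"), FakeBlock("")])]
texts = extract_texts(result)
serialized_block = serialize_for_json(FakeBlock("x"))
```

In this sketch the empty block is skipped, and the `bbox` tuple is flattened to the string `"<class 'tuple'>"` because the fallback branch stringifies the type rather than the value. That is harmless when only `content` is read, but it would matter if coordinates were needed later.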