Spaces:

Pulastya0
/

Data-Science-Agent

Running

Pulastya B commited on Dec 29, 2025

Commit

7d775b3

1 Parent(s): 2797314

Fix Phase 1 errors: schema extraction and message type handling

FIXES:
1. Schema extraction NoneType comparison error
- Added explicit None check before comparing unique_count
- Prevents '<' operator error on None vs int

2. ChatCompletionMessage AttributeError
- Messages list contains mix of dicts and Pydantic objects
- Updated token estimation to handle both types
- Uses isinstance() check and getattr() fallback

Both errors prevented workflow from completing after first tool execution.

Files changed (2) hide show

src/orchestrator.py +4 -1
src/utils/schema_extraction.py +5 -1

src/orchestrator.py CHANGED Viewed

@@ -1717,7 +1717,10 @@ You are a DOER. Complete workflows based on user intent."""
                     print(f"✂️  Pruned conversation (keeping last 4 exchanges, ~4K tokens saved)")
                 # 🔍 Token estimation and warning
-                estimated_tokens = sum(len(str(m.get('content', ''))) // 4 for m in messages)
                 if estimated_tokens > 8000:
                     # Emergency pruning - keep only last 2 exchanges
                     messages = [messages[0], messages[1]] + messages[-4:]

                     print(f"✂️  Pruned conversation (keeping last 4 exchanges, ~4K tokens saved)")
                 # 🔍 Token estimation and warning
+                estimated_tokens = sum(
+                    len(str(m.get('content', '') if isinstance(m, dict) else getattr(m, 'content', ''))) // 4
+                    for m in messages
+                )
                 if estimated_tokens > 8000:
                     # Emergency pruning - keep only last 2 exchanges
                     messages = [messages[0], messages[1]] + messages[-4:]

src/utils/schema_extraction.py CHANGED Viewed

@@ -74,7 +74,11 @@ def extract_schema_local(file_path: str, sample_rows: int = 5) -> Dict[str, Any]
         ]
         schema_info['categorical_columns'] = [
             col for col, info in schema_info['columns'].items()
-            if info['dtype'] in ['Utf8', 'String'] or (info.get('unique_count', 999999) < 50 and col not in schema_info['numeric_columns'])
         ]
         schema_info['datetime_columns'] = [
             col for col, info in schema_info['columns'].items()

         ]
         schema_info['categorical_columns'] = [
             col for col, info in schema_info['columns'].items()
+            if info['dtype'] in ['Utf8', 'String'] or (
+                info.get('unique_count') is not None and
+                info.get('unique_count') < 50 and
+                col not in schema_info['numeric_columns']
+            )
         ]
         schema_info['datetime_columns'] = [
             col for col, info in schema_info['columns'].items()