# Fix: Removed Hardcoded Patterns from Neuro-Symbolic VQA

## Problem Identified

The `_detect_objects_with_clip()` method in `semantic_neurosymbolic_vqa.py` contained a **predefined list of object categories**, which is essentially pattern matching and defeats the purpose of a truly neuro-symbolic approach.

```python
# ❌ OLD CODE - Hardcoded categories (pattern matching!)
object_categories = [
    "food", "soup", "noodles", "rice", "meat", "vegetable", "fruit",
    "bowl", "plate", "cup", "glass", "spoon", "fork", "knife",
    ...
]
```

This is **not acceptable** because:

- It limits detection to predefined categories only
- It is essentially pattern matching, not true neural understanding
- It violates the neuro-symbolic principle of learning from data

## Solution Applied

### 1. Deprecated `_detect_objects_with_clip()`

The method now returns an empty list and warns that it is deprecated:

```python
# ✅ NEW CODE - No predefined lists!
def _detect_objects_with_clip(self, image_features, image_path=None):
    """
    NOTE: This method is deprecated in favor of using the VQA model
    directly from ensemble_vqa_app.py.
    """
    print("⚠️ _detect_objects_with_clip is deprecated")
    print("→ Use VQA model's _detect_multiple_objects() instead")
    return []
```

### 2. Updated `answer_with_clip_features()`

Now **requires** objects to be provided by the VQA model:

```python
# ✅ Objects must come from VQA model, not predefined lists
def answer_with_clip_features(
    self,
    image_features,
    question,
    image_path=None,
    detected_objects: List[str] = None  # REQUIRED!
):
    if not detected_objects:
        print("⚠️ No objects provided - neuro-symbolic reasoning requires VQA-detected objects")
        return None
```

### 3. Ensemble VQA Uses True VQA Detection

The `ensemble_vqa_app.py` already uses `_detect_multiple_objects()`, which:

- Asks the VQA model **open-ended questions** like "What is this?"
- Uses the model's learned knowledge, not predefined categories
- Generates objects dynamically based on visual understanding

```python
# ✅ TRUE NEURO-SYMBOLIC APPROACH
detected_objects = self._detect_multiple_objects(image, model, top_k=5)
# This asks the VQA model: "What is this?", "What food is this?", etc.
# NO predefined categories!
```

## Result

✅ **Pure Neuro-Symbolic Pipeline**:

1. **VQA Model** detects objects using learned visual understanding (no predefined lists)
2. **Wikidata** provides factual knowledge about detected objects
3. **LLM** performs Chain-of-Thought reasoning on the facts
4. **No pattern matching** anywhere in the pipeline

## Files Modified

- `semantic_neurosymbolic_vqa.py`:
  - Deprecated `_detect_objects_with_clip()`
  - Updated `answer_with_clip_features()` to require VQA-detected objects
  - Changed knowledge source from "CLIP + Wikidata" to "VQA + Wikidata"

## Verification

The system now uses a **truly neuro-symbolic approach**:

- ✅ No hardcoded object categories
- ✅ No predefined patterns
- ✅ Pure learned visual understanding from the VQA model
- ✅ Symbolic reasoning from Wikidata + LLM
- ✅ Chain-of-Thought transparency
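The open-ended detection loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `_detect_multiple_objects()` implementation: `vqa_answer` stands in for whatever interface the VQA model exposes, and the prompt list, deduplication, and `top_k` handling are assumptions.

```python
from typing import Callable, List

# Hypothetical open-ended prompts: only questions, no object categories.
OPEN_PROMPTS = [
    "What is this?",
    "What food is this?",
    "What objects are in the image?",
]

def detect_objects_open_ended(
    image,
    vqa_answer: Callable[[object, str], str],
    top_k: int = 5,
) -> List[str]:
    """Collect object names by asking the VQA model open-ended questions.

    No predefined category list: every candidate comes from the model's
    own free-form answers, deduplicated in order of first appearance.
    """
    seen: List[str] = []
    for prompt in OPEN_PROMPTS:
        answer = vqa_answer(image, prompt).strip().lower()
        if answer and answer not in seen:
            seen.append(answer)
        if len(seen) >= top_k:
            break
    return seen[:top_k]

# Stub standing in for a real VQA model, purely for demonstration.
def fake_vqa(image, question):
    return {
        "What is this?": "soup",
        "What food is this?": "noodle soup",
        "What objects are in the image?": "a bowl",
    }[question]

print(detect_objects_open_ended("img.jpg", fake_vqa))
# → ['soup', 'noodle soup', 'a bowl']
```

The key property is that the category vocabulary is open: whatever the model answers becomes a detected object, so nothing is filtered through a hardcoded list.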