Xml-Cleaner / README.md
Suhasdev's picture
Refactor XML Cleaner with dependency injection and MinHash-based similarity matching
ba6e49b
---
title: Intelligent XML Cleaner
emoji: 🌳
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
---
# Intelligent XML Cleaner & Visualizer
This tool helps Android developers and QA engineers clean stale accessibility node information from UI XML dumps.
## Features
* **Active-Based Sibling Pruning:** Intelligently removes XML nodes that are not visible on the screen based on OCR analysis or manual text input.
* **Flexible Text Input:** Optionally provide visible text manually, or use OCR for automatic extraction.
* **Dual OCR Strategy:** Choose between **EasyOCR** (Deep Learning based, high accuracy) or **Tesseract** (Fast, standard) as fallback when manual text is not provided.
* **Comprehensive Visualization:**
* **Tree View:** See the hierarchical structure of your XML before and after cleaning.
* **Screen View:** Visual confirmation of bounding boxes overlaid on the original screenshot.
## How to use
1. Upload the Screenshot of the app state.
2. Upload the corresponding XML dump (from `uiautomator`).
3. **(Optional)** Enter visible text from the screenshot manually (one per line or comma-separated). If left empty, OCR will be used automatically.
4. Select your preferred OCR engine (only used if visible text is not provided).
5. Click **Process**.
6. View the comparisons in the tabs and download the cleaned XML.
## Technical Details
This application uses a sophisticated pipeline:
1. **Text Extraction:** Uses provided visible text (if available) or extracts visible text from the image using OCR.
2. **LCA Calculation:** Finds the Lowest Common Ancestor of all active elements.
3. **Pruning:** Traverses upward from the Active LCA and prunes siblings that contain no visible text.