Spaces: Harika22/Machine_learning

Harika22 committed on Dec 13, 2024

Commit 6e25e3f (verified)

1 Parent(s): 48ba83b

Update pages/6_Semi_structured_data.py

Files changed (1)

pages/6_Semi_structured_data.py +117 -0

	@@ -157,3 +157,120 @@ if file_type == "CSV":
157	''')
158
159

     ''')
+elif file_type == "XML":
+    st.title("XML")
+    st.markdown('''
+        - XML stands for **Extensible** Markup Language.
+        - It is a flexible, text-based format for storing and transporting structured data.
+        - It is extensible because we can define our own tags.
+        - Tags define elements and attributes, which makes XML both human-readable and machine-readable.
+            ''')
+    # Example 1: XML Structure
+    st.subheader('**XML Structure**')
+    st.markdown('''
+    A simple XML file
+    ''')
+    st.code('''
+    <data>
+        <person>
+            <name>Harika</name>
+            <age>21</age>
+            <height>145</height>
+        </person>
+        <person>
+            <name>sreeja</name>
+            <age>22</age>
+            <height>153</height>
+        </person>
+    </data>
+    ''')
+    st.code('''
+    import pandas as pd
+    # Example: Reading an XML file
+    df = pd.read_xml('data.xml', xpath='/data/person')
+    print(df)
+    ''')
+    st.markdown('''
+    The output DataFrame will look like this:
+
+    | name   | age | height |
+    |--------|-----|--------|
+    | Harika | 21  | 145    |
+    | sreeja | 22  | 153    |
+    ''')
+    st.markdown('''
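+     **`xpath` parameter**:
+       - Specifies the XPath expression that selects the elements to turn into rows.
+       - For example:
+         - `xpath='/data/person'`: Selects every `<person>` element directly under the root `<data>`; each one becomes a row and its child tags become columns.
+    ''')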
+    # Example 2: Nested XML Structure
+    st.subheader('**Nested XML Structure**')
+    st.markdown('''
+    A more complex XML file with nested elements and attributes.
+    ''')
+    st.code('''
+    <company>
+        <department id="1" name="HR">
+            <employee>
+                <name>John Doe</name>
+                <position>Manager</position>
+            </employee>
+            <employee>
+                <name>Jane Smith</name>
+                <position>Assistant</position>
+            </employee>
+        </department>
+        <department id="2" name="Engineering">
+            <employee>
+                <name>Emily Johnson</name>
+                <position>Engineer</position>
+            </employee>
+        </department>
+    </company>
+    ''')
+    st.code('''
+    import xml.etree.ElementTree as ET
+    import pandas as pd
+    # Example: Reading a nested XML file
+    # read_xml only sees the nodes matched by xpath, so the parent <department>
+    # attributes are collected by walking the tree with ElementTree instead.
+    root = ET.parse('nested.xml').getroot()
+    rows = [
+        {
+            'id': dept.get('id'),                  # attribute of <department>
+            'department name': dept.get('name'),   # attribute of <department>
+            'name': emp.findtext('name'),          # child element of <employee>
+            'position': emp.findtext('position'),  # child element of <employee>
+        }
+        for dept in root.findall('department')
+        for emp in dept.findall('employee')
+    ]
+    df = pd.DataFrame(rows)
+    print(df)
+    ''')
+    st.markdown('''
+    The output DataFrame will look like this:
+
+    | id | department name | name          | position   |
+    |----|-----------------|---------------|------------|
+    | 1  | HR              | John Doe      | Manager    |
+    | 1  | HR              | Jane Smith    | Assistant  |
+    | 2  | Engineering     | Emily Johnson | Engineer   |
+    ''')
+    st.markdown('''
+    1. **Child elements**:
+       - `emp.findtext('name')` and `emp.findtext('position')` read the text of the `<name>` and `<position>` tags inside each `<employee>`.
+    2. **Parent attributes**:
+       - `dept.get('id')` and `dept.get('name')` read the `id` and `name` attributes of the enclosing `<department>` tag.
+    3. **Why not `elem_cols` / `attr_cols`?**
+       - These are parameters of `DataFrame.to_xml` (writing XML), not of `pd.read_xml`; passing them to `read_xml` raises a `TypeError`.
+    ''')
+    st.markdown('''
+    By using `xpath` for flat files and a short ElementTree loop for nested ones, you can parse complex XML files into structured DataFrames.
+    ''')
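
The effect of `xpath` on the two sample documents can be checked without creating any files. The following is a minimal sketch, not part of the commit: it assumes pandas 1.3 or later, loads the XML from in-memory strings through `io.StringIO`, and passes `parser="etree"` so that `lxml` is not required (the default parser is `lxml`). The variable names `SIMPLE_XML`, `NESTED_XML`, `people`, `employees`, and `departments` are illustrative only.

```python
from io import StringIO

import pandas as pd

# Same data as the simple example in the diff, held in a string.
SIMPLE_XML = """<data>
    <person><name>Harika</name><age>21</age><height>145</height></person>
    <person><name>sreeja</name><age>22</age><height>153</height></person>
</data>"""

# Same data as the nested example in the diff.
NESTED_XML = """<company>
    <department id="1" name="HR">
        <employee><name>John Doe</name><position>Manager</position></employee>
        <employee><name>Jane Smith</name><position>Assistant</position></employee>
    </department>
    <department id="2" name="Engineering">
        <employee><name>Emily Johnson</name><position>Engineer</position></employee>
    </department>
</company>"""

# One row per <person>; the child tags become columns.
people = pd.read_xml(StringIO(SIMPLE_XML), xpath=".//person", parser="etree")

# One row per <employee>; the <department> attributes are not included,
# because read_xml only looks at the matched nodes and their children.
employees = pd.read_xml(StringIO(NESTED_XML), xpath=".//employee", parser="etree")

# One row per <department>, built from its attributes only.
departments = pd.read_xml(
    StringIO(NESTED_XML),
    xpath=".//department",
    attrs_only=True,
    parser="etree",
)

print(people)
print(employees)
print(departments)
```

The `employees` frame has only the `name` and `position` columns, which is why combining employee fields with department attributes needs an extra step beyond a single `read_xml` call.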