Update pages/6_Semi_structured_data.py
pages/6_Semi_structured_data.py CHANGED +117 -0
|
@@ -157,3 +157,120 @@ if file_type == "CSV":
|
|
| 157 |
''')
|
| 158 |
|
| 159 |
|
| 160 |
+
elif file_type == "XML":
|
| 161 |
+
st.title("XML")
|
| 162 |
+
st.markdown('''
|
| 163 |
+
- XML stands for Extensible Markup Language.
|
| 164 |
+
- In XML, we can define our own tags
|
| 165 |
+
- XML (Extensible Markup Language) is a flexible, text-based format used for storing and transporting structured data.
|
| 166 |
+
- It uses tags to define elements and attributes, making it both human-readable and machine-readable.
|
| 167 |
+
- Because the set of tags is not fixed in advance, it is called an **Extensible** Markup Language.
|
| 168 |
+
''')
|
| 169 |
+
|
| 170 |
+
# Example 1: XML Structure
|
| 171 |
+
st.subheader('**XML Structure**')
|
| 172 |
+
st.markdown('''
|
| 173 |
+
A simple XML file with two `<person>` records:
|
| 174 |
+
''')
|
| 175 |
+
st.code('''
|
| 176 |
+
<data>
|
| 177 |
+
<person>
|
| 178 |
+
<name>Harika</name>
|
| 179 |
+
<age>21</age>
|
| 180 |
+
<height>145</height>
|
| 181 |
+
</person>
|
| 182 |
+
<person>
|
| 183 |
+
<name>sreeja</name>
|
| 184 |
+
<age>22</age>
|
| 185 |
+
<height>153</height>
|
| 186 |
+
</person>
|
| 187 |
+
</data>
|
| 188 |
+
''')
|
| 189 |
+
|
| 190 |
+
st.code('''
|
| 191 |
+
import pandas as pd
|
| 192 |
+
|
| 193 |
+
# Example: Reading an XML file
|
| 194 |
+
df = pd.read_xml('data.xml', xpath='/data/person')
|
| 195 |
+
print(df)
|
| 196 |
+
''')
|
| 197 |
+
|
| 198 |
+
st.markdown('''
|
| 199 |
+
The output DataFrame will look like this:
|
| 200 |
+
| name | age | height |
|
| 201 |
+
|----------------|------------|------ |
|
| 202 |
+
| Harika | 21 | 145 |
|
| 203 |
+
| sreeja | 22 | 153 |
|
| 204 |
+
''')
|
| 205 |
+
|
| 206 |
+
|
| 207 |
+
st.markdown('''
|
| 208 |
+
**`xpath` parameter**:
|
| 209 |
+
- An XPath expression that selects which elements become rows of the DataFrame.
|
| 210 |
+
- For example:
|
| 211 |
+
- `xpath='/data/person'`: Selects every `<person>` element directly under the root `<data>` element; each one becomes a row. ''')
|
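To see which nodes the `/data/person` path picks out without needing `data.xml` on disk, the sketch below parses the same document from a string with the standard-library `xml.etree.ElementTree`. This is only an illustration of the selection step, not what `pd.read_xml` does internally; `SIMPLE_XML` and `rows` are names assumed here for the example.

```python
import xml.etree.ElementTree as ET

# Inline copy of the simple example document (assumed stand-in for data.xml).
SIMPLE_XML = """
<data>
  <person><name>Harika</name><age>21</age><height>145</height></person>
  <person><name>sreeja</name><age>22</age><height>153</height></person>
</data>
"""

root = ET.fromstring(SIMPLE_XML.strip())  # root is the <data> element

# Relative to the root, './person' matches the same nodes as '/data/person'.
rows = [
    {child.tag: child.text for child in person}
    for person in root.findall("./person")
]

for row in rows:
    print(row)
```

Each dictionary corresponds to one row of the table above. ElementTree returns every value as a string, whereas `pd.read_xml` infers numeric types for columns such as `age`.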
| 212 |
+
|
| 213 |
+
|
| 214 |
+
# Example 2: Nested XML Structure
|
| 215 |
+
st.subheader('**Nested XML Structure**')
|
| 216 |
+
st.markdown('''
|
| 217 |
+
A more complex XML file with nested elements and attributes.
|
| 218 |
+
''')
|
| 219 |
+
st.code('''
|
| 220 |
+
<company>
|
| 221 |
+
<department id="1" name="HR">
|
| 222 |
+
<employee>
|
| 223 |
+
<name>John Doe</name>
|
| 224 |
+
<position>Manager</position>
|
| 225 |
+
</employee>
|
| 226 |
+
<employee>
|
| 227 |
+
<name>Jane Smith</name>
|
| 228 |
+
<position>Assistant</position>
|
| 229 |
+
</employee>
|
| 230 |
+
</department>
|
| 231 |
+
<department id="2" name="Engineering">
|
| 232 |
+
<employee>
|
| 233 |
+
<name>Emily Johnson</name>
|
| 234 |
+
<position>Engineer</position>
|
| 235 |
+
</employee>
|
| 236 |
+
</department>
|
| 237 |
+
</company>
|
| 238 |
+
''')
|
| 239 |
+
|
| 240 |
+
st.code('''
|
| 241 |
+
import pandas as pd
|
| 242 |
+
|
| 243 |
+
# Example: Reading a nested XML file
|
| 244 |
+
# pd.read_xml() cannot attach <department> attributes to <employee> rows, so walk the tree instead
|
| 245 |
+
import xml.etree.ElementTree as ET
|
| 246 |
+
rows = []
|
| 247 |
+
for dept in ET.parse('nested.xml').getroot().iter('department'):
|
| 248 |
+
    for emp in dept.iter('employee'):
|
| 249 |
+
        rows.append({'id': dept.get('id'), 'department name': dept.get('name'),
|
| 250 |
+
                     'name': emp.findtext('name'), 'position': emp.findtext('position')})
|
| 251 |
+
df = pd.DataFrame(rows)
|
| 252 |
+
print(df)
|
| 253 |
+
''')
|
| 254 |
+
|
| 255 |
+
st.markdown('''
|
| 256 |
+
The output DataFrame will look like this:
|
| 257 |
+
| id | department name | name | position |
|
| 258 |
+
|----|-----------------|---------------|------------|
|
| 259 |
+
| 1 | HR | John Doe | Manager |
|
| 260 |
+
| 1 | HR | Jane Smith | Assistant |
|
| 261 |
+
| 2 | Engineering | Emily Johnson | Engineer |
|
| 262 |
+
''')
|
| 263 |
+
|
| 264 |
+
st.markdown('''
|
| 265 |
+
1. **Child elements**:
|
| 266 |
+
- `emp.findtext('name')` and `emp.findtext('position')` read the text of the `<name>` and `<position>` tags inside each `<employee>`.
|
| 267 |
+
|
| 268 |
+
2. **Parent attributes**:
|
| 269 |
+
- `dept.get('id')` and `dept.get('name')` read the `id` and `name` attributes of the enclosing `<department>` tag.
|
| 270 |
+
|
| 271 |
+
3. **Why not `pd.read_xml` alone**: it has no `elem_cols` or `attr_cols` arguments (those belong to `DataFrame.to_xml`), and its `elems_only` and `attrs_only` flags only apply to the nodes matched by `xpath`.
|
| 272 |
+
''')
|
| 273 |
+
|
| 274 |
+
st.markdown('''
|
| 275 |
+
By using `pd.read_xml` with `xpath` for flat files and a short `ElementTree` loop for nested ones, you can parse complex XML files into structured DataFrames.
|
| 276 |
+
''')
|
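One way to check that the nested `<company>` document really flattens into the three rows shown in the table is to run the tree walk on an inline copy of the XML. The following is a minimal sketch using only the standard library; `NESTED_XML` and `employee_rows` are hypothetical names introduced for this example, and converting `id` to `int` is an assumption about how the column should be typed.

```python
import xml.etree.ElementTree as ET

# Inline copy of the nested example document (assumed stand-in for nested.xml).
NESTED_XML = """
<company>
  <department id="1" name="HR">
    <employee><name>John Doe</name><position>Manager</position></employee>
    <employee><name>Jane Smith</name><position>Assistant</position></employee>
  </department>
  <department id="2" name="Engineering">
    <employee><name>Emily Johnson</name><position>Engineer</position></employee>
  </department>
</company>
"""


def employee_rows(xml_text):
    """Return one dict per <employee>, carrying its parent <department> attributes."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for dept in root.iter("department"):
        for emp in dept.iter("employee"):
            rows.append({
                "id": int(dept.get("id")),            # attribute of the parent tag
                "department name": dept.get("name"),  # attribute of the parent tag
                "name": emp.findtext("name"),         # text of a child element
                "position": emp.findtext("position"),
            })
    return rows


rows = employee_rows(NESTED_XML)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` would give a DataFrame with the same four columns as the table above.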