Harika22 committed on
Commit
6e25e3f
·
verified ·
1 Parent(s): 48ba83b

Update pages/6_Semi_structured_data.py

Files changed (1)
  1. pages/6_Semi_structured_data.py +117 -0
pages/6_Semi_structured_data.py CHANGED
@@ -157,3 +157,120 @@ if file_type == "CSV":
157
  ''')
158
 
159
 
160
+ elif file_type == "XML":
161
+ st.title("XML")
162
+ st.markdown('''
163
+ - XML stands for Extensible Markup Language
164
+ - In XML, we can define our own tags
165
+ - XML (Extensible Markup Language) is a flexible, text-based format used for storing and transporting structured data.
166
+ - It uses tags to define elements and attributes, making it both human-readable and machine-readable.
167
+ - Because tags are user-defined, XML is described as an **Extensible** Markup Language.
168
+ ''')
169
+
170
+ # Example 1: XML Structure
171
+ st.subheader('**XML Structure**')
172
+ st.markdown('''
173
+ A simple XML file
174
+ ''')
175
+ st.code('''
176
+ <data>
177
+ <person>
178
+ <name>Harika</name>
179
+ <age>21</age>
180
+ <height>145</height>
181
+ </person>
182
+ <person>
183
+ <name>sreeja</name>
184
+ <age>22</age>
185
+ <height>153</height>
186
+ </person>
187
+ </data>
188
+ ''')
189
+
190
+ st.code('''
191
+ import pandas as pd
192
+
193
+ # Example: Reading an XML file
194
+ df = pd.read_xml('data.xml', xpath='/data/person')
195
+ print(df)
196
+ ''')
197
+
198
+ st.markdown('''
199
+ The output DataFrame will look like this:
200
+ | name | age | height |
201
+ |----------------|------------|------ |
202
+ | Harika | 21 | 145 |
203
+ | sreeja | 22 | 153 |
204
+ ''')
205
+
206
+
207
+ st.markdown('''
208
+ **`xpath` parameter**:
209
+ - An XPath expression that selects which elements become rows of the DataFrame.
210
+ - For example:
211
+ - `xpath='/data/person'`: Extracts all `<person>` elements from `<data>`. ''')
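To make the row-selection idea concrete, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`. It is not how pandas implements `read_xml`; it only illustrates that each matched `<person>` becomes one row and each child tag becomes a column. The names `XML_TEXT` and `select_rows` are hypothetical, and because ElementTree paths are relative to the root element, `./person` is used here in place of `/data/person`.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the page, inlined so the sketch is self-contained.
XML_TEXT = """
<data>
  <person><name>Harika</name><age>21</age><height>145</height></person>
  <person><name>sreeja</name><age>22</age><height>153</height></person>
</data>
""".strip()


def select_rows(xml_text, path):
    """Return one dict per element matched by `path`, keyed by child tag."""
    root = ET.fromstring(xml_text)
    # Every matched node is a row; every child element is a column.
    return [{child.tag: child.text for child in node} for node in root.findall(path)]


rows = select_rows(XML_TEXT, "./person")
for row in rows:
    print(row)
```

Note that the values come back as strings in this sketch; any numeric conversion would be a separate step.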
212
+
213
+
214
+ # Example 2: Nested XML Structure
215
+ st.subheader('**Nested XML Structure**')
216
+ st.markdown('''
217
+ A more complex XML file with nested elements and attributes.
218
+ ''')
219
+ st.code('''
220
+ <company>
221
+ <department id="1" name="HR">
222
+ <employee>
223
+ <name>John Doe</name>
224
+ <position>Manager</position>
225
+ </employee>
226
+ <employee>
227
+ <name>Jane Smith</name>
228
+ <position>Assistant</position>
229
+ </employee>
230
+ </department>
231
+ <department id="2" name="Engineering">
232
+ <employee>
233
+ <name>Emily Johnson</name>
234
+ <position>Engineer</position>
235
+ </employee>
236
+ </department>
237
+ </company>
238
+ ''')
239
+
240
+ st.code('''
241
+ import pandas as pd
242
+
243
+ # Example: Reading a nested XML file
244
+ df = pd.read_xml(
245
+ 'nested.xml',
246
+ xpath='.//employee',
247
+ elems_only=True,  # keep only the child elements of each <employee>
248
+ attrs_only=False  # default; set to True to keep only attributes instead
249
+ )
250
+ print(df)
251
+ ''')
252
+
253
+ st.markdown('''
254
+ The output DataFrame will look like this (`read_xml` parses only the matched `<employee>` nodes, so the `<department>` attributes are not included):
255
+ | name | position |
256
+ |---------------|------------|
257
+ | John Doe | Manager |
258
+ | Jane Smith | Assistant |
259
+ | Emily Johnson | Engineer |
260
+ ''')
261
+
262
+ st.markdown('''
263
+ 1. **`elems_only` parameter**:
264
+ - When `True`, only the child elements of each matched node are parsed into columns.
265
+ - Example:
266
+ - `elems_only=True`: Extracts `<name>` and `<position>` from `<employee>` tags.
267
+
268
+ 2. **`attrs_only` parameter**:
269
+ - When `True`, only the attributes of each matched node are parsed into columns.
270
+ - Example:
271
+ - `xpath='.//department', attrs_only=True`: Extracts the `id` and `name` attributes from the `<department>` tags.
272
+ ''')
273
+
274
+ st.markdown('''
275
+ By combining `xpath`, `elems_only`, and `attrs_only`, you can control which parts of a complex XML file are parsed into a structured DataFrame.
276
+ ''')
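When each employee row also needs the `id` and `name` of its enclosing `<department>`, one option is to flatten the tree by hand before building a DataFrame. The following is a minimal sketch under that assumption, using the standard library's `xml.etree.ElementTree`; the names `NESTED_XML` and `flatten_employees` are hypothetical and not part of the commit.

```python
import xml.etree.ElementTree as ET

# Same nested sample as in the page, inlined so the sketch is self-contained.
NESTED_XML = """
<company>
  <department id="1" name="HR">
    <employee><name>John Doe</name><position>Manager</position></employee>
    <employee><name>Jane Smith</name><position>Assistant</position></employee>
  </department>
  <department id="2" name="Engineering">
    <employee><name>Emily Johnson</name><position>Engineer</position></employee>
  </department>
</company>
""".strip()


def flatten_employees(xml_text):
    """Return one dict per <employee>, carrying its parent <department> attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for dept in root.iter("department"):
        for emp in dept.iter("employee"):
            rows.append({
                "id": dept.get("id"),                # attribute of the parent tag
                "department name": dept.get("name"),  # attribute of the parent tag
                "name": emp.findtext("name"),         # child element text
                "position": emp.findtext("position"),
            })
    return rows


employee_rows = flatten_employees(NESTED_XML)
for row in employee_rows:
    print(row)
```

Passing `employee_rows` to `pd.DataFrame(...)` would give a table with those four columns; the `id` values remain strings unless converted.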