Spaces: hari3485 / DiveIntoML


hari3485 committed on Dec 19, 2024

Commit 334b087 (verified) · 1 Parent(s): ebd1124

Update pages/hari.py

Files changed (1)

pages/hari.py +40 -38

@@ -230,75 +230,77 @@ def html_details_page():
     **HTML** (HyperText Markup Language) is used to structure web pages.
     - Semi-structured data with nested tags.
-    - Libraries like `BeautifulSoup` help parse and extract information.
     """)
     st.write("""
-    - **HTML (HyperText Markup Language)** is a semi-structured data format.
-    - HTML uses tags like `<table>`, `<tr>`, `<th>`, and `<td>` to structure tabular data.
-    - Unlike XML, HTML does not allow creating custom tags freely.
-    - Not all HTML content can be converted into dataframes, especially paragraph text or unstructured data.
-    - Typically, only table-related elements (`<table>`, `<tr>`, `<th>`, `<td>`) can be converted into dataframes.
     """)
-    # Reading HTML Files Section
-    st.header("Reading HTML Files into DataFrames")
-    st.write("**Reading HTML Files:**")
     st.code("""
     import pandas as pd
-    tables = pd.read_html(path_or_buffer)
     """, language="python")
     st.write("""
-    - **`pd.read_html(path_or_buffer)`** reads HTML files or websites containing tables.
-    - Extracts all tables and returns them as a list of dataframes.
     """)
-    st.write("**Accessing Specific Tables:**")
     st.code("""
-    # Accessing the first table from the list
     table = tables[0]
     """, language="python")
     st.write("""
-    - Each table is stored in the list by index.
-    - Use indexing to select the table you want to work with.
     """)
     st.write("**Limitations:**")
     st.write("""
-    - Not all websites or HTML files can be read, even if they have tables.
-    - Issues like authorization restrictions can prevent reading certain tables.
     """)
-    st.write("**Using the `match` Parameter:**")
     st.code("""
-    # Reading a specific table using the match parameter
-    tables = pd.read_html(path, match="keyword")
     """, language="python")
     st.write("""
-    - To locate specific tables, use `match="keyword"` while reading HTML.
-    - The `match` parameter searches for tables containing the specified keyword.
     """)
-    # Exporting DataFrames Section
     st.header("Exporting DataFrames to HTML")
-    st.write("**Exporting DataFrame to HTML:**")
     st.code("""
-    # Exporting a dataframe to an HTML file
     df.to_html("output.html")
     """, language="python")
     st.write("""
-    - Converts a dataframe into an HTML file.
-    - Saves the dataframe in an HTML-compatible table format at the specified path.
     """)
-    st.code('from bs4 import BeautifulSoup\nsoup = BeautifulSoup(open("file.html"))', language="python")
-    if st.button("Back to Home"):
-        st.session_state['page'] = "home"
 # Unstructured Data - Image Page
 def image_details_page():

     **HTML** (HyperText Markup Language) is used to structure web pages.
     - Semi-structured data with nested tags.
     """)
+    # App title
+    st.title("Working with HTML Data in Python")
+    # Section: HTML and DataFrames
+    st.header("HTML and DataFrames")
     st.write("""
+    - **HTML** stands for HyperText Markup Language and is a semi-structured format.
+    - HTML uses tags like `<table>`, `<tr>`, `<th>`, and `<td>` to show table data.
+    - Unlike XML, HTML has a fixed set of standard tags rather than arbitrary custom ones.
+    - Not all HTML content can be converted into dataframes, especially free text such as paragraphs.
+    - Usually, only table-related tags (`<table>`, `<tr>`, `<th>`, `<td>`) can be converted into dataframes.
     """)
+    # Section: Reading HTML Files
+    st.write("**How to Read HTML Files:**")
     st.code("""
     import pandas as pd
+    tables = pd.read_html("path_or_url")
     """, language="python")
     st.write("""
+    - Use `pd.read_html()` to read tables from an HTML file or a web page.
+    - It parses every `<table>` it finds and returns them as a list of dataframes.
     """)
+    st.write("**How to Get Specific Tables:**")
     st.code("""
+    # Select the first table from the list
     table = tables[0]
     """, language="python")
     st.write("""
+    - The tables are stored as a list, and you can access them using their index number.
     """)
     st.write("**Limitations:**")
     st.write("""
+    - Some HTML files or websites cannot be read, even if they contain tables.
+    - Access problems, such as file permissions or sites that reject automated requests, can prevent reading.
     """)
+    st.write("**Using `match` to Find Specific Tables:**")
     st.code("""
+    # Read a specific table by searching for a keyword
+    tables = pd.read_html("path_or_url", match="keyword")
     """, language="python")
     st.write("""
+    - The `match` parameter keeps only the tables whose text matches the given string or regular expression.
+    - This is useful for picking the right table when a page contains many.
     """)
+    # Section: Exporting DataFrames
     st.header("Exporting DataFrames to HTML")
+    st.write("**How to Export a DataFrame to HTML:**")
     st.code("""
+    # Save a dataframe as an HTML file
     df.to_html("output.html")
     """, language="python")
     st.write("""
+    - This writes your dataframe to a file as an HTML `<table>`.
+    - The path you pass decides where the HTML file is saved.
     """)
 # Unstructured Data - Image Page
 def image_details_page():
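
A minimal, self-contained sketch of the round trip this hunk documents (`df.to_html`, then `pd.read_html` with `match`), for trying the calls outside the Streamlit page. The sample data, the `read_tables` helper, and the `"score"` keyword are hypothetical and not part of the commit. The sketch also assumes an optional HTML parser (lxml, or BeautifulSoup with html5lib) may be missing, so it falls back to an empty list in that case.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data, used only to produce a small HTML table.
df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# With no path or buffer, to_html returns the table markup as a string.
html = df.to_html(index=False)


def read_tables(html_text, keyword=".+"):
    """Return the tables in html_text whose text matches keyword."""
    try:
        # StringIO wraps the markup so it is treated as file-like input.
        return pd.read_html(StringIO(html_text), match=keyword)
    except ImportError:
        # read_html needs an optional parser package to be installed.
        return []
    except ValueError:
        # pandas raises ValueError when no table matches.
        return []


tables = read_tables(html, keyword="score")
if tables:
    print(tables[0])
```

Returning an empty list on failure is a choice made for this sketch; in the app it may be more useful to show the error to the user with `st.error`.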