# pages/5_Structured_data.py
import streamlit as st
import pandas as pd
st.markdown(
"""
<style>
/* General page settings */
body {
background-color: #f0f0f5; /* Light gray background */
color: #333333; /* Dark text for good contrast */
font-family: 'Arial', sans-serif; /* Clean, modern font */
}
/* Title and Header Styling */
h1, h2, h3 {
color: black;
font-weight: bold;
}
/* Style for subheaders */
h3 {
color: red;
font-family: 'Roboto', sans-serif;
font-weight: 500;
margin-top: 20px;
}
.custom-subheader {
color: #00FFFF;
font-family: 'Roboto', sans-serif;
font-weight: 600;
margin-bottom: 15px;
}
/* Paragraph styling */
p {
font-family: 'Georgia', serif;
line-height: 1.8;
color: black;
margin-bottom: 20px;
}
/* List styling with diamond bullets */
.icon-bullet {
list-style-type: none;
padding-left: 20px;
}
.icon-bullet li {
font-family: 'Georgia', serif;
font-size: 1.1em;
margin-bottom: 10px;
color: #FFFFF0;
}
.icon-bullet li::before {
content: "◆";
padding-right: 10px;
color: #b3b3ff;
}
/* Sidebar styling */
.sidebar .sidebar-content {
background-color: #ffffff;
border-radius: 10px;
padding: 15px;
}
.sidebar h2 {
color: #495057;
}
/* Custom button style */
.streamlit-button {
background-color: #00FFFF;
color: #000000;
font-weight: bold;
}
</style>
""", unsafe_allow_html=True)
st.title("📂Handling Excel files📂")
st.markdown(''' - Excel is a widely used software application for organizing,
storing, and analyzing data in tabular format.
- It is a spreadsheet tool that allows users to work with rows, columns, and cells to manage numerical or textual data.
- Excel files are typically saved with extensions like .xls or .xlsx.''')
st.header('**How to Read These Files:**')
st.subheader('''**Using Python Libraries:**''')
st.code('''
import pandas as pd
# Reading an Excel file
df = pd.read_excel('file.xlsx')
print(df.head())''')
st.header('**Issues in Excel:**')
st.markdown('''
1. **File Format Issues:**
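- `.xls` is the legacy binary format and `.xlsx` is the newer zipped XML format; pandas reads them with different engines (`xlrd` for `.xls`, `openpyxl` for `.xlsx`).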
2. **Corrupted Files:**
- Files may get corrupted during transfer or storage, making them unreadable.
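3. **Encoding Issues:**
- `.xlsx` files store text as Unicode, so `pd.read_excel` has no `encoding` parameter. Encoding errors with special characters typically come from CSV or text exports of a sheet, where the encoding must be passed to `pd.read_csv` instead.''')
st.write('**Solution:**')
st.code('''
# read_excel does not accept encoding=...; select the engine explicitly instead
df = pd.read_excel('file.xlsx', engine='openpyxl')
''')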
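st.markdown('''
The file format and corrupted-file issues (items 1 and 2) can be checked before calling `pd.read_excel`. The following is a minimal sketch using only the Python standard library. The helper name `sniff_excel_format` is hypothetical, and the check assumes that `.xlsx` files are ZIP archives with their parts under `xl/` and that `.xls` files are OLE2 compound documents:

```python
import zipfile

# Leading bytes ("magic numbers") of the two container formats
ZIP_MAGIC = bytes.fromhex("504b0304")           # ZIP archive, used by .xlsx
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound file, used by .xls


def sniff_excel_format(path):
    # Return "xlsx", "xls", "corrupt" or "unknown" based on the file content
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first damaged member, or None
                if zf.testzip() is not None:
                    return "corrupt"
                names = zf.namelist()
        except zipfile.BadZipFile:
            # Raised when the archive is truncated or otherwise unreadable
            return "corrupt"
        # Excel workbooks conventionally keep their parts under xl/
        if any(name.startswith("xl/") for name in names):
            return "xlsx"
    return "unknown"
```

The result can then be mapped to an engine, for example `openpyxl` for `"xlsx"` and `xlrd` for `"xls"`, instead of trusting the file extension.''')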
st.markdown('''
4. **Missing Values:**
- Cells that are empty are loaded as `NaN` values, which may disrupt data processing.''')
st.write('**Solution:**')
st.code('''
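# Use one of the two options, not both
# Option 1: replace missing values with a default
df.fillna(0, inplace=True)
# Option 2: drop rows that contain missing values
df.dropna(inplace=True)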
''')
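st.markdown('''
Filling every column with `0` or dropping every incomplete row is not always appropriate. As a rough sketch, the missing values can be counted first and then handled per column. The DataFrame below is hypothetical data standing in for a sheet loaded with `pd.read_excel`:

```python
import pandas as pd

# Hypothetical data standing in for a sheet loaded with pd.read_excel
df = pd.DataFrame({
    "name": ["A", "B", None],
    "score": [90.0, None, 75.0],
})

# Count missing cells per column before choosing a strategy
missing_counts = df.isna().sum()

# Strategy 1: fill each column with a value that suits its type
filled = df.fillna({"name": "unknown", "score": df["score"].mean()})

# Strategy 2: drop only the rows that lack a required column
dropped = df.dropna(subset=["score"])
```

Both strategies return new DataFrames here, so the original `df` stays available for comparison.''')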
st.markdown('''
5. **Large File Size:**
- Handling very large Excel files can result in memory issues.''')
st.write('**Solution:**')
st.code('''
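# read_excel has no chunksize parameter; read blocks with skiprows and nrows
chunk_size = 10000
for start in range(0, 30000, chunk_size):  # 30000 is an assumed total row count
    # Skip the data rows already read, but keep the header row (row 0)
    chunk = pd.read_excel('large_file.xlsx', skiprows=range(1, start + 1), nrows=chunk_size)
    print(chunk.head())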
''')
st.markdown('''
6. **Multiple Sheets:**
- Workbooks often contain multiple sheets, making it harder to extract the relevant data.''')
st.write('**Solution:**')
st.code('''
# A list of sheets returns a dict of DataFrames keyed by sheet position
sheets = pd.read_excel('file.xlsx', sheet_name=[0, 1, 2])
df = sheets[0]
''')
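st.markdown('''
When `sheet_name` is a list, `pd.read_excel` returns a dict of DataFrames rather than a single DataFrame, and `sheet_name=None` loads every sheet the same way. The sketch below shows one possible way to work with that dict. The `sheets` dict and its contents are hypothetical stand-ins for the value returned by `pd.read_excel`:

```python
import pandas as pd

# Stand-in for the dict returned by pd.read_excel(..., sheet_name=[0, 1]);
# the sheet contents are hypothetical
sheets = {
    0: pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    1: pd.DataFrame({"id": [3], "value": [30]}),
}

# Access a single sheet by its key
first = sheets[0]

# Stack all sheets into one table, recording the source sheet in a column
combined = pd.concat(sheets, names=["sheet", "row"]).reset_index(level="sheet")
combined = combined.reset_index(drop=True)
```

Stacking this way assumes the sheets share the same columns; sheets with different layouts are better kept as separate DataFrames.''')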