Yashvj123 committed on
Commit 6e51482 · verified · 1 Parent(s): 8d4b888

Upload Streamlit_Excel.ipynb

Files changed (1)
  1. pages/Streamlit_Excel.ipynb +186 -0
pages/Streamlit_Excel.ipynb ADDED
@@ -0,0 +1,186 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": []
7
+ },
8
+ "kernelspec": {
9
+ "name": "python3",
10
+ "display_name": "Python 3"
11
+ },
12
+ "language_info": {
13
+ "name": "python"
14
+ }
15
+ },
16
+ "cells": [
17
+ {
18
+ "cell_type": "markdown",
19
+ "source": [
20
+ "## How to Read Excel Files"
21
+ ],
22
+ "metadata": {
23
+ "id": "ujfKzrKC_sSy"
24
+ }
25
+ },
26
+ {
27
+ "cell_type": "code",
28
+ "source": [
29
+ "import pandas as pd\n",
30
+ "\n",
31
+ "# Reading an Excel file\n",
32
+ "df = pd.read_excel('file.xlsx')\n",
33
+ "print(df.head())"
34
+ ],
35
+ "metadata": {
36
+ "id": "i2_veBWk_6yp"
37
+ },
38
+ "execution_count": null,
39
+ "outputs": []
40
+ },
41
+ {
42
+ "cell_type": "markdown",
43
+ "source": [
44
+ "## Issues We Face When Reading Excel Files"
45
+ ],
46
+ "metadata": {
47
+ "id": "lVFBSxUiAG-i"
48
+ }
49
+ },
50
+ {
51
+ "cell_type": "markdown",
52
+ "source": [
53
+ "1. **Encoding Issues:**\n",
54
+ "\n",
55
+ "- Special characters or non-`UTF-8` text can cause errors, mostly with CSV exports or legacy `.xls` files"
56
+ ],
57
+ "metadata": {
58
+ "id": "4fX5KAtPAHig"
59
+ }
60
+ },
61
+ {
62
+ "cell_type": "code",
63
+ "source": [
64
+ "df = pd.read_excel('file.xlsx')  # read_excel accepts no encoding argument; .xlsx text is already Unicode"
65
+ ],
66
+ "metadata": {
67
+ "id": "vd8BXAVyAIEw"
68
+ },
69
+ "execution_count": null,
70
+ "outputs": []
71
+ },
72
+ {
73
+ "cell_type": "markdown",
74
+ "source": [
75
+ "2. **Missing Values:**\n",
76
+ "\n",
77
+ "- Missing cells are read as `NaN` and may disrupt data processing"
78
+ ],
79
+ "metadata": {
80
+ "id": "hxfLn27pAIPy"
81
+ }
82
+ },
83
+ {
84
+ "cell_type": "code",
85
+ "source": [
86
+ "df_filled = df.fillna(0)  # Option 1: replace missing values with 0\n",
87
+ "df_dropped = df.dropna()  # Option 2: drop rows that contain missing values"
88
+ ],
89
+ "metadata": {
90
+ "id": "HdcBGDTRAIZv"
91
+ },
92
+ "execution_count": null,
93
+ "outputs": []
94
+ },
95
+ {
96
+ "cell_type": "markdown",
97
+ "source": [
98
+ "3. **Large File Size:**\n",
99
+ "\n",
100
+ "- Handling very large Excel files can result in memory issues"
101
+ ],
102
+ "metadata": {
103
+ "id": "nUt0gpXIAIh6"
104
+ }
105
+ },
106
+ {
107
+ "cell_type": "code",
108
+ "source": [
109
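+ "# read_excel has no chunksize argument, so page through the sheet with skiprows and nrows\n",
110
+ "for skip in range(0, 100000, 10000):  # assumes at most 100,000 data rows; adjust as needed\n",
111
+ "    process(pd.read_excel('large_file.xlsx', skiprows=range(1, skip + 1), nrows=10000))"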
112
+ ],
113
+ "metadata": {
114
+ "id": "bauoWLAlBAbk"
115
+ },
116
+ "execution_count": null,
117
+ "outputs": []
118
+ },
119
+ {
120
+ "cell_type": "markdown",
121
+ "source": [
122
+ "4. **Multiple Sheets:**\n",
123
+ "\n",
124
+ "- Complex files may have multiple sheets, making it harder to extract relevant data"
125
+ ],
126
+ "metadata": {
127
+ "id": "e7nPI_QKBAuQ"
128
+ }
129
+ },
130
+ {
131
+ "cell_type": "code",
132
+ "source": [
133
+ "sheets = pd.read_excel('file.xlsx', sheet_name=[0, 1, 2])  # returns a dict mapping each sheet index to its DataFrame"
134
+ ],
135
+ "metadata": {
136
+ "id": "5TwUfTVoBA7h"
137
+ },
138
+ "execution_count": null,
139
+ "outputs": []
140
+ },
141
+ {
142
+ "cell_type": "markdown",
143
+ "source": [
144
+ "5. **Merging and Cleaning Data:**\n",
145
+ "\n",
146
+ "- Mismatched headers, different column formats, or duplicate entries can cause inconsistencies"
147
+ ],
148
+ "metadata": {
149
+ "id": "UvBM_-ucBBIO"
150
+ }
151
+ },
152
+ {
153
+ "cell_type": "code",
154
+ "source": [
155
+ "df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')"
156
+ ],
157
+ "metadata": {
158
+ "id": "puiRjHBwBBUS"
159
+ },
160
+ "execution_count": null,
161
+ "outputs": []
162
+ },
163
+ {
164
+ "cell_type": "markdown",
165
+ "source": [
166
+ "6. **Date Parsing Issues:**\n",
167
+ "\n",
168
+ "- Dates may be stored in different formats, leading to incorrect parsing"
169
+ ],
170
+ "metadata": {
171
+ "id": "JP_A3CJFBrfn"
172
+ }
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "source": [
177
+ "df = pd.read_excel('file.xlsx', parse_dates=['date_column'])"
178
+ ],
179
+ "metadata": {
180
+ "id": "23NDfM21Brqc"
181
+ },
182
+ "execution_count": null,
183
+ "outputs": []
184
+ }
185
+ ]
186
+ }