hari3485 committed on
Commit
a6e9046
·
verified ·
1 Parent(s): c597bee

Update pages/Data Collection.py

  1. pages/Data Collection.py +31 -8
pages/Data Collection.py CHANGED
@@ -117,12 +117,16 @@ df = pd.read_csv("file.csv", on_bad_lines="skip")
117
  df = pd.read_csv("file.csv", on_bad_lines="warn")
118
  """, language="python")
119
 
120
- st.subheader("c) Fixing Encoding Issues")
121
  st.markdown("""
122
  <ul style="font-family: Arial; line-height: 1.6;">
123
- <li>Files use different encodings, such as UTF-8, ASCII, or others.</li>
124
- <li>If the encoding of a file doesn't match the default (<code>utf-8</code>), you might encounter a <b>UnicodeDecodeError</b>.</li>
125
- <li>To fix this, try reading the file with different encodings until you find the right one.</li>
126
  </ul>
127
  """, unsafe_allow_html=True)
128
 
@@ -143,12 +147,31 @@ for encoding in encodings_list:
143
  pass # Skip to the next encoding
144
  """, language="python")
145
 
146
- st.subheader("d) Handling Large Files")
147
  st.markdown("""
148
  <ul style="font-family: Arial; line-height: 1.6;">
149
- <li>Large files can cause memory issues because the entire file is loaded into RAM.</li>
150
- <li>Use the <code>chunksize</code> parameter in <code>pd.read_csv()</code> to read the file in smaller parts (chunks).</li>
151
- <li>Process each chunk one by one to avoid memory errors.</li>
152
  </ul>
153
  """, unsafe_allow_html=True)
154
 
 
117
  df = pd.read_csv("file.csv", on_bad_lines="warn")
118
  """, language="python")
119
 
120
+ st.subheader("c) Unicode Decode Error")
121
  st.markdown("""
122
  <ul style="font-family: Arial; line-height: 1.6;">
123
+ <li>Each character, when saved, is represented by a unique number (ASCII/Unicode code point).</li>
124
+ <li>For example, <code>ord("a")</code> returns <code>97</code>, and <code>bin(97)</code> returns <code>'0b1100001'</code>, the binary representation of 'a'.</li>
125
+ <li>Characters are written to a file as bytes using a specific encoding, most commonly UTF-8.</li>
126
+ <li><b>UnicodeDecodeError</b>: raised when a file's bytes cannot be decoded with the encoding used to read it. To fix it, read the file with the encoding it was actually saved in.</li>
127
+ <li><code>pd.read_csv()</code> assumes <code>utf-8</code> by default, but files may be saved with other encodings.</li>
128
+ <li>Using the <code>encodings</code> module: to explore the available encodings, run <code>import encodings</code> in Python.</li>
129
+ <li>There are around <code>326</code> encoding aliases in Python (the exact count varies by version), available via the <code>encodings.aliases.aliases</code> dictionary.</li>
130
  </ul>
131
  """, unsafe_allow_html=True)
132
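The points in the list above can be checked interactively. The following is a minimal sketch using only the standard library; the sample string `café` and the variable names are illustrative assumptions, not part of the commit.

```python
import encodings.aliases

# Every character maps to a code point, which has a binary form.
code_point = ord("a")
binary = bin(code_point)

# Encode text as Latin-1, then try to decode those bytes as UTF-8.
# The single byte 0xE9 ("é" in Latin-1) is not valid UTF-8 on its own.
raw = "café".encode("latin-1")
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the encoding the bytes were written in succeeds.
recovered = raw.decode("latin-1")

# The alias table is a plain dict mapping alias name -> codec name.
alias_count = len(encodings.aliases.aliases)
print(code_point, binary, decode_failed, recovered, alias_count)
```

The alias count is printed rather than hard-coded because it differs between Python versions.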
 
 
147
  pass # Skip to the next encoding
148
  """, language="python")
149
 
150
+ st.subheader("LookupError")
151
  st.markdown("""
152
  <ul style="font-family: Arial; line-height: 1.6;">
153
+ <li>Raised when you pass an encoding name that Python does not recognize or support.</li>
154
+ <li>Use a try-except block to handle it gracefully.</li>
155
+ </ul>
156
+ """, unsafe_allow_html=True)
157
+
158
+ st.code('''
159
+ except LookupError:
160
+     print("Incorrect encoding: {}".format(y))
161
+ ''')
162
+
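The snippet above shows only the `except` clause. A fuller sketch of the pattern might look like the following; the helper name `decode_with_fallback` and the candidate list are hypothetical, and it decodes raw bytes rather than calling `pd.read_csv()` so that it runs without a file on disk.

```python
def decode_with_fallback(raw, candidates):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except LookupError:
            # The encoding name itself is unknown to Python.
            print("Incorrect encoding: {}".format(name))
        except UnicodeDecodeError:
            # Known encoding, but the bytes are not valid in it.
            pass  # Skip to the next encoding
    raise ValueError("none of the candidate encodings worked")


# Hypothetical sample: Latin-1 bytes that are not valid UTF-8.
raw = "naïve".encode("latin-1")
text, used = decode_with_fallback(
    raw, ["not-a-real-encoding", "utf-8", "latin-1"]
)
print(text, used)
```

With pandas, the same try/except would presumably wrap a call such as `pd.read_csv(path, encoding=name)` inside the loop.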
163
+ st.markdown("""
164
+ <ul style="font-family: Arial; line-height: 1.6;">
165
+ <li>If a <code>ParserError</code> appears after this, pass the <code>on_bad_lines="skip"</code> parameter to <code>pd.read_csv()</code>.</li>
166
+ </ul>
167
+ """, unsafe_allow_html=True)
168
+
169
+ st.subheader("d) Handling Large CSV Files")
170
+ st.markdown("""
171
+ <ul style="font-family: Arial; line-height: 1.6;">
172
+ <li>When working with large CSV files, the file might not fit into memory, leading to a <code>MemoryError</code>.</li>
173
+ <li>Solution: use the <code>chunksize</code> parameter of <code>pd.read_csv()</code> to read the file in smaller chunks.</li>
174
+ <li>To handle each chunk, iterate over the chunks and process them one at a time.</li>
175
  </ul>
176
  """, unsafe_allow_html=True)
177
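The chunked-reading approach described in this section can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the in-memory CSV, the `amount` column, and the chunk size of 2 are made up for the example and stand in for a genuinely large file.

```python
import io

import pandas as pd

# In-memory stand-in for a large CSV file (hypothetical data).
csv_data = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")

total = 0
n_chunks = 0
# With chunksize set, read_csv returns an iterator that yields
# DataFrames of at most `chunksize` rows instead of one big frame.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += int(chunk["amount"].sum())
    n_chunks += 1

print(total, n_chunks)
```

Only one chunk is held in memory at a time, so aggregates such as sums or counts can be accumulated without loading the whole file.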