Spaces:

hari3485
/

DiveIntoML

hari3485 committed on Dec 12, 2024

Commit

a6e9046

1 Parent(s): c597bee

Update pages/Data Collection.py

Files changed (1)

pages/Data Collection.py +31 -8

@@ -117,12 +117,16 @@ df = pd.read_csv("file.csv", on_bad_lines="skip")
 df = pd.read_csv("file.csv", on_bad_lines="warn")
     """, language="python")
-    st.subheader("c) Fixing Encoding Issues")
     st.markdown("""
     <ul style="font-family: Arial; line-height: 1.6;">
-        <li>Files use different encodings, such as UTF-8, ASCII, or others.</li>
-        <li>If the encoding of a file doesn't match the default (<code>utf-8</code>), you might encounter a <b>UnicodeDecodeError</b>.</li>
-        <li>To fix this, try reading the file with different encodings until you find the right one.</li>
     </ul>
     """, unsafe_allow_html=True)
@@ -143,12 +147,31 @@ for encoding in encodings_list:
         pass  # Skip to the next encoding
     """, language="python")
-    st.subheader("d) Handling Large Files")
     st.markdown("""
     <ul style="font-family: Arial; line-height: 1.6;">
-        <li>Large files can cause memory issues because the entire file is loaded into RAM.</li>
-        <li>Use the <code>chunksize</code> parameter in <code>pd.read_csv()</code> to read the file in smaller parts (chunks).</li>
-        <li>Process each chunk one by one to avoid memory errors.</li>
     </ul>
     """, unsafe_allow_html=True)

 df = pd.read_csv("file.csv", on_bad_lines="warn")
     """, language="python")
+    st.subheader("c) Unicode Decode Error")
     st.markdown("""
     <ul style="font-family: Arial; line-height: 1.6;">
+        <li>Each character is represented by a unique number (its ASCII/Unicode code point).</li>
+        <li>For example, <code>ord("a")</code> returns <code>97</code>, and <code>bin(97)</code> returns <code>'0b1100001'</code>, the binary representation of 'a'.</li>
+        <li>When text is saved to a file, its characters are converted to bytes using a specific encoding, most commonly UTF-8.</li>
+        <li><b>UnicodeDecodeError</b>: Occurs when a file's bytes cannot be decoded with the encoding being used, usually because the file was saved with a different encoding. To solve this, find the encoding the file was actually saved with.</li>
+        <li><code>pd.read_csv()</code> decodes files as UTF-8 by default, but files may be saved with other encodings.</li>
+        <li>Using the <code>encodings</code> module: To explore the available encodings, run <code>import encodings</code> in Python.</li>
+        <li>Python provides around <code>326</code> encoding aliases (the exact number depends on the Python version), which can be accessed via <code>encodings.aliases.aliases</code>.</li>
     </ul>
     """, unsafe_allow_html=True)
         pass  # Skip to the next encoding
     """, language="python")
+    st.subheader("Lookup Error")
     st.markdown("""
     <ul style="font-family: Arial; line-height: 1.6;">
+        <li>Raised when you request an encoding name that Python does not recognize, such as a misspelled or unsupported name.</li>
+        <li>Use a <code>try</code>/<code>except</code> block to handle it gracefully.</li>
+    </ul>
+    """, unsafe_allow_html=True)
+    st.code('''
+except LookupError:
+    print("Incorrect encoding: {}".format(y))
+    ''', language="python")
+    st.markdown("""
+    <ul style="font-family: Arial; line-height: 1.6;">
+        <li>If a <code>ParserError</code> occurs once the encoding is resolved, pass <code>on_bad_lines="skip"</code> to <code>pd.read_csv()</code> to skip the malformed rows.</li>
+    </ul>
+    """, unsafe_allow_html=True)
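
Review note: the `except LookupError` fragment is easier to follow inside a complete loop. The sketch below is a minimal standard-library version that decodes raw bytes instead of calling `pd.read_csv()`. The helper name `decode_with_fallback`, the candidate list, and the sample bytes are hypothetical, chosen only for illustration.

```python
def decode_with_fallback(raw, candidates):
    """Return (text, encoding) for the first candidate that decodes raw, else (None, None)."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except LookupError:
            # The encoding name itself is unknown to Python (for example, a typo).
            print("Incorrect encoding: {}".format(name))
        except UnicodeDecodeError:
            # The name is valid but does not match the bytes; try the next one.
            pass
    return None, None


# Illustrative input: Latin-1 bytes that are not valid UTF-8.
raw = "café".encode("latin-1")
text, used = decode_with_fallback(raw, ["not-a-real-codec", "utf-8", "latin-1"])
print(text, used)
```

The two exceptions are handled separately because they mean different things: `LookupError` signals a bad encoding name, while `UnicodeDecodeError` signals a valid name that does not fit the data.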
+    st.subheader("d) Handling Large CSV Files")
+    st.markdown("""
+    <ul style="font-family: Arial; line-height: 1.6;">
+        <li>When working with large CSV files, the file might not fit into memory, leading to a <code>MemoryError</code>.</li>
+        <li>Solution: pass the <code>chunksize</code> parameter to <code>pd.read_csv()</code> to read the file in smaller chunks.</li>
+        <li>With <code>chunksize</code> set, <code>pd.read_csv()</code> returns an iterator; loop over it and process each chunk as needed.</li>
     </ul>
     """, unsafe_allow_html=True)