Update pages/Data Collection.py
pages/Data Collection.py +31 -8
pages/Data Collection.py
CHANGED
|
@@ -117,12 +117,16 @@ df = pd.read_csv("file.csv", on_bad_lines="skip")
|
|
| 117 |
df = pd.read_csv("file.csv", on_bad_lines="warn")
|
| 118 |
""", language="python")
|
| 119 |
|
| 120 |
-
st.subheader("c)
|
| 121 |
st.markdown("""
|
| 122 |
<ul style="font-family: Arial; line-height: 1.6;">
|
| 123 |
-
<li>
|
| 124 |
-
<li>
|
| 125 |
-
<li>
|
| 126 |
</ul>
|
| 127 |
""", unsafe_allow_html=True)
|
| 128 |
|
|
@@ -143,12 +147,31 @@ for encoding in encodings_list:
|
|
| 143 |
pass # Skip to the next encoding
|
| 144 |
""", language="python")
|
| 145 |
|
| 146 |
-
|
| 147 |
st.markdown("""
|
| 148 |
<ul style="font-family: Arial; line-height: 1.6;">
|
| 149 |
-
<li>
|
| 150 |
-
<li>Use
|
| 151 |
-
|
| 152 |
</ul>
|
| 153 |
""", unsafe_allow_html=True)
|
| 154 |
|
|
|
|
| 117 |
df = pd.read_csv("file.csv", on_bad_lines="warn")
|
| 118 |
""", language="python")
|
| 119 |
|
| 120 |
+
st.subheader("c) Unicode Decode Error")
|
| 121 |
st.markdown("""
|
| 122 |
<ul style="font-family: Arial; line-height: 1.6;">
|
| 123 |
+
<li>Each character, when saved, is represented by a unique number (ASCII/Unicode code point).</li>
|
| 124 |
+
<li><code>ord("a")</code> → <code>97</code>, <code>bin(97)</code> → <code>0b1100001</code> (binary representation of 'a').</li>
|
| 125 |
+
<li>Characters are stored as bytes using a specific encoding, typically UTF-8.</li>
|
| 126 |
+
<li><code>UnicodeDecodeError</code>: Occurs when a file cannot be decoded because it is read with an incorrect or incompatible encoding. To solve this, find the encoding the file was actually saved with.</li>
|
| 127 |
+
<li>Python's <code>str.encode()</code> and pandas' <code>read_csv</code> use UTF-8 by default, but files may be saved with other encodings.</li>
|
| 128 |
+
<li>Using the <code>encodings</code> module: to explore the available encodings, <code>import encodings</code> in Python.</li>
|
| 129 |
+
<li>There are <code>326</code> different encoding aliases available in Python (the exact count depends on the Python version), which can be accessed via <code>encodings.aliases.aliases</code>.</li>
|
| 130 |
</ul>
|
| 131 |
""", unsafe_allow_html=True)
|
| 132 |
|
|
|
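Reviewer note: the bullets added in this hunk can be illustrated with a minimal sketch using only the standard library. The sample string, the candidate list, and the helper name `decode_with_fallback` are assumptions made for illustration and are not part of the commit.

```python
import encodings.aliases

# Each character maps to a code point; an encoding turns it into bytes.
print(ord("a"))       # code point of 'a'
print(bin(ord("a")))  # the same number in binary

text = "café"
latin1_bytes = text.encode("latin-1")  # bytes that are not valid UTF-8

# Decoding with the wrong codec raises UnicodeDecodeError.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# Asking for a codec name Python does not know raises LookupError.
try:
    latin1_bytes.decode("no-such-encoding")
except LookupError as exc:
    print("unknown encoding:", exc)


def decode_with_fallback(data, candidates):
    """Return (text, encoding) for the first candidate that decodes data.

    Hypothetical helper: tries each codec name in order and skips the
    ones that are unknown or that cannot decode the bytes.
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue  # skip to the next encoding
    raise ValueError("none of the candidate encodings worked")


decoded, used = decode_with_fallback(
    latin1_bytes, ["utf-8", "no-such-encoding", "latin-1"]
)
print(decoded, used)

# The alias table is a plain dict; its size varies between Python versions.
alias_count = len(encodings.aliases.aliases)
print(alias_count)
```

Note that a successful decode only means the bytes are valid in that codec, not that the codec is the one the file was written with, so the order of the candidate list matters.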
|
| 147 |
pass # Skip to the next encoding
|
| 148 |
""", language="python")
|
| 149 |
|
| 150 |
+
st.subheader("Lookup Error:")
|
| 151 |
st.markdown("""
|
| 152 |
<ul style="font-family: Arial; line-height: 1.6;">
|
| 153 |
+
<li>Occurs if you try to access an encoding that is not available or supported.</li>
|
| 154 |
+
<li>Use a try-except block to handle it gracefully</li>
|
| 155 |
+
</ul>
|
| 156 |
+
""", unsafe_allow_html=True)
|
| 157 |
+
|
| 158 |
+
st.code('''
|
| 159 |
+
except LookupError:
|
| 160 |
+
print("Incorrect encoding: {}".format(y))
|
| 161 |
+
''')
|
| 162 |
+
|
| 163 |
+
st.markdown("""
|
| 164 |
+
<ul style="font-family: Arial; line-height: 1.6;">
|
| 165 |
+
<li>If a <code>ParserError</code> follows, add the <code>on_bad_lines="skip"</code> parameter to resolve it.</li>
|
| 166 |
+
</ul>
|
| 167 |
+
""", unsafe_allow_html=True)
|
| 168 |
+
|
| 169 |
+
st.subheader("d) Handling Large CSV Files")
|
| 170 |
+
st.markdown("""
|
| 171 |
+
<ul style="font-family: Arial; line-height: 1.6;">
|
| 172 |
+
<li>When working with large CSV files, the file might not fit into memory, leading to a <code>MemoryError</code>.</li>
|
| 173 |
+
<li>Solution: Use <code>chunksize</code> to break the file into smaller chunks.</li>
|
| 174 |
+
<li>To handle each chunk, iterate through the chunks and process them as needed.</li>
|
| 175 |
</ul>
|
| 176 |
""", unsafe_allow_html=True)
|
| 177 |
|
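Reviewer note: the chunked-reading advice in section d) can be sketched as follows. This is a minimal example, assuming pandas is installed; the in-memory CSV, the `value` column, and the chunk size of 4 are placeholders standing in for a large file on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

total = 0
rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += int(chunk["value"].sum())  # process each chunk independently
    rows += len(chunk)

print(rows, total)
```

Only one chunk is held in memory at a time, so the aggregation has to be expressed as a running result (here a sum and a row count) rather than an operation on the full DataFrame.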