Phani1008 committed on
Commit 63d1f4e · verified · 1 Parent(s): 0444a83

Update app.py

Files changed (1):
  1. app.py +14 -75

app.py CHANGED
@@ -15,21 +15,21 @@ def show_home_page():
     )
 
     if st.button("NLP Terminologies"):
-        st.query_params(page="terminologies")
+        st.session_state["page"] = "terminologies"
     if st.button("One-Hot Vectorization"):
-        st.query_params(page="one_hot")
+        st.session_state["page"] = "one_hot"
     if st.button("Bag of Words"):
-        st.query_params(page="bow")
+        st.session_state["page"] = "bow"
     if st.button("TF-IDF Vectorizer"):
-        st.query_params(page="tfidf")
+        st.session_state["page"] = "tfidf"
     if st.button("Word2Vec"):
-        st.query_params(page="word2vec")
+        st.session_state["page"] = "word2vec"
     if st.button("FastText"):
-        st.query_params(page="fasttext")
+        st.session_state["page"] = "fasttext"
     if st.button("Tokenization"):
-        st.query_params(page="tokenization")
+        st.session_state["page"] = "tokenization"
     if st.button("Stop Words"):
-        st.query_params(page="stop_words")
+        st.session_state["page"] = "stop_words"
 
 def show_page(page):
     if page == "terminologies":
@@ -60,7 +60,6 @@ def show_page(page):
     - **Named Entity Recognition (NER)**: Identifying entities like names, locations, and organizations in text.
 
     - **Parsing**: Analyzing grammatical structure and relationships between words.
-
     """
     )
     elif page == "one_hot":
@@ -139,17 +138,6 @@ def show_page(page):
     - **Term Frequency (TF)**: Number of times a term appears in a document divided by total terms in the document.
     - **Inverse Document Frequency (IDF)**: Logarithm of total documents divided by the number of documents containing the term.
 
-    #### Advantages:
-    - Reduces the weight of common words.
-    - Highlights unique and important words.
-
-    #### Example:
-    For the corpus:
-    - Doc1: "NLP is amazing."
-    - Doc2: "NLP is fun and amazing."
-
-    TF-IDF highlights words like "fun" and "amazing" over commonly occurring words like "is".
-
     #### Applications:
     - Search engines, information retrieval, and document classification.
     """
@@ -166,19 +154,8 @@ def show_page(page):
     - **CBOW (Continuous Bag of Words)**: Predicts the target word from its context.
     - **Skip-gram**: Predicts the context from the target word.
 
-    #### Advantages:
-    - Captures semantic meaning (e.g., "king" - "man" + "woman" ≈ "queen").
-    - Efficient for large datasets.
-
-    #### Training Process:
-    - Uses shallow neural networks.
-    - Optimized using techniques like negative sampling.
-
     #### Applications:
     - Text classification, sentiment analysis, and recommendation systems.
-
-    #### Limitations:
-    - Requires significant computational resources.
     """
     )
     elif page == "fasttext":
@@ -189,19 +166,9 @@ def show_page(page):
 
     FastText is an extension of Word2Vec that represents words as a combination of character n-grams.
 
-    #### Advantages:
-    - Handles rare and out-of-vocabulary words.
-    - Captures subword information (e.g., prefixes and suffixes).
-
-    #### Example:
-    The word "playing" might be represented by n-grams like "pla", "lay", "ayi", "ing".
-
     #### Applications:
     - Multilingual text processing.
     - Handling noisy and incomplete data.
-
-    #### Limitations:
-    - Higher computational cost compared to Word2Vec.
     """
     )
     elif page == "tokenization":
@@ -211,23 +178,6 @@ def show_page(page):
     ### Tokenization
 
     Tokenization is the process of breaking text into smaller units (tokens) such as words, phrases, or sentences.
-
-    #### Types of Tokenization:
-    - **Word Tokenization**: Splits text into words.
-    - **Sentence Tokenization**: Splits text into sentences.
-
-    #### Libraries for Tokenization:
-    - NLTK, SpaCy, and Hugging Face Transformers.
-
-    #### Example:
-    Sentence: "NLP is exciting."
-    - Word Tokens: ["NLP", "is", "exciting", "."]
-
-    #### Applications:
-    - Preprocessing for machine learning models.
-
-    #### Challenges:
-    - Handling complex text like abbreviations and multilingual data.
     """
     )
     elif page == "stop_words":
@@ -237,26 +187,15 @@ def show_page(page):
     ### Stop Words
 
     Stop words are commonly used words in a language that are often removed during text preprocessing.
-
-    #### Examples of Stop Words:
-    - English: "is", "the", "and", "in".
-    - Spanish: "es", "el", "y", "en".
-
-    #### Why Remove Stop Words?
-    - To reduce noise in text data.
-
-    #### Applications:
-    - Sentiment analysis, text classification, and search engines.
-
-    #### Challenges:
-    - Some stop words might carry context-specific importance.
     """
     )
 
-query_params = st.query_params()
-page = query_params.get("page", ["home"])[0]
+# Initialize session state for page navigation
+if "page" not in st.session_state:
+    st.session_state["page"] = "home"
 
-if page == "home":
+# Show appropriate page
+if st.session_state["page"] == "home":
     show_home_page()
 else:
-    show_page(page)
+    show_page(st.session_state["page"])
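The removed code called `st.query_params(page=...)`, but in recent Streamlit versions `st.query_params` is a dict-like property, not a callable, so those calls would raise a `TypeError`; the commit switches navigation to `st.session_state` instead. A minimal sketch of that pattern, using a plain dict as a stand-in for `st.session_state` and a `clicked` argument as a stand-in for `st.button()` (both stand-ins are illustrative, not part of the commit):

```python
# Sketch of the session-state navigation pattern adopted by this commit.
# `state` models st.session_state; `clicked` models which button, if any,
# returned True on the current rerun of the script.

PAGES = {"terminologies", "one_hot", "bow", "tfidf",
         "word2vec", "fasttext", "tokenization", "stop_words"}

def navigate(state, clicked=None):
    """Record a button click in session state and return the page to render."""
    if "page" not in state:      # first run: default to the home page
        state["page"] = "home"
    if clicked in PAGES:         # a topic button was pressed this rerun
        state["page"] = clicked
    return state["page"]

state = {}
print(navigate(state))             # no click yet -> "home"
print(navigate(state, "tfidf"))    # user clicks TF-IDF -> "tfidf"
print(navigate(state))             # choice persists across reruns -> "tfidf"
```

Because `st.session_state` survives script reruns, the selected page persists after each button press, which is exactly what the one-shot `st.query_params(...)` calls failed to do.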