Update app.py
app.py
CHANGED
@@ -5,7 +5,7 @@ st.title("NLP Theory Blog")

 # Sidebar for navigation
 st.sidebar.title("Navigation")
-pages = ["Introduction to NLP", "NLP
 page = st.sidebar.radio("Go to:", pages)

 # Content for each page
@@ -24,120 +24,191 @@ if page == "Introduction to NLP":
     NLP combines computational linguistics with machine learning and deep learning techniques to process language.
     """)

 elif page == "NLP Techniques":
     st.header("Common NLP Techniques")
     st.write("""
-
-    4. **Part-of-Speech (POS) Tagging:** Identifying grammatical parts of speech in a text.
-    5. **Named Entity Recognition (NER):** Extracting named entities like people, organizations, and locations.
-    6. **Sentiment Analysis:** Determining the sentiment (positive, negative, neutral) of a text.
-    7. **Text Summarization:** Producing a summary of a longer text.
-    8. **Machine Translation:** Translating text from one language to another.
-
     """)
-
-        Convert text data into numerical form for model consumption:
-        - Bag of Words (BoW)
-        - TF-IDF (Term Frequency-Inverse Document Frequency)
-        - Word embeddings (Word2Vec, GloVe)
-        - Contextual embeddings (BERT, GPT)
-        """)
-
-    elif life_cycle_page == "Modeling":
-        st.write("""
-        Train machine learning or deep learning models using the preprocessed text data:
-        - Supervised learning (e.g., Logistic Regression, SVM)
-        - Unsupervised learning (e.g., K-means clustering)
-        - Deep learning (e.g., RNNs, LSTMs, BERT)
-        """)
-
-    elif life_cycle_page == "Model Evaluation":
-        st.write("""
-        Evaluate the model's performance using metrics like:
-        - Accuracy
-        - Precision, Recall, F1-Score
-        - Confusion Matrix
-        - Cross-validation
-        """)
-
-    elif life_cycle_page == "Model Optimization":
-        st.write("""
-        Improve model performance by:
-        - Hyperparameter tuning (e.g., grid search)
-        - Regularization (e.g., L2 regularization, dropout)
-        - Ensemble methods (e.g., Random Forest, XGBoost)
-        """)
-
-    elif life_cycle_page == "Model Deployment":
-        st.write("""
-        Deploy the trained model into production:
-        - Expose the model via APIs (using Flask or FastAPI)
-        - Integrate with applications (e.g., chatbots, recommendation systems)
-        - Monitor the model's performance
-        """)
-
-    elif life_cycle_page == "Post-Deployment Maintenance":
-        st.write("""
-        Keep the model updated with new data:
-        - Retraining the model with fresh data
-        - Error analysis and model refinement
-        - Collecting user feedback for continuous improvement
-        """)
-
-    elif life_cycle_page == "End-User Interaction":
-        st.write("""
-        Present the model's results in an understandable way:
-        - Data visualization (e.g., charts, word clouds)
-        - Interactive dashboards (e.g., using Streamlit or Dash)
-        - Interface design (e.g., web or mobile apps)
-        """)

 # Footer
 st.sidebar.write("---")
@@ -5,7 +5,7 @@ st.title("NLP Theory Blog")

 # Sidebar for navigation
 st.sidebar.title("Navigation")
+pages = ["Introduction to NLP", "NLP Life Cycle", "NLP Techniques"]
 page = st.sidebar.radio("Go to:", pages)

 # Content for each page
@@ -24,120 +24,191 @@ if page == "Introduction to NLP":
     NLP combines computational linguistics with machine learning and deep learning techniques to process language.
     """)

+elif page == "NLP Life Cycle":
+    st.header("NLP Life Cycle")
+
+    st.subheader("1. Problem Definition")
+    st.write("""
+    In this phase, the problem you're trying to solve with NLP is defined. Examples include:
+    - Sentiment analysis
+    - Named entity recognition (NER)
+    - Text classification
+    - Machine translation
+    - Language generation
+    """)
+
+    st.subheader("2. Data Collection")
+    st.write("""
+    Gather relevant textual data. Sources include:
+    - Web scraping (e.g., using BeautifulSoup or Scrapy)
+    - APIs (e.g., Twitter API)
+    - Pre-existing datasets (e.g., Kaggle, UCI repositories)
+    - User-generated content (e.g., reviews, social media)
+    """)
+
+    st.subheader("3. Data Preprocessing")
+    st.write("""
+    Prepare the data for modeling by performing tasks such as:
+    - Text cleaning (removing unnecessary characters, punctuation)
+    - Tokenization (splitting text into words/sentences)
+    - Stopword removal
+    - Stemming or lemmatization
+    - Part-of-speech tagging
+    """)
+
+    st.subheader("4. Feature Engineering")
+    st.write("""
+    Convert text data into numerical form for model consumption:
+    - Bag of Words (BoW)
+    - TF-IDF (Term Frequency-Inverse Document Frequency)
+    - Word embeddings (Word2Vec, GloVe)
+    - Contextual embeddings (BERT, GPT)
+    """)
+
+    st.subheader("5. Modeling")
+    st.write("""
+    Train machine learning or deep learning models using the preprocessed text data:
+    - Supervised learning (e.g., Logistic Regression, SVM)
+    - Unsupervised learning (e.g., K-means clustering)
+    - Deep learning (e.g., RNNs, LSTMs, BERT)
+    """)
+
+    st.subheader("6. Model Evaluation")
+    st.write("""
+    Evaluate the model's performance using metrics like:
+    - Accuracy
+    - Precision, Recall, F1-Score
+    - Confusion Matrix
+    - Cross-validation
+    """)
+
+    st.subheader("7. Model Optimization")
+    st.write("""
+    Improve model performance by:
+    - Hyperparameter tuning (e.g., grid search)
+    - Regularization (e.g., L2 regularization, dropout)
+    - Ensemble methods (e.g., Random Forest, XGBoost)
+    """)
+
+    st.subheader("8. Model Deployment")
+    st.write("""
+    Deploy the trained model into production:
+    - Expose the model via APIs (using Flask or FastAPI)
+    - Integrate with applications (e.g., chatbots, recommendation systems)
+    - Monitor the model's performance
+    """)
+
+    st.subheader("9. Post-Deployment Maintenance")
+    st.write("""
+    Keep the model updated with new data:
+    - Retraining the model with fresh data
+    - Error analysis and model refinement
+    - Collecting user feedback for continuous improvement
+    """)
+
+    st.subheader("10. End-User Interaction")
+    st.write("""
+    Present the model's results in an understandable way:
+    - Data visualization (e.g., charts, word clouds)
+    - Interactive dashboards (e.g., using Streamlit or Dash)
+    - Interface design (e.g., web or mobile apps)
+    """)
+
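As an editorial aside (not part of this commit): the Feature Engineering step above mentions TF-IDF, which can be sketched in a few lines of plain Python. This is a simplified illustration; library implementations such as scikit-learn's `TfidfVectorizer` use smoothed variants.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Toy TF-IDF over a list of tokenized documents (unsmoothed)."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"]]
w = tf_idf(docs)
# "nlp" appears in every document, so its idf (and hence its weight) is 0
```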
 elif page == "NLP Techniques":
     st.header("Common NLP Techniques")
+
+    st.subheader("1. Tokenization")
     st.write("""
+    Tokenization is the process of breaking text into smaller units like words, phrases, or sentences. This is a crucial first step in many NLP tasks.

+    **Example:**
+    Text: "Natural Language Processing is amazing!"
+    Tokenized text: ["Natural", "Language", "Processing", "is", "amazing"]

+    Tokenization helps in making the text more manageable and ready for further processing.
     """)

+    st.subheader("2. Stopword Removal")
+    st.write("""
+    Stopword removal involves eliminating common words (e.g., 'the', 'is', 'in') that may not contribute significantly to the meaning of the text.
+
+    **Example:**
+    Text: "The quick brown fox jumps over the lazy dog."
+    After stopword removal: ["quick", "brown", "fox", "jumps", "lazy", "dog"]
+
+    Removing stopwords helps reduce the size of the dataset and focuses on meaningful terms.
+    """)
+
+    st.subheader("3. Stemming and Lemmatization")
+    st.write("""
+    Both stemming and lemmatization are techniques for reducing words to their base or root form.
+
+    - **Stemming**: Cuts off prefixes or suffixes. For example, "running" becomes "run".
+    - **Lemmatization**: Uses a dictionary to find the base form of a word. For example, "better" becomes "good".
+
+    **Example:**
+    Word: "running"
+    - Stemming: "run"
+    - Lemmatization: "run"
+    """)
+
+    st.subheader("4. Part-of-Speech (POS) Tagging")
+    st.write("""
+    Part-of-speech tagging assigns a part-of-speech label (e.g., noun, verb, adjective) to each word in a sentence.
+
+    **Example:**
+    Text: "The cat sat on the mat."
+    POS tags: [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
+
+    POS tagging is useful for tasks like Named Entity Recognition (NER) or syntactic parsing.
+    """)
+
+    st.subheader("5. Named Entity Recognition (NER)")
+    st.write("""
+    Named Entity Recognition (NER) is the task of identifying named entities such as people, organizations, and locations in text.
+
+    **Example:**
+    Text: "Apple is looking to buy a startup in London."
+    NER output: [("Apple", "ORG"), ("London", "LOC")]

+    NER is crucial for information extraction tasks like identifying company names or locations in a text.
+    """)
+
+    st.subheader("6. Sentiment Analysis")
+    st.write("""
+    Sentiment Analysis is the process of determining the sentiment expressed in a text, typically classified as positive, negative, or neutral.
+
+    **Example:**
+    Text: "I love this phone, it's amazing!"
+    Sentiment: Positive
+
+    Sentiment analysis is widely used in social media monitoring, customer feedback analysis, and product reviews.
+    """)
+
+    st.subheader("7. Text Summarization")
+    st.write("""
+    Text summarization generates a shorter version of a given text, maintaining the most important information.
+
+    - **Extractive summarization**: Extracts important sentences from the original text.
+    - **Abstractive summarization**: Generates new sentences to summarize the original content.
+
+    **Example (Extractive):**
+    Original text: "Natural Language Processing is a subfield of AI. It deals with how computers understand human language."
+    Summarized: "NLP is a subfield of AI that deals with human language."
+
+    Text summarization helps in condensing large documents into key points.
+    """)
+
+    st.subheader("8. Machine Translation")
+    st.write("""
+    Machine Translation is the task of translating text from one language to another.
+
+    **Example:**
+    Text in English: "Hello, how are you?"
+    Translated text in Spanish: "Hola, ¿cómo estás?"
+
+    Machine translation systems like Google Translate use deep learning models to produce translations.
+    """)

 # Footer
 st.sidebar.write("---")