Update app.py
app.py
CHANGED
@@ -138,10 +138,31 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
 """)
 
 elif lifecycle_option == "Text Preprocessing":
 st.write("""
-#### 🧹
 Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
 - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
 - **Stop Words Removal**: Removing common words that don’t contribute much information.
@@ -157,7 +178,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Text Representation":
 st.write("""
-####
 After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
 - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
 - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
@@ -170,7 +191,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Model Training":
 st.write("""
-#### 🏋️‍♂️
 In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
 - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
 - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
@@ -181,7 +202,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Evaluation":
 st.write("""
-####
 After training the model, it's important to evaluate its performance using metrics such as accuracy, precision, recall, and F1-score.
 - **Accuracy**: The percentage of correct predictions.
 - **Precision**: The percentage of relevant instances among the retrieved instances.
@@ -193,7 +214,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Deployment":
 st.write("""
-####
 Once the model is evaluated and tuned, it is deployed into production where it can be used by end users. Deployment involves:
 - **Integration** with web applications, chatbots, or other tools.
 - **API Development**: Exposing the model through an API for real-time predictions.
@@ -138,10 +138,31 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 **Example**: Scraping customer reviews from Amazon to analyze sentiment and feedback about a product.
 """)
+elif lifecycle_option == "Simple EDA":
+st.write("""
+#### 3. Simple EDA
+Simple Exploratory Data Analysis (Simple EDA) provides a quick overview of the dataset. It focuses on understanding the basic structure, spotting missing values, checking data types, and visualizing distributions.
+- **Basic Data Inspection**: Viewing data types, first few rows, and general structure.
+- **Summary Statistics**: Quick summary of key metrics like mean, median, and standard deviation.
+- **Basic Visualizations**: Simple charts like histograms and boxplots to explore variable distributions.
+- **Missing Values Check**: Identifying columns with missing values.
+- **Outlier Detection**: Visual identification of outliers.
+
+**Example**: In a sales dataset:
+- Basic Data Inspection:
+- Shape of the dataset: (1000, 5)
+- First few rows: [Sales, Marketing Spend, Date, etc.]
+- Summary Statistics:
+- Mean Sales: 1000
+- Median Sales: 950
+- Visualizations:
+- Histogram for sales distribution
+- Boxplot for outlier detection
+""")
 
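The checks listed in the Simple EDA text (shape, summary statistics, missing values, outliers) can be sketched with the standard library alone. This is only an illustration: the `rows` data, the column names, and the helper functions are invented here and are not part of `app.py`. The outlier check uses the 1.5 × IQR rule that a boxplot draws, as a numeric stand-in for the visual inspection the text describes.

```python
import statistics

# Hypothetical sales rows (not from app.py); None marks a missing value.
rows = [
    {"sales": 950, "marketing_spend": 200},
    {"sales": 1000, "marketing_spend": None},
    {"sales": 900, "marketing_spend": 180},
    {"sales": 1050, "marketing_spend": 220},
    {"sales": 980, "marketing_spend": 190},
    {"sales": 1020, "marketing_spend": 205},
    {"sales": 1100, "marketing_spend": 230},
    {"sales": 5000, "marketing_spend": 210},  # deliberately extreme value
]


def summarize(values):
    """Return count, mean, median and sample standard deviation."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def count_missing(rows):
    """Count None entries per column."""
    return {col: sum(1 for r in rows if r[col] is None) for col in rows[0]}


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


sales = [r["sales"] for r in rows if r["sales"] is not None]
print("shape:", (len(rows), len(rows[0])))
print("summary:", summarize(sales))
print("missing:", count_missing(rows))
print("outliers:", iqr_outliers(sales))
```

On a real dataset the same steps are usually done with a dataframe library; the stdlib version is used here only to keep the sketch dependency-free.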
 elif lifecycle_option == "Text Preprocessing":
 st.write("""
+#### 🧹 4. Text Preprocessing
 Text preprocessing prepares raw text for further analysis. This stage involves cleaning and transforming the data into a structured format that machine learning models can understand.
 - **Tokenization**: Splitting text into smaller units (e.g., words, phrases).
 - **Stop Words Removal**: Removing common words that don’t contribute much information.
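For the two preprocessing steps visible in this hunk, tokenization and stop-word removal, a minimal sketch might look like the following. The stop-word list is a tiny hand-picked set chosen for illustration, and the function names are hypothetical; neither comes from `app.py`.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "it", "this"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


review = "This is the best phone in the store, and it is cheap!"
tokens = tokenize(review)
filtered = remove_stop_words(tokens)
print(tokens)
print(filtered)
```

A regex tokenizer like this one ignores punctuation and language-specific rules, which is acceptable for a demo but usually too crude for production text.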
@@ -157,7 +178,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Text Representation":
 st.write("""
+#### 5. Text Representation
 After preprocessing, the text data needs to be converted into a numerical format for use in machine learning models. There are several methods for text representation:
 - **Bag of Words (BoW)**: Converts text into a matrix of word frequencies.
 - **TF-IDF**: Weighs words based on their frequency in a specific document relative to their frequency across the entire dataset.
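To make the Bag of Words and TF-IDF descriptions concrete, here is a small sketch on three made-up documents. It assumes the plain textbook weighting, term frequency times `log(N / df)`; libraries often apply smoothed or normalized variants, so their numbers will differ. All names below are illustrative and not taken from `app.py`.

```python
import math
from collections import Counter

# Toy corpus invented for this example.
docs = [
    "good phone good battery",
    "bad battery",
    "good screen",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def bag_of_words(tokens):
    """Return the count of each vocabulary word in one document."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def tf_idf(tokens):
    """Return unsmoothed TF-IDF weights: (count / doc length) * log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    weights = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)  # documents containing w
        weights.append(tf * math.log(n_docs / df))
    return weights


print(vocab)
for tokens in tokenized:
    print(bag_of_words(tokens), [round(x, 3) for x in tf_idf(tokens)])
```

In the first document, "phone" appears once and "good" twice, yet "phone" gets the larger TF-IDF weight because it occurs in only one of the three documents.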
@@ -170,7 +191,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Model Training":
 st.write("""
+#### 🏋️‍♂️ 6. Model Training
 In the model training stage, machine learning algorithms are trained on the preprocessed and represented text data. The choice of model depends on the task:
 - **Text Classification**: Naive Bayes, Support Vector Machines (SVM), or neural networks.
 - **Named Entity Recognition (NER)**: Conditional Random Fields (CRF), LSTMs, or transformers.
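Of the text classification models named here, Naive Bayes is small enough to sketch from scratch. The version below is a multinomial Naive Bayes with add-one (Laplace) smoothing, trained on four invented examples. It is a teaching sketch under those assumptions, not the training code of this app, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Collect label and per-label word counts from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data for illustration only.
training = [
    ("great phone love it".split(), "pos"),
    ("excellent battery great screen".split(), "pos"),
    ("terrible battery".split(), "neg"),
    ("bad screen hate it".split(), "neg"),
]
model = train_naive_bayes(training)
print(predict(model, "great battery".split()))
print(predict(model, "terrible screen".split()))
```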
@@ -181,7 +202,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Evaluation":
 st.write("""
+#### 7. Evaluation
 After training the model, it's important to evaluate its performance using metrics such as accuracy, precision, recall, and F1-score.
 - **Accuracy**: The percentage of correct predictions.
 - **Precision**: The percentage of relevant instances among the retrieved instances.
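The four metrics named in this section all follow from the counts of true/false positives and negatives. The sketch below computes them for a binary task; the labels and predictions are invented, and `evaluate` is a hypothetical helper rather than anything defined in `app.py`.

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels and predictions for illustration.
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here 2 of the 3 predicted positives are correct (precision) and 2 of the 3 actual positives are found (recall), while 4 of 6 predictions are right overall (accuracy).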
@@ -193,7 +214,7 @@ elif st.session_state.selected_page == "NLP Lifecycle":
 
 elif lifecycle_option == "Deployment":
 st.write("""
+#### 8. Deployment
 Once the model is evaluated and tuned, it is deployed into production where it can be used by end users. Deployment involves:
 - **Integration** with web applications, chatbots, or other tools.
 - **API Development**: Exposing the model through an API for real-time predictions.
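As one possible reading of "exposing the model through an API", the sketch below wraps a prediction function in a small JSON-over-HTTP endpoint using only the standard library. Everything in it is an assumption made for illustration: `predict_sentiment` is a keyword-based placeholder standing in for a trained model, and the request shape (`{"text": ...}`) and port are arbitrary choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_sentiment(text):
    """Placeholder model (hypothetical): a stand-in for a trained classifier."""
    positive = {"good", "great", "love"}
    return "pos" if any(w in positive for w in text.lower().split()) else "neg"


def predict_json(body):
    """Turn a JSON request body into a JSON response body."""
    payload = json.loads(body)
    label = predict_sentiment(payload["text"])
    return json.dumps({"label": label}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        response = predict_json(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Block and answer POST requests until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Keeping `predict_json` separate from the handler means the request logic can be exercised without opening a socket. A production deployment would also need input validation, error responses, and a production-grade server.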