Update pages/Pipeline.py
pages/Pipeline.py +47 -1
pages/Pipeline.py
CHANGED
|
@@ -2,4 +2,50 @@ import streamlit as st
|
|
| 2 |
|
| 3 |
st.header("**Natural Language Processing Pipeline**")
|
| 4 |
|
| 5 |
-
st.image("1726065094370.png",use_container_width = True)
|
| 2 |
|
| 3 |
st.header("**Natural Language Processing Pipeline**")
|
| 4 |
|
| 5 |
+
st.image("1726065094370.png",use_container_width = True)
|
| 6 |
+
|
| 7 |
+
st.write("""
|
| 8 |
+
## NLP Pipeline Steps
|
| 9 |
+
|
| 10 |
+
1. **Text Input and Data Collection**
|
| 11 |
+
- **What it is**: Collecting the text data to be analyzed or processed.
|
| 12 |
+
- **Sources**: Websites, documents, emails, social media, etc.
|
| 13 |
+
- **Why it’s important**: Provides the raw data necessary to build an NLP system.
|
| 14 |
+
|
| 15 |
+
2. **Text Preprocessing**
|
| 16 |
+
- **What it is**: Cleaning and preparing the raw text for analysis.
|
| 17 |
+
- **Examples**:
|
| 18 |
+
- Removing punctuation, numbers, and stopwords.
|
| 19 |
+
- Lowercasing text, tokenization, and stemming.
|
| 20 |
+
- **Why it’s important**: Ensures that the data is clean and structured for better results.
|
| 21 |
+
|
| 22 |
+
3. **Text Representation**
|
| 23 |
+
- **What it is**: Converting text into a numerical format that a machine can understand.
|
| 24 |
+
- **Examples**: Bag of Words, TF-IDF, static word embeddings (Word2Vec), or contextual embeddings (BERT).
|
| 25 |
+
- **Why it’s important**: Machines work with numbers, not raw text, so this step is essential.
|
| 26 |
+
|
| 27 |
+
4. **Feature Selection**
|
| 28 |
+
- **What it is**: Selecting the most relevant pieces of data or words for the task.
|
| 29 |
+
- **Examples**: Choosing keywords or focusing on specific phrases that matter for classification or prediction.
|
| 30 |
+
- **Why it’s important**: Reduces noise in the data and improves the model’s performance.
|
| 31 |
+
|
| 32 |
+
5. **Model Selection and Training**
|
| 33 |
+
- **What it is**: Choosing an appropriate machine learning or deep learning model and training it on the data.
|
| 34 |
+
- **Examples**: Classical algorithms such as Logistic Regression or SVM, or deep learning models such as BERT.
|
| 35 |
+
- **Why it’s important**: The model learns patterns in the data to perform tasks like classification or translation.
|
| 36 |
+
|
| 37 |
+
6. **Model Deployment and Inference**
|
| 38 |
+
- **What it is**: Deploying the trained model to a real-world environment to make predictions or analyze text.
|
| 39 |
+
- **Examples**: A chatbot responding to queries or a search engine ranking results.
|
| 40 |
+
- **Why it’s important**: Makes the model usable for solving real-world problems.
|
| 41 |
+
|
| 42 |
+
7. **Evaluation and Optimization**
|
| 43 |
+
- **What it is**: Assessing the model’s performance and fine-tuning it for better results.
|
| 44 |
+
- **Examples**: Using metrics like accuracy, precision, recall, or F1-score to evaluate the model.
|
| 45 |
+
- **Why it’s important**: Ensures the model is reliable and effective in its task.
|
| 46 |
+
|
| 47 |
+
8. **Iteration and Improvements**
|
| 48 |
+
- **What it is**: Continuously updating and improving the model based on new data or feedback.
|
| 49 |
+
- **Examples**: Retraining the model when new data is available or tweaking features to improve performance.
|
| 50 |
+
- **Why it’s important**: Keeps the system relevant and accurate over time.
|
| 51 |
+
""")
|
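Note on the added text: steps 2, 3, and 7 can be illustrated with a small, standard-library-only sketch. This is not part of the commit. The stopword list, the function names (`preprocess`, `tf_idf`, `precision_recall_f1`), and the specific TF-IDF variant (length-normalized term frequency times `log(N / df)`) are assumptions chosen for brevity; stemming is left out, and a real page would more likely rely on a library such as scikit-learn or NLTK.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def preprocess(text):
    """Step 2: lowercase, drop punctuation and digits, tokenize, remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only ASCII letters and whitespace
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tf_idf(docs):
    """Step 3: return one {term: weight} dict per tokenized document.

    Weight = (count / document length) * log(N / document frequency),
    which is one of several common TF-IDF variants.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Step 7: precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    docs = [preprocess(d) for d in ["The cat sat, in 2 hats!", "A cat ran to the park."]]
    print(docs)
    print(tf_idf(docs))
    print(precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```

A term that appears in every document (here, `cat`) gets weight 0 under this variant, which shows why TF-IDF down-weights words that do not help distinguish documents.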