LakshmiHarika committed 475aeed · verified · 1 Parent(s): 276329f

Create 3.Terminology

Files changed (1)
  1. pages/3.Terminology +136 -0
pages/3.Terminology ADDED
@@ -0,0 +1,136 @@
+ import streamlit as st
+
+ st.markdown("""
+ <style>
+ /* Set a soft background color */
+ body {
+     background-color: #eef2f7;
+ }
+ /* Style for main title */
+ h1 {
+     color: black;
+     font-family: 'Roboto', sans-serif;
+     font-weight: 700;
+     text-align: center;
+     margin-bottom: 25px;
+ }
+ /* Style for headers */
+ h2 {
+     color: red;
+     font-family: 'Roboto', sans-serif;
+     font-weight: 600;
+     margin-top: 30px;
+ }
+
+ /* Style for subheaders */
+ h3 {
+     color: violet;
+     font-family: 'Roboto', sans-serif;
+     font-weight: 500;
+     margin-top: 20px;
+ }
+ .custom-subheader {
+     color: violet;
+     font-family: 'Roboto', sans-serif;
+     font-weight: 600;
+     margin-bottom: 15px;
+ }
+ /* Paragraph styling */
+ p {
+     font-family: 'Georgia', serif;
+     line-height: 1.8;
+     color: black;
+     margin-bottom: 20px;
+ }
+ /* List styling with diamond bullets */
+ .icon-bullet {
+     list-style-type: none;
+     padding-left: 20px;
+ }
+ .icon-bullet li {
+     font-family: 'Georgia', serif;
+     font-size: 1.1em;
+     margin-bottom: 10px;
+     color: black;
+ }
+ .icon-bullet li::before {
+     content: "◆";
+     padding-right: 10px;
+     color: black;
+ }
+ /* Sidebar styling */
+ .sidebar .sidebar-content {
+     background-color: #ffffff;
+     border-radius: 10px;
+     padding: 15px;
+ }
+ .sidebar h2 {
+     color: #495057;
+ }
+ /* Custom button style */
+ .streamlit-button {
+     background-color: #00FFFF;
+     color: #000000;
+     font-weight: bold;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+
+ st.markdown("<h1 class='title'>📖 NLP Terminology</h1>", unsafe_allow_html=True)
+ st.markdown("<p class='caption'>✨ Explore essential terms in Natural Language Processing and their meanings!</p>", unsafe_allow_html=True)
+
+ st.header("📁 Corpus")
+ st.markdown("- **A corpus** is a collection of documents.")
+
+ st.header("📄 Document")
+ st.markdown("- **A document** can be a collection of paragraphs or sentences, or something as small as a single word or even a single character.")
+
+ st.header("📝 Paragraph")
+ st.markdown("- **A paragraph** consists of multiple sentences.")
+
+ st.header("📒 Sentence")
+ st.markdown("- **A sentence** is a collection of words.")
+
+ st.header("🔤 Word")
+ st.markdown("- **Words** are made up of characters.")
+
+ st.header("🔠 Character")
+ st.markdown("- **A character** can be a letter, a digit, or a special symbol.")
+
+ st.header("✂️ Tokenization")
+ st.markdown("- **Tokenization** is the technique of converting a large chunk of text into smaller units, which are known as **tokens**.")
+
+ st.subheader("🛠️ Types of Tokenization")
+ st.markdown("""
+ - 🔹 **Sentence Tokenization** – Splits text into sentences.
+ - 🔹 **Word Tokenization** – Splits sentences into words.
+ - 🔹 **Character Tokenization** – Splits words into individual characters.
+ """)
+
+ st.subheader("📝 Sentence Tokenization")
+ st.markdown("- **Breaks a large text into meaningful sentence units.**")
+
+ st.subheader("📖 Word Tokenization")
+ st.markdown("- **Splits a sentence into individual words.**")
+
+ st.subheader("🔑 Character Tokenization")
+ st.markdown("- **Breaks words into separate characters.**")
+
+ st.header("🚫 Stop Words")
+ st.markdown("- **Stop words** are common words (e.g., 'the', 'is', 'and') that add little meaning on their own but maintain grammatical structure.")
+
+ st.header("📊 Vectorization")
+ st.markdown("- **Vectorization** transforms text into a numerical representation that machine learning models can work with.")
+
+ st.subheader("🔒 Different Types of Vectorization Techniques")
+ st.markdown("""
+ - 🎯 **One-Hot Encoding**
+ - 🏷️ **Bag of Words (BoW)**
+ - 📊 **TF-IDF (Term Frequency-Inverse Document Frequency)**
+ - 🧠 **Word2Vec**
+ - 🌍 **GloVe**
+ - ⚡ **FastText**
+ """)
+
+ st.success("🚀 Mastering these **NLP terms** will help you build powerful text-processing applications!")
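The Corpus → Document → Paragraph → Sentence → Word → Character hierarchy defined on this page can be pictured as nested Python lists. This is only an illustrative sketch: the sample `corpus` and the `split_levels` helper are hypothetical, and it assumes paragraphs are separated by a blank line, sentences end with a period, and words are separated by whitespace. Real text needs a proper tokenizer.

```python
# Hypothetical sample corpus: a corpus is simply a list of documents.
corpus = [
    "NLP is fun. It has many uses.\n\nTokens are units of text.",
    "Stop words are common.",
]


def split_levels(document):
    """Split one document into paragraphs -> sentences -> words -> characters.

    Naive assumptions (illustration only): paragraphs are separated by a blank
    line, sentences end with '.', and words are separated by whitespace.
    """
    levels = []
    for paragraph in document.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        sentences = [s.strip() for s in paragraph.split(".") if s.strip()]
        # Each sentence becomes a list of words; each word a list of characters.
        levels.append([[list(word) for word in s.split()] for s in sentences])
    return levels


structure = split_levels(corpus[0])
print(len(structure))      # 2 paragraphs
print(len(structure[0]))   # 2 sentences in the first paragraph
print(structure[0][0][0])  # ['N', 'L', 'P']
```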
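The three tokenization types listed on this page (sentence, word, character) can be sketched with Python's standard `re` module. This is a deliberately naive, regex-based stand-in chosen so the example runs without extra dependencies; libraries such as NLTK and spaCy ship more robust tokenizers. The function names are hypothetical, and the sentence splitter will, for example, wrongly split after abbreviations like "Dr.".

```python
import re


def sentence_tokenize(text):
    # Split wherever '.', '!' or '?' is followed by whitespace.
    # Naive: abbreviations such as "Dr. Smith" are split incorrectly.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    # A word is a run of word characters, optionally joined by one apostrophe
    # ("Isn't"); every other non-space symbol becomes its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def char_tokenize(word):
    # Character tokenization: every character is its own token.
    return list(word)


text = "NLP is fun! Tokenization splits text. Isn't it useful?"
sentences = sentence_tokenize(text)
print(sentences)                    # ['NLP is fun!', 'Tokenization splits text.', "Isn't it useful?"]
print(word_tokenize(sentences[2]))  # ["Isn't", 'it', 'useful', '?']
print(char_tokenize("NLP"))         # ['N', 'L', 'P']
```

Note how punctuation becomes separate tokens in word tokenization; whether to keep or drop those tokens is a per-task choice.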
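Removing stop words is typically a simple filter over a list of tokens. The minimal sketch below uses a tiny, hand-picked `STOP_WORDS` set as an assumption; libraries such as NLTK and spaCy provide much longer, language-specific lists, and which words should count as stop words depends on the task.

```python
# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to", "in", "it"}


def remove_stop_words(tokens):
    # Compare in lowercase so "The" and "the" are treated the same.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "The cat is on the mat and it is happy".split()
print(remove_stop_words(tokens))  # ['cat', 'on', 'mat', 'happy']
```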
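One-hot encoding, the first vectorization technique listed on this page, represents each word as a vector as long as the vocabulary, with a single 1 at that word's index and 0 everywhere else. A minimal pure-Python sketch, assuming the vocabulary is built from the tokens in order of first appearance; `build_vocab` and `one_hot` are hypothetical helper names.

```python
def build_vocab(tokens):
    # Map each distinct token to an index, in order of first appearance.
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot(token, vocab):
    # All zeros except a 1 at the token's index (raises KeyError for unknown words).
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


tokens = ["nlp", "is", "fun", "nlp"]
vocab = build_vocab(tokens)
print(vocab)                  # {'nlp': 0, 'is': 1, 'fun': 2}
print(one_hot("fun", vocab))  # [0, 0, 1]
```

Because each vector has one dimension per vocabulary word, one-hot vectors grow with the vocabulary and encode no similarity between words.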
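Bag of Words (BoW) turns each document into a vector of word counts over a shared vocabulary, ignoring word order. This sketch assumes lowercase, whitespace-split tokens and an alphabetically sorted vocabulary; scikit-learn's `CountVectorizer` implements the same idea with its own default tokenization, so its output can differ.

```python
from collections import Counter


def bag_of_words(documents):
    # Lowercase and split on whitespace (simplifying assumption).
    tokenized = [doc.lower().split() for doc in documents]
    # Shared vocabulary across all documents, sorted for a stable column order.
    vocab = sorted({word for doc in tokenized for word in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # One count per vocabulary word; missing words count as 0.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


docs = ["the cat sat", "the cat sat on the mat"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```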
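TF-IDF weights each term count by how rare the term is across the corpus. One common textbook form is tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The sketch below uses exactly that variant as an assumption; libraries differ (scikit-learn's `TfidfVectorizer`, for instance, smooths the idf and normalizes rows by default), so the numbers will not match theirs.

```python
import math
from collections import Counter


def tf_idf(documents):
    # Lowercase, whitespace-split tokens (simplifying assumption).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    vocab = sorted({word for doc in tokenized for word in doc})
    # Document frequency: in how many documents each term appears.
    df = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        # Term frequency (count / document length) times inverse document frequency.
        rows.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, rows


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, rows = tf_idf(docs)
print(vocab)  # ['cat', 'dog', 'ran', 'sat', 'the']
# 'the' appears in every document, so idf = log(3/3) = 0 and its weight is 0.
for doc, row in zip(docs, rows):
    print(doc, [round(x, 3) for x in row])
```

Terms that appear in every document get zero weight, while rarer terms such as "dog" and "ran" get the largest weights.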