Harika22 committed
Commit 5f0db14 · verified · 1 Parent(s): 7b57fd0

Update pages/3_Terminology.py

Files changed (1)
  1. pages/3_Terminology.py +77 -12
pages/3_Terminology.py CHANGED
@@ -26,6 +26,47 @@ st.markdown(
         color: black
         animation: fadeIn 2s ease-in-out;
     }
+
+    /* Style for headers */
+    h2 {
+        color: violet;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 600;
+        margin-top: 30px;
+    }
+
+    /* Style for subheaders */
+    h3 {
+        color: green;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 500;
+        margin-top: 20px;
+    }
+    .custom-subheader {
+        color: #00FFFF;
+        font-family: 'Roboto', sans-serif;
+        font-weight: 600;
+        margin-bottom: 15px;
+    }
+
+    .icon-bullet {
+        list-style-type: none;
+        padding-left: 20px;
+    }
+
+    .icon-bullet li {
+        font-family: 'Georgia', serif;
+        font-size: 1.1em;
+        margin-bottom: 10px;
+        color: black;
+    }
+
+    .icon-bullet li::before {
+        content: "◆";
+        padding-right: 10px;
+        color: black;
+    }
+
     .section {
         font-size: 1.1rem;
         text-align: justify;
@@ -40,7 +81,7 @@ st.markdown(
     }
     .term {
         font-weight: bold;
-        color: black
+        color: red;
         animation: fadeIn 3s ease-in-out;
     }
     .definition {
@@ -63,7 +104,7 @@ st.markdown(
 st.markdown(
     """
     <p class="section"><span class="term">Documents</span><br>
-    It is a collection of sentence / paragraph / single word / single character
+    A document is defined as a collection of sentences / paragraphs / single words / single characters.
     </p>
     """,
     unsafe_allow_html=True,
@@ -71,8 +112,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Stemming</span><br>
-    Stemming is the process of reducing words to their base or root form. For example, "running" becomes "run." It helps in reducing the complexity of text data by grouping similar words together.
+    <p class="section"><span class="term">Paragraph</span><br>
+    A paragraph is defined as a collection of sentences.
     </p>
     """,
     unsafe_allow_html=True,
@@ -80,8 +121,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Lemmatization</span><br>
-    Lemmatization is a more advanced form of stemming that reduces words to their base form by considering the context and meaning. For example, "better" becomes "good" based on its usage in a sentence.
+    <p class="section"><span class="term">Sentence</span><br>
+    A sentence is defined as a collection of words.
     </p>
     """,
     unsafe_allow_html=True,
@@ -89,8 +130,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Named Entity Recognition (NER)</span><br>
-    NER is the task of identifying and classifying named entities in text, such as person names, locations, organizations, and dates. This technique is useful in tasks like information retrieval and summarization.
+    <p class="section"><span class="term">Words</span><br>
+    A word is defined as a collection of characters.
     </p>
     """,
     unsafe_allow_html=True,
@@ -98,8 +139,8 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Part-of-Speech (POS) Tagging</span><br>
-    POS tagging involves labeling each word in a sentence with its grammatical category, such as noun, verb, or adjective. It helps in understanding the syntactic structure of the text.
+    <p class="section"><span class="term">Character</span><br>
+    A character can be a number, a letter of the alphabet, or a special symbol.
     </p>
     """,
     unsafe_allow_html=True,
@@ -107,12 +148,36 @@ st.markdown(
 
 st.markdown(
     """
-    <p class="section"><span class="term">Word Embeddings</span><br>
-    Word embeddings are numerical representations of words in a continuous vector space, where similar words are closer together. Common techniques include Word2Vec, GloVe, and FastText.
+    <p class="section"><span class="term">Tokenization</span><br>
+    Tokenization is a technique for splitting a large chunk of text into smaller units, known as tokens.
     </p>
     """,
     unsafe_allow_html=True,
 )
+st.header("Types of Tokenization")
+st.markdown("""
+<ul class="icon-bullet">
+    <li>Sentence tokenization</li>
+    <li>Word tokenization</li>
+    <li>Character tokenization</li>
+</ul>
+""", unsafe_allow_html=True)
+
+st.subheader("Sentence tokenization")
+st.markdown('''
+- Sentence tokenization splits a large chunk of text into tokens, where each token is a sentence.
+''')
+
+st.subheader("Word tokenization")
+st.markdown('''
+- Word tokenization splits a large chunk of text into tokens, where each token is a word.
+''')
+
+st.subheader("Character tokenization")
+st.markdown('''
+- Character tokenization splits a large chunk of text into tokens, where each token is a character.
+''')
+
 
 st.markdown(
     """