updated FAQ
#1
by silvia-st - opened
app.py
CHANGED
|
@@ -380,7 +380,7 @@ if selected == "About":
|
|
| 380 |
|
| 381 |
This interface was developed in the framework of Silvia Stopponi’s PhD project, \
|
| 382 |
supervised by Saskia Peels-Matthey and Malvina Nissim at the University of Groningen (The Netherlands). \
|
| 383 |
-
The aim of this tool is to make
|
| 384 |
|
| 385 |
The following people were involved in the creation of this interface:
|
| 386 |
|
|
@@ -415,8 +415,8 @@ if selected == "FAQ":
|
|
| 415 |
|
| 416 |
with st.expander(r"$\textsf{\Large What is this interface based on?}$"):
|
| 417 |
st.write(
|
| 418 |
-
"This interface is based on
|
| 419 |
-
|
| 420 |
This happens during the training phase, in which models process a corpus of texts in the \
|
| 421 |
target language(s). Once trained, linguistic information can be extracted from the models, or \
|
| 422 |
the models can be used to perform specific linguistic tasks. In this interface, we focus on the \
|
|
@@ -427,12 +427,12 @@ if selected == "FAQ":
|
|
| 427 |
|
| 428 |
with st.expander(r"$\textsf{\Large What are Word Embeddings?}$"):
|
| 429 |
st.write(
|
| 430 |
-
"Word Embeddings are representations of words obtained via
|
| 431 |
-
detail, they are
|
| 432 |
represent each word in the training corpus in a multi-dimensional space. Words that are more \
|
| 433 |
similar in meaning will be closer to one another in this vector space (or semantic space) than \
|
| 434 |
words that are less similar in meaning. The term *word embeddings* is often used as a \
|
| 435 |
-
synonym of *predict models*, a type of
|
| 436 |
with the Word2Vec architecture. This interface is built upon Word2Vec models."
|
| 437 |
)
|
| 438 |
|
|
@@ -536,7 +536,7 @@ if selected == "FAQ":
|
|
| 536 |
meaning, in its specific training corpus. \
|
| 537 |
\
|
| 538 |
Please take into account that the results for words occurring very rarely may be inaccurate. \
|
| 539 |
-
|
| 540 |
may not provide enough evidence to obtain reliable results. But it has been observed that an \
|
| 541 |
extremely high word frequency can also affect the results. It often happens that the nearest \
|
| 542 |
neighbours to words occurring very often are other high-frequency words, such as stop \
|
|
|
|
| 380 |
|
| 381 |
This interface was developed in the framework of Silvia Stopponi’s PhD project, \
|
| 382 |
supervised by Saskia Peels-Matthey and Malvina Nissim at the University of Groningen (The Netherlands). \
|
| 383 |
+
The aim of this tool is to make distributional semantic models trained on Ancient Greek available to all interested people, regardless of their coding skills. \
|
| 384 |
|
| 385 |
The following people were involved in the creation of this interface:
|
| 386 |
|
|
|
|
| 415 |
|
| 416 |
with st.expander(r"$\textsf{\Large What is this interface based on?}$"):
|
| 417 |
st.write(
|
| 418 |
+
"This interface is based on distributional semantic models. Distributional semantic models \
|
| 419 |
+
are computational models that store statistical information about word co-occurrences. \
|
| 420 |
This happens during the training phase, in which models process a corpus of texts in the \
|
| 421 |
target language(s). Once trained, linguistic information can be extracted from the models, or \
|
| 422 |
the models can be used to perform specific linguistic tasks. In this interface, we focus on the \
|
|
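Reviewer note: the FAQ text above says that distributional semantic models store statistical information about word co-occurrences. As a rough illustration of what such a statistic can look like, here is a minimal sketch that counts co-occurrences within a fixed window. The function name `cooccurrence_counts`, the `window` parameter, and the toy English corpus are all assumptions made for illustration; none of them come from `app.py`, and Word2Vec itself learns by prediction rather than by storing raw counts like this.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each (target, context) pair occurs within `window` tokens.

    Hypothetical helper for illustration only; not part of app.py.
    """
    counts = Counter()
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    counts[(target, tokens[j])] += 1
    return counts


# Toy corpus (made up); a real model would be trained on a large text collection.
toy_corpus = [
    ["the", "poet", "sings", "the", "song"],
    ["the", "poet", "writes", "the", "song"],
]
counts = cooccurrence_counts(toy_corpus, window=1)
print(counts[("poet", "the")])
```

Words that appear in similar contexts end up with similar count profiles, which is the intuition the FAQ entry relies on.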
|
|
| 427 |
|
| 428 |
with st.expander(r"$\textsf{\Large What are Word Embeddings?}$"):
|
| 429 |
st.write(
|
| 430 |
+
"Word Embeddings are representations of words obtained via training on a corpus of texts. In more \
|
| 431 |
+
detail, they are ordered sequences of numbers (called *vectors*) produced by a model to \
|
| 432 |
represent each word in the training corpus in a multi-dimensional space. Words that are more \
|
| 433 |
similar in meaning will be closer to one another in this vector space (or semantic space) than \
|
| 434 |
words that are less similar in meaning. The term *word embeddings* is often used as a \
|
| 435 |
+
synonym for *predict models*, a type of distributional semantic model introduced by Mikolov *et al.* (2013) \
|
| 436 |
with the Word2Vec architecture. This interface is built upon Word2Vec models."
|
| 437 |
)
|
| 438 |
|
|
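Reviewer note: the FAQ entry above describes words with similar meanings as lying closer together in the vector space. One common way to measure that closeness is cosine similarity; the sketch below assumes that measure, and the three-dimensional vectors are invented toy values, not output from the models behind this interface.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy embeddings; real Word2Vec vectors typically have many more dimensions.
toy_vectors = {
    "sea": [0.9, 0.1, 0.0],
    "wave": [0.8, 0.2, 0.1],
    "law": [0.0, 0.1, 0.9],
}

sim_close = cosine_similarity(toy_vectors["sea"], toy_vectors["wave"])
sim_far = cosine_similarity(toy_vectors["sea"], toy_vectors["law"])
print(sim_close > sim_far)
```

In this toy setup "sea" and "wave" point in nearly the same direction, so their similarity is higher than that of "sea" and "law".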
|
|
| 536 |
meaning, in its specific training corpus. \
|
| 537 |
\
|
| 538 |
Please take into account that the results for words occurring very rarely may be inaccurate. \
|
| 539 |
+
Distributional semantic models learn on a statistical basis, so a word with only a few occurrences \
|
| 540 |
may not provide enough evidence to obtain reliable results. But it has been observed that an \
|
| 541 |
extremely high word frequency can also affect the results. It often happens that the nearest \
|
| 542 |
neighbours to words occurring very often are other high-frequency words, such as stop \
|
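Reviewer note: the caveat above about very rare and very frequent words could be surfaced to users as a simple check on the query word's corpus frequency. The following is only a sketch under stated assumptions: the function `frequency_note`, the thresholds `min_count` and `max_count`, and the frequency table are hypothetical, and suitable cut-offs would depend on the corpus.

```python
def frequency_note(word, frequencies, min_count=5, max_count=10_000):
    """Classify a query word by corpus frequency (hypothetical thresholds)."""
    count = frequencies.get(word, 0)
    if count < min_count:
        return "rare"  # too little evidence for reliable neighbours
    if count > max_count:
        return "very frequent"  # neighbours may be dominated by other frequent words
    return "ok"


# Invented counts, for illustration only.
toy_frequencies = {"hapax": 1, "common": 250, "article": 50_000}
for word in toy_frequencies:
    print(word, frequency_note(word, toy_frequencies))
```

A check like this does not make the results more accurate; it only tells the user when to read them with extra caution.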