Creating description
app.py
@@ -476,7 +476,26 @@ with gr.Blocks(title="Automatic Literacy and Speech Assesmen") as demo:
 text = gr.Textbox()
 phones = gr.Textbox()
 
-
+gr.Markdown("""**Reading Difficulty**: Automatically determining how difficult a text is to read is hard because the underlying
+semantics matter. To compute text difficulty efficiently, a pre-trained DistilBERT model is fine-tuned for regression
+on the CommonLit Ease of Readability (CLEAR) Corpus. The model scores a text by how difficult it would be for a student
+to understand.
+""")
+gr.Markdown("""**Lexical Diversity**: The lexical diversity score is the square of the ratio of unique similar words to total similar words.
+Two words count as similar if the cosine similarity of their word2vec embeddings is greater than 0.75. Repeating the same word when an
+alternative is available is poor practice in writing and speech. Vocabulary diversity is usually computed as the ratio of unique
+strings to total strings, but that ratio cannot tell whether the person has a large vocabulary or the topic simply does not require a diverse
+vocabulary. This algorithm instead scores the text only by how many distinct words were chosen to express each semantic idea. For example,
+"Forest" and "Trees" are two words expressing one semantic idea, so this pair receives a 100% lexical diversity score, whereas using the word
+"Forest" twice yields a 25% diversity score: (1 unique word / 2 total words)^2.
+""")
+gr.Markdown("""**Speech Pronunciation Scoring**: The Wav2Vec 2.0 model converts audio into text in real time. The model predicts words or phonemes
+(the smallest units of speech that distinguish one word, or word element, from another) from the user's input audio. Because of how the model works,
+users with poor pronunciation get inaccurate transcripts. This project attempts to score pronunciation by asking the user to read a target excerpt into the
+microphone. The audio is then passed through Wav2Vec 2.0 to infer the intended words. The loss is the Levenshtein distance between the
+target and actual transcripts, where the Levenshtein distance between two strings is the minimum number of single-character edits (insertions,
+deletions, or substitutions) required to change one into the other.
+""")
 
 
 grade.click(reading_difficulty, inputs=in_text, outputs=diff_output)
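The Lexical Diversity description in the diff above leaves some details open, such as how the score combines several groups of similar words. The sketch below is one possible reading of it, not the project's actual code: a token counts as a "similar word" if its embedding has cosine similarity above 0.75 with some other token in the text (a repeated word is trivially similar to itself), and the score is (unique similar words / total similar words)^2. The tiny `TOY_EMBEDDINGS` table is a hypothetical stand-in for real word2vec vectors, and returning 1.0 when no similar words appear is also an assumption.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical 3-d vectors standing in for real word2vec embeddings.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "forest": [0.9, 0.1, 0.0],
    "trees": [0.8, 0.3, 0.0],
    "car": [0.0, 0.1, 0.9],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexical_diversity(
    words: List[str],
    embeddings: Dict[str, Vector],
    threshold: float = 0.75,
) -> float:
    """(unique similar words / total similar words) ** 2, under the assumptions above."""
    # Case-fold and drop tokens that have no embedding (assumption).
    tokens = [w.lower() for w in words if w.lower() in embeddings]
    # A token is "similar" if it is close to at least one *other* token.
    similar = [
        w
        for i, w in enumerate(tokens)
        if any(
            i != j and cosine(embeddings[w], embeddings[other]) > threshold
            for j, other in enumerate(tokens)
        )
    ]
    if not similar:
        return 1.0  # no repeated semantic ideas: treated as fully diverse (assumption)
    return (len(set(similar)) / len(similar)) ** 2


if __name__ == "__main__":
    print(lexical_diversity(["Forest", "Trees"], TOY_EMBEDDINGS))   # 1.0
    print(lexical_diversity(["Forest", "Forest"], TOY_EMBEDDINGS))  # 0.25
```

The two printed cases reproduce the "Forest"/"Trees" and "Forest"/"Forest" examples from the description. Squaring the ratio makes the score drop faster than the plain unique-to-total ratio as repetition increases.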
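The Speech Pronunciation Scoring description uses the Levenshtein distance between the target excerpt and the Wav2Vec 2.0 transcript as the loss. Below is a minimal sketch of that loss using the standard two-row dynamic-programming algorithm. It assumes the transcript is already a plain string produced by the speech model (no audio handling here), and the `pronunciation_loss` helper, including its case and whitespace normalization, is hypothetical rather than taken from app.py.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def pronunciation_loss(target: str, transcript: str) -> int:
    """Hypothetical helper: Levenshtein distance between a target excerpt and
    a speech-model transcript, after case-folding and collapsing whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())

    return levenshtein(normalize(target), normalize(transcript))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                  # 3
    print(pronunciation_loss("The cat sat", "THE  BAT SAT"))  # 1
```

A raw edit count grows with the length of the excerpt. If scores need to be compared across excerpts of different lengths, dividing by the target length is one common option, though the description does not say whether the app does this.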