More renaming phone_distance to phone_errors
- README.md +7 -7
- phone_errors.py +5 -5
README.md CHANGED
@@ -13,7 +13,7 @@ app_file: app.py
 pinned: false
 ---

-# Metric Card for Phone
+# Metric Card for Phone Errors

 ## Metric Description
 Error rates in terms of distance between articulatory phonological features can help understand differences between strings in the International Phonetic Alphabet (IPA) in a linguistically motivated way.
@@ -23,8 +23,8 @@ This is useful when evaluating speech recognition or orthographic to IPA convers

 ```python
 import evaluate
-
-
+phone_errors = evaluate.load("ginic/phone_errors")
+phone_errors.compute(predictions=["bob", "ði"], references=["pop", "ðə"])
 ```

 ### Inputs
@@ -51,7 +51,7 @@ The computation returns a dictionary with the following key and values:

 Simplest use case to compute phone error rates between two IPA strings:
 ```python
->>>
+>>> phone_errors.compute(predictions=["bob", "ði", "spin"], references=["pop", "ðə", "spʰin"])
 {'phone_error_rates': [0.6666666666666666, 0.5, 0.25], 'mean_phone_error_rate': 0.47222222222222215,
 'phone_feature_error_rates': [0.08333333333333333, 0.125, 0.041666666666666664], 'mean_phone_feature_error_rate': 0.08333333333333333,
 'feature_error_rates': [0.027777777777777776, 0.0625, 0.30208333333333337], 'mean_feature_error_rate': 0.13078703703703706}
@@ -59,7 +59,7 @@ Simplest use case to compute phone error rates between two IPA strings:

 Normalize phone feature error rate by the length of the reference string:
 ```python
->>>
+>>> phone_errors.compute(predictions=["bob", "ði"], references=["pop", "ðə"], is_normalize_pfer=True)
 {'phone_error_rates': [0.6666666666666666, 0.5], 'mean_phone_error_rate': 0.5833333333333333,
 'phone_feature_error_rates': [0.027777777777777776, 0.0625], 'mean_phone_feature_error_rate': 0.04513888888888889,
 'feature_error_rates': [0.027777777777777776, 0.0625], 'mean_feature_error_rate': 0.04513888888888889}
@@ -67,7 +67,7 @@ Normalize phone feature error rate by the length of the reference string:

 Error rates may be greater than 1.0 if the reference string is shorter than the prediction string:
 ```python
->>>
+>>> phone_errors.compute(predictions=["bob"], references=["po"])
 {'phone_error_rates': [1.0], 'mean_phone_error_rate': 1.0,
 'phone_feature_error_rates': [1.0416666666666667], 'mean_phone_feature_error_rate': 1.0416666666666667,
 'feature_error_rates': [0.020833333333333332], 'mean_feature_error_rate': 0.020833333333333332}
@@ -75,7 +75,7 @@ Error rates may be greater than 1.0 if the reference string is shorter than the

 Empty reference strings will cause an ValueError, you should handle them separately:
 ```python
->>>
+>>> phone_errors.compute(predictions=["bob"], references=[""])
 Traceback (most recent call last):
 ...
 raise ValueError("one or more references are empty strings")
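The README asks callers to handle empty reference strings separately, since they raise a `ValueError`. One possible way to do that is to filter such pairs out before calling `compute`. The sketch below is only an illustration: `split_empty_references` is a hypothetical helper, not part of `phone_errors` or `evaluate`, and it assumes that only the exact empty string `""` needs to be excluded.

```python
from typing import List, Tuple


def split_empty_references(
    predictions: List[str], references: List[str]
) -> Tuple[List[str], List[str], List[int]]:
    """Hypothetical helper (not part of phone_errors).

    Drops every pair whose reference is the empty string and returns the
    remaining predictions, the remaining references, and the original
    indices of the dropped pairs.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    kept_predictions: List[str] = []
    kept_references: List[str] = []
    skipped: List[int] = []
    for index, (prediction, reference) in enumerate(zip(predictions, references)):
        if reference == "":
            # Remember where the empty reference was so it can be reported.
            skipped.append(index)
        else:
            kept_predictions.append(prediction)
            kept_references.append(reference)
    return kept_predictions, kept_references, skipped


predictions = ["bob", "ði", "bob"]
references = ["pop", "ðə", ""]
kept_predictions, kept_references, skipped = split_empty_references(
    predictions, references
)
print(kept_predictions, kept_references, skipped)
```

The filtered lists can then be passed on, for example as `phone_errors.compute(predictions=kept_predictions, references=kept_references)`, while the skipped indices are logged or scored by whatever rule suits the evaluation.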
phone_errors.py CHANGED
@@ -71,13 +71,13 @@ Returns:

 Examples:
 Compare articulatory differences in voicing in "bob" vs. "pop" and different pronunciations of "the":
->>>
->>>
+>>> phone_errors = evaluate.load("ginic/phone_errors")
+>>> phone_errors.compute(predictions=["bob", "ði"], references=["pop", "ðə"])
 {'phone_error_rates': [0.6666666666666666, 0.5], 'mean_phone_error_rate': 0.5833333333333333, 'phone_feature_error_rates': [0.08333333333333333, 0.125], 'mean_phone_feature_error_rate': 0.10416666666666666, 'feature_error_rates': [0.027777777777777776, 0.0625], 'mean_feature_error_rate': 0.04513888888888889}

 Normalize PFER by the length of string with largest number of phones:
->>>
->>>
+>>> phone_errors = evaluate.load("ginic/phone_errors")
+>>> phone_errors.compute(predictions=["bob", "ði"], references=["pop", "ðə"], is_normalize_pfer=True)

 """

@@ -132,7 +132,7 @@ class PhoneDistance(evaluate.Metric):
 'references': datasets.Value('string', id="sequence"),
 }),
 # Additional links to the codebase or references
-codebase_urls=["https://github.com/dmort27/panphon", "https://huggingface.co/spaces/ginic/
+codebase_urls=["https://github.com/dmort27/panphon", "https://huggingface.co/spaces/ginic/phone_errors/tree/main"],
 reference_urls=["https://pypi.org/project/panphon/", "https://arxiv.org/abs/2308.03917"]
 )

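The output shown in the `phone_errors.py` docstring example is consistent with each `mean_*` value being the arithmetic mean of its per-pair list. The following check is a small sketch of that reading, using the numbers copied from the docstring; `means_match` is a hypothetical helper name, and the key pairing is an assumption drawn from the example output rather than from the metric's source.

```python
import math

# Values copied from the docstring example for
# predictions=["bob", "ði"], references=["pop", "ðə"].
result = {
    "phone_error_rates": [0.6666666666666666, 0.5],
    "mean_phone_error_rate": 0.5833333333333333,
    "phone_feature_error_rates": [0.08333333333333333, 0.125],
    "mean_phone_feature_error_rate": 0.10416666666666666,
    "feature_error_rates": [0.027777777777777776, 0.0625],
    "mean_feature_error_rate": 0.04513888888888889,
}

# Assumed pairing of each per-pair list with its summary value.
KEY_PAIRS = [
    ("phone_error_rates", "mean_phone_error_rate"),
    ("phone_feature_error_rates", "mean_phone_feature_error_rate"),
    ("feature_error_rates", "mean_feature_error_rate"),
]


def means_match(metrics: dict) -> bool:
    """Hypothetical check: is every mean the average of its per-pair list?"""
    for list_key, mean_key in KEY_PAIRS:
        values = metrics[list_key]
        expected = sum(values) / len(values)
        if not math.isclose(metrics[mean_key], expected):
            return False
    return True


print(means_match(result))
```

A check like this can double as a lightweight regression test when the metric is renamed or refactored, as in this commit.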