marksverdhei committed on
Commit 44444ec · verified · 1 Parent(s): cfc0751

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +111 -6

README.md CHANGED
@@ -1,12 +1,117 @@
  ---
- title: Errant Gec
- emoji: 👀
- colorFrom: green
- colorTo: pink
  sdk: gradio
- sdk_version: 6.0.2
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: ERRANT GEC
+ emoji: "📝"
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
+ sdk_version: 3.19.1
  app_file: app.py
  pinned: false
+ tags:
+ - evaluate
+ - metric
+ - grammatical-error-correction
+ - gec
+ description: ERRANT metric for evaluating grammatical error correction systems
  ---

+ # ERRANT GEC Metric
+
+ ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.
+
+ ## Description
+
+ This metric computes precision, recall, and F-score by comparing the edits needed to transform each source sentence into the prediction against the edits needed to transform it into the reference.
+
+ The metric uses the [ERRANT library](https://github.com/chrisjbryant/errant) to extract and compare edits.
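The edit-level comparison above boils down to set overlap between the system's edits and the gold edits. A minimal sketch with plain Python sets (the `(start, end, correction)` tuples below are illustrative placeholders, not ERRANT's actual edit objects):

```python
# Edits as (start, end, correction) tuples; illustrative stand-ins for
# the edits ERRANT extracts by aligning source and corrected sentences.
hyp_edits = {(1, 2, "is"), (4, 4, "very")}   # edits the system proposed
ref_edits = {(1, 2, "is"), (6, 7, ".")}      # edits in the gold annotation

tp = len(hyp_edits & ref_edits)              # correctly proposed edits
precision = tp / len(hyp_edits)              # 1 of 2 proposed edits is right
recall = tp / len(ref_edits)                 # 1 of 2 gold edits was found
print(precision, recall)                     # 0.5 0.5
```

The F-score reported by the metric is then the beta-weighted harmonic mean of these two numbers.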
+ ## Installation
+
+ ```bash
+ pip install evaluate errant spacy
+ # Install the appropriate spaCy model for your language
+ python -m spacy download en_core_web_sm  # English
+ python -m spacy download nb_core_news_sm  # Norwegian
+ ```
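spaCy models install as ordinary Python packages, so a quick pre-flight check with the standard library can confirm a model is available before loading the metric (the helper name `model_installed` is ours, not part of spaCy):

```python
import importlib.util

def model_installed(name: str) -> bool:
    """Return True if a package (e.g. a downloaded spaCy model) is importable."""
    return importlib.util.find_spec(name) is not None

# e.g. model_installed("en_core_web_sm") after `python -m spacy download en_core_web_sm`
print(model_installed("en_core_web_sm"))
```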
+
+ ## Usage
+
+ ```python
+ import evaluate
+
+ errant_gec = evaluate.load("marksverdhei/errant_gec")
+
+ results = errant_gec.compute(
+     sources=["This are a sentence ."],
+     predictions=["This is a sentence ."],
+     references=["This is a sentence ."],
+     lang="en",
+ )
+
+ print(results)
+ # {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
+ ```
+
+ ## Inputs
+
+ - **sources** (`list[str]`): The original (uncorrected) sentences
+ - **predictions** (`list[str]`): The model's corrected sentences
+ - **references** (`list[str]`): The gold-standard corrected sentences
+ - **lang** (`str`, optional): Language code for the spaCy model. Default: `"en"`
+   - `"en"`: English (requires `en_core_web_sm`)
+   - `"nb"`: Norwegian Bokmål (requires `nb_core_news_sm`)
+   - `"de"`: German (requires `de_core_news_sm`)
+   - etc. (any language with a spaCy model)
+ - **beta** (`float`, optional): Beta value for the F-score calculation. Default: `0.5`
+
+ ## Outputs
+
+ - **precision** (`float`): Fraction of predicted edits that are correct
+ - **recall** (`float`): Fraction of gold edits that were predicted
+ - **f{beta}** (`float`): F-score with the specified beta value (default key: `f0.5`)
+
+ ## Example with Norwegian
+
+ ```python
+ import evaluate
+
+ errant_gec = evaluate.load("marksverdhei/errant_gec")
+
+ results = errant_gec.compute(
+     sources=["Jeg har spist mye mat i går ."],
+     predictions=["Jeg spiste mye mat i går ."],
+     references=["Jeg spiste mye mat i går ."],
+     lang="nb",
+ )
+ ```
+
+ ## Why F0.5?
+
+ In grammatical error correction, precision is typically weighted more heavily than recall (beta = 0.5) because:
+ - False positives (incorrect "corrections") are more harmful to the user experience
+ - It is better to miss some errors than to introduce new ones
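The asymmetry is easy to see numerically: with beta = 0.5, losing precision costs far more than losing the same amount of recall. A quick illustration of the standard F-beta formula in plain Python (independent of the metric itself):

```python
def f_beta(p: float, r: float, beta: float = 0.5) -> float:
    """Beta-weighted harmonic mean of precision p and recall r."""
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# Same scores, opposite direction: F0.5 rewards the more precise system.
print(f_beta(0.9, 0.5))  # high precision, low recall  -> ≈ 0.776
print(f_beta(0.5, 0.9))  # low precision, high recall  -> ≈ 0.549
```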
+
+ ## Limitations
+
+ - Requires the appropriate spaCy model to be installed for the target language
+ - ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model
+ - The metric operates at the edit level, not the sentence level
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{bryant-etal-2017-automatic,
+     title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
+     author = "Bryant, Christopher and
+       Felice, Mariano and
+       Briscoe, Ted",
+     booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+     month = jul,
+     year = "2017",
+     address = "Vancouver, Canada",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/P17-1074",
+     doi = "10.18653/v1/P17-1074",
+     pages = "793--805",
+ }
+ ```