---
title: ERRANT GEC
emoji: "📝"
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- grammatical-error-correction
- gec
description: ERRANT metric for evaluating grammatical error correction systems
---
# ERRANT GEC Metric

ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.

## Description

This metric computes precision, recall, and F-score by comparing the edit operations needed to transform source sentences into predictions versus the edit operations needed to transform source sentences into references.

The metric uses the [ERRANT library](https://github.com/chrisjbryant/errant) to extract and compare edits.
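Conceptually, the comparison reduces to set operations over edits. The sketch below is a simplified, library-free illustration of that idea: the edit tuples are hypothetical stand-ins for ERRANT's richer edit objects (which also carry an error type), not the metric's actual internals.

```python
# Sketch: scoring hypothesis edits against reference (gold) edits.
# Each edit is a hashable tuple: (start_token, end_token, correction).

def edit_scores(hyp_edits, ref_edits, beta=0.5):
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)  # edits proposed and present in the gold standard
    fp = len(hyp - ref)  # edits proposed but not in the gold standard
    fn = len(ref - hyp)  # gold edits the system missed
    p = tp / (tp + fp) if hyp else 1.0
    r = tp / (tp + fn) if ref else 1.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0
    return {"precision": p, "recall": r, f"f{beta}": f}

# "This are a sentence ." -> one gold edit: replace token 1 ("are" -> "is")
gold = [(1, 2, "is")]
system = [(1, 2, "is")]  # the system proposes exactly the gold edit
print(edit_scores(system, gold))
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
```

A partially correct system scores lower: proposing one extra spurious edit halves precision while recall stays at 1.0.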
## Installation

```bash
pip install evaluate errant spacy

# Install the appropriate spaCy model for your language
python -m spacy download en_core_web_sm  # English
python -m spacy download nb_core_news_sm  # Norwegian
```
## Usage

```python
import evaluate

errant_gec = evaluate.load("marksverdhei/errant_gec")

results = errant_gec.compute(
    sources=["This are a sentence ."],
    predictions=["This is a sentence ."],
    references=["This is a sentence ."],
    lang="en",
)
print(results)
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
```
## Inputs

- **sources** (`list[str]`): The original (uncorrected) sentences.
- **predictions** (`list[str]`): The model's corrected sentences.
- **references** (`list[str]`): The gold-standard corrected sentences.
- **lang** (`str`, optional): Language code for the spaCy model. Default: `"en"`.
  - `"en"`: English (requires `en_core_web_sm`)
  - `"nb"`: Norwegian Bokmål (requires `nb_core_news_sm`)
  - `"de"`: German (requires `de_core_news_sm`)
  - etc. (any language with a spaCy model)
- **beta** (`float`, optional): Beta value for the F-score calculation. Default: `0.5`.
## Outputs

- **precision** (`float`): Fraction of predicted edits that are correct.
- **recall** (`float`): Fraction of gold edits that were predicted.
- **f{beta}** (`float`): F-score with the specified beta value (default key: `f0.5`).
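For reference, the weighted F-score behind the `f{beta}` key is the standard formula, sketched below (a restatement of the textbook definition, not the metric's internal code; the key name simply interpolates the beta value):

```python
def f_beta(precision, recall, beta=0.5):
    """Standard weighted F-score; beta < 1 weights precision more heavily."""
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# The output key follows the beta value: f0.5, f1.0, f2.0, ...
for beta in (0.5, 1.0, 2.0):
    print(f"f{beta} =", round(f_beta(0.8, 0.4, beta), 3))
```

With precision 0.8 and recall 0.4, the score decreases as beta grows, since larger beta shifts the weight from precision toward recall.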
## Example with Norwegian

```python
import evaluate

errant_gec = evaluate.load("marksverdhei/errant_gec")

results = errant_gec.compute(
    sources=["Jeg har spist mye mat i går ."],   # "I have eaten a lot of food yesterday ."
    predictions=["Jeg spiste mye mat i går ."],  # "I ate a lot of food yesterday ."
    references=["Jeg spiste mye mat i går ."],
    lang="nb",
)
```
## Why F0.5?

In grammatical error correction, precision is typically weighted more heavily than recall (beta = 0.5) because:

- False positives (incorrect "corrections") are more harmful to the user experience.
- It is better to miss some errors than to introduce new ones.
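A small worked comparison makes the weighting concrete. Under the standard weighted F-score (hypothetical precision/recall values chosen for illustration), a conservative system that corrects little but correctly outscores an aggressive one that corrects a lot but sloppily, even though plain F1 would rate them identically:

```python
def f_beta(p, r, beta=0.5):
    # Weighted harmonic mean of precision and recall; beta < 1 favors precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

conservative = f_beta(0.9, 0.3)  # few corrections, mostly right
aggressive = f_beta(0.3, 0.9)    # many corrections, many wrong
print(round(conservative, 3), round(aggressive, 3))
# 0.643 0.346
```

Under F1 (beta = 1) both systems would score 0.45; F0.5 breaks the tie in favor of the precise one.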
## Limitations

- Requires the appropriate spaCy model to be installed for the target language.
- ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model.
- The metric operates at the edit level, not the sentence level.
## Citation

```bibtex
@inproceedings{bryant-etal-2017-automatic,
    title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
    author = "Bryant, Christopher and
      Felice, Mariano and
      Briscoe, Ted",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1074",
    doi = "10.18653/v1/P17-1074",
    pages = "793--805",
}
```