---
title: ERRANT GEC
emoji: "📝"
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- grammatical-error-correction
- gec
description: ERRANT metric for evaluating grammatical error correction systems
---
# ERRANT GEC Metric
ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.
## Description
This metric computes precision, recall, and F-score by comparing the edit operations needed to transform source sentences into predictions versus the edit operations needed to transform source sentences into references.
The metric uses the [ERRANT library](https://github.com/chrisjbryant/errant) to extract and compare edits.
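Conceptually, the scoring reduces to comparing two sets of edits per sentence: hypothesis edits (source → prediction) against gold edits (source → reference). The sketch below uses hypothetical edit tuples `(start, end, correction)` rather than ERRANT's actual annotation objects, but the precision/recall/F arithmetic is the same:

```python
# Hypothetical edit representation: (start_token, end_token, correction).
# ERRANT's real edits also carry error types and alignment info; this is
# only a sketch of the counting logic.

def score_edits(pred_edits, gold_edits, beta=0.5):
    pred, gold = set(pred_edits), set(gold_edits)
    tp = len(pred & gold)                          # edits the system got right
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(gold) if gold else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# "This are a sentence ." -> both system and gold replace token 1 with "is"
p, r, f = score_edits({(1, 2, "is")}, {(1, 2, "is")})
print(p, r, f)  # 1.0 1.0 1.0
```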
## Installation
```bash
pip install evaluate errant spacy
# Install the appropriate spaCy model for your language
python -m spacy download en_core_web_sm # English
python -m spacy download nb_core_news_sm # Norwegian
```
## Usage
```python
import evaluate
errant_gec = evaluate.load("marksverdhei/errant_gec")
results = errant_gec.compute(
    sources=["This are a sentence ."],
    predictions=["This is a sentence ."],
    references=["This is a sentence ."],
    lang="en",
)
print(results)
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
```
## Inputs
- **sources** (`list[str]`): The original (uncorrected) sentences
- **predictions** (`list[str]`): The model's corrected sentences
- **references** (`list[str]`): The gold standard corrected sentences
- **lang** (`str`, optional): Language code for spaCy model. Default: `"en"`
- `"en"`: English (requires `en_core_web_sm`)
- `"nb"`: Norwegian Bokmål (requires `nb_core_news_sm`)
- `"de"`: German (requires `de_core_news_sm`)
- etc. (any language with a spaCy model)
- **beta** (`float`, optional): Beta value for F-score calculation. Default: `0.5`
## Outputs
- **precision** (`float`): Fraction of predicted edits that are correct
- **recall** (`float`): Fraction of gold edits that were predicted
- **f{beta}** (`float`): F-score with the specified beta value (default key: `f0.5`)
## Example with Norwegian
```python
import evaluate
errant_gec = evaluate.load("marksverdhei/errant_gec")
results = errant_gec.compute(
    sources=["Jeg har spist mye mat i går ."],
    predictions=["Jeg spiste mye mat i går ."],
    references=["Jeg spiste mye mat i går ."],
    lang="nb",
)
```
## Why F0.5?
In grammatical error correction, precision is typically weighted more heavily than recall (hence the default beta of 0.5) because:
- False positives (incorrect "corrections") are more harmful to the user experience
- It's better to miss some errors than to introduce new ones
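The weighting can be seen directly in the F-beta formula, `F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)`. The snippet below (plain arithmetic, not part of this metric's API) shows that F0.5 rewards a high-precision/low-recall system over a low-precision/high-recall one, while F1 treats them identically:

```python
def f_beta(precision, recall, beta):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# High precision, low recall: favored by F0.5
print(round(f_beta(0.9, 0.3, 0.5), 3))  # 0.643
# Low precision, high recall: penalized by F0.5
print(round(f_beta(0.3, 0.9, 0.5), 3))  # 0.346
# F1 is symmetric, so both systems score the same
print(round(f_beta(0.9, 0.3, 1.0), 3))  # 0.45
```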
## Limitations
- Requires the appropriate spaCy model to be installed for the target language
- ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model
- The metric operates at the edit level, not the sentence level
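To illustrate the last point: a prediction that fixes one of two errors earns partial credit at the edit level, whereas sentence-level exact match would score it zero. A sketch with hypothetical edit tuples:

```python
# Hypothetical edits as (start_token, end_token, correction) tuples
gold_edits = {(0, 1, "She"), (2, 3, "went")}  # two gold corrections
pred_edits = {(0, 1, "She")}                  # system fixed only one

tp = len(pred_edits & gold_edits)
precision = tp / len(pred_edits)        # 1.0: every predicted edit was correct
recall = tp / len(gold_edits)           # 0.5: only half the gold edits found
exact_match = pred_edits == gold_edits  # False: sentence-level scores this 0
print(precision, recall, exact_match)   # 1.0 0.5 False
```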
## Citation
```bibtex
@inproceedings{bryant-etal-2017-automatic,
title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
author = "Bryant, Christopher and
Felice, Mariano and
Briscoe, Ted",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P17-1074",
doi = "10.18653/v1/P17-1074",
pages = "793--805",
}
```