metadata
title: ERRANT GEC
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- grammatical-error-correction
- gec
description: ERRANT metric for evaluating grammatical error correction systems
ERRANT GEC Metric
ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.
Description
This metric computes precision, recall, and F-score by comparing the edit operations needed to transform source sentences into predictions versus the edit operations needed to transform source sentences into references.
The metric uses the ERRANT library to extract and compare edits.
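The comparison described above can be sketched in plain Python. This is a minimal illustration, not the metric's actual implementation: the `Edit` tuple and the `score_edits` function are hypothetical names, and treating precision and recall as 1.0 when there is nothing to count is an assumption about the edge-case convention.

```python
from collections import namedtuple

# Hypothetical edit representation: a token span in the source sentence
# plus the replacement text. Real ERRANT edits carry more information.
Edit = namedtuple("Edit", ["start", "end", "correction"])


def score_edits(pred_edits, ref_edits, beta=0.5):
    """Compare predicted edits against reference edits as sets."""
    pred, ref = set(pred_edits), set(ref_edits)
    tp = len(pred & ref)  # edits the system proposed that match the reference
    fp = len(pred - ref)  # proposed edits that are not in the reference
    fn = len(ref - pred)  # reference edits the system missed

    # Assumed convention: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0

    denom = beta**2 * precision + recall
    f_score = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, f"f{beta}": f_score}


# "This are a sentence ." -> the reference replaces token 1 ("are") with "is".
reference = [Edit(1, 2, "is")]
# The hypothetical system makes the same fix plus one unnecessary change.
prediction = [Edit(1, 2, "is"), Edit(3, 4, "sentences")]

scores = score_edits(prediction, reference)
print(scores)
```

Here the extra edit halves precision while recall stays at 1.0, so the F0.5 score falls to roughly 0.56.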
Installation
pip install evaluate errant spacy
# Install the appropriate spaCy model for your language
python -m spacy download en_core_web_sm # English
python -m spacy download nb_core_news_sm # Norwegian
Usage
import evaluate
errant_gec = evaluate.load("marksverdhei/errant_gec")
results = errant_gec.compute(
    sources=["This are a sentence ."],
    predictions=["This is a sentence ."],
    references=["This is a sentence ."],
    lang="en",
)
print(results)
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
Inputs
- sources (list[str]): The original (uncorrected) sentences
- predictions (list[str]): The model's corrected sentences
- references (list[str]): The gold standard corrected sentences
- lang (str, optional): Language code for the spaCy model. Default: "en"
  - "en": English (requires en_core_web_sm)
  - "nb": Norwegian Bokmål (requires nb_core_news_sm)
  - "de": German (requires de_core_news_sm)
  - etc. (any language with a spaCy model)
- beta (float, optional): Beta value for F-score calculation. Default: 0.5
Outputs
- precision (float): Fraction of predicted edits that are correct
- recall (float): Fraction of gold edits that were predicted
- f{beta} (float): F-score with the specified beta value (default key: f0.5)
Example with Norwegian
import evaluate
errant_gec = evaluate.load("marksverdhei/errant_gec")
results = errant_gec.compute(
    sources=["Jeg har spist mye mat i går ."],
    predictions=["Jeg spiste mye mat i går ."],
    references=["Jeg spiste mye mat i går ."],
    lang="nb",
)
print(results)
Why F0.5?
In grammatical error correction, precision is conventionally weighted twice as heavily as recall (beta=0.5) because:
- False positives (incorrect "corrections") are more harmful to the user experience
- It's better to miss some errors than to introduce new ones
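For reference, the standard F-beta definition makes the weighting concrete. The two systems below use made-up numbers, chosen only to show how beta=0.5 favors the more precise system.

```latex
F_\beta = \frac{(1 + \beta^2)\, P \, R}{\beta^2 P + R}

% Hypothetical system A: P = 0.8, R = 0.4
F_{0.5} = \frac{1.25 \cdot 0.8 \cdot 0.4}{0.25 \cdot 0.8 + 0.4} = \frac{0.4}{0.6} \approx 0.667

% Hypothetical system B: P = 0.4, R = 0.8
F_{0.5} = \frac{1.25 \cdot 0.4 \cdot 0.8}{0.25 \cdot 0.4 + 0.8} = \frac{0.4}{0.9} \approx 0.444

% With beta = 1 the two systems are indistinguishable
F_{1} = \frac{2 \cdot 0.8 \cdot 0.4}{0.8 + 0.4} \approx 0.533
```

Both systems tie under F1, but F0.5 ranks the conservative system A well above system B.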
Limitations
- Requires the appropriate spaCy model to be installed for the target language
- ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model
- The metric operates at the edit level, not the sentence level
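Because a missing spaCy model is the most likely setup problem, it can help to check for it before calling the metric. The sketch below is a hypothetical helper, not part of the metric itself. It relies on spaCy models being installed as ordinary Python packages, and the mapping only covers the language codes listed under Inputs.

```python
import importlib.util

# Model names taken from the Inputs section; extend for other languages.
SPACY_MODELS = {
    "en": "en_core_web_sm",
    "nb": "nb_core_news_sm",
    "de": "de_core_news_sm",
}


def spacy_model_installed(lang, models=SPACY_MODELS):
    """Return True if the spaCy model package mapped to `lang` is importable."""
    model_name = models.get(lang)
    if model_name is None:
        raise ValueError(f"No spaCy model known for language code {lang!r}")
    # find_spec returns None when no top-level package with that name exists.
    return importlib.util.find_spec(model_name) is not None


for code, name in SPACY_MODELS.items():
    if not spacy_model_installed(code):
        print(f"Missing model for {code!r}: python -m spacy download {name}")
```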
Citation
@inproceedings{bryant-etal-2017-automatic,
title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
author = "Bryant, Christopher and
Felice, Mariano and
Briscoe, Ted",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P17-1074",
doi = "10.18653/v1/P17-1074",
pages = "793--805",
}