---
title: ERRANT GEC
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
  - grammatical-error-correction
  - gec
description: ERRANT metric for evaluating grammatical error correction systems
---

# ERRANT GEC Metric

ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.

## Description

This metric computes precision, recall, and F-score by comparing the edits needed to transform each source sentence into the prediction against the edits needed to transform it into the reference.

The metric uses the ERRANT library to extract and compare edits.
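Conceptually, the scoring boils down to comparing two edit sets. The sketch below is a simplification with hypothetical edit tuples (real ERRANT edits also carry error-type labels and are not plain tuples):

```python
# Hypothetical edits as (start_token, end_token, correction) tuples.
hyp_edits = {(1, 2, "is"), (4, 5, "dogs")}                  # system edits
ref_edits = {(1, 2, "is"), (4, 5, "dogs"), (7, 8, "ran")}   # gold edits

tp = len(hyp_edits & ref_edits)  # edits the system got exactly right
precision = tp / len(hyp_edits) if hyp_edits else 1.0
recall = tp / len(ref_edits) if ref_edits else 1.0

beta = 0.5
f_beta = ((1 + beta**2) * precision * recall
          / (beta**2 * precision + recall)) if tp else 0.0

print(precision, round(recall, 4), round(f_beta, 4))  # 1.0 0.6667 0.9091
```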

## Installation

```bash
pip install evaluate errant spacy
# Install the appropriate spaCy model for your language
python -m spacy download en_core_web_sm   # English
python -m spacy download nb_core_news_sm  # Norwegian Bokmål
```

## Usage

```python
import evaluate

errant_gec = evaluate.load("marksverdhei/errant_gec")

results = errant_gec.compute(
    sources=["This are a sentence ."],
    predictions=["This is a sentence ."],
    references=["This is a sentence ."],
    lang="en",
)

print(results)
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
```

## Inputs

- `sources` (`list[str]`): the original (uncorrected) sentences
- `predictions` (`list[str]`): the model's corrected sentences
- `references` (`list[str]`): the gold-standard corrected sentences
- `lang` (`str`, optional): language code for the spaCy model. Default: `"en"`
  - `"en"`: English (requires `en_core_web_sm`)
  - `"nb"`: Norwegian Bokmål (requires `nb_core_news_sm`)
  - `"de"`: German (requires `de_core_news_sm`)
  - etc. (any language with a spaCy model)
- `beta` (`float`, optional): beta value for the F-score calculation. Default: `0.5`

## Outputs

- `precision` (`float`): fraction of predicted edits that are correct
- `recall` (`float`): fraction of gold edits that were predicted
- `f{beta}` (`float`): F-score with the specified beta value (default key: `f0.5`)
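The F-score is the standard weighted harmonic mean of precision and recall. A minimal reference implementation (not the metric's internal code, just the textbook formula):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta: weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(1.0, 0.5), 4))  # 0.8333 — beta=0.5 leans toward precision
```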

## Example with Norwegian

```python
import evaluate

errant_gec = evaluate.load("marksverdhei/errant_gec")

results = errant_gec.compute(
    sources=["Jeg har spist mye mat i går ."],
    predictions=["Jeg spiste mye mat i går ."],
    references=["Jeg spiste mye mat i går ."],
    lang="nb",
)
```

## Why F0.5?

In grammatical error correction, precision is typically weighted more heavily than recall (hence beta=0.5) because:

- False positives (incorrect "corrections") are more harmful to the user experience
- It's better to miss some errors than to introduce new ones
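A quick numeric check with illustrative numbers shows how F0.5 rewards a precise but conservative system relative to F1:

```python
# Illustrative numbers: a conservative system with high precision, low recall.
p, r = 0.9, 0.3

f1 = 2 * p * r / (p + r)                    # equal weighting of P and R
f05 = (1 + 0.25) * p * r / (0.25 * p + r)   # beta=0.5 favours precision

print(round(f05, 3), round(f1, 3))  # 0.643 vs 0.45
```

Under F1 this system scores 0.45; under F0.5 it scores 0.643, reflecting the preference for not introducing new errors.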

## Limitations

- Requires the appropriate spaCy model to be installed for the target language
- ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model
- The metric operates at the edit level, not the sentence level

## Citation

```bibtex
@inproceedings{bryant-etal-2017-automatic,
    title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
    author = "Bryant, Christopher  and
      Felice, Mariano  and
      Briscoe, Ted",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1074",
    doi = "10.18653/v1/P17-1074",
    pages = "793--805",
}
```