---
title: ERRANT GEC
emoji: 📝
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
  - grammatical-error-correction
  - gec
description: ERRANT metric for evaluating grammatical error correction systems
---

# ERRANT GEC Metric

ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.

## Description

This metric computes precision, recall, and F-score by comparing the edits needed to transform each source sentence into the prediction against the edits needed to transform it into the reference.

The metric uses the ERRANT library to extract and compare edits.
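Conceptually, the scoring boils down to comparing two edit sets. The sketch below is a simplification with hypothetical edit tuples (real ERRANT edits also carry error-type labels and are not plain tuples):

```python
# Hypothetical edits as (start_token, end_token, correction) tuples.
hyp_edits = {(1, 2, "is"), (4, 5, "dogs")}                  # system edits
ref_edits = {(1, 2, "is"), (4, 5, "dogs"), (7, 8, "ran")}   # gold edits

tp = len(hyp_edits & ref_edits)  # edits the system got exactly right
precision = tp / len(hyp_edits) if hyp_edits else 1.0
recall = tp / len(ref_edits) if ref_edits else 1.0

beta = 0.5
f_beta = ((1 + beta**2) * precision * recall
          / (beta**2 * precision + recall)) if tp else 0.0

print(precision, round(recall, 4), round(f_beta, 4))  # 1.0 0.6667 0.9091
```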

## Installation

```bash
pip install evaluate errant spacy
# Install the appropriate spaCy model for your language
python -m spacy download en_core_web_sm   # English
python -m spacy download nb_core_news_sm  # Norwegian Bokmål
```

## Usage

```python
import evaluate

errant_gec = evaluate.load("marksverdhei/errant_gec")

results = errant_gec.compute(
    sources=["This are a sentence ."],
    predictions=["This is a sentence ."],
    references=["This is a sentence ."],
    lang="en",
)

print(results)
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
```

## Inputs

- `sources` (`list[str]`): the original (uncorrected) sentences
- `predictions` (`list[str]`): the model's corrected sentences
- `references` (`list[str]`): the gold-standard corrected sentences
- `lang` (`str`, optional): language code for the spaCy model. Default: `"en"`
  - `"en"`: English (requires `en_core_web_sm`)
  - `"nb"`: Norwegian Bokmål (requires `nb_core_news_sm`)
  - `"de"`: German (requires `de_core_news_sm`)
  - etc. (any language with a spaCy model)
- `beta` (`float`, optional): beta value for the F-score calculation. Default: `0.5`

## Outputs

- `precision` (`float`): fraction of predicted edits that are correct
- `recall` (`float`): fraction of gold edits that were predicted
- `f{beta}` (`float`): F-score with the specified beta value (default key: `f0.5`)
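The F-score is the standard weighted harmonic mean of precision and recall. A minimal reference implementation (not the metric's internal code, just the textbook formula):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta: weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(1.0, 0.5), 4))  # 0.8333 — beta=0.5 leans toward precision
```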

## Example with Norwegian

```python
import evaluate

errant_gec = evaluate.load("marksverdhei/errant_gec")

results = errant_gec.compute(
    sources=["Jeg har spist mye mat i går ."],
    predictions=["Jeg spiste mye mat i går ."],
    references=["Jeg spiste mye mat i går ."],
    lang="nb",
)
```

## Why F0.5?

In grammatical error correction, precision is typically weighted more heavily than recall (hence beta=0.5) because:

- False positives (incorrect "corrections") are more harmful to the user experience
- It's better to miss some errors than to introduce new ones
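A quick numeric check with illustrative numbers shows how F0.5 rewards a precise but conservative system relative to F1:

```python
# Illustrative numbers: a conservative system with high precision, low recall.
p, r = 0.9, 0.3

f1 = 2 * p * r / (p + r)                    # equal weighting of P and R
f05 = (1 + 0.25) * p * r / (0.25 * p + r)   # beta=0.5 favours precision

print(round(f05, 3), round(f1, 3))  # 0.643 vs 0.45
```

Under F1 this system scores 0.45; under F0.5 it scores 0.643, reflecting the preference for not introducing new errors.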

## Limitations

- Requires the appropriate spaCy model to be installed for the target language
- ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model
- The metric operates at the edit level, not the sentence level

## Citation

```bibtex
@inproceedings{bryant-etal-2017-automatic,
    title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
    author = "Bryant, Christopher  and
      Felice, Mariano  and
      Briscoe, Ted",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1074",
    doi = "10.18653/v1/P17-1074",
    pages = "793--805",
}
```