---
title: ERRANT GEC
emoji: "📝"
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- grammatical-error-correction
- gec
description: ERRANT metric for evaluating grammatical error correction systems
---
# ERRANT GEC Metric
ERRANT (ERRor ANnotation Toolkit) is a metric for evaluating grammatical error correction (GEC) systems.
## Description
This metric computes precision, recall, and F-score by comparing the edit operations needed to transform source sentences into predictions versus the edit operations needed to transform source sentences into references.
The metric uses the [ERRANT library](https://github.com/chrisjbryant/errant) to extract and compare edits.
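Conceptually, the scoring reduces to comparing two sets of edits per sentence: hypothesis edits (source → prediction) against gold edits (source → reference). The sketch below uses hypothetical edit tuples `(start, end, correction)` rather than ERRANT's actual annotation objects, but the precision/recall/F arithmetic is the same:

```python
# Hypothetical edit representation: (start_token, end_token, correction).
# ERRANT's real edits also carry error types and alignment info; this is
# only a sketch of the counting logic.

def score_edits(pred_edits, gold_edits, beta=0.5):
    pred, gold = set(pred_edits), set(gold_edits)
    tp = len(pred & gold)                          # edits the system got right
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(gold) if gold else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# "This are a sentence ." -> both system and gold replace token 1 with "is"
p, r, f = score_edits({(1, 2, "is")}, {(1, 2, "is")})
print(p, r, f)  # 1.0 1.0 1.0
```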
## Installation
```bash
pip install evaluate errant spacy
# Install the appropriate spaCy model for your language
python -m spacy download en_core_web_sm # English
python -m spacy download nb_core_news_sm # Norwegian
```
## Usage
```python
import evaluate
errant_gec = evaluate.load("marksverdhei/errant_gec")
results = errant_gec.compute(
    sources=["This are a sentence ."],
    predictions=["This is a sentence ."],
    references=["This is a sentence ."],
    lang="en",
)
print(results)
# {'precision': 1.0, 'recall': 1.0, 'f0.5': 1.0}
```
## Inputs
- **sources** (`list[str]`): The original (uncorrected) sentences
- **predictions** (`list[str]`): The model's corrected sentences
- **references** (`list[str]`): The gold standard corrected sentences
- **lang** (`str`, optional): Language code for spaCy model. Default: `"en"`
- `"en"`: English (requires `en_core_web_sm`)
- `"nb"`: Norwegian Bokmål (requires `nb_core_news_sm`)
- `"de"`: German (requires `de_core_news_sm`)
- etc. (any language with a spaCy model)
- **beta** (`float`, optional): Beta value for F-score calculation. Default: `0.5`
## Outputs
- **precision** (`float`): Fraction of predicted edits that are correct
- **recall** (`float`): Fraction of gold edits that were predicted
- **f{beta}** (`float`): F-score with the specified beta value (default key: `f0.5`)
## Example with Norwegian
```python
import evaluate
errant_gec = evaluate.load("marksverdhei/errant_gec")
results = errant_gec.compute(
    sources=["Jeg har spist mye mat i går ."],
    predictions=["Jeg spiste mye mat i går ."],
    references=["Jeg spiste mye mat i går ."],
    lang="nb",
)
```
## Why F0.5?
In grammatical error correction, precision is typically weighted more heavily than recall (hence the default beta of 0.5) because:
- False positives (incorrect "corrections") are more harmful to the user experience
- It's better to miss some errors than to introduce new ones
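The weighting can be seen directly in the F-beta formula, `F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)`. The snippet below (plain arithmetic, not part of this metric's API) shows that F0.5 rewards a high-precision/low-recall system over a low-precision/high-recall one, while F1 treats them identically:

```python
def f_beta(precision, recall, beta):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# High precision, low recall: favored by F0.5
print(round(f_beta(0.9, 0.3, 0.5), 3))  # 0.643
# Low precision, high recall: penalized by F0.5
print(round(f_beta(0.3, 0.9, 0.5), 3))  # 0.346
# F1 is symmetric, so both systems score the same
print(round(f_beta(0.9, 0.3, 1.0), 3))  # 0.45
```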
## Limitations
- Requires the appropriate spaCy model to be installed for the target language
- ERRANT was originally designed for English; performance on other languages depends on the quality of the spaCy model
- The metric operates at the edit level, not the sentence level
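To illustrate the last point: a prediction that fixes one of two errors earns partial credit at the edit level, whereas sentence-level exact match would score it zero. A sketch with hypothetical edit tuples:

```python
# Hypothetical edits as (start_token, end_token, correction) tuples
gold_edits = {(0, 1, "She"), (2, 3, "went")}  # two gold corrections
pred_edits = {(0, 1, "She")}                  # system fixed only one

tp = len(pred_edits & gold_edits)
precision = tp / len(pred_edits)        # 1.0: every predicted edit was correct
recall = tp / len(gold_edits)           # 0.5: only half the gold edits found
exact_match = pred_edits == gold_edits  # False: sentence-level scores this 0
print(precision, recall, exact_match)   # 1.0 0.5 False
```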
## Citation
```bibtex
@inproceedings{bryant-etal-2017-automatic,
title = "Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction",
author = "Bryant, Christopher and
Felice, Mariano and
Briscoe, Ted",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P17-1074",
doi = "10.18653/v1/P17-1074",
pages = "793--805",
}
```