Spaces:

nowhuggingface
/

LLM1-Fine-tuning-and-deployment

Sleeping

App Files Files Community

LLM1-Fine-tuning-and-deployment / evaluate /metrics /character /README.md

nowhuggingface

Add my folder

d733479 2 months ago

preview code

raw

history blame contribute delete

3.83 kB

	---
	title: CharacTER
	emoji: 🔤
	colorFrom: orange
	colorTo: red
	sdk: gradio
	sdk_version: 3.19.1
	app_file: app.py
	pinned: false
	tags:
	- evaluate
	- metric
	- machine-translation
	description: >-
	CharacTer is a character-level metric inspired by the commonly applied translation edit rate (TER).
	---

	# Metric Card for CharacTER

	## Metric Description
	CharacTer is a character-level metric inspired by the translation edit rate (TER) metric. It is
	defined as the minimum number of character edits required to adjust a hypothesis, until it completely matches the
	reference, normalized by the length of the hypothesis sentence. CharacTer calculates the character level edit
	distance while performing the shift edit on word level. Unlike the strict matching criterion in TER, a hypothesis
	word is considered to match a reference word and could be shifted, if the edit distance between them is below a
	threshold value. The Levenshtein distance between the reference and the shifted hypothesis sequence is computed on the
	character level. In addition, the lengths of hypothesis sequences instead of reference sequences are used for
	normalizing the edit distance, which effectively counters the issue that shorter translations normally achieve lower
	TER.

	## Intended Uses
	CharacTER was developed for machine translation evaluation.

	## How to Use

	```python
	import evaluate
	character = evaluate.load("character")

	# Single hyp/ref
	preds = ["this week the saudis denied information published in the new york times"]
	refs = ["saudi arabia denied this week information published in the american new york times"]
	results = character.compute(references=refs, predictions=preds)

	# Corpus example
	preds = ["this week the saudis denied information published in the new york times",
	"this is in fact an estimate"]
	refs = ["saudi arabia denied this week information published in the american new york times",
	"this is actually an estimate"]
	results = character.compute(references=refs, predictions=preds)
	```

	### Inputs
	- predictions: a single prediction or a list of predictions to score. Each prediction should be a string with
	tokens separated by spaces.
	- references: a single reference or a list of reference for each prediction. Each reference should be a string with
	tokens separated by spaces.


	### Output Values

	*=only when a list of references/hypotheses are given

	- count (*): how many parallel sentences were processed
	- mean (*): the mean CharacTER score
	- median (*): the median score
	- std (*): standard deviation of the score
	- min (*): smallest score
	- max (*): largest score
	- cer_scores: all scores, one per ref/hyp pair

	### Output Example
	```python
	{
	'count': 2,
	'mean': 0.3127282211789254,
	'median': 0.3127282211789254,
	'std': 0.07561653111280243,
	'min': 0.25925925925925924,
	'max': 0.36619718309859156,
	'cer_scores': [0.36619718309859156, 0.25925925925925924]
	}
	```

	## Citation
	```bibtex
	@inproceedings{wang-etal-2016-character,
	title = "{C}harac{T}er: Translation Edit Rate on Character Level",
	author = "Wang, Weiyue and
	Peter, Jan-Thorsten and
	Rosendahl, Hendrik and
	Ney, Hermann",
	booktitle = "Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers",
	month = aug,
	year = "2016",
	address = "Berlin, Germany",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/W16-2342",
	doi = "10.18653/v1/W16-2342",
	pages = "505--510",
	}
	```

	## Further References
	- Repackaged version that is used in this HF implementation: [https://github.com/bramvanroy/CharacTER](https://github.com/bramvanroy/CharacTER)
	- Original version: [https://github.com/rwth-i6/CharacTER](https://github.com/rwth-i6/CharacTER)