---
title: codebleu
tags:
- evaluate
- metric
description: "CodeBLEU"
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---
# Metric Card for CodeBLEU
## Metric Description

CodeBLEU is taken from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator)
and described in the article [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297).

NOTE: the metric currently works on Linux machines only, due to a dependency on precompiled language parser libraries (`.so` files).
## How to Use

```python
import evaluate

module = evaluate.load("dvitel/codebleu")
src = 'class AcidicSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
tgt = 'class AcidSwampOoze(MinionCard):§ def __init__(self):§ super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§ def create_minion(self, player):§ return Minion(3, 2)§'
src = src.replace("§", "\n")
tgt = tgt.replace("§", "\n")
res = module.compute(predictions=[tgt], references=[[src]])
print(res)
# {'CodeBLEU': 0.9473264567644872, 'ngram_match_score': 0.8915993127600096, 'weighted_ngram_match_score': 0.8977065142979394, 'syntax_match_score': 1.0, 'dataflow_match_score': 1.0}
```
### Inputs
- **predictions** (`list` of `str`s): translations to score.
- **references** (`list` of `list`s of `str`s): references for each translation.
- **lang** (`str`): the programming language, one of `['java', 'js', 'c_sharp', 'php', 'go', 'python', 'ruby']`.
- **tokenizer**: approach used for standardizing `predictions` and `references`.
  The default tokenizer is `tokenizer_13a`, a relatively minimal tokenization approach that is equivalent to `mteval-v13a`, used by WMT.
  It can be replaced by another tokenizer from a source such as [SacreBLEU](https://github.com/mjpost/sacrebleu/tree/master/sacrebleu/tokenizers).
- **params** (`str`): comma-separated weights for averaging the four component scores (see the CodeBLEU paper).
  Defaults to equal weights, `"0.25,0.25,0.25,0.25"`.
### Output Values
- **CodeBLEU**: the resulting overall score,
- **ngram_match_score**: the n-gram match component (see the CodeBLEU paper),
- **weighted_ngram_match_score**: the weighted n-gram match component (see the CodeBLEU paper),
- **syntax_match_score**: the AST match component (see the CodeBLEU paper),
- **dataflow_match_score**: the data-flow match component (see the CodeBLEU paper).
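Per the CodeBLEU paper, the overall score is a weighted sum of the four component scores, with the weights taken from the comma-separated `params` string. A minimal sketch of that combination (an illustration, not the module's internals), using the component scores printed in the How to Use example:

```python
# Parse the "params" weight string (here the default equal weights).
weights = [float(w) for w in "0.25,0.25,0.25,0.25".split(",")]

# Component scores from the How to Use example above.
components = [
    0.8915993127600096,  # ngram_match_score
    0.8977065142979394,  # weighted_ngram_match_score
    1.0,                 # syntax_match_score
    1.0,                 # dataflow_match_score
]

# Weighted sum reproduces the reported CodeBLEU value (~0.9473).
codebleu = sum(w * s for w, s in zip(weights, components))
print(codebleu)
```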
#### Values from Popular Papers
*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
### Examples
*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
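As in the How to Use snippet above, multi-line code snippets can be stored on a single line with `§` standing in for newlines. A small helper for this preprocessing step (a sketch for illustration, not part of the module's API):

```python
def decode_snippet(s: str) -> str:
    """Restore newlines in a snippet where '§' stands in for '\n'."""
    return s.replace("§", "\n")

# Example: a one-line snippet expands back into multi-line Python source.
src = 'def f(x):§    return x + 1§'
print(decode_snippet(src))
```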
## Limitations and Bias
Runs on Linux only. See the list of supported programming languages above.
## Citation
```bibtex
@InProceedings{huggingface:module,
  title   = {CodeBLEU: A Metric for Evaluating Code Generation},
  authors = {Sedykh, Ivan},
  year    = {2022}
}
```
## Further References
*Add any useful further references.*