allow references to be simple list
- README.md +35 -12
- dataflow_match.py +2 -2
- my_codebleu.py +1 -1
README.md
CHANGED
@@ -12,25 +12,42 @@ pinned: false
 
 # Metric Card for CodeBLEU
 
-***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
 ## Metric Description
-
+
+CodeBLEU from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator)
+and from the article [CodeBLEU: a Method for Automatic Evaluation of Code Synthesis](https://arxiv.org/abs/2009.10297).
+
+NOTE: currently works on Linux machines only, due to a dependency on languages.so.
 
 ## How to Use
-*Give general statement of how to use the metric*
-
+
+```python
+src = 'class AcidicSwampOoze(MinionCard):§    def __init__(self):§        super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§    def create_minion(self, player):§        return Minion(3, 2)§'
+tgt = 'class AcidSwampOoze(MinionCard):§    def __init__(self):§        super().__init__("Acidic Swamp Ooze", 2, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON, battlecry=Battlecry(Destroy(), WeaponSelector(EnemyPlayer())))§§    def create_minion(self, player):§        return Minion(3, 2)§'
+src = src.replace("§", "\n")
+tgt = tgt.replace("§", "\n")
+res = module.compute(predictions=[tgt], references=[[src]])
+print(res)
+# {'CodeBLEU': 0.9473264567644872, 'ngram_match_score': 0.8915993127600096, 'weighted_ngram_match_score': 0.8977065142979394, 'syntax_match_score': 1.0, 'dataflow_match_score': 1.0}
+```
 
 ### Inputs
-
-- **
+- **predictions** (`list` of `str`s): translations to score.
+- **references** (`list` of `list`s of `str`s; a plain `list` of `str`s, one reference per prediction, is also accepted): references for each translation.
+- **lang** (`str`): programming language, one of `['java', 'js', 'c_sharp', 'php', 'go', 'python', 'ruby']`.
+- **tokenizer**: approach used for standardizing `predictions` and `references`.
+  The default tokenizer is `tokenizer_13a`, a relatively minimal tokenization approach that is nonetheless equivalent to `mteval-v13a`, used by WMT.
+  It can be replaced by another tokenizer from a source such as [SacreBLEU](https://github.com/mjpost/sacrebleu/tree/master/sacrebleu/tokenizers).
+- **params** (`str`): weights for averaging (see the CodeBLEU paper).
+  Defaults to equal weights, `"0.25,0.25,0.25,0.25"`.
 
 ### Output Values
-
-
-
+- **CodeBLEU**: the resulting overall score.
+- **ngram_match_score**: see the CodeBLEU paper.
+- **weighted_ngram_match_score**: see the CodeBLEU paper.
+- **syntax_match_score**: see the CodeBLEU paper.
+- **dataflow_match_score**: see the CodeBLEU paper.
 
 #### Values from Popular Papers
 *Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
 
@@ -39,10 +56,16 @@ pinned: false
 
 *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 
 ## Limitations and Bias
-
+Linux OS only. See the list of supported programming languages above.
 
 ## Citation
-
+```bibtex
+@InProceedings{huggingface:module,
+  title = {CodeBLEU: A Metric for Evaluating Code Generation},
+  author = {Sedykh, Ivan},
+  year = {2022}
+}
+```
 
 ## Further References
 *Add any useful further references.*
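The `params` string described above is parsed into four weights that combine the component scores linearly, as in the CodeBLEU paper. A minimal sketch of that combination (the function name `combine_codebleu` is ours, for illustration only), using the component values from the example output:

```python
# Sketch of how the four CodeBLEU components are combined with the
# `params` weights (illustrative; not the Space's actual internals).
def combine_codebleu(scores, params="0.25,0.25,0.25,0.25"):
    alpha, beta, gamma, theta = [float(x) for x in params.split(",")]
    return (alpha * scores["ngram_match_score"]
            + beta * scores["weighted_ngram_match_score"]
            + gamma * scores["syntax_match_score"]
            + theta * scores["dataflow_match_score"])

components = {
    "ngram_match_score": 0.8915993127600096,
    "weighted_ngram_match_score": 0.8977065142979394,
    "syntax_match_score": 1.0,
    "dataflow_match_score": 1.0,
}
print(combine_codebleu(components))  # ~0.94733, matching the CodeBLEU value above
```

With equal weights this reproduces the `CodeBLEU` value from the usage example; passing, say, `"1,0,0,0"` would return the plain n-gram component alone.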
dataflow_match.py
CHANGED

@@ -36,11 +36,11 @@ def corpus_dataflow_match(references, candidates, lang, langso_dir):
     candidate = candidates[i]
     for reference in references_sample:
         try:
-            candidate=remove_comments_and_docstrings(candidate,
+            candidate=remove_comments_and_docstrings(candidate,lang)
         except:
             pass
         try:
-            reference=remove_comments_and_docstrings(reference,
+            reference=remove_comments_and_docstrings(reference,lang)
         except:
             pass
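The change threads the caller's `lang` through to the comment-stripping helper. Note that the surrounding bare `except: pass` hides any failure of that call, so if the helper ever raises, the dataflow match is silently computed on unstripped code. A toy stand-in (our naive `strip_line_comments`, which only drops full-line `#` comments; the real CodeXGLUE helper is language-aware) illustrates both the call and the silent-failure hazard:

```python
def strip_line_comments(source, lang):
    # Naive stand-in for remove_comments_and_docstrings: drop full-line
    # '#' comments; `lang` is accepted for signature parity only.
    return "\n".join(
        ln for ln in source.split("\n") if not ln.lstrip().startswith("#")
    )

candidate = "x = 1\n# a comment\ny = 2"
try:
    candidate = strip_line_comments(candidate)  # wrong arity: raises TypeError
except:
    pass  # the bare except swallows it; candidate is left unchanged
assert "# a comment" in candidate  # stripping silently did not happen

candidate = strip_line_comments(candidate, "python")  # correct call
assert candidate == "x = 1\ny = 2"
```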
my_codebleu.py
CHANGED

@@ -24,7 +24,7 @@ def calc_codebleu(predictions, references, lang, tokenizer=None, params='0.25,0.
     alpha, beta, gamma, theta = [float(x) for x in params.split(',')]
 
     # preprocess inputs
-    references = [[x.strip() for x in ref] for ref in references]
+    references = [[x.strip() for x in ref] if type(ref) == list else [ref.strip()] for ref in references]
     hypothesis = [x.strip() for x in predictions]
 
     if not len(references) == len(hypothesis):
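This one-line change is the commit's core: each reference that is already a list is stripped element-wise, while a bare string is wrapped into a one-element list, so `references` may be either a list of lists or a simple list. Restated as a standalone helper (the name `normalize_references` is ours; `isinstance` replaces the `type(ref) == list` comparison as the more idiomatic check):

```python
def normalize_references(references):
    # Wrap bare-string references into single-element lists and strip
    # whitespace, so both [["ref"]] and ["ref"] input shapes are accepted.
    return [
        [x.strip() for x in ref] if isinstance(ref, list) else [ref.strip()]
        for ref in references
    ]

print(normalize_references(["a + b ", ["c * d ", "c*d "]]))
# [['a + b'], ['c * d', 'c*d']]
```

Either shape then aligns one reference list per prediction, which is what the subsequent `len(references) == len(hypothesis)` check expects.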