| title: action_generation | |
| datasets: | |
| - none | |
| tags: | |
| - evaluate | |
| - metric | |
| description: 'TODO: add a description here' | |
| sdk: gradio | |
| sdk_version: 5.6.0 | |
| app_file: app.py | |
| pinned: false | |
| # Metric Card for action_generation | |
| ## Metric Description | |
| Evaluates the output of an action generation task. | |
| Each output item has the form `/class/phrase`. The metric scores the `/class` part and the `phrase` part separately, then combines the two scores with a weighted sum. | |
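As a rough illustration of the final combination step, the sketch below applies a weighted sum to two score dictionaries. The helper name `weighted_sum` and the weights are assumptions, not taken from the metric's source: this card does not state the weights, and 0.8 for `/class` with 0.2 for `phrase` are simply values consistent with the sample output shown under How to Use.

```python
def weighted_sum(class_scores, phrase_scores, class_weight=0.8):
    """Combine two score dicts key by key (hypothetical helper).

    class_weight=0.8 is an assumption inferred from the card's sample
    output, not a documented default.
    """
    phrase_weight = 1.0 - class_weight
    return {
        key: class_weight * class_scores[key] + phrase_weight * phrase_scores[key]
        for key in class_scores
    }


# Per-part scores copied from the card's sample output.
class_scores = {"precision": 0.7143, "recall": 0.8333, "f1": 0.7692}
phrase_scores = {"precision": 0.8571, "recall": 1.0, "f1": 0.9231}

combined = weighted_sum(class_scores, phrase_scores)
print({key: round(value, 4) for key, value in combined.items()})
```

With these assumed weights, the combined values agree with the `weighted_sum` entry of the sample output to about three decimal places.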
| ## How to Use | |
| ```python | |
| import evaluate | |
| valid_labels = [ | |
| "/開箱", | |
| "/教學", | |
| "/表達", | |
| "/分享/外部資訊", | |
| "/分享/個人資訊", | |
| "/推薦/產品", | |
| "/推薦/服務", | |
| "/推薦/其他", | |
| "" | |
| ] | |
| predictions = [ | |
| ["/開箱/xxx", "/教學/yyy", "/表達/zzz"], | |
| ["/分享/外部資訊/aaa", "/教學/yyy", "/表達/zzz", "/分享/個人資訊/bbb"] | |
| ] | |
| references = [ | |
| ["/開箱/xxx", "/教學/yyy", "/表達/zzz"], | |
| ["/推薦/產品/bbb", "/教學/yyy", "/表達/zzz"] | |
| ] | |
| metric = evaluate.load("DarrenChensformer/action_generation") | |
| result = metric.compute(predictions=predictions, references=references, valid_labels=valid_labels, detailed_scores=True) | |
| print(result) | |
| ``` | |
| ``` | |
| {'class': {'precision': 0.7143, 'recall': 0.8333, 'f1': 0.7692}, 'phrase': {'precision': 0.8571, 'recall': 1.0, 'f1': 0.9231}, 'weighted_sum': {'precision': 0.7429, 'recall': 0.8666, 'f1': 0.8}} | |
| ``` | |
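One reading of the metric that reproduces the `class` and `phrase` scores above is sketched here. It is an assumption, not the metric's actual implementation: each item is split at the longest matching entry of `valid_labels`, classes and phrases are compared as sets per example, and counts are micro-averaged over all examples. The helpers `split_label` and `prf` are hypothetical names.

```python
valid_labels = ["/開箱", "/教學", "/表達", "/分享/外部資訊", "/分享/個人資訊",
                "/推薦/產品", "/推薦/服務", "/推薦/其他", ""]
predictions = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/分享/外部資訊/aaa", "/教學/yyy", "/表達/zzz", "/分享/個人資訊/bbb"],
]
references = [
    ["/開箱/xxx", "/教學/yyy", "/表達/zzz"],
    ["/推薦/產品/bbb", "/教學/yyy", "/表達/zzz"],
]


def split_label(item, labels):
    """Split '/class/phrase' into (class, phrase) at the longest matching label."""
    for label in sorted(labels, key=len, reverse=True):
        if label and item.startswith(label + "/"):
            return label, item[len(label) + 1:]
    return "", item  # no known class: treat the whole string as the phrase


def prf(preds_all, refs_all, labels, part):
    """Micro-averaged set-overlap scores for part 0 (class) or part 1 (phrase)."""
    matched = n_pred = n_ref = 0
    for preds, refs in zip(preds_all, refs_all):
        pred_set = {split_label(p, labels)[part] for p in preds}
        ref_set = {split_label(r, labels)[part] for r in refs}
        matched += len(pred_set & ref_set)
        n_pred += len(pred_set)
        n_ref += len(ref_set)
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_ref if n_ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": round(precision, 4), "recall": round(recall, 4), "f1": round(f1, 4)}


class_scores = prf(predictions, references, valid_labels, part=0)
phrase_scores = prf(predictions, references, valid_labels, part=1)
print(class_scores)
print(phrase_scores)
```

Under this reading, 5 of the 7 predicted classes and 6 of the 7 predicted phrases match a reference in the same example, against 6 reference items in total. The phrase `bbb` counts as a match in the second example even though its class differs, which is what makes phrase recall reach 1.0.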
| ### Inputs | |
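| - **predictions** *(list of list of str)*: Generated items for each example, each in the form `/class/phrase`. | |
| - **references** *(list of list of str)*: Gold items for each example, in the same form. | |
| - **valid_labels** *(list of str)*: The allowed `/class` labels. | |
| - **detailed_scores** *(bool)*: Set to `True` in the example above, where the result includes precision, recall, and F1 for `class`, `phrase`, and `weighted_sum`. | |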
| ### Output Values | |
| With `detailed_scores=True`, as in the example above, the metric returns a dictionary with three keys, `class`, `phrase`, and `weighted_sum`, each mapping to a dictionary of `precision`, `recall`, and `f1`. | |
| Each value lies between 0 and 1, inclusive. Higher scores are better. | |
| ### Examples | |
| *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.* | |
| ## Limitations and Bias | |
| *Note any known limitations or biases that the metric has, with links and references if possible.* | |
| ## Citation | |
| *Cite the source where this metric was introduced.* | |
| ## Further References | |
| *Add any useful further references.* |