CodeWraith Model Evaluation Report

Summary

| Metric | CodeWraith-3b-v2 (Llama-3.2-3B-Instruct) | CodeWraith-8b-v2 (Llama-3.1-8B-Instruct) |
|---|---|---|
| Avg Structural Score | 0.93 | 0.92 |
| Function Coverage | 0.84 | 0.85 |
| Class Coverage | 0.97 | 0.84 |
| Argument Coverage | 0.91 | 0.93 |
| Return Type Coverage | 0.97 | 0.97 |
| Good Scores (>=80%) | 25 | 24 |
| Avg Inference Time (s) | 20.01 | 21.91 |

CodeWraith-3b-v2 (Llama-3.2-3B-Instruct)

  • Examples evaluated: 31
  • Valid (parseable): 28
  • Perfect scores: 15
  • Total inference time: 620.2s

CodeWraith-8b-v2 (Llama-3.1-8B-Instruct)

  • Examples evaluated: 31
  • Valid (parseable): 28
  • Perfect scores: 15
  • Total inference time: 679.2s
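
The per-model figures above can be cross-checked against the summary with a few lines of Python. This is a minimal sketch, not the evaluation script itself: the names `runs` and `derived_stats` are hypothetical, and it assumes the average inference time is the total time divided by the number of examples evaluated (31), which matches the reported 20.01 s and 21.91 s. The parse rate and perfect-score rate are derived here for convenience and are not stated in the report.

```python
# Figures copied from the per-model sections of this report.
runs = {
    "CodeWraith-3b-v2": {"examples": 31, "valid": 28, "perfect": 15, "total_time_s": 620.2},
    "CodeWraith-8b-v2": {"examples": 31, "valid": 28, "perfect": 15, "total_time_s": 679.2},
}


def derived_stats(run):
    """Recompute per-example averages and rates from the raw counts.

    Assumption: every rate uses the number of examples evaluated
    (not the number of valid outputs) as its denominator.
    """
    n = run["examples"]
    return {
        "avg_time_s": round(run["total_time_s"] / n, 2),
        "parse_rate": round(run["valid"] / n, 3),
        "perfect_rate": round(run["perfect"] / n, 3),
    }


for name, run in runs.items():
    print(name, derived_stats(run))
```

Under that assumption, both models produce parseable output for 28 of 31 examples and a perfect score on 15 of 31, so the two runs differ only in total inference time.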