GiuLeo01 committed
Commit 4a73058 · verified · 1 Parent(s): 42e2fc0

Update README.md

Files changed (1):
  1. README.md +7 −0
README.md CHANGED
@@ -29,6 +29,13 @@ According to the current demo version of the FortranHumanEval benchmark:
  | GPT-4o mini | 18.90% | 43.90% |
  | GPT-4o | 32.31% | 17.07% |
 
+ Compared to its base model (Qwen 2.5 Coder 3B Instruct), FortranCodeGen 3B shows a strong improvement, raising pass@1 accuracy from 5.48% to 23.17% and reducing the compile error rate from 63.41% to 17.68%. This highlights the effectiveness of this simple fine-tuning process, even though it was performed with limited resources: no human-labeled data, a small synthetic dataset, and training on a single consumer GPU (L4 :'( ).
+
+ Compared to GPT-4o mini, FortranCodeGen 3B comes out ahead in both pass@1 accuracy (23.17% vs. 18.90%) and compile reliability (17.68% vs. 43.90% compile error rate). This suggests that task-specific fine-tuning can outperform more general, (probably) larger models.
+
+ While it does not yet match the overall performance of GPT-4o, which achieves 32.31% pass@1, FortranCodeGen 3B reaches a comparable level of compilation correctness (17.68% vs. 17.07%), suggesting that its outputs are syntactically robust and close to executable even when they do not solve the full task.
+
+ These results confirm that targeted specialization can significantly improve model performance on underrepresented tasks, and point to a promising direction for very-low-resource fine-tuning in legacy or niche programming languages.
 
  ## Uses
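For context on the pass@1 figures quoted in the diff, HumanEval-style benchmarks conventionally use the unbiased pass@k estimator from the original HumanEval work. A minimal sketch follows; the function name `pass_at_k` and the per-problem outcomes are illustrative, not taken from this repository's evaluation code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which
    c are functionally correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single generation per problem (n = 1, k = 1), pass@1 reduces
# to the fraction of problems whose one sample passes:
results = [True, False, False, True]  # illustrative per-problem outcomes
pass_at_1 = sum(pass_at_k(1, int(ok), 1) for ok in results) / len(results)
```

The compile error rate reported alongside pass@1 is a separate, stricter-to-fail metric: it counts only whether the generated Fortran source compiles, regardless of whether it solves the task.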
41