Update README.md
According to the current demo version of the FortranHumanEval benchmark:

| Model | pass@1 | Compile error rate |
|---|---|---|
| GPT-4o mini | 18.90% | 43.90% |
| GPT-4o | 32.31% | 17.07% |
Compared to its base model (Qwen 2.5 Coder 3B Instruct), FortranCodeGen 3B shows a strong improvement, increasing pass@1 accuracy from 5.48% to 23.17% and reducing the compile error rate from 63.41% to 17.68%. This highlights the effectiveness of a simple fine-tuning process, even when performed with limited resources: no human-labeled data, a small synthetic dataset, and training on a single L4 GPU.
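For context on how these numbers are typically produced: pass@1 here presumably follows the standard HumanEval pass@k estimator (the exact evaluation code for FortranHumanEval is not shown in this README, so treat this as an illustrative sketch, not the benchmark's actual implementation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper:
    pass@k = 1 - C(n - c, k) / C(n, k),
    where n = samples generated per task and c = samples passing all tests.
    """
    if n - c < k:
        # Every size-k draw must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 the estimator reduces to the plain pass rate c / n,
# e.g. 50 passing samples out of 200 gives pass@1 = 0.25.
print(pass_at_k(200, 50, 1))  # 0.25
```

The per-task scores are then averaged over the benchmark; the compile error rate is simply the fraction of generated programs that fail to compile.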
When compared to GPT-4o mini, FortranCodeGen 3B outperforms it on both pass@1 accuracy (23.17% vs. 18.90%) and compile reliability (17.68% vs. 43.90% compile error rate). This suggests that task-specific fine-tuning can produce better results than more general, and likely much larger, models.
While it doesn't yet match the overall performance of GPT-4o (32.31% pass@1), FortranCodeGen 3B reaches a comparable level of compilation correctness (17.68% vs. 17.07% compile error rate), suggesting that its outputs are syntactically robust and close to executable even when they don't solve the full task.
These results confirm that targeted specialization can significantly enhance model performance on underrepresented tasks, and suggest a promising direction for very-low-resource fine-tuning in legacy or niche programming languages.
## Uses