Update README.md
---
license: apache-2.0
---
# GitVac
Don't forget to vacuum your git repo.

GitVac is like a vacuum cleaner for code fixes. It's a series of 3B, 8B, 14B, an…

# How were the models made?
I distilled samples from r1 through multiple rounds of trial and error. About 2.4k questions were fired off, with 1.1k making the verification cut. My rough estimate puts r1 at around 15% after multiple tries.
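From the approximate counts above, the overall yield of the distillation run can be sanity-checked with a quick back-of-the-envelope calculation:

```python
# Rough yield of the distillation run, using the approximate
# figures quoted above (2.4k fired, 1.1k verified).
fired = 2400      # questions sent to r1
verified = 1100   # samples that passed verification

yield_rate = verified / fired
print(f"verification yield: {yield_rate:.1%}")  # -> verification yield: 45.8%
```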

# Training Data
*The data used to train the models came from r1 outputs via distillation.*
However, to gauge the accuracy of the models, o3/r1 were used to run evals with the same prompts.

# How is verification done?
A lot of models are already trained on function calling syntax.
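The verifier itself is elided in this excerpt, but the general idea of checking function-call output can be sketched like this. This is a hypothetical example, not GitVac's actual code: `extract_tool_calls`, `verify`, and the fenced-JSON call format are all assumptions.

```python
import json
import re


def extract_tool_calls(output: str) -> list:
    """Pull JSON function-call blocks out of a model response.
    (Assumed format: calls emitted inside ```json ... ``` fences.)"""
    calls = []
    for block in re.findall(r"```json\s*(.*?)```", output, re.DOTALL):
        try:
            calls.append(json.loads(block))
        except json.JSONDecodeError:
            pass  # malformed call: simply not counted
    return calls


def verify(output: str, expected: list) -> bool:
    """A sample passes only if the model emitted exactly the expected
    calls (name + arguments), in any order."""
    key = lambda c: json.dumps(c, sort_keys=True)
    return sorted(map(key, extract_tool_calls(output))) == sorted(map(key, expected))
```

Because the calls are structured, a pass/fail decision reduces to parsing and comparison; no judge model is needed for this step.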

This does a few things:

3. Updates the reasoning to further improve it

With this dataset, we can fine-tune to get a base model. This model can then be further improved through RLHF (Reinforcement Learning from Human Feedback) and GRPO (Group Relative Policy Optimization) training, where it will continuously learn from new datasets generated by the pipeline. This creates a virtuous cycle of improvement, with each iteration building on the knowledge gained from previous runs.
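The cycle described above can be sketched schematically. Every function here is a stand-in (a real pipeline would invoke actual trainers and the generation/verification stages); only the shape of the loop is the point:

```python
# Toy schematic of the improvement cycle: fine-tune once on distilled
# data, then repeatedly generate a fresh dataset and run an RL
# (GRPO-style) step on it. All functions are illustrative stubs.

def generate_dataset(model):      # pipeline emits verified samples
    return [f"sample-from-{model}"]

def fine_tune(model, dataset):    # SFT on the distilled data
    return model + "+sft"

def rl_step(model, dataset):      # GRPO/RLHF refinement
    return model + "+rl"

model = "base"
model = fine_tune(model, generate_dataset(model))
for _ in range(2):                # each iteration feeds on the last
    model = rl_step(model, generate_dataset(model))
print(model)  # -> base+sft+rl+rl
```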

I should probably write up a whole separate post on this extended pipeline someday. For now enjoy this repo!