Update README.md
datasets:
- Dahoas/instruct-synthetic-prompt-responses
- pankajmathur/WizardLM_Orca
---
This is the second model in the ensemble from the MindsAI @ Tufa Labs team for the ARC Prize 2025 competition. It was originally based on Salesforce's CodeT5 model, modified to use 16 decoder layers instead of the original 24. Testing demonstrated that removing layers from the encoder was considerably more harmful, while the model fully recovered its performance after decoder layers were removed. The model has been trained for approximately 2 years on a TPU v4-64. The Google TPU Research Cloud program was very generous in providing TPUs for training and research; it would have been impossible to develop TTFT, AIRV, train the models, and do much else without that support.
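As a rough illustration of the layer-removal step described above, here is a minimal sketch of trimming a T5-style decoder from 24 to 16 blocks using `transformers`. The config sizes below are deliberately tiny placeholders, not CodeT5's real dimensions, and the slicing recipe (keeping the first 16 blocks) is an assumption about which layers were dropped:

```python
from transformers import T5Config, T5ForConditionalGeneration

# Tiny illustrative config with the relevant layer counts:
# 24 encoder and 24 decoder blocks (other sizes are placeholders).
config = T5Config(
    num_layers=24,
    num_decoder_layers=24,
    d_model=64,
    num_heads=4,
    d_ff=128,
    vocab_size=100,
)
model = T5ForConditionalGeneration(config)
assert len(model.decoder.block) == 24

# Drop the last 8 decoder blocks, keeping the first 16,
# and record the new depth in the config.
model.decoder.block = model.decoder.block[:16]
model.config.num_decoder_layers = 16
assert len(model.decoder.block) == 16
```

The encoder stack (`model.encoder.block`) is left untouched, matching the observation that encoder depth mattered more than decoder depth.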

- **Span-Corruption Refinement (SCR)**: The model was trained with an additional pretraining objective I call SCR (chosen because of the model's deep history of