mindware committed on
Commit e3a9e28 · verified · 1 Parent(s): 8d3ec52

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -30,7 +30,7 @@ datasets:
 - Dahoas/instruct-synthetic-prompt-responses
 - pankajmathur/WizardLM_Orca
 ---
-This is the second model in the ensemble for the MindsAI @ Tufa Labs team for the ARC Prize 2025 competition. It was originally based on the CodeT5 model from Salesforce. It was modified to have 16 layers in the decoder from the original 24 layers. Testing demonstrated that removing layers was more harmful to performance when removed from the encoder, but was able to fully recover when removing decoder layers.
+This is the second model in the ensemble for the MindsAI @ Tufa Labs team for the ARC Prize 2025 competition. It was originally based on the CodeT5 model from Salesforce. It was modified to have 16 decoder layers, down from the original 24. Testing demonstrated that removing encoder layers was more harmful to performance, while performance fully recovered when decoder layers were removed. The model has been trained for approximately 2 years on a TPU v4-64. The Google TPU Research Cloud was very generous to provide TPUs for training and research. It would have been impossible to develop TTFT, AIRV, train the models, and many other things without the generosity of Google and the TPU Research Cloud program.
 
 - **Span-Corruption Refinement (SCR)**: The model was trained with an additional
   pretraining objective I call SCR (chosen because of the model's deep history of
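
The decoder truncation described in the README can be sketched with the Hugging Face transformers API. This is a minimal illustration, not the team's actual code: the tiny `T5Config` dimensions are placeholders standing in for `Salesforce/codet5-large`, which shares the T5 architecture (24 encoder and 24 decoder layers); in practice one would load the pretrained checkpoint before slicing.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Hypothetical stand-in config with the T5-large layer counts (24/24) but
# tiny hidden sizes, so the sketch runs without downloading a checkpoint.
config = T5Config(
    vocab_size=100,
    d_model=64,
    d_ff=128,
    num_heads=4,
    num_layers=24,          # encoder layers (left intact)
    num_decoder_layers=24,  # decoder layers (to be truncated)
)
model = T5ForConditionalGeneration(config)

# Keep only the first 16 decoder blocks; the encoder is untouched, since
# the README notes that removing encoder layers hurt performance.
keep = 16
model.decoder.block = model.decoder.block[:keep]  # nn.ModuleList slicing
model.config.num_decoder_layers = keep
```

Slicing `nn.ModuleList` returns a new `ModuleList`, so the truncated model remains a valid module; saving it afterwards with `save_pretrained` records the reduced `num_decoder_layers` in the config.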