MarioBarbeque committed on
Commit
9b6013a
·
verified ·
1 Parent(s): d007fd6
Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -98,10 +98,10 @@ This code outputs the following:
 
 ### Training Data / Preprocessing
 
-The data used comes from Google DeepMind and the 🤗 hub. The model card can be found [here](https://huggingface.co/datasets/deepmind/mathematics). Th Deepmind Mathematics DatasetDict object is composed of a vast variety of underlying mathematics datasets.
+The data used comes from Google DeepMind and the 🤗 hub. The model card can be found [here](https://huggingface.co/datasets/deepmind/mathematics). The Deepmind Mathematics DatasetDict object is composed of a vast variety of underlying mathematics datasets.
 Each of the underlying datasets contains a specific class of mathematical problems and their solutions. For the CyberSolve LinAlg *1.x* family of models, we are interested specifically in the solving of one-dimensional linear equations, so we use the *algebra__linear_1d* split.
 
-The training and evaluation splits of the 1D linear algebra dataset split is preprocessed in the following way: we format the raw problems and their solutions of the form `"b'Solve 65*l - 361 + 881 = 0 for l.\\n'"` and `"b'-8\\n'"` into the much cleanear `"Solve 65*l - 361 + 881 = 0 for l."` and `"-8"`.
+The training and evaluation splits of the 1D linear algebra dataset split are preprocessed in the following way: we format the raw problems and their solutions of the form `"b'Solve 65*l - 361 + 881 = 0 for l.\\n'"` and `"b'-8\\n'"` into the much cleanear `"Solve 65*l - 361 + 881 = 0 for l."` and `"-8"`.
 All inputs and labels are then tokenized. We subsequently evaluate the length of each *input_ids* vector and each *labels* vector to ensure there are no outliers and no inputs that need to be truncated. For later ease of loading, we upload these preprocessed and tokenized training and evaluation datasets
 to the 🤗 hub at the following locations: [MarioBarbeque/DeepMind-LinAlg-1D-train](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-train) and [MarioBarbeque/DeepMind-LinAlg-1D-eval](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-eval).
 
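
The README paragraph touched by this commit describes turning stringified byte literals into clean text and then checking sequence lengths. The following is a minimal sketch of what that cleaning step might look like. It is not the author's actual preprocessing script. The helper names `clean_field` and `exceeds_limit`, the sample token lengths, and the limit of 10 are all hypothetical, and the real pipeline presumably runs a tokenizer between the two steps.

```python
import ast


def clean_field(raw: str) -> str:
    """Turn a stringified bytes literal such as "b'-8\\n'" into "-8"."""
    # literal_eval safely parses the b'...' literal into a bytes object,
    # which also resolves the escaped newline.
    value = ast.literal_eval(raw)
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    # Drop the trailing newline and any surrounding whitespace.
    return str(value).strip()


def exceeds_limit(token_lengths, max_length):
    """Return the indices of sequences longer than max_length (hypothetical check)."""
    return [i for i, n in enumerate(token_lengths) if n > max_length]


# The raw example quoted in the README.
raw_question = "b'Solve 65*l - 361 + 881 = 0 for l.\\n'"
raw_answer = "b'-8\\n'"

question = clean_field(raw_question)
answer = clean_field(raw_answer)

# Illustrative lengths only; real values would come from tokenized input_ids.
too_long = exceeds_limit([5, 12, 7], max_length=10)
```

Using `ast.literal_eval` rather than slicing off the `b'` prefix and the trailing quote keeps the sketch correct if a problem string contains other escaped characters, at the cost of raising an error on inputs that are not valid Python literals.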