MarioBarbeque/CyberSolve-LinAlg-1.2

text2text-generation

text-generation-inference


MarioBarbeque committed on Jan 27, 2025

Commit 9b6013a · verified · 1 Parent(s): d007fd6

fix typos

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -98,10 +98,10 @@ This code outputs the following:
 ### Training Data / Preprocessing
-The data used comes from Google DeepMind and the 🤗 hub. The model card can be found [here](https://huggingface.co/datasets/deepmind/mathematics). Th Deepmind Mathematics DatasetDict object is composed of a vast variety of underlying mathematics datasets.
+The data used comes from Google DeepMind and the 🤗 hub. The model card can be found [here](https://huggingface.co/datasets/deepmind/mathematics). The Deepmind Mathematics DatasetDict object is composed of a vast variety of underlying mathematics datasets.
 Each of the underlying datasets contains a specific class of mathematical problems and their solutions. For the CyberSolve LinAlg *1.x* family of models, we are interested specifically in the solving of one-dimensional linear equations, so we use the *algebra__linear_1d* split.
-The training and evaluation splits of the 1D linear algebra dataset split is preprocessed in the following way: we format the raw problems and their solutions of the form `"b'Solve 65*l - 361 + 881 = 0 for l.\\n'"` and `"b'-8\\n'"` into the much cleanear `"Solve 65*l - 361 + 881 = 0 for l."` and `"-8"`.
+The training and evaluation splits of the 1D linear algebra dataset split are preprocessed in the following way: we format the raw problems and their solutions of the form `"b'Solve 65*l - 361 + 881 = 0 for l.\\n'"` and `"b'-8\\n'"` into the much cleanear `"Solve 65*l - 361 + 881 = 0 for l."` and `"-8"`.
 All inputs and labels are then tokenized. We subsequently evaluate the length of each *input_ids* vector and each *labels* vector to ensure there are no outliers and no inputs that need to be truncated. For later ease of loading, we upload these preprocessed and tokenized training and evaluation datasets
 to the 🤗 hub at the following locations: [MarioBarbeque/DeepMind-LinAlg-1D-train](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-train) and [MarioBarbeque/DeepMind-LinAlg-1D-eval](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-eval).
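The preprocessing described in the README text above can be sketched in Python. This is a minimal sketch, not the author's code. It assumes the raw fields are stringified Python bytes literals like the quoted `"b'-8\\n'"` example. `clean_raw` and `length_report` are hypothetical helpers introduced here for illustration. The real pipeline tokenizes with the model's tokenizer, which is not shown; toy token-id lists stand in for the *input_ids* and *labels* vectors.

```python
import ast


def clean_raw(raw: str) -> str:
    """Turn a stringified bytes literal such as "b'-8\\n'" into "-8".

    Hypothetical helper: assumes raw fields look like str(bytes_value).
    Text that is not a bytes literal is returned with whitespace stripped.
    """
    text = raw.strip()
    if text.startswith(("b'", 'b"')):
        try:
            # Safely parse the literal back into a bytes object (no eval).
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            value = None
        if isinstance(value, bytes):
            # Decode, then drop the trailing newline the raw data carries.
            return value.decode("utf-8").strip()
    return text


def length_report(sequences: list[list[int]], max_length: int) -> dict:
    """Summarize token-sequence lengths to spot outliers or truncation.

    Hypothetical helper: `max_length` is an assumed tokenizer limit.
    """
    if not sequences:
        raise ValueError("no sequences to summarize")
    lengths = [len(seq) for seq in sequences]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        # Count sequences that would need truncation at max_length.
        "n_over_max": sum(1 for n in lengths if n > max_length),
    }


if __name__ == "__main__":
    print(clean_raw("b'Solve 65*l - 361 + 881 = 0 for l.\\n'"))  # Solve 65*l - 361 + 881 = 0 for l.
    print(clean_raw("b'-8\\n'"))  # -8
    # Toy token ids standing in for tokenized input_ids.
    print(length_report([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_length=3))
```

Using `ast.literal_eval` rather than slicing off the `b'` prefix and the `\n` suffix means escape sequences inside the literal are decoded correctly, and no arbitrary code is evaluated. The already-preprocessed splits can presumably be loaded directly with `datasets.load_dataset` using the hub repo ids above, so this step only matters when rebuilding them from the raw DeepMind data.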