To construct **CyberSolve LinAlg 1.2**, the *FLAN-T5 large* model is fine-tuned using a custom PyTorch training loop optimized for multiple GPUs. We supervise the training of *FLAN-T5 large* on the *algebra__linear_1d* split of the Google DeepMind mathematics dataset, an open-source dataset from Google DeepMind available through the 🤗 hub at [deepmind/math_dataset](https://huggingface.co/datasets/deepmind/math_dataset). This large dataset consists of programmatically generated mathematical problems and their solutions, covering a variety of tasks across unique mathematical disciplines.

In this preliminary family of CyberSolve models, we are specifically interested in understanding the ability of neural models to solve non-trivial mathematical tasks. As such, the CyberSolve **LinAlg 1.x** family of models is trained on a set of 2M simpler, one-dimensional linear equations.

We preprocessed the data and simulated the training on a smaller, downsampled subset of the dataset before training for multiple epochs over the dataset's entirety. This model in particular has been trained for 2 additional epochs, limited only by funds, beyond the original *CyberSolve LinAlg 1.1* checkpoint.

Version 1.2 is the most capable version of CyberSolve LinAlg, scoring a **90.75** exact match score on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement over the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
### Direct Use

In order to effectively query the model's ability to solve linear equations, a string of the format `"Solve <any one-dimensional linear equation of variable x> for x."` should be tokenized and passed to the model's `generate` method. An example input string is `input_text = "Solve 24 = 1601*c - 1605*c for c."`. The model will attempt to solve the equation, outputting its prediction in a simple numeric format. See the example below.

## How to Use and Query the Model

Use the code below to get started with the model. Users pass an `input_text` string (again, of the form `input_text = "Solve 24 = 1601*c - 1605*c for c."`), which prompts the model to solve a one-dimensional linear equation.

Model prediction is significantly faster on a GPU, so using the `.to('cuda')` calls to place both the model and all input ids on the GPU is best practice.

Furthermore, the FLAN-T5 model architecture makes use of many normalization layers, as is common in the transformer architecture. By default, CyberSolve uses the T5 model's `T5LayerNorm` Python class; it is highly recommended that users install the Nvidia `Apex` package for Nvidia GPUs or the ROCm `Apex` package for AMD GPUs. Once installed, the model will default to the `apex.normalization.FusedRMSNorm` class when computing the normalization layers. The `FusedRMSNorm` class from `apex` makes use of an optimized fused kernel that is much faster than the standard `T5LayerNorm` class, thereby significantly speeding up both inference and training.

The base FLAN-T5 model is capable of answering a variety of prompts, but the domain-adapted CyberSolve LinAlg model is designed specifically for solving linear equations. As such, users must be considerate in their prompt engineering to issue a coherent, relevant query as outlined above and below.
``` python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# NOTE: the checkpoint id below is assumed from this card's namespace for
# illustration; point it at the actual CyberSolve LinAlg 1.2 weights.
checkpoint = "MarioBarbeque/CyberSolve-LinAlg-1.2"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to("cuda")

# A well-formed CyberSolve prompt, as described above
input_text = "Solve 24 = 1601*c - 1605*c for c."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate and decode the model's numeric prediction
outputs = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Training Data / Preprocessing

The data used comes from Google DeepMind and the 🤗 hub. The dataset card can be found [here](https://huggingface.co/datasets/deepmind/mathematics). The DeepMind Mathematics `DatasetDict` object is composed of a vast variety of underlying mathematics datasets. Each of the underlying datasets contains a specific class of mathematical problems and their solutions. For the CyberSolve LinAlg *1.x* family of models, we are interested specifically in solving one-dimensional linear equations, so we use the *algebra__linear_1d* split.

The training and evaluation splits of the 1D linear algebra dataset are preprocessed in the following way: we format the raw problems and their solutions of the form `"b'Solve 65*l - 361 + 881 = 0 for l.\\n'"` and `"b'-8\\n'"` into the much cleaner `"Solve 65*l - 361 + 881 = 0 for l."` and `"-8"`. All inputs and labels are then tokenized. We subsequently evaluate the length of each *input_ids* vector and each *labels* vector to ensure there are no outliers and no inputs that need to be truncated. For later ease of loading, we upload these preprocessed and tokenized training and evaluation datasets to the 🤗 hub at the following locations: [MarioBarbeque/DeepMind-LinAlg-1D-train](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-train) and [MarioBarbeque/DeepMind-LinAlg-1D-eval](https://huggingface.co/datasets/MarioBarbeque/DeepMind-LinAlg-1D-eval).
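A minimal sketch of the formatting step described above, assuming the raw examples arrive as `b'...'`-style strings with escaped trailing newlines (the helper name is illustrative, not the actual preprocessing script):

``` python
def clean(raw: str) -> str:
    # Strip the b'...' wrapper and the escaped trailing newline from a raw
    # DeepMind mathematics example, e.g. "b'Solve 65*l - 361 + 881 = 0 for l.\\n'"
    if raw.startswith("b'") and raw.endswith("'"):
        raw = raw[2:-1]
    return raw.replace("\\n", "").strip()

question = clean("b'Solve 65*l - 361 + 881 = 0 for l.\\n'")
answer = clean("b'-8\\n'")
print(question)  # Solve 65*l - 361 + 881 = 0 for l.
print(answer)    # -8
```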
### Training Procedure

The model was trained locally on a single node with multiple Nvidia A100 GPUs using 🤗 Transformers, 🤗 Tokenizers, and a custom PyTorch training loop that made use of both Nvidia Apex and 🤗 Accelerate.

#### Training Hyperparameters

- **Precision:** We use FP32 precision, the same precision as the base "google/flan-t5-large" model.
- **Optimizer:** `apex.optimizers.FusedAdam`, a fused-kernel version of the AdamW optimizer from Nvidia Apex
- **Learning Rate:** We use a linear learning rate scheduler with an initial learning rate of 1e-4 to further adjust the CyberSolve LinAlg **1.1** weights
- **Batch Size:** 64
- **Number of Training Steps:** 1918 training steps over 2 additional epochs (CyberSolve LinAlg **1.2**), beyond the original 2877 total steps over 3 epochs (CyberSolve LinAlg **1.1**)
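The linear schedule above can be sketched in plain Python (step counts taken from the bullets; the decay-to-zero endpoint is an assumption, since the exact scheduler settings are not listed):

``` python
def linear_lr(step: int, total_steps: int, initial_lr: float = 1e-4) -> float:
    # Linearly decay the learning rate from initial_lr at step 0 toward 0.0
    # at total_steps, clamping below at 0.0.
    return initial_lr * max(0.0, 1.0 - step / total_steps)

total = 1918  # additional steps taken for CyberSolve LinAlg 1.2
print(linear_lr(0, total))     # 0.0001
print(linear_lr(1918, total))  # 0.0
```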
## Evaluation / Metrics

We evaluate our text-to-text linear equation solver using the `exact_match` metric, comparing the model's decoded predicted tokens with their numeric labels. *CyberSolve LinAlg 1.2* scores a **90.75** exact match score on the evaluation set of 10k linear equations from the DeepMind *algebra__linear_1d* split. This is a non-trivial improvement over the exact match score of **86.56** attained by *CyberSolve LinAlg 1.1*.
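The comparison behind the `exact_match` score can be sketched as follows (a minimal illustration with made-up predictions, not the actual evaluation code):

``` python
def exact_match(predictions: list[str], labels: list[str]) -> float:
    # Percentage of decoded predictions that match their numeric labels exactly
    hits = sum(p.strip() == l.strip() for p, l in zip(predictions, labels))
    return 100.0 * hits / len(labels)

predictions = ["-6", "4", "13"]  # hypothetical decoded model outputs
labels = ["-6", "4", "12"]       # ground-truth solutions
print(exact_match(predictions, labels))  # 66.66666666666667
```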
Additionally, we construct a partial correctness dataset, available at the following dataset card: [MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.2-correctness-benchmark). This dataset was created with the goal of analyzing both the token-to-token and decoded-sequence-to-decoded-sequence partial correctness of CyberSolve's predictions in detail, beyond just its ability to get answers outright right or wrong. Similar partial correctness benchmark datasets were created for the initial [FLAN-T5 model](https://huggingface.co/datasets/MarioBarbeque/FLAN-T5-DeepMind-LinAlg-1D-benchmark), the preliminary, [zeroth-generation downsampled training](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-DeepMind-LinAlg-1D-downsample-benchmark-v2) of CyberSolve, and the [1.1 version](https://huggingface.co/datasets/MarioBarbeque/CyberSolve-LinAlg-1.1-correctness-benchmark) of the model. *We have yet to complete partial correctness analysis of the various model versions and their predictions, but we look forward to better understanding the mathematical reasoning capabilities of models and publishing our results when complete!*

### Testing Data, Factors & Metrics
We find the following perplexity metrics over 3 training epochs:

| epoch | perplexity |
|-------|------------|
|0 | 17.38 |
|1 | 16.28 |
|2 | 15.78 |
We find the following benchmark scores for each of our neural models.

**CyberSolve LinAlg 1.2**

| epoch | exact_match score |
|-------|-------------------|
|0 | 17.38 |
|1 | 16.28 |
|2 | 15.78 |
#### Summary

We train this model to attempt a fully local, multi-GPU training of a text-to-text language model using both the 🤗 ecosystem and a custom PyTorch training and evaluation loop.