Update README.md
## Introduction

In the spirit of open innovation, we're thrilled to share our pioneering work on pretraining with a custom architecture and dataset. **Boomer-4b**, our 3.51-billion-parameter model, represents a significant stride in the AI field. It was crafted meticulously from custom synthetic data generated in a textbook style, and it exemplifies our commitment to advancing the boundaries of AI through both creative architecture and thoughtful data amalgamation.
## Quick Start
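A minimal sketch of the Quick Start flow with Hugging Face Transformers. The repository ID below is a placeholder assumption (this excerpt does not state it), and the prompt and generation settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID — substitute the model's actual Hugging Face repository.
model_id = "<org>/boomer-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt, generate a continuation, and decode the first sample.
inputs = tokenizer("Write a function that reverses a string.", return_tensors="pt")
sample = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(sample[0]))
```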
- **Sequence Length**: 2048
- **Intermediate Size**: 11008
## Tokenizer Excellence

The CodeGen tokenizer is optimized for parsing and understanding a mix of natural language and programming code, facilitating seamless program synthesis. It is crucial for enabling the model's conversational program-synthesis capabilities, where natural-language specifications guide the generation of precise programming outputs.
## Training Configuration

Training was tuned with the following hyperparameters:

- **Per Device Train Batch Size**: 6
- **Gradient Accumulation Steps**: 1
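To make these two values concrete: the effective global batch size per optimizer step is their product times the number of devices. A quick sketch, where `num_devices` is an assumption not stated in this section:

```python
# Hyperparameters documented above.
per_device_train_batch_size = 6
gradient_accumulation_steps = 1

# Placeholder assumption — the device count is not given in this excerpt.
num_devices = 8

# Samples contributing to each optimizer step across all devices.
effective_global_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_global_batch)  # 48
```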