dittops committed · verified
Commit 0c22885 · 1 Parent(s): c181ff1

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -9,7 +9,7 @@ library_name: transformers
 
   ## Introduction 🎉
 
- In the spirit of open innovation, we're thrilled to share our pioneering work on pretraining with a custom architecture and dataset. **Boomer-4b**, our 3.51 billion parameter marvel, represents a significant stride in the AI field. Crafted meticulously from a dataset comprising 9B tokens across diverse sources like Wikipedia, stories, arXiv, mathematics, and coding, Boomer-4b integrates flash attention and an enlarged intermediate MLP layer dimension, setting new standards in model design. This model not only exemplifies our commitment to advancing the boundaries of AI through creative architecture but also through thoughtful data amalgamation.
+ In the spirit of open innovation, we're thrilled to share our pioneering work on pretraining with a custom architecture and dataset. **Boomer-4b**, our 3.51 billion parameter marvel, represents a significant stride in the AI field. It was crafted meticulously from custom, textbook-style synthetic data. This model exemplifies our commitment to advancing the boundaries of AI not only through creative architecture but also through thoughtful data amalgamation.
 
   ## Quick Start 🚀
 
@@ -35,13 +35,9 @@ print(tokenizer.decode(sample[0]))
   - **Sequence Length**: 2048
   - **Intermediate Size**: 11008
 
- ## Tokenizer Excellence 📝
-
- The CodeGen model utilizes an advanced tokenizer optimized for parsing and understanding a mix of natural language and programming code, facilitating seamless program synthesis. This tokenizer is crucial for enabling the model's conversational program synthesis capabilities, where specifications in natural language guide the generation of precise programming outputs.
-
   ## Training Configuration 📊
 
- Boomer-4b underwent rigorous training, leveraging 8 A100 80GB GPUs for approximately 32 hours. The training was finely tuned with the following hyperparameters:
+ The training was finely tuned with the following hyperparameters:
 
   - **Per Device Train Batch Size**: 6
   - **Gradient Accumulation Steps**: 1
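For reference, the second hunk header above ends with `print(tokenizer.decode(sample[0]))`, the tail of the README's Quick Start example. Below is a minimal sketch of what that flow typically looks like with the `transformers` API; the repo id, prompt, and generation settings are assumptions for illustration, not values taken from this commit.

```python
# Minimal Quick Start sketch (assumed values; only `sample` and
# `tokenizer.decode(sample[0])` appear in the diff above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "budecosystem/boomer-4b"  # placeholder repo id, not confirmed by this diff

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The universe is", return_tensors="pt")
sample = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(sample[0]))
```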
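The Training Configuration section lists only the per-device batch size and gradient accumulation steps in this excerpt. If the run used the Hugging Face `Trainer`, which this commit does not state, those values would map onto `TrainingArguments` roughly as in the sketch below; everything other than those two values is a placeholder.

```python
# Rough mapping of the listed hyperparameters onto transformers.TrainingArguments.
# Only per_device_train_batch_size and gradient_accumulation_steps come from the README;
# the remaining arguments are placeholder assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="boomer-4b-pretrain",  # placeholder
    per_device_train_batch_size=6,    # from the README
    gradient_accumulation_steps=1,    # from the README
    bf16=True,                        # assumed, given the A100 hardware mentioned in the removed text
    logging_steps=10,                 # placeholder
)
```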