dittops committed · verified
Commit 0c22885 · 1 Parent(s): c181ff1

Update README.md

Files changed (1)
  1. README.md +2 -6
README.md CHANGED
@@ -9,7 +9,7 @@ library_name: transformers
 
   ## Introduction 🎉
 
- In the spirit of open innovation, we're thrilled to share our pioneering work on pretraining with a custom architecture and dataset. **Boomer-4b**, our 3.51 billion parameter marvel, represents a significant stride in the AI field. Crafted meticulously from a dataset comprising 9B tokens across diverse sources like Wikipedia, stories, arXiv, mathematics, and coding, Boomer-4b integrates flash attention and an enlarged intermediate MLP layer dimension, setting new standards in model design. This model not only exemplifies our commitment to advancing the boundaries of AI through creative architecture but also through thoughtful data amalgamation.
+ In the spirit of open innovation, we're thrilled to share our pioneering work on pretraining with a custom architecture and dataset. **Boomer-4b**, our 3.51 billion parameter marvel, represents a significant stride in the AI field. It was crafted meticulously from custom, textbook-style synthetic data. This model exemplifies our commitment to advancing the boundaries of AI not only through creative architecture but also through thoughtful data amalgamation.
 
   ## Quick Start 🚀
 
@@ -35,13 +35,9 @@ print(tokenizer.decode(sample[0]))
   - **Sequence Length**: 2048
   - **Intermediate Size**: 11008
 
- ## Tokenizer Excellence 📝
-
- The CodeGen model utilizes an advanced tokenizer optimized for parsing and understanding a mix of natural language and programming code, facilitating seamless program synthesis. This tokenizer is crucial for enabling the model's conversational program synthesis capabilities, where specifications in natural language guide the generation of precise programming outputs.
-
   ## Training Configuration 📊
 
- Boomer-4b underwent rigorous training, leveraging 8 A100 80GB GPUs for approximately 32 hours. The training was finely tuned with the following hyperparameters:
+ The training was finely tuned with the following hyperparameters:
 
   - **Per Device Train Batch Size**: 6
   - **Gradient Accumulation Steps**: 1
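For reference, the second hunk header above ends with `print(tokenizer.decode(sample[0]))`, the tail of the README's Quick Start example. Below is a minimal sketch of what that flow typically looks like with the `transformers` API; the repo id, prompt, and generation settings are assumptions for illustration, not values taken from this commit.

```python
# Minimal Quick Start sketch (assumed values; only `sample` and
# `tokenizer.decode(sample[0])` appear in the diff above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "budecosystem/boomer-4b"  # placeholder repo id, not confirmed by this diff

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The universe is", return_tensors="pt")
sample = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(sample[0]))
```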
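The Training Configuration section lists only the per-device batch size and gradient accumulation steps in this excerpt. If the run used the Hugging Face `Trainer`, which this commit does not state, those values would map onto `TrainingArguments` roughly as in the sketch below; everything other than those two values is a placeholder.

```python
# Rough mapping of the listed hyperparameters onto transformers.TrainingArguments.
# Only per_device_train_batch_size and gradient_accumulation_steps come from the README;
# the remaining arguments are placeholder assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="boomer-4b-pretrain",  # placeholder
    per_device_train_batch_size=6,    # from the README
    gradient_accumulation_steps=1,    # from the README
    bf16=True,                        # assumed, given the A100 hardware mentioned in the removed text
    logging_steps=10,                 # placeholder
)
```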