Update README.md
README.md
CHANGED
|
@@ -11,24 +11,28 @@ inference:
|
|
| 11 |
temperature: 0.7
|
| 12 |
---
|
| 13 |
|
| 14 |
-
## Model sheet for AstraQuasar-
|
| 15 |
|
| 16 |
-
**AstraQuasar-
|
| 17 |
-
AstraQuasar-
|
| 18 |
|
| 19 |
-
| 20 |
|
| 21 |
-
| 22 |
|
| 23 |
Our model's architecture is fully compatible with leading training frameworks such as Axolotl and LLaMA Factory, ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.
|
| 24 |
|
| 25 |
## Example:
|
| 26 |
-
AstraQuasar-
|
| 27 |
-
|
| 28 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 29 |
|
| 30 |
-
model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-
|
| 31 |
-
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-
|
| 32 |
|
| 33 |
# you can optionally disable the duplicate trick
|
| 34 |
# model.model.duplicate_trick = False
|
|
@@ -46,12 +50,15 @@ AstraQuasar-4.5B-v.0.1 can be easily instantiated using the Hugging Face Transfo
|
|
| 46 |
generate_ids = model.generate(inputs.input_ids, max_length=30)
|
| 47 |
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
|
| 48 |
|
| 49 |
-
Pre-training and fine-tuning can be performed using **accelerate**.
|
| 50 |
|
| 51 |
## Notice
|
| 52 |
|
| 53 |
-
It's important to note that AstraQuasar-
|
| 54 |
|
| 55 |
## NEWS
|
| 56 |
|
| 57 |
Stay tuned for exciting developments! A new architecture, **AstraPulsar**, is on the horizon, promising further advancements in language modeling.
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
temperature: 0.7
|
| 12 |
---
|
| 13 |
|
| 14 |
+
## Model sheet for AstraQuasar-4B
|
| 15 |
|
| 16 |
+
**AstraQuasar-4B** is our first pre-trained Large Language Model (LLM) for text generation. It has **4B parameters**, excluding embeddings.
|
| 17 |
+
AstraQuasar-4B-v.0.1 builds on the Phi-2 architecture, with **significant enhancements including an increased number of layers and a novel technique known as the duplicate trick.**
|
| 18 |
|
| 19 |
+
<p align="center">
|
| 20 |
+
<img src="https://cdn.discordapp.com/attachments/1196455895559839774/1207665552387080293/image.png?ex=65e07931&is=65ce0431&hm=cf9dcfada83cff4b70eda8302108b3a4267f066bac57455348b09f4cbb92e702&" width="800"/>
|
| 21 |
+
</p>
|
| 22 |
|
| 23 |
+
|
| 24 |
+
AstraQuasar-4B-v.0.1 is currently an undertrained model that serves as a demonstration of the potential of the duplicate trick and its implications for future advancements in language modeling. Even at this early stage, it has outperformed both the base Phi-2 model and earlier iterations of AstraQuasar-4B that do not use the duplicate trick.
|
| 25 |
+
|
| 26 |
+
A key milestone for AstraQuasar-4B is successfully backpropagating through the duplicate trick, which sets a precedent for future research and development in this area.
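To make the backpropagation point concrete, here is a toy sketch. It is not the AstraQuasar implementation: it assumes, for illustration only, that the duplicate trick amounts to calling the same layer twice with shared weights, and the scalar `layer`, `forward`, and `grad_w` functions are hypothetical names. It shows that the shared weight then receives gradient from both calls.

```python
import math

def layer(x, w):
    # Hypothetical one-weight "layer": tanh(w * x).
    return math.tanh(w * x)

def forward(x, w, duplicate_trick=True):
    h = layer(x, w)
    if duplicate_trick:
        # Assumption: the layer is called a second time with the SAME weight.
        h = layer(h, w)
    return h

def grad_w(x, w, duplicate_trick=True):
    # Analytic d(output)/dw via the chain rule.
    h1 = layer(x, w)
    d_h1_dw = (1.0 - h1 ** 2) * x
    if not duplicate_trick:
        return d_h1_dw
    h2 = layer(h1, w)
    # h2 = tanh(w * h1(w)): one term from the second call's direct use of w,
    # one term flowing back through h1 from the first call.
    return (1.0 - h2 ** 2) * (h1 + w * d_h1_dw)

if __name__ == "__main__":
    x, w, eps = 0.5, 0.8, 1e-6
    # Central finite difference as an independent check of the analytic gradient.
    numeric = (forward(x, w + eps) - forward(x, w - eps)) / (2 * eps)
    print(grad_w(x, w), numeric)
```

In a real framework the autograd engine accumulates these two contributions automatically whenever the same parameter is used twice in the forward pass.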
|
| 27 |
|
| 28 |
Our model's architecture is fully compatible with leading training frameworks such as Axolotl and LLaMA Factory, ensuring seamless integration into existing workflows leveraging the standard Hugging Face Transformers library.
|
| 29 |
|
| 30 |
## Example:
|
| 31 |
+
AstraQuasar-4B can be easily instantiated using the Hugging Face Transformers library:
|
|
|
|
| 32 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 33 |
|
| 34 |
+
model = AutoModelForCausalLM.from_pretrained("AstraMindAI/AstraQuasar-4B", trust_remote_code=True)
|
| 35 |
+
tokenizer = AutoTokenizer.from_pretrained("AstraMindAI/AstraQuasar-4B")
|
| 36 |
|
| 37 |
# you can optionally disable the duplicate trick
|
| 38 |
# model.model.duplicate_trick = False
|
|
|
|
| 50 |
generate_ids = model.generate(inputs.input_ids, max_length=30)
|
| 51 |
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
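Because the diff hides the lines between the two hunks, the snippet above is not complete on its own. The following is a minimal end-to-end sketch that stitches the visible pieces together. The helper names `load_astraquasar` and `generate_text`, the prompt string, and the `tokenizer(prompt, return_tensors="pt")` call are assumptions (the tokenization step is not visible in this diff); the repository id, `trust_remote_code=True`, the `duplicate_trick` attribute, and the `generate`/`batch_decode` arguments come from the README.

```python
def load_astraquasar(repo_id="AstraMindAI/AstraQuasar-4B", duplicate_trick=True):
    # Imported lazily so generate_text() can be used with any compatible
    # model/tokenizer pair without importing transformers up front.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # trust_remote_code=True is needed because the model ships custom code.
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    if not duplicate_trick:
        # Optional switch mentioned in the README.
        model.model.duplicate_trick = False
    return model, tokenizer

def generate_text(model, tokenizer, prompt, max_length=30):
    # Assumption: inputs are built with the standard tokenizer call.
    inputs = tokenizer(prompt, return_tensors="pt")
    generate_ids = model.generate(inputs.input_ids, max_length=max_length)
    return tokenizer.batch_decode(
        generate_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )[0]

if __name__ == "__main__":
    # Downloads the model weights on first run.
    model, tokenizer = load_astraquasar()
    print(generate_text(model, tokenizer, "Write a short poem about stars."))
```

Passing `duplicate_trick=False` to `load_astraquasar` gives a quick way to compare outputs with and without the technique.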
|
| 52 |
|
| 53 |
+
Pre-training and fine-tuning can be performed using **accelerate** or **deepspeed**.
|
| 54 |
|
| 55 |
## Notice
|
| 56 |
|
| 57 |
+
Note that AstraQuasar-4B is a pre-trained base model and does not include any moderation mechanisms.
|
| 58 |
|
| 59 |
## NEWS
|
| 60 |
|
| 61 |
Stay tuned for exciting developments! A new architecture, **AstraPulsar**, is on the horizon, promising further advancements in language modeling.
|
| 62 |
+
|
| 63 |
+
## Credits:
|
| 64 |
+
- [Undi95](https://huggingface.co/Undi95) for helping us understand the process of self-calling layers.
|