Update configuration_quasar.py
Browse files- configuration_quasar.py +16 -7
configuration_quasar.py
CHANGED
|
@@ -18,9 +18,7 @@ QUASAR_PRETRAINED_CONFIG_ARCHIVE_MAP = {
|
|
| 18 |
class QuasarConfig(PretrainedConfig):
|
| 19 |
r"""
|
| 20 |
This is the configuration class to store the configuration of a [`QuasarModel`]. It is used to instantiate a Quasar
|
| 21 |
-
model according to the specified arguments, defining the model architecture.
|
| 22 |
-
defaults will yield a similar configuration to that of the Quasar
|
| 23 |
-
[microsoft/quasar-1](https://huggingface.co/microsoft/quasar-1).
|
| 24 |
|
| 25 |
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
|
| 26 |
documentation from [`PretrainedConfig`] for more information.
|
|
@@ -83,15 +81,26 @@ class QuasarConfig(PretrainedConfig):
|
|
| 83 |
Denotes beginning of sequences token id.
|
| 84 |
eos_token_id (`int`, *optional*, defaults to 2):
|
| 85 |
Denotes end of sequences token id.
|
| 86 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 87 |
Example:
|
| 88 |
|
| 89 |
```python
|
| 90 |
-
>>> from transformers import AutoModel,
|
| 91 |
|
| 92 |
|
| 93 |
-
>>> # Initializing a Quasar
|
| 94 |
-
>>> configuration =
|
| 95 |
|
| 96 |
>>> # Initializing a model from the configuration
|
| 97 |
>>> model = QuasarModel(configuration, trust_remote_code=True)
|
|
|
|
| 18 |
class QuasarConfig(PretrainedConfig):
|
| 19 |
r"""
|
| 20 |
This is the configuration class to store the configuration of a [`QuasarModel`]. It is used to instantiate a Quasar
|
| 21 |
+
model according to the specified arguments, defining the model architecture.
|
|
|
|
|
|
|
| 22 |
|
| 23 |
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
|
| 24 |
documentation from [`PretrainedConfig`] for more information.
|
|
|
|
| 81 |
Denotes beginning of sequences token id.
|
| 82 |
eos_token_id (`int`, *optional*, defaults to 2):
|
| 83 |
Denotes end of sequences token id.
|
| 84 |
+
duplicate_trick (`bool`, *optional*, defaults to `True`):
|
| 85 |
+
Whether to use the trick of self layers calling
|
| 86 |
+
duplicate_grad (`bool`, *optional*, defaults to `True`):
|
| 87 |
+
Whether or not to do a double grad step during training. This is not compatible with Gradient Checkpointing.
|
| 88 |
+
remove_ff_bias (`bool`, *optional*, defaults to `True`):
|
| 89 |
+
Whether or not to remove the feed-forward bias.
|
| 90 |
+
gated_activation (`bool`, *optional*, defaults to `False`):
|
| 91 |
+
Whether or not to use a GeluGLU Activation
|
| 92 |
+
simple_norm (`bool`, *optional*, defaults to `False`):
|
| 93 |
+
Whether or not to use a simpler version of RMS Layer Norm
|
| 94 |
+
sliding_window (`int`, *optional*, defaults to 2048):
|
| 95 |
+
If specified, it enables a sliding context window to extend the model context from 2048 to 32K.
|
| 96 |
Example:
|
| 97 |
|
| 98 |
```python
|
| 99 |
+
>>> from transformers import AutoModel, AutoConfig
|
| 100 |
|
| 101 |
|
| 102 |
+
>>> # Initializing a Quasar style configuration
|
| 103 |
+
>>> configuration = AutoConfig.from_pretrained("AstraMindAI/AstraQuasar-4B")
|
| 104 |
|
| 105 |
>>> # Initializing a model from the configuration
|
| 106 |
>>> model = QuasarModel(configuration, trust_remote_code=True)
|