---
license: apache-2.0
tags:
- snowflake
- arctic
---

## Model Details

Arctic is a Dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI
Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of
Arctic under an Apache-2.0 license. This means you can use them freely in your own research,
prototypes, and products.

* [Arctic-Base](link-here)
* [Arctic-Instruct](link-to-instruct)

**Model developers** Snowflake

**License** Apache-2.0

**Input** Models input text only.

**Output** Models generate text and code only.

**Model Release Date** April 24th, 2024.

## Model Architecture

Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B
total and 17B active parameters chosen using top-2 gating. For more details about Arctic's model
architecture, please see our cookbook.
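
As a quick sanity check on those figures, the arithmetic below reproduces them (a back-of-the-envelope sketch; the quoted totals also fold in embedding, attention, and router parameters not itemized here):

```python
# Back-of-the-envelope check of Arctic's parameter counts.
# Approximate: embedding, attention, and router weights are folded
# into the quoted figures rather than itemized here.
dense_params = 10e9     # dense transformer trunk
num_experts = 128       # experts in the residual MoE MLP
expert_params = 3.66e9  # parameters per expert
top_k = 2               # experts activated per token (top-2 gating)

total = dense_params + num_experts * expert_params
active = dense_params + top_k * expert_params

print(f"total  ~ {total / 1e9:.0f}B")   # ~478B, quoted as 480B
print(f"active ~ {active / 1e9:.1f}B")  # ~17.3B, quoted as 17B
```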

## Usage

As of 4/24/2024 we are actively working with the maintainers of `transformers` to include the Arctic
model implementation. Until this support is released, please follow these instructions to get the
required dependencies for using Arctic:

```bash
pip install git+https://github.com/Snowflake-Labs/transformers.git
```

Arctic leverages several features from [DeepSpeed](https://github.com/microsoft/DeepSpeed); you will need to
install the latest version of DeepSpeed to get all of the required features:

```bash
pip install "deepspeed>=0.15.0"
```
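
After installing, you can confirm that both packages import cleanly before attempting to load the model (a minimal check; the version strings are standard package metadata):

```python
# Verify the Snowflake fork of transformers and DeepSpeed are importable.
import transformers
import deepspeed

print("transformers:", transformers.__version__)
print("deepspeed:", deepspeed.__version__)
```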

### Inference

To get the best performance with Arctic we highly recommend using TRT-LLM or vLLM for inference. However, you
can also use `transformers` to load the model for text generation. Due to the model size, we recommend using a
single 8xH100 instance from your favorite cloud provider, such as AWS [p5.48xlarge](https://aws.amazon.com/ec2/instance-types/p5/)
or Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series).

In addition, if you would like to access Arctic via API, we have collaborated with several inference API
providers to host Arctic, including AWS, Microsoft Azure, NVIDIA Foundry, Lamini, Perplexity, Replicate, and Together.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("snowflake/arctic")
# device_map="auto" shards the model across all available GPUs;
# bfloat16 halves the memory footprint relative to float32.
model = AutoModelForCausalLM.from_pretrained("snowflake/arctic", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Hello my name is "
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```
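
If you opt for the recommended vLLM route instead, the equivalent call looks like the sketch below (a minimal sketch, assuming a vLLM build with Arctic support and the same `snowflake/arctic` model ID):

```python
from vllm import LLM, SamplingParams

# Assumes a vLLM build that supports Arctic; tensor_parallel_size=8
# shards the model across the 8 GPUs of an 8xH100 instance.
llm = LLM(model="snowflake/arctic", tensor_parallel_size=8)
params = SamplingParams(max_tokens=20)

outputs = llm.generate(["Hello my name is "], params)
print(outputs[0].outputs[0].text)
```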

### Fine-Tuning

TODO: add link and extra details about fine-tuning scripts

## Metrics

TODO: add summary of metrics here, we don't necessarily need to compare to others but we can if we want

## Training Data

TODO: add short description and links to training data related cookbook(s)