Create README.md
#1
by dmayhem93 - opened
README.md
ADDED
@@ -0,0 +1,134 @@
---
license: cc-by-nc-4.0
datasets:
- conceptofmind/cot_submix_original
- conceptofmind/flan2021_submix_original
- conceptofmind/t0_submix_original
- conceptofmind/niv2_submix_original
language:
- en
pipeline_tag: text-generation
---
# FreeWilly

## Model Description

`FreeWilly` is a LLaMA 65B model fine-tuned on an Orca-style dataset.

## Usage

### Apply Delta Weights

FreeWilly1 cannot be used from the `stabilityai/FreeWilly1-Delta-SafeTensor` weights alone, because that repository contains only delta weights. To obtain the full model, the delta must be added back onto the original LLaMA 65B weights. We provide the [`apply_delta.py`](https://huggingface.co/stabilityai/FreeWilly1-Delta-SafeTensor/raw/main/apply_delta.py) script to automate the conversion, which you can run as:

```sh
python3 apply_delta.py --base-model-path /path/to/model_weights/llama-65b --target-model-path FreeWilly1 --delta-path stabilityai/FreeWilly1-Delta-SafeTensor
```
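
Conceptually, applying a delta means adding each delta tensor to the base tensor of the same name. The sketch below illustrates only that arithmetic on plain Python lists. It is not the contents of `apply_delta.py`; the function name `apply_delta_to_state` and the toy parameter names are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_delta_to_state(base: Weights, delta: Weights) -> Weights:
    """Return base + delta for every parameter (toy illustration only)."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    merged: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise addition recovers the fine-tuned weight.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


# Hypothetical two-parameter "model" standing in for real checkpoints.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -0.5], "layer.bias": [0.0]}
merged = apply_delta_to_state(base, delta)
print(merged)
```

A real conversion operates on full checkpoint tensors and also has to write the merged result to the target path, which this sketch leaves out.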

Start chatting with `FreeWilly` using the following code snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point both calls at the merged model directory produced by apply_delta.py.
tokenizer = AutoTokenizer.from_pretrained("your_path_to_freewilly", use_fast=False)
# device_map="auto" (requires the accelerate package) places the fp16 weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("your_path_to_freewilly", torch_dtype=torch.float16, low_cpu_mem_usage=True, device_map="auto")

system_prompt = "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n"
system_prompt += "### Instruction:\nYou are Free Willy, an AI that follows instructions extremely well. Help as much as you can. Remember, be safe, and don't do anything illegal.\n\n"

message = "Write me a poem please"
prompt = f"{system_prompt}### Input: {message}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Nucleus sampling only: top_k=0 disables top-k filtering.
output = model.generate(**inputs, do_sample=True, top_p=0.95, top_k=0, max_new_tokens=256)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

FreeWilly should be used with prompts formatted similarly to Alpaca, as shown below:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
This is a system prompt, please behave and help the user.

### Input:
Your prompt here

### Response:
The output of FreeWilly
```
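
Because the decoded output contains the prompt followed by the completion, it can help to build the prompt in one place and strip it off afterwards. The following is a minimal sketch, not part of the model's tooling: `build_prompt` and `extract_response` are hypothetical helper names. The layout follows the Python snippet above, which puts the message on the same line as `### Input:`, whereas the template shows it on the next line. The sketch also assumes the decoded text still contains the response marker verbatim.

```python
RESPONSE_MARKER = "### Response:\n"

PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
)


def build_prompt(system: str, message: str) -> str:
    """Assemble a prompt with the same layout as the f-string in the snippet."""
    return (
        f"{PREAMBLE}### Instruction:\n{system}\n\n"
        f"### Input: {message}\n\n{RESPONSE_MARKER}"
    )


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or all text if absent."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()


prompt = build_prompt(
    "This is a system prompt, please behave and help the user.",
    "Write me a poem please",
)
print(prompt)
```

With these helpers, the final line of the snippet could print `extract_response(tokenizer.decode(output[0], skip_special_tokens=True))` to show only the completion.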

## Model Details

* **Developed by**: [Stability AI](https://stability.ai/)
* **Model type**: FreeWilly is an auto-regressive language model fine-tuned from LLaMA 65B.
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: Fine-tuned checkpoints (`FreeWilly`) are licensed under the Non-Commercial Creative Commons license ([CC BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/))
* **Contact**: For questions and comments about the model, please email `lm@stability.ai`

### Training Dataset

`FreeWilly` is trained on our internal Orca-style dataset.

### Training Procedure

Models are trained via supervised fine-tuning on the aforementioned datasets, in mixed precision (BF16), and optimized with AdamW. The hyperparameters are as follows:

| Dataset | Batch Size | Learning Rate | Learning Rate Decay | Warm-up | Weight Decay | Betas |
|-------------------|------------|---------------|-------------------|---------|--------------|-------------|
| Orca pt1 packed | 512 | 3e-5 | Cosine to 3e-6 | 100 | 1e-6 | (0.9, 0.95) |
| Orca pt2 unpacked | 512 | 3e-5 | Cosine to 3e-6 | 100 | 1e-6 | (0.9, 0.95) |
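
To make the learning-rate columns concrete, here is a small sketch of a warm-up plus cosine-decay schedule using the values from the table. Several details are assumptions, since the table does not state them: the warm-up value of 100 is treated as optimizer steps, the warm-up is taken to be linear, and `total_steps` is a hypothetical run length. The function name `learning_rate` is likewise illustrative.

```python
import math

PEAK_LR = 3e-5      # "Learning Rate" column
FINAL_LR = 3e-6     # "Cosine to 3e-6"
WARMUP_STEPS = 100  # "Warm-up" column, assumed to be optimizer steps


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warm-up to PEAK_LR, then cosine decay to FINAL_LR."""
    if total_steps <= WARMUP_STEPS:
        raise ValueError("total_steps must exceed WARMUP_STEPS")
    if step < WARMUP_STEPS:
        # Assumed linear ramp; the table does not state the warm-up shape.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (total_steps - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


total = 1000  # hypothetical run length; the source does not give one
for s in (0, 99, 100, 550, 1000):
    print(s, f"{learning_rate(s, total):.2e}")
```

The schedule reaches the peak rate at the end of warm-up and settles at the final rate on the last step; the actual training run may have used a different warm-up shape or step count.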

## Use and Limitations

### Intended Use

These models are intended for research only, in adherence with the [CC BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

### Limitations and bias

Although the aforementioned dataset helps to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use the model responsibly.

## Citations

```bibtex
@misc{touvron2023llama,
      title={LLaMA: Open and Efficient Foundation Language Models},
      author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie-Anne Lachaux and Timothée Lacroix and Baptiste Rozière and Naman Goyal and Eric Hambro and Faisal Azhar and Aurelien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
      year={2023},
      eprint={2302.13971},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```bibtex
@misc{mukherjee2023orca,
      title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
      author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
      year={2023},
      eprint={2306.02707},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```bibtex
@misc{alpaca,
      author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
      title = {Stanford Alpaca: An Instruction-following LLaMA model},
      year = {2023},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```