---
language:
- en
tags:
- pytorch
- causal-lm
- pythia
license: apache-2.0
datasets:
- Dahoas/synthetic-instruct-gptj-pairwise
---

This model was created by finetuning [`EleutherAI/pythia-12b-deduped`](https://huggingface.co/EleutherAI/pythia-12b-deduped) on the [`Dahoas/synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset.

You can try a [demo](https://cloud.lambdalabs.com/demos/ml/gpt-neox-side-by-side) of the model hosted on [Lambda Cloud](https://lambdalabs.com/service/gpu-cloud).

### Model Details

- Finetuned by: [Lambda](https://lambdalabs.com/)
- Model type: Transformer-based Language Model
- Language: English
- Pre-trained model: [EleutherAI/pythia-12b-deduped](https://huggingface.co/EleutherAI/pythia-12b-deduped)
- Dataset: [Dahoas/synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)
- Library: [transformers](https://huggingface.co/docs/transformers/index)
- License: Apache 2.0

### Prerequisites

Running inference with the model requires about 24 GB of GPU memory.

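The ~24 GB figure is consistent with a back-of-the-envelope estimate for the weights alone. The sketch below assumes roughly 12 billion parameters (inferred from the model name, not an exact count) stored in float16 at 2 bytes each. Activations and the generation cache need additional memory on top, so treat the result as a lower bound rather than a measured requirement.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~12e9 parameters, inferred from the "12b" in the model name.
print(estimate_weight_memory_gb(12e9))     # float16 weights (2 bytes each)
print(estimate_weight_memory_gb(12e9, 4))  # float32 weights, for comparison
```
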
### Quick Start

```
import torch
from transformers import AutoTokenizer, pipeline, StoppingCriteria, StoppingCriteriaList

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
model_name = "lambdalabs/pythia-12b-deduped-synthetic-instruct"
max_new_tokens = 1536
stop_token = "<|stop|>"


class KeywordsStoppingCriteria(StoppingCriteria):
    """Stop generation once the last generated token is one of the given ids."""

    def __init__(self, keywords_ids: list):
        self.keywords = keywords_ids

    def __call__(
        self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
    ) -> bool:
        # Only the most recent token of the first sequence is checked.
        if input_ids[0][-1] in self.keywords:
            return True
        return False


tokenizer = AutoTokenizer.from_pretrained(
    model_name,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_tokens([stop_token])

# Look up the token id of the stop marker and build the stopping criterion.
stop_ids = [tokenizer.encode(w)[0] for w in [stop_token]]
stop_criteria = KeywordsStoppingCriteria(stop_ids)

generator = pipeline(
    "text-generation",
    model=model_name,
    device=device,
    max_new_tokens=max_new_tokens,
    torch_dtype=torch.float16,
    stopping_criteria=StoppingCriteriaList([stop_criteria]),
)

# The model expects prompts in the "Question: ...\nAnswer:" format.
example = "How can I make an omelette."
text = "Question: {}\nAnswer:".format(example)

result = generator(
    text,
    num_return_sequences=1,
)

output = result[0]["generated_text"]
print(output)
```

Output:

```
Question: How can I make an omelette.
Answer:To make an omelette, start by cracking two eggs into a bowl and whisking them together with a pinch of salt and pepper. Heat a non-stick pan over medium-high heat and add a tablespoon of butter. Once the butter has melted, pour in the egg mixture and let it cook for a few minutes until the edges start to turn golden. Then, using a spatula, fold the omelette in half and let it cook for another minute or two. Finally, flip the omelette over and cook for another minute or two until the omelette is cooked through. Serve the omelette with your favorite toppings and enjoy.<|stop|>
```

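The pipeline returns the prompt followed by the completion, and the completion can end with the `<|stop|>` marker, as in the output above. A minimal sketch for extracting only the answer text, assuming the `Question: ...\nAnswer:` prompt format from the Quick Start. `extract_answer` is a hypothetical helper (not part of the model or `transformers`), and the sample string is a shortened stand-in for real model output.

```python
def extract_answer(generated_text: str, prompt: str, stop_token: str = "<|stop|>") -> str:
    """Return the completion without the echoed prompt or the stop marker."""
    # Drop the echoed prompt if the generated text starts with it.
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # Keep only the text before the first stop marker, if one is present.
    answer = generated_text.split(stop_token, 1)[0]
    return answer.strip()


prompt = "Question: How can I make an omelette.\nAnswer:"
# Shortened stand-in for the generated text shown above.
sample = prompt + "To make an omelette, start by cracking two eggs into a bowl.<|stop|>"
print(extract_answer(sample, prompt))
```
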
### Training

The model was trained on the [`Dahoas/synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset. We split the original dataset into train (the first 32000 examples) and validation (the remaining 1144 examples) subsets.

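The split described here amounts to slicing the dataset by position. A minimal sketch, using a plain list of integers as a stand-in for the 33144 examples (32000 + 1144). The helper name `split_train_validation` is hypothetical, and the preprocessing code actually used for training is not shown in this card.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(
    examples: Sequence[T], num_train: int = 32000
) -> Tuple[List[T], List[T]]:
    """Put the first `num_train` examples in train and the rest in validation."""
    return list(examples[:num_train]), list(examples[num_train:])


# Stand-in data: integer indices in place of real dataset rows.
examples = list(range(32000 + 1144))
train, validation = split_train_validation(examples)
print(len(train), len(validation))  # 32000 1144
```
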
We finetuned the model for 4 epochs using DeepSpeed. Training took 17 hours on 8x A100 80GB GPUs, with `batch_size_per_gpu` set to `4` (a global batch size of 32) and a learning rate of `0.0000025`, decayed linearly to zero at the last training step. A Weights and Biases record of the run is available [here](https://wandb.ai/chuanli11/ft-synthetic-instruct-gptj-pairwise-pythia12b-deepspeed?workspace=user-).
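
These settings imply a global batch size and a step count that can be checked with a little arithmetic. The sketch below assumes no gradient accumulation and no warmup (neither is mentioned in this card), and that every optimizer step consumes one full global batch. `linear_decay_lr` is an illustrative function, not the scheduler used in the actual training run.

```python
NUM_GPUS = 8
BATCH_SIZE_PER_GPU = 4
NUM_TRAIN_EXAMPLES = 32000
NUM_EPOCHS = 4
BASE_LR = 2.5e-6  # 0.0000025

global_batch_size = NUM_GPUS * BATCH_SIZE_PER_GPU          # 8 * 4 = 32
steps_per_epoch = NUM_TRAIN_EXAMPLES // global_batch_size  # 32000 / 32 = 1000
total_steps = steps_per_epoch * NUM_EPOCHS                 # 1000 * 4 = 4000


def linear_decay_lr(step: int, total: int = total_steps, base_lr: float = BASE_LR) -> float:
    """Learning rate falling linearly from base_lr at step 0 to zero at the last step."""
    remaining = max(total - step, 0)
    return base_lr * remaining / total


# Learning rate at the start, midpoint, and end of training.
for step in (0, total_steps // 2, total_steps):
    print(step, linear_decay_lr(step))
```
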