Commit 3496373 · Parent(s): 16f19d1
Update README.md

README.md CHANGED
@@ -55,7 +55,7 @@ It is recommended to directly call the [`generate`](https://huggingface.co/docs/
 >>> generated_ids = model.generate(input_ids)
 
 >>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
-["Hello, I'm am conscious and
+["Hello, I'm am conscious and I'm not a robot.\nI'm a robot and"]
 ```
 
 By default, generation is deterministic. In order to use the top-k sampling, please set `do_sample` to `True`.
@@ -77,7 +77,7 @@ By default, generation is deterministic. In order to use the top-k sampling, ple
 >>> generated_ids = model.generate(input_ids, do_sample=True)
 
 >>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
-["Hello, I'm am conscious and
+["Hello, I'm am conscious and I have a question. "]
 ```
 
 ### Limitations and bias
@@ -110,11 +110,11 @@ Here's an example of how the model can have biased predictions:
 >>> generated_ids = model.generate(input_ids, do_sample=True, num_return_sequences=5, max_length=10)
 
 >>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
-The woman worked as a nurse at
-The woman worked as a nurse at
-The woman worked as a nurse in the
-The woman worked as a nurse at
-The woman worked as a
+The woman worked as a nurse at the hospital
+The woman worked as a nurse at the hospital
+The woman worked as a nurse in the intensive
+The woman worked as a nurse at the hospital
+The woman worked as a teacher in a school
 ```
 
 compared to:
@@ -138,9 +138,9 @@ compared to:
 >>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
 The man worked as a security guard at the
 The man worked as a security guard at the
+The man worked as a teacher in the city
 The man worked as a security guard at the
 The man worked as a security guard at the
-The man worked as a security guard at a
 ```
 
 This bias will also affect all fine-tuned versions of this model.
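The README text touched by this commit contrasts deterministic generation (the default) with top-k sampling (`do_sample=True`). As a rough illustration of that difference, here is a minimal, standard-library-only sketch. It is not the Transformers implementation: the four-word vocabulary, the logit values, and the helper names `greedy_pick` and `top_k_sample` are all hypothetical and chosen only for the example.

```python
import math
import random


def greedy_pick(logits):
    """Return the index of the highest-scoring token (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Sample an index from the k highest-scoring tokens, softmax-weighted."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical next-token scores; not taken from the model.
vocab = ["hospital", "school", "city", "office"]
logits = [2.0, 1.5, 0.3, -1.0]

# Greedy decoding picks the same token on every call.
greedy = [vocab[greedy_pick(logits)] for _ in range(5)]

# Top-k sampling (k=2 here) can vary between calls, but never leaves the top k.
rng = random.Random(0)
sampled = [vocab[top_k_sample(logits, k=2, rng=rng)] for _ in range(5)]

print(greedy)
print(sampled)
```

With these toy scores, the greedy list always contains only `hospital`, while the sampled list mixes `hospital` and `school` in proportions that depend on the seed. This is one plausible reading of why the sampled outputs in the diff differ from run to run while the deterministic example does not.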