Instructions to use facebook/opt-125m with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use facebook/opt-125m with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="facebook/opt-125m")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use facebook/opt-125m with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "facebook/opt-125m"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/facebook/opt-125m
- SGLang
How to use facebook/opt-125m with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "facebook/opt-125m" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "facebook/opt-125m" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use facebook/opt-125m with Docker Model Runner:
docker model run hf.co/facebook/opt-125m
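The curl calls above post a JSON body to an OpenAI-compatible `/v1/completions` endpoint. The same request can be sketched in Python with only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, introduced here for illustration; the default base URL assumes the vLLM server from the example above (for SGLang, pass `base_url="http://localhost:30000"`).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="facebook/opt-125m",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper: the field names and values are copied from the
    curl payload above; nothing here is part of vLLM or SGLang itself.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the parsed JSON response.

    Requires a server already running at the chosen base_url.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, `complete("Once upon a time,")` should return the decoded JSON body; since the endpoint is described as OpenAI-compatible, the generated text is expected under `choices[0]["text"]`, though that is worth confirming against the actual response.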
I believe that the readme says 175B model when it should be 175M.
#41
by noobmaster29 - opened
README.md
CHANGED
@@ -77,8 +77,8 @@ unfiltered content from the internet, which is far from neutral the model is str
 
 > Like other large language models for which the diversity (or lack thereof) of training
 > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
-> of bias and safety. OPT-
-> hallucination. In general, OPT-
+> of bias and safety. OPT-175M can also have quality issues in terms of generation diversity and
+> hallucination. In general, OPT-175M is not immune from the plethora of issues that plague modern
 > large language models.
 
 This bias will also affect all fine-tuned versions of this model.
@@ -118,7 +118,7 @@ re-formatting practices, including removing repetitive/non-informative text like
 The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
 vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
 
-The
+The 175M model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
 
 ### BibTeX entry and citation info
 
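The README context in the diff says the inputs are sequences of 2048 consecutive tokens. A minimal sketch of that kind of chunking follows, assuming a flat list of token ids is already available. The function name `chunk_token_ids` is hypothetical, and dropping the trailing partial block is an assumption made for this sketch, not something the README states.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], block_size: int = 2048) -> List[List[int]]:
    """Split a flat token stream into consecutive blocks of exactly block_size ids.

    Assumption: a trailing remainder shorter than block_size is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Largest prefix length that divides evenly into full blocks.
    usable = len(token_ids) - len(token_ids) % block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]
```

For example, a stream of 4096 ids yields two blocks, the second starting at id position 2048.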