Text Generation
Transformers
PyTorch
Safetensors
English
gpt_neox
causal-lm
text-generation-inference
Instructions to use EleutherAI/gpt-neox-20b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use EleutherAI/gpt-neox-20b with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="EleutherAI/gpt-neox-20b")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use EleutherAI/gpt-neox-20b with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "EleutherAI/gpt-neox-20b"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "EleutherAI/gpt-neox-20b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
docker model run hf.co/EleutherAI/gpt-neox-20b
- SGLang
How to use EleutherAI/gpt-neox-20b with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "EleutherAI/gpt-neox-20b" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "EleutherAI/gpt-neox-20b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "EleutherAI/gpt-neox-20b" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "EleutherAI/gpt-neox-20b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use EleutherAI/gpt-neox-20b with Docker Model Runner:
docker model run hf.co/EleutherAI/gpt-neox-20b
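The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names `build_completion_request` and `complete` are hypothetical, and the code assumes the server answers in the OpenAI completions format (a JSON object with a `choices` list whose entries carry a `text` field). The URL path, ports, and payload fields are taken from the curl commands above.

```python
import json
import urllib.request

MODEL_ID = "EleutherAI/gpt-neox-20b"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint.

    Hypothetical helper: it mirrors the payload of the curl examples.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the response body follows the OpenAI completions schema,
    i.e. {"choices": [{"text": ...}, ...], ...}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a vLLM server running on the port used above, `complete("Once upon a time,")` would send the same request as the curl command. For the SGLang server, pass `base_url="http://localhost:30000"`.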
- Add link to gpt-neox-20b MOT badge to README.md file (#29, opened 7 months ago by Nwoke)
- Gpt-neox-20b (#27, opened over 1 year ago by Shubham1611; 2 comments)
- how to use this with ollama (#26, opened over 1 year ago by Pawankumar9413; 1 comment)
- Upload FlaxGPTNeoXForCausalLM (#24, opened over 2 years ago by heegyu; 1 comment)
- I have been asked to put ketchup in pie and take vitamin S! (#21, opened about 3 years ago by sreeparna; 🤯 3)
- Max context length/input token length. (#20, opened about 3 years ago by gsaivinay)
- Is it possible to train this model on a commercially available cloud machine? (#19, opened about 3 years ago by Walexum; 1 comment)
- <Response [422]> (#18, opened about 3 years ago by skrishna)
- The generated results using inference API and the webpage are very different! Is the model called from the api the same as the one called from the webpage? (#17, opened about 3 years ago by zouhanyi)
- Fine-Tuning GPT-Neox-20B using Hugging Face Transformers (#16, opened over 3 years ago by Dulanjaya; 1 comment)
- Unusual behaviour with inference using transformers library (#15, opened over 3 years ago by vmajor; 1 comment)