Update TF weights
Model converted with the transformers pt_to_tf CLI.
All converted model outputs and hidden layers were validated against their PyTorch counterparts. Maximum crossload output difference=1.465e-03; Maximum converted output difference=1.465e-03.
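For readers unfamiliar with this kind of check, here is a minimal sketch of how a "maximum output difference" can be computed across named outputs and hidden layers. It is not the CLI's actual implementation: the function name `max_output_difference`, the sample arrays, and the `TOLERANCE` value are all hypothetical and chosen only for illustration.

```python
import numpy as np


def max_output_difference(pt_outputs, tf_outputs):
    """Return the largest absolute element-wise difference across all named outputs.

    Both arguments are dicts mapping an output name (e.g. "logits") to an array.
    Hypothetical helper for illustration; not part of transformers.
    """
    if pt_outputs.keys() != tf_outputs.keys():
        raise ValueError("The two models expose different output names")
    return max(
        float(np.max(np.abs(np.asarray(pt_outputs[name]) - np.asarray(tf_outputs[name]))))
        for name in pt_outputs
    )


# Stand-in values, NOT real XGLM outputs.
pt = {
    "logits": np.array([[0.10, 0.20], [0.30, 0.40]]),
    "hidden_state_0": np.array([1.0, 2.0]),
}
tf = {
    "logits": np.array([[0.10, 0.2005], [0.30, 0.40]]),
    "hidden_state_0": np.array([1.0, 2.001]),
}

TOLERANCE = 5e-3  # hypothetical threshold, picked for this example only
diff = max_output_difference(pt, tf)
passed = diff <= TOLERANCE
print(f"Maximum output difference={diff:.3e}; within tolerance: {passed}")
```

In a real validation, the two dicts would hold the outputs of the PyTorch and TensorFlow models run on the same inputs, and the reported number would be compared against whatever threshold the conversion tool enforces.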
@patrickvonplaten the weights I uploaded previously were built with an MVP of the pt-to-tf CLI, which did not convert (or check) the model head. These weights have the model head converted properly.
Merging this PR unblocks the following GH PR. After we confirm that these weights unblock the PR above (through passing tests), we can push the conversion for other XGLM model sizes.
cc @Stancld
Thanks @joaogante