Text Generation
Transformers
Safetensors
English
weblinx
text-generation-inference
web-agents
agents
Instructions for using McGill-NLP/Sheared-LLaMA-2.7B-weblinx with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use McGill-NLP/Sheared-LLaMA-2.7B-weblinx with Transformers:
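```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="McGill-NLP/Sheared-LLaMA-2.7B-weblinx")
```

```python
# Load model directly; AutoModelForCausalLM includes the language-modeling head needed for generation
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("McGill-NLP/Sheared-LLaMA-2.7B-weblinx")
model = AutoModelForCausalLM.from_pretrained("McGill-NLP/Sheared-LLaMA-2.7B-weblinx", dtype="auto")
```
- Notebooks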
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use McGill-NLP/Sheared-LLaMA-2.7B-weblinx with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "McGill-NLP/Sheared-LLaMA-2.7B-weblinx"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "McGill-NLP/Sheared-LLaMA-2.7B-weblinx",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```bash
docker model run hf.co/McGill-NLP/Sheared-LLaMA-2.7B-weblinx
```
- SGLang
How to use McGill-NLP/Sheared-LLaMA-2.7B-weblinx with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "McGill-NLP/Sheared-LLaMA-2.7B-weblinx" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "McGill-NLP/Sheared-LLaMA-2.7B-weblinx",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "McGill-NLP/Sheared-LLaMA-2.7B-weblinx" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "McGill-NLP/Sheared-LLaMA-2.7B-weblinx",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use McGill-NLP/Sheared-LLaMA-2.7B-weblinx with Docker Model Runner:
```bash
docker model run hf.co/McGill-NLP/Sheared-LLaMA-2.7B-weblinx
```
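The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library (`json`, `urllib.request`). The helper names `build_completion_request`, `extract_text`, and `complete` are hypothetical, and the sketch assumes a server is already running on `localhost:8000` (vLLM) or `localhost:30000` (SGLang) and that responses follow the OpenAI completions shape, with the generated text in `choices[0]["text"]`.

```python
import json
import urllib.request

MODEL_ID = "McGill-NLP/Sheared-LLaMA-2.7B-weblinx"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = MODEL_ID,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    # Same JSON payload as the curl examples.
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body: bytes) -> str:
    """Return the generated text of the first choice (assumes the OpenAI completions response shape)."""
    data = json.loads(response_body)
    return data["choices"][0]["text"]


def complete(prompt: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send the request to a running server and return the completion text."""
    request = build_completion_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(response.read())
```

For example, `complete("Once upon a time,")` would query a local vLLM server, and `complete("Once upon a time,", base_url="http://localhost:30000")` the SGLang server. Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.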
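The Transformers snippet under Libraries only loads the model. A minimal sketch of producing text with it follows, assuming `model` was loaded with `AutoModelForCausalLM.from_pretrained(...)` (the bare `AutoModel` class has no language-modeling head, so it cannot predict next tokens) and `tokenizer` with `AutoTokenizer.from_pretrained(...)` from the same repository, both on CPU. The function name `generate_continuation` is hypothetical. It uses greedy decoding, passes the prompt through unchanged without any task-specific prompt template, and, because `generate()` on a decoder-only model returns the prompt tokens followed by the new ones, slices off the prompt before decoding.

```python
def generate_continuation(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of `prompt` and return only the newly generated text."""
    # Tokenize the prompt as a batch of size 1.
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_length = len(inputs["input_ids"][0])
    # For decoder-only models, generate() returns prompt tokens followed by new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the echoed prompt so only the continuation is decoded.
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Slicing by prompt length has the same effect as passing `return_full_text=False` to the text-generation pipeline, but keeps direct control over the `generate()` arguments.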
Commit History
- Add inputs.json (c2c7a5d), committed by xhluca
- Update README.md (d554b80, verified)
- Update README.md (fca5b4d, verified)
- Update README.md (7daeb16, verified)
- Update README.md (4414bcf, verified)
- Update README.md (501986f, verified)
- Update README.md (7b498c0, verified)
- Create README.md (b23f721, verified)
- Add model, tokenizer, and config files, input records (6b3e333), committed by xhluca
- Add .json to .gitattributes to track changes in json files (cdbbc69), committed by xhluca