Instructions to use LLaMAX/LLaMAX2-7B-X-CSQA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LLaMAX/LLaMAX2-7B-X-CSQA with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="LLaMAX/LLaMAX2-7B-X-CSQA")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("LLaMAX/LLaMAX2-7B-X-CSQA")
    model = AutoModelForCausalLM.from_pretrained("LLaMAX/LLaMAX2-7B-X-CSQA")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LLaMAX/LLaMAX2-7B-X-CSQA with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "LLaMAX/LLaMAX2-7B-X-CSQA"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "LLaMAX/LLaMAX2-7B-X-CSQA",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
docker model run hf.co/LLaMAX/LLaMAX2-7B-X-CSQA
- SGLang
How to use LLaMAX/LLaMAX2-7B-X-CSQA with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "LLaMAX/LLaMAX2-7B-X-CSQA" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "LLaMAX/LLaMAX2-7B-X-CSQA",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "LLaMAX/LLaMAX2-7B-X-CSQA" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "LLaMAX/LLaMAX2-7B-X-CSQA",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use LLaMAX/LLaMAX2-7B-X-CSQA with Docker Model Runner:
docker model run hf.co/LLaMAX/LLaMAX2-7B-X-CSQA
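The curl calls shown for vLLM and SGLang can also be made from Python. The following is a minimal sketch using only the standard library, assuming one of the servers above is already running locally. The helper names `build_completion_request` and `complete` are hypothetical (they are not part of vLLM, SGLang, or Transformers), and the response is assumed to follow the OpenAI completions shape, with the generated text under `choices[0]["text"]`.

```python
import json
import urllib.request

MODEL_ID = "LLaMAX/LLaMAX2-7B-X-CSQA"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint.

    Hypothetical helper: it mirrors the JSON body used in the curl examples.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text.

    Requires a running server. The response layout is an assumption based on
    the OpenAI-compatible completions format.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the vLLM server from the example above, `complete("Once upon a time,")` targets port 8000 by default. For the SGLang server, pass `base_url="http://localhost:30000"`.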
Commit History
Update README.md 040b0f9 verified
Update README.md e1e3f2a verified
Update README.md 5f21b74 verified
Update README.md e5ae727 verified
Update README.md 05472ed verified
update readme e5b8673
update readme c10d85a
update readme c722e15
update README adcb711
First model version c758340
initial commit 6d25232 verified
TransLLaMA committed on