vLLM or SGLang?
#3
by dipta007 - opened
Does the model support vLLM or SGLang?
vLLM is supported.
vLLM works using Docker. Here is a working Docker Compose file:
services:
  vllm-openai:
    image: vllm/vllm-openai:v0.8.5.post1
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - /opt/vllm/models/:/models/
    environment:
      - HF_HUB_OFFLINE=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: --model ModelSpace/GemmaX2-28-9B-v0.1 --task generate --served-model-name "GemmaX2" --gpu-memory-utilization 0.9 --cpu-offload-gb 56
Test the API using /docs (Swagger UI) and /v1/chat/completions:
{"model":"GemmaX2","messages":[{"role":"user","content":"Translate this from Arabic to English: Arabic: أنا أحب الترجمة الآلية English:"}],"max_tokens":512}
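The same request can also be sent from a script. The following is a minimal sketch, not a definitive client: it assumes the container is reachable at http://localhost:8000 (from the port mapping above) and that the server returns the usual OpenAI-style chat-completions response. The helper names `build_payload`, `extract_text`, and `translate` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

# Assumption: the Compose service above is running and exposed on localhost:8000.
BASE_URL = "http://localhost:8000"


def build_payload(src_lang, tgt_lang, text, model="GemmaX2", max_tokens=512):
    """Build a chat-completions request body using the prompt format shown above."""
    prompt = (
        f"Translate this from {src_lang} to {tgt_lang}: "
        f"{src_lang}: {text} {tgt_lang}:"
    )
    return {
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_text(response):
    """Return the assistant message text from a chat-completions response dict."""
    return response["choices"][0]["message"]["content"].strip()


def translate(src_lang, tgt_lang, text, base_url=BASE_URL, timeout=120):
    """POST the payload to /v1/chat/completions and return the translated text."""
    body = json.dumps(build_payload(src_lang, tgt_lang, text)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_text(json.load(resp))


# Example call (requires the server to be running):
# print(translate("Arabic", "English", "أنا أحب الترجمة الآلية"))
```

Only the standard library is used here so the sketch runs without extra installs. Because the endpoint is OpenAI-compatible, an OpenAI client library pointed at the same base URL should work as well.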