Open-Orca/SlimOrca-Dedup
How to use Fizzarolli/sappha-2b-v3 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Fizzarolli/sappha-2b-v3")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Fizzarolli/sappha-2b-v3")
model = AutoModelForCausalLM.from_pretrained("Fizzarolli/sappha-2b-v3")
How to use Fizzarolli/sappha-2b-v3 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Fizzarolli/sappha-2b-v3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Fizzarolli/sappha-2b-v3",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or run the model with Docker:
docker model run hf.co/Fizzarolli/sappha-2b-v3
How to use Fizzarolli/sappha-2b-v3 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Fizzarolli/sappha-2b-v3" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Fizzarolli/sappha-2b-v3",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Fizzarolli/sappha-2b-v3" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Fizzarolli/sappha-2b-v3",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use Fizzarolli/sappha-2b-v3 with Docker Model Runner:
docker model run hf.co/Fizzarolli/sappha-2b-v3
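the curl calls above can also be made from python. this is a minimal sketch using only the standard library; `build_completion_request` is a hypothetical helper name, and the default `base_url` assumes the vLLM server on port 8000 (pass `http://localhost:30000` for SGLang).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint.

    The helper name and default base_url are assumptions for illustration;
    the payload fields mirror the curl examples above.
    """
    payload = {
        "model": "Fizzarolli/sappha-2b-v3",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request("Once upon a time,")

# Sending the request needs a running server, so it is left commented out.
# The response shape assumed here is the OpenAI completions format.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["text"])
```

building the request separately from sending it makes the payload easy to inspect before a server is up.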
a slightly less experimental qlora instruct finetune of the gemma-2b base model. trained with unsloth.
| benchmark | gemma-2b-it | sappha-2b-v3 | dolphin-2.8-gemma-2b |
|---|---|---|---|
| MMLU (five-shot) | 36.98 | 38.02 | 37.89 |
| HellaSwag (zero-shot) | 49.22 | 51.70 | 47.79 |
| PIQA (one-shot) | 75.08 | 75.46 | 71.16 |
| TruthfulQA (zero-shot) | 37.51 | 31.65 | 37.15 |
basic chatml:
<|im_start|>system
You are a useful and helpful AI assistant.<|im_end|>
<|im_start|>user
what are LLMs?<|im_end|>
<|im_start|>assistant
LLMs, or Large Language Models, are advanced artificial intelligence systems that can perform tasks similar to human language. They are trained on vast amounts of data and can understand and respond to human queries. LLMs are often used in various applications, such as language translation, text generation, and question answering.<|im_end|>
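the template above can be assembled by hand when calling a raw completions endpoint. this is a rough sketch, not the model's official chat template: `format_chatml` is a hypothetical helper, and the single newline after each `<|im_end|>` is an assumption based on the layout shown above.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper: turn separators follow the example in this card.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a useful and helpful AI assistant."},
    {"role": "user", "content": "what are LLMs?"},
]
prompt = format_chatml(messages)
print(prompt)
```

when generating from a prompt built this way, it probably helps to stop on `<|im_end|>` so the model does not run on into another turn.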
gguf: https://huggingface.co/Fizzarolli/sappha-2b-v3-GGUF
it was a private failure :)