seldonium-3b
Seldonium-3b is a merge of two existing models, rhysjones/phi-2-orange and cognitivecomputations/dolphin-2_6-phi-2. The merge was produced with the "LazyMergekit" Colab notebook, which uses the Mergekit library to merge large language models (LLMs). The merge method used here is "linear", which combines the models by taking a weighted average of their parameters. The weight assigned to each model controls how much that model contributes to the result. Each tensor of the merged model is computed directly from the corresponding tensors of the source models, with the aim of combining the strengths of both.
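To make the weighted average concrete, here is a minimal sketch in plain Python. It is not Mergekit's implementation: the function name `linear_merge`, the toy one-parameter "models", and the weights 3.0 and 1.0 are all hypothetical, and dividing by the sum of the weights (`normalize=True`) is an assumption about how the average is scaled.

```python
from typing import Dict, List, Sequence

Tensor = List[float]  # toy stand-in for a real weight tensor


def linear_merge(
    state_dicts: Sequence[Dict[str, Tensor]],
    weights: Sequence[float],
    normalize: bool = True,
) -> Dict[str, Tensor]:
    """Weighted average of same-shaped parameters, one tensor name at a time."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    # Assumption: with normalize=True the weights are rescaled to sum to 1.
    total = sum(weights) if normalize else 1.0
    merged: Dict[str, Tensor] = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        merged[name] = [
            sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(len(tensors[0]))
        ]
    return merged


# Hypothetical toy "models" with a single two-element parameter each.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [5.0, 6.0]}

merged = linear_merge([model_a, model_b], weights=[3.0, 1.0])
print(merged)  # {'layer.weight': [2.0, 3.0]}
```

With weights 3.0 and 1.0 the first model accounts for three quarters of every merged value, which is why the result sits closer to `model_a`.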
🧩 Configuration
models:
  - model: rhysjones/phi-2-orange
    parameters:
      weight: 1.0
  - model: cognitivecomputations/dolphin-2_6-phi-2
    parameters:
      weight: 0.8
merge_method: linear
dtype: float16
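The weights 1.0 and 0.8 do not sum to 1, so it can help to see what share each model would get if the weights are rescaled before averaging. The sketch below assumes that rescaling takes place (Mergekit's linear method has a `normalize` option that, as far as I know, is on by default; treat that as an assumption rather than a statement about how this model was built). The variable names are illustrative only.

```python
# Weights copied from the configuration above.
config_weights = {
    "rhysjones/phi-2-orange": 1.0,
    "cognitivecomputations/dolphin-2_6-phi-2": 0.8,
}

# Assumption: weights are divided by their sum before averaging.
total = sum(config_weights.values())
shares = {name: weight / total for name, weight in config_weights.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption phi-2-orange contributes about 55.6% of each merged parameter and dolphin-2_6-phi-2 about 44.4%.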
💻 Usage
# Notebook cell (Colab/Jupyter); drop the leading "!" when running in a shell
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "jomangbp/seldonium-3b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a half-precision text-generation pipeline with automatic device placement
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the generated text
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
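By default the text-generation pipeline returns the prompt followed by the continuation, so the printed text starts with the formatted chat prompt. One option is to pass `return_full_text=False` to the pipeline call. Another is to trim the prompt afterwards, as in the sketch below. The helper name `completion_only` and the example strings are hypothetical, and the prompt format shown is made up rather than this model's actual chat template.

```python
def completion_only(generated_text: str, prompt: str) -> str:
    """Return the text after the prompt, or the input unchanged if the prompt is not a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Hypothetical strings standing in for the real prompt and pipeline output.
example_prompt = "User: What is a large language model?\nAssistant: "
example_output = example_prompt + "A model trained on large amounts of text."

print(completion_only(example_output, example_prompt))
# A model trained on large amounts of text.
```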