fix integration with huggingface
under construction ⚙️
do not merge yet until I test it
PR related info
All good (I think). I don't have enough compute power to test this out since I'm on the free tier, so let me know if everything is running the way it's supposed to.
You can test this PR before merging with the following code:
pip install -qU "transformers>=4.39.1" flash_attn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Tensoic/Cerule-v0.1",
    trust_remote_code=True,
    revision="refs/pr/2",  # the revision parameter is only used to run the code from this PR
)
I also updated the README to let people know how to use the model.
tips
When working with custom architectures, I recommend using Hugging Face's PyTorchModelHubMixin. I also made a basic template on how to use it in this GitHub repo, which integrates it with pip.
If you have any more questions or feedback, or if you have any other custom models, do not hesitate to reach out.
Damn THANKS A LOT! will test it out asap
Hey @not-lain why did you change the model type to phi-msft here?
https://huggingface.co/Tensoic/Cerule-v0.1/commit/4a02b161d5142cd92a2082aae885bc3cc9584aca
@adarshxs you are right, this PR is absolutely useless XD.
The only thing that was broken was my Colab environment.
All I had to do from the beginning was
!pip install -qU "transformers>=4.39.1" flash_attn
I'm closing this PR,
but I'm keeping pr/3 open since the _name_or_path is essential for cases such as finetuning.
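Since the root cause in this thread turned out to be an outdated environment rather than the model code, a quick version check before loading the model may save some debugging. The snippet below is only a minimal sketch using the standard library: the helper names (`parse_version`, `meets_minimum`, `check_transformers`) are made up for illustration, and the version parsing is deliberately simplified (it ignores pre-release suffixes such as `rc1`). The `4.39.1` floor is the one quoted in the pip command above.

```python
from importlib import metadata

# Version floor taken from the pip command in this thread.
MIN_TRANSFORMERS = "4.39.1"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.39.1' or '4.40.0.dev0' into a tuple of its leading integer parts.

    Simplified on purpose: parsing stops at the first non-numeric component,
    so pre-release markers are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_transformers() -> str:
    """Describe whether the installed transformers package is recent enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; upgrade it"
```

Running `print(check_transformers())` in a fresh Colab cell before calling `from_pretrained` should show whether the upgrade is needed. A proper version parser such as the one in the `packaging` library would handle pre-releases more faithfully than this sketch.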