Instructions for using bigscience/bloom with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use bigscience/bloom with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bigscience/bloom")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigscience/bloom with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "bigscience/bloom"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/bigscience/bloom
```
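The curl call above can also be scripted from Python. A minimal sketch that only builds and prints the request body; actually sending it assumes the vLLM server from the commands above is running on localhost:8000:

```python
import json

# Payload for the OpenAI-compatible /v1/completions endpoint,
# mirroring the curl example above.
payload = {
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

body = json.dumps(payload)
print(body)
```

With the server running, the same body can be POSTed to http://localhost:8000/v1/completions with a `Content-Type: application/json` header.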
- SGLang
How to use bigscience/bloom with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "bigscience/bloom" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "bigscience/bloom" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "bigscience/bloom",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use bigscience/bloom with Docker Model Runner:
```shell
docker model run hf.co/bigscience/bloom
```
Why can't the model be run (really slowly) on consumer hardware?
Hi! I'm curious as to why this isn't possible!
Just keep the stuff on an SSD and split the work 400/8 on a consumer GPU?
If you are not concerned about inference time, you don't even need GPUs; the model runs fine on CPUs as long as you have enough system RAM. There has been much discussion about performance on different hardware configurations. See:
https://huggingface.co/bigscience/bloom/discussions/45
https://huggingface.co/bigscience/bloom/discussions/59
https://huggingface.co/bigscience/bloom/discussions/58
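As a rough back-of-the-envelope check of the "enough system RAM" requirement (the bytes-per-parameter figures are standard dtype sizes, not from this thread), you can estimate the memory needed just to hold the weights:

```python
def weights_ram_gb(n_params_billion: float, bytes_per_param: int) -> float:
    """Rough RAM (in GB) needed to hold the model weights alone."""
    # n_params_billion * 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB
    return n_params_billion * bytes_per_param

print(weights_ram_gb(176, 2))  # bfloat16: 352 GB
print(weights_ram_gb(176, 4))  # float32: 704 GB
```

Actual usage will be higher once activations and runtime overhead are included, which is why offloading to disk comes up in the threads above.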
In case it helps, I wrote a blog post that shows how to run BLOOM (the largest, 176B version) on a desktop computer, even if you don't have a GPU. On my computer (11th-gen i5, 16GB RAM, 1TB Samsung 980 Pro SSD), generation takes 3 minutes per token using only the CPU, which is a little slow but manageable. See the blog post link below.
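At that throughput, total generation time adds up quickly. A quick illustrative calculation using the 3-minutes-per-token figure reported above:

```python
SECONDS_PER_TOKEN = 180  # ~3 minutes per token, as reported above

def generation_hours(n_tokens: int) -> float:
    """Wall-clock hours to generate n_tokens at the reported CPU speed."""
    return n_tokens * SECONDS_PER_TOKEN / 3600

print(generation_hours(10))   # 0.5 hours for 10 tokens
print(generation_hours(100))  # 5.0 hours for 100 tokens
```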
Hello!
I followed the guide, but the downloaded tokenizer.json file is invalid; in fact, it is not even a JSON file. I found one somewhere on the internet, but now when I run
final_lnorm.load_state_dict(get_state_dict(shard_num=72, prefix="ln_f."))
it says
File "/home/usuari/anaconda3/lib/python3.9/site-packages/torch/serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
Do you know how to solve it?
Thanks!
Hi @cdani, I suspect the files were not downloaded properly and some of them might be Git LFS pointer files instead of the actual files. The easiest way to fix this is to download the entire repo from scratch using git lfs as follows:
```shell
git lfs install
git clone https://huggingface.co/bigscience/bloom
```
This will download the entire repo (including some repo history). When the download is complete, make sure the size of each file matches the size shown in the web repository.
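The size check can be scripted. A hedged sketch (the temporary file here is only a stand-in for a real shard; compare real files against the sizes listed on the model page):

```python
import os
import tempfile

def matches_expected_size(path: str, expected_bytes: int) -> bool:
    """True if the file on disk has exactly the expected byte size.

    A Git LFS pointer file is only a few hundred bytes, so a multi-GB
    shard that comes back tiny was never actually downloaded.
    """
    return os.path.getsize(path) == expected_bytes

# Demo with a temporary file standing in for a downloaded shard:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 1024)
    demo_path = f.name

print(matches_expected_size(demo_path, 1024))  # True
os.unlink(demo_path)
```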
Closing due to lack of activity. Feel free to re-open if you feel that the discussion is not finished yet.