How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="microsoft/bloom-deepspeed-inference-int8")

# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/bloom-deepspeed-inference-int8")
model = AutoModel.from_pretrained("microsoft/bloom-deepspeed-inference-int8")

This is a custom INT8 version of the original BLOOM weights, made for fast inference with the DeepSpeed-Inference engine, which uses Tensor Parallelism. In this repo, the tensors are split into 8 shards to target 8 GPUs.
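The idea behind a sharded INT8 checkpoint can be sketched in plain Python. This is a toy illustration under stated assumptions, not DeepSpeed's actual implementation: it assumes a column-wise split of a single linear layer's weight into 8 shards (one per hypothetical GPU) and symmetric absmax INT8 quantization with one scale per shard. DeepSpeed's real sharding layout, quantization scheme, and fused kernels may differ, and all function names below are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
QShard = Tuple[List[List[int]], float]  # (int8 values, scale); hypothetical layout


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w column-wise into num_shards equal slices, one per (hypothetical) GPU."""
    cols = len(w[0])
    assert cols % num_shards == 0, "output dimension must divide evenly"
    step = cols // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]


def quantize_int8(w: Matrix) -> QShard:
    """Assumed scheme: symmetric absmax quantization, w ~= q * scale, q in [-127, 127]."""
    absmax = max(abs(v) for row in w for v in row) or 1.0
    scale = absmax / 127.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in w]
    return q, scale


def dequantize(q: List[List[int]], scale: float) -> Matrix:
    """Map int8 values back to floats using the shard's scale."""
    return [[v * scale for v in row] for row in q]


def tensor_parallel_forward(x: Matrix, shards: List[QShard]) -> Matrix:
    """Each shard computes its slice of the output columns; concatenating the
    slices plays the role of gathering results from all GPUs."""
    partials = [matmul(x, dequantize(q, s)) for q, s in shards]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, -2.0, 0.5, 3.0], [0.0, 1.0, -1.0, 2.0]]
    w = [[((k * 16 + j) % 7 - 3) / 10 for j in range(16)] for k in range(4)]
    shards = [quantize_int8(s) for s in split_columns(w, 8)]
    y_tp = tensor_parallel_forward(x, shards)
    y_ref = matmul(x, w)
    err = max(abs(a - b) for ra, rb in zip(y_tp, y_ref) for a, b in zip(ra, rb))
    print(f"{len(shards)} shards, max abs error vs. full precision: {err:.4f}")
```

The column split on its own is exact, because each output column depends only on its own weight column; the small residual error comes entirely from rounding to INT8. In a real multi-GPU setup, the final concatenation would be a collective operation across devices rather than a list join.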

The full BLOOM documentation is here.

To use the weights in this repo, you can adapt the scripts found here to your needs (XXX: they are going to migrate soon to the HF Transformers code base, so the link will need to be updated once they have moved).
