| --- |
| datasets: |
| - bigscience/xP3 |
| license: bigscience-bloom-rail-1.0 |
| language: |
| - ak |
| - ar |
| - as |
| - bm |
| - bn |
| - ca |
| - code |
| - en |
| - es |
| - eu |
| - fon |
| - fr |
| - gu |
| - hi |
| - id |
| - ig |
| - ki |
| - kn |
| - lg |
| - ln |
| - ml |
| - mr |
| - ne |
| - nso |
| - ny |
| - or |
| - pa |
| - pt |
| - rn |
| - rw |
| - sn |
| - st |
| - sw |
| - ta |
| - te |
| - tn |
| - ts |
| - tum |
| - tw |
| - ur |
| - vi |
| - wo |
| - xh |
| - yo |
| - zh |
| - zu |
| programming_language: |
| - C |
| - C++ |
| - C# |
| - Go |
| - Java |
| - JavaScript |
| - Lua |
| - PHP |
| - Python |
| - Ruby |
| - Rust |
| - Scala |
| - TypeScript |
| pipeline_tag: text-generation |
| widget: |
| - text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative?" |
| example_title: "zh-en sentiment" |
| - text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?" |
| example_title: "zh-zh sentiment" |
| - text: "Suggest at least five related search terms to \"Mạng neural nhân tạo\"." |
| example_title: "vi-en query" |
| - text: "Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels»." |
| example_title: "fr-fr query" |
| - text: "Explain in a sentence in Telugu what is backpropagation in neural networks." |
| example_title: "te-en qa" |
| - text: "Why is the sky blue?" |
| example_title: "en-en qa" |
| - text: "Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \"Heroes Come in All Shapes and Sizes\". Story (in Spanish):" |
| example_title: "es-en fable" |
| - text: "Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \"Violence is the last refuge of the incompetent\". Fable (in Hindi):" |
| example_title: "hi-en fable" |
| --- |
| |
| # Table of Contents |
|
|
| 1. [Model Summary](#model-summary) |
| 2. [Intended uses](#intended-uses) |
| 3. [How to use](#how-to-use) |
| 4. [Limitations](#limitations) |
| 5. [BibTeX entry and citation info](#bibtex-entry-and-citation-info) |
|
|
| # Model Summary |
|
|
| > We present BLOOMZ & mT0, a family of models capable of following human instructions in hundreds of languages. By finetuning large BLOOM & mT5 pretrained multilingual language models on our multilingual task mixture (xP3), we discover various generalization properties of our finetuned models across tasks and languages. |
|
|
| - **Repository:** [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf) |
| - **Paper:** [TODO] |
| - **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co) |
| - **BLOOMZ & mT0 Model Family:** |
| |Name|Explanation| |
| |----|-----------| |
| |[bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)| 560M parameter multitask finetuned version of [bloom-560m](https://huggingface.co/bigscience/bloom-560m) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1)| 1.1B parameter multitask finetuned version of [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7)| 1.7B parameter multitask finetuned version of [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[bloomz-3b](https://huggingface.co/bigscience/bloomz-3b)| 3B parameter multitask finetuned version of [bloom-3b](https://huggingface.co/bigscience/bloom-3b) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[bloomz](https://huggingface.co/bigscience/bloomz)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/bigscience/xP3)| |
| ||| |
| |[bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/xP3mt). **Better than [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) when prompting in non-English**| |
| |[bloomz-mt](https://huggingface.co/bigscience/bloomz-mt)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/xP3mt). **Better than [bloomz](https://huggingface.co/bigscience/bloomz) when prompting in non-English**| |
| ||| |
| |[bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1-p3)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [P3](https://huggingface.co/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)**| |
| |[bloomz-p3](https://huggingface.co/bigscience/bloomz-p3)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [P3](https://huggingface.co/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz](https://huggingface.co/bigscience/bloomz)**| |
| ||| |
| ||| |
| |[mt0-small](https://huggingface.co/bigscience/mt0-small)|300M parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[mt0-base](https://huggingface.co/bigscience/mt0-base)|580M parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[mt0-large](https://huggingface.co/bigscience/mt0-large)|1.2B parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[mt0-xl](https://huggingface.co/bigscience/mt0-xl)|3.7B parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/bigscience/xP3)| |
| |[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3)| |
| ||| |
| |[mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/xP3mt). **Better than [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) when prompting in non-English**| |
| ||| |
| |[mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/bigscience/P3). **Released for research purposes, performance is inferior to [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)**| |
|
|
| # Intended uses |
|
|
| You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, given the prompt *"Translate this to Chinese: Je t'aime."*, the model should generate *"我爱你"*. |
|
|
| # How to use |
|
|
| Here is how to use the model in PyTorch: |
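| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
|
| # Download (or load from the local cache) the tokenizer and model weights. |
| tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m") |
| model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m") |
|
| # Phrase the task as a natural-language query and convert it to token ids. |
| prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy" |
| inputs = tokenizer.encode(prompt, return_tensors="pt") |
| # Cap the continuation explicitly; the default `max_length` counts prompt tokens as well. |
| outputs = model.generate(inputs, max_new_tokens=20) |
| # Decoding the full sequence prints the prompt followed by the model's answer. |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |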
|
|
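| To use another BLOOMZ checkpoint, replace the path passed to `AutoTokenizer` and `AutoModelForCausalLM`. The mT0 checkpoints are encoder-decoder models based on mT5, so load them with `AutoModelForSeq2SeqLM` instead of `AutoModelForCausalLM`. |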
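
Which auto class to use depends on the model family: BLOOMZ checkpoints are finetuned from BLOOM, a decoder-only model, while mT0 checkpoints are finetuned from mT5, an encoder-decoder model. The following is a minimal sketch of switching checkpoints under that assumption. The helpers `model_class_name` and `load` are hypothetical names introduced for illustration and are not part of `transformers`.

```python
def model_class_name(checkpoint: str) -> str:
    """Pick the transformers auto class name for a BLOOMZ or mT0 checkpoint."""
    # Assumption: the family can be read off the repository name,
    # e.g. "bigscience/mt0-xxl" vs. "bigscience/bloomz-7b1".
    name = checkpoint.split("/")[-1]
    return "AutoModelForSeq2SeqLM" if name.startswith("mt0") else "AutoModelForCausalLM"


def load(checkpoint: str):
    """Load the tokenizer and model for one checkpoint from the model table."""
    # Imported lazily so the name helper works without transformers installed.
    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
    model_cls = getattr(transformers, model_class_name(checkpoint))
    return tokenizer, model_cls.from_pretrained(checkpoint)


print(model_class_name("bigscience/mt0-xxl"))     # AutoModelForSeq2SeqLM
print(model_class_name("bigscience/bloomz-7b1"))  # AutoModelForCausalLM
```

Calling `load("bigscience/mt0-small")` would then return a tokenizer and a sequence-to-sequence model that can be used with `generate` in the same way as the causal model.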
|
|
| **Note: The 176B models were trained in bfloat16, while the smaller models were trained in fp16. We recommend running inference in the same precision the checkpoint was trained in, or in fp32.** |
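
One way to follow this precision recommendation is to pass an explicit dtype when loading the weights. The sketch below covers only the BLOOMZ checkpoints and assumes the 176B ones are exactly those listed as 176B in the model table. `training_dtype_name` and `load_in_training_precision` are hypothetical helpers, and `torch_dtype` is the `from_pretrained` argument used here to select the precision.

```python
# Assumption: these are the 176B-parameter BLOOMZ checkpoints from the model table.
BF16_CHECKPOINTS = {
    "bigscience/bloomz",
    "bigscience/bloomz-mt",
    "bigscience/bloomz-p3",
}


def training_dtype_name(checkpoint: str) -> str:
    """Return the precision the note associates with a BLOOMZ checkpoint."""
    return "bfloat16" if checkpoint in BF16_CHECKPOINTS else "float16"


def load_in_training_precision(checkpoint: str):
    """Load a BLOOMZ checkpoint in the precision it was trained in."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    import torch
    from transformers import AutoModelForCausalLM

    dtype = getattr(torch, training_dtype_name(checkpoint))
    return AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype)


print(training_dtype_name("bigscience/bloomz-560m"))  # float16
print(training_dtype_name("bigscience/bloomz"))       # bfloat16
```

If the target hardware does not handle the half-precision type well, loading without `torch_dtype` keeps the fp32 default, which the note also lists as an option.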
|
|
| # Limitations |
|
|
| - The larger checkpoints (up to 176B parameters) may require substantial compute and memory for inference. |
| - Performance can vary considerably depending on how the prompt is phrased. |
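
Because results depend on the prompt, it can help to try several phrasings of the same task and compare the outputs. The sketch below shows one way to organize such a comparison. `compare_prompts` is a hypothetical helper, and `placeholder_generate` is only a stand-in so the example runs without downloading weights. In practice it would be replaced by a function that tokenizes the prompt, calls `model.generate`, and decodes the result.

```python
from typing import Callable, Dict, List


def compare_prompts(generate: Callable[[str], str], prompts: List[str]) -> Dict[str, str]:
    """Run every phrasing of the same task and collect the outputs side by side."""
    return {prompt: generate(prompt) for prompt in prompts}


def placeholder_generate(prompt: str) -> str:
    # Stand-in for a real model call; it does not reflect actual model behavior.
    return f"<model output for a {len(prompt)}-character prompt>"


review = "this is the best cast iron skillet you will ever buy"
prompts = [
    f"Is this review positive or negative? Review: {review}",
    f"Review: {review}\nWould you rate the previous review as positive or negative?",
    f"{review}\nSentiment (positive or negative):",
]

results = compare_prompts(placeholder_generate, prompts)
for prompt, output in results.items():
    print(repr(prompt), "->", output)
```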
|
|
| # BibTeX entry and citation info |
|
|
| ```bibtex |
| TODO |
| ``` |