|
|
--- |
|
|
datasets: |
|
|
- Local |
|
|
license: bigscience-bloom-rail-1.0 |
|
|
language: |
|
|
- id |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# Table of Contents |
|
|
|
|
|
1. [Model Summary](#model-summary) |
|
|
2. [Use](#use) |
|
|
3. [Limitations](#limitations)

4. [Training](#training)
|
|
|
|
|
# Model Summary |
|
|
|
|
|
> KARINA is finetuned from [bigscience/bloomz-3b](https://huggingface.co/bigscience/bloomz-3b), part of the BLOOMZ family of models capable of following human instructions in dozens of languages zero-shot. BLOOMZ models are BLOOM pretrained multilingual language models finetuned on the crosslingual task mixture (xP3), and are capable of crosslingual generalization to unseen tasks & languages.
|
|
|
|
|
# Use |
|
|
|
|
|
## Intended use |
|
|
|
|
|
We recommend using the model to perform tasks expressed in natural language. For example, given the prompt `Given the question:\n{ siapa kamu? }\n---\nAnswer:\n` ("siapa kamu?" is Indonesian for "who are you?"), the model will most likely answer "*Saya Karina. Ada yang bisa saya bantu?*" ("I am Karina. Is there anything I can help you with?").
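
The prompt template can be reproduced with a small helper, as in this sketch (`build_prompt` is a name introduced here for illustration; it mirrors the `preprocess` function used in the Gradio example later in this card):

```python
def build_prompt(question: str) -> str:
    # Wrap the user question in the template used throughout this card.
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"

prompt = build_prompt("siapa kamu?")
print(prompt)
```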
|
|
|
|
|
## How to use |
|
|
|
|
|
### CPU |
|
|
|
|
|
<details> |
|
|
<summary> Click to expand </summary> |
|
|
|
|
|
```python |
|
|
# pip install -q transformers |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
MODEL_NAME = "yodi/karina" |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) |
|
|
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME) |
|
|
|
|
|
inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
|
|
outputs = model.generate(inputs, max_new_tokens=64)  # give the model room for a full answer
|
|
print(tokenizer.decode(outputs[0])) |
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
### GPU in 4-bit
|
|
|
|
|
<details> |
|
|
<summary> Click to expand </summary> |
|
|
|
|
|
```python |
|
|
# pip install -q transformers accelerate bitsandbytes
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from transformers import pipeline |
|
|
|
|
|
MODEL_NAME = "yodi/karina" |
|
|
|
|
|
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)  # "cuda:1" targets a second GPU; use "auto" to let accelerate pick
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) |
|
|
|
|
|
prompt = "Given the question:\n{ siapa kamu? }\n---\nAnswer:\n"
|
|
|
|
|
generator = pipeline('text-generation', |
|
|
model=model_4bit, |
|
|
tokenizer=tokenizer, |
|
|
do_sample=False) |
|
|
|
|
|
result = generator(prompt, max_length=256) |
|
|
print(result) |
|
|
|
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
### GPU in 8-bit
|
|
|
|
|
<details> |
|
|
<summary> Click to expand </summary> |
|
|
|
|
|
```python |
|
|
# pip install -q transformers accelerate bitsandbytes
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from transformers import pipeline |
|
|
|
|
|
MODEL_NAME = "yodi/karina" |
|
|
|
|
|
model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_8bit=True)  # "cuda:1" targets a second GPU; use "auto" to let accelerate pick
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) |
|
|
|
|
|
prompt = "Given the question:\n{ siapa kamu? }\n---\nAnswer:\n"
|
|
|
|
|
generator = pipeline('text-generation',
                     model=model_8bit,
                     tokenizer=tokenizer,
                     do_sample=False)
|
|
|
|
|
result = generator(prompt, max_length=256) |
|
|
print(result) |
|
|
``` |
|
|
|
|
|
</details> |
|
|
|
|
|
Example output:

```
|
|
[{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}] |
|
|
``` |
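
Because the pipeline echoes the prompt in `generated_text`, the answer can be isolated by splitting on the delimiter, the same approach the Gradio example below uses:

```python
import re

# Keep only the text after the answer delimiter.
answer = re.split(r"\n---\nAnswer:\n", result[0]["generated_text"])[1]
print(answer)
```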
|
|
|
|
|
### Local inference with Gradio
|
|
|
|
|
```python |
|
|
# pip install -q transformers accelerate bitsandbytes gradio
from transformers import AutoModelForCausalLM, AutoTokenizer
|
|
from transformers import pipeline |
|
|
import re |
|
|
|
|
|
import gradio as gr |
|
|
|
|
|
MODEL_NAME = "yodi/karina" |
|
|
|
|
|
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True) |
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) |
|
|
|
|
|
generator = pipeline('text-generation', |
|
|
model=model_4bit, |
|
|
tokenizer=tokenizer, |
|
|
do_sample=False) |
|
|
|
|
|
def preprocess(text):
    # Wrap the user question in the prompt template the model expects.
    return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"


def generate(text):
    preprocess_result = preprocess(text)
    result = generator(preprocess_result, max_length=256)
    # The output echoes the prompt; keep only the text after the delimiter.
    output = re.split(r'\n---\nAnswer:\n', result[0]['generated_text'])[1]
    return output
|
|
|
|
|
with gr.Blocks() as demo:
    input_text = gr.Textbox(label="Input", lines=1)
    button = gr.Button("Submit")
    output_text = gr.Textbox(lines=6, label="Output")
    button.click(generate, inputs=[input_text], outputs=output_text)

demo.queue()  # queue requests so long generations don't time out
demo.launch(debug=True)
|
|
``` |
|
|
Then open the Gradio URL printed in the console in your browser.
|
|
|
|
|
## Training procedure |
|
|
|
|
|
|
|
|
The following `bitsandbytes` quantization config was used during training (an equivalent `BitsAndBytesConfig` sketch follows the list):
|
|
- load_in_8bit: False |
|
|
- load_in_4bit: True |
|
|
- llm_int8_threshold: 6.0 |
|
|
- llm_int8_skip_modules: None |
|
|
- llm_int8_enable_fp32_cpu_offload: False |
|
|
- llm_int8_has_fp16_weight: False |
|
|
- bnb_4bit_quant_type: nf4 |
|
|
- bnb_4bit_use_double_quant: True |
|
|
- bnb_4bit_compute_dtype: float16 |
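
A minimal sketch of the equivalent `BitsAndBytesConfig`, using the standard `transformers` API (the values mirror the list above):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and fp16 compute,
# matching the training-time settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```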
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- PEFT 0.5.0.dev0 |
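
If the repository ships PEFT adapter weights rather than fully merged weights, they could in principle be loaded on top of the base model, as in this hypothetical sketch (it assumes `bigscience/bloomz-3b` as the base and adapter files in `yodi/karina`; the examples above load the model directly, so treat this only as an illustration of the PEFT API):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Hypothetical: attach adapter weights from this repo onto the BLOOMZ base.
base = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-3b")
model = PeftModel.from_pretrained(base, "yodi/karina")
```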
|
|
|
|
|
|
|
|
|
|
# Limitations |
|
|
|
|
|
**Prompt Engineering:** Performance may vary depending on the prompt; this behavior is inherited from the underlying BLOOMZ models.
|
|
|
|
|
# Training |
|
|
|
|
|
## Model |
|
|
|
|
|
- **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom); also refer to the `config.json` file (see the inspection snippet below)
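
To inspect these settings programmatically, the configuration can be pulled with the standard `transformers` API (the printed attribute names follow the BLOOM config):

```python
from transformers import AutoConfig

# Downloads and parses config.json from the model repository.
config = AutoConfig.from_pretrained("yodi/karina")
print(config.model_type, config.hidden_size, config.n_layer)
```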
|
|
|