---
datasets:
- Local
license: bigscience-bloom-rail-1.0
language:
- id
pipeline_tag: text-generation
duplicated_from: yodi/karina
---

# Table of Contents


1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)


# Model Summary


> We present KARINA, a model finetuned from [bigscience/bloomz-3b](https://huggingface.co/bigscience/bloomz-3b). BLOOMZ is a family of models capable of following human instructions in dozens of languages zero-shot, obtained by finetuning pretrained BLOOM multilingual language models on the crosslingual task mixture xP3; the resulting models are capable of crosslingual generalization to unseen tasks and languages.


# Use


## Intended use


We recommend using the model on tasks expressed in natural language. For example, given the prompt `Given the question:\n{ siapa kamu? }\n---\nAnswer:\n` ("siapa kamu?" is Indonesian for "who are you?"), the model will most likely answer "*Saya Karina. Ada yang bisa saya bantu?*" ("I am Karina. Is there anything I can help you with?"). A minimal helper for building prompts in this format is sketched below.


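This is a minimal sketch of such a helper; the function name `build_prompt` is ours and not part of the repository:

```python
def build_prompt(question: str) -> str:
    # Wrap a raw question in the template the model was finetuned on.
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"

print(build_prompt("siapa kamu?"))
```

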
## How to use


### CPU


<details>
<summary> Click to expand </summary>


```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "yodi/karina"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Plain (non-f) string: single braces appear literally in the prompt.
inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
# generate() defaults to very short outputs; allow room for a full answer.
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```


</details>


### GPU in 4-bit


<details>
<summary> Click to expand </summary>


```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_NAME = "yodi/karina"

# device_map="auto" lets accelerate pick the device; pin e.g. "cuda:0" if preferred.
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# In an f-string, doubled braces render as literal braces in the prompt.
prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

result = generator(prompt, max_length=256)
print(result)
```


</details>
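
On recent versions of transformers, the same 4-bit loading can be expressed through `BitsAndBytesConfig`, which also lets you mirror the NF4 settings listed under [Training procedure](#training-procedure) below. A sketch under that assumption:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirror the NF4 settings from the training-time quantization config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "yodi/karina",
    device_map="auto",
    quantization_config=bnb_config,
)
```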


### GPU in 8-bit


<details>
<summary> Click to expand </summary>


```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_NAME = "yodi/karina"

model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"

generator = pipeline('text-generation',
                     model=model_8bit,
                     tokenizer=tokenizer,
                     do_sample=False)

result = generator(prompt, max_length=256)
print(result)
```


</details>


Example output:

```
[{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]
```
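
The completion is Indonesian for "I am Karina, a virtual assistant ready to help with price estimates or other questions." Note that the pipeline echoes the prompt inside `generated_text`; the Gradio example below shows one way to split out only the answer.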


### Local inference with Gradio


```python
# pip install -q transformers accelerate bitsandbytes gradio
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import re

import gradio as gr

MODEL_NAME = "yodi/karina"

model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

def preprocess(text):
    # Wrap the raw question in the prompt template the model was finetuned on.
    return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"

def generate(text):
    preprocess_result = preprocess(text)
    result = generator(preprocess_result, max_length=256)
    # Split on the template markers and keep the answer segment.
    output = re.split(r'Given the question:|Answer:|Answer #|Title:', result[0]['generated_text'])[2]
    return output

with gr.Blocks() as demo:
    input_text = gr.Textbox(label="Input", lines=1)
    button = gr.Button("Submit")
    output_text = gr.Textbox(lines=6, label="Output")
    button.click(generate, inputs=[input_text], outputs=output_text)

demo.launch(enable_queue=True, debug=True)
```
Then open the Gradio URL in your browser.
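By default the demo is served only on localhost; if you need to reach it from another machine, passing `share=True` to `demo.launch` creates a temporary public Gradio link.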


## Training procedure


The following `bitsandbytes` quantization config was used during training:

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
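
These are the same NF4 settings reproduced in the `BitsAndBytesConfig` sketch in the 4-bit section above.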


### Framework versions


- PEFT 0.5.0.dev0


# Limitations


**Prompt engineering:** Performance may vary depending on the prompt, a behavior inherited from the underlying BLOOMZ models; it helps to make clear where the question ends and the answer should begin, as the template above does.


# Training


## Model


- **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom); also refer to the `config.json` file
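
To inspect those architecture details programmatically, the standard transformers config API can be used; a minimal sketch:

```python
from transformers import AutoConfig

# Download and parse the model's config.json from the Hub.
config = AutoConfig.from_pretrained("yodi/karina")
print(config.model_type)  # expected: "bloom"
print(config)             # full architecture hyperparameters
```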