RichardErkhov committed on
Commit
805c41b
·
verified ·
1 Parent(s): 9543e17

uploaded readme

  1. README.md +197 -0
README.md ADDED
@@ -0,0 +1,197 @@
1
+ Quantization made by Richard Erkhov.
2
+
3
+ [GitHub](https://github.com/RichardErkhov)
4
+
5
+ [Discord](https://discord.gg/pvy7H8DZMG)
6
+
7
+ [Request more models](https://github.com/RichardErkhov/quant_request)
8
+
9
+
10
+ karina - bnb 8bits
11
+ - Model creator: https://huggingface.co/yodi/
12
+ - Original model: https://huggingface.co/yodi/karina/
13
+
14
+
15
+
16
+
17
+ Original model description:
18
+ ---
19
+ datasets:
20
+ - Local
21
+ license: bigscience-bloom-rail-1.0
22
+ language:
23
+ - id
24
+ pipeline_tag: text-generation
25
+ ---
26
+
27
+ # Table of Contents
28
+
29
+ 1. [Model Summary](#model-summary)
30
+ 2. [Use](#use)
31
+ 3. [Training](#training)
32
+
33
+ # Model Summary
34
+
35
+ > We present KARINA, a model finetuned from BLOOMZ (bigscience/bloomz-3b). BLOOMZ is a family of models capable of following human instructions in dozens of languages zero-shot. It was obtained by finetuning BLOOM pretrained multilingual language models on a crosslingual task mixture (xP3), and the resulting models are capable of crosslingual generalization to unseen tasks & languages.
36
+
37
+ # Use
38
+
39
+ ## Intended use
40
+
41
+ We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "*Given the question:\n{ siapa kamu? }\n---\nAnswer:\n*" (Indonesian for "who are you?"), the model will most likely answer "*Saya Karina. Ada yang bisa saya bantu?*" ("I am Karina. Is there anything I can help you with?").
42
+
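The prompt template above can be built with a small helper. This is a minimal sketch, not part of the original model card: the function name `build_prompt` is an assumption chosen for illustration. It shows why the code samples in this README write doubled braces inside an f-string: they render as the single literal braces that the model sees.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the prompt template described above.

    `build_prompt` is a hypothetical helper name, not part of the model's API.
    """
    # Inside an f-string, "{{" and "}}" produce literal "{" and "}".
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"


prompt = build_prompt("siapa kamu?")
print(repr(prompt))
```

In a plain (non-f) string the braces should be written once, otherwise the model receives `{{ ... }}` instead of `{ ... }`.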
43
+ ## How to use
44
+
45
+ ### CPU
46
+
47
+ <details>
48
+ <summary> Click to expand </summary>
49
+
50
+ ```python
51
+ # pip install -q transformers
52
+ from transformers import AutoModelForCausalLM, AutoTokenizer
53
+
54
+ MODEL_NAME = "yodi/karina"
55
+
56
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
57
+ model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
58
+
59
+ inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
60
+ outputs = model.generate(inputs, max_length=256)
61
+ print(tokenizer.decode(outputs[0]))
62
+ ```
63
+
64
+ </details>
65
+
66
+ ### GPU in 4-bit
67
+
68
+ <details>
69
+ <summary> Click to expand </summary>
70
+
71
+ ```python
72
+ # pip install -q transformers accelerate bitsandbytes
73
+ from transformers import AutoModelForCausalLM, AutoTokenizer
74
+ from transformers import pipeline
75
+
76
+ MODEL_NAME = "yodi/karina"
77
+
78
+ model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
79
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
80
+
81
+ prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"
82
+
83
+ generator = pipeline('text-generation',
84
+                      model=model_4bit,
85
+                      tokenizer=tokenizer,
86
+                      do_sample=False)
87
+
88
+ result = generator(prompt, max_length=256)
89
+ print(result)
90
+
91
+ ```
92
+
93
+ </details>
94
+
95
+ ### GPU in 8-bit
96
+
97
+ <details>
98
+ <summary> Click to expand </summary>
99
+
100
+ ```python
101
+ # pip install -q transformers accelerate bitsandbytes
102
+ from transformers import AutoModelForCausalLM, AutoTokenizer
103
+ from transformers import pipeline
104
+
105
+ MODEL_NAME = "yodi/karina"
106
+
107
+ model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_8bit=True)
108
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
109
+
110
+ prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"
111
+
112
+ generator = pipeline('text-generation',
113
+                      model=model_8bit,
114
+                      tokenizer=tokenizer,
115
+                      do_sample=False)
116
+
117
+ result = generator(prompt, max_length=256)
118
+ print(result)
119
+ ```
120
+
121
+ </details>
122
+
123
+ ```
124
+ [{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]
125
+ ```
126
+
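The pipeline returns the prompt and the answer together in `generated_text`, as the output above shows. The following is a minimal sketch of how the answer alone could be pulled out of that structure. The names `ANSWER_MARKER` and `extract_answer` are assumptions made for illustration, and the sample input is the output printed above.

```python
# Separator that ends the prompt template; everything after it is the answer.
ANSWER_MARKER = "\n---\nAnswer:\n"


def extract_answer(pipeline_result):
    """Return only the text that follows the answer marker.

    `extract_answer` is a hypothetical helper, not part of transformers.
    If the marker is missing, the full generated text is returned.
    """
    generated = pipeline_result[0]["generated_text"]
    _, found, answer = generated.partition(ANSWER_MARKER)
    return answer.strip() if found else generated.strip()


sample = [{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]
print(extract_answer(sample))
```

Using `str.partition` rather than indexing the result of a split avoids an `IndexError` when the marker does not appear in the generated text.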
127
+ ### Local inference with Gradio
128
+
129
+ ```python
130
+ from transformers import AutoModelForCausalLM, AutoTokenizer
131
+ from transformers import pipeline
132
+ import re
133
+
134
+ import gradio as gr
135
+
136
+ MODEL_NAME = "yodi/karina"
137
+
138
+ model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
139
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
140
+
141
+ generator = pipeline('text-generation',
142
+                      model=model_4bit,
143
+                      tokenizer=tokenizer,
144
+                      do_sample=False)
145
+
146
+ def preprocess(text):
147
+     return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"
148
+
149
+ def generate(text):
150
+     preprocess_result = preprocess(text)
151
+     result = generator(preprocess_result, max_length=256)
152
+     output = re.split(r'\n---\nAnswer:\n',result[0]['generated_text'])[1]
153
+
154
+     return output
155
+
156
+ with gr.Blocks() as demo:
157
+     input_text = gr.Textbox(label="Input", lines=1)
158
+     button = gr.Button("Submit")
159
+     output_text = gr.Textbox(lines=6, label="Output")
160
+     button.click(generate, inputs=[input_text], outputs=output_text)
161
+
162
+ demo.queue().launch(debug=True)
163
+ ```
164
+ Then open the Gradio URL in your browser.
165
+
166
+ ## Training procedure
167
+
168
+
169
+ The following `bitsandbytes` quantization config was used during training:
170
+ - load_in_8bit: False
171
+ - load_in_4bit: True
172
+ - llm_int8_threshold: 6.0
173
+ - llm_int8_skip_modules: None
174
+ - llm_int8_enable_fp32_cpu_offload: False
175
+ - llm_int8_has_fp16_weight: False
176
+ - bnb_4bit_quant_type: nf4
177
+ - bnb_4bit_use_double_quant: True
178
+ - bnb_4bit_compute_dtype: float16
179
+
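The settings listed above map onto the keyword arguments of `transformers.BitsAndBytesConfig`. The sketch below is an illustration under that assumption, not the author's training script: the names `QUANT_KWARGS` and `make_quant_config` are hypothetical, and the import is kept inside the function so the values can be inspected without `transformers`, `torch`, or `bitsandbytes` installed.

```python
# Values copied from the quantization config listed above.
QUANT_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    # A dtype name as a string; the config class resolves it to a torch dtype.
    "bnb_4bit_compute_dtype": "float16",
}


def make_quant_config():
    """Build a BitsAndBytesConfig from the values above (hypothetical helper)."""
    # Imported lazily: needs transformers, torch, and bitsandbytes at call time.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**QUANT_KWARGS)
```

The resulting object could then be passed as `quantization_config=make_quant_config()` to `AutoModelForCausalLM.from_pretrained`, instead of the bare `load_in_4bit=True` flag.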
180
+ ### Framework versions
181
+
182
+ - PEFT 0.5.0.dev0
183
+
184
+ <!-- Necessary for whitespace -->
185
+ ###
186
+
187
+ # Limitations
188
+
189
+ **Prompt Engineering:** As with the BLOOMZ models it is based on, performance may vary depending on the prompt.
190
+
191
+ # Training
192
+
193
+ ## Model
194
+
195
+ - **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom); see also the `config.json` file
196
+
197
+