---
license: apache-2.0
---
# GenZ

The most capable commercially usable instruction-finetuned LLM yet, with an 8K input token length, up-to-date knowledge, and improved coding ability.
## Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the GenZ tokenizer and model (bfloat16 halves memory use vs. fp32)
tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("budecosystem/genz-7b", torch_dtype=torch.bfloat16)

# Tokenize a prompt, generate up to 128 tokens, and decode the result
inputs = tokenizer("The world is", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
```
Use the following prompt template:
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi, how are you? ASSISTANT:
```
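The template above can be built programmatically before tokenizing. A minimal sketch; the helper name `build_prompt` is illustrative and not part of the model's API:

```python
# System preamble taken verbatim from the prompt template above
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the GenZ chat template."""
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

prompt = build_prompt("Hi, how are you?")
```

The resulting string can then be passed to `tokenizer(...)` in place of the raw prompt in the inference snippet.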
## Finetuning

```bash
python finetune.py \
    --model_name Salesforce/xgen-7b-8k-base \
    --data_path dataset.json \
    --output_dir output \
    --trust_remote_code \
    --prompt_column instruction \
    --response_column output \
    --pad_token_id 50256
```
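Given the `--prompt_column instruction` and `--response_column output` flags above, `dataset.json` is presumably a list of records keyed by those column names. A minimal sketch of such a file; the example record is illustrative, not from the GenZ training data:

```python
import json

# Each record holds one instruction/response pair, keyed to match
# the --prompt_column and --response_column flags
records = [
    {
        "instruction": "Explain what an LLM is in one sentence.",
        "output": (
            "An LLM is a neural network trained on large text corpora "
            "to predict and generate language."
        ),
    },
]

# Write the dataset in the shape finetune.py is assumed to expect
with open("dataset.json", "w") as f:
    json.dump(records, f, indent=2)
```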
Check the GitHub repository for the finetuning code -> [GenZ](https://github.com/BudEcosystem/GenZ)