---
library_name: transformers
license: llama3
language:
- en
- fa
tags:
- LLM
- llama-3
- PishroBPMS
- conversational
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
pipeline_tag: text-generation
---
# Model Details

The Pishro models are a family of decoder-only language models fine-tuned on ProcessMaker data and developed by [PishroBPMS](https://pishrobpms.com/). The initial release is an 8B instruct model from this family.
Pishro-Llama3-8B-Instruct is built on the [Meta Llama 3 Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model.


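## How to use

You can run conversational inference with the Transformers Auto classes and the `generate()` method. The example below loads the model in bfloat16, formats a Persian prompt with the chat template, and samples a reply of up to 256 new tokens.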

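```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Repository id as shown on the model page; use a local path instead if the weights are on disk.
model_path = "pishrobpmsAI/Pishro-Llama3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # System prompt (Persian): "You are a ProcessMaker 4 and PHP expert and
    # must produce only a standard PHP script."
    {"role": "system",
     "content": "تو یک کارشناس ProcessMaker 4 و PHP هستی و باید فقط یک اسکریپت PHP استاندارد تولید کنی."},
    # User prompt (Persian): "Write a simple PHP script for adding two numbers
    # in ProcessMaker 4."
    {"role": "user", "content": "یک اسکریپت PHP ساده برای جمع دو عدد در ProcessMaker 4 بنویس."},
]

# Format the conversation with the chat template and open the assistant turn.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the regular EOS token or the Llama 3 end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Drop the prompt tokens and decode only the newly generated reply.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```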
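
For a multi-turn conversation, one common approach is to append the decoded reply to `messages` as an `assistant` turn, add the next `user` turn, and apply the chat template again. The sketch below shows only that bookkeeping. The `add_turn` helper and the placeholder reply are hypothetical and not part of this model's API.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more chat turn appended."""
    # Roles used by the chat format in the example above, plus the reply role.
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


# Hypothetical conversation state (English prompts for brevity).
messages = [
    {"role": "system", "content": "You are a ProcessMaker 4 and PHP expert."},
    {"role": "user", "content": "Write a simple PHP script that adds two numbers."},
]

# Placeholder standing in for the text returned by tokenizer.decode(...).
reply = "<?php /* placeholder for the decoded model output */"

messages = add_turn(messages, "assistant", reply)
messages = add_turn(messages, "user", "Now make it subtract instead.")
```

The extended `messages` list can then be passed to `tokenizer.apply_chat_template(...)` with `add_generation_prompt=True`, exactly as in the example above, to generate the next reply.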