---
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- medical
license: cc-by-nc-3.0
---

# MedFalcon 40b LoRA

## Model Description

### Architecture

`nmitchko/medfalcon-40b-lora` is a LoRA (low-rank adaptation) for a large language model, fine-tuned specifically for medical-domain tasks.
It is based on [`Falcon-40b-instruct`](https://huggingface.co/tiiuae/falcon-40b-instruct/), a 40-billion-parameter instruction-tuned model.

The primary goal of this model is to improve question-answering and medical dialogue tasks.
It was trained using [LoRA](https://arxiv.org/abs/2106.09685), specifically [QLoRA](https://github.com/artidoro/qlora), to reduce the memory footprint of fine-tuning.
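
The QLoRA recipe pairs a 4-bit quantized, frozen base model with a small set of trainable low-rank adapter weights. The sketch below illustrates what such a setup looks like with `peft` and `bitsandbytes`; the adapter hyperparameters (`r`, `lora_alpha`, `lora_dropout`) are illustrative assumptions, not the published training configuration of this model.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA-style 4-bit quantization of the frozen base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as proposed in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# Low-rank adapter config; the hyperparameters here are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],     # Falcon's fused attention projection
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()          # only the adapter weights are trainable
```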

> This LoRA supports 4-bit and 8-bit modes.

### Requirements

```
bitsandbytes>=0.39.0
peft
transformers
```
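
These can be installed with pip, for example:

```
pip install "bitsandbytes>=0.39.0" peft transformers
```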

Steps to load this model:
1. Load the base model with 8-bit (or 4-bit) quantization via bitsandbytes
2. Apply the LoRA adapter using peft

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import transformers
import torch

model_name = "tiiuae/falcon-40b-instruct"
lora_name = "nmitchko/medfalcon-40b-lora"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model in 8-bit to reduce memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# Apply the medical LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, lora_name)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
    "What does the drug ceftriaxone do?\nDoctor:",
    max_length=200,
    do_sample=True,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
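
The example above loads the base model in 8-bit. Since this LoRA also supports 4-bit mode, the base model can instead be quantized to 4-bit before the adapter is applied. This is a minimal sketch assuming transformers >= 4.30 and bitsandbytes >= 0.39:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization config for the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b-instruct",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
# Apply the same LoRA adapter on the 4-bit base
model = PeftModel.from_pretrained(model, "nmitchko/medfalcon-40b-lora")
```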