styal
/

Reflection-Gemma-2-2b

Text Generation

Trained with AutoTrain

text-generation-inference

4-bit precision

Model card Files Files and versions

Metrics Training metrics Community

Reflection-Gemma-2-2b / README.md

Arthur LAGACHERIE

Update README.md

f62fbc0 verified over 1 year ago

|

history blame contribute delete

2.04 kB

	---
	tags:
	- autotrain
	- text-generation-inference
	- text-generation
	- peft
	library_name: transformers
	base_model: Arthur-LAGACHERIE/Gemma-2-2b-4bit
	widget:
	- messages:
	- role: user
	content: What is your favorite condiment?
	license: other
	---


	# Usage

	This model uses the 4-bits quantization. So you need to install bitsandbytes to use it.
	```python
	pip install bitsandbytes
	```
	For inference (streaming):
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
	import torch
	from transformers import TextIteratorStreamer
	from threading import Thread
	device = 'cuda' if torch.cuda.is_available() else 'cpu'

	model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id)

	prompt = """
	### System
	You are a world-class AI system, capable of complex reasoning and reflection.
	Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags.
	If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
	Try an answer and see if it's correct before generate the ouput.
	But don't forget to think very carefully.

	### Question
	The question here.
	"""

	chat = [
	{ "role": "user", "content": prompt},
	]
	question = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
	question = tokenizer(question, return_tensors="pt").to(device)
	streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
	generation_kwargs = dict(question, streamer=streamer, max_new_tokens=4000)
	thread = Thread(target=model.generate, kwargs=generation_kwargs)

	# generate
	thread.start()
	for new_text in streamer:
	print(new_text, end="")
	```

	# Some info
	If you want to know how I fine tune it, what datasets I used and the training code. [See here]()



	# Model Trained Using AutoTrain

	This model was trained using AutoTrain. For more information, please visit [AutoTrain](https://hf.co/docs/autotrain).