| | ---
|
| | license: mit
|
| | language:
|
| | - zho
|
| | - eng
|
| | - fra
|
| | - spa
|
| | - por
|
| | - deu
|
| | - ita
|
| | - rus
|
| | - jpn
|
| | - kor
|
| | - vie
|
| | - tha
|
| | - ara
|
| | base_model:
|
| | - Qwen/Qwen2.5-32B-Instruct
|
| | pipeline_tag: text-generation
|
| | ---
|
# Apollo Model

Apollo is an experimental hybrid reasoning model built on Qwen2.5-32B-Instruct.
|
# GGUF

GGUF quantizations are available at [mradermacher/Apollo-v3-32B-GGUF](https://huggingface.co/mradermacher/Apollo-v3-32B-GGUF).

Thanks to mradermacher for providing these quants.
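
If you want to run a quant locally, a minimal sketch using llama-cpp-python might look like this (the quant filename pattern, context length, and prompt are assumptions; pick whichever quant file the repo actually provides):

```python
# Minimal sketch: loading a quant from the GGUF repo with llama-cpp-python.
# The filename pattern and n_ctx below are assumptions, not part of this card.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/Apollo-v3-32B-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant; use any file present in the repo
    n_ctx=8192,               # assumed context window
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Think deeper and step by step: How many r's are in the word strawberry?"}
    ],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```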
|
### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the base.
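
For intuition only, the sketch below shows a simplified reading of the Model Stock interpolation rule from the paper: the fine-tuned weights are averaged, then pulled toward the base model by a ratio derived from how well the fine-tuned deltas agree. This is not the exact recipe or tooling used to produce this model, and the checkpoints in the usage comment are placeholders.

```python
import torch
import torch.nn.functional as F

def model_stock_merge(w_base: torch.Tensor, w_finetuned: list[torch.Tensor]) -> torch.Tensor:
    """Merge one weight tensor as t * mean(fine-tuned) + (1 - t) * base,
    where t grows with the agreement (cosine) between the fine-tuned deltas."""
    k = len(w_finetuned)
    deltas = [(w - w_base).flatten() for w in w_finetuned]
    # Average pairwise cosine similarity between the fine-tuned deltas
    cos_vals = [
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(k) for j in range(i + 1, k)
    ]
    cos_theta = torch.stack(cos_vals).mean() if cos_vals else torch.tensor(1.0)
    # Interpolation ratio from the paper: t = k*cos(theta) / (1 + (k-1)*cos(theta))
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    w_avg = torch.stack(w_finetuned).mean(dim=0)
    return t * w_avg + (1 - t) * w_base

# Hypothetical usage on a single tensor taken from each checkpoint:
# merged = model_stock_merge(base_tensor, [finetune_a_tensor, finetune_b_tensor])
```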
|
### Enable reasoning

To enable reasoning, prompt the model to "think deeper and step by step", as in the snippet below.
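
A minimal sketch of what that looks like in a chat message (the exact wording around the trigger is just an illustration):

```python
# Prepend the reasoning trigger to the user message; everything beyond
# "think deeper and step by step" is illustrative phrasing.
messages = [
    {"role": "user", "content": "Think deeper and step by step: How many r's are in the word strawberry?"}
]
```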
|
### Example code
|
| | ```
|
| |
|
| | from transformers import AutoModelForCausalLM, AutoTokenizer
|
| |
|
| | model_name = "rootxhacker/Apollo-v3-32B"
|
| |
|
| | model = AutoModelForCausalLM.from_pretrained(
|
| | model_name,
|
| | torch_dtype="auto",
|
| | device_map="auto"
|
| | )
|
| | tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| |
|
| | prompt = "How many r's are in the word strawberry"
|
| | messages = [
|
| | {"role": "user", "content": prompt}
|
| | ]
|
| | text = tokenizer.apply_chat_template(
|
| | messages,
|
| | tokenize=False,
|
| | add_generation_prompt=True
|
| | )
|
| |
|
| | model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
| |
|
| | generated_ids = model.generate(
|
| | **model_inputs,
|
| | max_new_tokens=32768
|
| | )
|
| | generated_ids = [
|
| | output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
|
| | ]
|
| |
|
| | response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
| | print(response)
|
| |
|
| | ``` |