	---
	license: cc-by-4.0
	---
	# Piccolo-2x7b


	In loving memory of my dog Klaus (Piccolo)

	_~ Piccolo (Italian): the little one ~_

	![piccolo.png](piccolo.png)


	## GGUF

	GGUF quants are available [here](https://huggingface.co/macadeliccc/piccolo-2x7b-GGUF).

	# Code Example

	An inference and evaluation Colab notebook is available [here](https://colab.research.google.com/drive/1ZqLNvVvtFHC_4v2CgcMVh7pP9Fvx0SbI?usp=sharing).

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	def generate_response(prompt):
	    """
	    Generate a response from the model based on the input prompt.

	    Args:
	        prompt (str): Prompt for the model.

	    Returns:
	        str: The generated response from the model.
	    """
	    # Tokenize the prompt and move the tensors to the device the model is on
	    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=256,
	        eos_token_id=tokenizer.eos_token_id,
	        pad_token_id=tokenizer.pad_token_id,
	    )

	    # Decode the full output sequence (prompt included) back into text
	    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

	    return response

	model_id = "macadeliccc/piccolo-2x7b"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	# 4-bit loading requires the bitsandbytes and accelerate packages and a CUDA GPU
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
	)

	prompt = "What is the best way to train Cane Corsos?"

	print("Response:")
	print(generate_response(prompt), "\n")
	```

	The model is capable of producing quality code and of math and logical reasoning. Try it with any questions you can think of.

	# Evaluations

	|  Tasks   |Version|Filter|n-shot| Metric |Value |   |Stderr|
	|----------|-------|------|-----:|--------|-----:|---|-----:|
	|arc_easy  |Yaml   |none  |     0|acc     |0.8552|±  |0.0072|
	|          |       |none  |     0|acc_norm|0.8237|±  |0.0078|
	|boolq     |Yaml   |none  |     0|acc     |0.8749|±  |0.0058|
	|hellaswag |Yaml   |none  |     0|acc     |0.6734|±  |0.0047|
	|          |       |none  |     0|acc_norm|0.8489|±  |0.0036|
	|openbookqa|Yaml   |none  |     0|acc     |0.3640|±  |0.0215|
	|          |       |none  |     0|acc_norm|0.4780|±  |0.0224|
	|piqa      |Yaml   |none  |     0|acc     |0.8330|±  |0.0087|
	|          |       |none  |     0|acc_norm|0.8368|±  |0.0086|
	|winogrande|Yaml   |none  |     0|acc     |0.7703|±  |0.0118|
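
Each row above reports a point estimate together with its standard error. As a rough reading aid, the sketch below turns a few of those rows into approximate 95% confidence intervals. It assumes a normal approximation (value ± 1.96 × stderr), and the helper name `confidence_interval` is purely illustrative; it is not part of any evaluation tool.

```python
# Approximate 95% confidence intervals from (value, stderr) pairs in the table above.
# Assumption: a normal approximation, i.e. value ± 1.96 * stderr.

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return the (low, high) bounds of value ± z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


# (task, metric) -> (value, stderr), copied from the table above
results = {
    ("arc_easy", "acc"): (0.8552, 0.0072),
    ("boolq", "acc"): (0.8749, 0.0058),
    ("hellaswag", "acc_norm"): (0.8489, 0.0036),
    ("winogrande", "acc"): (0.7703, 0.0118),
}

for (task, metric), (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task:<12}{metric:<10}{value:.4f}  [{low:.4f}, {high:.4f}]")
```

Rows whose intervals overlap heavily should not be read as meaningfully different from one another.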


	# Model Evaluation Summary

	| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
	|-------|---------|---------|------------|----------|---------|
	| piccolo-math-2x7b | 43.89% | 74.98% | 63.96% | 44.99% | 56.96% |

	## AGIEval

	### Tasks and Results

	| Task | Version | Metric | Value | Stderr |
	|------|---------|--------|-------|--------|
	| agieval_aqua_rat | 0 | acc | 24.41 | ± 2.70 |
	| | | acc_norm | 24.80 | ± 2.72 |
	| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |
	| | | acc_norm | 36.71 | ± 1.89 |
	| agieval_lsat_ar | 0 | acc | 23.48 | ± 2.80 |
	| | | acc_norm | 23.91 | ± 2.82 |
	| agieval_lsat_lr | 0 | acc | 49.22 | ± 2.22 |
	| | | acc_norm | 50.00 | ± 2.22 |
	| agieval_lsat_rc | 0 | acc | 63.94 | ± 2.93 |
	| | | acc_norm | 64.31 | ± 2.93 |
	| agieval_sat_en | 0 | acc | 77.18 | ± 2.93 |
	| | | acc_norm | 76.70 | ± 2.95 |
	| agieval_sat_en_without_passage | 0 | acc | 45.15 | ± 3.48 |
	| | | acc_norm | 44.66 | ± 3.47 |
	| agieval_sat_math | 0 | acc | 33.64 | ± 3.19 |
	| | | acc_norm | 30.00 | ± 3.10 |

	Average: 43.89%

	## GPT4All

	### Tasks and Results

	| Task | Version | Metric | Value | Stderr |
	|------|---------|--------|-------|--------|
	| arc_challenge | 0 | acc | 61.86 | ± 1.42 |
	| | | acc_norm | 62.88 | ± 1.41 |
	| arc_easy | 0 | acc | 84.34 | ± 0.75 |
	| | | acc_norm | 80.47 | ± 0.81 |
	| boolq | 1 | acc | 86.88 | ± 0.59 |
	| hellaswag | 0 | acc | 68.56 | ± 0.46 |
	| | | acc_norm | 85.16 | ± 0.35 |
	| openbookqa | 0 | acc | 37.00 | ± 2.16 |
	| | | acc_norm | 47.80 | ± 2.24 |
	| piqa | 0 | acc | 82.21 | ± 0.89 |
	| | | acc_norm | 83.68 | ± 0.86 |
	| winogrande | 0 | acc | 77.98 | ± 1.16 |

	Average: 74.98%

	## TruthfulQA

	### Tasks and Results

	| Task | Version | Metric | Value | Stderr |
	|------|---------|--------|-------|--------|
	| truthfulqa_mc | 1 | mc1 | 47.37 | ± 1.75 |
	| | | mc2 | 63.96 | ± 1.57 |

	Average: 63.96%

	## Bigbench

	### Tasks and Results

	| Task | Version | Metric | Value | Stderr |
	|------|---------|--------|-------|--------|
	| bigbench_causal_judgement | 0 | multiple_choice_grade | 55.26 | ± 3.62 |
	| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± 2.51 |
	| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 42.64 | ± 3.08 |
	| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.84 | ± 2.22 |
	| | | exact_str_match | 3.34 | ± 0.95 |
	| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 36.60 | ± 2.16 |
	| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 25.57 | ± 1.65 |
	| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |
	| bigbench_movie_recommendation | 0 | multiple_choice_grade | 42.40 | ± 2.21 |
	| bigbench_navigate | 0 | multiple_choice_grade | 54.70 | ± 1.57 |
	| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 62.90 | ± 1.08 |
	| bigbench_ruin_names | 0 | multiple_choice_grade | 53.35 | ± 2.36 |
	| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 24.35 | ± 1.36 |
	| bigbench_snarks | 0 | multiple_choice_grade | 62.43 | ± 3.61 |
	| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.28 | ± 1.46 |
	| bigbench_temporal_sequences | 0 | multiple_choice_grade | 41.30 | ± 1.56 |
	| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.32 | ± 1.18 |
	| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.77 | ± 0.91 |
	| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |

	Average: 44.99%

	### Overall Average Score

	Average score: 56.96%
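
The suite averages reported above can be reproduced from the per-task tables. The sketch below assumes a convention that this README does not state explicitly but that matches its figures: `acc_norm` where a task reports it and `acc` otherwise for AGIEval and GPT4All, `mc2` for TruthfulQA, and `multiple_choice_grade` for Bigbench. The variable and function names are illustrative.

```python
# Reproduce the reported averages from the per-task scores listed above.
# Assumed convention (not stated in the README, but consistent with its numbers):
#   AGIEval / GPT4All: acc_norm when reported, otherwise acc
#   TruthfulQA:        mc2
#   Bigbench:          multiple_choice_grade

agieval = [24.80, 36.71, 23.91, 50.00, 64.31, 76.70, 44.66, 30.00]
gpt4all = [62.88, 80.47, 86.88, 85.16, 47.80, 83.68, 77.98]
truthfulqa = [63.96]
bigbench = [
    55.26, 63.14, 42.64, 22.84, 36.60, 25.57, 56.00, 42.40, 54.70,
    62.90, 53.35, 24.35, 62.43, 70.28, 41.30, 22.32, 17.77, 56.00,
]


def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


suite_averages = {
    "AGIEval": mean(agieval),
    "GPT4All": mean(gpt4all),
    "TruthfulQA": mean(truthfulqa),
    "Bigbench": mean(bigbench),
}

# Unweighted mean of the four (unrounded) suite averages
overall = mean(list(suite_averages.values()))

for name, score in suite_averages.items():
    print(f"{name:<11}{score:.2f}%")
print(f"{'Overall':<11}{overall:.3f}%")
```

Computed from the unrounded suite averages, the overall mean lands within about 0.01 of the reported 56.96%. The reported figure matches the mean of the four suite averages after each is rounded to two decimals (227.82 / 4 = 56.955).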
