---
license: cc-by-4.0
---
# Piccolo-2x7b
**In loving memory of my dog Klaus (Piccolo)**
_~ Piccolo (Italian): the little one ~_

## GGUF
Quants are available [here](https://huggingface.co/macadeliccc/piccolo-2x7b-GGUF)
# Code Example
An inference and evaluation Colab notebook is available [here](https://colab.research.google.com/drive/1ZqLNvVvtFHC_4v2CgcMVh7pP9Fvx0SbI?usp=sharing).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
        prompt (str): Prompt for the model.

    Returns:
        str: The generated response from the model.
    """
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response


model_id = "macadeliccc/piccolo-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_4bit requires the bitsandbytes package to be installed.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "What is the best way to train Cane Corsos?"
print("Response:")
print(generate_response(prompt), "\n")
```
The model handles code, math, and logical-reasoning prompts well. Try it on any question you like.
# Evaluations
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|----------|-------|------|-----:|--------|-----:|---|-----:|
|arc_easy |Yaml |none | 0|acc |0.8552|± |0.0072|
| | |none | 0|acc_norm|0.8237|± |0.0078|
|boolq |Yaml |none | 0|acc |0.8749|± |0.0058|
|hellaswag |Yaml |none | 0|acc |0.6734|± |0.0047|
| | |none | 0|acc_norm|0.8489|± |0.0036|
|openbookqa|Yaml |none | 0|acc |0.3640|± |0.0215|
| | |none | 0|acc_norm|0.4780|± |0.0224|
|piqa |Yaml |none | 0|acc |0.8330|± |0.0087|
| | |none | 0|acc_norm|0.8368|± |0.0086|
|winogrande|Yaml |none | 0|acc |0.7703|± |0.0118|
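
The `Stderr` column can be turned into an approximate confidence interval, which helps when comparing these scores against another model. The sketch below assumes a normal approximation (z = 1.96 for a 95% interval); that assumption and the helper name `confidence_interval` are illustrative and not part of the evaluation output.

```python
# Approximate 95% confidence intervals from a reported value and its standard error.
# Assumption: a normal approximation with z = 1.96; the table itself reports only the stderr.
Z_95 = 1.96


def confidence_interval(value, stderr, z=Z_95):
    """Return the (lower, upper) bounds of value +/- z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


# Two rows copied from the table above: (value, stderr).
results = {
    "arc_easy/acc": (0.8552, 0.0072),
    "openbookqa/acc": (0.3640, 0.0215),
}

for name, (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (95% CI {low:.4f} to {high:.4f})")
```

Tasks with a larger standard error, such as `openbookqa`, give a noticeably wider interval, so small differences on them carry less weight.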
# Model Evaluation Summary
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------|---------|---------|------------|----------|---------|
| piccolo-math-2x7b | 43.89% | 74.98% | 63.96% | 44.99% | 56.96% |
## AGIEval
### Tasks and Results
| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| agieval_aqua_rat | 0 | acc | 24.41 | ± 2.70 |
| | | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |
| | | acc_norm | 36.71 | ± 1.89 |
| agieval_lsat_ar | 0 | acc | 23.48 | ± 2.80 |
| | | acc_norm | 23.91 | ± 2.82 |
| agieval_lsat_lr | 0 | acc | 49.22 | ± 2.22 |
| | | acc_norm | 50.00 | ± 2.22 |
| agieval_lsat_rc | 0 | acc | 63.94 | ± 2.93 |
| | | acc_norm | 64.31 | ± 2.93 |
| agieval_sat_en | 0 | acc | 77.18 | ± 2.93 |
| | | acc_norm | 76.70 | ± 2.95 |
| agieval_sat_en_without_passage | 0 | acc | 45.15 | ± 3.48 |
| | | acc_norm | 44.66 | ± 3.47 |
| agieval_sat_math | 0 | acc | 33.64 | ± 3.19 |
| | | acc_norm | 30.00 | ± 3.10 |

**Average: 43.89%**
## GPT4All
### Tasks and Results
| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| arc_challenge | 0 | acc | 61.86 | ± 1.42 |
| | | acc_norm | 62.88 | ± 1.41 |
| arc_easy | 0 | acc | 84.34 | ± 0.75 |
| | | acc_norm | 80.47 | ± 0.81 |
| boolq | 1 | acc | 86.88 | ± 0.59 |
| hellaswag | 0 | acc | 68.56 | ± 0.46 |
| | | acc_norm | 85.16 | ± 0.35 |
| openbookqa | 0 | acc | 37.00 | ± 2.16 |
| | | acc_norm | 47.80 | ± 2.24 |
| piqa | 0 | acc | 82.21 | ± 0.89 |
| | | acc_norm | 83.68 | ± 0.86 |
| winogrande | 0 | acc | 77.98 | ± 1.16 |

**Average: 74.98%**
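
The GPT4All average can be reproduced from the table if each task contributes its `acc_norm` score where one is reported and its `acc` score otherwise. That selection rule is inferred from the numbers rather than stated in this card, so treat the sketch below as a consistency check; `headline_score` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from the GPT4All table above: {task: {metric: value}}.
gpt4all = {
    "arc_challenge": {"acc": 61.86, "acc_norm": 62.88},
    "arc_easy": {"acc": 84.34, "acc_norm": 80.47},
    "boolq": {"acc": 86.88},
    "hellaswag": {"acc": 68.56, "acc_norm": 85.16},
    "openbookqa": {"acc": 37.00, "acc_norm": 47.80},
    "piqa": {"acc": 82.21, "acc_norm": 83.68},
    "winogrande": {"acc": 77.98},
}


def headline_score(metrics):
    """Assumed rule: prefer acc_norm, fall back to acc when acc_norm is absent."""
    return metrics.get("acc_norm", metrics["acc"])


suite_average = mean(headline_score(m) for m in gpt4all.values())
print(f"GPT4All average: {suite_average:.2f}%")
```

Under the same assumed rule, the eight AGIEval `acc_norm` values average to 43.89% and the eighteen Bigbench `multiple_choice_grade` values to 44.99%, matching the reported suite averages.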
## TruthfulQA
### Tasks and Results
| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| truthfulqa_mc | 1 | mc1 | 47.37 | ± 1.75 |
| | | mc2 | 63.96 | ± 1.57 |

**Average: 63.96%**
## Bigbench
### Tasks and Results
| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 55.26 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 42.64 | ± 3.08 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.84 | ± 2.22 |
| | | exact_str_match | 3.34 | ± 0.95 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 36.60 | ± 2.16 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 25.57 | ± 1.65 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 42.40 | ± 2.21 |
| bigbench_navigate | 0 | multiple_choice_grade | 54.70 | ± 1.57 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 62.90 | ± 1.08 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 53.35 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 24.35 | ± 1.36 |
| bigbench_snarks | 0 | multiple_choice_grade | 62.43 | ± 3.61 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.28 | ± 1.46 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 41.30 | ± 1.56 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.32 | ± 1.18 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.77 | ± 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |
### Overall Average Score
**Average score: 56.96%**
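
The overall score is the unweighted mean of the four suite averages, so each suite counts equally regardless of how many tasks it contains. A minimal sketch of that arithmetic, using only numbers from the summary table:

```python
# Suite averages copied from the Model Evaluation Summary table (percent).
suite_scores = {
    "AGIEval": 43.89,
    "GPT4All": 74.98,
    "TruthfulQA": 63.96,
    "Bigbench": 44.99,
}

# Unweighted mean across the four suites.
overall = sum(suite_scores.values()) / len(suite_scores)
print(f"Overall average (unrounded): {overall:.3f}%")
```

The unrounded mean is about 56.955, which agrees with the reported 56.96% at two decimal places.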