| | --- |
| | license: apache-2.0 |
| | --- |
| | |
| |
|
| | <style> |
| | @import url('https://fonts.googleapis.com/css2?family=Vollkorn:ital,wght@0,400..900;1,400..900&display=swap'); |
| | img{ |
| | user-select: none; |
| | transition: all 0.2s ease; |
| | border-radius: .5rem; |
| | } |
| | img:hover{ |
| | transform: rotate(2deg); |
| | filter: invert(100%); |
| | } |
| | </style> |
| | |
| | <div style="background-color: transparent; border-radius: .5rem; padding: 2rem; font-family: monospace; font-size: .85rem; text-align: justify;"> |
| | |
| |  |
| |
|
| | This is a passthrough merge of arco with an experimental model. It improves on the ARC Challenge benchmark, coming within 1.2 points of modern 3b baseline performance. |
| |
|
| | If you want answers to multilingual, general-knowledge, trivially simple questions, choose qwen or llama. If you prefer solving trivially simple english tasks at half the size, choose arco. |
| |
|
| | #### prompt |
| |
|
| | intentionally, no prompt is set. |
| |
|
| |
|
| | #### benchmarks |
| |
|
| | zero-shot results from state-of-the-art small language models |
| |
|
| | | Parameters | Model    | MMLU      | ARC-C     | HellaSwag | PIQA      | Winogrande | Average   | |
| | |------------|----------|-----------|-----------|-----------|-----------|------------|-----------| |
| | | 0.5b       | qwen 2   | 44.13     | 28.92     | 49.05     | 69.31     | 56.99      | 49.68     | |
| | | 0.3b       | smollm   | 25.52     | 37.71     | 56.41     | 71.93     | 59.27      | 50.17     | |
| | | 0.5b       | danube 3 | 24.81     | 36.18     | 60.46     | 73.78     | 61.01      | 51.25     | |
| | | 0.5b       | qwen 2.5 | **47.29** | 31.83     | 52.17     | 70.29     | 57.06      | 51.72     | |
| | | 0.5b       | arco     | 26.17     | 37.29     | 62.88     | 74.37     | **62.27**  | 52.60     | |
| | | 0.5b       | arco 2   | 25.51     | **38.82** | **63.02** | **74.70** | 61.25      | **52.66** | |
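
the average column looks like the unweighted mean of the five benchmark scores. a minimal sketch to recompute it, assuming that is how it was derived (the card does not say). the names `SCORES`, `average` and `ranking` are made up for this example; the numbers are copied from the table. recomputed means agree with the table to within 0.01.

```python
# Scores copied from the benchmark table (zero-shot, in percent).
# Column order: MMLU, ARC-C, HellaSwag, PIQA, Winogrande.
SCORES = {
    "qwen 2":   [44.13, 28.92, 49.05, 69.31, 56.99],
    "smollm":   [25.52, 37.71, 56.41, 71.93, 59.27],
    "danube 3": [24.81, 36.18, 60.46, 73.78, 61.01],
    "qwen 2.5": [47.29, 31.83, 52.17, 70.29, 57.06],
    "arco":     [26.17, 37.29, 62.88, 74.37, 62.27],
    "arco 2":   [25.51, 38.82, 63.02, 74.70, 61.25],
}


def average(scores):
    """Unweighted mean of the benchmark scores (assumed averaging method)."""
    return sum(scores) / len(scores)


def ranking():
    """Return (model, average) pairs sorted from best to worst average."""
    pairs = [(name, average(s)) for name, s in SCORES.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, avg in ranking():
        print(f"{name:<10} {avg:.2f}")
```

sorting by the mean puts arco 2 first, just ahead of arco, which matches the bolded average in the table.
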
| | #### supporters |
| |
|
| | <a href="https://ko-fi.com/appvoid" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 34px !important; margin-top: -4px;width: 128px !important; filter: contrast(2) grayscale(100%) brightness(100%);" ></a> |
| |
|
| | #### trivia |
| |
|
| | arco also means "arc optimized", hence the focus on this cognitive-based benchmark. |
| | </div> |