	---
	license: apache-2.0
	pipeline_tag: text-generation
	extra_gated_prompt: "You agree to not use this model (or future versions) to conduct experiments that cause harm to any person or group."
	extra_gated_fields:
	  Company: text
	  Country: country
	  Specific date: date_picker
	  I want to use this model for:
	    type: select
	    options:
	      - Work
	      - Research
	      - Education
	      - Hobby
	      - label: Other
	        value: other
	  I agree to use this model in good faith ONLY: checkbox
	---
	<style>
	@import url('https://fonts.googleapis.com/css2?family=Vollkorn:ital,wght@0,400..900;1,400..900&display=swap');

	*, html, body, div {
	color: gray;
	background: black !important;
	border: none;
	}

	img {
	filter: contrast(1.3);
	user-select: none;
	transition: all 0.2s ease;
	border-radius: .5rem;
	display: block !important;
	margin: 1rem auto !important;
	}

	img:hover {
	transform: rotate(2deg);
	filter: invert(100%);
	}
	</style>
	<body>
	<div style="background-color: transparent; border-radius: .5rem; padding: 2rem; font-family: monospace; font-size: .85rem; text-align: justify;">

	<p align="center">
	<img src="https://huggingface.co/appvoid/arco-3/resolve/main/super-cubby.png" alt="cubby">
	</p>

	In this repository, we present the next iteration of `arco`, a small meta-learner language model, now using `qwen` as the base architecture.

	In previous research, we noticed that earlier `arco` models underperformed dramatically on few-shot prompting, despite their benchmark gains on ARC. We therefore decided to focus on making few-shot learning more robust, targeting tasks that directly improve that skill and starting from a stronger base model from the `qwen` family.

	After several merging iterations with openly available models, we obtained a strong baseline meta-learner, which we call [`arco-3`](https://huggingface.co/appvoid/arco-3-gguf). This model will serve as the starting point for future few-shot fine-tuning and experiments.
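
The card does not say which merge method was used across those iterations. As a rough illustration only, the sketch below shows a plain linear (weighted-average) merge, one common technique, applied to toy parameters stored as Python lists. `linear_merge` and the toy state dicts are hypothetical names, not the authors' tooling.

```python
"""Hypothetical sketch of a linear (weighted-average) model merge.

The arco-3 card does not state which merge method was used; this only
illustrates the general idea on plain Python lists instead of tensors.
"""
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def linear_merge(models: List[StateDict], weights: List[float]) -> StateDict:
    """Weighted average of parameters; assumes all models share parameter names."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: StateDict = {}
    for name in models[0]:
        tensors = [m[name] for m in models]
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all models.
        merged[name] = [
            sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    donor = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(linear_merge([base, donor], [0.5, 0.5]))
    # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Real merges operate on tensors and often use more elaborate schemes; this only shows the core averaging step.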

	#### prompt
	No prompt template is set; this is intentional.
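
Because no template is set, a few-shot prompt can simply be plain text: solved examples followed by an open query. A minimal sketch, assuming the `Input:`/`Output:` layout, the sentiment examples, and the generation settings (none of which come from this card), and assuming the `appvoid/arco-3` repository loads with Hugging Face `transformers`:

```python
"""Minimal few-shot prompting sketch for a model with no prompt template.

Assumptions (not from the model card): the "Input:/Output:" layout, the
example task, and the generation settings are illustrative only.
"""
from typing import List, Tuple


def build_fewshot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Concatenate solved examples, then leave the final output open."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def generate(prompt: str, model_id: str = "appvoid/arco-3") -> str:
    """Greedy completion with Hugging Face transformers (imported lazily)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=16, do_sample=False)
    # Decode only the newly generated tokens and keep the first line.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return text.strip().split("\n")[0]


if __name__ == "__main__":
    shots = [
        ("I loved this movie!", "positive"),
        ("The service was terribly slow.", "negative"),
    ]
    prompt = build_fewshot_prompt(shots, "What a wonderful day.")
    print(prompt)
    print(generate(prompt))
```

Keeping only the first generated line is a simple way to stop the model from continuing the pattern with extra invented examples.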

	#### benchmarks
	##### meta arena
	We tested around 65 models against each other on few-shot tasks and used `gemini-2.5-pro` as a judge to choose the best answers. `arco-3` currently ranks 13th in [meta-arena](https://huggingface.co/spaces/appvoid/meta-arena).

	<p align="center">
	<img src="https://huggingface.co/appvoid/arco-3/resolve/main/meta-arena.png" alt="meta arena">
	</p>
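
The card does not describe how meta-arena turns judge verdicts into a ranking. One simple possibility, sketched here purely as a hypothetical, is ordering models by their win rate over pairwise `(winner, loser)` verdicts; the record format and the `rank_by_win_rate` helper are assumptions.

```python
"""Hypothetical sketch: turn pairwise judge verdicts into a win-rate ranking.

The card does not describe how meta-arena aggregates gemini-2.5-pro
judgments; the (winner, loser) format and win-rate ordering are assumptions.
"""
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_by_win_rate(verdicts: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Each verdict is (winner, loser). Returns models sorted by win rate."""
    wins: Dict[str, int] = defaultdict(int)
    games: Dict[str, int] = defaultdict(int)
    for winner, loser in verdicts:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = [(model, wins[model] / games[model]) for model in games]
    # Highest win rate first; ties broken alphabetically for stable output.
    return sorted(rates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    judged = [
        ("model-a", "model-b"),
        ("model-a", "model-c"),
        ("model-c", "model-b"),
        ("model-b", "model-a"),
    ]
    for place, (model, rate) in enumerate(rank_by_win_rate(judged), start=1):
        print(f"{place}. {model}: {rate:.2f}")
```

Arena-style leaderboards often use Elo or Bradley-Terry ratings instead, which account for opponent strength; win rate is used here only because it is the simplest to read.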

	##### variance
	We also compared the model with some popular small models on the "power" distribution across the five language modeling benchmarks we typically use.
	<img src="https://huggingface.co/appvoid/arco-3/resolve/main/variance.png" alt="variance">

	##### language modeling
	To our surprise, the model also outperformed the base model on all five of the well-known language modeling benchmarks below.

	| Parameters | Model  | MMLU  | ARC-C | HellaSwag | PIQA  | Winogrande | Average |
	|------------|--------|-------|-------|-----------|-------|------------|---------|
	| 0.6B       | qwen 3 | 40.31 | 34.47 | 47.38     | 67.46 | 56.04      | 49.13   |
	| 0.6B       | arco 3 | 43.34 | 36.01 | 49.56     | 68.17 | 58.09      | 51.03   |
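
As a quick sanity check, the averages in the table are unweighted means of the five benchmark scores, and `arco 3` improves on every column. The snippet below recomputes both from the values in the table.

```python
"""Recompute the table's averages and per-benchmark gains of arco 3 over qwen 3."""
BENCHMARKS = ["MMLU", "ARC-C", "HellaSwag", "PIQA", "Winogrande"]

# Scores copied from the language modeling table.
SCORES = {
    "qwen 3": [40.31, 34.47, 47.38, 67.46, 56.04],
    "arco 3": [43.34, 36.01, 49.56, 68.17, 58.09],
}


def average(scores):
    """Unweighted mean over the five benchmarks, rounded to 2 decimals."""
    return round(sum(scores) / len(scores), 2)


def gains(new, old):
    """Per-benchmark difference (new - old), rounded to 2 decimals."""
    return {b: round(n - o, 2) for b, n, o in zip(BENCHMARKS, new, old)}


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(model, average(scores))  # qwen 3 49.13, arco 3 51.03
    print(gains(SCORES["arco 3"], SCORES["qwen 3"]))
```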

	#### strengths
	- Strong bias toward following the prompt's format
	- Excellent classifier
	- State-of-the-art paraphrasing
	- Vocabulary and idiom understanding

	#### limitations
	- Lack of creative outputs
	- Extremely poor summarization skills
	- Poor causality understanding
	- Hallucinations

	We plan to address each of these issues in future iterations.


	#### supporters
	<a href="https://ko-fi.com/appvoid" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 34px !important; margin-top: -4px;width: 128px !important; filter: contrast(2) grayscale(100%) brightness(100%);" ></a>


	#### trivia
	`arco` means "bow" in Spanish, a nod to a model that hits its target quickly and accurately.


	Note: the model has not been tested as a chat assistant and might not work as intended in that role, so use it with caution.
	</div>
	</body>