	---
	license: mit
	language:
	- en
	---
	# PiCo 1B

	> A dense language model with ~1.46B parameters, optimized for reasoning and knowledge tasks.
	>
	> Note: PiCo 1B uses the tokenizer from Qwen 2 1.5B but was trained from scratch. It is not a fine-tuned version of Qwen 2 1.5B.

	---

	## 📌 Model Overview

	PiCo 1B is a compact language model with ~1.46 billion parameters. Compared with other open-source models in the 1B–2B parameter range, it is competitive on reasoning, knowledge, and coding benchmarks, and it is strongest on science reasoning (ARC-Easy and ARC-Challenge).

	---

	## 📋 Model Details

	| Attribute | Value |
	|-----------|-------|
	| Model Size | ~1.46B parameters |
	| Architecture | Dense transformer (decoder-only) |
	| Context Length | 2048 tokens |
	| Precision | FP32 / FP16 (Safetensors weights) |
	| License | MIT |
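
As a rough guide to what these figures imply for deployment, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes exactly 1.46e9 parameters (the table only gives an approximate count) and covers the weights alone; activations, the KV cache, and framework overhead are not included.

```python
# Approximate parameter count from the table above (treated as exact for this estimate).
NUM_PARAMS = 1.46e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("fp32", "fp16"):
        print(f"{precision}: ~{weight_memory_gb(NUM_PARAMS, precision):.2f} GB")
```

Under these assumptions the weights take roughly 5.84 GB in FP32 and 2.92 GB in FP16.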

	---

	## 📊 Benchmark Results

	PiCo 1B is evaluated against 31 open-source models in the 1B–2B parameter range across 7 standard benchmarks.

	### MMLU (Massive Multitask Language Understanding)

	Measures general knowledge across 57 subjects including STEM, humanities, and social sciences.

	![mmlu_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/xRmbn3pMYD6hLVEPZYwBa.png)

	---

	### GSM8K (Grade School Math)

	Measures mathematical reasoning with grade-school level word problems.

	![gsm8k_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/E03-rJ-NpEwCH5M97ec-v.png)

	---

	### ARC-Challenge (AI2 Reasoning Challenge)

	Measures science reasoning on grade-school science questions (the harder subset).

	![arc_challenge_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/WOOinEwZ0DaRKJl74n6E9.png)

	---

	### ARC-Easy (AI2 Reasoning Challenge)

	Measures basic science reasoning on grade-school science questions (the easier subset).

	![arc_easy_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/BTGhsnWy0gA6iJaHBr8j4.png)

	---

	### HellaSwag (Commonsense Reasoning)

	Measures commonsense natural language inference with everyday scenarios.

	![hellaswag_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pNLjRk1Xy6aoGFOik3Zsv.png)

	---

	### HumanEval (Code Generation)

	Measures functional correctness of code generation across 164 programming problems.

	![humaneval_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/FnhNO9sBCDl-Ky438q6ll.png)
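
HumanEval results are commonly reported as pass@k, the probability that at least one of k sampled completions passes the unit tests. This README does not state which k or how many samples per problem were used, so the sketch below is only an illustration of the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark score is then the mean over all 164 problems, for example `mean_pass_at_k(per_problem_counts, k=1)`.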

	---

	### TruthfulQA (Truthfulness)

	Measures whether the model generates truthful answers rather than mimicking common misconceptions.

	![truthfulqa_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pKq1fhlzmfistFhZ8BRRt.png)

	---

	## 🏆 Performance Highlights

	### ✅ Strengths

	- Science Reasoning: Best-in-class performance on ARC-Easy and ARC-Challenge
	- General Knowledge: Top 3 on MMLU, outperforming many larger 1.5B–2B models
	- Coding Ability: Strong HumanEval performance, competitive with models 2x its size
	- Truthfulness: Top 5 on TruthfulQA, demonstrating reliable factual output

	### 📈 Areas for Improvement

	- Commonsense Reasoning: HellaSwag score lags behind modern 1.5B+ models
	- Mathematical Reasoning: GSM8K performance is solid but not top-tier
	- Scale: Further training on larger, more diverse datasets could boost all benchmarks

	---

	## 🚀 Usage

	### Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Replace with the full Hugging Face Hub repo ID or a local checkpoint path.
	model_name = "pico-1b"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)

	prompt = "Explain the theory of relativity in simple terms."
	inputs = tokenizer(prompt, return_tensors="pt")
	# max_new_tokens caps only the generated tokens; max_length would also count the prompt.
	outputs = model.generate(**inputs, max_new_tokens=200)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Model Formats

	- Safetensors (recommended): Secure and fast loading
	- PyTorch (FP16): Standard format
	- GGUF: For local inference with llama.cpp

	---

	## 🏋️ Training Details

	| Aspect | Description |
	|--------|-------------|
	| Architecture | Dense decoder-only transformer |
	| Optimizer | AdamW |
	| Learning Rate | Cosine schedule with warmup |
	| Batch Size | Configurable per GPU setup |
	| Training Framework | PyTorch + Hugging Face Transformers |
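
The table names a cosine schedule with warmup but gives no hyperparameters. As an illustration only, the sketch below implements one common variant (linear warmup, then cosine decay to a floor). The function name and the values of `peak_lr`, `min_lr`, `warmup_steps`, and `total_steps` are hypothetical, not the settings used to train PiCo 1B.

```python
import math


def cosine_lr_with_warmup(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical settings chosen only to show the shape of the schedule.
    for step in (0, 99, 100, 5050, 10_000):
        print(step, cosine_lr_with_warmup(step, 10_000, 100, 3e-4, 3e-5))
```

The rate peaks at the end of warmup, passes the midpoint between `peak_lr` and `min_lr` halfway through the decay phase, and reaches `min_lr` at `total_steps`.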

	---

	## ⚠️ Limitations

	- Small Model Size: At ~1.46B parameters, it is inherently weaker than larger models (7B+) on complex reasoning tasks
	- Training Data: Primarily trained on English text; performance on non-English languages may be limited
	- Hallucinations: Like all LLMs, it may generate factually incorrect information
	- Context Window: Limited to 2048 tokens by default
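
One way to stay inside the 2048-token window is to budget prompt tokens against the number of tokens to be generated. The helper below is a hypothetical sketch that works on a plain list of token IDs (for example `tokenizer(prompt)["input_ids"]`). It keeps the most recent tokens, which is one possible policy and not a recommendation from the model authors.

```python
CONTEXT_LENGTH = 2048  # context window stated in this model card


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Trim a prompt so that prompt + generated tokens fit in the context window.

    Keeps the last (context_length - max_new_tokens) tokens and drops older ones.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    return token_ids[-budget:]
```

With `max_new_tokens=200`, a 3000-token prompt is cut to its last 1848 tokens, while a prompt already within budget is returned unchanged.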

	---

	## 📝 Citation

	If you use PiCo 1B in your research or projects, please cite:

	```bibtex
	@misc{pico1b,
	title={PiCo 1B: A Compact Language Model Optimized for Reasoning},
	author={Arc Develop Team},
	year={2026},
	howpublished={\url{https://github.com/pico-llm/pico-1b}},
	}
	```

	---

	## 📄 License

	This model is released under the MIT license. Please see the LICENSE file for details.

	---

	Last updated: June 2026