---
license: mit
language:
- en
---
# PiCo 1B

> A ~1.46B-parameter dense language model optimized for reasoning and knowledge tasks.
>
> For clarity: the model uses the tokenizer from Qwen 2 1.5B but was trained from scratch. It is not a fine-tuned version of Qwen 2 1.5B.

---

## 📌 Model Overview

**PiCo 1B** is a compact language model with ~1.46 billion parameters. Despite its small size, it is competitive with other 1B–2B models on reasoning, knowledge, and coding benchmarks, with its strongest results on the ARC science reasoning tasks.

---

## 📋 Model Details

| Attribute | Value |
|-----------|-------|
| **Model Size** | ~1.46B parameters |
| **Architecture** | Dense transformer (decoder-only) |
| **Context Length** | 2048 tokens |
| **Precision** | FP32 / FP16 (Safetensors weights) |
| **License** | MIT |

---

## 📊 Benchmark Results

PiCo 1B is evaluated against **31 open-source models** in the 1B–2B parameter range across 7 standard benchmarks.

### MMLU (Massive Multitask Language Understanding)

Measures general knowledge across 57 subjects including STEM, humanities, and social sciences.

![mmlu_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/xRmbn3pMYD6hLVEPZYwBa.png)

---

### GSM8K (Grade School Math)

Measures mathematical reasoning with grade-school level word problems.

![gsm8k_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/E03-rJ-NpEwCH5M97ec-v.png)

---

### ARC-Challenge (AI2 Reasoning Challenge)

Measures science reasoning with grade-school science questions (harder subset).

![arc_challenge_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/WOOinEwZ0DaRKJl74n6E9.png)

---

### ARC-Easy (AI2 Reasoning Challenge)

Measures basic science reasoning with grade-school science questions (easier subset).

![arc_easy_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/BTGhsnWy0gA6iJaHBr8j4.png)

---

### HellaSwag (Commonsense Reasoning)

Measures commonsense natural language inference with everyday scenarios.

![hellaswag_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pNLjRk1Xy6aoGFOik3Zsv.png)

---

### HumanEval (Code Generation)

Measures functional correctness of code generation across 164 programming problems.

![humaneval_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/FnhNO9sBCDl-Ky438q6ll.png)

---

### TruthfulQA (Truthfulness)

Measures whether the model generates truthful answers rather than mimicking common misconceptions.

![truthfulqa_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pKq1fhlzmfistFhZ8BRRt.png)

---

## 🏆 Performance Highlights

### ✅ Strengths

- **Science Reasoning**: Best-in-class performance on ARC-Easy and ARC-Challenge
- **General Knowledge**: Top 3 on MMLU, outperforming many larger 1.5B–2B models
- **Coding Ability**: Strong HumanEval performance, competitive with models 2x its size
- **Truthfulness**: Top 5 on TruthfulQA, demonstrating reliable factual output

### 📈 Areas for Improvement

- **Commonsense Reasoning**: HellaSwag score lags behind modern 1.5B+ models
- **Mathematical Reasoning**: GSM8K performance is solid but not top-tier
- **Scale**: Further training on larger, more diverse datasets could boost all benchmarks

---

## 🚀 Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the full Hub repo id or a local path to the downloaded weights.
model_name = "pico-1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds only the generated text; max_length would also count the prompt.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Model Formats

- **Safetensors** (recommended): Secure and fast loading
- **PyTorch (FP16)**: Standard format
- **GGUF**: For local inference with llama.cpp
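
To pick between the FP32 and FP16 precisions listed in this card at load time, something like the following sketch may help. The helper names `choose_dtype_name` and `load_pico` are hypothetical, and the rule "FP16 on GPU, FP32 on CPU" is an assumption rather than a documented recommendation. It relies on the `torch_dtype` argument of `from_pretrained`.

```python
def choose_dtype_name(cuda_available: bool) -> str:
    """Hypothetical rule (an assumption): FP16 on GPU, FP32 on CPU."""
    return "float16" if cuda_available else "float32"


def load_pico(model_name: str = "pico-1b"):
    """Load the model in the precision chosen above (needs torch and transformers)."""
    # Imported lazily so the dtype rule can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM

    has_gpu = torch.cuda.is_available()
    dtype = getattr(torch, choose_dtype_name(has_gpu))
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
    return model.to("cuda" if has_gpu else "cpu")
```

FP16 roughly halves the memory footprint of the weights compared with FP32, which is usually the main reason to prefer it on a GPU.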

---

## 🏋️ Training Details

| Aspect | Description |
|--------|-------------|
| **Architecture** | Dense decoder-only transformer |
| **Optimizer** | AdamW |
| **Learning Rate** | Cosine schedule with warmup |
| **Batch Size** | Configurable per GPU setup |
| **Training Framework** | PyTorch + Hugging Face Transformers |
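
As an illustration of what "cosine schedule with warmup" typically means, here is a minimal sketch of one common formulation: linear warmup to a peak learning rate, followed by cosine decay. The function name `lr_at_step` and all default values (`peak_lr`, `warmup_steps`, `total_steps`, `min_lr`) are hypothetical placeholders, not the hyperparameters actually used to train PiCo 1B.

```python
import math


def lr_at_step(step, peak_lr=3e-4, warmup_steps=1000,
               total_steps=100_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    All defaults are illustrative assumptions, not PiCo 1B's settings.
    """
    if step < warmup_steps:
        # Warmup: the rate grows linearly and reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine decay from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With these placeholder values the rate peaks at the end of warmup, is halfway between `peak_lr` and `min_lr` at the midpoint of the decay phase, and reaches `min_lr` at `total_steps`.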

---

## ⚠️ Limitations

- **Small Model Size**: At ~1.46B parameters, the model has inherent limitations compared to larger models (7B+) on complex reasoning tasks
- **Training Data**: Primarily trained on English text; performance on non-English languages may be limited
- **Hallucinations**: Like all LLMs, it may generate factually incorrect information
- **Context Window**: Limited to 2048 tokens by default
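
Because the prompt and the generated tokens share the 2048-token window, long prompts need trimming before generation. The sketch below shows one possible way to budget for this on a list of token ids. The helper `fit_prompt` is hypothetical, and keeping the most recent tokens (dropping the oldest) is an assumption that suits chat-style inputs but may not suit every task.

```python
CONTEXT_LENGTH = 2048  # context length stated in this model card


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Trim a prompt so that prompt + generated tokens fit in the context window.

    Keeps the most recent tokens; this truncation policy is an assumption.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    # Number of tokens the prompt may occupy once generation is reserved.
    budget = context_length - max_new_tokens
    # Slicing from the end keeps short prompts unchanged and trims long ones.
    return list(token_ids[-budget:])
```

For example, reserving 200 tokens for generation leaves 1848 tokens for the prompt, so a 3000-token prompt would lose its first 1152 tokens.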

---

## 📝 Citation

If you use PiCo 1B in your research or projects, please cite:

```bibtex
@misc{pico1b,
  title={PiCo 1B: A Compact Language Model Optimized for Reasoning},
  author={Arc Develop Team},
  year={2026},
  howpublished={\url{https://github.com/pico-llm/pico-1b}},
}
```

---

## 📄 License

This model is released under the MIT License. Please see the LICENSE file for details.

---

*Last updated: June 2026*