---
license: mit
language:
- en
---
# PiCo 1B
> A 1B-parameter dense language model optimized for reasoning and knowledge tasks.
>
> For clarity, our model uses the tokenizer from Qwen 2 1.5B but has been trained from scratch; it is not a fine-tuned version of Qwen 2 1.5B.
---
## 📌 Model Overview
**PiCo 1B** is a compact, high-performance language model with ~1.46 billion parameters. Despite its small size, it achieves competitive performance across reasoning, knowledge, and coding benchmarks, particularly excelling in science reasoning tasks.
---
## 📋 Model Details
| Attribute | Value |
|-----------|-------|
| **Model Size** | ~1.46B parameters |
| **Architecture** | Dense transformer (decoder-only) |
| **Context Length** | 2048 tokens |
| **Precision / Format** | FP32 or FP16 weights, stored as Safetensors |
| **License** | MIT |
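
As a rough illustration of what the parameter count and precisions in the table imply, the sketch below estimates the memory needed for the weights alone. It assumes 4 bytes per FP32 parameter and 2 bytes per FP16 parameter, uses the approximate 1.46B figure from the table, and ignores activations, KV cache, and framework overhead. The function name is illustrative, not part of any PiCo tooling.

```python
# Approximate parameter count taken from the Model Details table.
NUM_PARAMS = 1.46e9

# Storage cost per parameter for each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the weight-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in ("fp32", "fp16"):
        print(f"{precision}: ~{weight_memory_gb(NUM_PARAMS, precision):.2f} GB")
```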
---
## 📊 Benchmark Results
PiCo 1B is evaluated against **31 open-source models** in the 1B–2B parameter range across 7 standard benchmarks.
### MMLU (Massive Multitask Language Understanding)
Measures general knowledge across 57 subjects including STEM, humanities, and social sciences.
![mmlu_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/xRmbn3pMYD6hLVEPZYwBa.png)
---
### GSM8K (Grade School Math)
Measures mathematical reasoning with grade-school level word problems.
![gsm8k_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/E03-rJ-NpEwCH5M97ec-v.png)
---
### ARC-Challenge (AI2 Reasoning Challenge)
Measures science reasoning with grade-level science questions (harder subset).
![arc_challenge_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/WOOinEwZ0DaRKJl74n6E9.png)
---
### ARC-Easy (AI2 Reasoning Challenge)
Measures basic science reasoning with grade-level science questions (easier subset).
![arc_easy_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/BTGhsnWy0gA6iJaHBr8j4.png)
---
### HellaSwag (Commonsense Reasoning)
Measures commonsense natural language inference with everyday scenarios.
![hellaswag_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pNLjRk1Xy6aoGFOik3Zsv.png)
---
### HumanEval (Code Generation)
Measures functional correctness of code generation across 164 programming problems.
![humaneval_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/FnhNO9sBCDl-Ky438q6ll.png)
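
This README does not state which HumanEval metric the chart reports. HumanEval results are commonly given as pass@k, so purely as an illustration, here is a minimal sketch of the standard unbiased pass@k estimator, where `n` samples are generated per problem and `c` of them pass the unit tests. The function names and the example counts are made up for illustration; they are not PiCo 1B results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three problems: (samples drawn, samples passing).
    toy_results = [(10, 3), (10, 0), (10, 10)]
    print(f"pass@1 = {mean_pass_at_k(toy_results, k=1):.3f}")
```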
---
### TruthfulQA (Truthfulness)
Measures whether the model generates truthful answers rather than mimicking common misconceptions.
![truthfulqa_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pKq1fhlzmfistFhZ8BRRt.png)
---
## 🏆 Performance Highlights
### ✅ Strengths
- **Science Reasoning**: Best-in-class performance on ARC-Easy and ARC-Challenge
- **General Knowledge**: Top 3 on MMLU, outperforming many larger 1.5B–2B models
- **Coding Ability**: Strong HumanEval performance, competitive with models 2x its size
- **Truthfulness**: Top 5 on TruthfulQA, demonstrating reliable factual output
### 📈 Areas for Improvement
- **Commonsense Reasoning**: HellaSwag score lags behind modern 1.5B+ models
- **Mathematical Reasoning**: GSM8K performance is solid but not top-tier
- **Scale**: Further training on larger, more diverse datasets could boost all benchmarks
---
## 🚀 Usage
### Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Identifier as given in this README; replace it with the full Hub repo id or a local path.
model_name = "pico-1b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens bounds only the generated text; max_length would also count the prompt.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Model Formats
- **Safetensors** (recommended): Secure and fast loading
- **PyTorch (FP16)**: Standard format
- **GGUF**: For local inference with llama.cpp
---
## πŸ‹οΈ Training Details
| Aspect | Description |
|--------|-------------|
| **Architecture** | Dense decoder-only transformer |
| **Optimizer** | AdamW |
| **Learning Rate** | Cosine schedule with warmup |
| **Batch Size** | Configurable per GPU setup |
| **Training Framework** | PyTorch + Hugging Face Transformers |
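
The table names a cosine schedule with warmup but gives no hyperparameters. As a sketch of what such a schedule typically computes, the function below ramps the learning rate linearly up to a peak and then decays it along a half cosine. The peak rate, warmup length, total step count, and minimum rate are placeholder values, not PiCo 1B's actual settings.

```python
import math


def cosine_lr_with_warmup(
    step: int,
    peak_lr: float = 3e-4,        # placeholder, not from the README
    warmup_steps: int = 1_000,    # placeholder
    total_steps: int = 100_000,   # placeholder
    min_lr: float = 0.0,          # placeholder
) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Half cosine from peak_lr down to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 500, 1_000, 50_500, 100_000):
        print(s, f"{cosine_lr_with_warmup(s):.2e}")
```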
---
## ⚠️ Limitations
- **Small Model Size**: As a 1B-parameter model, it has inherent limitations compared to larger models (7B+) on complex reasoning tasks
- **Training Data**: Primarily trained on English text; performance on non-English languages may be limited
- **Hallucinations**: Like all LLMs, it may generate factually incorrect information
- **Context Window**: Limited to 2048 tokens by default
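
Because of the 2048-token limit listed above, the prompt and the generated tokens together need to fit in the window. The sketch below shows one way to budget for that: reserve room for generation and keep only the most recent prompt tokens. The helper name and the choice to drop the oldest tokens first are assumptions for illustration; the function works on a plain list of token ids, such as `tokenizer(prompt)["input_ids"]`.

```python
MAX_CONTEXT = 2048  # context length stated in the Model Details table


def fit_prompt(input_ids: list[int], max_new_tokens: int,
               max_context: int = MAX_CONTEXT) -> list[int]:
    """Trim token ids so that prompt plus generated tokens fit in the window."""
    if not 0 < max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be between 1 and max_context - 1")
    budget = max_context - max_new_tokens
    # Keep the most recent tokens; the oldest context is dropped first.
    return input_ids[-budget:]


if __name__ == "__main__":
    long_prompt = list(range(3000))  # stand-in for real token ids
    trimmed = fit_prompt(long_prompt, max_new_tokens=200)
    print(len(trimmed))
```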
---
## 📝 Citation
If you use PiCo 1B in your research or projects, please cite:
```bibtex
@misc{pico1b,
title={PiCo 1B: A Compact Language Model Optimized for Reasoning},
author={Arc Develop Team},
year={2026},
howpublished={\url{https://github.com/pico-llm/pico-1b}},
}
```
---
## 📄 License
This model is released under the MIT license, as declared in the model card metadata. Please see the LICENSE file for details.
---
*Last updated: June 2026*