| --- |
| license: mit |
| language: |
| - en |
| --- |
| # PiCo 1B |
|
|
| > A 1B-parameter dense language model optimized for reasoning and knowledge tasks. |
| > |
| > For clarity, our model uses the tokenizer from Qwen 2 1.5B but has been trained from scratch β it is not a fine-tuned version of Qwen 2 1.5B. |
|
|
| --- |
|
|
| ## π Model Overview |
|
|
| **PiCo 1B** is a compact, high-performance language model with ~1.46 billion parameters. Despite its small size, it achieves competitive performance across reasoning, knowledge, and coding benchmarks, particularly excelling in science reasoning tasks. |
|
|
| --- |
|
|
| ## π Model Details |
|
|
| | Attribute | Value | |
| |-----------|-------| |
| | **Model Size** | ~1.46B parameters | |
| | **Architecture** | Dense transformer (decoder-only) | |
| | **Context Length** | 2048 tokens | |
| | **Precision** | FP32 / FP16 / Safetensors | |
| | **License** | Open-source | |
|
|
| --- |
|
|
| ## π Benchmark Results |
|
|
| PiCo 1B is evaluated against **31 open-source models** in the 1Bβ2B parameter range across 7 standard benchmarks. |
|
|
| ### MMLU (Massive Multitask Language Understanding) |
|
|
| Measures general knowledge across 57 subjects including STEM, humanities, and social sciences. |
|
|
|  |
|
|
| --- |
|
|
| ### GSM8K (Grade School Math) |
|
|
| Measures mathematical reasoning with grade-school level word problems. |
|
|
|  |
|
|
| --- |
|
|
| ### ARC-Challenge (AI2 Reasoning Challenge) |
|
|
| Measures science reasoning with grade-level science questions (harder subset). |
|
|
|  |
|
|
| --- |
|
|
| ### ARC-Easy (AI2 Reasoning Challenge) |
|
|
| Measures basic science reasoning with grade-level science questions (easier subset). |
|
|
|  |
|
|
| --- |
|
|
| ### HellaSwag (Commonsense Reasoning) |
|
|
| Measures commonsense natural language inference with everyday scenarios. |
|
|
|  |
|
|
| --- |
|
|
| ### HumanEval (Code Generation) |
|
|
| Measures functional correctness of code generation across 164 programming problems. |
|
|
|  |
|
|
| --- |
|
|
| ### TruthfulQA (Truthfulness) |
|
|
| Measures whether the model generates truthful answers rather than mimicking common misconceptions. |
|
|
|  |
|
|
| --- |
|
|
| ## π Performance Highlights |
|
|
| ### β
Strengths |
|
|
| - **Science Reasoning**: Best-in-class performance on ARC-Easy and ARC-Challenge |
| - **General Knowledge**: Top 3 on MMLU, outperforming many larger 1.5Bβ2B models |
| - **Coding Ability**: Strong HumanEval performance, competitive with models 2x its size |
| - **Truthfulness**: Top 5 on TruthfulQA, demonstrating reliable factual output |
|
|
| ### π Areas for Improvement |
|
|
| - **Commonsense Reasoning**: HellaSwag score lags behind modern 1.5B+ models |
| - **Mathematical Reasoning**: GSM8K performance is solid but not top-tier |
| - **Scale**: Further training on larger, more diverse datasets could boost all benchmarks |
|
|
| --- |
|
|
| ## π Usage |
|
|
| ### Quick Start |
|
|
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model_name = "pico-1b" |
| tokenizer = AutoTokenizer.from_pretrained(model_name) |
| model = AutoModelForCausalLM.from_pretrained(model_name) |
| |
| prompt = "Explain the theory of relativity in simple terms." |
| inputs = tokenizer(prompt, return_tensors="pt") |
| outputs = model.generate(**inputs, max_length=200) |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |
|
|
| ### Model Formats |
|
|
| - **Safetensors** (recommended): Secure and fast loading |
| - **PyTorch (FP16)**: Standard format |
| - **GGUF**: For local inference with llama.cpp |
|
|
| --- |
|
|
| ## ποΈ Training Details |
|
|
| | Aspect | Description | |
| |--------|-------------| |
| | **Architecture** | Dense decoder-only transformer | |
| | **Optimizer** | AdamW | |
| | **Learning Rate** | Cosine schedule with warmup | |
| | **Batch Size** | Configurable per GPU setup | |
| | **Training Framework** | PyTorch + Hugging Face Transformers | |
|
|
| --- |
|
|
| ## β οΈ Limitations |
|
|
| - **Small Model Size**: As a 1B-parameter model, it has inherent limitations compared to larger models (7B+) on complex reasoning tasks |
| - **Training Data**: Primarily trained on English text; performance on non-English languages may be limited |
| - **Hallucinations**: Like all LLMs, it may generate factually incorrect information |
| - **Context Window**: Limited to 2048 tokens by default |
|
|
| --- |
|
|
| ## π Citation |
|
|
| If you use PiCo 1B in your research or projects, please cite: |
|
|
| ```bibtex |
| @misc{pico1b, |
| title={PiCo 1B: A Compact Language Model Optimized for Reasoning}, |
| author={Arc Develop Team}, |
| year={2026}, |
| howpublished={\url{https://github.com/pico-llm/pico-1b}}, |
| } |
| ``` |
|
|
| --- |
|
|
| ## π License |
|
|
| This model is released under an open-source license. Please see the LICENSE file for details. |
|
|
| --- |
|
|
| *Last updated: June 2026* |