---
license: mit
language:
- en
---

# PiCo 1B

> A 1B-class dense language model (~1.46B parameters) optimized for reasoning and knowledge tasks.
>
> For clarity: our model uses the tokenizer from Qwen 2 1.5B but has been trained from scratch. It is not a fine-tuned version of Qwen 2 1.5B.

---

## 📌 Model Overview

**PiCo 1B** is a compact language model with ~1.46 billion parameters. Despite its small size, it achieves competitive performance across reasoning, knowledge, and coding benchmarks, and does particularly well on science reasoning tasks.

---

## 📋 Model Details

| Attribute | Value |
|-----------|-------|
| **Model Size** | ~1.46B parameters |
| **Architecture** | Dense transformer (decoder-only) |
| **Context Length** | 2048 tokens |
| **Precision** | FP32 / FP16 (Safetensors weights) |
| **License** | MIT |

---

## 📊 Benchmark Results

PiCo 1B is evaluated against **31 open-source models** in the 1B–2B parameter range across 7 standard benchmarks.

### MMLU (Massive Multitask Language Understanding)

Measures general knowledge across 57 subjects, including STEM, humanities, and social sciences.

![mmlu_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/xRmbn3pMYD6hLVEPZYwBa.png)

---

### GSM8K (Grade School Math)

Measures mathematical reasoning on grade-school-level word problems.

![gsm8k_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/E03-rJ-NpEwCH5M97ec-v.png)

---

### ARC-Challenge (AI2 Reasoning Challenge)

Measures science reasoning on grade-school-level science questions (harder subset).

![arc_challenge_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/WOOinEwZ0DaRKJl74n6E9.png)

---

### ARC-Easy (AI2 Reasoning Challenge)

Measures basic science reasoning on grade-school-level science questions (easier subset).

![arc_easy_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/BTGhsnWy0gA6iJaHBr8j4.png)

---

### HellaSwag (Commonsense Reasoning)

Measures commonsense natural language inference on everyday scenarios.

![hellaswag_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pNLjRk1Xy6aoGFOik3Zsv.png)

---

### HumanEval (Code Generation)

Measures functional correctness of generated code across 164 programming problems.

![humaneval_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/FnhNO9sBCDl-Ky438q6ll.png)

---

### TruthfulQA (Truthfulness)

Measures whether the model generates truthful answers rather than mimicking common misconceptions.

![truthfulqa_comparison](https://cdn-uploads.huggingface.co/production/uploads/67f51c8931456897b8918a16/pKq1fhlzmfistFhZ8BRRt.png)

---

## 🏆 Performance Highlights

Rankings below are relative to the 31 compared models shown in the charts above.

### ✅ Strengths

- **Science Reasoning**: Best-in-class performance on ARC-Easy and ARC-Challenge
- **General Knowledge**: Top 3 on MMLU, outperforming many larger 1.5B–2B models
- **Coding Ability**: Strong HumanEval performance, competitive with models 2x its size
- **Truthfulness**: Top 5 on TruthfulQA

### 📈 Areas for Improvement

- **Commonsense Reasoning**: HellaSwag score lags behind modern 1.5B+ models
- **Mathematical Reasoning**: GSM8K performance is solid but not top-tier
- **Scale**: Further training on larger, more diverse datasets could improve results across benchmarks

---

## 🚀 Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model identifier as given in this card; use the full Hub repo ID or a local path if it differs.
model_name = "pico-1b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")

# max_length counts the prompt tokens plus the generated tokens.
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Model Formats

- **Safetensors** (recommended): Secure and fast loading
- **PyTorch (FP16)**: Standard format
- **GGUF**: For local inference with llama.cpp

---

## 🏋️ Training Details

| Aspect | Description |
|--------|-------------|
| **Architecture** | Dense decoder-only transformer |
| **Optimizer** | AdamW |
| **Learning Rate** | Cosine schedule with warmup |
| **Batch Size** | Configurable per GPU setup |
| **Training Framework** | PyTorch + Hugging Face Transformers |

---

## ⚠️ Limitations

- **Small Model Size**: With ~1.46B parameters, it has inherent limitations compared to larger models (7B+) on complex reasoning tasks
- **Training Data**: Primarily trained on English text; performance on non-English languages may be limited
- **Hallucinations**: Like all LLMs, it may generate factually incorrect information
- **Context Window**: Limited to 2048 tokens by default

---

## 📝 Citation

If you use PiCo 1B in your research or projects, please cite:

```bibtex
@misc{pico1b,
  title={PiCo 1B: A Compact Language Model Optimized for Reasoning},
  author={Arc Develop Team},
  year={2026},
  howpublished={\url{https://github.com/pico-llm/pico-1b}},
}
```

---

## 📄 License

This model is released under the MIT license. Please see the LICENSE file for details.

---

*Last updated: June 2026*
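
A note on the 2048-token context window listed under Limitations: when generation is capped with `max_new_tokens`, the prompt and the generated tokens together need to fit in the window. The sketch below shows one way to budget for that. `fit_prompt` is a hypothetical helper, not part of the PiCo 1B release, and dropping the oldest tokens is an assumption about what a caller would want; it works on plain lists of token IDs so it runs without downloading the model.

```python
CONTEXT_LENGTH = 2048  # context length stated in the Model Details table


def fit_prompt(token_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Trim a prompt so that prompt + generated tokens fit in the context window.

    Hypothetical helper for illustration only. It keeps the most recent
    tokens and drops the oldest ones, which is one common choice.
    """
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    return list(token_ids[-budget:])


# Placeholder token IDs stand in for real tokenizer output.
long_prompt = list(range(3000))
trimmed = fit_prompt(long_prompt, max_new_tokens=200)
print(len(trimmed))  # 1848
```

With a real tokenizer, the trimmed IDs would be passed to `model.generate` together with the same `max_new_tokens` value. Prompts that already fit are returned unchanged.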