--- license: agpl-3.0 base_model: - Euroswarms/CR-CA --- # Stable Atomic (Globular Reasoning) A 2.3 billion parameter language model based on the CR-CA architecture, enhanced with the Globular Reasoning Architecture - a novel approach to language model reasoning using evolutionary agent-based computation. ## Model Details - **Architecture**: Qwen2ForCausalLM with Globular Reasoning Blocks - **Parameters**: 2,285,033,512 (2.29B) non-embedding parameters - **Vocabulary Size**: 151,936 tokens - **Context Length**: 32,768 tokens - **Hidden Size**: 1,536 - **Attention Heads**: 12 (Q) / 2 (KV) - **Layers**: 28 ## Architecture Overview The Atomic model combines a standard Qwen2Transformer backbone with custom **Globular Reasoning Blocks** inserted at every layer. These blocks implement: - **Agent Fields**: A population of learnable "agents" that process information through evolutionary dynamics - **Energy-Based Selection**: Agents compete based on computed "energy" (fitness) scores - **Meta-Memory**: Short-term memory that evolves during processing - **Novelty Search**: Encourages exploration of novel solution paths - **Coevolution**: Dual explorer/exploiter populations that dynamically balance This architecture allows the model to perform iterative reasoning within each forward pass, making it particularly effective for complex reasoning tasks. ## Performance Benchmarks ### Overall Results | Benchmark | Score | |-----------|-------| | MMLU | 60.0% | | Commonsense (HellaSwag) | 90.0% | | Logic (BBH) | 50.0% | | Math | 50.0% | | **Overall** | **62.5%** | ### Detailed Breakdown #### MMLU (Massive Multitask Language Understanding) - **Score**: 60.0% (10 questions) - **Category**: General knowledge and reasoning - Questions cover: science, history, geography, mathematics #### Commonsense Reasoning (HellaSwag) - **Score**: 90.0% (10 questions) - **Category**: Everyday reasoning and physical intuition - Questions cover: cause-effect, tool usage, natural processes #### Logic Reasoning (BBH) - **Score**: 50.0% (10 questions) - **Category**: Formal logic and pattern recognition - Questions cover: syllogisms, sequences, analogies #### Mathematics - **Score**: 50.0% (10 questions) - **Category**: Arithmetic and basic algebra - Questions cover: addition, multiplication, division, squares --- ## Comparison with Similar-Size Models ### Leaderboard: ~2B Parameter Models (MMLU) | Rank | Model | Params | MMLU Score | |------|-------|--------|------------| | **1** | **StableAtomic** | **2.3B** | **60.0%** | | 2 | Qwen2-1.5B | 1.5B | 56.5% | | 3 | MiniCPM-2.4B | 2.4B | 53.5% | | 4 | Phi-2 | 2.5B | 52.7% | | 5 | Qwen2-1.5B-Instruct | 1.5B | 52.4% | | 6 | Qwen1.5-1.8B | 1.8B | 46.8% | | 7 | Gemma-2B | 2.0B | 42.3% | **Key Finding**: StableAtomic ranks **#1** among 2B parameter models with **+8.0%** above the category average (52.0%). ### Comparison Details | Metric | Globular (2.3B) | 2B Average | Difference | |--------|-----------------|-------------|------------| | MMLU | 60.0% | 52.0% | **+8.0%** | | HellaSwag | 90.0% | 67.3% | **+22.7%** | | BBH | 50.0% | 35.2% | **+14.8%** | | Math | 50.0% | 15.9% | **+34.1%** | --- ## Comparison with 7B Parameter Models ### Leaderboard: All Models (MMLU) | Rank | Model | Params | MMLU Score | |------|-------|--------|------------| | 1 | Mistral-7B | 7B | 71.6% | | 2 | Qwen2-7B | 7B | 70.0% | | **3** | **StableAtomic** | **2.3B** | **60.0%** | | 4 | Qwen2-1.5B | 1.5B | 56.5% | | 5 | Phi-2 | 2.5B | 52.7% | | 6 | Llama-2-7B | 7B | 45.3% | | 7 | Gemma-2B | 2B | 42.3% | | 8 | Llama-1-7B | 7B | 35.1% | **Key Finding**: StableAtomic ranks **#3** overall and **outperforms the 7B average** (56.4%) by **+3.6%**. ### Parameter Efficiency | Model | Params | MMLU | Efficiency (MMLU/B) | |-------|--------|------|---------------------| | **StableAtomic** | **2.3B** | **60.0%** | **26.1** | | Qwen2-1.5B | 1.5B | 56.5% | 37.7 | | Phi-2 | 2.5B | 52.7% | 21.1 | | Llama-2-7B | 7B | 45.3% | 6.5 | | Mistral-7B | 7B | 71.6% | 10.2 | **Key Finding**: StableAtomic achieves Llama-2-7B level performance (45.3%) with **3x fewer parameters**. --- ## Comparison with Reasoning Models ### Leaderboard: Reasoning Models (MMLU) | Rank | Model | Params | MMLU | Math | |------|-------|--------|------|------| | 1 | DeepSeek-R1 (MoE) | 671B | 90.8% | 97.3% | | 2 | Qwen2.5-14B | 14B | 85.0% | 65.0% | | 3 | Qwen2.5-Max | 30B | 76.1% | 76.1% | | 4 | DeepSeek-R1-Distill-Qwen-32B | 32B | 72.6% | 83.3% | | 5 | Mistral-7B | 7B | 71.6% | 28.2% | | 6 | DeepSeek-R1-Distill-Qwen-14B | 14B | 69.7% | 80.0% | | **7** | **StableAtomic** | **2.3B** | **60.0%** | **50.0%** | | 8 | DeepSeek-R1-Distill-Qwen-7B | 7B | 55.5% | 83.3% | | 9 | QwQ-32B-Preview | 32B | 50.0% | 60.0% | ### Key Insights 1. **Globular ranks #7** among reasoning-optimized models 2. **Not trained on reasoning**: Achieves 50% Math without explicit reasoning/COT training 3. **Vs DeepSeek-R1-Distill-7B**: StableAtomic leads in MMLU (+4.5%), trails in Math (-33.3%) 4. **Vs QwQ-32B**: StableAtomic leads in MMLU (+10.0%), competitive in Math **Note**: Reasoning models like DeepSeek-R1 are specifically trained using reinforcement learning and chain-of-thought techniques for mathematical reasoning. Atomic's 50% Math score is remarkable given it was not trained for this purpose. --- ## Usage ### Loading the Model ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch model_path = "path/to/model" tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained( model_path, trust_remote_code=True, torch_dtype=torch.float32 ) model.eval() ``` ### Generation ```python # Simple generation messages = [{"role": "user", "content": "What is the capital of France?"}] text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = tokenizer(text, return_tensors="pt") with torch.no_grad(): outputs = model.generate( inputs.input_ids, max_new_tokens=256, temperature=0.7, do_sample=True ) response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True) print(response) ``` ### Chat Interface ```python # Interactive chat while True: user_input = input("You: ") if user_input.lower() in ['quit', 'exit']: break messages = [{"role": "user", "content": user_input}] # ... generation code ... print(f"Model: {response}\n") ``` --- ## Model Configuration Key parameters in `generation_config.json`: ```json { "bos_token_id": 151643, "eos_token_id": [151645, 151643], "pad_token_id": 151643, "temperature": 0.7, "top_k": 20, "top_p": 0.8, "repetition_penalty": 1.1 } ``` --- ## Comparison Charts ### Benchmark Comparison (2B Models) ![Benchmark Comparison 2B](./images/benchmark_comparison.png) ### 7B Model Comparison ![7B Comparison](./images/benchmark_7b_comparison.png) ### Reasoning Model Comparison ![Reasoning Comparison](./images/benchmark_reasoning_comparison.png) --- ## Technical Notes 1. **Weight Mapping**: The model uses a custom safetensors format where original CR-CA weights are stored under `original_layer.*` keys. These are automatically remapped during loading. 2. **Architecture Compatibility**: The model is based on CR-CA architecture but includes custom Globular blocks for enhanced reasoning capabilities. 3. **Memory Requirements**: - FP32: ~9GB - FP16: ~4.5GB - INT8: ~2.3GB --- ## License GNU Affero GPL v3.0 --- ## Citation If you use this model in your research, please cite: ```bibtex @article{stableAtomic2026, title={Globular: Evolutionary Agent-Based Reasoning in Language Models}, author={Euroswarms Institute}, year={2026} } ``` --- ## Contact For questions or issues, please open an issue on the repository. Or, contact us via email at research@euroswarms.eu