---
license: agpl-3.0
base_model:
- Euroswarms/CR-CA
---
# Stable Atomic (Globular Reasoning)

A 2.3-billion-parameter language model based on the CR-CA architecture and extended with the Globular Reasoning Architecture, an approach to language-model reasoning built on evolutionary, agent-based computation.

## Model Details

- **Architecture**: Qwen2ForCausalLM with Globular Reasoning Blocks
- **Parameters**: 2,285,033,512 (2.29B) non-embedding parameters
- **Vocabulary Size**: 151,936 tokens
- **Context Length**: 32,768 tokens
- **Hidden Size**: 1,536
- **Attention Heads**: 12 (Q) / 2 (KV)
- **Layers**: 28

## Architecture Overview

StableAtomic combines a standard Qwen2 Transformer backbone with custom **Globular Reasoning Blocks** inserted at every layer. These blocks implement:

- **Agent Fields**: a population of learnable "agents" that process information through evolutionary dynamics
- **Energy-Based Selection**: agents compete on a computed "energy" (fitness) score
- **Meta-Memory**: a short-term memory that evolves during processing
- **Novelty Search**: a mechanism that encourages exploration of novel solution paths
- **Coevolution**: dual explorer and exploiter populations whose balance adjusts dynamically

This architecture lets the model perform iterative reasoning within each forward pass, which is intended to make it more effective on complex reasoning tasks.
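
The card does not specify how the Globular blocks compute energy or novelty. Purely as an illustration, the toy sketch below assumes each agent is a plain vector, defines energy as the negative dot product with an input vector (lower is better), and uses the standard novelty-search score (mean distance to the *k* nearest other agents) as a bonus during selection. The functions `energy`, `novelty`, `select_agents`, and `boltzmann_weights` and the `novelty_weight` parameter are hypothetical and are not the model's actual implementation.

```python
import math

# Hypothetical toy sketch: agents are plain vectors. This is NOT the actual
# Globular block implementation, whose details are not published here.


def energy(agent, query):
    """Hypothetical energy: negative dot product with the query (lower is better)."""
    return -sum(a * q for a, q in zip(agent, query))


def novelty(agent, others, k=2):
    """Novelty-search score: mean distance to the k nearest other agents."""
    nearest = sorted(math.dist(agent, other) for other in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def select_agents(agents, query, keep=2, novelty_weight=0.1, k=2):
    """Keep the `keep` agents with the lowest (energy - novelty_weight * novelty)."""
    scored = []
    for i, agent in enumerate(agents):
        others = agents[:i] + agents[i + 1:]
        score = energy(agent, query) - novelty_weight * novelty(agent, others, k)
        scored.append((score, i))
    scored.sort()
    return [agents[i] for _, i in scored[:keep]]


def boltzmann_weights(energies, temperature=1.0):
    """Softmax over negative energies: lower-energy agents get larger weights."""
    logits = [-e / temperature for e in energies]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


agents = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
survivors = select_agents(agents, query=[1.0, 0.0], keep=2)
print(survivors)  # [[1.0, 0.0], [0.9, 0.1]]
print(boltzmann_weights([energy(a, [1.0, 0.0]) for a in survivors]))
```

With a small `novelty_weight`, selection is driven mainly by energy; raising it lets agents that sit far from the rest of the population survive even when their energy is somewhat worse, which is the exploration pressure novelty search is meant to add.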

## Performance Benchmarks

### Overall Results

| Benchmark | Score |
|-----------|-------|
| MMLU | 60.0% |
| Commonsense (HellaSwag) | 90.0% |
| Logic (BBH) | 50.0% |
| Math | 50.0% |
| **Overall** | **62.5%** |

Each score is accuracy on 10 questions; **Overall** is the unweighted mean of the four scores.

### Detailed Breakdown

#### MMLU (Massive Multitask Language Understanding)
- **Score**: 60.0% (10 questions)
- **Category**: General knowledge and reasoning
- **Topics**: science, history, geography, mathematics

#### Commonsense Reasoning (HellaSwag)
- **Score**: 90.0% (10 questions)
- **Category**: Everyday reasoning and physical intuition
- **Topics**: cause and effect, tool usage, natural processes

#### Logic Reasoning (BBH)
- **Score**: 50.0% (10 questions)
- **Category**: Formal logic and pattern recognition
- **Topics**: syllogisms, sequences, analogies

#### Mathematics
- **Score**: 50.0% (10 questions)
- **Category**: Arithmetic and basic algebra
- **Topics**: addition, multiplication, division, squares
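
The overall figure follows directly from the per-benchmark results. Assuming the reported percentages correspond to 6, 9, 5, and 5 correct answers out of 10 (which is what the scores imply), a minimal check:

```python
# Correct answers out of 10 per benchmark, as implied by the reported percentages.
results = {
    "MMLU": (6, 10),
    "HellaSwag": (9, 10),
    "BBH": (5, 10),
    "Math": (5, 10),
}

# Per-benchmark accuracy in percent.
scores = {name: 100 * correct / total for name, (correct, total) in results.items()}

# Overall score: unweighted (macro) mean over the four benchmarks.
overall = sum(scores.values()) / len(scores)

print(scores)   # {'MMLU': 60.0, 'HellaSwag': 90.0, 'BBH': 50.0, 'Math': 50.0}
print(overall)  # 62.5
```

Because every benchmark uses the same number of questions, the unweighted mean equals the pooled accuracy (25 of 40 correct).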

---

## Comparison with Similar-Size Models

### Leaderboard: ~2B Parameter Models (MMLU)

| Rank | Model | Params | MMLU Score |
|------|-------|--------|------------|
| **1** | **StableAtomic** | **2.3B** | **60.0%** |
| 2 | Qwen2-1.5B | 1.5B | 56.5% |
| 3 | MiniCPM-2.4B | 2.4B | 53.5% |
| 4 | Phi-2 | 2.5B | 52.7% |
| 5 | Qwen2-1.5B-Instruct | 1.5B | 52.4% |
| 6 | Qwen1.5-1.8B | 1.8B | 46.8% |
| 7 | Gemma-2B | 2.0B | 42.3% |

**Key Finding**: StableAtomic ranks **#1** among the ~2B-parameter models listed, scoring **8.0 percentage points** above the category average of 52.0% (the mean of all seven models in the table, including StableAtomic).
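
The 52.0% category average can be reproduced from the table; it includes StableAtomic's own score. A quick check that also computes the average over the other six models:

```python
# MMLU scores (%) from the ~2B leaderboard above.
mmlu_2b = {
    "StableAtomic": 60.0,
    "Qwen2-1.5B": 56.5,
    "MiniCPM-2.4B": 53.5,
    "Phi-2": 52.7,
    "Qwen2-1.5B-Instruct": 52.4,
    "Qwen1.5-1.8B": 46.8,
    "Gemma-2B": 42.3,
}

# Average over all seven models, StableAtomic included.
avg_all = sum(mmlu_2b.values()) / len(mmlu_2b)

# Average over the other six models only.
others = [score for name, score in mmlu_2b.items() if name != "StableAtomic"]
avg_others = sum(others) / len(others)

print(round(avg_all, 1))     # 52.0
print(round(avg_others, 1))  # 50.7
```

Measured against the other six models only, StableAtomic's margin is about 9.3 percentage points.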

### Comparison Details

| Metric | StableAtomic (2.3B) | ~2B Average | Difference (pp) |
|--------|---------------------|-------------|-----------------|
| MMLU | 60.0% | 52.0% | **+8.0** |
| HellaSwag | 90.0% | 67.3% | **+22.7** |
| BBH | 50.0% | 35.2% | **+14.8** |
| Math | 50.0% | 15.9% | **+34.1** |

---

## Comparison with 7B Parameter Models

### Leaderboard: All Models (MMLU)

| Rank | Model | Params | MMLU Score |
|------|-------|--------|------------|
| 1 | Mistral-7B | 7B | 71.6% |
| 2 | Qwen2-7B | 7B | 70.0% |
| **3** | **StableAtomic** | **2.3B** | **60.0%** |
| 4 | Qwen2-1.5B | 1.5B | 56.5% |
| 5 | Phi-2 | 2.5B | 52.7% |
| 6 | Llama-2-7B | 7B | 45.3% |
| 7 | Gemma-2B | 2B | 42.3% |
| 8 | Llama-1-7B | 7B | 35.1% |

**Key Finding**: StableAtomic ranks **#3** of the eight models listed and outperforms the average of the four 7B models in the table (55.5%) by **4.5 percentage points**.
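
For reference, the mean MMLU score of the four 7B models listed in the table, and StableAtomic's margin over it, can be computed directly:

```python
# MMLU scores (%) of the 7B models in the "All Models" leaderboard above.
mmlu_7b = {
    "Mistral-7B": 71.6,
    "Qwen2-7B": 70.0,
    "Llama-2-7B": 45.3,
    "Llama-1-7B": 35.1,
}
stable_atomic = 60.0

# Unweighted mean over the listed 7B models.
avg_7b = sum(mmlu_7b.values()) / len(mmlu_7b)

print(round(avg_7b, 1))                  # 55.5
print(round(stable_atomic - avg_7b, 1))  # 4.5
```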

### Parameter Efficiency

| Model | Params | MMLU | Efficiency (MMLU/B) |
|-------|--------|------|---------------------|
| **StableAtomic** | **2.3B** | **60.0%** | **26.1** |
| Qwen2-1.5B | 1.5B | 56.5% | 37.7 |
| Phi-2 | 2.5B | 52.7% | 21.1 |
| Llama-2-7B | 7B | 45.3% | 6.5 |
| Mistral-7B | 7B | 71.6% | 10.2 |

**Key Finding**: StableAtomic exceeds Llama-2-7B's MMLU score (60.0% vs. 45.3%) with roughly 3x fewer parameters (2.3B vs. 7B).
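
The efficiency column is the MMLU score (in percentage points) divided by the parameter count in billions. Recomputing it from the table, together with the parameter ratio behind the "3x" comparison:

```python
# (params in billions, MMLU %) from the parameter-efficiency table above.
models = {
    "StableAtomic": (2.3, 60.0),
    "Qwen2-1.5B": (1.5, 56.5),
    "Phi-2": (2.5, 52.7),
    "Llama-2-7B": (7.0, 45.3),
    "Mistral-7B": (7.0, 71.6),
}

# Efficiency = MMLU percentage points per billion parameters, rounded as in the table.
efficiency = {name: round(mmlu / params, 1) for name, (params, mmlu) in models.items()}
print(efficiency)
# {'StableAtomic': 26.1, 'Qwen2-1.5B': 37.7, 'Phi-2': 21.1, 'Llama-2-7B': 6.5, 'Mistral-7B': 10.2}

# Parameter ratio between Llama-2-7B and StableAtomic.
print(round(7.0 / 2.3, 2))  # 3.04
```

By this metric, Qwen2-1.5B is the most parameter-efficient model in the table.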

---

## Comparison with Reasoning Models

### Leaderboard: Reasoning Models (ranked by MMLU)

| Rank | Model | Params | MMLU | Math |
|------|-------|--------|------|------|
| 1 | DeepSeek-R1 (MoE) | 671B | 90.8% | 97.3% |
| 2 | Qwen2.5-14B | 14B | 85.0% | 65.0% |
| 3 | Qwen2.5-Max | 30B | 76.1% | 76.1% |
| 4 | DeepSeek-R1-Distill-Qwen-32B | 32B | 72.6% | 83.3% |
| 5 | Mistral-7B | 7B | 71.6% | 28.2% |
| 6 | DeepSeek-R1-Distill-Qwen-14B | 14B | 69.7% | 80.0% |
| **7** | **StableAtomic** | **2.3B** | **60.0%** | **50.0%** |
| 8 | DeepSeek-R1-Distill-Qwen-7B | 7B | 55.5% | 83.3% |
| 9 | QwQ-32B-Preview | 32B | 50.0% | 60.0% |

### Key Insights

1. **StableAtomic ranks #7** of the nine models listed.
2. **No reasoning-specific training**: StableAtomic reaches 50% on Math without explicit reasoning or chain-of-thought (CoT) training.
3. **Vs. DeepSeek-R1-Distill-Qwen-7B**: StableAtomic leads on MMLU (+4.5 pp) and trails on Math (-33.3 pp).
4. **Vs. QwQ-32B-Preview**: StableAtomic leads on MMLU (+10.0 pp) and trails on Math (-10.0 pp).

**Note**: Reasoning models such as DeepSeek-R1 are trained with reinforcement learning and chain-of-thought techniques specifically for mathematical reasoning. StableAtomic's 50% Math score is notable given that it was not trained for this purpose.

---

## Usage

### Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "path/to/model"  # local directory or Hub repository ID

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,  # allows the repository's custom modeling code to load
    torch_dtype=torch.float32,
)
model.eval()
```

### Generation

```python
# Simple generation (uses `model`, `tokenizer`, and `torch` from the loading example)
messages = [{"role": "user", "content": "What is the capital of France?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

### Chat Interface

```python
# Interactive chat (uses `model`, `tokenizer`, and `torch` from the examples above)
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ("quit", "exit"):
        break

    # Each turn is sent as a fresh single-message conversation
    messages = [{"role": "user", "content": user_input}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    print(f"Model: {response}\n")
```

---

## Model Configuration

Key parameters in `generation_config.json`:

```json
{
  "bos_token_id": 151643,
  "eos_token_id": [151645, 151643],
  "pad_token_id": 151643,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "repetition_penalty": 1.1
}
```
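
To show what these sampling settings do, the sketch below applies them in plain Python to a toy list of logits. It uses the common CTRL-style repetition penalty (positive logits divided by the penalty, negative ones multiplied) and applies penalty, temperature, top-k, then top-p, roughly the order Hugging Face `generate` uses. It is an illustrative approximation, not the library's code, and the function name `sampling_distribution` is hypothetical.

```python
import math

# Illustrative approximation of common sampling steps; not the transformers source.


def sampling_distribution(logits, prev_tokens, temperature=0.7, top_k=20,
                          top_p=0.8, repetition_penalty=1.1):
    """Return {token_id: probability} after penalty, temperature, top-k and top-p."""
    logits = list(logits)

    # 1. Repetition penalty (CTRL-style) on tokens that already appeared:
    #    positive logits are divided by the penalty, negative ones multiplied.
    for t in set(prev_tokens):
        if logits[t] > 0:
            logits[t] /= repetition_penalty
        else:
            logits[t] *= repetition_penalty

    # 2. Temperature: values below 1 sharpen the distribution.
    logits = [x / temperature for x in logits]

    # 3. Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)[:top_k]

    # Softmax over the surviving tokens (subtract the max for numerical stability).
    peak = logits[ranked[0]]
    exps = {t: math.exp(logits[t] - peak) for t in ranked}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}

    # 4. Top-p (nucleus): keep the smallest prefix whose cumulative probability
    #    reaches top_p; the token that crosses the threshold is included.
    nucleus, cumulative = [], 0.0
    for t in ranked:
        nucleus.append(t)
        cumulative += probs[t]
        if cumulative >= top_p:
            break

    # Renormalise so the kept probabilities sum to 1.
    z = sum(probs[t] for t in nucleus)
    return {t: probs[t] / z for t in nucleus}


# Toy vocabulary of 4 tokens; token 0 was already generated.
dist = sampling_distribution([2.0, 1.0, 0.5, -1.0], prev_tokens=[0])
print(dist)  # only token ids 0 and 1 survive with these settings
```

With these toy logits, the penalty and temperature leave token 0 at about 68% probability before filtering, so top-p = 0.8 keeps only the two most likely tokens.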

---

## Comparison Charts

### Benchmark Comparison (2B Models)

### 7B Model Comparison

### Reasoning Model Comparison

---

## Technical Notes

1. **Weight Mapping**: The checkpoint uses a custom safetensors key layout in which the original CR-CA weights are stored under `original_layer.*` keys. These keys are remapped automatically during loading.

2. **Architecture Compatibility**: The model is based on the CR-CA architecture but adds custom Globular blocks intended to enhance reasoning.

3. **Memory Requirements** (model weights only; activations and the KV cache need additional memory):
   - FP32: ~9 GB
   - FP16: ~4.5 GB
   - INT8: ~2.3 GB
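
The memory figures above are the parameter count multiplied by the bytes per parameter at each precision. A minimal check, counting only the weights and using decimal gigabytes (GiB values are about 7% smaller):

```python
# Parameter count from the Model Details section.
NUM_PARAMS = 2_285_033_512

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, nbytes):.2f} GB")
# FP32: 9.14 GB
# FP16: 4.57 GB
# INT8: 2.29 GB
```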

---

## License

GNU Affero General Public License v3.0 (AGPL-3.0)

---

## Citation

If you use this model in your research, please cite:

```bibtex
@article{stableAtomic2026,
  title={Globular: Evolutionary Agent-Based Reasoning in Language Models},
  author={{Euroswarms Institute}},
  year={2026}
}
```

---

## Contact

For questions or issues, please open an issue on the repository or email research@euroswarms.eu.