	---
	license: agpl-3.0
	base_model:
	- Euroswarms/CR-CA
	---
	# Stable Atomic (Globular Reasoning)

	A 2.3-billion-parameter language model based on the CR-CA architecture and extended with the Globular Reasoning Architecture, a novel approach to language model reasoning that uses evolutionary agent-based computation.

	## Model Details

	- Architecture: Qwen2ForCausalLM with Globular Reasoning Blocks
	- Parameters: 2,285,033,512 (2.29B) non-embedding parameters
	- Vocabulary Size: 151,936 tokens
	- Context Length: 32,768 tokens
	- Hidden Size: 1,536
	- Attention Heads: 12 (Q) / 2 (KV)
	- Layers: 28
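
The attention figures above imply a grouped-query layout. As a rough sketch, assuming the Globular blocks keep the standard Qwen2 attention shapes and add no per-token cache of their own (neither is confirmed here), the head dimension, query-to-KV sharing ratio, and KV-cache footprint can be derived like this:

```python
# Values taken from the Model Details list above.
HIDDEN_SIZE = 1536
NUM_Q_HEADS = 12
NUM_KV_HEADS = 2
NUM_LAYERS = 28
CONTEXT_LENGTH = 32_768

# Assumption: standard grouped-query attention, head_dim = hidden size / query heads.
head_dim = HIDDEN_SIZE // NUM_Q_HEADS
queries_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `num_tokens` (FP16 by default)."""
    values_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim  # 2 = keys + values
    return num_tokens * values_per_token * bytes_per_value


print(f"head_dim = {head_dim}")
print(f"query heads per KV head = {queries_per_kv_head}")
print(f"KV cache per token (FP16) = {kv_cache_bytes(1) / 1024:.0f} KiB")
print(f"KV cache at full context (FP16) = {kv_cache_bytes(CONTEXT_LENGTH) / 2**20:.0f} MiB")
```

Any state kept by the Globular blocks (agent fields, meta-memory) would come on top of this estimate.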

	## Architecture Overview

	StableAtomic combines a standard Qwen2 Transformer backbone with custom Globular Reasoning Blocks inserted at every layer. These blocks implement:

	- Agent Fields: A population of learnable "agents" that process information through evolutionary dynamics
	- Energy-Based Selection: Agents compete based on computed "energy" (fitness) scores
	- Meta-Memory: Short-term memory that evolves during processing
	- Novelty Search: Encourages exploration of novel solution paths
	- Coevolution: Dual explorer/exploiter populations that are dynamically balanced against each other

	This architecture allows the model to perform iterative reasoning within each forward pass, which is intended to help on complex reasoning tasks.
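
To make the idea of energy-based selection over a population of agents more concrete, here is a toy sketch in plain Python. It is a generic evolutionary step (softmax-weighted parent selection followed by Gaussian mutation), not the Globular block implementation. The names `energy`, `softmax`, and `evolve_step`, the target vector, and all constants are hypothetical and chosen only for illustration.

```python
import math
import random


def energy(agent: list[float], target: list[float]) -> float:
    """Toy fitness: higher is better (negative squared distance to a target)."""
    return -sum((a - t) ** 2 for a, t in zip(agent, target))


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax turning energies into selection probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def evolve_step(agents, target, rng, noise=0.1):
    """One generation: pick parents in proportion to softmax(energy), then mutate."""
    probs = softmax([energy(a, target) for a in agents])
    parents = rng.choices(agents, weights=probs, k=len(agents))
    return [[x + rng.gauss(0.0, noise) for x in parent] for parent in parents]


rng = random.Random(0)  # fixed seed so runs are repeatable
target = [1.0, -1.0]    # hypothetical optimum the agents are scored against
agents = [[rng.uniform(-3, 3) for _ in target] for _ in range(32)]

before = max(energy(a, target) for a in agents)
for _ in range(20):
    agents = evolve_step(agents, target, rng)
after = max(energy(a, target) for a in agents)
print(f"best energy before: {before:.3f}, after: {after:.3f}")
```

In the actual model the agents, energies, and memory would be learned tensors inside each block. This sketch only shows the select-then-mutate loop that the list above describes in words.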

	## Performance Benchmarks

	### Overall Results

	| Benchmark | Score |
	|-----------|-------|
	| MMLU | 60.0% |
	| Commonsense (HellaSwag) | 90.0% |
	| Logic (BBH) | 50.0% |
	| Math | 50.0% |
	| Overall | 62.5% |

	### Detailed Breakdown

	#### MMLU (Massive Multitask Language Understanding)
	- Score: 60.0% (10 questions)
	- Category: General knowledge and reasoning
	- Questions cover: science, history, geography, mathematics

	#### Commonsense Reasoning (HellaSwag)
	- Score: 90.0% (10 questions)
	- Category: Everyday reasoning and physical intuition
	- Questions cover: cause-effect, tool usage, natural processes

	#### Logic Reasoning (BBH)
	- Score: 50.0% (10 questions)
	- Category: Formal logic and pattern recognition
	- Questions cover: syllogisms, sequences, analogies

	#### Mathematics
	- Score: 50.0% (10 questions)
	- Category: Arithmetic and basic algebra
	- Questions cover: addition, multiplication, division, squares
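
Each score above comes from 10 questions, so it helps to see how much uncertainty that sample size carries. The sketch below reproduces the 62.5% overall figure as the unweighted mean of the four scores and adds a 95% Wilson score interval per benchmark. The Wilson interval is a method chosen here for illustration, not something the evaluation itself reports, and the correct-answer counts are inferred from the percentages.

```python
import math

# (correct, total) per benchmark, inferred from the scores and question counts above.
results = {"MMLU": (6, 10), "HellaSwag": (9, 10), "BBH": (5, 10), "Math": (5, 10)}


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


overall = sum(c / n for c, n in results.values()) / len(results)
print(f"Overall (unweighted mean): {overall:.1%}")
for name, (correct, total) in results.items():
    low, high = wilson_interval(correct, total)
    print(f"{name}: {correct / total:.0%} (95% interval {low:.0%} to {high:.0%})")
```

With 10 questions per benchmark the intervals span several tens of percentage points, which is worth keeping in mind when reading the comparisons that follow.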

	---

	## Comparison with Similar-Size Models

	### Leaderboard: ~2B Parameter Models (MMLU)

	| Rank | Model | Params | MMLU Score |
	|------|-------|--------|------------|
	| 1 | StableAtomic | 2.3B | 60.0% |
	| 2 | Qwen2-1.5B | 1.5B | 56.5% |
	| 3 | MiniCPM-2.4B | 2.4B | 53.5% |
	| 4 | Phi-2 | 2.5B | 52.7% |
	| 5 | Qwen2-1.5B-Instruct | 1.5B | 52.4% |
	| 6 | Qwen1.5-1.8B | 1.8B | 46.8% |
	| 7 | Gemma-2B | 2.0B | 42.3% |

	Key Finding: StableAtomic ranks #1 among the ~2B-parameter models listed, 8.0 percentage points above the category average (52.0%).
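
The 52.0% category average can be reproduced from the leaderboard rows. The sketch below assumes it is the plain mean over all seven listed models, StableAtomic included (the source does not say how it was computed), and also prints the mean of the other six for comparison.

```python
# MMLU scores (%) copied from the ~2B leaderboard above.
mmlu_2b = {
    "StableAtomic": 60.0,
    "Qwen2-1.5B": 56.5,
    "MiniCPM-2.4B": 53.5,
    "Phi-2": 52.7,
    "Qwen2-1.5B-Instruct": 52.4,
    "Qwen1.5-1.8B": 46.8,
    "Gemma-2B": 42.3,
}

# Mean over every row, which matches the quoted category average.
category_average = sum(mmlu_2b.values()) / len(mmlu_2b)

# Mean over the other six models only.
peers = {name: score for name, score in mmlu_2b.items() if name != "StableAtomic"}
peer_average = sum(peers.values()) / len(peers)

print(f"Average over all seven models: {category_average:.1f}%")
print(f"Average over the six other models: {peer_average:.1f}%")
print(f"StableAtomic vs. all-model average: {mmlu_2b['StableAtomic'] - category_average:+.1f} points")
```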

	### Comparison Details

	| Metric | StableAtomic (2.3B) | 2B Average | Difference (points) |
	|--------|---------------------|------------|---------------------|
	| MMLU | 60.0% | 52.0% | +8.0 |
	| HellaSwag | 90.0% | 67.3% | +22.7 |
	| BBH | 50.0% | 35.2% | +14.8 |
	| Math | 50.0% | 15.9% | +34.1 |

	---

	## Comparison with 7B Parameter Models

	### Leaderboard: All Models (MMLU)

	| Rank | Model | Params | MMLU Score |
	|------|-------|--------|------------|
	| 1 | Mistral-7B | 7B | 71.6% |
	| 2 | Qwen2-7B | 7B | 70.0% |
	| 3 | StableAtomic | 2.3B | 60.0% |
	| 4 | Qwen2-1.5B | 1.5B | 56.5% |
	| 5 | Phi-2 | 2.5B | 52.7% |
	| 6 | Llama-2-7B | 7B | 45.3% |
	| 7 | Gemma-2B | 2B | 42.3% |
	| 8 | Llama-1-7B | 7B | 35.1% |

	Key Finding: StableAtomic ranks #3 in this table and outperforms the 7B average (56.4%) by 3.6 percentage points.

	### Parameter Efficiency

	| Model | Params | MMLU | Efficiency (MMLU/B) |
	|-------|--------|------|---------------------|
	| StableAtomic | 2.3B | 60.0% | 26.1 |
	| Qwen2-1.5B | 1.5B | 56.5% | 37.7 |
	| Phi-2 | 2.5B | 52.7% | 21.1 |
	| Llama-2-7B | 7B | 45.3% | 6.5 |
	| Mistral-7B | 7B | 71.6% | 10.2 |

	Key Finding: StableAtomic exceeds Llama-2-7B's MMLU score (60.0% vs. 45.3%) with roughly 3x fewer parameters (2.3B vs. 7B). Qwen2-1.5B has the highest MMLU-per-billion ratio in this table (37.7).
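
The efficiency column is the MMLU score divided by the parameter count in billions. A short sketch that recomputes it from the table values:

```python
# (parameters in billions, MMLU %) copied from the Parameter Efficiency table above.
models = {
    "StableAtomic": (2.3, 60.0),
    "Qwen2-1.5B": (1.5, 56.5),
    "Phi-2": (2.5, 52.7),
    "Llama-2-7B": (7.0, 45.3),
    "Mistral-7B": (7.0, 71.6),
}

# MMLU points per billion parameters.
efficiency = {name: mmlu / params for name, (params, mmlu) in models.items()}

# Print from most to least efficient.
for name, value in sorted(efficiency.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14} {value:5.1f}")
```

This ratio favors small models by construction, so it is best read alongside the absolute scores rather than on its own.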

	---

	## Comparison with Reasoning Models

	### Leaderboard: Reasoning Models (MMLU)

	| Rank | Model | Params | MMLU | Math |
	|------|-------|--------|------|------|
	| 1 | DeepSeek-R1 (MoE) | 671B | 90.8% | 97.3% |
	| 2 | Qwen2.5-14B | 14B | 85.0% | 65.0% |
	| 3 | Qwen2.5-Max | 30B | 76.1% | 76.1% |
	| 4 | DeepSeek-R1-Distill-Qwen-32B | 32B | 72.6% | 83.3% |
	| 5 | Mistral-7B | 7B | 71.6% | 28.2% |
	| 6 | DeepSeek-R1-Distill-Qwen-14B | 14B | 69.7% | 80.0% |
	| 7 | StableAtomic | 2.3B | 60.0% | 50.0% |
	| 8 | DeepSeek-R1-Distill-Qwen-7B | 7B | 55.5% | 83.3% |
	| 9 | QwQ-32B-Preview | 32B | 50.0% | 60.0% |

	### Key Insights

	1. StableAtomic ranks #7 of the 9 models listed when sorted by MMLU
	2. Not trained on reasoning: Achieves 50% on Math without explicit reasoning/CoT training
	3. Vs DeepSeek-R1-Distill-Qwen-7B: StableAtomic leads in MMLU (+4.5 points), trails in Math (-33.3 points)
	4. Vs QwQ-32B-Preview: StableAtomic leads in MMLU (+10.0 points), trails in Math (-10.0 points)

	Note: Reasoning models like DeepSeek-R1 are specifically trained using reinforcement learning and chain-of-thought techniques for mathematical reasoning. StableAtomic's 50% Math score is notable given that it was not trained for this purpose, although it is based on only 10 questions.

	---

	## Usage

	### Loading the Model

	```python
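	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	# Local directory or Hub repository ID containing the model files
	model_path = "path/to/model"

	# trust_remote_code=True lets transformers run the custom modeling code shipped with the model
	tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_path,
	    trust_remote_code=True,
	    torch_dtype=torch.float32,  # full precision
	)
	model.eval()  # inference mode: disables dropout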
	```

	### Generation

	```python
	# Simple generation
	messages = [{"role": "user", "content": "What is the capital of France?"}]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt")

	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,  # passes input_ids and attention_mask
	        max_new_tokens=256,
	        temperature=0.7,
	        do_sample=True,
	    )

	# Decode only the newly generated tokens, skipping the prompt
	response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
	print(response)
	```

	### Chat Interface

	```python
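	# Interactive chat (reuses `tokenizer` and `model` loaded above)
	while True:
	    user_input = input("You: ")
	    if user_input.lower() in ['quit', 'exit']:
	        break

	    # Each turn is sent on its own; no conversation history is kept
	    messages = [{"role": "user", "content": user_input}]
	    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	    inputs = tokenizer(text, return_tensors="pt")
	    with torch.no_grad():
	        outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
	    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
	    print(f"Model: {response}\n")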
	```

	---

	## Model Configuration

	Key parameters in `generation_config.json`:

	```json
	{
	  "bos_token_id": 151643,
	  "eos_token_id": [151645, 151643],
	  "pad_token_id": 151643,
	  "temperature": 0.7,
	  "top_k": 20,
	  "top_p": 0.8,
	  "repetition_penalty": 1.1
	}
	```
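
In `transformers`, a `generation_config.json` stored next to the weights is normally picked up as the default for `generate()`, so these values apply unless a call overrides them. As a minimal sketch using only the standard library, the decoding settings can also be read from the file contents and passed explicitly. The `DECODING_KEYS` set and the `decoding_kwargs` helper are illustrative names, not part of any library.

```python
import json

# Contents of generation_config.json as shown above.
CONFIG_TEXT = """
{
  "bos_token_id": 151643,
  "eos_token_id": [151645, 151643],
  "pad_token_id": 151643,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "repetition_penalty": 1.1
}
"""

# Illustrative grouping: keys that shape how the next token is chosen.
DECODING_KEYS = {"temperature", "top_k", "top_p", "repetition_penalty"}


def decoding_kwargs(config: dict) -> dict:
    """Return keyword arguments that could be passed as model.generate(**kwargs)."""
    kwargs = {key: value for key, value in config.items() if key in DECODING_KEYS}
    # The file does not set do_sample, so it is enabled explicitly here;
    # temperature, top_k and top_p only take effect when sampling is on.
    kwargs["do_sample"] = True
    return kwargs


config = json.loads(CONFIG_TEXT)
print(decoding_kwargs(config))
```

The two `eos_token_id` entries mean generation stops at either token, and the padding token reuses the `bos_token_id` value.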

	---

	## Comparison Charts

	### Benchmark Comparison (2B Models)
	![Benchmark Comparison 2B](./images/benchmark_comparison.png)

	### 7B Model Comparison
	![7B Comparison](./images/benchmark_7b_comparison.png)

	### Reasoning Model Comparison
	![Reasoning Comparison](./images/benchmark_reasoning_comparison.png)

	---

	## Technical Notes

	1. Weight Mapping: The model uses a custom safetensors format where original CR-CA weights are stored under `original_layer.*` keys. These are automatically remapped during loading.

	2. Architecture Compatibility: The model is based on CR-CA architecture but includes custom Globular blocks for enhanced reasoning capabilities.

	3. Memory Requirements (approximate, weights only):
	   - FP32: ~9GB
	   - FP16: ~4.5GB
	   - INT8: ~2.3GB
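
The memory figures above are consistent with multiplying the parameter count by the bytes stored per parameter. A small sketch of that arithmetic, assuming the non-embedding parameter count from Model Details and counting weights only (embeddings, activations, and the KV cache would add to it):

```python
# Non-embedding parameter count from the Model Details section.
NON_EMBEDDING_PARAMS = 2_285_033_512

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gb(NON_EMBEDDING_PARAMS, size):.1f} GB")
```

Actual usage at inference time will be higher than these numbers, especially at long context lengths.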

	---

	## License

	GNU Affero GPL v3.0

	---

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@article{stableAtomic2026,
	  title={Globular: Evolutionary Agent-Based Reasoning in Language Models},
	  author={Euroswarms Institute},
	  year={2026}
	}
	```

	---

	## Contact

	For questions or issues, please open an issue on the repository or email us at research@euroswarms.eu.