# Model Card: Galena-2B (Granite 3.3 Math & Physics)
## Model Description
**Galena-2B** is a specialized 2-billion parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.
- **Developed by:** [Your Name/Organization]
- **Base Model:** [IBM Granite 3.3-2B Instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned from:** ibm-granite/granite-3.3-2b-instruct
## Model Architecture
- **Architecture:** GraniteForCausalLM
- **Parameters:** 2.0B
- **Layers:** 40
- **Hidden Size:** 2048
- **Attention Heads:** 32 (query) / 8 (key-value, GQA)
- **Intermediate Size:** 8192
- **Vocabulary Size:** 49,159 tokens
- **Context Window:** 131,072 tokens (128k)
- **Precision:** bfloat16 (training & inference)
- **Activation Function:** SiLU (Swish)
### Key Features
- **Grouped Query Attention (GQA)** for efficient inference
- **RoPE Embeddings** with extended context support (theta=10M)
- **Attention & Logits Scaling** for training stability
- **Embedding Multiplier** (12.0) and Residual Multiplier (0.22)
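The practical benefit of GQA can be checked with quick arithmetic from the figures above: with 32 query heads but only 8 key-value heads, the KV cache shrinks by a factor of 4 relative to standard multi-head attention. A minimal sketch (pure Python; the 4096-token sequence length is an arbitrary example):

```python
# Model dimensions from the architecture table above
layers, hidden, q_heads, kv_heads = 40, 2048, 32, 8
head_dim = hidden // q_heads   # 64
bytes_per_value = 2            # bfloat16

def kv_cache_bytes(n_heads: int, seq_len: int) -> int:
    """KV-cache footprint: 2 tensors (K and V) per layer, per head, per token."""
    return 2 * layers * n_heads * head_dim * bytes_per_value * seq_len

mha = kv_cache_bytes(q_heads, 4096)   # if every query head had its own K/V
gqa = kv_cache_bytes(kv_heads, 4096)  # grouped query attention (this model)
print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB, ratio: {mha // gqa}x")
# → MHA: 1280 MiB, GQA: 320 MiB, ratio: 4x
```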
## Intended Use
### Primary Use Cases
- **Educational Applications:** Teaching and learning advanced mathematics and physics
- **Research Tools:** Assisting with physics problem formulation and mathematical reasoning
- **Conversational AI:** Domain-specific chatbots for STEM topics
- **Tool-Augmented Reasoning:** Integration with calculators and symbolic math engines
### Out-of-Scope Use
- **Critical Decision Making:** Not suitable for medical, legal, or safety-critical applications
- **General-Purpose Conversational AI:** Optimized for math/physics; may underperform on general topics
- **Production Systems:** This is a research/educational model without production guarantees
- **Factual Information Retrieval:** May hallucinate; always verify outputs
## Training Data
The model was fine-tuned on a curated set of 26,000 instruction-response pairs drawn from two specialized datasets:
### 1. NVIDIA Nemotron-RL-Math (Advanced Calculations)
- **Source:** `nvidia/Nemotron-RL-math-advanced_calculations`
- **Content:** Complex mathematical problems with step-by-step reasoning traces
- **Features:** Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- **Format:** Instruction-following with detailed solution traces
### 2. CAMEL-AI Physics Dataset
- **Source:** `camel-ai/physics`
- **Content:** Physics dialogue pairs covering diverse topics and subtopics
- **Features:** Conceptual explanations, problem-solving, physics principles
- **Metadata:** Topic and subtopic categorization for structured learning
### Data Preparation
- **Preprocessing:** `scripts/prepare_math_physics.py` in parent GRANITE repository
- **Format Conversion:** Unified into Granite's chat format (`<|user|>`/`<|assistant|>` tags)
- **Output:** `data/math_physics.jsonl` (26k examples)
- **Token Length:** Max sequence length capped at 512 tokens during training
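As a hedged illustration of the format-conversion step (the tag layout follows the `<|user|>`/`<|assistant|>` convention noted above, but the field names and exact spelling used by `scripts/prepare_math_physics.py` are assumptions):

```python
import json

def to_granite_chat(instruction: str, response: str) -> dict:
    """Wrap an instruction/response pair in Granite-style chat tags.

    The "text" field name and exact tag spelling are assumptions for
    illustration; consult scripts/prepare_math_physics.py for the real layout.
    """
    return {"text": f"<|user|>\n{instruction}\n<|assistant|>\n{response}\n"}

# Write one JSON object per line -- the .jsonl convention used for training data
pairs = [("State Newton's second law.",
          "F = ma: the net force on a body equals its mass times its acceleration.")]
with open("math_physics_sample.jsonl", "w", encoding="utf-8") as f:
    for instruction, response in pairs:
        f.write(json.dumps(to_granite_chat(instruction, response)) + "\n")
```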
## Training Procedure
### Training Hyperparameters
- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Base Model Precision:** 4-bit quantization (NF4)
- **LoRA Rank:** Default (typically 8-16)
- **LoRA Alpha:** Default
- **Target Modules:** Query, Key, Value, Output projections
- **Gradient Checkpointing:** Enabled
- **Mixed Precision:** bfloat16
### Training Configuration
```json
{
  "base_model": "ibm-granite/granite-3.3-2b-instruct",
  "dataset_path": "data/math_physics.jsonl",
  "output_dir": "outputs/granite-math-physics-lora",
  "use_4bit": true,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 4,
  "effective_batch_size": 4,
  "num_train_epochs": 1,
  "max_steps": 500,
  "max_seq_length": 512,
  "learning_rate": "2e-4 (default)",
  "batching_strategy": "padding",
  "optimizer": "paged_adamw_8bit",
  "bf16": true
}
```
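One interaction in this configuration is worth making explicit: `max_steps` caps the run before `num_train_epochs` is reached. A quick check of how much of the dataset 500 steps actually covers:

```python
# Figures from the training configuration above
per_device_batch = 1
grad_accum = 4
max_steps = 500
dataset_size = 26_000

effective_batch = per_device_batch * grad_accum  # 4 examples per optimizer step
examples_seen = max_steps * effective_batch      # 2,000 examples
fraction = examples_seen / dataset_size          # well under one full epoch
print(f"{examples_seen} examples seen ({fraction:.1%} of the dataset)")
# → 2000 examples seen (7.7% of the dataset)
```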
### Training Infrastructure
- **Hardware:** NVIDIA GeForce RTX 4060 (8GB VRAM)
- **Software Stack:**
- PyTorch 2.x
- Hugging Face Transformers 4.44+
- PEFT 0.11+
- bitsandbytes 0.43+
- CUDA 12.1
- **Training Time:** ~500 steps, estimated 1-2 hours (capped by `max_steps`; at an effective batch size of 4 this covers roughly 2,000 of the 26k examples, well under one full epoch)
- **Checkpointing:** LoRA adapters saved every N steps
### Post-Training
1. **Adapter Merging:** LoRA adapters merged back into base weights using `scripts/merge_lora.py`
2. **GGUF Conversion:** Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
3. **Formats Produced:**
- Hugging Face Transformers (safetensors)
- GGUF F16 (llama.cpp compatible)
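The merge-and-convert pipeline above can be sketched as shell commands. The arguments to `scripts/merge_lora.py` are assumptions (check the script itself); `convert_hf_to_gguf.py` with `--outtype`/`--outfile` is llama.cpp's standard converter invocation.

```shell
# 1. Merge LoRA adapters into the base weights
#    (the flag names for merge_lora.py are assumed, not documented here)
python scripts/merge_lora.py \
  --base ibm-granite/granite-3.3-2b-instruct \
  --adapter outputs/granite-math-physics-lora \
  --out models/math-physics/hf

# 2. Export the merged model to F16 GGUF via llama.cpp
python llama.cpp/convert_hf_to_gguf.py models/math-physics/hf \
  --outtype f16 \
  --outfile granite-math-physics-f16.gguf
```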
## Evaluation
### Qualitative Assessment
The model demonstrates improved performance on:
- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation
### Limitations
- **Limited Training Steps:** Only 500 training steps; longer training may improve performance
- **Domain Specialization:** May sacrifice general capabilities for math/physics expertise
- **Hallucination Risk:** Can generate plausible but incorrect solutions
- **Tool Integration:** Expects calculator tools in reasoning traces; standalone performance may vary
- **Context Window:** Fine-tuned on 512-token sequences; full 128k context not extensively tested
## Bias, Risks, and Limitations
### Known Limitations
1. **Domain Specificity:** Optimized for math/physics; general knowledge may be limited
2. **Factual Accuracy:** No guarantee of correctness; outputs should be verified
3. **Training Data Bias:** Inherits biases from Nemotron and CAMEL-AI datasets
4. **Base Model Limitations:** Retains all limitations of Granite 3.3-2B Instruct
5. **Small Training Set:** 26k examples may not cover all edge cases
### Ethical Considerations
- **Educational Use:** Should supplement, not replace, human instruction
- **Verification Required:** Always validate mathematical and scientific outputs
- **Accessibility:** May use technical jargon inaccessible to beginners
- **Dataset Provenance:** Users should review source dataset licenses and terms
### Recommendations
- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use
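As one concrete instance of the validation recommended above, a model-produced derivative can be spot-checked numerically before being trusted. A minimal sketch; `claimed_derivative` stands in for a hypothetical model answer:

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Suppose the model claims d/dx sin(x) = cos(x); verify at a few sample points.
f = math.sin
claimed_derivative = math.cos  # hypothetical model output

for x in (0.0, 0.5, 1.3, 2.0):
    assert abs(numeric_derivative(f, x) - claimed_derivative(x)) < 1e-6, x
print("claimed derivative agrees with numeric check")
```

The same pattern extends to symbolic engines (e.g. checking a closed-form answer with a CAS) when an exact, rather than numeric, confirmation is needed.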
## Environmental Impact
- **Hardware:** NVIDIA RTX 4060 (8GB VRAM)
- **Training Duration:** ~500 steps (estimated 1-2 hours)
- **Energy Consumption:** Estimated <1 kWh for training
- **Carbon Footprint:** Minimal due to efficient LoRA training
## Technical Specifications
### Model Formats
| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |
### System Requirements
**Minimum (CPU Inference):**
- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support
**Recommended (GPU Inference):**
- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+
### Loading & Inference
Before running inference, pull the artifacts into `models/math-physics/`:
```bash
python scripts/download_artifacts.py --artifact all
```
**Transformers (Python):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # matches the training/inference precision above
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
```
**llama.cpp (Command Line):**
```bash
./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
```
## Citation
```bibtex
@software{galena_2b_2024,
  title  = {Galena-2B: Granite 3.3 Math \& Physics Model},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/galena-2B},
  note   = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}
```
## Acknowledgments
- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries
## Contact
For questions, issues, or contributions:
- **Repository:** [GitHub Issues](https://github.com/yourusername/galena-2B/issues)
- **Email:** your.email@example.com
## Changelog
### Version 1.0 (2024-11-17)
- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released