# Model Card: Galena-2B (Granite 3.3 Math & Physics)

## Model Description

**Galena-2B** is a specialized 2-billion-parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.

- **Developed by:** [Your Name/Organization]
- **Base Model:** [IBM Granite 3.3-2B Instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
- **Model Type:** Causal language model (decoder-only transformer)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned from:** ibm-granite/granite-3.3-2b-instruct

## Model Architecture

- **Architecture:** GraniteForCausalLM
- **Parameters:** 2.0B
- **Layers:** 40
- **Hidden Size:** 2048
- **Attention Heads:** 32 (query) / 8 (key-value, GQA)
- **Intermediate Size:** 8192
- **Vocabulary Size:** 49,159 tokens
- **Context Window:** 131,072 tokens (128k)
- **Precision:** bfloat16 (training & inference)
- **Activation Function:** SiLU (Swish)

### Key Features

- **Grouped Query Attention (GQA)** for efficient inference
- **RoPE embeddings** with extended context support (theta = 10M)
- **Attention & logits scaling** for training stability
- **Embedding multiplier** (12.0) and **residual multiplier** (0.22)

## Intended Use

### Primary Use Cases

- **Educational Applications:** Teaching and learning advanced mathematics and physics
- **Research Tools:** Assisting with physics problem formulation and mathematical reasoning
- **Conversational AI:** Domain-specific chatbots for STEM topics
- **Tool-Augmented Reasoning:** Integration with calculators and symbolic math engines

### Out-of-Scope Use

- **Critical Decision Making:** Not suitable for medical, legal, or safety-critical applications
- **General-Purpose Conversational AI:** Optimized for math/physics; may underperform on general topics
- **Production Systems:** This is a
research/educational model without production guarantees
- **Factual Information Retrieval:** May hallucinate; always verify outputs

## Training Data

The model was fine-tuned on a carefully curated dataset of 26,000 instruction-response pairs blending two specialized datasets:

### 1. NVIDIA Nemotron-RL-Math (Advanced Calculations)

- **Source:** `nvidia/Nemotron-RL-math-advanced_calculations`
- **Content:** Complex mathematical problems with step-by-step reasoning traces
- **Features:** Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- **Format:** Instruction-following with detailed solution traces

### 2. CAMEL-AI Physics Dataset

- **Source:** `camel-ai/physics`
- **Content:** Physics dialogue pairs covering diverse topics and subtopics
- **Features:** Conceptual explanations, problem-solving, physics principles
- **Metadata:** Topic and subtopic categorization for structured learning

### Data Preparation

- **Preprocessing:** `scripts/prepare_math_physics.py` in the parent GRANITE repository
- **Format Conversion:** Unified into Granite's chat format (`<|user|>`/`<|assistant|>` tags)
- **Output:** `data/math_physics.jsonl` (26k examples)
- **Token Length:** Maximum sequence length capped at 512 tokens during training

## Training Procedure

### Training Hyperparameters

- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Base Model Precision:** 4-bit quantization (NF4)
- **LoRA Rank:** Default (typically 8-16)
- **LoRA Alpha:** Default
- **Target Modules:** Query, key, value, and output projections
- **Gradient Checkpointing:** Enabled
- **Mixed Precision:** bfloat16

### Training Configuration

```json
{
  "base_model": "ibm-granite/granite-3.3-2b-instruct",
  "dataset_path": "data/math_physics.jsonl",
  "output_dir": "outputs/granite-math-physics-lora",
  "use_4bit": true,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 4,
  "effective_batch_size": 4,
  "num_train_epochs": 1,
  "max_steps": 500,
  "max_seq_length": 512,
  "learning_rate": "2e-4 (default)",
  "batching_strategy": "padding",
  "optimizer": "paged_adamw_8bit",
  "bf16": true
}
```

### Training Infrastructure

- **Hardware:** NVIDIA GeForce RTX 4060 (8 GB VRAM)
- **Software Stack:**
  - PyTorch 2.x
  - Hugging Face Transformers 4.44+
  - PEFT 0.11+
  - bitsandbytes 0.43+
  - CUDA 12.1
- **Training Time:** ~500 optimizer steps (capped by `max_steps`; with an effective batch size of 4 this covers roughly 2,000 of the 26k examples, well under one full epoch)
- **Checkpointing:** LoRA adapters saved every N steps

### Post-Training

1. **Adapter Merging:** LoRA adapters merged back into base weights using `scripts/merge_lora.py`
2. **GGUF Conversion:** Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
3. **Formats Produced:**
   - Hugging Face Transformers (safetensors)
   - GGUF F16 (llama.cpp compatible)

## Evaluation

### Qualitative Assessment

The model demonstrates improved performance on:

- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation

### Limitations

- **Limited Training Steps:** Only 500 training steps; longer training may improve performance
- **Domain Specialization:** May sacrifice general capabilities for math/physics expertise
- **Hallucination Risk:** Can generate plausible but incorrect solutions
- **Tool Integration:** Expects calculator tools in reasoning traces; standalone performance may vary
- **Context Window:** Fine-tuned on 512-token sequences; the full 128k context has not been extensively tested

## Bias, Risks, and Limitations

### Known Limitations

1. **Domain Specificity:** Optimized for math/physics; general knowledge may be limited
2. **Factual Accuracy:** No guarantee of correctness; outputs should be verified
3. **Training Data Bias:** Inherits biases from the Nemotron and CAMEL-AI datasets
4. **Base Model Limitations:** Retains all limitations of Granite 3.3-2B Instruct
5.
**Small Training Set:** 26k examples may not cover all edge cases

### Ethical Considerations

- **Educational Use:** Should supplement, not replace, human instruction
- **Verification Required:** Always validate mathematical and scientific outputs
- **Accessibility:** May use technical jargon inaccessible to beginners
- **Dataset Provenance:** Users should review source dataset licenses and terms

### Recommendations

- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use

## Environmental Impact

- **Hardware:** NVIDIA RTX 4060 (8 GB VRAM)
- **Training Duration:** ~500 steps (estimated 1-2 hours)
- **Energy Consumption:** Estimated <1 kWh for training
- **Carbon Footprint:** Minimal due to efficient LoRA training

## Technical Specifications

### Model Formats

| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |

### System Requirements

**Minimum (CPU Inference):**

- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support

**Recommended (GPU Inference):**

- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+

### Loading & Inference

Before running inference, pull the artifacts into `models/math-physics/`:

```bash
python scripts/download_artifacts.py --artifact all
```

**Transformers (Python):**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
```

**llama.cpp (Command Line):**

```bash
./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
```

## Citation

```bibtex
@software{galena_2b_2024,
  title  = {Galena-2B: Granite 3.3 Math \& Physics Model},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/galena-2B},
  note   = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}
```

## Acknowledgments

- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries

## Contact

For questions, issues, or contributions:

- **Repository:** [GitHub Issues](https://github.com/yourusername/galena-2B/issues)
- **Email:** your.email@example.com

## Changelog

### Version 1.0 (2024-11-17)

- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released
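As a reference for the chat-format conversion described under Data Preparation, the sketch below wraps one question-answer pair in the `<|user|>`/`<|assistant|>` tags and emits a JSONL record. This is a minimal illustration only; the function name and the `"text"` field are assumptions, and the actual logic lives in `scripts/prepare_math_physics.py`.

```python
import json


def to_granite_chat(question: str, answer: str) -> str:
    """Wrap one Q&A pair in Granite chat tags and emit one JSONL line."""
    text = f"<|user|>\n{question}\n<|assistant|>\n{answer}\n"
    return json.dumps({"text": text})


record = to_granite_chat(
    "A ball is dropped from 20 m. How long until it hits the ground? (g = 9.8 m/s^2)",
    "t = sqrt(2h/g) = sqrt(40/9.8), approximately 2.02 s",
)
print(record)
```

Each output line is one training example, matching the `data/math_physics.jsonl` layout described above.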
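The training budget implied by the Training Configuration can be sanity-checked with simple arithmetic. All numbers below are taken directly from that configuration; this models nothing about the Trainer's internals:

```python
import math

examples = 26_000          # dataset size (data/math_physics.jsonl)
per_device_batch = 1       # per_device_train_batch_size
grad_accum = 4             # gradient_accumulation_steps
max_steps = 500            # max_steps cap

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(examples / effective_batch)
examples_seen = max_steps * effective_batch

print(effective_batch)   # 4
print(steps_per_epoch)   # 6500 steps would be needed for one full epoch
print(examples_seen)     # 2000 examples seen at the 500-step cap
```

This is why the run covers well under one epoch: `max_steps` stops training long before all 26k examples are visited.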