# Model Card: Galena-2B (Granite 3.3 Math & Physics)

## Model Description

**Galena-2B** is a specialized 2-billion-parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.

- **Developed by:** [Your Name/Organization]
- **Base Model:** [IBM Granite 3.3-2B Instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
- **Model Type:** Causal language model (decoder-only transformer)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned from:** ibm-granite/granite-3.3-2b-instruct

## Model Architecture

- **Architecture:** GraniteForCausalLM
- **Parameters:** 2.0B
- **Layers:** 40
- **Hidden Size:** 2048
- **Attention Heads:** 32 (query) / 8 (key-value, GQA)
- **Intermediate Size:** 8192
- **Vocabulary Size:** 49,159 tokens
- **Context Window:** 131,072 tokens (128k)
- **Precision:** bfloat16 (training & inference)
- **Activation Function:** SiLU (Swish)

### Key Features

- **Grouped Query Attention (GQA)** for efficient inference
- **RoPE embeddings** with extended context support (theta = 10M)
- **Attention & logits scaling** for training stability
- **Embedding multiplier** (12.0) and **residual multiplier** (0.22)

## Intended Use

### Primary Use Cases

- **Educational Applications:** Teaching and learning advanced mathematics and physics
- **Research Tools:** Assisting with physics problem formulation and mathematical reasoning
- **Conversational AI:** Domain-specific chatbots for STEM topics
- **Tool-Augmented Reasoning:** Integration with calculators and symbolic math engines

### Out-of-Scope Use

- **Critical Decision Making:** Not suitable for medical, legal, or safety-critical applications
- **General-Purpose Conversational AI:** Optimized for math/physics; may underperform on general topics
- **Production Systems:** This is a
research/educational model without production guarantees
- **Factual Information Retrieval:** May hallucinate; always verify outputs

## Training Data

The model was fine-tuned on a carefully curated dataset of 26,000 instruction-response pairs blending two specialized datasets:

### 1. NVIDIA Nemotron-RL-Math (Advanced Calculations)

- **Source:** `nvidia/Nemotron-RL-math-advanced_calculations`
- **Content:** Complex mathematical problems with step-by-step reasoning traces
- **Features:** Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- **Format:** Instruction-following with detailed solution traces

### 2. CAMEL-AI Physics Dataset

- **Source:** `camel-ai/physics`
- **Content:** Physics dialogue pairs covering diverse topics and subtopics
- **Features:** Conceptual explanations, problem-solving, physics principles
- **Metadata:** Topic and subtopic categorization for structured learning

### Data Preparation

- **Preprocessing:** `scripts/prepare_math_physics.py` in the parent GRANITE repository
- **Format Conversion:** Unified into Granite's chat format (`<|user|>`/`<|assistant|>` tags)
- **Output:** `data/math_physics.jsonl` (26k examples)
- **Token Length:** Maximum sequence length capped at 512 tokens during training

## Training Procedure

### Training Hyperparameters

- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Base Model Precision:** 4-bit quantization (NF4)
- **LoRA Rank:** Default (typically 8-16)
- **LoRA Alpha:** Default
- **Target Modules:** Query, key, value, and output projections
- **Gradient Checkpointing:** Enabled
- **Mixed Precision:** bfloat16

### Training Configuration

```json
{
  "base_model": "ibm-granite/granite-3.3-2b-instruct",
  "dataset_path": "data/math_physics.jsonl",
  "output_dir": "outputs/granite-math-physics-lora",
  "use_4bit": true,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 4,
  "effective_batch_size": 4,
  "num_train_epochs": 1,
  "max_steps": 500,
  "max_seq_length": 512,
  "learning_rate": "2e-4 (default)",
  "batching_strategy": "padding",
  "optimizer": "paged_adamw_8bit",
  "bf16": true
}
```

### Training Infrastructure

- **Hardware:** NVIDIA GeForce RTX 4060 (8 GB VRAM)
- **Software Stack:**
  - PyTorch 2.x
  - Hugging Face Transformers 4.44+
  - PEFT 0.11+
  - bitsandbytes 0.43+
  - CUDA 12.1
- **Training Time:** ~500 optimizer steps (capped by `max_steps`; with an effective batch size of 4 this covers roughly 2,000 of the 26k examples, well under one full epoch)
- **Checkpointing:** LoRA adapters saved every N steps

### Post-Training

1. **Adapter Merging:** LoRA adapters merged back into base weights using `scripts/merge_lora.py`
2. **GGUF Conversion:** Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
3. **Formats Produced:**
   - Hugging Face Transformers (safetensors)
   - GGUF F16 (llama.cpp compatible)

## Evaluation

### Qualitative Assessment

The model demonstrates improved performance on:

- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation

### Limitations

- **Limited Training Steps:** Only 500 training steps; longer training may improve performance
- **Domain Specialization:** May sacrifice general capabilities for math/physics expertise
- **Hallucination Risk:** Can generate plausible but incorrect solutions
- **Tool Integration:** Expects calculator tools in reasoning traces; standalone performance may vary
- **Context Window:** Fine-tuned on 512-token sequences; the full 128k context has not been extensively tested

## Bias, Risks, and Limitations

### Known Limitations

1. **Domain Specificity:** Optimized for math/physics; general knowledge may be limited
2. **Factual Accuracy:** No guarantee of correctness; outputs should be verified
3. **Training Data Bias:** Inherits biases from the Nemotron and CAMEL-AI datasets
4. **Base Model Limitations:** Retains all limitations of Granite 3.3-2B Instruct
5.
**Small Training Set:** 26k examples may not cover all edge cases

### Ethical Considerations

- **Educational Use:** Should supplement, not replace, human instruction
- **Verification Required:** Always validate mathematical and scientific outputs
- **Accessibility:** May use technical jargon inaccessible to beginners
- **Dataset Provenance:** Users should review source dataset licenses and terms

### Recommendations

- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use

## Environmental Impact

- **Hardware:** NVIDIA RTX 4060 (8 GB VRAM)
- **Training Duration:** ~500 steps (estimated 1-2 hours)
- **Energy Consumption:** Estimated <1 kWh for training
- **Carbon Footprint:** Minimal due to efficient LoRA training

## Technical Specifications

### Model Formats

| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |

### System Requirements

**Minimum (CPU Inference):**

- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support

**Recommended (GPU Inference):**

- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+

### Loading & Inference

Before running inference, pull the artifacts into `models/math-physics/`:

```bash
python scripts/download_artifacts.py --artifact all
```

**Transformers (Python):**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
```

**llama.cpp (Command Line):**

```bash
./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
```

## Citation

```bibtex
@software{galena_2b_2024,
  title  = {Galena-2B: Granite 3.3 Math \& Physics Model},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/galena-2B},
  note   = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}
```

## Acknowledgments

- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries

## Contact

For questions, issues, or contributions:

- **Repository:** [GitHub Issues](https://github.com/yourusername/galena-2B/issues)
- **Email:** your.email@example.com

## Changelog

### Version 1.0 (2024-11-17)

- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released
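As a reference for the chat-format conversion described under Data Preparation, the sketch below wraps one question-answer pair in the `<|user|>`/`<|assistant|>` tags and emits a JSONL record. This is a minimal illustration only; the function name and the `"text"` field are assumptions, and the actual logic lives in `scripts/prepare_math_physics.py`.

```python
import json


def to_granite_chat(question: str, answer: str) -> str:
    """Wrap one Q&A pair in Granite chat tags and emit one JSONL line."""
    text = f"<|user|>\n{question}\n<|assistant|>\n{answer}\n"
    return json.dumps({"text": text})


record = to_granite_chat(
    "A ball is dropped from 20 m. How long until it hits the ground? (g = 9.8 m/s^2)",
    "t = sqrt(2h/g) = sqrt(40/9.8), approximately 2.02 s",
)
print(record)
```

Each output line is one training example, matching the `data/math_physics.jsonl` layout described above.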
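The training budget implied by the Training Configuration can be sanity-checked with simple arithmetic. All numbers below are taken directly from that configuration; this models nothing about the Trainer's internals:

```python
import math

examples = 26_000          # dataset size (data/math_physics.jsonl)
per_device_batch = 1       # per_device_train_batch_size
grad_accum = 4             # gradient_accumulation_steps
max_steps = 500            # max_steps cap

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(examples / effective_batch)
examples_seen = max_steps * effective_batch

print(effective_batch)   # 4
print(steps_per_epoch)   # 6500 steps would be needed for one full epoch
print(examples_seen)     # 2000 examples seen at the 500-step cap
```

This is why the run covers well under one epoch: `max_steps` stops training long before all 26k examples are visited.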