# Model Card: Galena-2B (Granite 3.3 Math & Physics)
## Model Description
**Galena-2B** is a specialized 2-billion-parameter language model optimized for mathematical reasoning and physics problem-solving. It is derived from IBM's Granite 3.3-2B Instruct base model through parameter-efficient fine-tuning (LoRA) on curated datasets focused on advanced calculations and physics concepts.
- **Developed by:** [Your Name/Organization]
- **Base Model:** [IBM Granite 3.3-2B Instruct](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)
- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuned from:** ibm-granite/granite-3.3-2b-instruct
## Model Architecture
- **Architecture:** GraniteForCausalLM
- **Parameters:** 2.0B
- **Layers:** 40
- **Hidden Size:** 2048
- **Attention Heads:** 32 (query) / 8 (key-value, GQA)
- **Intermediate Size:** 8192
- **Vocabulary Size:** 49,159 tokens
- **Context Window:** 131,072 tokens (128k)
- **Precision:** bfloat16 (training & inference)
- **Activation Function:** SiLU (Swish)
### Key Features
- **Grouped Query Attention (GQA)** for efficient inference
- **RoPE Embeddings** with extended context support (theta = 10M)
- **Attention & Logits Scaling** for training stability
- **Embedding Multiplier** (12.0) and **Residual Multiplier** (0.22)
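These scaling values can be read directly off the checkpoint's configuration. The snippet below is a small sketch that assumes recent `transformers` releases expose the Granite config fields under these names; `getattr` with a default guards against any that differ.
```python
from transformers import AutoConfig

# Load only the config (no weights) and print the fields described above.
config = AutoConfig.from_pretrained("ibm-granite/granite-3.3-2b-instruct")

for field in (
    "num_hidden_layers",      # 40
    "hidden_size",            # 2048
    "num_attention_heads",    # 32 query heads
    "num_key_value_heads",    # 8 KV heads (GQA)
    "intermediate_size",      # 8192
    "rope_theta",             # extended-context RoPE base
    "embedding_multiplier",   # 12.0
    "residual_multiplier",    # 0.22
    "attention_multiplier",   # attention scaling for training stability
    "logits_scaling",         # logits scaling for training stability
):
    print(field, "=", getattr(config, field, "n/a"))
```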
## Intended Use
### Primary Use Cases
- **Educational Applications:** Teaching and learning advanced mathematics and physics
- **Research Tools:** Assisting with physics problem formulation and mathematical reasoning
- **Conversational AI:** Domain-specific chatbots for STEM topics
- **Tool-Augmented Reasoning:** Integration with calculators and symbolic math engines
### Out-of-Scope Use
- **Critical Decision Making:** Not suitable for medical, legal, or safety-critical applications
- **General-Purpose Conversational AI:** Optimized for math/physics; may underperform on general topics
- **Production Systems:** This is a research/educational model without production guarantees
- **Factual Information Retrieval:** May hallucinate; always verify outputs
## Training Data
The model was fine-tuned on a curated set of 26,000 instruction-response pairs drawn from two specialized datasets:
### 1. NVIDIA Nemotron-RL-Math (Advanced Calculations)
- **Source:** `nvidia/Nemotron-RL-math-advanced_calculations`
- **Content:** Complex mathematical problems with step-by-step reasoning traces
- **Features:** Tool-augmented reasoning, calculator integration, multi-step problem decomposition
- **Format:** Instruction-following with detailed solution traces
### 2. CAMEL-AI Physics Dataset
- **Source:** `camel-ai/physics`
- **Content:** Physics dialogue pairs covering diverse topics and subtopics
- **Features:** Conceptual explanations, problem-solving, physics principles
- **Metadata:** Topic and subtopic categorization for structured learning
### Data Preparation
- **Preprocessing:** `scripts/prepare_math_physics.py` in the parent GRANITE repository
- **Format Conversion:** Unified into Granite's chat format (`<|user|>`/`<|assistant|>` tags)
- **Output:** `data/math_physics.jsonl` (26k examples)
- **Token Length:** Max sequence length capped at 512 tokens during training
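The authoritative conversion logic lives in `scripts/prepare_math_physics.py`; the sketch below only illustrates the general shape of that step, with hypothetical source-field names (`question`/`answer`) rather than the datasets' actual schemas.
```python
import json

def to_granite_chat(question: str, answer: str) -> dict:
    # Wrap a Q/A pair in the role tags named above; the field names
    # feeding this function are illustrative, not the real schemas.
    text = (
        f"<|user|>\n{question}\n"
        f"<|assistant|>\n{answer}\n"
    )
    return {"text": text}

# Write one JSON object per line (JSONL).
with open("data/math_physics.jsonl", "a", encoding="utf-8") as f:
    record = to_granite_chat(
        "A ball is dropped from 20 m. How long until it hits the ground?",
        "Using h = (1/2) g t^2 with g = 9.8 m/s^2: t = sqrt(2*20/9.8) ≈ 2.02 s.",
    )
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```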
## Training Procedure
### Training Hyperparameters
- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Base Model Precision:** 4-bit quantization (NF4)
- **LoRA Rank:** Default (typically 8-16)
- **LoRA Alpha:** Default
- **Target Modules:** Query, Key, Value, Output projections
- **Gradient Checkpointing:** Enabled
- **Mixed Precision:** bfloat16
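For reference, a QLoRA setup matching these settings might look like the following sketch using `transformers` and `peft`; the rank/alpha values shown are illustrative stand-ins for the script defaults.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 base model with bfloat16 compute, mirroring the settings above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Rank/alpha here are illustrative, not the exact run settings.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```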
### Training Configuration
```json
{
  "base_model": "ibm-granite/granite-3.3-2b-instruct",
  "dataset_path": "data/math_physics.jsonl",
  "output_dir": "outputs/granite-math-physics-lora",
  "use_4bit": true,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 4,
  "effective_batch_size": 4,
  "num_train_epochs": 1,
  "max_steps": 500,
  "max_seq_length": 512,
  "learning_rate": 2e-4,
  "batching_strategy": "padding",
  "optimizer": "paged_adamw_8bit",
  "bf16": true
}
```
The learning rate shown is the training script's default.
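The repository's training entry point isn't reproduced in this card; as a rough guide, these values map onto standard `transformers.TrainingArguments` fields as sketched below.
```python
from transformers import TrainingArguments

# Mirrors the JSON block above; all argument names are standard
# transformers.TrainingArguments fields.
training_args = TrainingArguments(
    output_dir="outputs/granite-math-physics-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # effective batch size 4
    num_train_epochs=1,
    max_steps=500,                   # hard cap; see note under Training Time
    learning_rate=2e-4,              # script default
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)
```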
### Training Infrastructure
- **Hardware:** NVIDIA GeForce RTX 4060 (8 GB VRAM)
- **Software Stack:**
  - PyTorch 2.x
  - Hugging Face Transformers 4.44+
  - PEFT 0.11+
  - bitsandbytes 0.43+
  - CUDA 12.1
- **Training Time:** ~500 optimizer steps; at an effective batch size of 4, this covers roughly 2,000 of the 26k examples, well short of one full epoch
- **Checkpointing:** LoRA adapters saved every N steps
### Post-Training
1. **Adapter Merging:** LoRA adapters merged back into the base weights using `scripts/merge_lora.py`
2. **GGUF Conversion:** Exported to F16 GGUF format via `llama.cpp/convert_hf_to_gguf.py`
3. **Formats Produced:**
   - Hugging Face Transformers (safetensors)
   - GGUF F16 (llama.cpp compatible)
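Step 1 is typically a few lines with PEFT; the sketch below shows the general pattern `scripts/merge_lora.py` would follow (paths are illustrative), after which the merged directory can be fed to `llama.cpp/convert_hf_to_gguf.py` for step 2.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the full-precision base, apply the trained adapters, and fold
# them into the base weights.
base = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-2b-instruct", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "outputs/granite-math-physics-lora")
merged = merged.merge_and_unload()
merged.save_pretrained("models/math-physics/hf", safe_serialization=True)

# Ship the tokenizer alongside the merged weights.
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-2b-instruct")
tokenizer.save_pretrained("models/math-physics/hf")
```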
## Evaluation
### Qualitative Assessment
In qualitative testing, the model shows improvement over the base model on:
- Multi-step mathematical reasoning
- Physics problem explanation
- Calculator-augmented computation tasks
- Domain-specific terminology and notation
### Limitations
- **Limited Training Steps:** Only 500 training steps; longer training may improve performance
- **Domain Specialization:** May sacrifice general capabilities for math/physics expertise
- **Hallucination Risk:** Can generate plausible but incorrect solutions
- **Tool Integration:** Expects calculator tools in reasoning traces; standalone performance may vary
- **Context Window:** Fine-tuned on 512-token sequences; full 128k context not extensively tested
## Bias, Risks, and Limitations
### Known Limitations
1. **Domain Specificity:** Optimized for math/physics; general knowledge may be limited
2. **Factual Accuracy:** No guarantee of correctness; outputs should be verified
3. **Training Data Bias:** Inherits biases from the Nemotron and CAMEL-AI datasets
4. **Base Model Limitations:** Retains all limitations of Granite 3.3-2B Instruct
5. **Small Training Set:** 26k examples may not cover all edge cases
### Ethical Considerations
- **Educational Use:** Should supplement, not replace, human instruction
- **Verification Required:** Always validate mathematical and scientific outputs
- **Accessibility:** May use technical jargon inaccessible to beginners
- **Dataset Provenance:** Users should review source dataset licenses and terms
### Recommendations
- Use as an educational aid, not a source of truth
- Implement output validation for critical applications
- Combine with symbolic computation tools for verification (see the sketch below)
- Monitor for hallucinations and incorrect reasoning
- Consider fine-tuning on domain-specific data for production use
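As a concrete instance of the verification recommendation, a symbolic check with SymPy (used here purely as an illustrative checker) can confirm or refute a model-produced derivative before it reaches a student:
```python
import sympy as sp

# Suppose the model claims d/dx [x^2 * sin(x)] = 2x*sin(x) + x^2*cos(x).
# Verify symbolically before trusting it.
x = sp.symbols("x")
model_answer = 2 * x * sp.sin(x) + x**2 * sp.cos(x)
ground_truth = sp.diff(x**2 * sp.sin(x), x)

# simplify() reduces the difference to 0 only if the two agree.
assert sp.simplify(model_answer - ground_truth) == 0, "model answer is wrong"
print("Verified:", ground_truth)
```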
## Environmental Impact
- **Hardware:** NVIDIA RTX 4060 (8 GB VRAM)
- **Training Duration:** ~500 steps (estimated 1-2 hours)
- **Energy Consumption:** Estimated <1 kWh for training
- **Carbon Footprint:** Minimal due to efficient LoRA training
## Technical Specifications
### Model Formats
| Format | Precision | Size | Compatible Frameworks |
|--------|-----------|------|-----------------------|
| Hugging Face Transformers | bfloat16 | ~5.0 GB | PyTorch, Transformers, vLLM, TGI |
| GGUF F16 | float16 | ~4.7 GB | llama.cpp, Ollama, LM Studio |
### System Requirements
**Minimum (CPU Inference):**
- RAM: 8 GB
- Storage: 10 GB free space
- CPU: Modern x86-64 with AVX2 support
**Recommended (GPU Inference):**
- GPU: 6+ GB VRAM (RTX 3060, A4000, or better)
- RAM: 16 GB
- CUDA 12.1+ or ROCm 5.7+
### Loading & Inference
Before running inference, pull the artifacts into `models/math-physics/`:
```bash
python scripts/download_artifacts.py --artifact all
```
**Transformers (Python):**
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "models/math-physics/hf",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # matches the model's training precision
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("models/math-physics/hf")
```
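Continuing from the snippet above (reusing `model` and `tokenizer`), generation can go through the tokenizer's built-in chat template, which Granite instruct checkpoints ship with:
```python
# Build the prompt with the model's chat template and generate.
messages = [{"role": "user", "content": "Differentiate f(x) = x^2 * sin(x)."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```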
**llama.cpp (Command Line):**
```bash
./llama-cli -m granite-math-physics-f16.gguf -p "Your prompt" -n 256
```
## Citation
```bibtex
@software{galena_2b_2024,
  title  = {Galena-2B: Granite 3.3 Math \& Physics Model},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/yourusername/galena-2B},
  note   = {Fine-tuned from IBM Granite 3.3-2B Instruct on math and physics datasets}
}
```
## Acknowledgments
- IBM Research for the Granite 3.3 foundation model
- NVIDIA for the Nemotron-RL-Math dataset
- CAMEL-AI for the physics dialogue dataset
- Hugging Face for training infrastructure and libraries
## Contact
For questions, issues, or contributions:
- **Repository:** [GitHub Issues](https://github.com/yourusername/galena-2B/issues)
- **Email:** your.email@example.com
## Changelog
### Version 1.0 (2024-11-17)
- Initial release
- Fine-tuned on 26k math/physics examples
- 500 training steps with QLoRA
- Hugging Face and GGUF formats released