PHI-2-STEM-261125


A Fine-tuned Phi-2 Model Optimized for STEM Knowledge

Science, Technology, Engineering, Mathematics, and Ethics


Model Description

PHI-2-STEM-261125 is a fine-tuned version of Microsoft's Phi-2 (2.78B parameters) specifically optimized for generating accurate and comprehensive explanations across multiple STEM domains. The model was trained using INT8 quantization to enable efficient training on consumer-grade GPUs.

Key Features

  • Multi-domain STEM expertise: Mathematics, Physics, Chemistry, Biology, and Ethics
  • Efficient training: INT8 quantization enables training on 4GB VRAM GPUs
  • High-quality curated dataset: 18 expert-written examples covering 11 specialized domains
  • Significant loss reduction: 26% improvement from initial to final loss

Model Details

Model Information

Property Value
Model Name PHI-2-STEM-261125
Base Model microsoft/phi-2
Parameters 2.78 billion
Architecture Transformer (decoder-only)
Precision FP16 (Safetensors)
Training Date November 26, 2025
License MIT
DOI 10.57967/hf/7105

Author Information

Field Value
Author Francisco Molina Burgos
ORCID 0009-0008-6093-8267
Organization Independent Researcher
Contact pako.molina@gmail.com

Training Details

Training Configuration

Parameter Value
Epochs 5
Batch Size 1 (per device)
Gradient Accumulation Steps 4
Effective Batch Size 4
Learning Rate 1e-5
Warmup Steps 2
Max Sequence Length 512 tokens
Precision FP16 (Mixed Precision)
Quantization INT8 (BitsAndBytes)
Gradient Checkpointing Enabled
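The configuration above can be expressed in code. The sketch below is a hypothetical reconstruction (the exact training script is not part of this card); the key names mirror transformers.TrainingArguments conventions, but it is kept as a plain dict so it stays self-contained.

```python
# Hypothetical reconstruction of the hyperparameters from the table above.
# Key names follow transformers.TrainingArguments conventions; this is an
# illustration, not the actual training script.
training_config = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "warmup_steps": 2,
    "max_seq_length": 512,
    "fp16": True,                   # mixed-precision training
    "gradient_checkpointing": True, # trades compute for VRAM
}

# Effective batch size = per-device batch size x gradient accumulation steps
effective_batch_size = (training_config["per_device_train_batch_size"]
                        * training_config["gradient_accumulation_steps"])
print(effective_batch_size)  # 4, matching the table
```

Gradient accumulation is what makes the 4GB VRAM budget workable: only one example sits in memory at a time, while gradients are accumulated over four steps before each optimizer update.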

Hardware Specifications

Component Specification
GPU NVIDIA GeForce RTX 3050 (4GB VRAM)
CPU Intel Core i7-12650H
RAM 16GB
Training Time ~30 minutes
VRAM Usage ~3.5 GB

Training Metrics

Metric Value
Initial Loss 2.07
Final Loss (3 epochs) 1.65
Final Loss (5 epochs) 1.54
Average Loss 1.80
Total Loss Reduction ~26%

Loss Progression

Epoch 1: Loss ~2.07 (initial)
Epoch 2: Loss ~1.85
Epoch 3: Loss ~1.65
Epoch 4: Loss ~1.58
Epoch 5: Loss ~1.54 (final)
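The progression above can be sanity-checked in a few lines: the ~26% figure quoted earlier is (2.07 − 1.54) / 2.07 ≈ 25.6%, rounded up, and the shrinking per-epoch improvements are consistent with convergence.

```python
# Per-epoch loss values from the progression above
epoch_losses = [2.07, 1.85, 1.65, 1.58, 1.54]

# Total relative reduction from initial to final loss
reduction = (epoch_losses[0] - epoch_losses[-1]) / epoch_losses[0]
print(f"Total reduction: {reduction:.1%}")  # 25.6%, i.e. ~26%

# Per-epoch improvements shrink steadily, consistent with convergence
deltas = [a - b for a, b in zip(epoch_losses, epoch_losses[1:])]
assert all(d > 0 for d in deltas)
```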

Dataset

Overview

The model was trained on a curated dataset of 18 expert-written examples covering 11 specialized STEM domains. Each example provides a concise, technically accurate explanation of fundamental concepts.

Domain Distribution

Domain Examples Topics Covered
Mathematics 3 Fundamental Theorem of Calculus, Riemann Hypothesis, Gödel's Incompleteness Theorems
Organic Chemistry 2 SN2 Reaction Mechanism, Molecular Orbital Theory (Benzene)
Quantum Chemistry 1 Density Functional Theory (DFT)
Quantum Physics 2 Quantum Entanglement, Heisenberg Uncertainty Principle
Physics 1 General Relativity (Einstein Field Equations)
Crystallography 1 X-ray Crystallography
Biochemistry 1 Enzyme Catalysis (Michaelis-Menten)
Pharmacology 1 Pharmacodynamics (Receptor Theory)
Ethics 3 Kant's Categorical Imperative, Bioethics, AI Ethics
Music Theory 2 Harmonic Analysis, Counterpoint
Art Theory 1 Golden Ratio
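As a quick consistency check, the per-domain counts in the table sum to the 18 examples and 11 domains quoted in the overview:

```python
# Per-domain example counts, copied from the distribution table above
domain_counts = {
    "Mathematics": 3,
    "Organic Chemistry": 2,
    "Quantum Chemistry": 1,
    "Quantum Physics": 2,
    "Physics": 1,
    "Crystallography": 1,
    "Biochemistry": 1,
    "Pharmacology": 1,
    "Ethics": 3,
    "Music Theory": 2,
    "Art Theory": 1,
}

print(sum(domain_counts.values()))  # 18 examples
print(len(domain_counts))           # 11 domains
```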

Dataset Characteristics

  • Format: Plain text explanations
  • Language: English (technical/scientific)
  • Average Length: ~100-150 tokens per example
  • Quality: Expert-curated, factually accurate
  • Coverage: Fundamental concepts across STEM disciplines
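The card does not specify the on-disk format of the dataset, so the following is only a plausible sketch of how plain-text examples might be stored and round-tripped as JSON Lines; the field names (`domain`, `text`) and the file layout are assumptions, not a description of the actual training data files.

```python
import io
import json

# Hypothetical records; the "domain" and "text" field names are assumptions.
# The snippets are abbreviated from the example outputs later in this card.
examples = [
    {"domain": "Quantum Physics",
     "text": "Quantum entanglement describes correlations between particles..."},
    {"domain": "Ethics",
     "text": "Kant's Categorical Imperative is a fundamental principle..."},
]

# Serialize as JSON Lines (one object per line); an in-memory buffer
# stands in for a file here
buf = io.StringIO()
for ex in examples:
    buf.write(json.dumps(ex) + "\n")

# Loading mirrors the writing step
buf.seek(0)
loaded = [json.loads(line) for line in buf]
print(len(loaded))  # 2 records round-tripped
```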

Usage

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "Yatro/PHI-2-STEM-261125"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "Explain the Heisenberg Uncertainty Principle:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Usage with INT8 Quantization (Low VRAM)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# INT8 configuration for GPUs with limited VRAM
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = AutoModelForCausalLM.from_pretrained(
    "Yatro/PHI-2-STEM-261125",
    trust_remote_code=True,
    quantization_config=quantization_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Yatro/PHI-2-STEM-261125", trust_remote_code=True)

# Generate
prompt = "What is Gödel's Incompleteness Theorem?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Prompt Templates

For best results, use clear, direct prompts:

# Good prompts
prompts = [
    "Explain the Fundamental Theorem of Calculus:",
    "What is the SN2 reaction mechanism in organic chemistry?",
    "Describe quantum entanglement:",
    "What are the four principles of bioethics?",
    "Explain Kant's Categorical Imperative:",
]

# Structured format (also works well)
prompt = """Question: What is the Heisenberg Uncertainty Principle?
Answer:"""

Example Outputs

Quantum Physics

Prompt: "Explain quantum entanglement:"

Output:

Quantum entanglement describes correlations between particles that persist regardless of spatial separation. When particles are entangled, measuring one particle's state instantly determines the other's state. This phenomenon, verified experimentally through Bell inequality violations, challenges classical notions of locality and has applications in quantum computing and cryptography.

Organic Chemistry

Prompt: "What is the SN2 mechanism?"

Output:

The SN2 (bimolecular nucleophilic substitution) reaction proceeds through a single concerted step. The nucleophile attacks the electrophilic carbon from the backside, opposite to the leaving group. This results in inversion of stereochemistry (Walden inversion). The reaction is favored with primary substrates, good leaving groups, strong nucleophiles, and polar aprotic solvents.

Ethics

Prompt: "What is Kant's Categorical Imperative?"

Output:

Kant's Categorical Imperative is a fundamental principle in deontological ethics. It states: "Act only according to that maxim whereby you can at the same time will that it should become a universal law." This means moral actions must be universalizable - if everyone acted the same way, the action should remain coherent and beneficial. It emphasizes duty and rational moral principles over consequences.


Intended Use

Primary Use Cases

  1. Educational Content Generation: Creating explanations of STEM concepts for learning materials
  2. Research Assistance: Generating initial drafts of scientific explanations
  3. Tutoring Systems: Providing explanations in AI-assisted learning platforms
  4. Knowledge Retrieval: Answering questions about fundamental STEM concepts
  5. Content Augmentation: Enhancing educational content with accurate explanations

Target Users

  • Educators and teachers
  • Students (undergraduate and graduate level)
  • Science communicators
  • EdTech developers
  • Researchers exploring LLM capabilities in STEM

Limitations

Known Limitations

  1. Small Training Dataset: Only 18 examples, limiting coverage of STEM topics
  2. Domain Specificity: Best performance on topics similar to training data
  3. No Real-time Information: Knowledge cutoff based on base model (Phi-2)
  4. Mathematical Reasoning: May struggle with complex mathematical derivations
  5. Hallucination Risk: May generate plausible-sounding but incorrect information
  6. Language: English only

Out-of-Scope Use Cases

  • Medical diagnosis or treatment recommendations
  • Legal advice
  • Financial decisions
  • Safety-critical applications
  • Generating content presented as human-written without disclosure

Recommendations

  • Always verify generated content against authoritative sources
  • Use as a starting point, not as definitive truth
  • Human review required for any published or educational content
  • Not suitable for generating content on topics outside training domains

Ethical Considerations

Bias and Fairness

  • The model inherits biases from the base Phi-2 model and training data
  • Training data reflects Western academic perspectives on STEM
  • Limited representation of non-Western scientific traditions

Environmental Impact

  • Training was performed on consumer hardware (RTX 3050)
  • Estimated carbon footprint: <0.02 kg CO2 (~37.5 Wh for 30 minutes on a 75W GPU, assuming a typical grid intensity of ~0.4 kg CO2/kWh)
  • INT8 quantization reduced computational requirements significantly

Transparency

  • Full training code and data are documented
  • Model weights are openly available
  • Limitations are clearly stated

Technical Specifications

Model Architecture

PHI-2-STEM-261125
├── Architecture: Transformer (decoder-only)
├── Hidden Size: 2560
├── Intermediate Size: 10240
├── Num Attention Heads: 32
├── Num Hidden Layers: 32
├── Vocab Size: 51200
├── Max Position Embeddings: 2048
├── Rotary Embedding Dimension: 32
└── Activation Function: GELU
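A back-of-the-envelope check shows these dimensions are consistent with the stated 2.78B parameter count. Counting only the large weight matrices (ignoring biases and layer norms), each layer contributes four hidden×hidden attention projections plus two hidden×intermediate MLP matrices, and the token embedding and untied LM head each contribute vocab×hidden parameters:

```python
hidden, intermediate = 2560, 10240
layers, vocab = 32, 51200

# Per layer: Q, K, V, O projections (4 x hidden^2) + MLP up/down projections
per_layer = 4 * hidden**2 + 2 * hidden * intermediate

# Token embedding plus an untied LM head (vocab x hidden each);
# biases and layer norms are omitted from this rough count
total = layers * per_layer + 2 * vocab * hidden

print(f"{total / 1e9:.2f}B")  # 2.78B, matching the model card
```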

File Structure

PHI-2-STEM-261125/
├── config.json              # Model configuration
├── model.safetensors        # Model weights (F16)
├── tokenizer.json           # Tokenizer vocabulary
├── tokenizer_config.json    # Tokenizer configuration
├── special_tokens_map.json  # Special tokens mapping
└── README.md                # This model card

Dependencies

transformers>=4.35.0
torch>=2.0.0
accelerate>=0.24.0
bitsandbytes>=0.41.0  # For INT8 quantization
safetensors>=0.4.0

Evaluation

Training Evaluation

Metric Value Notes
Final Loss 1.54 After 5 epochs
Loss Reduction 26% From initial 2.07
Convergence Yes Consistent decrease

Qualitative Evaluation

The model was evaluated on:

  • Factual Accuracy: High for trained domains
  • Coherence: Strong sentence-level coherence
  • Relevance: Good adherence to prompts
  • Completeness: Adequate coverage of key concepts

Recommended Benchmarks

For comprehensive evaluation, consider:

Benchmark Purpose Expected Performance
MMLU (STEM subset) Multi-task knowledge Improved on base
GSM8K Mathematical reasoning Baseline
ARC Challenge Scientific reasoning Improved
SciQ Science questions Improved

Citation

BibTeX

@misc{molina_burgos_2025,
    author       = {Molina Burgos, Francisco},
    title        = {{PHI-2-STEM-261125} (Revision 54c4d49)},
    year         = 2025,
    url          = {https://huggingface.co/Yatro/PHI-2-STEM-261125},
    doi          = {10.57967/hf/7105},
    publisher    = {Hugging Face}
}

APA

Molina Burgos, F. (2025). PHI-2-STEM-261125 (Version 54c4d49) [Large language model]. Hugging Face. https://doi.org/10.57967/hf/7105


Related Work

Base Model

  • Phi-2: microsoft/phi-2
    • 2.7B parameter model trained on synthetic and web data
    • Strong performance on reasoning benchmarks

Related Research

  • Gunasekar, S., et al. (2023). "Textbooks Are All You Need"
  • Li, Y., et al. (2023). "Textbooks Are All You Need II: phi-1.5 Technical Report"

Acknowledgments

  • Microsoft Research for the Phi-2 base model
  • Hugging Face for the transformers library and model hosting
  • BitsAndBytes team for efficient INT8 quantization
  • The open-source ML community for tools and inspiration

Version History

Version Date Changes
1.0.0 2025-11-26 Initial release (5 epochs, loss 1.54)

Contact & Support

For questions, feedback, or issues, contact Francisco Molina Burgos at pako.molina@gmail.com, or open a discussion on the model's Hugging Face page.

Made with dedication for the advancement of AI in STEM education

Licensed under MIT - Free to use, modify, and distribute
