---
base_model: Qwen/Qwen2-VL-7B
library_name: peft
pipeline_tag: image-text-to-text
tags:
- base_model:adapter:Qwen/Qwen2-VL-7B
- lora
- qwen2_vl
- multimodal
- transformers
license: apache-2.0
language:
- en
---

# MATRIX-PT

MATRIX-PT is a parameter-efficient LoRA adapter released by **Radical AI** for **Qwen/Qwen2-VL-7B**. It is designed to study post-training adaptation for materials science tasks, with a focus on theoretical reasoning, scientific problem solving, and multimodal reasoning over experimental images.

This model is released alongside the **MATRIX** benchmark ([dataset link](https://huggingface.co/datasets/radical-ai/MATRIX)), which is used to evaluate reasoning across text- and image-based materials science tasks.

---

## Model Details

### Model Description

- **Developed by:** Radical AI
- **Model type:** LoRA adapter (PEFT) for a multimodal transformer
- **Base model:** `Qwen/Qwen2-VL-7B`
- **Language(s):** English
- **License:** Apache-2.0 (adapter); the base model's license applies to `Qwen/Qwen2-VL-7B`
- **Finetuned from model:** `Qwen/Qwen2-VL-7B`

MATRIX-PT modifies the base model through lightweight post-training to better surface domain-relevant reasoning patterns in materials science. The adapter primarily affects inference-time behavior, improving the model's ability to reason about structured scientific concepts and experimental imagery without altering the underlying base weights.
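Conceptually, the LoRA mechanism used here can be sketched in a few lines (an illustrative toy example with made-up dimensions, not the adapter's actual weights): each targeted linear layer keeps its frozen base weight `W` and learns a low-rank update `B @ A`, scaled by `alpha / r`.

```python
import torch

# Toy sketch of a LoRA update (illustrative dimensions; not the adapter's real weights).
d_out, d_in, r, alpha = 8, 8, 2, 32
W = torch.randn(d_out, d_in)       # frozen base weight
A = torch.randn(r, d_in) * 0.01    # trainable down-projection
B = torch.zeros(d_out, r)          # trainable up-projection, initialized to zero

x = torch.randn(d_in)
y = (W + (alpha / r) * B @ A) @ x  # adapted forward pass

# Because B starts at zero, the adapter initially leaves the base model unchanged.
assert torch.allclose(y, W @ x)
```

Loading the adapter with PEFT applies this kind of update to the target modules while the base weights stay untouched.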
### Model Sources

- **Repository:** https://huggingface.co/radical-ai/MATRIX-PT
- **Paper:** *[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)*
- **Benchmark:** https://huggingface.co/datasets/radical-ai/MATRIX

---

## Uses

### Direct Use

MATRIX-PT is intended for:

- Evaluating multimodal reasoning in materials science
- Studying post-training effects on scientific reasoning behavior
- Benchmarking model performance on theory-driven and experiment-driven tasks using MATRIX

The adapter can be loaded on top of `Qwen/Qwen2-VL-7B` using PEFT without modifying the base model weights.

### Downstream Use

The adapter may be used as a starting point for:

- Further domain-specific fine-tuning
- Diagnostic studies of reasoning behavior in scientific models
- Comparative evaluation against other multimodal or domain-adapted models

### Out-of-Scope Use

MATRIX-PT is **not** intended for:

- General-purpose conversational use
- High-stakes decision making (e.g., medical, legal, industrial control)
- Deployment without human oversight in safety-critical settings

---

## Bias, Risks, and Limitations

- MATRIX-PT inherits limitations and biases from the base model, including potential hallucinations and incorrect reasoning.
- The adapter is trained and evaluated on a focused materials science benchmark and may not generalize outside this domain.
- Performance improvements are task- and prompt-dependent and should not be interpreted as broad scientific understanding.
- As with most LLMs/VLMs, the model may produce plausible-sounding but incorrect explanations.
### Recommendations

Users should:

- Treat outputs as assistive rather than authoritative
- Validate results against domain expertise or ground truth
- Use MATRIX-PT primarily for evaluation, analysis, and research purposes

---

## How to Get Started with the Model

### Install

**Tested versions:**

```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0"
pip install "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0"
pip install "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```

**Or install all at once:**

```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0" "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0" "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```

Note: the version specifiers are quoted so the shell does not interpret `>` as output redirection.

### Load the Adapter

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

DEFAULT_EOS_TOKEN = "</s>"
DEFAULT_BOS_TOKEN = "<s>"
DEFAULT_UNK_TOKEN = "<unk>"


def align_tokenizer_and_model(tokenizer, model):
    """
    Ensure required special tokens exist and resize embeddings to match the
    tokenizer vocab. This is necessary because the adapter was trained with
    this alignment.
    """
    special_tokens = {}

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if tokenizer.eos_token is None:
        special_tokens["eos_token"] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens["bos_token"] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens["unk_token"] = DEFAULT_UNK_TOKEN

    num_new_tokens = tokenizer.add_special_tokens(special_tokens)

    if num_new_tokens > 0 or model.get_input_embeddings().weight.shape[0] != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeds = model.get_input_embeddings().weight.data
        output_embeds = model.get_output_embeddings().weight.data

        if tokenizer.unk_token_id is not None:
            # Initialize new rows from the UNK embedding when available
            input_init = input_embeds[tokenizer.unk_token_id].unsqueeze(0)
            output_init = output_embeds[tokenizer.unk_token_id].unsqueeze(0)
        else:
            # Otherwise fall back to the mean of the existing embeddings
            input_init = input_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_init = output_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)

        input_embeds[-num_new_tokens:] = input_init
        output_embeds[-num_new_tokens:] = output_init


# Model IDs
base_model_id = "Qwen/Qwen2-VL-7B"
adapter_id = "radical-ai/MATRIX-PT"

# Load processor from base model
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer = processor.tokenizer
tokenizer.padding_side = "left"
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Use the Instruct processor's chat template (the base model's template has issues)
instruct_processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True
)
processor.chat_template = instruct_processor.chat_template
tokenizer.chat_template = instruct_processor.tokenizer.chat_template

# Load base model
model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# IMPORTANT: Align tokenizer and model before loading the adapter
align_tokenizer_and_model(tokenizer, model)
# Load adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```

### Run Inference

```python
# Text-only inference
question = "What is a phase diagram?"
messages = [{"role": "user", "content": question}]

rendered = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer([rendered], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the new tokens
input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()

print(response)
```

### With Images

```python
from PIL import Image

# Load image
image = Image.open("path/to/image.png").convert("RGB")

# Create message with image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this experimental image."},
        ],
    }
]

# Process with image
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Convert pixel_values to bfloat16 if present
if "pixel_values" in inputs:
    inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()

print(response)
```

## Training Details

### Training Data

The adapter was trained on a curated materials science dataset emphasizing:

- Foundational theory questions
- Research-level reasoning
- Hypothesis generation
- Multimodal reasoning over experimental imagery

For evaluation details, see the [MATRIX dataset](https://huggingface.co/datasets/radical-ai/MATRIX) card and the accompanying paper.

### Training Procedure

- Method: LoRA (parameter-efficient fine-tuning)
- LoRA rank (r): 8
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Objective: Improve accessibility of materials-science-relevant reasoning patterns during inference
- Training regime: Mixed precision (bf16)

## Evaluation

### Testing Data

MATRIX-PT is benchmarked on the **MATRIX** dataset, which consists of both textual and visual reasoning tasks in materials science. Evaluation compares the adapted model against the base `Qwen/Qwen2-VL-7B` model under identical prompting and decoding settings.

### Metrics

- Task accuracy
- Reasoning consistency across related prompts
- Qualitative error analysis (see the accompanying paper)

## Results

Across MATRIX tasks, MATRIX-PT demonstrates improved performance relative to the base model, particularly on:

- Theory-driven reasoning questions
- Structured scientific problem solving
- Interpretation of experimental images

These improvements primarily manifest at inference time, highlighting the role of post-training in shaping reasoning accessibility rather than training-time memorization alone.

## Citation

If you use this model or the MATRIX benchmark, please cite the accompanying paper:

[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)

### BibTeX

```bibtex
@article{mcgrath2026matrix,
  title   = {MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science},
  author  = {McGrath, Delia and Chong, Curtis and Kulkarni, Rohil and Ceder, Gerbrand and Kolluru, Adeesh},
  journal = {arXiv preprint arXiv:2602.00376},
  year    = {2026}
}
```

### Framework Versions

- PEFT: 0.18.0
- Transformers: 4.56.0+
- PyTorch: 2.0.0+
- Python: 3.10+
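### Reference LoRA Configuration

The hyperparameters listed under *Training Procedure* correspond to a PEFT `LoraConfig` along these lines. This is a sketch reconstructed from the card, not the released training configuration; in particular, `task_type` and `bias` are assumptions.

```python
from peft import LoraConfig

# Reconstruction of the adapter's LoRA configuration from the card's
# stated hyperparameters (task_type and bias are assumptions).
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```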