---
base_model: Qwen/Qwen2-VL-7B
library_name: peft
pipeline_tag: image-text-to-text
tags:
- base_model:adapter:Qwen/Qwen2-VL-7B
- lora
- qwen2_vl
- multimodal
- transformers
license: apache-2.0
language:
- en
---
# MATRIX-PT
MATRIX-PT is a parameter-efficient LoRA adapter released by **Radical AI** for **Qwen/Qwen2-VL-7B**. It is designed to study post-training adaptations for materials science tasks, with a focus on theoretical reasoning, scientific problem solving, and multimodal reasoning over experimental images.
This model is released alongside the **MATRIX** benchmark ([dataset link](https://huggingface.co/datasets/radical-ai/MATRIX)), which is used to evaluate reasoning across text- and image-based materials science tasks.
---
## Model Details
### Model Description
- **Developed by:** Radical AI
- **Model type:** LoRA adapter (PEFT) for a multimodal transformer
- **Base model:** `Qwen/Qwen2-VL-7B`
- **Language(s):** English
- **License:** Apache-2.0 (adapter); base model license applies to `Qwen/Qwen2-VL-7B`
- **Finetuned from model:** `Qwen/Qwen2-VL-7B`
MATRIX-PT modifies the base model through lightweight post-training to better surface domain-relevant reasoning patterns in materials science. The adapter primarily affects inference-time behavior, improving the model's ability to reason about structured scientific concepts and experimental imagery without altering the underlying base weights.
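For intuition, LoRA keeps each targeted weight matrix frozen and learns a low-rank update alongside it; at inference the effective weight is the frozen base plus a scaled low-rank product. A self-contained numerical sketch (illustrative shapes, not the model's actual dimensions):

```python
import torch

# Conceptual LoRA sketch: the frozen base weight W is never modified; the
# adapter learns low-rank factors A and B, and inference uses W plus the
# scaled low-rank product (alpha / r) * B @ A.
d_out, d_in, r, alpha = 64, 64, 8, 32
W = torch.randn(d_out, d_in)        # frozen base weight
A = torch.randn(r, d_in) * 0.01     # trainable down-projection
B = torch.zeros(d_out, r)           # trainable up-projection (zero-init)
W_effective = W + (alpha / r) * (B @ A)
```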
### Model Sources
- **Repository:** https://huggingface.co/radical-ai/MATRIX-PT
- **Paper:** [MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)
- **Benchmark:** https://huggingface.co/datasets/radical-ai/MATRIX
---
## Uses
### Direct Use
MATRIX-PT is intended for:
- Evaluating multimodal reasoning in materials science
- Studying post-training effects on scientific reasoning behavior
- Benchmarking model performance on theory-driven and experiment-driven tasks using MATRIX
The adapter can be loaded on top of `Qwen/Qwen2-VL-7B` using PEFT without modifying the base model weights.
### Downstream Use
The adapter may be used as a starting point for:
- Further domain-specific fine-tuning
- Diagnostic studies of reasoning behavior in scientific models
- Comparative evaluation against other multimodal or domain-adapted models
### Out-of-Scope Use
MATRIX-PT is **not** intended for:
- General-purpose conversational use
- High-stakes decision making (e.g., medical, legal, industrial control)
- Deployment without human oversight in safety-critical settings
---
## Bias, Risks, and Limitations
- MATRIX-PT inherits limitations and biases from the base model, including potential hallucinations and incorrect reasoning.
- The adapter is trained and evaluated on a focused materials science benchmark and may not generalize outside this domain.
- Performance improvements are task- and prompt-dependent and should not be interpreted as broad scientific understanding.
- As with most LLMs/VLMs, the model may produce plausible-sounding but incorrect explanations.
### Recommendations
Users should:
- Treat outputs as assistive rather than authoritative
- Validate results against domain expertise or ground truth
- Use MATRIX-PT primarily for evaluation, analysis, and research purposes
---
## How to Get Started with the Model
### Install
**Tested versions:**
```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0"
pip install "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0"
pip install "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```
**Or install all at once:**
```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0" "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0" "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```
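To confirm the environment meets these minimums, a quick check:

```python
import peft
import torch
import transformers

# Print installed versions; compare against the minimums listed above.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
```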
### Load the Adapter
```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel
DEFAULT_EOS_TOKEN = "</s>"
DEFAULT_BOS_TOKEN = "<s>"
DEFAULT_UNK_TOKEN = "<unk>"
def align_tokenizer_and_model(tokenizer, model):
    """
    Ensure required special tokens exist and resize embeddings to match the
    tokenizer vocabulary. This is necessary because the adapter was trained
    with this alignment.
    """
    special_tokens = {}
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if tokenizer.eos_token is None:
        special_tokens["eos_token"] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens["bos_token"] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens["unk_token"] = DEFAULT_UNK_TOKEN
    num_new_tokens = tokenizer.add_special_tokens(special_tokens)
    if num_new_tokens > 0 or model.get_input_embeddings().weight.shape[0] != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))
    if num_new_tokens > 0:
        input_embeds = model.get_input_embeddings().weight.data
        output_embeds = model.get_output_embeddings().weight.data
        if tokenizer.unk_token_id is not None:
            # Initialize new embedding rows from the <unk> embedding when available
            input_init = input_embeds[tokenizer.unk_token_id].unsqueeze(0)
            output_init = output_embeds[tokenizer.unk_token_id].unsqueeze(0)
        else:
            # Otherwise initialize from the mean of the pre-existing embeddings
            input_init = input_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_init = output_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)
        input_embeds[-num_new_tokens:] = input_init
        output_embeds[-num_new_tokens:] = output_init
# Model IDs
base_model_id = "Qwen/Qwen2-VL-7B"
adapter_id = "radical-ai/MATRIX-PT"
# Load processor from base model
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer = processor.tokenizer
tokenizer.padding_side = "left"
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
# Use the Instruct variant's chat template (the base model's own template is
# not suited to chat-style prompting)
instruct_processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    trust_remote_code=True,
)
processor.chat_template = instruct_processor.chat_template
tokenizer.chat_template = instruct_processor.tokenizer.chat_template
# Load base model
model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# IMPORTANT: Align tokenizer and model before loading adapter
align_tokenizer_and_model(tokenizer, model)
# Load adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```
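Optionally, if you prefer a standalone checkpoint over loading base + adapter each time, `peft` can merge the LoRA weights into the in-memory model. A sketch (the output directory name is illustrative; the published base checkpoint itself is untouched):

```python
# Optional: merge the LoRA weights into the base model for adapter-free serving.
# This only modifies the in-memory copy of the weights.
merged = model.merge_and_unload()
merged.save_pretrained("matrix-pt-merged")      # illustrative output path
processor.save_pretrained("matrix-pt-merged")
```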
### Run Inference
```python
# Text-only inference
question = "What is a phase diagram?"
messages = [{"role": "user", "content": question}]
rendered = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([rendered], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )
# Decode only the new tokens
input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()
print(response)
```
### With Images
```python
from PIL import Image
# Load image
image = Image.open("path/to/image.png").convert("RGB")
# Create message with image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this experimental image."},
        ],
    }
]
# Process with image
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
# Convert pixel_values to bfloat16 if present
if "pixel_values" in inputs:
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )
input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()
print(response)
```
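The install list above includes `qwen-vl-utils`; if you prefer the upstream Qwen2-VL message format, where the image reference lives inside the message itself, an equivalent sketch (standard Qwen2-VL usage, not specific to this adapter) is:

```python
from qwen_vl_utils import process_vision_info

# Embed the image reference in the message and let qwen-vl-utils gather the
# visual inputs; generation then proceeds exactly as in the example above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.png"},
            {"type": "text", "text": "Describe this experimental image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[prompt], images=image_inputs, return_tensors="pt")
```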
## Training Details
### Training Data
The adapter was trained using a curated materials science dataset emphasizing:
- Foundational theory questions
- Research-level reasoning
- Hypothesis generation
- Multimodal reasoning over experimental imagery
For evaluation details, see the [MATRIX dataset](https://huggingface.co/datasets/radical-ai/MATRIX) card and accompanying paper.
### Training Procedure
- Method: LoRA (parameter-efficient fine-tuning)
- LoRA rank (r): 8
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Objective: Improve accessibility of materials science-relevant reasoning patterns during inference
- Training regime: Mixed precision (bf16)
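The training script is not released; the following `peft` configuration is a reconstruction from the hyperparameters above and should be treated as illustrative, not authoritative:

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above; task_type is assumed.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```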
## Evaluation
### Testing Data
MATRIX-PT is benchmarked on the **MATRIX** dataset, which consists of both textual and visual reasoning tasks in materials science. Evaluation compares the adapted model against the base `Qwen/Qwen2-VL-7B` model under identical prompting and decoding settings.
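For reference, a minimal sketch of pulling the benchmark for a side-by-side comparison (split and field names are not documented here; consult the dataset card before iterating):

```python
from datasets import load_dataset

# Load the MATRIX benchmark and inspect its splits and features first.
matrix = load_dataset("radical-ai/MATRIX")
print(matrix)
```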
### Metrics
- Task accuracy
- Reasoning consistency across related prompts
- Qualitative error analysis (see accompanying paper)
## Results
Across MATRIX tasks, MATRIX-PT demonstrates improved performance relative to the base model, particularly on:
- Theory-driven reasoning questions
- Structured scientific problem solving
- Interpretation of experimental images
These improvements primarily manifest at inference time, highlighting the role of post-training in shaping reasoning accessibility rather than training-time memorization alone.
## Citation
If you use this model or the MATRIX benchmark, please cite the accompanying paper:
[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)
### Bibtex
```bibtex
@article{mcgrath2026matrix,
title = {MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science},
author = {McGrath, Delia and Chong, Curtis and Kulkarni, Rohil and Ceder, Gerbrand and Kolluru, Adeesh},
journal = {arXiv preprint arXiv:2602.00376},
year = {2026}
}
```
## Framework Versions
- PEFT: 0.18.0
- Transformers: 4.56.0+
- PyTorch: 2.0.0+
- Python: 3.10+