---
base_model: Qwen/Qwen2-VL-7B
library_name: peft
pipeline_tag: image-text-to-text
tags:
- base_model:adapter:Qwen/Qwen2-VL-7B
- lora
- qwen2_vl
- multimodal
- transformers
license: apache-2.0
language:
- en
---

# MATRIX-PT

MATRIX-PT is a parameter-efficient LoRA adapter released by **Radical AI** for **Qwen/Qwen2-VL-7B**. It is designed to study post-training adaptations for materials science tasks, with a focus on theoretical reasoning, scientific problem solving, and multimodal reasoning over experimental images.

This model is released alongside the **MATRIX** benchmark ([dataset link](https://huggingface.co/datasets/radical-ai/MATRIX)), which is used to evaluate reasoning across text- and image-based materials science tasks.

---

## Model Details

### Model Description

- **Developed by:** Radical AI
- **Model type:** LoRA adapter (PEFT) for a multimodal transformer
- **Base model:** `Qwen/Qwen2-VL-7B`
- **Language(s):** English
- **License:** Apache-2.0 (adapter); the base model's license applies to `Qwen/Qwen2-VL-7B`
- **Finetuned from model:** `Qwen/Qwen2-VL-7B`

MATRIX-PT modifies the base model through lightweight post-training to better surface domain-relevant reasoning patterns in materials science. The adapter primarily affects inference-time behavior, improving the model's ability to reason about structured scientific concepts and experimental imagery without altering the underlying base weights.
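
The LoRA mechanism behind this can be sketched numerically: at inference, the adapter contributes a low-rank update scaled by `alpha / r` on top of each frozen base weight matrix. A toy illustration with tiny hand-made matrices (not actual model weights):

```python
# Toy illustration of a LoRA update: W_eff = W + (alpha / r) * (B @ A).
# The matrices here are small stand-ins, not real Qwen2-VL weights.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, r, alpha):
    """Frozen base weight W plus the scaled low-rank update (alpha / r) * (B @ A)."""
    scale = alpha / r
    BA = matmul(B, A)  # (d_out x r) @ (r x d_in) -> full-size update
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

# Rank-1 update on a 2x2 "weight": the base W is never modified,
# only the effective weight seen at inference changes.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
print(lora_effective_weight(W, A, B, r=1, alpha=2))  # [[2.0, 1.0], [2.0, 3.0]]
```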

### Model Sources

- **Repository:** https://huggingface.co/radical-ai/MATRIX-PT
- **Paper:** *[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)*
- **Benchmark:** https://huggingface.co/datasets/radical-ai/MATRIX

---

## Uses

### Direct Use

MATRIX-PT is intended for:

- Evaluating multimodal reasoning in materials science
- Studying post-training effects on scientific reasoning behavior
- Benchmarking model performance on theory-driven and experiment-driven tasks using MATRIX

The adapter can be loaded on top of `Qwen/Qwen2-VL-7B` with PEFT, without modifying the base model weights.

### Downstream Use

The adapter may serve as a starting point for:

- Further domain-specific fine-tuning
- Diagnostic studies of reasoning behavior in scientific models
- Comparative evaluation against other multimodal or domain-adapted models

### Out-of-Scope Use

MATRIX-PT is **not** intended for:

- General-purpose conversational use
- High-stakes decision making (e.g., medical, legal, industrial control)
- Deployment without human oversight in safety-critical settings

---

## Bias, Risks, and Limitations

- MATRIX-PT inherits limitations and biases from the base model, including potential hallucinations and incorrect reasoning.
- The adapter is trained and evaluated on a focused materials science benchmark and may not generalize outside this domain.
- Performance improvements are task- and prompt-dependent and should not be interpreted as broad scientific understanding.
- As with most LLMs/VLMs, the model may produce plausible-sounding but incorrect explanations.

### Recommendations

Users should:

- Treat outputs as assistive rather than authoritative
- Validate results against domain expertise or ground truth
- Use MATRIX-PT primarily for evaluation, analysis, and research purposes

---

## How to Get Started with the Model

### Install

**Tested versions:**

```bash
# Quote the version specifiers so the shell does not treat ">" as a redirect.
pip install "torch>=2.0.0" "torchvision>=0.15.0"
pip install "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0"
pip install "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```

**Or install everything at once:**

```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0" "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0" "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```

### Load the Adapter

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

DEFAULT_EOS_TOKEN = "</s>"
DEFAULT_BOS_TOKEN = "<s>"
DEFAULT_UNK_TOKEN = "<unk>"


def align_tokenizer_and_model(tokenizer, model):
    """
    Ensure required special tokens exist and resize embeddings to match the tokenizer vocab.
    This is necessary because the adapter was trained with this alignment.
    """
    special_tokens = {}
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if tokenizer.eos_token is None:
        special_tokens["eos_token"] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens["bos_token"] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens["unk_token"] = DEFAULT_UNK_TOKEN

    num_new_tokens = tokenizer.add_special_tokens(special_tokens)
    if num_new_tokens > 0 or model.get_input_embeddings().weight.shape[0] != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))
    if num_new_tokens > 0:
        input_embeds = model.get_input_embeddings().weight.data
        output_embeds = model.get_output_embeddings().weight.data

        # Initialize the new embedding rows from the <unk> embedding if available,
        # otherwise from the mean of the existing embeddings.
        if tokenizer.unk_token_id is not None:
            input_init = input_embeds[tokenizer.unk_token_id].unsqueeze(0)
            output_init = output_embeds[tokenizer.unk_token_id].unsqueeze(0)
        else:
            input_init = input_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_init = output_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)

        input_embeds[-num_new_tokens:] = input_init
        output_embeds[-num_new_tokens:] = output_init


# Model IDs
base_model_id = "Qwen/Qwen2-VL-7B"
adapter_id = "radical-ai/MATRIX-PT"

# Load the processor from the base model
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer = processor.tokenizer
tokenizer.padding_side = "left"
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Use the Instruct processor for the chat template (the base model's template has issues)
instruct_processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    trust_remote_code=True,
)
processor.chat_template = instruct_processor.chat_template
tokenizer.chat_template = instruct_processor.tokenizer.chat_template

# Load the base model
model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# IMPORTANT: Align tokenizer and model before loading the adapter
align_tokenizer_and_model(tokenizer, model)

# Load the adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```

### Run Inference

```python
# Text-only inference
question = "What is a phase diagram?"
messages = [{"role": "user", "content": question}]

rendered = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([rendered], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the new tokens
input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()

print(response)
```

### With Images

```python
from PIL import Image

# Load the image
image = Image.open("path/to/image.png").convert("RGB")

# Create a message with an image placeholder
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this experimental image."},
        ],
    }
]

# Process text and image together
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Convert pixel_values to bfloat16 if present, to match the model dtype
if "pixel_values" in inputs:
    inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()

print(response)
```

## Training Details

### Training Data

The adapter was trained on a curated materials science dataset emphasizing:

- Foundational theory questions
- Research-level reasoning
- Hypothesis generation
- Multimodal reasoning over experimental imagery

For evaluation details, see the [MATRIX dataset](https://huggingface.co/datasets/radical-ai/MATRIX) card and the accompanying paper.

### Training Procedure

- Method: LoRA (parameter-efficient fine-tuning)
- LoRA rank (r): 8
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Objective: Improve accessibility of materials-science-relevant reasoning patterns during inference
- Training regime: Mixed precision (bf16)
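
With rank r = 8, each adapted matrix of shape (d_in, d_out) contributes r * (d_in + d_out) trainable parameters. A rough per-layer calculator (the hidden, KV, and intermediate sizes below are assumptions about Qwen2-VL-7B for illustration; check the actual model config):

```python
def lora_param_count(shapes, r=8):
    """Trainable LoRA params: each adapted (d_in, d_out) matrix adds r * (d_in + d_out)."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)

# Assumed per-layer shapes (hidden=3584, grouped-query KV dim=512, MLP intermediate=18944);
# illustrative values only -- verify against the Qwen/Qwen2-VL-7B config.
hidden, inter, kv = 3584, 18944, 512
per_layer = [
    (hidden, hidden),  # q_proj
    (hidden, kv),      # k_proj
    (hidden, kv),      # v_proj
    (hidden, hidden),  # o_proj
    (hidden, inter),   # gate_proj
    (hidden, inter),   # up_proj
    (inter, hidden),   # down_proj
]
print(lora_param_count(per_layer))  # trainable params per transformer layer
```

Multiplying by the number of layers gives the adapter's total trainable parameters, a tiny fraction of the 7B base model.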

## Evaluation

### Testing Data

MATRIX-PT is benchmarked on the **MATRIX** dataset, which consists of both textual and visual reasoning tasks in materials science. Evaluation compares the adapted model against the base `Qwen/Qwen2-VL-7B` model under identical prompting and decoding settings.

### Metrics

- Task accuracy
- Reasoning consistency across related prompts
- Qualitative error analysis (see the accompanying paper)
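
For short-answer items, task accuracy is typically computed as normalized exact match between predictions and gold answers. A minimal sketch; the normalization rules here are illustrative and not the paper's exact scoring procedure:

```python
import re

def normalize(text):
    """Lowercase and strip punctuation/extra whitespace for exact-match comparison."""
    text = text.lower().strip()
    text = re.sub(r"[^\w\s.%-]", "", text)  # keep word chars, spaces, '.', '%', '-'
    return re.sub(r"\s+", " ", text)

def exact_match_accuracy(predictions, references):
    """Fraction of items where the normalized prediction equals the normalized gold answer."""
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)

print(exact_match_accuracy(["FCC", "bcc"], ["fcc", "HCP"]))  # 0.5
```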

## Results

Across MATRIX tasks, MATRIX-PT demonstrates improved performance relative to the base model, particularly on:

- Theory-driven reasoning questions
- Structured scientific problem solving
- Interpretation of experimental images

These improvements manifest primarily at inference time, highlighting the role of post-training in shaping reasoning accessibility rather than training-time memorization alone.

## Citation

If you use this model or the MATRIX benchmark, please cite the accompanying paper:

[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)

### BibTeX

```
@article{mcgrath2026matrix,
  title   = {MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science},
  author  = {McGrath, Delia and Chong, Curtis and Kulkarni, Rohil and Ceder, Gerbrand and Kolluru, Adeesh},
  journal = {arXiv preprint arXiv:2602.00376},
  year    = {2026}
}
```

### Framework Versions

- PEFT: 0.18.0
- Transformers: 4.56.0+
- PyTorch: 2.0.0+
- Python: 3.10+