---
base_model: Qwen/Qwen2-VL-7B
library_name: peft
pipeline_tag: image-text-to-text
tags:
- base_model:adapter:Qwen/Qwen2-VL-7B
- lora
- qwen2_vl
- multimodal
- transformers
license: apache-2.0
language:
- en
---

# MATRIX-PT

MATRIX-PT is a parameter-efficient LoRA adapter released by **Radical AI** for **Qwen/Qwen2-VL-7B**. It is designed to study post-training adaptation for materials science tasks, with a focus on theoretical reasoning, scientific problem solving, and multimodal reasoning over experimental images.

This model is released alongside the **MATRIX** benchmark ([dataset link](https://huggingface.co/datasets/radical-ai/MATRIX)), which is used to evaluate reasoning across text- and image-based materials science tasks.

---

## Model Details

### Model Description

- **Developed by:** Radical AI
- **Model type:** LoRA adapter (PEFT) for a multimodal transformer
- **Base model:** `Qwen/Qwen2-VL-7B`
- **Language(s):** English
- **License:** Apache-2.0 (adapter); the base model's license applies to `Qwen/Qwen2-VL-7B`
- **Finetuned from model:** `Qwen/Qwen2-VL-7B`

MATRIX-PT modifies the base model through lightweight post-training to better surface domain-relevant reasoning patterns in materials science. The adapter primarily affects inference-time behavior, improving the model's ability to reason about structured scientific concepts and experimental imagery without altering the underlying base weights.
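Conceptually, the LoRA mechanism used here can be sketched in a few lines (an illustrative toy example with made-up dimensions, not the adapter's actual weights): each targeted linear layer keeps its frozen base weight `W` and learns a low-rank update `B @ A`, scaled by `alpha / r`.

```python
import torch

# Toy sketch of a LoRA update (illustrative dimensions; not the adapter's real weights).
d_out, d_in, r, alpha = 8, 8, 2, 32
W = torch.randn(d_out, d_in)       # frozen base weight
A = torch.randn(r, d_in) * 0.01    # trainable down-projection
B = torch.zeros(d_out, r)          # trainable up-projection, initialized to zero

x = torch.randn(d_in)
y = (W + (alpha / r) * B @ A) @ x  # adapted forward pass

# Because B starts at zero, the adapter initially leaves the base model unchanged.
assert torch.allclose(y, W @ x)
```

Loading the adapter with PEFT applies this kind of update to the target modules while the base weights stay untouched.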
### Model Sources

- **Repository:** https://huggingface.co/radical-ai/MATRIX-PT
- **Paper:** *[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)*
- **Benchmark:** https://huggingface.co/datasets/radical-ai/MATRIX

---

## Uses

### Direct Use

MATRIX-PT is intended for:

- Evaluating multimodal reasoning in materials science
- Studying post-training effects on scientific reasoning behavior
- Benchmarking model performance on theory-driven and experiment-driven tasks using MATRIX

The adapter can be loaded on top of `Qwen/Qwen2-VL-7B` using PEFT without modifying the base model weights.

### Downstream Use

The adapter may be used as a starting point for:

- Further domain-specific fine-tuning
- Diagnostic studies of reasoning behavior in scientific models
- Comparative evaluation against other multimodal or domain-adapted models

### Out-of-Scope Use

MATRIX-PT is **not** intended for:

- General-purpose conversational use
- High-stakes decision making (e.g., medical, legal, industrial control)
- Deployment without human oversight in safety-critical settings

---

## Bias, Risks, and Limitations

- MATRIX-PT inherits limitations and biases from the base model, including potential hallucinations and incorrect reasoning.
- The adapter is trained and evaluated on a focused materials science benchmark and may not generalize outside this domain.
- Performance improvements are task- and prompt-dependent and should not be interpreted as broad scientific understanding.
- As with most LLMs/VLMs, the model may produce plausible-sounding but incorrect explanations.
### Recommendations

Users should:

- Treat outputs as assistive rather than authoritative
- Validate results against domain expertise or ground truth
- Use MATRIX-PT primarily for evaluation, analysis, and research purposes

---

## How to Get Started with the Model

### Install

**Tested versions:**

```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0"
pip install "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0"
pip install "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```

**Or install all at once:**

```bash
pip install "torch>=2.0.0" "torchvision>=0.15.0" "transformers>=4.56.0" "peft>=0.17.0" "accelerate>=1.10.0" "pillow>=10.0.0" "qwen-vl-utils>=0.0.8"
```

Note: the version specifiers are quoted so the shell does not interpret `>` as output redirection.

### Load the Adapter

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from peft import PeftModel

DEFAULT_EOS_TOKEN = "</s>"
DEFAULT_BOS_TOKEN = "<s>"
DEFAULT_UNK_TOKEN = "<unk>"


def align_tokenizer_and_model(tokenizer, model):
    """
    Ensure required special tokens exist and resize embeddings to match the
    tokenizer vocab. This is necessary because the adapter was trained with
    this alignment.
    """
    special_tokens = {}

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if tokenizer.eos_token is None:
        special_tokens["eos_token"] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens["bos_token"] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens["unk_token"] = DEFAULT_UNK_TOKEN

    num_new_tokens = tokenizer.add_special_tokens(special_tokens)

    if num_new_tokens > 0 or model.get_input_embeddings().weight.shape[0] != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeds = model.get_input_embeddings().weight.data
        output_embeds = model.get_output_embeddings().weight.data

        if tokenizer.unk_token_id is not None:
            # Initialize new rows from the UNK embedding when available
            input_init = input_embeds[tokenizer.unk_token_id].unsqueeze(0)
            output_init = output_embeds[tokenizer.unk_token_id].unsqueeze(0)
        else:
            # Otherwise fall back to the mean of the existing embeddings
            input_init = input_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)
            output_init = output_embeds[:-num_new_tokens].mean(dim=0, keepdim=True)

        input_embeds[-num_new_tokens:] = input_init
        output_embeds[-num_new_tokens:] = output_init


# Model IDs
base_model_id = "Qwen/Qwen2-VL-7B"
adapter_id = "radical-ai/MATRIX-PT"

# Load processor from base model
processor = AutoProcessor.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer = processor.tokenizer
tokenizer.padding_side = "left"
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Use the Instruct processor's chat template (the base model's template has issues)
instruct_processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", trust_remote_code=True
)
processor.chat_template = instruct_processor.chat_template
tokenizer.chat_template = instruct_processor.tokenizer.chat_template

# Load base model
model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# IMPORTANT: Align tokenizer and model before loading the adapter
align_tokenizer_and_model(tokenizer, model)
# Load adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```

### Run Inference

```python
# Text-only inference
question = "What is a phase diagram?"
messages = [{"role": "user", "content": question}]

rendered = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer([rendered], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the new tokens
input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()

print(response)
```

### With Images

```python
from PIL import Image

# Load image
image = Image.open("path/to/image.png").convert("RGB")

# Create message with image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this experimental image."},
        ],
    }
]

# Process with image
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Convert pixel_values to bfloat16 if present
if "pixel_values" in inputs:
    inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

input_len = inputs["input_ids"].shape[1]
generated_ids = outputs[:, input_len:]
response = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0].strip()

print(response)
```

## Training Details

### Training Data

The adapter was trained on a curated materials science dataset emphasizing:

- Foundational theory questions
- Research-level reasoning
- Hypothesis generation
- Multimodal reasoning over experimental imagery

For evaluation details, see the [MATRIX dataset](https://huggingface.co/datasets/radical-ai/MATRIX) card and the accompanying paper.

### Training Procedure

- Method: LoRA (parameter-efficient fine-tuning)
- LoRA rank (r): 8
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Objective: Improve accessibility of materials-science-relevant reasoning patterns during inference
- Training regime: Mixed precision (bf16)

## Evaluation

### Testing Data

MATRIX-PT is benchmarked on the **MATRIX** dataset, which consists of both textual and visual reasoning tasks in materials science. Evaluation compares the adapted model against the base `Qwen/Qwen2-VL-7B` model under identical prompting and decoding settings.

### Metrics

- Task accuracy
- Reasoning consistency across related prompts
- Qualitative error analysis (see the accompanying paper)

## Results

Across MATRIX tasks, MATRIX-PT demonstrates improved performance relative to the base model, particularly on:

- Theory-driven reasoning questions
- Structured scientific problem solving
- Interpretation of experimental images

These improvements primarily manifest at inference time, highlighting the role of post-training in shaping reasoning accessibility rather than training-time memorization alone.

## Citation

If you use this model or the MATRIX benchmark, please cite the accompanying paper:

[MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science](https://www.arxiv.org/pdf/2602.00376)

### BibTeX

```bibtex
@article{mcgrath2026matrix,
  title   = {MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science},
  author  = {McGrath, Delia and Chong, Curtis and Kulkarni, Rohil and Ceder, Gerbrand and Kolluru, Adeesh},
  journal = {arXiv preprint arXiv:2602.00376},
  year    = {2026}
}
```

### Framework Versions

- PEFT: 0.18.0
- Transformers: 4.56.0+
- PyTorch: 2.0.0+
- Python: 3.10+
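### Reference LoRA Configuration

The hyperparameters listed under *Training Procedure* correspond to a PEFT `LoraConfig` along these lines. This is a sketch reconstructed from the card, not the released training configuration; in particular, `task_type` and `bias` are assumptions.

```python
from peft import LoraConfig

# Reconstruction of the adapter's LoRA configuration from the card's
# stated hyperparameters (task_type and bias are assumptions).
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```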