	# 📐 Math2Visual: Visual Language Generation Model

	This is the official model for generating structured visual language representations from math word problems, as proposed in:

	📄 [ACL 2025 Findings Paper — Math2Visual](https://arxiv.org/abs/2506.03735)

	🎥 [Project Video](https://youtu.be/jdPYVoHEPtk)

	📘 [Annotated Visual Language and Visual Dataset](https://huggingface.co/datasets/junling24/Math2Visual-Generating_Pedagogically_Meaningful_Visuals_for_Math_Word_Problems)

	💻 [GitHub Codebase](https://github.com/eth-lre/math2visual/tree/main)

	---

	## ✨ Model Summary

This model takes a math word problem (MWP) and its equation (formula) as input. It outputs a visual language string, which is then used to generate pedagogically meaningful visuals. The output follows a fixed, teacher-informed structure that describes the key mathematical relationships between entities, containers, and operations.

It is built by fine-tuning `meta-llama/Llama-3.1-8B` with LoRA using [PEFT](https://github.com/huggingface/peft), and it uses 4-bit quantization (BitsAndBytes) to reduce memory use. The code for rendering visuals from the visual language is available in our [GitHub repository](https://github.com/eth-lre/math2visual/tree/main).
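LoRA keeps the pretrained weight matrix `W0` frozen and learns a low-rank update, so an adapted layer computes `W0 x + (alpha / r) * B (A x)`. The following is a minimal pure-Python illustration of that update, not PEFT's implementation. The rank `r`, the scaling `alpha`, the toy matrices, and the helper names are hypothetical, because the adapter's actual LoRA hyperparameters are not listed here.

```python
# Pure-Python sketch of a LoRA-adapted linear layer (illustrative, not PEFT's code):
#   y = W0 x + (alpha / r) * B (A x)
# W0 (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w0, a, b, x, alpha, r):
    """Return W0 x plus the scaled low-rank update B (A x)."""
    base = matvec(w0, x)               # frozen pretrained projection
    update = matvec(b, matvec(a, x))   # rank-r path: d_in -> r -> d_out
    scale = alpha / r
    return [y + scale * dy for y, dy in zip(base, update)]


# Hypothetical toy sizes: d_in = 3, d_out = 2, rank r = 1.
w0 = [[1.0, 0.0, 2.0],
      [0.0, 1.0, -1.0]]
a = [[0.5, 0.5, 0.5]]
b_init = [[0.0], [0.0]]       # B starts at zero, so the adapter is a no-op before training
b_trained = [[1.0], [-2.0]]   # made-up values standing in for a trained B
x = [1.0, 2.0, 3.0]

print(lora_forward(w0, a, b_init, x, alpha=2, r=1))     # [7.0, -1.0]
print(lora_forward(w0, a, b_trained, x, alpha=2, r=1))  # [13.0, -13.0]
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model. Only the small `A` and `B` matrices are learned, which is why the inference code loads the base model and the adapter separately.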

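For intuition about 4-bit quantization, here is a simplified blockwise absmax scheme in pure Python. Each block of weights is stored as one float scale plus small integer codes, using 15 symmetric levels that fit in 4 bits. This is an assumption-laden simplification. The NF4 type selected in the inference code (`bnb_4bit_quant_type="nf4"`) uses a non-uniform code book derived from a normal distribution, and double quantization also compresses the per-block scales. Neither is modelled here, and the block size and helper names are hypothetical.

```python
# Simplified blockwise absmax quantization to 15 symmetric 4-bit levels.
# NOT NF4: real NF4 uses a non-uniform, normal-quantile code book, and double
# quantization also compresses the per-block scales. Helper names are hypothetical.


def quantize_block(block, levels=7):
    """Map floats to integer codes in [-levels, levels] plus one float scale."""
    absmax = max(abs(v) for v in block) or 1.0   # guard against all-zero blocks
    codes = [round(v / absmax * levels) for v in block]
    return codes, absmax


def dequantize_block(codes, absmax, levels=7):
    """Approximate the original floats from the codes and the block scale."""
    return [c / levels * absmax for c in codes]


def quantize(weights, block_size=4, levels=7):
    """Quantize a flat list of weights block by block."""
    return [quantize_block(weights[i:i + block_size], levels)
            for i in range(0, len(weights), block_size)]


weights = [0.1, -0.7, 0.3, 0.0, 2.0, -1.0, 0.5, 0.25]
for codes, absmax in quantize(weights):
    print(codes, absmax, dequantize_block(codes, absmax))
```

Rounding each code costs at most half a level, so the reconstruction error per weight is bounded by `absmax / (2 * levels)`. Smaller blocks give tighter scales at the cost of storing more of them.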

	---

	## 🧠 Example Use

	### 🔧 Install dependencies

```bash
pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 torchaudio==2.5.1+cu121 \
  bitsandbytes==0.45.0 inflect==7.3.1 lxml==5.3.0 ipython==8.25.0 python-dotenv==1.0.1 \
  git+https://github.com/huggingface/transformers.git@5fa35344755d8d9c29610b57d175efd03776ae9e \
  git+https://github.com/huggingface/peft.git@aa3f41f7529ed078e9225b2fc1edbb8c71f58f99
```

💡 The `+cu121` PyTorch builds are not published on PyPI. If pip cannot find them, add `-f https://download.pytorch.org/whl/torch_stable.html` to the command to install the CUDA wheels.
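After installation, you can confirm that the pinned versions were actually picked up by comparing them against `importlib.metadata`. This is a minimal sketch. The hypothetical `EXPECTED` table copies the pins from the command above. The two git-installed packages are left out because they are pinned by commit rather than by a version number.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the pip command above (git-installed packages omitted).
EXPECTED = {
    "torch": "2.5.1+cu121",
    "torchvision": "0.20.1+cu121",
    "torchaudio": "2.5.1+cu121",
    "bitsandbytes": "0.45.0",
    "inflect": "7.3.1",
    "lxml": "5.3.0",
    "ipython": "8.25.0",
    "python-dotenv": "1.0.1",
}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            problems[name] = (want, have)
    return problems


if __name__ == "__main__":
    for name, (want, have) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```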

---

### 🚀 Run Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4 (double quantization, bf16 compute)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "meta-llama/Llama-3.1-8B"
adapter_dir = "junling24/Math2Visual-Visual_Language_Generation"

base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the Math2Visual LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()
model.config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token
# device_map="auto" has already placed the model; send inputs to the same device
device = base.device

# Prompt
def create_prompt(mwp, formula=None):
    # NOTE: the instruction text is abbreviated ("...") in this README.
    return (
        '''You are an expert at converting math story problem into a structured 'visual language'...'''
        f"Question: {mwp}\n"
        f"Formula: {formula}\n"
        "Answer in visual language:"
    )

mwp = "Janet has nine oranges, and Sharon has seven oranges. How many oranges do Janet and Sharon have together?"
formula = "9 + 7 = 16"
prompt = create_prompt(mwp, formula)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048, padding="max_length").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.15,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (everything after the padded prompt)
prompt_len = inputs["input_ids"].shape[1]
visual_language = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()
print("Generated Visual Language:\n", visual_language)
```
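The tokenizer is loaded with `padding_side="left"`, which matters for decoder-only generation. New tokens are appended on the right, so every prompt in a batch should end at the same position. The following pure-Python sketch shows what left padding produces. The helpers `left_pad` and `generated_part` are hypothetical and are not tokenizer methods.

```python
# Hypothetical helpers illustrating left padding for decoder-only generation.


def left_pad(batch, pad_id):
    """Left-pad token-id lists to equal length; mask is 0 on padding, 1 on real tokens."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


def generated_part(output_ids, prompt_width):
    """Every padded prompt ends at column prompt_width, so new tokens start there."""
    return [row[prompt_width:] for row in output_ids]


ids, mask = left_pad([[11, 12, 13], [21]], pad_id=0)
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]

# Pretend the model appended one token (99 and 98) to each row.
fake_outputs = [row + [new] for row, new in zip(ids, [99, 98])]
print(generated_part(fake_outputs, len(ids[0])))  # [[99], [98]]
```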

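The `generate` call samples with `temperature=0.7` and `repetition_penalty=1.15`. This pure-Python sketch shows the idea behind these two settings on a single step of raw logits. Tokens that have already appeared get a CTRL-style penalty: positive logits are divided by the penalty and negative ones are multiplied by it. All logits are then divided by the temperature before the softmax. This is a simplification, not the Transformers implementation. It does not model any top-k or top-p filtering from the model's generation config, and the helper names are hypothetical.

```python
import math
import random

# Hypothetical single-step sketch of repetition penalty + temperature sampling.


def adjust_logits(logits, generated_ids, temperature=0.7, repetition_penalty=1.15):
    """Penalize already-generated tokens (CTRL-style), then scale by 1/temperature."""
    out = list(logits)
    for tok in set(generated_ids):
        # Positive logits shrink, negative logits become more negative: both lower the probability.
        out[tok] = out[tok] * repetition_penalty if out[tok] < 0 else out[tok] / repetition_penalty
    return [v / temperature for v in out]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, generated_ids, rng, **kwargs):
    """Draw the next token id from the adjusted distribution."""
    probs = softmax(adjust_logits(logits, generated_ids, **kwargs))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.5, 0.1, -1.0]
print(softmax(adjust_logits(logits, generated_ids=[])))
print(softmax(adjust_logits(logits, generated_ids=[0])))  # token 0 becomes less likely
print(sample_next(logits, [0], rng))
```

Lower temperatures sharpen the distribution toward the top token. A penalty above 1 discourages the model from repeating tokens it has already produced.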

---

## 📄 Citation

```bibtex
@inproceedings{wang2025math2visual,
  title={Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models},
  author={Wang, Junling and Rutkiewicz, Anna and Wang, April Yi and Sachan, Mrinmaya},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
  year={2025},
  url={https://arxiv.org/abs/2506.03735}
}
```


---

## 📬 License & Contact

	This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).

For research inquiries, please contact Junling Wang: 📧 wangjun [at] ethz [dot] ch