---
base_model: unsloth/qwen3-vl-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_vl
license: apache-2.0
language:
- en
- tr
datasets:
- ituperceptron/turkish-math-vqa
pipeline_tag: image-text-to-text
library_name: transformers
---

## Model Description

- Base Architecture: Qwen3-VL-8B-Instruct
- Fine-Tuning Method: QLoRA (PEFT)
- Language: Turkish
- Domain: High School Mathematics (12th Grade)
- Modality: Vision-Language (Image + Text → Text)

This model is a QLoRA fine-tuned version of Qwen3-VL-8B-Instruct trained on the Turkish-Math-VQA dataset, which consists of 12th-grade mathematics problems published by the Turkish Ministry of National Education (MEB).

The model is designed to:
- Understand mathematical problem images
- Generate step-by-step solutions in Turkish
- Handle topics such as logarithms, sequences & series, trigonometry, derivatives, and integrals

## Intended Use

**Primary Use Cases**:
- Turkish mathematical Visual Question Answering (VQA)
- Educational AI assistants
- Step-by-step solution generation
- Math tutoring systems
- Research in Turkish multimodal reasoning

## Out-of-Scope Use

- Professional exam grading without human validation
- Safety-critical mathematical applications
- Guaranteed mathematically verified reasoning

## Training Data

**Dataset**: Turkish-Math-VQA

The dataset contains mathematics problems from official 12th-grade exams prepared by the Turkish Ministry of National Education.

**Dataset Fields**:
- `test_number`: The test identifier
- `question_number`: Question number within the test
- `image`: The image containing the math problem
- `solution`: Turkish solution generated synthetically using GPT-o1

**Important Note on Labels**:

The `solution` field was generated synthetically by GPT-o1 and has not been manually verified for correctness. While GPT-o1 is generally strong at solving problems at this level, the dataset may contain:
- Incorrect reasoning steps
- Logical inconsistencies
- Arithmetic mistakes

Therefore, the fine-tuned model may inherit these imperfections.

## How to Get Started with the Model

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("khazarai/Math-VL-8B")
model = AutoModelForImageTextToText.from_pretrained("khazarai/Math-VL-8B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "Resimde verilen matematik problemini çözün."}
        ]
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping chat special tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Citation

If you use this model in academic work, please cite:
- The original Qwen model
- The Turkish-Math-VQA dataset