---
base_model: unsloth/qwen3-vl-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_vl
license: apache-2.0
language:
- en
- tr
datasets:
- ituperceptron/turkish-math-vqa
pipeline_tag: image-text-to-text
library_name: transformers
---

## Model Description

- Base Architecture: Qwen3-VL-8B-Instruct
- Fine-Tuning Method: QLoRA (PEFT)
- Language: Turkish
- Domain: High School Mathematics (12th Grade)
- Modality: Vision-Language (Image + Text → Text)

This model is a QLoRA fine-tuned version of Qwen3-VL-8B-Instruct trained on the Turkish-Math-VQA dataset, which consists of 12th-grade mathematics problems published by the Turkish Ministry of National Education (MEB).

The model is designed to:
- Understand mathematical problem images
- Generate step-by-step solutions in Turkish
- Handle topics such as logarithms, sequences & series, trigonometry, derivatives, and integrals

## Intended Use

**Primary Use Cases**:
- Turkish mathematical Visual Question Answering (VQA)
- Educational AI assistants
- Step-by-step solution generation
- Math tutoring systems
- Research in Turkish multimodal reasoning

## Out-of-Scope Use

- Professional exam grading without human validation
- Safety-critical mathematical applications
- Guaranteed mathematically verified reasoning

## Training Data

**Dataset**: Turkish-Math-VQA

The dataset contains mathematics problems from official 12th-grade exams prepared by the Turkish Ministry of National Education.

**Dataset Fields**:
- `test_number`: The test identifier
- `question_number`: Question number within the test
- `image`: The image containing the math problem
- `solution`: Turkish solution generated synthetically using GPT-o1

**Important Note on Labels**:

The `solution` field was generated synthetically by GPT-o1 and has not been manually verified for correctness. While GPT-o1 is generally strong at solving problems at this level, the dataset may contain:
- Incorrect reasoning steps
- Logical inconsistencies
- Arithmetic mistakes

Therefore, the fine-tuned model may inherit these imperfections.

## How to Get Started with the Model

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("khazarai/Math-VL-8B")
model = AutoModelForImageTextToText.from_pretrained("khazarai/Math-VL-8B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "Resimde verilen matematik problemini çözün."}
        ]
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping chat special tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Citation

If you use this model in academic work, please cite:
- The original Qwen model
- The Turkish-Math-VQA dataset