---
base_model: unsloth/qwen3-vl-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3_vl
license: apache-2.0
language:
- en
- tr
datasets:
- ituperceptron/turkish-math-vqa
pipeline_tag: image-text-to-text
library_name: transformers
---

## Model Description

- Base Architecture: Qwen3-VL-8B-Instruct
- Fine-Tuning Method: QLoRA (PEFT)
- Language: Turkish
- Domain: High School Mathematics (12th Grade)
- Modality: Vision-Language (Image + Text → Text)

This model is a QLoRA fine-tuned version of Qwen3-VL-8B-Instruct trained on the Turkish-Math-VQA dataset, which consists of 12th-grade mathematics problems published by the Turkish Ministry of National Education (MEB).

The model is designed to:

- Understand mathematical problem images
- Generate step-by-step solutions in Turkish
- Handle topics such as logarithms, sequences & series, trigonometry, derivatives, and integrals

## Intended Use

### Primary Use Cases

- Turkish mathematical Visual Question Answering (VQA)
- Educational AI assistants
- Step-by-step solution generation
- Math tutoring systems
- Research in Turkish multimodal reasoning

## Out-of-Scope Use

- Professional exam grading without human validation
- Safety-critical mathematical applications
- Guaranteed mathematically verified reasoning

## Training Data

**Dataset**: Turkish-Math-VQA

The dataset contains mathematics problems from official 12th-grade exams prepared by the Turkish Ministry of National Education.

**Dataset Fields**:

- `test_number`: The test identifier
- `question_number`: Question number within the test
- `image`: The image containing the math problem
- `solution`: Turkish solution generated synthetically using GPT-o1

**Important Note on Labels**: The solution field was generated synthetically by GPT-o1 and has not been manually verified for correctness.
While GPT-o1 is generally strong at solving problems at this level, the dataset may contain:

- Incorrect reasoning steps
- Logical inconsistencies
- Arithmetic mistakes

The fine-tuned model may therefore inherit these imperfections.

## How to Get Started with the Model

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

# Load the processor (tokenizer + image preprocessing) and the model weights.
processor = AutoProcessor.from_pretrained("khazarai/Math-VL-8B")
model = AutoModelForImageTextToText.from_pretrained("khazarai/Math-VL-8B")

messages = [
    {
        "role": "user",
        "content": [
            # Placeholder image from the Hugging Face docs; replace it with
            # the URL of an image that contains a math problem.
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            # Turkish prompt: "Solve the math problem given in the image."
            {"type": "text", "text": "Resimde verilen matematik problemini çözün."}
        ]
    },
]

# Apply the chat template and tokenize text and image in one step.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Citation

If you use this model in academic work, please cite:

- The original Qwen model
- The Turkish-Math-VQA dataset
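
## Example: Mapping Dataset Records to Chat Messages

The dataset fields listed under Training Data (`image`, `solution`) map naturally onto a two-turn conversation of the kind commonly used for vision-language fine-tuning. The sketch below is only an illustration under stated assumptions: the helper name `record_to_messages`, the reuse of the inference prompt as the training instruction, and the `"image"` content key are hypothetical choices, not the preprocessing actually used to train this model.

```python
from typing import Any, Dict, List

# Assumed instruction text, borrowed from the inference example.
# The prompt actually used during fine-tuning is not documented.
DEFAULT_PROMPT = "Resimde verilen matematik problemini çözün."


def record_to_messages(
    record: Dict[str, Any], prompt: str = DEFAULT_PROMPT
) -> List[Dict[str, Any]]:
    """Turn one dataset record into a user/assistant conversation."""
    if record.get("image") is None:
        raise ValueError("record has no image")

    # The synthetic solution becomes the assistant turn; skip empty labels.
    solution = (record.get("solution") or "").strip()
    if not solution:
        raise ValueError("record has no solution text")

    return [
        {
            "role": "user",
            "content": [
                # The "image" key is an assumption; some chat templates
                # expect "url" or "path" instead.
                {"type": "image", "image": record["image"]},
                {"type": "text", "text": prompt},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": solution}],
        },
    ]


# Stand-in record; a real one would hold an image object in "image".
example = {
    "test_number": 1,
    "question_number": 3,
    "image": "problem.png",
    "solution": " x = 2 ",
}
messages = record_to_messages(example)
print(messages[1]["content"][0]["text"])
```

Because the `solution` labels are unverified, a step like this is also a convenient place to filter out or flag records before they reach training.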