Qwen3-VL-4B-EduGraph

This model labels K-4 math learning material with competence concepts from the EduGraph ontology.

Classification is performed by a fine-tuned Qwen3-VL model capable of processing images to understand and categorize learning content. It is trained to label content along the three competence dimensions of EduGraph: Area, Scope and Ability.

Overview

  • Developed by: Christian Bick | GitHub | LinkedIn | Website
  • Funded by: Community Sponsors & GCP Credits
  • Model type: Multimodal Labeling
  • Language(s): Multilingual
  • License: GNU Affero General Public License v3.0
  • Finetuned from model: Qwen3-VL-4B-Instruct
  • Repository: GitHub
  • Status: Research

Uses

This model is intended for use in education technology, grounding other AI applications in a robust understanding of the competences that learning interactions (physical and digital) form and train.

Direct Use

This model can be used to create databases of learning material with concise competence labels. When combined with the knowledge graph embeddings that we have trained for the EduGraph ontology, users can create vector databases with semantic search capabilities that are far superior to what text chunking can achieve.
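As a minimal sketch of this idea (the label names, embedding values, and helper functions below are hypothetical and for illustration only; real vectors would come from the trained EduGraph knowledge graph embeddings), competence labels can be paired with their embeddings and materials ranked by cosine similarity:

```python
import numpy as np

# Hypothetical embeddings for a few competence labels.
label_embeddings = {
    "Addition": np.array([0.9, 0.1, 0.0]),
    "Subtraction": np.array([0.8, 0.2, 0.1]),
    "Geometry": np.array([0.1, 0.9, 0.3]),
}

# Learning materials indexed by the labels this model assigned to them.
materials = {
    "worksheet-1.png": ["Addition"],
    "worksheet-2.png": ["Geometry"],
}

def embed(labels):
    """Average the embeddings of a material's competence labels."""
    return np.mean([label_embeddings[l] for l in labels], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_labels):
    """Rank all materials by similarity to the queried competences."""
    q = embed(query_labels)
    return sorted(
        ((name, cosine(q, embed(labels))) for name, labels in materials.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

print(search(["Subtraction"]))  # addition worksheet ranks above geometry
```

In practice the dictionaries above would be replaced by a vector database, but the ranking principle is the same.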

Downstream Use

With such a database in place, the common downstream AI applications are recommendation engines for determining custom learning paths (individualization). Recommendations require a good semantic understanding of the learning material that a student has interacted with in the past and potentially will interact with in the future.

This model provides this understanding and enables safe AI applications in highly sensitive environments where not only accuracy matters, but also the ability to break down AI decisions for human review in concise ways.

To further increase the accuracy of this understanding for specialized tasks, this model is available for fine-tuning.

Bias, Risks, and Limitations

Important: Currently this model is in a research status and has not been evaluated under real-world conditions.

  • ONLY use this model for research, experimentation and evaluation
  • Do NOT use in a classroom environment
  • Do NOT use for automations that might impact children

Using the Model with llama.cpp (recommended)

For normal inference tasks, it is recommended to use the quantized version of this model with llama.cpp:

Installation

Winget (Windows)

winget install llama.cpp

Homebrew (Mac and Linux)

brew install llama.cpp

Download

Download the trained classification model and the original vision projector of Qwen3-VL:

# Classification model
curl -L https://huggingface.co/christian-bick/Qwen3-VL-4B-EduGraph-Q4_K_M-GGUF/resolve/main/qwen3-vl-4b-edugraph-q4_k_m.gguf -o model.gguf

# Qwen vision projector
curl -L https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct-GGUF/resolve/main/mmproj-Qwen3VL-4B-Instruct-F16.gguf -o mmproj.gguf

CLI

Classifying a provided image with llama-mtmd-cli:

llama-mtmd-cli \
  -m model.gguf \
  --mmproj mmproj.gguf \
  -ngl 99 \
  -c 8192 \
  --n-predict 1024 \
  --temp 0.0 \
  --image ./path/to/image.png \
  -p "Learning Material"

For CPU-only usage (set -t to your number of physical cores):

llama-mtmd-cli \
  -m model.gguf \
  --mmproj mmproj.gguf \
  --no-mmproj-offload \
  -t 4 \
  -c 8192 \
  --n-predict 1024 \
  --temp 0.0 \
  --image ./path/to/image.png \
  -p "Learning Material"

Optionally add --flash-attn to speed up prompt processing on supported hardware.

Server

Starting an inference server with llama-server:

llama-server \
  -m model.gguf \
  --mmproj mmproj.gguf \
  -ngl 99 \
  -c 8192 \
  --n-predict 1024 \
  --temp 0.0 \
  --port 8080 \
  --host 0.0.0.0

Optionally add --flash-attn to speed up prompt processing on supported hardware.
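Once running, llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint. The sketch below only builds such a request body, pairing the "Learning Material" prompt with a base64-encoded image (the image bytes are a placeholder); POST the result to http://localhost:8080/v1/chat/completions with any HTTP client:

```python
import base64
import json

def build_request(image_bytes: bytes) -> str:
    """Build an OpenAI-compatible chat completion request body for llama-server."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                    # Same prompt as in the CLI examples above.
                    {"type": "text", "text": "Learning Material"},
                ],
            }
        ],
        "temperature": 0.0,
        "max_tokens": 1024,
    }
    return json.dumps(payload)

# Placeholder bytes; in practice read your worksheet image from disk.
body = build_request(b"\x89PNG placeholder")
print(body[:80])
```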

GGUF Repositories

If you want to serve the quantized model in other ways, check out these repositories:

EduGraph GGUF Repo: christian-bick/Qwen3-VL-4B-EduGraph-Q4_K_M-GGUF

Qwen3-VL GGUF Repo: Qwen/Qwen3-VL-4B-Instruct-GGUF

Using the Model with transformers

from transformers import AutoModelForImageTextToText, AutoProcessor
model = AutoModelForImageTextToText.from_pretrained(
    "christian-bick/Qwen3-VL-4B-EduGraph", dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained("christian-bick/Qwen3-VL-4B-EduGraph")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://commons.wikimedia.org/wiki/Category:Mathematical_education#/media/File:Long_summation.png",
            },
            # Same prompt as in the llama.cpp CLI examples
            {"type": "text", "text": "Learning Material"},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
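The decoded output can then be post-processed. The exact output schema is not documented here; purely as an illustration, assuming the model emits its labels as a JSON object keyed by the three competence dimensions (an assumption, with made-up label values), parsing could look like:

```python
import json

# Example output string; the actual schema and labels emitted may differ.
raw = '{"areas": ["Arithmetic"], "scopes": ["IntegerAddition"], "abilities": ["ProceduralFluency"]}'

def parse_labels(text: str) -> dict:
    """Parse model output into the three EduGraph competence dimensions."""
    labels = json.loads(text)
    # Keep only the three expected dimensions, defaulting to empty lists.
    return {dim: labels.get(dim, []) for dim in ("areas", "scopes", "abilities")}

print(parse_labels(raw))
```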

Training Details

For full details of the training procedure, see:

https://github.com/christian-bick/edugraph-qwen3vl

Training Data

Dataset for knowledge infusion:

https://huggingface.co/datasets/christian-bick/edugraph-knowledge

Dataset for multimodal training:

https://huggingface.co/datasets/christian-bick/edugraph-worksheets

Training Procedure

Supervised fine-tuning with PyTorch and transformers

License

This project is licensed under the GNU Affero General Public License. See the LICENSE file for details.

If these license terms do not work for you, get in touch and we can discuss your options.
