---
pipeline_tag: image-text-to-text
---
<br>
<br>

# Math-LLaVA-13B Model Card

## Model details

**Model type:**
Math-LLaVA is an open-source multimodal large language model (MLLM) built by fine-tuning LLaVA-1.5-13B on selected and GPT-4-Vision-assisted synthesized [MathV360K](https://huggingface.co/datasets/Zhiqiang007/MathV360K/tree/main) data.
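
Below is a minimal inference sketch using the Hugging Face `transformers` LLaVA classes. It assumes a `transformers`-format checkpoint published under the Hub id `Zhiqiang007/Math-LLaVA` (an assumption; the released weights follow the original LLaVA codebase, so adapt the model id and loading path to your setup).

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Hub id is an assumption -- substitute the actual Math-LLaVA checkpoint.
model_id = "Zhiqiang007/Math-LLaVA"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any local image of a math problem; the LLaVA-1.5 prompt template is used.
image = Image.open("problem.png")
prompt = "USER: <image>\nWhat is the area of the shaded region? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```
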
**Model date:**
Math-LLaVA-13B was trained in June 2024.

**Paper or resources for more information:**
[[Paper](http://arxiv.org/abs/2406.17294)] [[Code](https://github.com/HZQ950419/Math-LLaVA)]

## License
Llama 2 is licensed under the LLAMA 2 Community License,
Copyright (c) Meta Platforms, Inc. All Rights Reserved.

## Intended use
**Primary intended uses:**
The primary use of Math-LLaVA is research on multimodal large language models, multimodal reasoning, and question answering.

**Primary intended users:**
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

## Training dataset
- MathV360K instruction-tuning data (see the download sketch below)

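To fetch the instruction-tuning data from the Hub, something like the following `huggingface_hub` sketch should work; the filename shown is hypothetical, so list the repo files first and pick a real entry.

```python
from huggingface_hub import hf_hub_download, list_repo_files

# Enumerate the files shipped in the MathV360K dataset repo.
files = list_repo_files("Zhiqiang007/MathV360K", repo_type="dataset")
print(files)

# Download one file; "train_samples_all_tuning.json" is a hypothetical
# example name -- replace it with a real entry from the listing above.
path = hf_hub_download(
    repo_id="Zhiqiang007/MathV360K",
    filename="train_samples_all_tuning.json",
    repo_type="dataset",
)
print(path)
```
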
## Evaluation dataset
A collection of three benchmarks: two multimodal mathematical reasoning benchmarks and one multi-discipline multimodal reasoning benchmark.