Update README.md
README.md CHANGED
@@ -11,9 +11,9 @@ library_name: transformers
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-Llama-3.2V-11B-cot is the first version of [LLaVA-
+Llama-3.2V-11B-cot is the first version of [LLaVA-CoT](https://github.com/PKU-YuanGroup/LLaVA-CoT), which is a visual language model capable of spontaneous, systematic reasoning.
 
-The model was proposed in [LLaVA-
+The model was proposed in [LLaVA-CoT: Let Vision Language Models Reason Step-by-Step](https://huggingface.co/papers/2411.10440).
 
 ## Model Details
 
@@ -61,7 +61,7 @@ You can use the inference code for Llama-3.2-11B-Vision-Instruct.
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-The model is trained on the LLaVA-
+The model is trained on the [LLaVA-CoT-100k dataset](https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k).
 
 ### Training Procedure
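The second hunk's context line notes that the inference code for Llama-3.2-11B-Vision-Instruct also works for this model. A minimal sketch of what that looks like with `transformers` is below; it assumes the model shares the Mllama interface of its base model, and the repo id `Xkev/Llama-3.2V-11B-cot` is an assumption (inferred from the dataset namespace, not stated in the diff).

```python
def build_messages(question: str) -> list:
    """Chat-template input: one user turn containing an image slot plus the text question."""
    return [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": question}]}]

def run(image_path: str, question: str, model_id: str = "Xkev/Llama-3.2V-11B-cot") -> str:
    # Heavy imports are kept local so build_messages() stays usable without torch installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model = MllamaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto")
    processor = AutoProcessor.from_pretrained(model_id)

    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True)
    inputs = processor(Image.open(image_path), prompt,
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(out[0], skip_special_tokens=True)
```

Since LLaVA-CoT reasons step by step, generous `max_new_tokens` budgets leave room for the intermediate reasoning before the final answer.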