ThomasDh-C/RicoQwen2VL

@@ -4,13 +4,13 @@ tags:
 - llama-factory
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
@@ -18,69 +18,34 @@ tags:
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure

 - llama-factory
 ---
+# Fine-tune of Qwen2-VL on the RICO dataset
+Qwen2-VL was trained to predict bounding boxes for elements in images. We further fine-tune it on the RICO Android screenshot dataset to improve its bounding-box predictions on mobile UI screenshots.
 ## Model Details
+Qwen2-VL accepts images of any size. We apply random crops to the RICO screenshots to cover a diverse range of aspect ratios, then fine-tune Qwen2-VL to predict the bounding boxes of elements in screenshots.
 ### Model Description
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** Thomas Dhome-Casanova
+- **Model type:** VLM
+- **Language(s):** English
+- **Finetuned from model:** Qwen2-VL-7B-Instruct
 ### Model Sources [optional]
+The base model is Qwen2-VL-7B-Instruct.
+- **Repository:** https://github.com/QwenLM/Qwen2-VL
+- **Paper:** https://arxiv.org/pdf/2409.12191
 ## Uses
+This model is intended for fast computer-use tasks. It has strong visual understanding but limited reasoning capabilities, so it should be paired with a strong foundation model for reasoning.
 ## How to Get Started with the Model
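+from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
+# Load the fine-tuned weights; dtype and device placement are chosen automatically.
+model = Qwen2VLForConditionalGeneration.from_pretrained(
+    "ThomasDh-C/RicoQwen2VL", torch_dtype="auto", device_map="auto"
+)
+# The processor bundles the tokenizer and the image preprocessor.
+processor = AutoProcessor.from_pretrained("ThomasDh-C/RicoQwen2VL")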
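Once the model has generated text, the predicted boxes still need to be mapped back to pixels. The sketch below is one way to do that, under an assumption the card does not confirm: that this fine-tune keeps the base Qwen2-VL grounding convention of `<|box_start|>(x1,y1),(x2,y2)<|box_end|>` with coordinates scaled to a 0..1000 range. The helper name `parse_boxes` and the `scale` parameter are hypothetical; adjust them if the fine-tune emits a different format.

```python
import re

# ASSUMPTION: the output follows the base Qwen2-VL box convention,
#   <|box_start|>(x1,y1),(x2,y2)<|box_end|>
# with coordinates normalised to 0..1000. The card does not state this.
BOX_PATTERN = re.compile(
    r"<\|box_start\|>\((\d+),(\d+)\),\((\d+),(\d+)\)<\|box_end\|>"
)


def parse_boxes(text, image_width, image_height, scale=1000):
    """Return pixel-space (x1, y1, x2, y2) tuples for every box found in text."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(group) for group in match.groups())
        boxes.append((
            x1 * image_width / scale,
            y1 * image_height / scale,
            x2 * image_width / scale,
            y2 * image_height / scale,
        ))
    return boxes
```

If the decoded text has its special tokens stripped, the box markers may be missing, so the pattern would need to be relaxed accordingly.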
 ## Training Details
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+The RICO dataset of Android screenshots, augmented with random crops.
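The card does not describe how the random crops were produced. As a rough illustration only, the sketch below picks a random crop window and remaps element boxes into the cropped coordinate frame. The function name `random_crop_with_boxes`, the `min_frac` parameter, and the policy of clipping partially visible boxes and dropping fully excluded ones are all assumptions, not the author's actual pipeline.

```python
import random


def random_crop_with_boxes(width, height, boxes, min_frac=0.5, rng=random):
    """Choose a random crop window and remap boxes into it.

    boxes: list of (x1, y1, x2, y2) in pixels of the full screenshot.
    Returns ((left, top, right, bottom), remapped_boxes).
    ASSUMPTION: boxes are clipped to the window; boxes with no overlap are dropped.
    """
    # Width and height are sampled independently, which varies the aspect ratio.
    crop_w = rng.randint(int(width * min_frac), width)
    crop_h = rng.randint(int(height * min_frac), height)
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    right, bottom = left + crop_w, top + crop_h

    remapped = []
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the crop window.
        cx1, cy1 = max(x1, left), max(y1, top)
        cx2, cy2 = min(x2, right), min(y2, bottom)
        if cx1 < cx2 and cy1 < cy2:
            # Shift into the crop's own coordinate frame.
            remapped.append((cx1 - left, cy1 - top, cx2 - left, cy2 - top))
    return (left, top, right, bottom), remapped
```

The returned window can be passed to an image library's crop call, and the remapped boxes then serve as the training targets for the cropped image.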
 ### Training Procedure