
Add metadata for library_name, pipeline_tag, and license

#3
by nielsr HF Staff - opened
Files changed (1)
README.md +8 -5

README.md CHANGED
@@ -1,13 +1,17 @@
 ---
-datasets:
-- MINT-SJTU/RoboFAC-dataset
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+datasets:
+- MINT-SJTU/RoboFAC-dataset
+library_name: transformers
+pipeline_tag: video-text-to-text
+license: apache-2.0
 ---
 
 # Model Card for RoboFAC-7B
 [![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://mint-sjtu.github.io/RoboFAC.io/) [![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://arxiv.org/abs/2505.12224) [![Dataset](https://img.shields.io/badge/Dataset-Huggingface-green)](https://huggingface.co/datasets/MINT-SJTU/RoboFAC-dataset) [![Model](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/MINT-SJTU/RoboFAC-7B)
-RoboFAC-7B is a large-scale vision-language model specifically finetuned for **robotic failure understanding and correction**. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures.
+
+RoboFAC-7B is a large-scale vision-language model introduced in the paper [RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction](https://huggingface.co/papers/2505.12224). It is specifically finetuned for **robotic failure understanding and correction**. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures.
 
 ## Model Details
 
@@ -70,5 +74,4 @@ print(processor.batch_decode(outputs, skip_special_tokens=True))
 primaryClass={cs.RO},
 url={https://arxiv.org/abs/2505.12224}
 }
-```
-
+```
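
For reference, the complete README front matter after this change, assembled from the diff above, would read:

```yaml
---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- MINT-SJTU/RoboFAC-dataset
library_name: transformers
pipeline_tag: video-text-to-text
license: apache-2.0
---
```

With `library_name`, `pipeline_tag`, and `license` set, the Hub can show the standard "how to use with transformers" snippet, list the model under the video-text-to-text task filter, and display the license badge on the model page.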