
Add metadata for library_name, pipeline_tag, and license

#3
by nielsr HF Staff - opened
Files changed (1)
README.md +8 -5

README.md CHANGED
@@ -1,13 +1,17 @@
 ---
-datasets:
-- MINT-SJTU/RoboFAC-dataset
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+datasets:
+- MINT-SJTU/RoboFAC-dataset
+library_name: transformers
+pipeline_tag: video-text-to-text
+license: apache-2.0
 ---
 
 # Model Card for RoboFAC-7B
 [![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://mint-sjtu.github.io/RoboFAC.io/) [![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://arxiv.org/abs/2505.12224) [![Dataset](https://img.shields.io/badge/Dataset-Huggingface-green)](https://huggingface.co/datasets/MINT-SJTU/RoboFAC-dataset) [![Model](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/MINT-SJTU/RoboFAC-7B)
-RoboFAC-7B is a large-scale vision-language model specifically finetuned for **robotic failure understanding and correction**. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures.
+
+RoboFAC-7B is a large-scale vision-language model introduced in the paper [RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction](https://huggingface.co/papers/2505.12224). It is specifically finetuned for **robotic failure understanding and correction**. It takes in visual observations of robot executions (usually video frames) and outputs detailed answers to questions that analyze, diagnose, and propose corrections for robotic manipulation failures.
 
 ## Model Details
 
@@ -70,5 +74,4 @@ print(processor.batch_decode(outputs, skip_special_tokens=True))
 primaryClass={cs.RO},
 url={https://arxiv.org/abs/2505.12224}
 }
-```
-
+```
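
For reference, the complete README front matter after this change, assembled from the diff above, would read:

```yaml
---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- MINT-SJTU/RoboFAC-dataset
library_name: transformers
pipeline_tag: video-text-to-text
license: apache-2.0
---
```

With `library_name`, `pipeline_tag`, and `license` set, the Hub can show the standard "how to use with transformers" snippet, list the model under the video-text-to-text task filter, and display the license badge on the model page.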