---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- reward-model
- rfm
- vision-language
- multimodal
library_name: transformers
---

# ykorkmaz/rfm_progress_only

This is a Reward Function Model (RFM) for vision-language preference learning and similarity assessment, built on Qwen/Qwen2.5-VL-3B-Instruct.

## Model Details

- **Base Model**: Qwen/Qwen2.5-VL-3B-Instruct
- **Model Type**: qwen2_5_vl
- **Architecture**: RFMModel
- **Task**: Vision-Language Reward Modeling
- **Training Method**: FSDP (Fully Sharded Data Parallel)

## Usage

```python
from transformers import AutoProcessor, AutoModel
import torch

# Load the model and processor; trust_remote_code is required because the
# RFM architecture is defined by custom code in the repository.
processor = AutoProcessor.from_pretrained("ykorkmaz/rfm_progress_only", trust_remote_code=True)
model = AutoModel.from_pretrained("ykorkmaz/rfm_progress_only", trust_remote_code=True)

# Example usage for preference scoring:
# inputs = processor(images=images, text=text, return_tensors="pt")
# outputs = model(**inputs, sample_type="preference")
```
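
A fuller sketch of a single preference query follows. The processor call uses the standard Qwen2.5-VL interface, and the `sample_type` argument comes from the example above; the image file names and the shape of the returned output are assumptions to check against the repository's custom modeling code.

```python
from PIL import Image

# Hypothetical frames from the two trajectories being compared.
frame_a = Image.open("trajectory_a.png")
frame_b = Image.open("trajectory_b.png")

prompt = "Which trajectory makes more progress toward completing the task?"
inputs = processor(images=[frame_a, frame_b], text=prompt, return_tensors="pt")

with torch.no_grad():
    # Assumption: the model returns an output object carrying a preference score.
    outputs = model(**inputs, sample_type="preference")
print(outputs)
```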

## Model Capabilities

This RFM can perform three tasks (a speculative invocation sketch follows the list):

1. **Preference Prediction**: Given two trajectories A and B, predict which one is preferred.
2. **Similarity Assessment**: Evaluate how similar a trajectory is to a reference trajectory.
3. **Progress Estimation**: Estimate how far a trajectory has progressed toward task completion.
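
The usage example above selects a task with `sample_type="preference"`. If the other two capabilities follow the same pattern, they would be selected analogously; the `"similarity"` and `"progress"` values below are guesses, not confirmed by this card.

```python
# Hypothetical: only "preference" is confirmed by the usage example;
# verify the other sample_type values against the custom modeling code.
for sample_type in ("preference", "similarity", "progress"):
    with torch.no_grad():
        outputs = model(**inputs, sample_type=sample_type)
    print(sample_type, outputs)
```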

## Training

The model was trained with:

- FSDP (Fully Sharded Data Parallel) for distributed training (a generic setup sketch follows below)
- Mixed-precision training in bfloat16
- Custom loss functions for preference and similarity learning (a common formulation is sketched after this list)
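
The card does not specify the loss functions. A common choice for pairwise preference learning is a Bradley-Terry style objective over the two trajectories' scalar scores; the sketch below is that standard formulation, not the repository's actual loss.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_a: torch.Tensor, score_b: torch.Tensor,
                    prefers_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry preference loss: P(A preferred) = sigmoid(score_a - score_b).

    prefers_a holds 1.0 where trajectory A is preferred and 0.0 where B is.
    """
    return F.binary_cross_entropy_with_logits(score_a - score_b, prefers_a)
```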
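
For reference, an FSDP-plus-bfloat16 setup like the one described above looks roughly as follows in PyTorch. This is a generic sketch of the technique, launched with one process per GPU (e.g. via torchrun), not the repository's training script.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Keep parameters, gradient reductions, and buffers in bfloat16.
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Shard parameters, gradients, and optimizer state across all ranks.
model = FSDP(model, mixed_precision=bf16, device_id=torch.cuda.current_device())
```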

## Files

This repository contains:

- Model weights in SafeTensors format
- Configuration files
- Tokenizer/processor files
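
To inspect or download these files without loading the model, the standard huggingface_hub utilities apply; a minimal example:

```python
from huggingface_hub import list_repo_files, snapshot_download

# List every file in the repository (weights, configs, processor files).
print(list_repo_files("ykorkmaz/rfm_progress_only"))

# Fetch a full local copy of the repository.
local_dir = snapshot_download("ykorkmaz/rfm_progress_only")
print(local_dir)
```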

## Citation

If you use this model, please cite this repository.