---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- reward-model
- rfm
- vision-language
- multimodal
library_name: transformers
---

# ykorkmaz/rfm_progress_only

This is a Reward Function Model (RFM) for vision-language preference learning and similarity assessment, built on Qwen/Qwen2.5-VL-3B-Instruct.

## Model Details

- **Base Model**: Qwen/Qwen2.5-VL-3B-Instruct
- **Model Type**: qwen2_5_vl
- **Architecture**: RFMModel
- **Task**: Vision-Language Reward Modeling
- **Training Method**: FSDP (Fully Sharded Data Parallel)

## Usage

```python
from transformers import AutoProcessor, AutoModel
import torch

# Load the model and processor; trust_remote_code is required because the
# RFM architecture is defined by custom code in the repository.
processor = AutoProcessor.from_pretrained("ykorkmaz/rfm_progress_only", trust_remote_code=True)
model = AutoModel.from_pretrained("ykorkmaz/rfm_progress_only", trust_remote_code=True)

# Example usage for preference scoring:
# inputs = processor(images=images, text=text, return_tensors="pt")
# outputs = model(**inputs, sample_type="preference")
```
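
A fuller sketch of a single preference query follows. The processor call uses the standard Qwen2.5-VL interface, and the `sample_type` argument comes from the example above; the image file names and the shape of the returned output are assumptions to check against the repository's custom modeling code.

```python
from PIL import Image

# Hypothetical frames from the two trajectories being compared.
frame_a = Image.open("trajectory_a.png")
frame_b = Image.open("trajectory_b.png")

prompt = "Which trajectory makes more progress toward completing the task?"
inputs = processor(images=[frame_a, frame_b], text=prompt, return_tensors="pt")

with torch.no_grad():
    # Assumption: the model returns an output object carrying a preference score.
    outputs = model(**inputs, sample_type="preference")
print(outputs)
```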

## Model Capabilities

This RFM can perform three tasks (a speculative invocation sketch follows the list):

1. **Preference Prediction**: Given two trajectories A and B, predict which one is preferred.
2. **Similarity Assessment**: Evaluate how similar a trajectory is to a reference trajectory.
3. **Progress Estimation**: Estimate how far a trajectory has progressed toward task completion.
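
The usage example above selects a task with `sample_type="preference"`. If the other two capabilities follow the same pattern, they would be selected analogously; the `"similarity"` and `"progress"` values below are guesses, not confirmed by this card.

```python
# Hypothetical: only "preference" is confirmed by the usage example;
# verify the other sample_type values against the custom modeling code.
for sample_type in ("preference", "similarity", "progress"):
    with torch.no_grad():
        outputs = model(**inputs, sample_type=sample_type)
    print(sample_type, outputs)
```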

## Training

The model was trained with:

- FSDP (Fully Sharded Data Parallel) for distributed training (a generic setup sketch follows below)
- Mixed-precision training in bfloat16
- Custom loss functions for preference and similarity learning (a common formulation is sketched after this list)
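
The card does not specify the loss functions. A common choice for pairwise preference learning is a Bradley-Terry style objective over the two trajectories' scalar scores; the sketch below is that standard formulation, not the repository's actual loss.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_a: torch.Tensor, score_b: torch.Tensor,
                    prefers_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry preference loss: P(A preferred) = sigmoid(score_a - score_b).

    prefers_a holds 1.0 where trajectory A is preferred and 0.0 where B is.
    """
    return F.binary_cross_entropy_with_logits(score_a - score_b, prefers_a)
```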
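
For reference, an FSDP-plus-bfloat16 setup like the one described above looks roughly as follows in PyTorch. This is a generic sketch of the technique, launched with one process per GPU (e.g. via torchrun), not the repository's training script.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Keep parameters, gradient reductions, and buffers in bfloat16.
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# Shard parameters, gradients, and optimizer state across all ranks.
model = FSDP(model, mixed_precision=bf16, device_id=torch.cuda.current_device())
```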

## Files

This repository contains:

- Model weights in SafeTensors format
- Configuration files
- Tokenizer/processor files
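
To inspect or download these files without loading the model, the standard huggingface_hub utilities apply; a minimal example:

```python
from huggingface_hub import list_repo_files, snapshot_download

# List every file in the repository (weights, configs, processor files).
print(list_repo_files("ykorkmaz/rfm_progress_only"))

# Fetch a full local copy of the repository.
local_dir = snapshot_download("ykorkmaz/rfm_progress_only")
print(local_dir)
```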

## Citation

If you use this model, please cite this repository.