Agents-X/PyVision-Video-7B-SFT

Video-Text-to-Text

image-text-to-text

text-generation-inference


Update README.md

Commit dd34da5 (verified), committed by stzhao 29 days ago. 1 parent: 0d9c830

Files changed (1)

README.md +12 -60


@@ -1,60 +1,12 @@
----
-library_name: transformers
-license: other
-base_model: Qwen/Qwen2.5-VL-7B-Instruct
-tags:
-- llama-factory
-- full
-- generated_from_trainer
-model-index:
-- name: qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA
-  results: []
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# qwen2_5vl_7b_full_sft_251013_all_wo_hint-EMA
-This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on the sft_data_vsi_wo_video_hint_sharegpt, the sft_data_gmai_reasoning_wo_image_hint_sharegpt, the sft_data_mmpr_wo_image_hint_sharegpt, the sft_data_mmk12_wo_image_hint_sharegpt, the sft_data_longvila_wo_video_hint_sharegpt and the sft_data_cosyn_chart_wo_image_hint_sharegpt datasets.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 2
-- eval_batch_size: 8
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 8
-- total_train_batch_size: 16
-- total_eval_batch_size: 64
-- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 1.0
-### Training results
-### Framework versions
-- Transformers 4.52.1
-- Pytorch 2.5.1
-- Datasets 3.6.0
-- Tokenizers 0.21.1

+## PyVision-RL: Forging Open Agentic Vision Models via RL
+This is PyVision-Video-7B-SFT, post trained from Qwen2.5-VL-7B.
+```bibtex
+@article{pyvision2025,
+  title={PyVision-RL: Forging Open Agentic Vision Models},
+  author={Your Name},
+  journal={arXiv:2501.xxxxx},
+  year={2025}
+}
+```