prithivMLmods/Qwen3-VisionCaption-2B-Thinking

Image-Text-to-Text

text-generation-inference


prithivMLmods committed on Dec 3, 2025

Commit 1992be9 · verified · 1 Parent(s): 53237a4

Update README.md

Files changed (1)

README.md +3 -1


@@ -18,6 +18,8 @@ pipeline_tag: image-text-to-text
 library_name: transformers
 ---
+![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/9IgGRwce_lx4HsDq8fGvW.png)
 # **Qwen3-VisionCaption-2B-Thinking**
 > **Qwen3-VisionCaption-2B-Thinking** is an abliterated v1.0 variant built upon **Qwen3-VL-2B-Instruct-abliterated-v1**, which originates from the **Qwen3-VL-2B-Instruct** architecture. It is specifically optimized for seamless, high precision image captioning and uncensored visual analysis. The model is engineered for robust caption generation, deep reasoning, and unrestricted descriptive understanding across diverse visual and multimodal contexts.
@@ -107,4 +109,4 @@ print(output_text)
 * May produce explicit, sensitive, or offensive descriptions depending on visual content.
 * Not recommended for production environments requiring strict safety controls.
 * Performance may vary for heavily abstract or synthetic content.
-* Output tone depends on prompt phrasing and detail level settings.
+* Output tone depends on prompt phrasing and detail level settings.