---
title: Gradio Image Code
emoji: 🌖
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🧠 Qwen + DeepSeek Gradio App

A Gradio web app that demonstrates:

- **Image Captioning** using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
- **Code Generation** using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

This app is tested and runs efficiently on **Kaggle notebooks** with **T4 x2 GPU accelerators**.

> ⚠️ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.

---

## 🚀 Features

- 🖼️ Vision-Language tab: Upload an image + custom prompt → generate a short description
- 💻 Code Generator tab: Write a prompt → get streaming code output
- Adjustable decoding parameters: temperature, top-p, max_new_tokens

---

## 🧩 Installation

```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```

Ensure your runtime supports GPU (e.g., Kaggle or a local CUDA environment).

---

## 📦 Model Details

### 1. Qwen-VL-Chat-Int4 (Image-to-Text)

- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:

```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```

#### 🔧 Prompt Engineering Insight

- Without the `<|assistant|>` tag, the model sometimes overwrites or fails to complete properly.
- Adding `<|assistant|>` clearly indicates the model’s turn, reducing hallucinations.
- **Temperature is capped at ~1.0** because higher values (e.g., 1.2+) lead to creative but false outputs.

### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)

- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with:
  - `...` block for reasoning.
  - Final answer separated to improve clarity.

#### 🔧 Prompt Engineering Insight

- Initially, no system prompt was used, which led to vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" and "final answer" boosted relevance.
- Future improvement: split thinking and answer into **separate UI tabs**.

## 🖼️ Usage: Image Description Tab

- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?").
- Adjust:
  - `Temperature`: Higher values give more creative output; keep it limited for stability.
  - `Top-p`: Controls sampling diversity.
  - `Max new tokens`: Maximum length of the generated output.
- Click **Generate** → a streaming description appears.

## 💻 Usage: Code Generation Tab

- Write a programming task (e.g., "Write Python code to reverse a string.").
- Adjust generation settings as above.
- Streaming output displays the generated code.
- Generation may stop early if the prompt is vague; clarify the prompt to improve results.

## 🧠 Future Work

- Add a **separate tab** for model “thinking” (`...`) versus final code.
- Optional logging for input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.
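
The planned split between "thinking" and final code could start from a small parsing helper like the one below. This is only a sketch, not the app's actual code: it assumes the reasoning is delimited by `<think>` and `</think>` tags (the convention DeepSeek-R1 style models commonly use), and the function name `split_thinking` is hypothetical. It also handles partial text, since streamed output may arrive before the closing tag does.

```python
from typing import Tuple

# Assumption: reasoning is wrapped in <think>...</think>.
# Adjust the tags if the deployed model uses a different delimiter.
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_thinking(text: str) -> Tuple[str, str]:
    """Split raw model output into (thinking, answer).

    Works on partial (streaming) text: if the closing tag has not
    arrived yet, everything after the opening tag counts as thinking.
    Any text before the opening tag is discarded.
    """
    # Some chat templates put the opening tag in the prompt, so the
    # generated text may contain only the closing tag.
    body = text.split(OPEN_TAG, 1)[1] if OPEN_TAG in text else text

    if CLOSE_TAG in body:
        thinking, answer = body.split(CLOSE_TAG, 1)
        return thinking.strip(), answer.strip()

    if OPEN_TAG in text:
        # Still inside the reasoning block; no answer yet.
        return body.strip(), ""

    # No tags at all: treat the whole text as the answer.
    return "", text.strip()
```

In a Gradio streaming callback, calling this helper on the accumulated text at each step would give two strings that can be routed to two separate output components.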
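
The prompt engineering notes for the image tab (end the prompt with `<|assistant|>` and keep temperature at or below about 1.0) can be captured in two small helpers. This is a minimal sketch under assumptions: the names `build_prompt` and `clamp_temperature` are hypothetical, the newline placement between tags is a guess at the intended layout, and the lower bound of 0.1 is illustrative rather than taken from the app.

```python
# Cap taken from the note that temperatures above ~1.0 produce
# creative but false outputs.
MAX_TEMPERATURE = 1.0
# Assumption: a small positive floor so sampling stays well defined.
MIN_TEMPERATURE = 0.1


def build_prompt(system_message: str, user_message: str) -> str:
    """Assemble the tagged prompt, ending with the assistant tag.

    The trailing <|assistant|> tag marks the model's turn, which the
    notes above report as reducing incomplete or off-track outputs.
    """
    return (
        f"<|system|>\n{system_message}\n<|end|>\n"
        f"<|user|>\n{user_message}\n<|end|>\n"
        "<|assistant|>\n"
    )


def clamp_temperature(value: float) -> float:
    """Keep the user-selected temperature inside the allowed range."""
    return max(MIN_TEMPERATURE, min(value, MAX_TEMPERATURE))
```

For example, a slider value of 1.5 would be reduced to 1.0 before being passed to generation, while 0.7 would pass through unchanged.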