---
title: ImageCaptionTestSpace
emoji: 😻
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Image Captioning (ViT-GPT2) — Hugging Face Space

This Space serves an image captioning model using Hugging Face's `VisionEncoderDecoderModel` (a ViT encoder paired with a GPT-2 decoder). It runs out of the box with the base model and can optionally load your **fine-tuned** weights.

**Live app entrypoint:** `app.py` (Gradio)

## Quick Start (on Spaces)

1. Click **New Space** → **Gradio** → **Blank** → pick a free CPU or T4 small (GPU) runtime.
2. Upload all files from this repo.
3. (Optional) If you have fine-tuned weights:
   - Upload the saved folder to the Space (e.g., `outputs/caption_finetune/`).
   - Set a Space secret or environment variable: `MODEL_DIR = outputs/caption_finetune`.
   - Alternatively, push your weights to the Hub and set `MODEL_DIR = your-username/your-model-repo`.

If `MODEL_DIR` is not set, the app falls back to `nlpconnect/vit-gpt2-image-captioning` (see the loading sketch at the end of this README).

## Local Dev

```bash
pip install -r requirements.txt
python app.py
# then open http://127.0.0.1:7860
```
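## How the Model Is Loaded (sketch)

For orientation, here is a minimal sketch of the loading behavior described in the Quick Start: read `MODEL_DIR` from the environment and fall back to the base checkpoint when it is unset. This is not a copy of `app.py`; the helper name `load_captioner` is hypothetical.

```python
import os

import torch
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

# MODEL_DIR may be a local folder (e.g., outputs/caption_finetune) or a Hub repo id;
# when unset, fall back to the base checkpoint, as described above.
MODEL_DIR = os.environ.get("MODEL_DIR", "nlpconnect/vit-gpt2-image-captioning")


def load_captioner():
    """Hypothetical helper: load model, image processor, and tokenizer from MODEL_DIR."""
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_DIR)
    processor = ViTImageProcessor.from_pretrained(MODEL_DIR)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return model.to(device).eval(), processor, tokenizer, device
```

Note that `from_pretrained` accepts both local directory paths and Hub repo ids, which is why a single `MODEL_DIR` variable covers both fine-tuned-weights options from the Quick Start.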
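And a sketch of the inference path plus Gradio wiring, continuing from the loader above. Again hypothetical: the beam-search settings (`max_length`, `num_beams`) are illustrative defaults, and `app.py` may organize this differently.

```python
import gradio as gr
import torch
from PIL import Image

model, processor, tokenizer, device = load_captioner()  # from the sketch above


def caption(image: Image.Image) -> str:
    """Turn a PIL image into pixel values, generate token ids, decode to text."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad():
        output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


demo = gr.Interface(
    fn=caption,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Image Captioning (ViT-GPT2)",
)

if __name__ == "__main__":
    demo.launch()  # locally this serves on http://127.0.0.1:7860; Spaces launches it for you
```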