ResNet18 Image Captioning Weights (CIFAR-10)

This repository contains the trained weights for an image captioning system consisting of a CNN Encoder and an RNN Decoder, fine-tuned on the CIFAR-10 dataset.

📦 Model Components

1. Encoder (`encoder`)

Architecture: ResNet18 (Feature Extractor)
Output Dim: 256
Purpose: Extracts high-level visual features from input images. The final fully connected layer was replaced to map features to the embedding space.

2. Decoder (`decoder`)

Architecture: LSTM-based RNN
Hidden Dim: 512
Embedding Dim: 256
Purpose: Generates descriptive sequences based on the features received from the Encoder.

🚀 Usage

You can load these weights directly using the huggingface_hub library in Python:

from huggingface_hub import hf_hub_download
import torch

# Download weights
encoder_path = hf_hub_download(repo_id="Sher1988/image-classifier-weights", filename="encoder")
decoder_path = hf_hub_download(repo_id="Sher1988/image-classifier-weights", filename="decoder")

# Load into your model classes
# encoder.load_state_dict(torch.load(encoder_path, map_location='cpu'))
# decoder.load_state_dict(torch.load(decoder_path, map_location='cpu'))

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Sher1988
/

image-classifier-weights

ResNet18 Image Captioning Weights (CIFAR-10)

📦 Model Components

1. Encoder (`encoder`)

2. Decoder (`decoder`)

🚀 Usage

Dataset used to train Sher1988/image-classifier-weights

Space using Sher1988/image-classifier-weights 1

ResNet18 Image Captioning Weights (CIFAR-10)

📦 Model Components

1. Encoder (encoder)

2. Decoder (decoder)

🚀 Usage

Dataset used to train Sher1988/image-classifier-weights

Space using Sher1988/image-classifier-weights 1

1. Encoder (`encoder`)

2. Decoder (`decoder`)