---
license: mit
datasets:
- cifar10
metrics:
- accuracy
library_name: pytorch
tags:
- image-captioning
- resnet18
- lstm
---
# ResNet18 Image Captioning Weights (CIFAR-10)
This repository contains the trained weights for an image captioning system consisting of a **CNN Encoder** and an **RNN Decoder**, fine-tuned on the CIFAR-10 dataset.
## 📦 Model Components
### 1. Encoder (`encoder`)
- **Architecture:** ResNet18 (Feature Extractor)
- **Output Dim:** 256
- **Purpose:** Extracts high-level visual features from input images. ResNet18's final fully connected layer was replaced so that its 512-dimensional pooled features are projected into the 256-dimensional embedding space.
### 2. Decoder (`decoder`)
- **Architecture:** LSTM-based RNN
- **Hidden Dim:** 512
- **Embedding Dim:** 256
- **Purpose:** Generates caption sequences conditioned on the image features produced by the Encoder.
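A matching decoder could be sketched as below, assuming a common design in which the encoder's 256-dimensional feature is fed to the LSTM as the first input step, followed by embedded caption tokens. `DecoderRNN`, `vocab_size`, the layer names, and the greedy `generate` method are hypothetical; the actual vocabulary and conditioning scheme are not documented in this card.

```python
import torch
import torch.nn as nn


class DecoderRNN(nn.Module):
    """Hypothetical LSTM decoder: embed_size=256, hidden_size=512 as in this card.

    vocab_size and the "image feature as first input step" scheme are assumptions.
    """

    def __init__(self, vocab_size: int, embed_size: int = 256, hidden_size: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Teacher forcing: features (batch, embed_size), captions (batch, seq_len)
        embeddings = self.embed(captions)                               # (batch, seq_len, embed_size)
        inputs = torch.cat([features.unsqueeze(1), embeddings], dim=1)  # image feature is step 0
        outputs, _ = self.lstm(inputs)                                  # (batch, seq_len + 1, hidden_size)
        return self.fc(outputs)                                         # (batch, seq_len + 1, vocab_size)

    @torch.no_grad()
    def generate(self, features: torch.Tensor, max_len: int = 20) -> torch.Tensor:
        # Greedy decoding without an end-of-sequence check: always emits max_len tokens.
        inputs = features.unsqueeze(1)  # (batch, 1, embed_size)
        states = None
        tokens = []
        for _ in range(max_len):
            hidden, states = self.lstm(inputs, states)             # (batch, 1, hidden_size)
            next_token = self.fc(hidden.squeeze(1)).argmax(dim=1)  # (batch,)
            tokens.append(next_token)
            inputs = self.embed(next_token).unsqueeze(1)           # (batch, 1, embed_size)
        return torch.stack(tokens, dim=1)                          # (batch, max_len)


decoder = DecoderRNN(vocab_size=100)  # hypothetical vocabulary size
logits = decoder(torch.randn(2, 256), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 6, 100])
caption_ids = decoder.generate(torch.randn(2, 256), max_len=7)
print(caption_ids.shape)  # torch.Size([2, 7])
```

Because the embedding size equals the encoder's output size (256), the image feature can be concatenated with token embeddings along the time axis without an extra projection.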
## 🚀 Usage
You can download these weights with the `huggingface_hub` library and load them with PyTorch:
```python
from huggingface_hub import hf_hub_download
import torch

# Download the weight files (cached locally after the first call)
encoder_path = hf_hub_download(repo_id="Sher1988/image-classifier-weights", filename="encoder")
decoder_path = hf_hub_download(repo_id="Sher1988/image-classifier-weights", filename="decoder")

# Load into your own model classes; `encoder` and `decoder` must be instances
# whose architectures match the components described above.
# encoder.load_state_dict(torch.load(encoder_path, map_location="cpu"))
# decoder.load_state_dict(torch.load(decoder_path, map_location="cpu"))
```
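Since the exact layer names in the `encoder` and `decoder` files are not documented here, a helper that reports key mismatches can make loading failures easier to diagnose. This is a sketch that assumes each file holds a `state_dict`; `load_state_dict_checked` is a hypothetical helper name, and the demo saves a tiny `nn.Linear` to a temporary file instead of downloading the real weights.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_state_dict_checked(module: nn.Module, path: str):
    """Load a state_dict from `path` into `module`, printing any key mismatches.

    Assumes the file contains a state_dict rather than a fully pickled model.
    """
    state_dict = torch.load(path, map_location="cpu")
    # strict=False reports missing/unexpected keys instead of raising;
    # tensor shape mismatches still raise a RuntimeError.
    result = module.load_state_dict(state_dict, strict=False)
    if result.missing_keys or result.unexpected_keys:
        print("missing keys:", result.missing_keys)
        print("unexpected keys:", result.unexpected_keys)
    return result


# Demo with a tiny stand-in module saved to a temporary file.
src = nn.Linear(4, 2)
dst = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "encoder")
    torch.save(src.state_dict(), path)
    result = load_state_dict_checked(dst, path)
```

With the real files, `path` would be `encoder_path` or `decoder_path` from the download step, and `module` an instance of your own encoder or decoder class.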