---
title: CIFAR-100 Image Classifier
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
pinned: false
license: mit
---

# CIFAR-100 ResNet Training from Scratch

A lightweight ResNet (class `ResNet34`, but with one BasicBlock per stage, i.e. 10 weight layers) trained from scratch on the CIFAR-100 dataset, achieving **76.68% top-1 accuracy** in 100 epochs with a OneCycle learning rate schedule.

## Project Overview

This project demonstrates training a ResNet architecture from scratch on the CIFAR-100 dataset without using any pre-trained models. The implementation leverages modern deep learning techniques including data augmentation, OneCycle LR scheduling, and mixed precision training.

## Results Summary

### Performance Metrics (100 Epochs)

| Metric | Score |
|--------|-------|
| **Top-1 Accuracy** | **76.68%** ✅ (Target: 73%) |
| **Top-3 Accuracy** | **90.95%** |
| **Top-5 Accuracy** | **94.07%** |
| **Best Test Accuracy** | **76.79%** (Epoch 99) |
| **Macro F1-Score** | **0.7670** |
| **Weighted F1-Score** | **0.7668** |

### Averaged Metrics

**Macro-Averaged (unweighted):**
- Precision: 0.7708
- Recall: 0.7668
- F1-Score: 0.7670

**Weighted-Averaged (by class support):**
- Precision: 0.7708
- Recall: 0.7668
- F1-Score: 0.7668
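
The two schemes differ only in how per-class scores are combined. A minimal NumPy sketch, using made-up per-class F1 scores and supports purely for illustration (these are not values from this run):

```python
import numpy as np

def macro_and_weighted(per_class_scores, support):
    """Combine per-class scores (precision, recall or F1) in two ways."""
    scores = np.asarray(per_class_scores, dtype=float)
    support = np.asarray(support, dtype=float)
    macro = scores.mean()                                # every class counts equally
    weighted = (scores * support).sum() / support.sum()  # classes weighted by sample count
    return macro, weighted

# Hypothetical scores for three classes
f1 = [0.9, 0.5, 0.7]
print(macro_and_weighted(f1, [100, 100, 100]))  # equal support: the two coincide (up to rounding)
print(macro_and_weighted(f1, [200, 50, 50]))    # unequal support: weighted leans toward class 0
```

From raw labels, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` and `average="weighted"` compute these two quantities directly.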

## Training Configuration

### Model Architecture

**Custom Lightweight ResNet for CIFAR-100**

A specially designed ResNet variant optimized for small image classification:

```
Model: ResNet34 (CIFAR-optimized)
Total Parameters: 4,949,412 (~5M)
Trainable Parameters: 4,949,412
Input Size: 32×32×3 (RGB)
Output Classes: 100
```
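
The 4,949,412 total can be reproduced by hand. This sketch assumes bias-free convolutions, BatchNorm with a learnable scale and shift, a 1×1 conv + BatchNorm projection shortcut in stages 2-4, and a Linear head with bias; these assumptions reproduce the stated total but are not read from `model_cifar.py`.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k          # bias=False

def bn_params(c):
    return 2 * c                         # gamma and beta (running stats are buffers, not parameters)

def basic_block_params(c_in, c_out, stride):
    n = conv_params(c_in, c_out, 3) + bn_params(c_out)
    n += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if stride != 1 or c_in != c_out:     # 1×1 projection shortcut + BN
        n += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return n

total = conv_params(3, 64, 3) + bn_params(64)          # stem
blocks = [(64, 64, 1), (64, 128, 2), (128, 256, 2), (256, 512, 2)]
total += sum(basic_block_params(*b) for b in blocks)   # one BasicBlock per stage
total += 512 * 100 + 100                               # fc weight + bias
print(f"{total:,}")                                    # 4,949,412
```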

**Architecture Details** (from `model_cifar.py`):

### Layer-by-Layer Feature Map Progression

| Layer | Operation | Kernel | Stride | Padding | Input Size | Output Size | Channels | Receptive Field |
|-------|-----------|--------|--------|---------|------------|-------------|----------|-----------------|
| **Input** | - | - | - | - | 32×32 | 32×32 | 3 | 1×1 |
| **conv1** | Conv2d | 3×3 | 1 | 1 | 32×32×3 | 32×32×64 | 64 | **3×3** |
| **bn1+relu** | BN+ReLU | - | - | - | 32×32×64 | 32×32×64 | 64 | 3×3 |
| **layer1** | BasicBlock | 3×3,3×3 | 1,1 | 1,1 | 32×32×64 | **32×32×64** | 64 | **7×7** |
| **layer2** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 32×32×64 | **16×16×128** | 128 | **13×13** |
| **layer3** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 16×16×128 | **8×8×256** | 256 | **25×25** |
| **layer4** | BasicBlock | 3×3,3×3 | 2,1 | 1,1 | 8×8×256 | **4×4×512** | 512 | **49×49** |
| **avgpool** | AdaptiveAvgPool2d | 4×4 | - | - | 4×4×512 | 1×1×512 | 512 | **Full image** |
| **fc** | Linear | - | - | - | 512 | 100 | 100 | - |

**Key Observations:**
- **Receptive field at layer4**: 49×49 pixels (theoretical), larger than the **32×32 input**
- **Spatial downsampling**: 3 stride-2 operations reduce 32×32 → 4×4 (8× per side)
- **Channel expansion**: 3 → 64 → 128 → 256 → 512 (progressively richer features)
- **No MaxPool in the stem**: unlike ImageNet ResNets, all downsampling is done by learned stride-2 convolutions
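
Feature-map sizes and theoretical receptive fields can be recomputed layer by layer with the standard recurrences: size_out = ⌊(size_in + 2p - k) / s⌋ + 1, rf_out = rf_in + (k - 1)·j_in, and j_out = j_in·s, where j is the spacing (in input pixels) between adjacent units. A minimal sketch walking the 3×3 main path; the 1×1 shortcuts add nothing to the field, so they are left out:

```python
def walk(layers, size=32):
    """Yield (output size, receptive field) after each (kernel, stride, padding) conv."""
    rf, jump = 1, 1
    for k, s, p in layers:
        size = (size + 2 * p - k) // s + 1   # conv output-size formula
        rf += (k - 1) * jump                 # field grows by (k-1) steps of the current jump
        jump *= s                            # stride spreads later units further apart
        yield size, rf

conv = (3, 1, 1)   # 3×3, stride 1, padding 1
down = (3, 2, 1)   # 3×3, stride 2, padding 1
net = [conv,           # stem
       conv, conv,     # layer1
       down, conv,     # layer2
       down, conv,     # layer3
       down, conv]     # layer4
trace = list(walk(net))
print(trace[2], trace[4], trace[6], trace[8])  # (32, 7) (16, 13) (8, 25) (4, 49)
```

The stride-2 convolution itself only adds 2·j to the field; the larger jump it creates is what makes every later 3×3 conv add more.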

### Detailed Architecture Components

1. **Initial Convolution Block**
   ```
   Input: 32×32×3 → Conv2d(3→64, k=3×3, s=1, p=1) → BN → ReLU → Output: 32×32×64
   Receptive Field: 1×1 → 3×3
   ```
   - CIFAR-optimized: 3×3 conv (not 7×7 like ImageNet ResNets)
   - Preserves spatial resolution (no stride-2 or MaxPool)
   - Captures fine-grained details essential for small images

2. **Layer 1: Residual Stage 1** (64 channels, no downsampling)
   ```
   Input: 32×32×64
   BasicBlock:
     ├─ Conv(64→64, k=3×3, s=1, p=1) → BN → ReLU → 32×32×64
     ├─ Conv(64→64, k=3×3, s=1, p=1) → BN → 32×32×64
     └─ Add(identity) → ReLU → Output: 32×32×64
   Receptive Field: 3×3 → 7×7
   ```
   - No spatial downsampling (stride=1)
   - Identity skip connection (no projection needed)
   - RF grows by 4 pixels (2 conv layers × 2 pixels each)

3. **Layer 2: Residual Stage 2** (128 channels, downsample)
   ```
   Input: 32×32×64
   BasicBlock:
     ├─ Conv(64→128, k=3×3, s=2, p=1) → BN → ReLU → 16×16×128
     ├─ Conv(128→128, k=3×3, s=1, p=1) → BN → 16×16×128
     ├─ Skip: Conv(64→128, k=1×1, s=2) → BN → 16×16×128 (projection)
     └─ Add(skip) → ReLU → Output: 16×16×128
   Receptive Field: 7×7 → 13×13
   ```
   - **Spatial downsampling**: 32×32 → 16×16 (stride=2 in first conv)
   - **Channel expansion**: 64 → 128
   - **Projection shortcut**: 1×1 conv matches dimensions
   - RF grows by 6 pixels: the stride-2 conv adds 2 (jump 1), then the doubled jump makes the next 3×3 conv add 4

4. **Layer 3: Residual Stage 3** (256 channels, downsample)
   ```
   Input: 16×16×128
   BasicBlock:
     ├─ Conv(128→256, k=3×3, s=2, p=1) → BN → ReLU → 8×8×256
     ├─ Conv(256→256, k=3×3, s=1, p=1) → BN → 8×8×256
     ├─ Skip: Conv(128→256, k=1×1, s=2) → BN → 8×8×256 (projection)
     └─ Add(skip) → ReLU → Output: 8×8×256
   Receptive Field: 13×13 → 25×25
   ```
   - **Spatial downsampling**: 16×16 → 8×8
   - **Channel expansion**: 128 → 256
   - RF (25×25) now spans most of the 32×32 input

5. **Layer 4: Residual Stage 4** (512 channels, downsample)
   ```
   Input: 8×8×256
   BasicBlock:
     ├─ Conv(256→512, k=3×3, s=2, p=1) → BN → ReLU → 4×4×512
     ├─ Conv(512→512, k=3×3, s=1, p=1) → BN → 4×4×512
     ├─ Skip: Conv(256→512, k=1×1, s=2) → BN → 4×4×512 (projection)
     └─ Add(skip) → ReLU → Output: 4×4×512
   Receptive Field: 25×25 → 49×49
   ```
   - **Final spatial downsampling**: 8×8 → 4×4
   - **Maximum channels**: 512 (highest feature richness)
   - **RF exceeds input size**: 49×49 > 32×32, so most 4×4 positions see the whole image

6. **Classification Head**
   ```
   Input: 4×4×512
     ├─ AdaptiveAvgPool2d((1,1)) → 1×1×512 (global spatial pooling)
     ├─ Flatten → 512
     └─ Linear(512 → 100) → 100 class logits
   ```
   - Global Average Pooling: Each of 512 channels → single value
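   - Far fewer parameters than flattening the 4×4×512 map (8,192 values) into the FC layer, which reduces overfitting
   - Adds translation invariance: the pooled value ignores where in the 4×4 grid a feature fires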

7. **Initialization Strategy**
   - **Kaiming (He) Normal** for Conv2d weights
     - Designed for ReLU activations (keeps activation variance stable across layers)
     - `std = sqrt(2 / fan_in)`
   - **Constant initialization** for BatchNorm
     - weight = 1, bias = 0
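
A minimal PyTorch sketch of this initialization; `init_weights` is a hypothetical helper name, and `mode="fan_in"` is assumed so that the standard deviation matches `sqrt(2 / fan_in)`:

```python
import math
import torch.nn as nn

def init_weights(model: nn.Module) -> None:
    """Kaiming-normal for Conv2d weights, constant (1, 0) for BatchNorm2d."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # std = gain / sqrt(fan_in), with gain = sqrt(2) for ReLU
            nn.init.kaiming_normal_(m.weight, mode="fan_in", nonlinearity="relu")
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1.0)
            nn.init.constant_(m.bias, 0.0)

# Check on a 256→512 3×3 conv: fan_in = 256 * 3 * 3 = 2304
block = nn.Sequential(nn.Conv2d(256, 512, 3, padding=1, bias=False), nn.BatchNorm2d(512))
init_weights(block)
print(block[0].weight.std().item(), math.sqrt(2 / 2304))  # the two should be close
```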

### Architecture Flow Diagram

```
Input Image (32×32×3, RF=1×1)
    ↓
┌─────────────────────────────────────────────────────────┐
│ STEM: Conv 3×3 → BN → ReLU                              │
│ Output: 32×32×64, RF=3×3                                │
└─────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────┐
│ STAGE 1: BasicBlock (64 channels, stride=1)             │
│   Conv 3×3 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU     │
│   Output: 32×32×64, RF=7×7                              │
│   Skip: Identity (no projection needed)                 │
└─────────────────────────────────────────────────────────┘
    ↓ [Spatial: 32×32, Channels: 64, RF: 7×7]
┌─────────────────────────────────────────────────────────┐
│ STAGE 2: BasicBlock (128 channels, stride=2) ↓↓         │
│   Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU  │
│   Output: 16×16×128, RF=13×13                           │
│   Skip: Conv 1×1,s2 (projection: 64→128)                │
└─────────────────────────────────────────────────────────┘
    ↓ [Spatial: 16×16, Channels: 128, RF: 13×13]
┌─────────────────────────────────────────────────────────┐
│ STAGE 3: BasicBlock (256 channels, stride=2) ↓↓         │
│   Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU  │
│   Output: 8×8×256, RF=25×25                             │
│   Skip: Conv 1×1,s2 (projection: 128→256)               │
└─────────────────────────────────────────────────────────┘
    ↓ [Spatial: 8×8, Channels: 256, RF: 25×25]
┌─────────────────────────────────────────────────────────┐
│ STAGE 4: BasicBlock (512 channels, stride=2) ↓↓         │
│   Conv 3×3,s2 → BN → ReLU → Conv 3×3 → BN → (+) → ReLU  │
│   Output: 4×4×512, RF=49×49 (exceeds 32×32)             │
│   Skip: Conv 1×1,s2 (projection: 256→512)               │
└─────────────────────────────────────────────────────────┘
    ↓ [Spatial: 4×4, Channels: 512, RF: 49×49]
┌─────────────────────────────────────────────────────────┐
│ HEAD: Global Average Pooling → FC                       │
│   AdaptiveAvgPool2d(1,1) → Flatten → Linear(512→100)    │
│   Output: 100 class logits                              │
└─────────────────────────────────────────────────────────┘
    ↓
Predictions (100 classes)
```

**Key Design Choices:**
- ✅ **CIFAR-specific stem**: 3×3 conv instead of 7×7 (ImageNet-style)
- ✅ **No aggressive downsampling**: Preserves spatial information for 32×32 images
- ✅ **Lightweight**: 1 block per stage instead of [3,4,6,3] for efficient training
- ✅ **Residual connections**: Enable gradient flow for deeper networks
- ✅ **Global Average Pooling**: Reduces overfitting vs fully-connected layers
- ✅ **Progressive RF growth**: Each stage sees more context (7→13→25→49 pixels)
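
A small NumPy illustration of the Global Average Pooling choice, with random numbers standing in for real layer4 features: pooling keeps one value per channel, is unaffected by where activations sit in the 4×4 grid, and keeps the classifier far smaller than a flatten-then-Linear head.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((512, 4, 4))   # stand-in for a C×H×W layer4 output
pooled = features.mean(axis=(1, 2))           # GAP: one value per channel

# A circular spatial shift moves every activation but leaves the pooled vector unchanged
shifted = np.roll(features, shift=1, axis=2)
shift_invariant = np.allclose(shifted.mean(axis=(1, 2)), pooled)

num_classes = 100
params_after_gap = 512 * num_classes + num_classes              # Linear(512 → 100)
params_after_flatten = 512 * 4 * 4 * num_classes + num_classes  # Linear(8192 → 100)
print(pooled.shape, shift_invariant, params_after_gap, params_after_flatten)
# (512,) True 51300 819300
```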

### Training Hyperparameters
```
Epochs: 100
Batch Size: 512
Optimizer: SGD with Nesterov momentum
Momentum: 0.9
Weight Decay: 1e-4
Label Smoothing: 0.1
Mixed Precision: Enabled (AMP)
Gradient Clipping: 1.0

# OneCycle Learning Rate Schedule
LR Schedule: OneCycle (Custom)
  - Phase 1 (Epochs 0-40): 0.01 → 0.1 (warmup)
  - Phase 2 (Epochs 41-81): 0.1 → 0.01 (cooldown)
  - Phase 3 (Epochs 82-99): 0.01 → 0.001 (annihilation)
```
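
One training step combining these settings might look like the sketch below. It is not taken from `train.py`: `make_optim` and `train_step` are hypothetical names, "Gradient Clipping: 1.0" is assumed to mean a max global L2 norm, and AMP is enabled only when CUDA is available. With AMP, gradients have to be unscaled before clipping, otherwise the 1.0 threshold would apply to loss-scaled values.

```python
import torch
import torch.nn as nn

def make_optim(model, lr=0.1):
    # SGD + Nesterov momentum 0.9, weight decay 1e-4 (values from the config above)
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                           nesterov=True, weight_decay=1e-4)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def train_step(model, images, labels, optimizer, scaler, device):
    use_amp = device == "cuda"
    model.train()
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                                   # clip real, unscaled gradients
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)                                       # skipped if gradients overflowed
    scaler.update()
    return loss.item()

# Smoke test on a toy model standing in for the ResNet
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100)).to(device)
optimizer = make_optim(model)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
x = torch.randn(8, 3, 32, 32, device=device)
y = torch.randint(0, 100, (8,), device=device)
loss = train_step(model, x, y, optimizer, scaler, device)
print(f"loss: {loss:.3f}")
```

When the scaler is disabled, `scale`, `unscale_` and `update` are no-ops and `step` simply calls `optimizer.step()`, so the same code path runs with or without AMP.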

### Data Augmentation
Using Albumentations library:
- **Training:**
  - Random padding (32→36) + Random crop (36→32)
  - Horizontal flip (p=0.5)
  - ShiftScaleRotate (shift=0.05, scale=0.05, rotate=5°, p=0.3)
  - CoarseDropout/Cutout (16×16, p=0.4)
  - Color jitter (brightness, contrast, saturation, hue, p=0.4)
  - Normalization (CIFAR-100 mean/std)

- **Testing:**
  - Normalization only
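
The geometric part of the training pipeline (pad-and-crop, horizontal flip, one 16×16 Cutout hole) can be sketched in plain NumPy. This is an illustrative re-implementation, not the project's `preprocess.py`: ShiftScaleRotate, color jitter and normalization are omitted, and zero fill for both the padding and the hole is an assumption.

```python
import numpy as np

def augment(img, rng, pad=2, hole=16, p_flip=0.5, p_cutout=0.4):
    """img: H×W×C array. Pad 32→36, random-crop back to 32, flip, then Cutout."""
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top, left = rng.integers(0, 2 * pad + 1, size=2)       # crop offsets 0..2*pad
    out = padded[top:top + h, left:left + w].copy()
    if rng.random() < p_flip:
        out = out[:, ::-1].copy()                           # horizontal flip
    if rng.random() < p_cutout:
        cy, cx = rng.integers(0, h), rng.integers(0, w)     # hole centre may sit near an edge
        y0, y1 = max(cy - hole // 2, 0), min(cy + hole // 2, h)
        x0, x1 = max(cx - hole // 2, 0), min(cx + hole // 2, w)
        out[y0:y1, x0:x1] = 0                               # hole is clipped at the border
    return out

rng = np.random.default_rng(0)
image = np.ones((32, 32, 3), dtype=np.float32)
sample = augment(image, rng)
print(sample.shape)  # (32, 32, 3)
```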

## Training Results

### Training Curves

![Training Curves](plots_complete/training_curves.png)

The training curves show:
- **Steady convergence** with minimal overfitting
- **Effective learning rate schedule** with OneCycle policy
- **Generalization gap** maintained below 5 percentage points throughout training
- **Final training accuracy:** 80.47%

### Learning Rate Schedule

![Learning Rate Schedule](plots_complete/learning_rate_schedule.png)

The OneCycle LR schedule implementation:
1. **Warmup Phase (41 epochs):** Linear increase from 0.01 to 0.1
2. **Cooldown Phase (41 epochs):** Linear decrease from 0.1 to 0.01
3. **Annihilation Phase (18 epochs):** Linear decrease from 0.01 toward 0.001 (the log reports 0.0015 at the final epoch)

This schedule is intended to help the model:
- Escape local minima early in training
- Find a wide minimum for better generalization
- Fine-tune with very small learning rates at the end
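
A minimal sketch of the three phases as one piecewise-linear function of the 0-indexed epoch. The knot positions (epochs 0, 41, 82 and 100) are an assumption chosen to match the phase boundaries listed above; under that assumption epoch 99 evaluates to 0.0015, the final learning rate reported in the training log.

```python
# Assumed knots (epoch, lr); hypothetical, not read from main.py
KNOTS = [(0, 0.01), (41, 0.1), (82, 0.01), (100, 0.001)]

def lr_at_epoch(epoch: int) -> float:
    """Linearly interpolate between consecutive knots; hold the last value afterwards."""
    for (e0, lr0), (e1, lr1) in zip(KNOTS, KNOTS[1:]):
        if e0 <= epoch < e1:
            return lr0 + (epoch - e0) / (e1 - e0) * (lr1 - lr0)
    return KNOTS[-1][1]

for e in (0, 41, 82, 99):
    print(e, round(lr_at_epoch(e), 4))
```

To drive an optimizer stepped once per epoch, `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda e: lr_at_epoch(e) / base_lr)` multiplies the optimizer's initial LR (`base_lr`) by the returned factor.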

### Per-Class Performance

![Class Metrics](plots_complete/class_metrics.png)

**Top 5 Best Performing Classes:**
1. **wardrobe** - F1: 0.9458 (Precision: 0.9320, Recall: 0.9600)
2. **sunflower** - F1: 0.9381 (Precision: 0.9681, Recall: 0.9100)
3. **poppy** - F1: 0.9315 (Precision: 0.9444, Recall: 0.9189)
4. **can** - F1: 0.9310 (Precision: 0.9000, Recall: 0.9643)
5. **skyscraper** - F1: 0.9100 (Precision: 0.9100, Recall: 0.9100)
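
Each F1 above is the harmonic mean of the listed precision and recall, F1 = 2PR / (P + R). A quick consistency check (small last-digit differences come from the inputs already being rounded to four decimals):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

reported = {  # class: (precision, recall, reported F1), copied from the list above
    "wardrobe": (0.9320, 0.9600, 0.9458),
    "sunflower": (0.9681, 0.9100, 0.9381),
    "poppy": (0.9444, 0.9189, 0.9315),
    "can": (0.9000, 0.9643, 0.9310),
    "skyscraper": (0.9100, 0.9100, 0.9100),
}
for name, (p, r, reported_f1) in reported.items():
    print(f"{name:<11} computed {f1(p, r):.4f}  reported {reported_f1:.4f}")
```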

**Most Challenging Classes:**
- **boy** - F1: 0.4286 (Fine-grained human features)
- **girl** - F1: 0.4646 (Similar to boy)
- **baby** - F1: 0.5079 (Fine-grained human features)
- **man** - F1: 0.5758 (Similar to boy)
- **plate** - F1: 0.5797 (Simple objects, easily confused)

The model performs exceptionally well on distinct objects (flowers, buildings, furniture) but struggles with fine-grained human categorization, which is expected for CIFAR-100's 32×32 resolution.

## Model Architecture Summary

From `model_cifar.py`:

| Component | Specification |
|-----------|---------------|
| **Model Name** | `ResNet34` (CIFAR-optimized; despite the name, 1 BasicBlock per stage rather than ResNet-34's [3,4,6,3]) |
| **Total Parameters** | 4,949,412 (~5M) |
| **Architecture Depth** | 10 weight layers (1 initial + 8 residual + 1 FC) |
| **Residual Blocks** | 4 BasicBlocks (1 per stage) |
| **Channel Progression** | 3 → 64 → 128 → 256 → 512 → 100 |
| **Spatial Downsampling** | 32×32 → 16×16 → 8×8 → 4×4 → 1×1 |
| **Receptive Field Growth** | 1×1 → 3×3 → 7×7 → 13×13 → 25×25 → 49×49 |
| **Skip Connections** | 4 (1 identity + 3 projection shortcuts) |
| **Pooling Strategy** | Global Average Pooling (4×4 → 1×1) |
| **Initialization** | Kaiming Normal (He) for Conv, Constant for BN |
| **Downsampling Method** | Strided convolutions (no MaxPool) |

**Why This Architecture Works for CIFAR-100:**

1. **Right-sized capacity**: 5M parameters balances expressiveness vs overfitting risk
2. **Preserved resolution**: No aggressive downsampling maintains spatial detail in 32×32 images
3. **Sufficient receptive field**: 49×49 RF exceeds input size (32×32), so deep features can draw on global image context
4. **Progressive downsampling**: 3 stride-2 ops (vs 1 MaxPool + 4 stride-2 in ImageNet ResNet)
5. **Residual learning**: Skip connections enable gradient flow through 10 weight layers
6. **Efficient computation**: Lightweight design trains in ~2-3 hours on single GPU

**Receptive Field Analysis:**
- By **layer2** (16×16×128): RF = 13×13 → spans ~40% of the image width
- By **layer3** (8×8×256): RF = 25×25 → spans ~78% of the image width
- By **layer4** (4×4×512): RF = 49×49 → exceeds the 32×32 input
- After global average pooling, the final feature vector draws on every input pixel

## Project Structure

```
CIFAR100/
├── main.py                 # Main training script with OneCycle LR
├── model_cifar.py         # Custom ResNet architecture (5M params)
│   ├── BasicBlock         # 2-layer residual block with skip connection
│   └── ResNet34           # CIFAR-optimized ResNet variant
├── train.py               # Training and evaluation loops
├── preprocess.py          # Data loading with Albumentations
├── visualization.py       # Metrics calculation and plotting
├── inference.py           # Model inference utilities
├── app.py                 # Gradio web interface for demo
├── run_complete_training.py  # Full training pipeline with logging
├── requirements.txt       # Python dependencies
├── log/                   # Training logs
│   └── training_complete_20251010-103227.log
└── plots_complete/        # Training visualizations
    ├── training_curves.png
    ├── learning_rate_schedule.png
    ├── class_metrics.png
    ├── confusion_matrix.png
    └── classification_report.txt
```

## Quick Start

### Installation

```bash
# Clone the repository
git clone <your-repo-url>
cd CIFAR100

# Install dependencies
pip install -r requirements.txt
```

### Training

```bash
# Train with OneCycle LR for 100 epochs
python main.py \
    --scheduler onecycle \
    --epochs 100 \
    --batch_size 512 \
    --lr 0.1 \
    --momentum 0.9 \
    --weight_decay 1e-4 \
    --amp \
    --plot_training \
    --plot_evaluation

# Or use the complete training script with logging
python run_complete_training.py
```

### Inference

```bash
# Run interactive web demo
python app.py

# Or use inference script
python inference.py --image path/to/image.jpg --model snapshots/best_model.pth
```

## Key Features

### 1. **OneCycle Learning Rate Policy**
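Implements a custom piecewise-linear variant of the OneCycle schedule from "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (Smith & Topin). The paper reports that this policy can give:
- Faster convergence
- Better generalization
- Higher final accuracy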

### 2. **Comprehensive Metrics Logging**
After each training run, the script automatically outputs:
- Training and test accuracy/loss curves
- Top-1, Top-3, Top-5 accuracies
- Precision, Recall, F1-Score (macro and weighted)
- Per-class performance breakdown
- Confusion matrix and classification report
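
Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. A NumPy sketch of that definition on toy logits (not the project's `visualization.py`):

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of rows whose true label is among the k largest logits."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the k largest scores
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

logits = np.array([[0.1, 2.0, 0.3, 1.5],   # true class 3 ranks 2nd
                   [3.0, 0.2, 0.1, 0.0],   # true class 0 ranks 1st
                   [0.5, 0.1, 0.4, 0.9]])  # true class 2 ranks 3rd
labels = np.array([3, 0, 2])
for k in (1, 2, 3):
    print(k, topk_accuracy(logits, labels, k))
```

`sklearn.metrics.top_k_accuracy_score(y_true, y_score, k=5)` computes the same quantity, and `sklearn.metrics.precision_recall_fscore_support(..., average="macro")` covers the precision/recall/F1 side.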

### 3. **Mixed Precision Training (AMP)**
- Faster training on GPUs with Tensor Cores (the speedup depends on model size and hardware)
- Reduced memory usage
- Maintains accuracy with float16/float32 mixed precision

### 4. **Advanced Data Augmentation**
Uses Albumentations for efficient augmentation:
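- Often faster than torchvision's PIL-based transforms for common operations
- More augmentation options
- Runs on the CPU (NumPy/OpenCV) inside DataLoader workers, overlapping with GPU training rather than running on the GPU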

### 5. **Model Checkpointing**
- Automatic snapshot saving at specified intervals
- Best model tracking based on test accuracy
- Resume training from any checkpoint
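
A minimal save/resume sketch with `torch.save`/`torch.load`; the checkpoint keys and the `save_checkpoint`/`load_checkpoint` names are hypothetical and may not match what `main.py` writes to `snapshots/`:

```python
import os
import tempfile
import torch
import torch.nn as nn

def save_checkpoint(path, model, optimizer, epoch, best_acc):
    torch.save({"epoch": epoch, "best_acc": best_acc,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])   # restores momentum buffers and LR
    return ckpt["epoch"] + 1, ckpt["best_acc"]     # resume from the next epoch

# Round trip on a toy model
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
path = os.path.join(tempfile.mkdtemp(), "best_model.pth")
save_checkpoint(path, model, optimizer, epoch=99, best_acc=76.79)

restored = nn.Linear(4, 2)
restored_opt = torch.optim.SGD(restored.parameters(), lr=0.1, momentum=0.9)
start_epoch, best_acc = load_checkpoint(path, restored, restored_opt)
print(start_epoch, best_acc)
```

Saving the optimizer state alongside the weights matters for SGD with momentum: without it, a resumed run restarts with empty momentum buffers.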

## Detailed Training Log

Full training logs are available in `log/training_complete_20251010-103227.log`, including:
- Per-epoch train/test loss and accuracy
- Learning rate at each epoch
- Final comprehensive evaluation with per-class metrics
- Training time and resource utilization

Example final output:
```
======================================================================
TRAINING COMPLETED - FINAL EVALUATION
======================================================================

TRAINING SUMMARY
----------------------------------------------------------------------
Total Epochs Trained: 100
Final Training Loss: 0.5584
Final Training Accuracy: 80.47%
Best Training Accuracy: 81.05% (Epoch 94)
Final Learning Rate: 0.001500

TEST/VALIDATION SUMMARY
----------------------------------------------------------------------
Final Test Loss: 0.8985
Final Test Accuracy: 76.68%
Best Test Accuracy: 76.79% (Epoch 99)

COMPREHENSIVE TEST SET METRICS
----------------------------------------------------------------------
Top-1 Accuracy (Test): 76.68%
Top-3 Accuracy (Test): 90.95%
Top-5 Accuracy (Test): 94.07%
```

## Requirements Met

✅ **Training from Scratch**: Custom ResNet (5M params) trained without pre-trained weights  
✅ **CIFAR-100 Dataset**: All 100 classes used (50,000 train / 10,000 test)  
✅ **Target Accuracy**: **76.68% achieved** (target: 73%) - **Exceeded by 3.68 percentage points**  
✅ **Training Duration**: 100 epochs with OneCycle LR schedule  
✅ **Modern Tools**: Extensive use of ChatGPT/Cursor for development  
✅ **Comprehensive Evaluation**: Full metrics, plots, and detailed analysis  
✅ **Model Architecture**: Custom lightweight ResNet optimized for CIFAR-100  
✅ **Reproducibility**: Complete logs, checkpoints, and configuration documented  

## Technologies Used

- **PyTorch** - Deep learning framework
- **Albumentations** - Data augmentation
- **Gradio** - Web interface for inference
- **scikit-learn** - Metrics calculation
- **matplotlib/seaborn** - Visualization
- **numpy** - Numerical operations

## Model Comparison

| Model Variant | Parameters | Typical Top-1 (approx.) | Notes |
|---------------|------------|-------------------------|-------|
| **Our Model** (4 blocks) | **5M** | **76.68%** (measured) | Balanced efficiency & accuracy |
| Standard ResNet-18 | 11M | ~76-78% | Good baseline for CIFAR |
| Standard ResNet-34 | 21M | ~78-80% | More capacity, slower training |
| Wide-ResNet-28-10 | 36M | ~80-82% | Strong wide baseline, requires more resources |
| PyramidNet | 26M | ~82-84% | Complex architecture |

**Our lightweight design achieves competitive accuracy with 2-4× fewer parameters than standard ResNets.**

## Future Improvements

Potential enhancements to reach higher accuracy (78%+):
1. **Architecture upgrades**: 
   - Increase blocks per stage: [2, 2, 2, 2] or [3, 3, 3, 3]
   - Try Wide-ResNet with wider channels
   - Add Squeeze-and-Excitation (SE) blocks
2. **Training tricks**: 
   - Mixup (α=0.2) for better generalization
   - CutMix for spatial regularization
   - AutoAugment or RandAugment policies
3. **Regularization**: 
   - Stochastic Depth (survival probability 0.8-0.9)
   - DropBlock for spatial dropout
   - Increased label smoothing (0.2)
4. **Ensemble methods**: 
   - Train 3-5 models with different seeds
   - Snapshot ensembles (save last N checkpoints)
5. **Longer training**: 
   - 200-300 epochs with cosine annealing
   - Multi-step or exponential LR decay
6. **Knowledge distillation**: 
   - Train larger teacher model first
   - Use soft targets for student training

## Technical Implementation Details

### Architecture Design Rationale

**Why a lightweight ResNet variant?**

1. **CIFAR-100 Image Size**: At 32×32 pixels, CIFAR images contain less spatial information than ImageNet (224×224)
   - Standard ResNet-34's [3,4,6,3] block structure is over-parameterized
   - Our [1,1,1,1] structure provides sufficient capacity without overfitting

2. **Parameter Efficiency**:
   - 5M parameters: Sweet spot between underfitting and overfitting
   - Faster training: 100 epochs in ~2-3 hours vs 5-6 hours for ResNet-34
   - Lower memory footprint: Can use larger batch sizes

3. **CIFAR-Specific Modifications**:
   - **3×3 initial conv** (vs 7×7): Preserves fine details in small images
   - **No MaxPool layer**: Maintains spatial resolution (32×32 → 4×4 via 3 stride-2 stages)
   - **Stride-2 convolutions**: Gradual downsampling for feature hierarchy

### Code Reference

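Adapted from `model_cifar.py` (`_make_layer` and `forward` are filled in here so the snippet runs; the actual file may differ in detail):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # used by BasicBlock below

class ResNet34(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.in_channels = 64

        # CIFAR-specific: 3×3 conv, no maxpool
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)

        # 4 stages with 1 BasicBlock each
        self.layer1 = self._make_layer(64, 1)             # 32×32×64
        self.layer2 = self._make_layer(128, 1, stride=2)  # 16×16×128
        self.layer3 = self._make_layer(256, 1, stride=2)  # 8×8×256
        self.layer4 = self._make_layer(512, 1, stride=2)  # 4×4×512

        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, out_channels, num_blocks, stride=1):
        # Only the first block of a stage downsamples
        layers = []
        for s in [stride] + [1] * (num_blocks - 1):
            layers.append(BasicBlock(self.in_channels, out_channels, s))
            self.in_channels = out_channels
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))                     # 32×32×64
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))  # 4×4×512
        x = torch.flatten(self.avgpool(x), 1)                      # 512
        return self.fc(x)                                          # 100 logits
```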

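**BasicBlock** (2 conv layers + skip connection; identity skip in stage 1, 1×1 conv + BN projection where the shape changes):
```python
class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.shortcut = nn.Sequential()  # identity skip when shapes match (stage 1)
        if stride != 1 or in_channels != out_channels:  # 1×1 projection (stages 2-4)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # Conv → BN → ReLU
        out = self.bn2(self.conv2(out))        # Conv → BN
        out = out + self.shortcut(x)           # Add (possibly projected) skip connection
        return F.relu(out)                     # ReLU
```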

## References

**Papers:**
- He et al., "Deep Residual Learning for Image Recognition" (2016) - ResNet architecture
- Smith & Topin, "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates" (2018) - OneCycle LR
- Krizhevsky, "Learning Multiple Layers of Features from Tiny Images" (2009) - CIFAR-100

**Implementation Resources:**
- PyTorch official ResNet implementation
- Albumentations library for efficient augmentation
- torchvision.datasets for CIFAR-100 loading

## License

MIT License

## Acknowledgments

This project was developed with extensive assistance from:
- ChatGPT for architecture design and debugging
- Cursor AI for code completion and refactoring
- PyTorch and torchvision communities for reference implementations

---

**Note:** Training logs, model checkpoints, and detailed per-class metrics are available in the `log/` and `plots_complete/` directories.