Improve model card: Add metadata, paper information, and detailed usage
#3
by
nielsr
HF Staff
- opened
README.md
CHANGED
|
@@ -1 +1,89 @@
|
|
| 1 |
-
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
pipeline_tag: image-feature-extraction
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
# A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation
|
| 7 |
+
|
| 8 |
+
The models and code in this repository are part of the research presented in the paper:
|
| 9 |
+
[**A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation**](https://huggingface.co/papers/2501.13718)
|
| 10 |
+
|
| 11 |
+
**Authors**: Dario Serez, Marco Cristani, Alessio Del Bue, Vittorio Murino, Pietro Morerio
|
| 12 |
+
|
| 13 |
+
**Code and Pre-trained Models**: [https://github.com/SerezD/mi_ml_gen](https://github.com/SerezD/mi_ml_gen)
|
| 14 |
+
|
| 15 |
+
## Abstract
|
| 16 |
+
In image generation, Multiple Latent Variable Generative Models (MLVGMs) employ multiple latent variables to gradually shape the final images, from global characteristics to finer and local details (e.g., StyleGAN, NVAE), emerging as powerful tools for diverse applications. Yet their generative dynamics remain only empirically observed, without a systematic understanding of each latent variable's impact. In this work, we propose a novel framework that quantifies the contribution of each latent variable using Mutual Information (MI) as a metric. Our analysis reveals that current MLVGMs often underutilize some latent variables, and provides actionable insights for their use in downstream applications. With this foundation, we introduce a method for generating synthetic data for Self-Supervised Contrastive Representation Learning (SSCRL). By leveraging the hierarchical and disentangled variables of MLVGMs, our approach produces diverse and semantically meaningful views without the need for real image data. Additionally, we introduce a Continuous Sampling (CS) strategy, where the generator dynamically creates new samples during SSCRL training, greatly increasing data variability. Our comprehensive experiments demonstrate the effectiveness of these contributions, showing that MLVGMs' generated views compete on par with or even surpass views generated from real data. This work establishes a principled approach to understanding and exploiting MLVGMs, advancing both generative modeling and self-supervised learning.
|
| 17 |
+
|
| 18 |
+
## Usage
|
| 19 |
+
|
| 20 |
+
This repository provides the tools and pre-trained models to generate multiple views from generative models, to train image feature encoders (such as SimSiam and BYOL) on these views for self-supervised contrastive representation learning, and to evaluate the trained encoders on downstream classification tasks.
|
| 21 |
+
|
| 22 |
+
### Installation
|
| 23 |
+
|
| 24 |
+
First, set up the environment and install dependencies as described in the [GitHub repository](https://github.com/SerezD/mi_ml_gen#installation):
|
| 25 |
+
|
| 26 |
+
```bash
|
| 27 |
+
# Install dependencies
|
| 28 |
+
conda env create --file environment.yml
|
| 29 |
+
conda activate mi_ml_gen
|
| 30 |
+
|
| 31 |
+
# Install the package (in development mode)
|
| 32 |
+
conda develop ./mi_ml_gen
|
| 33 |
+
```
|
| 34 |
+
|
| 35 |
+
### Generating Multiple Views (Image Generation)
|
| 36 |
+
|
| 37 |
+
To generate diverse image views using the trained generative models (e.g., BigBiGAN, StyleGAN-2), refer to the script and configurations in the GitHub repository:
|
| 38 |
+
|
| 39 |
+
```bash
|
| 40 |
+
python mi_ml_gen/src/scripts/view_generation.py --configuration ./conf.yaml --save_folder ./tmp/
|
| 41 |
+
```
|
| 42 |
+
Examples of valid configurations are available at `mi_ml_gen/configurations/view_generation/bigbigan.yaml` and `mi_ml_gen/configurations/view_generation/stylegan.yaml`.
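
As a rough sketch, both example configurations can be run in one pass. Only `--configuration` and `--save_folder` come from the command above; the per-model output folders under `./tmp/` and the `DRY_RUN` switch are assumptions made for illustration, not repository options.

```bash
# Sketch: generate views for both example configurations.
# DRY_RUN and the per-model output folders are illustrative assumptions.
set -eu

generate_views() {
  # $1: model name matching a file in mi_ml_gen/configurations/view_generation/
  model="$1"
  set -- python mi_ml_gen/src/scripts/view_generation.py \
    --configuration "mi_ml_gen/configurations/view_generation/${model}.yaml" \
    --save_folder "./tmp/${model}/"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # run the command
  fi
}

for name in bigbigan stylegan; do
  generate_views "$name"
done
```

By default the script only prints the two commands; set `DRY_RUN=0` to execute them.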
|
| 43 |
+
|
| 44 |
+
[Figure: example generated views]
|
| 47 |
+
|
| 48 |
+
### Training Encoders (Image Feature Extraction)
|
| 49 |
+
|
| 50 |
+
To train new image feature encoders (SimSiam, BYOL) from scratch on generated data, or to use them as feature extractors for classification tasks, follow the instructions in the [GitHub repository](https://github.com/SerezD/mi_ml_gen#train-encoders-simclr-siam-byol).
|
| 51 |
+
|
| 52 |
+
Pre-trained encoder models are available for download at: [https://huggingface.co/SerezD/mi_ml_gen/tree/main/runs/encoders/](https://huggingface.co/SerezD/mi_ml_gen/tree/main/runs/encoders/)
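
One possible way to fetch these checkpoints from the command line is the Hugging Face CLI. The sketch below assumes `huggingface-cli` (shipped with the `huggingface_hub` package) is installed; the `DRY_RUN` and `LOCAL_DIR` variables are hypothetical helpers for this example, and the resulting local layout may differ from the paths used in the training commands.

```bash
# Sketch: fetch one sub-folder of the model repository with the Hugging Face CLI.
# Assumes `huggingface-cli` is installed; DRY_RUN and LOCAL_DIR are
# illustrative assumptions, not repository options.
set -eu

download_runs() {
  # $1: sub-folder under runs/, e.g. "encoders" or "linear_classifiers"
  subdir="$1"
  set -- huggingface-cli download SerezD/mi_ml_gen \
    --include "runs/${subdir}/*" \
    --local-dir "${LOCAL_DIR:-.}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print the command (quote the --include pattern if you paste it)
  else
    "$@"        # run the download
  fi
}

download_runs encoders
```

Set `DRY_RUN=0` to perform the download. The same function can be called with `linear_classifiers` to fetch the pre-trained linear classifiers.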
|
| 53 |
+
|
| 54 |
+
You can train a new encoder using the script:
|
| 55 |
+
```bash
|
| 56 |
+
python mi_ml_gen/src/multiview_encoders/train_encoder.py --seed 0 --encoder simsiam --conf simsiam_bigbigan/encoder_imagenet_baseline_real --data_path ./datasets/imagenet/ffcv/ --logging
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
### Training and Testing Linear Classifiers (Downstream Evaluation of Features)
|
| 60 |
+
|
| 61 |
+
To evaluate the quality of the features extracted by a pre-trained encoder on downstream classification tasks, you can train a linear classifier on top of the encoder. More details and configurations can be found in the [GitHub repository](https://github.com/SerezD/mi_ml_gen#train-and-test-linear-classifiers).
|
| 62 |
+
|
| 63 |
+
Pre-trained linear classifiers are available at: [https://huggingface.co/SerezD/mi_ml_gen/tree/main/runs/linear_classifiers/](https://huggingface.co/SerezD/mi_ml_gen/tree/main/runs/linear_classifiers/)
|
| 64 |
+
|
| 65 |
+
Example of training a classifier:
|
| 66 |
+
```bash
|
| 67 |
+
python mi_ml_gen/src/evaluations/classification/train_classifier.py --encoder_path './runs/encoder_lsun_baseline_real/last.ckpt' --data_path './datasets/StanfordCars/ffcv' --conf 'classifier_lsun' --dataset 'StanfordCars' --run_name 'tmp' --seed 0 --logging
|
| 68 |
+
```
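
To get a sense of run-to-run variance, the training command above could be repeated over several seeds. This is only a sketch: the flags are copied from the example above, while the seed list, the `seed_<n>` run names, and the `DRY_RUN` switch are assumptions introduced here.

```bash
# Sketch: repeat the classifier training above for several seeds.
# The seed list, run names, and DRY_RUN are illustrative assumptions.
set -eu

train_for_seed() {
  # $1: integer seed passed to --seed and used in the run name
  s="$1"
  set -- python mi_ml_gen/src/evaluations/classification/train_classifier.py \
    --encoder_path ./runs/encoder_lsun_baseline_real/last.ckpt \
    --data_path ./datasets/StanfordCars/ffcv \
    --conf classifier_lsun \
    --dataset StanfordCars \
    --run_name "seed_${s}" \
    --seed "${s}" \
    --logging
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # run the training
  fi
}

for seed in 0 1 2; do
  train_for_seed "$seed"
done
```

By default the commands are only printed; set `DRY_RUN=0` to launch the runs.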
|
| 69 |
+
|
| 70 |
+
To evaluate a classifier on the test set:
|
| 71 |
+
```bash
|
| 72 |
+
python mi_ml_gen/src/evaluations/classification/eval_classifier.py --lin_cls_path './runs/LinCls-StanfordCars-encoder_lsun_chunks_learned_classifier/last.ckpt' --data_path './datasets/StanfordCars/ffcv/' --dataset 'StanfordCars' --batch_size 16 --out_log_file tmp
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
## Citation
|
| 76 |
+
|
| 77 |
+
If you find this work useful, please cite our paper:
|
| 78 |
+
```bibtex
|
| 79 |
+
@article{
|
| 80 |
+
serez2025a,
|
| 81 |
+
title={A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation},
|
| 82 |
+
author={Dario Serez and Marco Cristani and Alessio Del Bue and Vittorio Murino and Pietro Morerio},
|
| 83 |
+
journal={Transactions on Machine Learning Research},
|
| 84 |
+
issn={2835-8856},
|
| 85 |
+
year={2025},
|
| 86 |
+
url={https://openreview.net/forum?id=uaj8ZL2PtK},
|
| 87 |
+
note={}
|
| 88 |
+
}
|
| 89 |
+
```
|