---
license: mit
---
# TempVerseFormer - Pre-trained Models
[![Hugging Face Hub](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-HuggingFace%20Hub-blue?style=flat-square&logo=huggingface)](https://huggingface.co/LKyluk/TempVerseFormer)
[![GitHub Code](https://img.shields.io/github/v/release/leo27heady/TempVerseFormer?label=TempVerseFormer&style=flat-square)](https://github.com/leo27heady/TempVerseFormer)
[![Shape Dataset Toolbox](https://img.shields.io/github/v/release/leo27heady/simple-shape-dataset-toolbox?label=shapekit&style=flat-square)](https://github.com/leo27heady/simple-shape-dataset-toolbox)
[![WandB Logs](https://img.shields.io/badge/WandB-Training%20Logs-blue?style=flat-square&logo=wandb)](https://wandb.ai/leo27heady/pipe-transformer/reports/TempVerseFormer-Training-Logs--VmlldzoxMTg3OTQ3NQ)
This repository hosts pre-trained models for **TempVerseFormer: Temporal Modeling with Reversible Transformers**, a novel architecture introduced in the research article **"Temporal Modeling with Reversible Transformers"**.
These models are designed for memory-efficient temporal sequence prediction, particularly for tasks involving continuous and evolving data streams. They are trained on a synthetic dataset of rotating 2D shapes, designed to evaluate temporal modeling capabilities in a controlled environment.
## Models Included
This repository contains pre-trained weights for the following models, as described in the research article:
* **TempFormer (Vanilla-Transformer):** A standard Vanilla Transformer architecture with temporal chaining, serving as a baseline to compare against TempVerseFormer.
* **TempVerseFormer (Rev-Transformer):** The core Reversible Temporal Transformer architecture, leveraging reversible blocks and time-agnostic backpropagation for memory efficiency.
* **Standard Transformer (Pipe-Transformer):** A standard Transformer model that predicts only the next element at each step.
* **LSTM:** A Long Short-Term Memory network, representing a traditional recurrent sequence modeling approach.
* **VAE Models:** Variational Autoencoder (VAE) models used for encoding and decoding images to and from a latent space:
* **Vanilla VAE:** Standard VAE architecture.
Each model checkpoint is provided as a `.pt` file containing the `state_dict` of the trained model.
*Checkpoints are available for all models across different training configurations (e.g., with/without temporal patterns).*
## Intended Use
These pre-trained models are intended for:
* **Research:** Facilitating further research in memory-efficient temporal modeling, reversible architectures, and time-agnostic backpropagation.
* **Benchmarking:** Providing baselines for comparison with new temporal sequence modeling architectures.
* **Fine-tuning:** Serving as a starting point for fine-tuning on new datasets or for related temporal prediction tasks.
* **Demonstration:** Illustrating the capabilities of TempVerseFormer and its memory efficiency advantages.
**Please note:** These models were primarily trained and evaluated on a synthetic dataset of rotating shapes. While they demonstrate promising results in this controlled environment, their performance on real-world datasets may vary and require further evaluation and fine-tuning.
## How to Use
* **Configuration:** Ensure you use the correct model configuration (e.g., `config_rev_transformer`, `config_vae`) that corresponds to the pre-trained checkpoint you are loading. You can find example configurations in the `configs/train` directory of the [GitHub repository](https://github.com/leo27heady/TempVerseFormer).
* **Data Preprocessing:** Input data should be preprocessed in the same way as the training data. Refer to the `ShapeDataset` class in the GitHub repository for details on data loading and preprocessing.
* **Device:** Load models and data onto the appropriate device (`'cpu'` or `'cuda'`).
* **Evaluation Mode:** Remember to set models to `.eval()` mode for inference.
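The steps above can be sketched in PyTorch as follows. `TinyModel` is a stand-in class for illustration only; substitute the actual model class and its matching config from the [GitHub repository](https://github.com/leo27heady/TempVerseFormer) (e.g., `configs/train`), and replace `checkpoint.pt` with the checkpoint file you downloaded from this repository.

```python
import torch
import torch.nn as nn

# Stand-in model: replace with the actual architecture and config
# (e.g., the Rev-Transformer built from config_rev_transformer).
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(x)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoints in this repo are .pt files holding a state_dict;
# here we save one locally so the sketch is self-contained.
model = TinyModel()
torch.save(model.state_dict(), "checkpoint.pt")

# Load: build the model, restore weights, move to device, set eval mode.
restored = TinyModel()
state_dict = torch.load("checkpoint.pt", map_location=device)
restored.load_state_dict(state_dict)
restored.to(device).eval()

# Run inference without tracking gradients.
with torch.no_grad():
    out = restored(torch.randn(1, 8, device=device))
```

`map_location` ensures a checkpoint saved on GPU also loads cleanly on a CPU-only machine.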
For more detailed usage examples and specific code for different models and tasks, please refer to the [GitHub repository](https://github.com/leo27heady/TempVerseFormer) and the `train.py`, `eval.py`, and `memory_test.py` scripts.
## Dataset
The models were trained on a synthetic dataset of rotating 2D shapes generated using the [Simple Shape Dataset Toolbox](https://github.com/leo27heady/simple-shape-dataset-toolbox). This toolbox allows for procedural generation of customizable shape datasets.
## License
These pre-trained models are released under the **MIT** license.