---
license: mit
---
# TempVerseFormer - Pre-trained Models
[![Hugging Face Hub](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-HuggingFace%20Hub-blue?style=flat-square&logo=huggingface)](https://huggingface.co/LKyluk/TempVerseFormer)
[![GitHub Code](https://img.shields.io/github/v/release/leo27heady/TempVerseFormer?label=TempVerseFormer&style=flat-square)](https://github.com/leo27heady/TempVerseFormer)
[![Shape Dataset Toolbox](https://img.shields.io/github/v/release/leo27heady/simple-shape-dataset-toolbox?label=shapekit&style=flat-square)](https://github.com/leo27heady/simple-shape-dataset-toolbox)
[![WandB Logs](https://img.shields.io/badge/WandB-Training%20Logs-blue?style=flat-square&logo=wandb)](https://wandb.ai/leo27heady/pipe-transformer/reports/TempVerseFormer-Training-Logs--VmlldzoxMTg3OTQ3NQ)
This repository hosts pre-trained models for **TempVerseFormer: Temporal Modeling with Reversible Transformers**, a novel architecture introduced in the research article **"Temporal Modeling with Reversible Transformers"**.
These models are designed for memory-efficient temporal sequence prediction, particularly for tasks involving continuous and evolving data streams. They are trained on a synthetic dataset of rotating 2D shapes, designed to evaluate temporal modeling capabilities in a controlled environment.
## Models Included
This repository contains pre-trained weights for the following models, as described in the research article:
* **TempFormer (Vanilla-Transformer):** A standard Vanilla Transformer architecture with temporal chaining, serving as a baseline to compare against TempVerseFormer.
* **TempVerseFormer (Rev-Transformer):** The core Reversible Temporal Transformer architecture, leveraging reversible blocks and time-agnostic backpropagation for memory efficiency.
* **Standard Transformer (Pipe-Transformer):** A standard Transformer model that predicts only the next element at each step.
* **LSTM:** A Long Short-Term Memory network, representing a traditional recurrent sequence modeling approach.
* **VAE Models:** Variational Autoencoder (VAE) models used for encoding and decoding images to and from a latent space:
* **Vanilla VAE:** Standard VAE architecture.
Each model checkpoint is provided as a `.pt` file containing the `state_dict` of the trained model.
*Checkpoints are available for all models across different training configurations (e.g., with/without temporal patterns).*
## Intended Use
These pre-trained models are intended for:
* **Research:** Facilitating further research in memory-efficient temporal modeling, reversible architectures, and time-agnostic backpropagation.
* **Benchmarking:** Providing baselines for comparison with new temporal sequence modeling architectures.
* **Fine-tuning:** Serving as a starting point for fine-tuning on new datasets or for related temporal prediction tasks.
* **Demonstration:** Illustrating the capabilities of TempVerseFormer and its memory efficiency advantages.
**Please note:** These models were primarily trained and evaluated on a synthetic dataset of rotating shapes. While they demonstrate promising results in this controlled environment, their performance on real-world datasets may vary and require further evaluation and fine-tuning.
## How to Use
* **Configuration:** Ensure you use the correct model configuration (e.g., `config_rev_transformer`, `config_vae`) that corresponds to the pre-trained checkpoint you are loading. You can find example configurations in the `configs/train` directory of the [GitHub repository](https://github.com/leo27heady/TempVerseFormer).
* **Data Preprocessing:** Input data should be preprocessed in the same way as the training data. Refer to the `ShapeDataset` class in the GitHub repository for details on data loading and preprocessing.
* **Device:** Load models and data onto the appropriate device (`'cpu'` or `'cuda'`).
* **Evaluation Mode:** Remember to set models to `.eval()` mode for inference.
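The steps above can be sketched in PyTorch as follows. `TinyModel` is a stand-in class for illustration only; substitute the actual model class and its matching config from the [GitHub repository](https://github.com/leo27heady/TempVerseFormer) (e.g., `configs/train`), and replace `checkpoint.pt` with the checkpoint file you downloaded from this repository.

```python
import torch
import torch.nn as nn

# Stand-in model: replace with the actual architecture and config
# (e.g., the Rev-Transformer built from config_rev_transformer).
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(x)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoints in this repo are .pt files holding a state_dict;
# here we save one locally so the sketch is self-contained.
model = TinyModel()
torch.save(model.state_dict(), "checkpoint.pt")

# Load: build the model, restore weights, move to device, set eval mode.
restored = TinyModel()
state_dict = torch.load("checkpoint.pt", map_location=device)
restored.load_state_dict(state_dict)
restored.to(device).eval()

# Run inference without tracking gradients.
with torch.no_grad():
    out = restored(torch.randn(1, 8, device=device))
```

`map_location` ensures a checkpoint saved on GPU also loads cleanly on a CPU-only machine.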
For more detailed usage examples and specific code for different models and tasks, please refer to the [GitHub repository](https://github.com/leo27heady/TempVerseFormer) and the `train.py`, `eval.py`, and `memory_test.py` scripts.
## Dataset
The models were trained on a synthetic dataset of rotating 2D shapes generated using the [Simple Shape Dataset Toolbox](https://github.com/leo27heady/simple-shape-dataset-toolbox). This toolbox allows for procedural generation of customizable shape datasets.
## License
These pre-trained models are released under the **MIT** license.