---
license: apache-2.0
datasets:
- timm/imagenet-1k-wds
language:
- en
base_model:
- facebook/DiT-XL-2-512
---
# DiT Model
This repository contains the implementation of the DiT (Diffusion Transformer) model, which leverages NdLinear layers for efficient multi-dimensional linear transformations. The model is designed to be compact yet powerful, suitable for various tasks requiring high-dimensional data processing.
## Overview
The DiT model is built using several components:
- `NdLinear`: A custom PyTorch layer that projects tensors into multi-space representations, capturing multivariate structure without flattening.
- `NdMlp`: A multi-layer perceptron built on `NdLinear` layers for enhanced feature extraction.
- `NdTimestepEmbedder`: Embeds scalar diffusion timesteps into vector representations using `NdLinear` transformations.
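To illustrate the idea behind `NdLinear`, here is a minimal, hypothetical re-implementation (not the repository's actual code, which lives in `ndlinear.py`): instead of flattening a tensor and applying one large dense layer, each non-batch axis gets its own small linear map, which preserves the tensor's multi-dimensional structure and uses far fewer parameters. The class name `NdLinearSketch` and its constructor signature are illustrative assumptions.

```python
import torch
import torch.nn as nn


class NdLinearSketch(nn.Module):
    """Illustrative NdLinear-style layer (hypothetical sketch, not the repo's code).

    Applies a separate small linear map along each non-batch axis,
    rather than flattening the tensor into one large dense layer.
    """

    def __init__(self, input_dims, output_dims):
        super().__init__()
        assert len(input_dims) == len(output_dims)
        # One weight matrix per axis: maps size d_in -> d_out along that axis.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(d_in, d_out) * d_in ** -0.5)
             for d_in, d_out in zip(input_dims, output_dims)]
        )

    def forward(self, x):
        # x: (batch, *input_dims). Transform each non-batch axis in turn.
        for i, w in enumerate(self.weights):
            axis = i + 1  # skip the batch axis
            x = torch.movedim(x, axis, -1)  # bring target axis last
            x = x @ w                       # linear map along that axis
            x = torch.movedim(x, -1, axis)  # restore axis order
        return x


x = torch.randn(2, 8, 16)            # batch of 2, dims (8, 16)
layer = NdLinearSketch((8, 16), (4, 32))
print(layer(x).shape)                # torch.Size([2, 4, 32])
```

Note the parameter count: 8×4 + 16×32 = 544 weights, versus 128×128 = 16,384 for an equivalent flattened `nn.Linear`.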
## Files
- `mlp.py`: Implementations of several MLP variants, including `NdMlp` and `GluMlp`.
- `models_hf.py`: Defines the DiT model architecture, including `DiTBlock` and `FinalLayer`.
- `ndlinear.py`: Implements the `NdLinear` layer, which is central to the model's ability to handle multi-dimensional data efficiently.
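For context on the timestep-embedding component, the sketch below follows the standard DiT recipe: a sinusoidal frequency embedding of the scalar timestep followed by a small MLP. This is a hedged approximation; the repository's `NdTimestepEmbedder` presumably replaces the `nn.Linear` layers here with `NdLinear` projections, and the class name `TimestepEmbedderSketch` is an illustrative assumption.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_embedding(t, dim, max_period=10000):
    """DiT-style sinusoidal frequency embedding of scalar timesteps."""
    half = dim // 2
    freqs = torch.exp(
        -math.log(max_period) * torch.arange(half, dtype=torch.float32) / half
    )
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


class TimestepEmbedderSketch(nn.Module):
    """Hypothetical stand-in for NdTimestepEmbedder (sketch, not the repo's code)."""

    def __init__(self, hidden_size, freq_dim=256):
        super().__init__()
        self.freq_dim = freq_dim
        self.mlp = nn.Sequential(
            nn.Linear(freq_dim, hidden_size),
            nn.SiLU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, t):
        # t: (batch,) integer or float timesteps -> (batch, hidden_size)
        return self.mlp(sinusoidal_embedding(t, self.freq_dim))


emb = TimestepEmbedderSketch(hidden_size=128)
print(emb(torch.tensor([0, 250, 999])).shape)  # torch.Size([3, 128])
```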
## Installation

To use the DiT model, ensure the required dependencies are installed:

```bash
pip install torch transformers==4.52.4
```
## License

This project is licensed under the Apache License 2.0. See the LICENSE file for more details.