To accelerate contributions to and innovations around torchtitan, we are adding this new, experimental folder. Below are the general contributing guidelines, and we look forward to your contributions!
Contributing Guidelines
We provide this experiments/ folder to host experiments that add significant value to torchtitan, with the following principles. We refer to the part of torchtitan outside experiments as core.
- Each subfolder in experiments will be an experiment, with a clear theme which can be flexible, such as
  - a new model, or preferably a new model architecture, with its training infrastructure including parallelization functions;
  - an enhancement or addition to the existing infrastructure of torchtitan.
- It is the contributors' responsibility to justify the value of an experiment. The torchtitan team will review proposals on a case-by-case basis. As part of the contribution, the contributors should provide documentation that clearly showcases the motivation and innovation of an experiment, including reports on performance and loss convergence.
- An experiment should reuse existing torchtitan code as much as possible, such as modules in components/ (via a new TrainSpec) and train.py. For a list of extension points we provide, please refer to docs/extension.md.
  - The extension points are subject to change. We kindly request that contributors provide feedback if they encounter issues reusing any components, rather than simply using a copy-and-paste approach.
  - The degree to which existing components are reused, and whether any duplication is justified, will also be a criterion for accepting an experiment.
- Each experiment is independent from other experiments, and can have its own dependencies (on top of core dependencies), and its own tests.
- The dependency from experiments to core is one-way. Anything in experiments is optional for core to run successfully. In particular, development in core is not blocked by breakage in experiments. We will utilize GitHub's CI mechanism to help test an experiment periodically and only if the experiment itself is affected by a PR.
- Each experiment needs to have an owner. The owner is responsible for working with the torchtitan team to maintain the quality and health of an experiment, which includes
  - adapting an experiment to changes in core and fixing broken tests, no later than the next official torchtitan release;
  - responding to GitHub issues and questions in a timely manner.
- The torchtitan team reserves the right to remove an experiment. In particular, an experiment should be removed if
  - it has served its purpose (e.g., providing findings, or getting some features upstreamed to core or PyTorch), or
  - it becomes stale (e.g., it is no longer maintained).
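
The one-way dependency rule above (nothing in core may import from experiments) lends itself to a mechanical check. The following is a minimal sketch, not part of torchtitan: the function name `find_experiment_imports` is hypothetical, and it assumes the experiments folder is importable as `torchtitan.experiments` and lives in a directory named `experiments` under the package root. It only inspects absolute imports and does not resolve relative ones.

```python
import ast
from pathlib import Path


def find_experiment_imports(
    core_root: Path, forbidden: str = "torchtitan.experiments"
) -> list[tuple[str, int, str]]:
    """Return (file, line, module) for each absolute import of `forbidden`
    found in Python files under `core_root`, skipping the experiments folder.

    Hypothetical helper for illustration; `forbidden` is an assumed package path.
    """
    violations = []
    for path in sorted(core_root.rglob("*.py")):
        # Files inside an `experiments` directory may import each other freely.
        if "experiments" in path.relative_to(core_root).parts:
            continue
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # e.g. `import torchtitan.experiments.foo`
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # e.g. `from torchtitan.experiments.foo import bar`
                # or   `from torchtitan import experiments`
                modules = [node.module]
                modules += [f"{node.module}.{alias.name}" for alias in node.names]
            else:
                continue
            # Record at most one violation per import statement.
            hit = next(
                (m for m in modules if m == forbidden or m.startswith(forbidden + ".")),
                None,
            )
            if hit is not None:
                violations.append((str(path), node.lineno, hit))
    return violations
```

A CI job or local pre-commit step could call something like `find_experiment_imports(Path("torchtitan"))` and fail when the returned list is non-empty; the exact root path depends on the repository layout.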