---
license: apache-2.0
tags:
- test-fixtures
- training
- autograd
- optimizer
- dataloader
- pytorch
---

# ferrotorch / training-trajectory-v1

Multi-epoch training-trajectory parity fixtures for ferrotorch's
full training stack: autograd + loss + optimizer + DataLoader.
Phase E of real-artifact-driven development (#1161).

Generated by running `torch.optim.Adam` on a fixed 3-layer MLP
against a fixed deterministic regression dataset for 5 epochs of
sequential iteration (`batch_size=4`, `drop_last=False`,
no shuffling), snapshotting the state_dict after each epoch:
125 optimizer steps in total (25 batches × 5 epochs).
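
A minimal sketch of that pin loop under the settings above. The
real logic lives in `scripts/pin_pretrained_training_trajectory.py`;
here `model`, `X`, `y`, and `out_dir` are stand-ins, and `save_file`
comes from the `safetensors` package:

```python
import torch
import torch.nn.functional as F
from safetensors.torch import save_file

def pin_trajectory(model, X, y, out_dir, epochs=5, batch=4):
    # Adam with the exact hyperparameters listed under "Training"
    opt = torch.optim.Adam(model.parameters(), lr=1e-3,
                           betas=(0.9, 0.999), eps=1e-8)
    # epoch_0 = the untouched initial state
    save_file(model.state_dict(), f"{out_dir}/epoch_0_state.safetensors")
    for epoch in range(1, epochs + 1):
        for i in range(0, X.shape[0], batch):        # sequential, no shuffling
            xb, yb = X[i:i + batch], y[i:i + batch]  # drop_last=False semantics
            loss = F.mse_loss(model(xb), yb, reduction="mean")
            opt.zero_grad()
            loss.backward()
            opt.step()
        # one snapshot per epoch, matching the repo layout below
        save_file(model.state_dict(),
                  f"{out_dir}/epoch_{epoch}_state.safetensors")
```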

Companion to:

* `scripts/pin_pretrained_training_trajectory.py` (this pin)
* `scripts/verify_training_trajectory.py` (the harness)
* `ferrotorch-train/examples/multi_epoch_train_dump.rs`
* `ferrotorch-train/tests/conformance_multi_epoch_training.rs`

## Why live autograd

Phase C.2 (#1155) verified the *optimizer step math* with
**frozen** gradients (snapshotted from torch, re-applied on the
ferrotorch side) to isolate one suspect at a time. This pin
verifies the *full training loop* with **live** autograd: the
ferrotorch side has to re-derive the gradients itself. If
anything in the stack diverges (linear backward, relu backward,
mse backward, Adam state, sequential dataloader iteration order),
the harness will catch it as per-epoch state_dict drift.
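
To make the contrast concrete, a hedged sketch of both modes on
the torch side (the function names and the `frozen_grads` argument
are stand-ins for illustration, not identifiers from the pin
scripts):

```python
import torch
import torch.nn.functional as F

def step_frozen(model, opt, frozen_grads):
    # Phase C.2 style: re-apply snapshotted gradients verbatim, so
    # only the Adam update math itself is under test.
    for p, g in zip(model.parameters(), frozen_grads):
        p.grad = g.clone()
    opt.step()

def step_live(model, opt, xb, yb):
    # This pin: gradients are re-derived by autograd, so linear,
    # relu, and mse backward are all exercised before the very same
    # Adam update runs.
    loss = F.mse_loss(model(xb), yb, reduction="mean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss
```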

## Architecture

```
MLP(
  Linear(64 -> 32) -> ReLU
  Linear(32 -> 16) -> ReLU
  Linear(16 -> 8)
)
```
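
A minimal PyTorch definition consistent with these shapes and with
the `fc1`/`fc2`/`fc3` state-dict keys listed under Layout (the
attribute names are what produce those keys; the class name is
illustrative):

```python
import torch
from torch import nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # attribute names fc1/fc2/fc3 yield the state-dict keys
        # fc1.weight, fc1.bias, ... used by the fixtures
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, 8)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```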

## Dataset

* `X_full.bin` - `torch.randn(100, 64)` with seed 42
* `y_full.bin` - `torch.randn(100, 8)` with seed 42
* Loss target: `F.mse_loss(pred, y, reduction='mean')`
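
The exact byte layout of the `.bin` files is defined by the pin
script; the sketch below assumes raw row-major little-endian f32
with no header (an assumption, not documented here). Since both
tensors are described as "with seed 42", the generator is assumed
to be re-seeded before each draw:

```python
import torch

# Assumed: re-seed before each tensor so both are "with seed 42".
torch.manual_seed(42)
X = torch.randn(100, 64)
torch.manual_seed(42)
y = torch.randn(100, 8)

# Assumed byte layout: raw row-major f32, native (little) endian.
X.numpy().tofile("X_full.bin")
y.numpy().tofile("y_full.bin")
```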

## Training

* Optimizer: `Adam(lr=0.001, betas=(0.9, 0.999), eps=1e-8)`
* Batch size: `4`
* Iteration: sequential (`for i in range(0, N, BATCH)`,
  equivalent to `DataLoader(shuffle=False, drop_last=False)`;
  see the sketch after this list)
* Epochs: `5`
* Per-epoch losses (mean over 25 batches):
  * epoch 1: `1.099803`
  * epoch 2: `1.059872`
  * epoch 3: `1.033610`
  * epoch 4: `1.009329`
  * epoch 5: `0.982952`
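
As referenced above, a sketch showing that plain sequential
slicing matches an unshuffled `DataLoader` batch-for-batch
(`TensorDataset` and `DataLoader` are the real `torch.utils.data`
APIs; the variable names are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)
X = torch.randn(100, 64)
torch.manual_seed(42)
y = torch.randn(100, 8)

N, BATCH = X.shape[0], 4
manual = [(X[i:i + BATCH], y[i:i + BATCH]) for i in range(0, N, BATCH)]

loader = DataLoader(TensorDataset(X, y), batch_size=BATCH,
                    shuffle=False, drop_last=False)
for (xm, ym), (xd, yd) in zip(manual, loader):
    assert torch.equal(xm, xd) and torch.equal(ym, yd)
print(len(manual))  # 25 batches per epoch, 125 steps over 5 epochs
```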

## Layout

```
epoch_0_state.safetensors    # initial state (alias: initial_state.safetensors)
epoch_1_state.safetensors    # after epoch 1
epoch_2_state.safetensors    # after epoch 2
epoch_3_state.safetensors    # after epoch 3
epoch_4_state.safetensors    # after epoch 4
epoch_5_state.safetensors    # after epoch 5
X_full.bin                   # full dataset features
y_full.bin                   # full dataset targets
meta.json                    # hyperparameters + per-epoch losses
bundle.tar                   # convenience archive (registry pin checksum)
```

State-dict keys: `fc1.weight`, `fc1.bias`, `fc2.weight`,
`fc2.bias`, `fc3.weight`, `fc3.bias`.
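
A quick way to load one snapshot and confirm the key set
(`safetensors.torch.load_file` returns a plain dict of tensors):

```python
from safetensors.torch import load_file

state = load_file("epoch_5_state.safetensors")
expected = {"fc1.weight", "fc1.bias", "fc2.weight",
            "fc2.bias", "fc3.weight", "fc3.bias"}
assert set(state) == expected
print({k: tuple(v.shape) for k, v in state.items()})
```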
    
## Tolerance

The harness gate is `max_abs <= 1e-4` and `cosine_sim >= 0.9999`
per tensor for every epoch: a noise budget for 125 steps of
accumulated f32 rounding differences across two independent
runtimes.
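
A sketch of the two gate metrics as plain tensor math (the actual
harness is `scripts/verify_training_trajectory.py`; this is only
the formula the bounds above refer to):

```python
import torch

def gate(reference: torch.Tensor, candidate: torch.Tensor) -> bool:
    # elementwise worst-case drift between the two runtimes
    max_abs = (reference - candidate).abs().max().item()
    # direction agreement of the flattened tensors
    cos = torch.nn.functional.cosine_similarity(
        reference.flatten(), candidate.flatten(), dim=0).item()
    return max_abs <= 1e-4 and cos >= 0.9999
```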
    
## License

Apache 2.0. Synthetic fixtures generated by this repo's pin
script; no upstream weights / data.