license: apache-2.0
tags:
- test-fixtures
- dataloader
- pytorch
---
# ferrotorch / dataloader-batches-v1
DataLoader-iteration parity fixtures for ferrotorch's
`DataLoader::iter()` implementation, generated by iterating
`torch.utils.data.DataLoader` over a deterministic 10-item dict
dataset and snapshotting every batch as a `.bin` file.
Phase C.3 of real-artifact-driven development (#1156). Companion to:
* `scripts/pin_pretrained_dataloader_batches.py` (this pin)
* `scripts/verify_dataloader_inference.py` (the harness)
* `ferrotorch-data/examples/dataloader_iterate_dump.rs`
* `ferrotorch-data/tests/conformance_dataloader_iteration.rs`
## Dataset
Fixed, deterministic, 10 items:
```
item[i] = {
    "features": arange(8, dtype=f32) + i * 0.1,  # shape [8]
    "label": i % 3,                              # int
}
```
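As a rough illustration of the item definition above, the dataset can be approximated in plain Python. This is a sketch, not the pin script: the helper names `to_f32` and `make_item` are hypothetical, and rounding through `struct` only approximates torch's float32 arithmetic, so individual values may differ from the pinned ones in the last bit. The committed `.bin` files remain the source of truth.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def make_item(i: int) -> dict:
    """Build item i: features = arange(8) + i * 0.1 (shape [8]), label = i % 3."""
    # Assumption: the sum is computed in f64 and then rounded to f32.
    features = [to_f32(j + i * 0.1) for j in range(8)]
    return {"features": features, "label": i % 3}


# The full 10-item dataset, in index order.
dataset = [make_item(i) for i in range(10)]
```

Item 0 has features `0.0` through `7.0` exactly, and the labels cycle through `0, 1, 2` across the ten items.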
## Configurations
* `sequential`: batch_size=4 shuffle=False drop_last=False seed=None → 3 batches (equality_mode=ORDER)
* `sequential_droplast`: batch_size=4 shuffle=False drop_last=True seed=None → 2 batches (equality_mode=ORDER)
* `shuffled_seeded`: batch_size=4 shuffle=True drop_last=False seed=42 → 3 batches (equality_mode=SET)
* `shuffled_droplast`: batch_size=4 shuffle=True drop_last=True seed=42 → 2 batches (equality_mode=SUBSET)
* `batch_size_3`: batch_size=3 shuffle=False drop_last=False seed=None → 4 batches (equality_mode=ORDER)
## Layout
One subfolder per configuration:
```
<config_name>/
  meta.json
  batch_0000.bin
  batch_0001.bin
  batch_NNNN.bin   # one file per batch, count recorded in meta.json
```
## Binary format
Each `.bin` file is a little-endian multi-tensor dump:
```
[u32 num_tensors=2]
tensor 0 (features): [u32 ndim=2] [u32 B] [u32 8] [f32 * B*8]
tensor 1 (labels):   [u32 ndim=1] [u32 B] [f32 * B]   # label-as-f32
```
## Equality semantics
* Sequential (`shuffle=False`) configs: ORDER-equality. Rust and torch must yield items in identical order.
* Shuffled (`shuffle=True`), `drop_last=False`: SET-equality. Rust's `rand` crate and torch's `torch.Generator` are different PRNGs, so the shuffle permutations cannot byte-match. The verifier requires that the multiset of items is identical.
* Shuffled + `drop_last=True`: SUBSET-equality. With drop_last each side drops the trailing partial batch; because torch and rust permute differently the *kept* items differ as well, so the verifier checks that rust's kept items are a no-duplicate subset of the full 10-item dataset (encoded in `meta.json` as `full_dataset_features` / `full_dataset_labels`) of the expected length.
## License
Apache 2.0. Synthetic fixtures generated by this repo's pin script; no upstream weights / data.
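For readers who want to inspect a fixture by hand, the layout in the binary format section can be parsed with Python's `struct` module. The following is a minimal sketch based only on the layout stated there; `encode_batch` and `decode_batch` are hypothetical names, not functions from the pin script or the harness.

```python
import struct


def encode_batch(features, labels):
    """Serialize one batch: `features` is B rows of 8 floats, `labels` is B numbers."""
    b = len(features)
    out = struct.pack("<I", 2)                        # num_tensors = 2
    out += struct.pack("<III", 2, b, 8)               # tensor 0: ndim=2, shape [B, 8]
    flat = [v for row in features for v in row]
    out += struct.pack(f"<{b * 8}f", *flat)           # B*8 little-endian f32 values
    out += struct.pack("<II", 1, b)                   # tensor 1: ndim=1, shape [B]
    out += struct.pack(f"<{b}f", *(float(x) for x in labels))  # labels stored as f32
    return out


def decode_batch(buf):
    """Parse a dump into a list of (shape, flat_values) pairs, one per tensor."""
    offset = 0
    (num_tensors,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    tensors = []
    for _ in range(num_tensors):
        (ndim,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        shape = list(struct.unpack_from(f"<{ndim}I", buf, offset))
        offset += 4 * ndim
        count = 1
        for dim in shape:
            count *= dim
        values = list(struct.unpack_from(f"<{count}f", buf, offset))
        offset += 4 * count
        tensors.append((shape, values))
    if offset != len(buf):
        raise ValueError("trailing bytes after the last tensor")
    return tensors
```

To read a real fixture, pass the bytes of one batch file to `decode_batch`, for example `decode_batch(open(path, "rb").read())` with `path` pointing at a `batch_NNNN.bin` file. The first returned tensor holds the features with shape `[B, 8]` and the second holds the labels with shape `[B]`.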