  ---
  license: apache-2.0
  tags:
  - test-fixtures
  - dataloader
  - pytorch
  ---

  # ferrotorch / dataloader-batches-v1

  DataLoader-iteration parity fixtures for ferrotorch's
  `DataLoader::iter()` implementation, generated by iterating
  `torch.utils.data.DataLoader` over a deterministic 10-item dict
  dataset and snapshotting every batch as a `.bin` file.

  Phase C.3 of real-artifact-driven development (#1156). Companion to:
    * `scripts/pin_pretrained_dataloader_batches.py` (this pin)
    * `scripts/verify_dataloader_inference.py` (the harness)
    * `ferrotorch-data/examples/dataloader_iterate_dump.rs`
    * `ferrotorch-data/tests/conformance_dataloader_iteration.rs`

  ## Dataset

  Fixed, deterministic, 10 items:

  ```
  item[i] = {
      "features": arange(8, dtype=f32) + i * 0.1,   # shape [8]
      "label": i % 3,                               # int
  }
  ```
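
As a rough illustration of the definition above, the dataset can be rebuilt in plain Python. This is a sketch, not the pin script's code: `make_item` and `make_dataset` are hypothetical names, and the values are Python floats (f64), so they may differ from the f32 fixtures in the last few bits.

```python
def make_item(i):
    """Build item i: features = arange(8) + i * 0.1, label = i % 3."""
    # Assumption: plain Python floats stand in for the f32 tensor.
    features = [float(k) + i * 0.1 for k in range(8)]
    return {"features": features, "label": i % 3}


def make_dataset(n=10):
    """Return the n-item list of dict items (10 in the fixtures)."""
    return [make_item(i) for i in range(n)]


dataset = make_dataset()
print(len(dataset), dataset[0]["label"], dataset[4]["label"])
```

Because every item depends only on its index, regenerating the dataset needs no seed.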

  ## Configurations

    * `sequential`: batch_size=4 shuffle=False drop_last=False seed=None → 3 batches (equality_mode=ORDER)
    * `sequential_droplast`: batch_size=4 shuffle=False drop_last=True seed=None → 2 batches (equality_mode=ORDER)
    * `shuffled_seeded`: batch_size=4 shuffle=True drop_last=False seed=42 → 3 batches (equality_mode=SET)
    * `shuffled_droplast`: batch_size=4 shuffle=True drop_last=True seed=42 → 2 batches (equality_mode=SUBSET)
    * `batch_size_3`: batch_size=3 shuffle=False drop_last=False seed=None → 4 batches (equality_mode=ORDER)
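
The batch counts listed above follow from the 10-item dataset size, the batch size, and `drop_last`. A minimal sketch of that arithmetic, using a hypothetical helper `batch_sizes` that is not part of the pin script:

```python
def batch_sizes(n_items, batch_size, drop_last):
    """Return the size of each batch a loader would yield."""
    sizes = [batch_size] * (n_items // batch_size)
    remainder = n_items % batch_size
    # The trailing partial batch is kept only when drop_last is False.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


print(batch_sizes(10, 4, False))  # batch_size=4, drop_last=False
print(batch_sizes(10, 4, True))   # batch_size=4, drop_last=True
print(batch_sizes(10, 3, False))  # batch_size=3, drop_last=False
```

With `drop_last=True` and `batch_size=4`, only 8 of the 10 items are ever yielded, which is why the shuffled drop-last configuration needs a weaker equality mode.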

    ## Layout
    
    One subfolder per configuration:
    
    ```
    <config_name>/
      meta.json
      batch_0000.bin
      batch_0001.bin
      batch_NNNN.bin    # one file per batch, count recorded in meta.json
    ```
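
One way to walk a configuration folder laid out as above is to glob for the batch files and check that their indices are contiguous. This is only a sketch: `list_batch_files` is a hypothetical helper, and it does not read `meta.json`, whose key for the batch count is not specified here.

```python
import tempfile
from pathlib import Path


def list_batch_files(config_dir):
    """Return the batch_NNNN.bin paths of one config folder, in batch order."""
    # Zero-padded indices make lexicographic order match numeric order.
    paths = sorted(Path(config_dir).glob("batch_*.bin"))
    for expected_index, path in enumerate(paths):
        index = int(path.stem.split("_")[1])
        if index != expected_index:
            raise ValueError(f"missing or unexpected batch file near {path.name}")
    return paths


# Demo against a throwaway directory shaped like one config folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("meta.json", "batch_0001.bin", "batch_0000.bin"):
        (Path(tmp) / name).write_bytes(b"")
    demo_names = [p.name for p in list_batch_files(tmp)]
print(demo_names)
```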
    
    ## Binary format
    
    Each `.bin` file is a little-endian multi-tensor dump:
    
    ```
    [u32 num_tensors=2]
    tensor 0 (features):
      [u32 ndim=2] [u32 B] [u32 8] [f32 * B*8]
    tensor 1 (labels):
      [u32 ndim=1] [u32 B] [f32 * B]   # label-as-f32
    ```
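
The layout above can be read with Python's standard `struct` module. The following is a minimal sketch under the stated format (little-endian `u32` header fields, `f32` payloads); `read_batch` and `write_batch` are hypothetical names, not the functions used by the pin script or the harness.

```python
import struct


def read_batch(data):
    """Parse one dump into a list of (shape, flat_values) pairs."""
    offset = 0
    (num_tensors,) = struct.unpack_from("<I", data, offset)
    offset += 4
    tensors = []
    for _ in range(num_tensors):
        (ndim,) = struct.unpack_from("<I", data, offset)
        offset += 4
        shape = struct.unpack_from(f"<{ndim}I", data, offset)
        offset += 4 * ndim
        count = 1
        for dim in shape:
            count *= dim
        values = struct.unpack_from(f"<{count}f", data, offset)
        offset += 4 * count
        tensors.append((list(shape), list(values)))
    if offset != len(data):
        raise ValueError("trailing bytes after last tensor")
    return tensors


def write_batch(tensors):
    """Serialize (shape, flat_values) pairs in the same layout."""
    out = [struct.pack("<I", len(tensors))]
    for shape, values in tensors:
        out.append(struct.pack("<I", len(shape)))
        out.append(struct.pack(f"<{len(shape)}I", *shape))
        out.append(struct.pack(f"<{len(values)}f", *values))
    return b"".join(out)
```

For a batch of `B` items the file size is `4 + (4 + 8 + 32*B) + (4 + 4 + 4*B)` bytes, so a full batch with `B=4` occupies 168 bytes. Labels come back as floats and would need converting to integers before comparison with `i % 3`.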
    
    ## Equality semantics
    
    * Sequential (`shuffle=False`) configs: ORDER-equality. Rust and
      torch must yield items in identical order.
    * Shuffled (`shuffle=True`), `drop_last=False`: SET-equality. Rust's
      `rand` crate and torch's `torch.Generator` are different PRNGs,
      so the shuffle permutations cannot match byte-for-byte. The
      verifier requires that the multiset of items is identical.
    * Shuffled + `drop_last=True`: SUBSET-equality. With `drop_last` each
      side drops the trailing partial batch; because torch and Rust
      permute differently, the *kept* items differ as well. The
      verifier therefore checks that Rust's kept items form a
      duplicate-free subset of the full 10-item dataset (encoded in
      `meta.json` as `full_dataset_features` / `full_dataset_labels`)
      and that the subset has the expected length.
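
The three modes can be sketched as follows. This is an illustration of the semantics described above, not the harness's code: the function names are hypothetical, items are modeled as hashable `(features_tuple, label)` pairs, and comparison is exact, whereas the real verifier may apply a tolerance.

```python
from collections import Counter


def order_equal(rust_items, torch_items):
    """ORDER: same items in the same sequence."""
    return list(rust_items) == list(torch_items)


def set_equal(rust_items, torch_items):
    """SET: same multiset of items, order ignored."""
    return Counter(rust_items) == Counter(torch_items)


def subset_ok(rust_items, full_dataset, expected_len):
    """SUBSET: expected length, no duplicates, all drawn from the dataset."""
    if len(rust_items) != expected_len:
        return False
    if len(set(rust_items)) != len(rust_items):
        return False
    return set(rust_items) <= set(full_dataset)


# Toy dataset of 10 distinct items with one feature each.
full = [((float(i),), i % 3) for i in range(10)]
print(order_equal(full[:4], full[3::-1]))  # reversed order
print(set_equal(full[:4], full[3::-1]))    # same multiset
print(subset_ok(full[2:], full, 8))        # 8 kept items
```

For the `shuffled_droplast` configuration the expected length is 8, since two batches of four are kept out of ten items.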
    
    ## License
    
    Apache 2.0. Synthetic fixtures generated by this repo's pin
    script; no upstream weights / data.
    