  license: bsd-3-clause
  tags:
  - test-fixtures
  - sklearn
  - tabular
  ---

  # ferrotorch / ml-sklearn-parity-v1

  scikit-learn reference outputs for ferrotorch-ml's tabular
  operations, generated by running the 5-config matrix on a fixed
  deterministic dataset and snapshotting the inputs + outputs as
  `.bin` (multi-tensor f32) and `.json` (integer indices) files.

  Phase D.3 of real-artifact-driven development (#1159). Companion to:
    * `scripts/pin_pretrained_ml_fixtures.py` (this pin)
    * `scripts/verify_ml_inference.py` (the harness)
    * `ferrotorch-ml/examples/ml_op_dump.rs`
    * `ferrotorch-ml/tests/conformance_sklearn_parity.rs`

  sklearn version: 1.5.2.

  ## Configurations

    * `pca_n4`: sklearn.decomposition.PCA(n_components=4).fit_transform (equality_mode=COSINE_SIM_PER_PC)
    * `standard_scaler`: sklearn.preprocessing.StandardScaler().fit_transform (equality_mode=MAX_ABS)
    * `one_hot_encoder`: sklearn.preprocessing.OneHotEncoder(sparse_output=False).fit_transform (equality_mode=EXACT)
    * `kfold_5`: sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=42).split(arange(50)) (equality_mode=SET)
    * `train_test_split_80_20`: sklearn.model_selection.train_test_split(X[100,10], y[100], test_size=0.2, random_state=42) (equality_mode=SET)

    ## Layout
    
    One subfolder per configuration:
    
    ```
    <config_name>/
      meta.json
      input_*.bin        # one or more input tensors (f32 LE multi-tensor)
      output_*.bin       # sklearn reference output(s) (f32 LE multi-tensor)
      fold_indices.json  # kfold_5 only: integer fold index lists
      split_indices.json # train_test_split_80_20 only: split indices
    ```
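
As a rough illustration of the layout above, the following sketch groups the files of one configuration folder by role. The function name `list_config_files` is hypothetical, and because the schema of `meta.json` is not described here, it is loaded as an opaque JSON value.

```python
import json
from pathlib import Path


def list_config_files(config_dir):
    """Group the files found in one <config_name>/ folder by role."""
    config_dir = Path(config_dir)
    # meta.json's fields are not documented here, so no keys are assumed.
    meta = json.loads((config_dir / "meta.json").read_text())
    return {
        "meta": meta,
        "inputs": sorted(p.name for p in config_dir.glob("input_*.bin")),
        "outputs": sorted(p.name for p in config_dir.glob("output_*.bin")),
        # Present only for kfold_5 and train_test_split_80_20.
        "indices": sorted(p.name for p in config_dir.glob("*_indices.json")),
    }
```

For the three tensor-only configurations, `indices` would simply come back as an empty list.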
    
    ## Binary format
    
    Each `.bin` file is a little-endian multi-tensor dump (same as
    ferrotorch/dataloader-batches-v1 and ferrotorch/optimizer-trajectories-v1):
    
    ```
    [u32 num_tensors]
    per-tensor:
      [u32 ndim] [u32 × ndim shape] [f32 × prod(shape)]
    ```
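
A minimal reader for this format can be sketched with Python's standard `struct` module. The names `read_multi_tensor` and `write_multi_tensor` are hypothetical and not part of the harness. Values are returned as a flat list because the element ordering within a tensor (presumably row-major) is not stated here, and rejecting trailing bytes is a strictness choice of this sketch.

```python
import struct


def read_multi_tensor(data):
    """Parse a multi-tensor dump into a list of (shape, flat_values) pairs."""
    offset = 0
    (num_tensors,) = struct.unpack_from("<I", data, offset)
    offset += 4
    tensors = []
    for _ in range(num_tensors):
        (ndim,) = struct.unpack_from("<I", data, offset)
        offset += 4
        shape = struct.unpack_from(f"<{ndim}I", data, offset)
        offset += 4 * ndim
        count = 1
        for dim in shape:
            count *= dim
        values = struct.unpack_from(f"<{count}f", data, offset)
        offset += 4 * count
        tensors.append((tuple(shape), list(values)))
    if offset != len(data):
        raise ValueError("trailing bytes after the last tensor")
    return tensors


def write_multi_tensor(tensors):
    """Inverse of read_multi_tensor, useful for round-trip checks."""
    out = [struct.pack("<I", len(tensors))]
    for shape, values in tensors:
        out.append(struct.pack("<I", len(shape)))
        out.append(struct.pack(f"<{len(shape)}I", *shape))
        out.append(struct.pack(f"<{len(values)}f", *values))
    return b"".join(out)
```

Reading a file then amounts to `read_multi_tensor(open(path, "rb").read())`.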
    
    ## Equality semantics
    
    * `pca_n4`: cosine_sim ≥ 0.9999 PER PRINCIPAL COMPONENT (PCs may
      flip sign across implementations; the harness aligns each PC's
      sign before computing max_abs).
    * `standard_scaler`: max_abs ≤ 1e-6 (essentially exact f32
      arithmetic; sklearn and ferrolearn both use the biased variance,
      dividing by n).
    * `one_hot_encoder`: exact integer equality.
    * `kfold_5`: SET-equality. Rust's `rand` crate (SmallRng) and
      NumPy's PRNG cannot byte-match the shuffle permutation, so only
      set-level properties are checked: each test fold must have exact
      size 10, and the union of all test folds must equal [0, 50).
    * `train_test_split_80_20`: SET-equality. Sizes must be exactly
      80/20; union of train+test indices == [0, 100); test labels
      must match test X rows (label consistency invariant).
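
The checks above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the harness code: all function names are hypothetical, taking `abs()` of the cosine stands in for the harness's explicit sign alignment, and the label-consistency check assumes the rows of X are unique.

```python
import math


def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pca_columns_match(ref, got, threshold=0.9999):
    """Compare two row-major matrices one principal component (column) at a time."""
    for k in range(len(ref[0])):
        ref_col = [row[k] for row in ref]
        got_col = [row[k] for row in got]
        # abs() absorbs a sign flip of the whole component.
        if abs(cosine_sim(ref_col, got_col)) < threshold:
            return False
    return True


def max_abs_diff(ref, got):
    """Largest element-wise absolute difference between two flat sequences."""
    return max(abs(r - g) for r, g in zip(ref, got))


def kfold_sets_ok(test_folds, n_samples=50, fold_size=10):
    """Every test fold has the exact size and the folds jointly cover [0, n_samples)."""
    if any(len(fold) != fold_size for fold in test_folds):
        return False
    return set().union(*test_folds) == set(range(n_samples))


def split_sets_ok(train_idx, test_idx, n_train=80, n_test=20):
    """Exact 80/20 sizes and train+test indices jointly cover [0, n_train + n_test)."""
    if len(train_idx) != n_train or len(test_idx) != n_test:
        return False
    return set(train_idx) | set(test_idx) == set(range(n_train + n_test))


def labels_consistent(X, y, test_X, test_y):
    """Each test row carries the label it had in the original (X, y) pairing.

    Assumes rows of X are unique so a row can be used as a lookup key.
    """
    label_of = {tuple(row): label for row, label in zip(X, y)}
    return all(label_of.get(tuple(row)) == label for row, label in zip(test_X, test_y))
```

For `standard_scaler`, the comparison would then read `max_abs_diff(ref, got) <= 1e-6` on the flattened outputs.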
    
    ## License
    
    BSD-3-Clause (inherited from scikit-learn, which is BSD-3-Clause
    licensed; the reference outputs are deterministic projections of
    public-domain numpy random state, so the BSD-3-Clause notice
    flows through).
    