---
license: bsd-3-clause
tags:
- test-fixtures
- sklearn
- tabular
---
# ferrotorch / ml-sklearn-parity-v1
scikit-learn reference outputs for ferrotorch-ml's tabular
operations, generated by running the 5-config matrix on a fixed
deterministic dataset and snapshotting the inputs + outputs as
`.bin` (multi-tensor f32) and `.json` (integer indices) files.

Phase D.3 of real-artifact-driven development (#1159). Companion to:

* `scripts/pin_pretrained_ml_fixtures.py` (this pin)
* `scripts/verify_ml_inference.py` (the harness)
* `ferrotorch-ml/examples/ml_op_dump.rs`
* `ferrotorch-ml/tests/conformance_sklearn_parity.rs`

sklearn version: 1.5.2.

## Configurations
* `pca_n4`: sklearn.decomposition.PCA(n_components=4).fit_transform (equality_mode=COSINE_SIM_PER_PC)
* `standard_scaler`: sklearn.preprocessing.StandardScaler().fit_transform (equality_mode=MAX_ABS)
* `one_hot_encoder`: sklearn.preprocessing.OneHotEncoder(sparse_output=False).fit_transform (equality_mode=EXACT)
* `kfold_5`: sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=42).split(arange(50)) (equality_mode=SET)
* `train_test_split_80_20`: sklearn.model_selection.train_test_split(X[100,10], y[100], test_size=0.2, random_state=42) (equality_mode=SET)

## Layout

One subfolder per configuration:

```
<config_name>/
  meta.json
  input_*.bin          # one or more input tensors (f32 LE multi-tensor)
  output_*.bin         # sklearn reference output(s) (f32 LE multi-tensor)
  fold_indices.json    # kfold_5 only: integer fold index lists
  split_indices.json   # train_test_split_80_20 only: split indices
```

## Binary format

Each `.bin` file is a little-endian multi-tensor dump (same as ferrotorch/dataloader-batches-v1 and ferrotorch/optimizer-trajectories-v1):

```
[u32 num_tensors]
per-tensor:
  [u32 ndim]
  [u32 × ndim shape]
  [f32 × prod(shape)]
```

## Equality semantics

* `pca_n4`: cosine_sim ≥ 0.9999 PER PRINCIPAL COMPONENT (PCs may flip sign across implementations; the harness aligns each PC's sign before computing max_abs).
* `standard_scaler`: max_abs ≤ 1e-6 (essentially exact f32 arithmetic; sklearn + ferrolearn both use biased variance /n).
* `one_hot_encoder`: exact integer equality.
* `kfold_5`: SET-equality. Rust's `rand` crate (SmallRng) and NumPy's PRNG cannot byte-match the shuffle permutation; each test fold must have exact size 10, and the union of all test folds must equal [0, 50).
* `train_test_split_80_20`: SET-equality. Sizes must be exactly 80/20; union of train+test indices == [0, 100); test labels must match test X rows (label consistency invariant).

## License

BSD-3-Clause (scikit-learn inherits BSD-3-Clause; the reference outputs are deterministic projections of public-domain numpy random state, so the BSD-3-Clause notice flows through).
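
## Reading a `.bin` file (sketch)

The multi-tensor layout described under "Binary format" can be parsed with nothing but Python's `struct` module. The following is a minimal sketch, not the code used by `scripts/verify_ml_inference.py`: the function names `write_tensors` and `read_tensors` are hypothetical, and values are returned as a flat list in file order because this card does not state the element ordering within a tensor.

```python
import struct
from math import prod


def write_tensors(tensors):
    """Serialize [(shape, flat_values), ...] into the multi-tensor layout."""
    out = bytearray(struct.pack("<I", len(tensors)))      # [u32 num_tensors]
    for shape, values in tensors:
        if len(values) != prod(shape):
            raise ValueError("value count does not match shape")
        out += struct.pack("<I", len(shape))              # [u32 ndim]
        out += struct.pack(f"<{len(shape)}I", *shape)     # [u32 x ndim shape]
        out += struct.pack(f"<{len(values)}f", *values)   # [f32 x prod(shape)]
    return bytes(out)


def read_tensors(buf):
    """Parse the multi-tensor layout back into [(shape, flat_values), ...]."""
    offset = 0
    (num_tensors,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    tensors = []
    for _ in range(num_tensors):
        (ndim,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        shape = struct.unpack_from(f"<{ndim}I", buf, offset)
        offset += 4 * ndim
        count = prod(shape)
        values = struct.unpack_from(f"<{count}f", buf, offset)
        offset += 4 * count
        tensors.append((tuple(shape), list(values)))
    if offset != len(buf):
        raise ValueError("trailing bytes after last tensor")
    return tensors
```

To inspect a fixture, pass the raw bytes of one file to the reader, for example `read_tensors(open(path, "rb").read())`, where `path` points at one of the `input_*.bin` or `output_*.bin` files.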
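
## Checking the equality modes (sketch)

Two of the equality modes above are not plain element-wise comparisons, so a small illustration may help. This is a sketch under stated assumptions rather than the harness itself: `check_kfold_sets` and `cosine_per_component` are hypothetical names, principal components are assumed to be the columns of the `fit_transform` output, and the sign alignment is approximated by taking the absolute value of the cosine similarity.

```python
import math


def check_kfold_sets(test_folds, n_samples=50, fold_size=10):
    """SET-equality: every fold has the expected size and the folds cover [0, n_samples)."""
    if any(len(fold) != fold_size for fold in test_folds):
        return False
    seen = [i for fold in test_folds for i in fold]
    # Equal length plus equal set means no index is repeated or missing.
    return len(seen) == n_samples and set(seen) == set(range(n_samples))


def cosine_per_component(ours, reference):
    """Sign-insensitive cosine similarity for each column (one column per PC)."""
    sims = []
    for k in range(len(reference[0])):
        a = [row[k] for row in ours]
        b = [row[k] for row in reference]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        if norm == 0.0:
            raise ValueError(f"component {k} has zero norm")
        # abs() absorbs a global sign flip of the component (assumed simplification).
        sims.append(abs(dot) / norm)
    return sims


# Any permutation of the indices passes, as long as the folds partition [0, 50).
folds = [list(range(start, 50, 5)) for start in range(5)]
kfold_ok = check_kfold_sets(folds)

# The first component is sign-flipped relative to the reference and still passes.
reference = [[1.0, 2.0], [3.0, -1.0], [-2.0, 0.5]]
ours = [[-1.0, 2.0], [-3.0, -1.0], [2.0, 0.5]]
pca_ok = all(sim >= 0.9999 for sim in cosine_per_component(ours, reference))
```

The set-based check deliberately ignores which index lands in which fold, which is what lets a Rust shuffle and a NumPy shuffle both pass with different permutations.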