NetCDF CF scale_factor Metadata Visibility Gap — PoC

This repository contains a proof-of-concept for a metadata visibility gap in NetCDF files using CF Conventions scale_factor and add_offset with xarray.

Overview

xarray.open_dataset() defaults to decode_cf=True, which correctly applies CF scale_factor and add_offset transforms during loading. After decoding, xarray relocates these attributes from the variable's .attrs dictionary to .encoding.
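The relocation can be observed without any file on disk. The following is a minimal sketch, assuming that `xarray.decode_cf()` applied to an in-memory dataset exercises the same CF decoding step that `open_dataset()` runs by default. The variable name `w`, the `units` attribute, and the packing values are hypothetical and are not taken from the PoC file.

```python
import numpy as np
import xarray as xr

# Hypothetical packed variable: int16 storage plus CF packing attributes.
# These values are illustrative only; they are not the PoC file's values.
raw = xr.Dataset(
    {
        "w": (
            "x",
            np.array([0, 1, 2], dtype="int16"),
            {"scale_factor": 0.5, "add_offset": 10.0, "units": "K"},
        )
    }
)

# Apply CF decoding in memory.
decoded = xr.decode_cf(raw)

print(decoded["w"].values)                      # unpacked floating-point values
print("scale_factor" in decoded["w"].attrs)     # packing key no longer in .attrs
print("scale_factor" in decoded["w"].encoding)  # packing key now in .encoding
print(decoded["w"].attrs)                       # unrelated attributes remain
```

Attributes unrelated to packing, such as `units` here, stay in `.attrs`, so only the packing keys change namespace.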

xarray's own API documentation distinguishes these two namespaces by design:

| Property | Official definition |
|---|---|
| `DataArray.attrs` | "Dictionary storing arbitrary metadata with this array" |
| `DataArray.encoding` | "Dictionary of format-specific settings for how this array should be serialized" |

After decoding, scale_factor (and add_offset, if present) moves from the semantic metadata namespace (.attrs) to the serialization namespace (.encoding). The standard post-load inspection paths, namely .attrs, repr(ds), repr(ds['var']), ds.to_dict() with its default encoding=False, and ds.info(), give no indication that a scale transform was applied.

This creates an auditability gap: a validator consulting .attrs to audit variable metadata (the documented user-facing semantic path) will not find that a packing transform shaped the loaded values.
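A validator can close this gap by consulting both namespaces. The helper below is a hedged sketch rather than part of this repository: `packing_report` and `PACKING_KEYS` are hypothetical names, and the in-memory dataset stands in for a file loaded with `open_dataset()`.

```python
import numpy as np
import xarray as xr

# CF packing keys to look for; a hypothetical constant for this sketch.
PACKING_KEYS = ("scale_factor", "add_offset")


def packing_report(ds):
    """Map variable name -> {key: (namespace, value)} for CF packing keys."""
    report = {}
    for name, var in ds.variables.items():
        found = {}
        # Check the semantic namespace first, then the serialization one.
        for namespace in ("attrs", "encoding"):
            mapping = getattr(var, namespace)
            for key in PACKING_KEYS:
                if key in mapping:
                    found[key] = (namespace, mapping[key])
        if found:
            report[name] = found
    return report


# Illustrative packed variable (values are assumptions, not the PoC's).
raw = xr.Dataset(
    {
        "w": (
            "x",
            np.array([0, 1, 2], dtype="int16"),
            {"scale_factor": 0.5, "add_offset": 10.0},
        )
    }
)
decoded = xr.decode_cf(raw)

print(packing_report(raw))      # keys reported under "attrs"
print(packing_report(decoded))  # keys reported under "encoding"
```

Reporting the namespace alongside the value lets an audit log distinguish a transform that is still pending (keys in `.attrs`) from one that has already shaped the loaded values (keys in `.encoding`).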

Evidence

| Observation | Result |
|---|---|
| xarray default decoded value | 999.0 |
| Raw stored int16 value | 1 |
| `scale_factor` in `.attrs` after decode | False |
| `scale_factor` in any standard view | False |
| `scale_factor` in `.encoding` | True |
| Warning emitted during load | False |
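The distance between the raw and decoded values follows from the CF unpacking rule, `unpacked = packed * scale_factor + add_offset`. The sketch below applies that rule with NumPy only. The table does not state the `scale_factor` and `add_offset` stored in the PoC file, so the pairs used here are assumptions chosen merely to be consistent with the two reported values; `cf_unpack` is a hypothetical helper name.

```python
import numpy as np


def cf_unpack(packed, scale_factor=1.0, add_offset=0.0):
    """Apply the CF unpacking rule: packed * scale_factor + add_offset."""
    return np.asarray(packed, dtype="float64") * scale_factor + add_offset


# Raw stored value from the table above.
packed = np.array([1], dtype="int16")

# Two hypothetical parameter pairs, both consistent with the table;
# the actual values in the PoC file are not stated in this README.
print(cf_unpack(packed, scale_factor=999.0))
print(cf_unpack(packed, scale_factor=2.0, add_offset=997.0))
```

Because many parameter pairs map the same stored integer to the same decoded value, the decoded data alone cannot reveal which transform was applied; that information survives only in `.encoding`.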

Reproduction

pip install scipy xarray numpy
python3 create_netcdf.py ./
python3 inspect_netcdf.py model_weights.nc
python3 reproduce.py model_weights.nc

Files

| File | Description |
|---|---|
| `model_weights.nc` | PoC NetCDF file (CDF-1, 332 bytes) |
| `create_netcdf.py` | Creates the PoC file |
| `inspect_netcdf.py` | Demonstrates all read paths |
| `reproduce.py` | Standalone reproduction |
| `requirements.txt` | Dependencies |
| `expected_output.txt` | Expected key-value results |
| `SHA256SUMS_T1.txt` | File integrity hashes |

Note

This PoC does not claim that xarray's decode_cf=True behavior is incorrect or that CF Conventions are violated. The finding is an auditability and visibility issue: after default CF decoding, the metadata describing the applied transform is relocated from the semantic attribute namespace (.attrs) to the serialization namespace (.encoding). The standard attribute access patterns, which are the user-facing paths documented for inspecting variable metadata, therefore do not surface the transform. Users who explicitly inspect .encoding can still recover it.