NetCDF CF scale_factor Metadata Visibility Gap: PoC

This repository contains a proof-of-concept for a metadata visibility gap in NetCDF files using CF Conventions scale_factor and add_offset with xarray.

Overview

xarray.open_dataset() defaults to decode_cf=True, which correctly applies CF scale_factor and add_offset transforms during loading. After decoding, xarray relocates these attributes from the variable's .attrs dictionary to .encoding.
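The relocation described above can be observed with a small round trip. This is a minimal sketch, not the repository's own create_netcdf.py or reproduce.py: the variable name, the file name demo.nc, and the packing parameters (scale_factor 0.5, add_offset 10.0) are hypothetical values chosen only for illustration, and the scipy backend is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical packing parameters, chosen only for this illustration.
SCALE, OFFSET = 0.5, 10.0


def roundtrip_namespaces():
    """Write a packed variable, reopen it with defaults, and report
    which namespace holds scale_factor after decoding."""
    ds = xr.Dataset({"var": ("x", np.array([10.5, 11.0]))})
    # Ask xarray to pack the float data into int16 on disk.
    packing = {"var": {"dtype": "int16", "scale_factor": SCALE, "add_offset": OFFSET}}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.nc")
        ds.to_netcdf(path, engine="scipy", format="NETCDF3_CLASSIC", encoding=packing)
        # Default open: decode_cf=True, so the transform is applied on load.
        with xr.open_dataset(path, engine="scipy") as decoded:
            var = decoded["var"].load()
            return {
                "values": var.values.tolist(),
                "in_attrs": "scale_factor" in var.attrs,
                "in_encoding": "scale_factor" in var.encoding,
            }


result = roundtrip_namespaces()
print(result)
```

In this sketch the decoded values come back unpacked, while scale_factor is reported under .encoding rather than .attrs.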

xarray's own API documentation distinguishes these two namespaces by design:

| Property | Official definition |
| --- | --- |
| DataArray.attrs | "Dictionary storing arbitrary metadata with this array" |
| DataArray.encoding | "Dictionary of format-specific settings for how this array should be serialized" |

After decode, scale_factor moves from the semantic metadata namespace (.attrs) to the serialization namespace (.encoding). Standard post-load inspection paths (.attrs, repr(ds), repr(ds['var']), ds.to_dict(), and ds.info()) do not expose that a scale transform was applied.

This creates an auditability gap: a validator consulting .attrs to audit variable metadata (the documented user-facing semantic path) will not find that a packing transform shaped the loaded values.
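One way a validator could close this gap is to consult both namespaces. The helper below is a hypothetical sketch (find_packing and PACKING_KEYS are names introduced here, not part of xarray or of this repository). It is duck-typed, so it accepts any object exposing .attrs and .encoding mappings, such as an xarray DataArray; the example uses a stand-in object with made-up values.

```python
from types import SimpleNamespace

# CF packing keys this sketch looks for; extend as needed.
PACKING_KEYS = ("scale_factor", "add_offset")


def find_packing(var):
    """Return {key: (namespace, value)} for packing keys found in
    either .attrs or .encoding of a variable-like object."""
    found = {}
    for namespace in ("attrs", "encoding"):
        mapping = getattr(var, namespace, None) or {}
        for key in PACKING_KEYS:
            if key in mapping:
                found[key] = (namespace, mapping[key])
    return found


# Stand-in for a decoded variable: packing keys live only in .encoding.
decoded_like = SimpleNamespace(
    attrs={"units": "1"},
    encoding={"scale_factor": 0.5, "add_offset": 10.0},
)
report = find_packing(decoded_like)
print(report)
```

A validator that reads only .attrs would see just the units entry for this object; checking .encoding as well reveals both packing keys.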

Evidence

| Observation | Result |
| --- | --- |
| xarray default decoded value | 999.0 |
| Raw stored int16 value | 1 |
| scale_factor in .attrs after decode | False |
| scale_factor in any standard view | False |
| scale_factor in .encoding | True |
| Warning emitted during load | False |
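
For context, the difference between the raw and decoded values follows from the CF Conventions unpacking rule, in which the stored value is multiplied by scale_factor and then add_offset is added:

```latex
x_{\text{decoded}} = x_{\text{raw}} \times \texttt{scale\_factor} + \texttt{add\_offset}
```

With a raw value of 1 and a decoded value of 999.0, the file's parameters must satisfy scale_factor + add_offset = 999.0. The individual values are not listed in this README; they can be read from .encoding after loading.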

Reproduction

pip install scipy xarray numpy
python3 create_netcdf.py ./
python3 inspect_netcdf.py model_weights.nc
python3 reproduce.py model_weights.nc

Files

| File | Description |
| --- | --- |
| model_weights.nc | PoC NetCDF file (CDF-1, 332 bytes) |
| create_netcdf.py | Creates the PoC file |
| inspect_netcdf.py | Demonstrates all read paths |
| reproduce.py | Standalone reproduction |
| requirements.txt | Dependencies |
| expected_output.txt | Expected key-value results |
| SHA256SUMS_T1.txt | File integrity hashes |

Note

This PoC does not claim that xarray's decode_cf=True behavior is incorrect or that CF Conventions are violated. The finding is an auditability and visibility issue: after default CF decoding, the applied transform metadata is relocated from the semantic attribute namespace (.attrs) to the serialization namespace (.encoding). Standard attribute access patterns, the user-facing paths documented for variable metadata inspection, do not surface the applied transform. Users who explicitly inspect .encoding can recover the metadata; the gap is that the standard semantic path does not surface it.
