Strict Dataclasses

The huggingface_hub package provides a utility to create strict dataclasses. These are enhanced versions of Python's standard dataclass with additional validation features. Strict dataclasses ensure that fields are validated both during initialization and assignment, making them ideal for scenarios where data integrity is critical.

Overview

Strict dataclasses are created using the @strict decorator. They extend the functionality of regular dataclasses by:

  • Validating field types based on type hints
  • Supporting custom validators for additional checks
  • Optionally allowing arbitrary keyword arguments in the constructor
  • Validating fields both at initialization and during assignment
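One way to picture validation at both initialization and assignment is a decorator that wraps `__setattr__`. The sketch below is a simplified, standard-library-only illustration, not the `huggingface_hub` implementation. The name `simple_strict` is hypothetical, and it only handles plain classes such as `int` or `str` as type hints.

```python
from dataclasses import dataclass, fields


def simple_strict(cls):
    """Hypothetical, simplified stand-in for `@strict` (not the library's code)."""
    # Map each field name to its annotated type. Only plain classes such as
    # `int` or `str` are checked; generics like `list[int]` are ignored.
    expected = {f.name: f.type for f in fields(cls)}
    original_setattr = cls.__setattr__

    def checked_setattr(self, name, value):
        hint = expected.get(name)
        if isinstance(hint, type) and not isinstance(value, hint):
            raise TypeError(
                f"Field {name!r} expected {hint.__name__}, got {type(value).__name__}"
            )
        original_setattr(self, name, value)

    cls.__setattr__ = checked_setattr
    return cls


@simple_strict
@dataclass
class Point:
    x: int
    y: int = 0


point = Point(x=1)  # the generated __init__ assigns through __setattr__
point.y = 5         # valid assignment

try:
    point.x = "oops"  # rejected at assignment time
except TypeError as err:
    assignment_error = str(err)

try:
    Point(x="1")  # rejected at initialization time
except TypeError as err:
    init_error = str(err)
```

Because the `__init__` generated by `@dataclass` sets each field with a normal attribute assignment, a single hook covers both cases in this sketch.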

Benefits

  • Data Integrity: Ensures fields always contain valid data
  • Ease of Use: Integrates seamlessly with Python's dataclass module
  • Flexibility: Supports custom validators for complex validation logic
  • Lightweight: Requires no additional dependencies such as Pydantic, attrs, or similar libraries

Usage

Basic Example

from dataclasses import dataclass
from huggingface_hub.dataclasses import strict, as_validated_field

# Custom validator to ensure a value is positive
@as_validated_field
def positive_int(value: int):
    if not value > 0:
        raise ValueError(f"Value must be positive, got {value}")

@strict
@dataclass
class Config:
    model_type: str
    hidden_size: int = positive_int(default=16)
    vocab_size: int = 32  # Default value

    # Methods named `validate_xxx` are treated as class-wise validators
    def validate_big_enough_vocab(self):
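        if self.vocab_size < self.hidden_size:
            raise ValueError(f"vocab_size ({self.vocab_size}) must not be smaller than hidden_size ({self.hidden_size})")

> [!WARNING]
> Method `.validate()` is a reserved name on strict dataclasses.
> To prevent unexpected behavior, a `StrictDataclassDefinitionError` is raised if your class already defines one.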
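
The `validate_xxx` convention and the reserved `.validate()` name can be illustrated with plain Python. This is a hypothetical sketch, not the library's code: it assumes class validators run right after `__init__`, and it raises a plain `TypeError` in place of `StrictDataclassDefinitionError`. The names `with_class_validators`, `Range`, and `Broken` are made up for the example.

```python
from dataclasses import dataclass


def with_class_validators(cls):
    """Hypothetical sketch: run every `validate_xxx` method after `__init__`."""
    if "validate" in vars(cls):
        # Mirrors the reserved-name rule, using a plain TypeError here.
        raise TypeError(f"{cls.__name__} must not define its own `validate` method")

    validator_names = [
        name for name in dir(cls)
        if name.startswith("validate_") and callable(getattr(cls, name))
    ]

    def validate(self):
        for name in validator_names:
            getattr(self, name)()

    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        self.validate()  # class-wise checks see the fully initialized object

    cls.validate = validate
    cls.__init__ = __init__
    return cls


@with_class_validators
@dataclass
class Range:
    low: int
    high: int

    def validate_ordering(self):
        if self.low > self.high:
            raise ValueError(f"low ({self.low}) must not exceed high ({self.high})")


valid_range = Range(low=1, high=10)

try:
    Range(low=5, high=2)
except ValueError as err:
    ordering_error = str(err)

try:
    @with_class_validators
    @dataclass
    class Broken:
        value: int

        def validate(self):  # reserved name, rejected at definition time
            pass
except TypeError as err:
    definition_error = str(err)
```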

## API Reference

### `@strict`[[huggingface_hub.dataclasses.strict]]

The `@strict` decorator enhances a dataclass with strict validation.

#### huggingface_hub.dataclasses.strict[[huggingface_hub.dataclasses.strict]]

[Source](https://github.com/huggingface/huggingface_hub/blob/vr_4113/src/huggingface_hub/dataclasses.py#L56)

Decorator to add strict validation to a dataclass.

This decorator must be used on top of `@dataclass` to ensure IDEs and static typing tools
recognize the class as a dataclass.

Can be used with or without arguments:
- `@strict`
- `@strict(accept_kwargs=True)`

Example:
```py
>>> from dataclasses import dataclass
>>> from huggingface_hub.dataclasses import as_validated_field, strict, validated_field

>>> @as_validated_field
... def positive_int(value: int):
...     if not value > 0:
...         raise ValueError(f"Value must be positive, got {value}")

>>> @strict(accept_kwargs=True)
... @dataclass
... class User:
...     name: str
...     age: int = positive_int(default=10)

# Initialize
>>> User(name="John")
User(name='John', age=10)

# Extra kwargs are accepted
>>> User(name="John", age=30, lastname="Doe")
User(name='John', age=30, *lastname='Doe')

# Invalid type => raises
>>> User(name="John", age="30")
huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'age':
    TypeError: Field 'age' expected int, got str (value: '30')

# Invalid value => raises
>>> User(name="John", age=-1)
huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'age':
    ValueError: Value must be positive, got -1
```

Parameters:

cls : The class to convert to a strict dataclass.

accept_kwargs (bool, optional) : If True, allows arbitrary keyword arguments in __init__. Defaults to False.

Returns:

The enhanced dataclass with strict validation on field assignment.
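
To make the `accept_kwargs` behavior more concrete, here is a rough standard-library sketch of a constructor that tolerates unknown keyword arguments. It is only an illustration under assumptions: `accept_extra_kwargs` is a hypothetical helper, it performs no validation, and it does not reproduce how `@strict` stores or displays the extra values.

```python
from dataclasses import dataclass, fields


def accept_extra_kwargs(cls):
    """Hypothetical sketch of a constructor that tolerates unknown kwargs."""
    known = {f.name for f in fields(cls)}
    original_init = cls.__init__

    def __init__(self, **kwargs):
        # Declared fields go through the generated dataclass __init__ ...
        declared = {k: v for k, v in kwargs.items() if k in known}
        original_init(self, **declared)
        # ... and anything else is stored as a plain attribute.
        for name, value in kwargs.items():
            if name not in known:
                setattr(self, name, value)

    cls.__init__ = __init__
    return cls


@accept_extra_kwargs
@dataclass
class User:
    name: str
    age: int = 10


user = User(name="John", age=30, lastname="Doe")
```

In this sketch `user.lastname` is available as a regular attribute, while `name` and `age` remain the only declared dataclass fields.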

validate_typed_dict[[huggingface_hub.dataclasses.validate_typed_dict]]

Method to validate that a dictionary conforms to the types defined in a TypedDict class.

This is the equivalent of dataclass validation, but for TypedDicts. Since typed dicts are never instantiated (they are only used by static type checkers), the validation step must be called manually.

huggingface_hub.dataclasses.validate_typed_dict[[huggingface_hub.dataclasses.validate_typed_dict]]

Validate that a dictionary conforms to the types defined in a TypedDict class.

Under the hood, the typed dict is converted to a strict dataclass and validated using the @strict decorator.

Example:

>>> from typing import Annotated, TypedDict
>>> from huggingface_hub.dataclasses import validate_typed_dict

>>> def positive_int(value: int):
...     if not value > 0:
...         raise ValueError(f"Value must be positive, got {value}")

>>> class User(TypedDict):
...     name: str
...     age: Annotated[int, positive_int]

>>> # Valid data
>>> validate_typed_dict(User, {"name": "John", "age": 30})

>>> # Invalid type for age
>>> validate_typed_dict(User, {"name": "John", "age": "30"})
huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'age':
    TypeError: Field 'age' expected int, got str (value: '30')

>>> # Invalid value for age
>>> validate_typed_dict(User, {"name": "John", "age": -1})
huggingface_hub.errors.StrictDataclassFieldValidationError: Validation error for field 'age':
    ValueError: Value must be positive, got -1

Parameters:

schema (type[TypedDictType]) : The TypedDict class defining the expected structure and types.

data (dict) : The dictionary to validate.
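
The role of `Annotated` metadata in such a schema can be sketched with the standard library alone. The helper below, `check_typed_dict`, is hypothetical and much simpler than `validate_typed_dict`: it skips the intermediate strict dataclass, only type-checks plain classes, and treats every callable in the `Annotated` metadata as a validator.

```python
from typing import Annotated, TypedDict, get_args, get_origin, get_type_hints


def positive_int(value: int):
    if not value > 0:
        raise ValueError(f"Value must be positive, got {value}")


class User(TypedDict):
    name: str
    age: Annotated[int, positive_int]


def check_typed_dict(schema, data):
    """Hypothetical helper: type-check `data` against `schema`, then run validators."""
    hints = get_type_hints(schema, include_extras=True)
    for key, hint in hints.items():
        validators = []
        if get_origin(hint) is Annotated:
            # Annotated[int, positive_int] -> base type `int`, metadata [positive_int]
            hint, *validators = get_args(hint)
        if key not in data:
            raise TypeError(f"Missing key {key!r}")
        value = data[key]
        if isinstance(hint, type) and not isinstance(value, hint):
            raise TypeError(
                f"Field {key!r} expected {hint.__name__}, got {type(value).__name__}"
            )
        for validator in validators:
            if callable(validator):
                validator(value)


check_typed_dict(User, {"name": "John", "age": 30})  # passes silently

try:
    check_typed_dict(User, {"name": "John", "age": "30"})
except TypeError as err:
    type_error = str(err)

try:
    check_typed_dict(User, {"name": "John", "age": -1})
except ValueError as err:
    value_error = str(err)
```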

as_validated_field[[huggingface_hub.dataclasses.as_validated_field]]

Decorator to create a validated_field. Recommended for fields with a single validator to avoid boilerplate code.

huggingface_hub.dataclasses.as_validated_field[[huggingface_hub.dataclasses.as_validated_field]]

Decorates a validator function as a validated_field (i.e. a dataclass field with a custom validator).

Parameters:

validator (Callable) : A method that takes a value as input and raises ValueError/TypeError if the value is invalid.
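
The decorator pattern described here can be approximated with `dataclasses.field()`. The sketch below is an assumption-laden illustration rather than the library's code: `sketch_as_validated_field` is a hypothetical name, and the `"validator"` metadata key is chosen for the example.

```python
from dataclasses import dataclass, field, fields


def sketch_as_validated_field(validator):
    """Hypothetical stand-in: turn a validator function into a field factory."""

    def field_factory(**kwargs):
        # kwargs such as `default=` are forwarded to dataclasses.field().
        return field(metadata={"validator": validator}, **kwargs)

    return field_factory


@sketch_as_validated_field
def positive_int(value: int):
    if not value > 0:
        raise ValueError(f"Value must be positive, got {value}")


@dataclass
class Config:
    hidden_size: int = positive_int(default=16)


hidden_size_field = fields(Config)[0]
attached = hidden_size_field.metadata["validator"]  # the original function
```

After decoration, `positive_int` no longer validates directly. Calling it returns a dataclass field that carries the validator, which is why it appears on the right-hand side of a field declaration.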

validated_field[[huggingface_hub.dataclasses.validated_field]]

Creates a dataclass field with custom validation.

huggingface_hub.dataclasses.validated_field[[huggingface_hub.dataclasses.validated_field]]

Create a dataclass field with a custom validator.

Useful to apply several checks to a field. If only applying one rule, check out the as_validated_field decorator.

Parameters:

validator (Callable or list[Callable]) : A method that takes a value as input and raises ValueError/TypeError if the value is invalid. Can be a list of validators to apply multiple checks.

**kwargs : Additional arguments to pass to dataclasses.field().

Returns:

A field with the validator attached in metadata
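
As a rough illustration of several validators attached to one field through metadata, the following standard-library sketch stores a list of checks and applies them in order. The `"validator"` metadata key and the `run_field_validators` helper are assumptions made for the example, not the library's internals.

```python
from dataclasses import dataclass, field, fields


def positive_int(value: int):
    if not value > 0:
        raise ValueError(f"Value must be positive, got {value}")


def multiple_of_64(value: int):
    if value % 64 != 0:
        raise ValueError(f"Value must be a multiple of 64, got {value}")


@dataclass
class Config:
    # The "validator" metadata key is an assumption made for this sketch.
    hidden_size: int = field(
        default=128, metadata={"validator": [positive_int, multiple_of_64]}
    )


def run_field_validators(instance):
    """Hypothetical helper: apply each field's validators in order."""
    for f in fields(instance):
        validators = f.metadata.get("validator", [])
        if callable(validators):
            validators = [validators]  # accept a single validator too
        for validator in validators:
            validator(getattr(instance, f.name))


run_field_validators(Config())  # 128 passes both checks

try:
    run_field_validators(Config(hidden_size=100))
except ValueError as err:
    failure = str(err)
```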

Errors[[huggingface_hub.errors.StrictDataclassError]]

huggingface_hub.errors.StrictDataclassError[[huggingface_hub.errors.StrictDataclassError]]

Base exception for strict dataclasses.

huggingface_hub.errors.StrictDataclassDefinitionError[[huggingface_hub.errors.StrictDataclassDefinitionError]]

Exception thrown when a strict dataclass is defined incorrectly.

huggingface_hub.errors.StrictDataclassFieldValidationError[[huggingface_hub.errors.StrictDataclassFieldValidationError]]

Exception thrown when a strict dataclass fails validation for a given field.

Why Not Use pydantic? (or attrs? or marshmallow_dataclass?)

  • See discussion in https://github.com/huggingface/transformers/issues/36329 regarding adding Pydantic as a dependency. It would be a heavy addition and require careful logic to support both v1 and v2.
  • We don't need most of Pydantic's features, especially those related to automatic casting, jsonschema, serialization, aliases, etc.
  • We don't need the ability to instantiate a class from a dictionary.
  • We don't want to mutate data. In @strict, "validation" means "checking if a value is valid." In Pydantic, "validation" means "casting a value, possibly mutating it, and then checking if it's valid."
  • We don't need blazing-fast validation. @strict isn't designed for heavy loads where performance is critical. Common use cases involve validating a model configuration (performed once and negligible compared to running a model). This allows us to keep the code minimal.
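
The distinction between checking and casting drawn in the list above can be shown with two small functions. This is a deliberately simplified illustration with hypothetical helper names. It does not reproduce the behavior of Pydantic or of `@strict`.

```python
def check_only(value, expected_type):
    """Accept the value untouched or raise; never convert it."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(value).__name__}"
        )
    return value


def cast_then_check(value, expected_type):
    """Convert first, so the caller may get back a different object."""
    return expected_type(value)


casted = cast_then_check("30", int)  # the string becomes the integer 30

try:
    check_only("30", int)  # the string is rejected, nothing is converted
except TypeError as err:
    rejected = str(err)
```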
