# Scripts Utilities

## ScriptArguments[[trl.ScriptArguments]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class trl.ScriptArguments</name><anchor>trl.ScriptArguments</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L156</source><parameters>[{"name": "dataset_name", "val": ": typing.Optional[str] = None"}, {"name": "dataset_config", "val": ": typing.Optional[str] = None"}, {"name": "dataset_train_split", "val": ": str = 'train'"}, {"name": "dataset_test_split", "val": ": str = 'test'"}, {"name": "dataset_streaming", "val": ": bool = False"}, {"name": "gradient_checkpointing_use_reentrant", "val": ": bool = False"}, {"name": "ignore_bias_buffers", "val": ": bool = False"}]</parameters><paramsdesc>- **dataset_name** (`str`, *optional*) --
  Path or name of the dataset to load. If `datasets` is provided, this will be ignored.
- **dataset_config** (`str`, *optional*) --
  Dataset configuration name. Corresponds to the `name` argument of the [load_dataset](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) function.
  If `datasets` is provided, this will be ignored.
- **dataset_train_split** (`str`, *optional*, defaults to `"train"`) --
  Dataset split to use for training. If `datasets` is provided, this will be ignored.
- **dataset_test_split** (`str`, *optional*, defaults to `"test"`) --
  Dataset split to use for evaluation. If `datasets` is provided, this will be ignored.
- **dataset_streaming** (`bool`, *optional*, defaults to `False`) --
  Whether to stream the dataset. If `True`, the dataset will be loaded in streaming mode. If `datasets` is
  provided, this will be ignored.
- **gradient_checkpointing_use_reentrant** (`bool`, *optional*, defaults to `False`) --
  Whether to apply `use_reentrant` for gradient checkpointing.
- **ignore_bias_buffers** (`bool`, *optional*, defaults to `False`) --
  Debug argument for distributed training. Works around DDP issues with LM bias/mask buffers (invalid scalar
  type, in-place operation). See
  https://github.com/huggingface/transformers/issues/22482#issuecomment-1595790992.</paramsdesc><paramgroups>0</paramgroups></docstring>

Arguments common to all scripts. The `dataset_*` fields are ignored when a dataset mixture is provided through the `datasets` field of [DatasetMixtureConfig](/docs/trl/pr_4305/en/script_utils#trl.DatasetMixtureConfig).
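To make the relationship between the `dataset_*` fields and `load_dataset` concrete, here is a stdlib-only sketch. `ScriptArgumentsSketch` and `resolve_load_kwargs` are hypothetical names that are not part of TRL. The precedence rule (a non-empty `datasets` mixture wins) comes from the parameter descriptions above, not from TRL's script code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScriptArgumentsSketch:  # hypothetical mirror of the dataset-related fields only
    dataset_name: Optional[str] = None
    dataset_config: Optional[str] = None
    dataset_train_split: str = "train"
    dataset_test_split: str = "test"
    dataset_streaming: bool = False


def resolve_load_kwargs(args, mixture_datasets=None):
    """Return load_dataset-style kwargs, or None when a `datasets` mixture is provided."""
    if mixture_datasets:
        return None  # the dataset_* fields are ignored in favor of the mixture
    return {
        "path": args.dataset_name,
        "name": args.dataset_config,  # dataset_config corresponds to load_dataset's `name`
        "streaming": args.dataset_streaming,
    }


args = ScriptArgumentsSketch(dataset_name="trl-lib/tldr")
print(resolve_load_kwargs(args))
# {'path': 'trl-lib/tldr', 'name': None, 'streaming': False}
# The loaded DatasetDict would then be indexed with args.dataset_train_split / args.dataset_test_split.
```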
</div>

## TrlParser[[trl.TrlParser]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class trl.TrlParser</name><anchor>trl.TrlParser</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L248</source><parameters>[{"name": "dataclass_types", "val": ": typing.Union[transformers.hf_argparser.DataClassType, collections.abc.Iterable[transformers.hf_argparser.DataClassType], NoneType] = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **dataclass_types** (`Union[DataClassType, Iterable[DataClassType]]`, *optional*) --
  Dataclass types to use for argument parsing.
- ****kwargs** --
  Additional keyword arguments passed to the [transformers.HfArgumentParser](https://huggingface.co/docs/transformers/main/en/internal/trainer_utils#transformers.HfArgumentParser) constructor.</paramsdesc><paramgroups>0</paramgroups></docstring>

A subclass of [transformers.HfArgumentParser](https://huggingface.co/docs/transformers/main/en/internal/trainer_utils#transformers.HfArgumentParser) designed for parsing command-line arguments with dataclass-backed
configurations, while also supporting configuration file loading and environment variable management.

<ExampleCodeBlock anchor="trl.TrlParser.example">

Examples:

```yaml
# config.yaml
env:
  VAR1: value1
arg1: 23
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="trl.TrlParser.example-2">

```python
# main.py
import os
from dataclasses import dataclass

from trl import TrlParser


@dataclass
class MyArguments:
    arg1: int
    arg2: str = "alpha"


parser = TrlParser(dataclass_types=[MyArguments])
# Returns a tuple with one instance per dataclass type
training_args = parser.parse_args_and_config()
print(training_args, os.environ.get("VAR1"))
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="trl.TrlParser.example-3">

```bash
$ python main.py --config config.yaml
(MyArguments(arg1=23, arg2='alpha'),) value1
$ python main.py --arg1 5 --arg2 beta
(MyArguments(arg1=5, arg2='beta'),) None
```

</ExampleCodeBlock>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>parse_args_and_config</name><anchor>trl.TrlParser.parse_args_and_config</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L317</source><parameters>[{"name": "args", "val": ": typing.Optional[collections.abc.Iterable[str]] = None"}, {"name": "return_remaining_strings", "val": ": bool = False"}, {"name": "fail_with_unknown_args", "val": ": bool = True"}]</parameters></docstring>

Parse command-line arguments and the config file into instances of the specified dataclass types.

This method wraps [transformers.HfArgumentParser.parse_args_into_dataclasses](https://huggingface.co/docs/transformers/main/en/internal/trainer_utils#transformers.HfArgumentParser.parse_args_into_dataclasses) and also parses the config file
specified with the `--config` flag. The config file (in YAML format) provides argument values that replace the
default values in the dataclasses. Command-line arguments take precedence over values set in the config file. The
method also sets any environment variables specified in the `env` field of the config file.
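The precedence described above (dataclass defaults, then config values, then command-line values) can be sketched with plain `argparse`. This is a simplified illustration, not TRL's implementation. `SketchArguments` and `parse_with_config` are hypothetical. The config is passed as an already-loaded dict instead of a YAML file given with `--config`, and every config key is assumed to match a field.

```python
import argparse
import os
from dataclasses import dataclass, fields


@dataclass
class SketchArguments:  # hypothetical dataclass; its defaults play the role of dataclass defaults
    arg1: int = 0
    arg2: str = "alpha"


def parse_with_config(argv, config):
    """Apply precedence: dataclass defaults < config values < command-line values."""
    parser = argparse.ArgumentParser()
    for f in fields(SketchArguments):
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    config = dict(config)  # copy so the caller's dict is not modified
    # Export the `env` section to the process environment.
    for key, value in config.pop("env", {}).items():
        os.environ[key] = str(value)
    # Config values replace the dataclass defaults...
    parser.set_defaults(**config)
    # ...and values given on the command line still win.
    return SketchArguments(**vars(parser.parse_args(argv)))


print(parse_with_config(["--arg2", "beta"], {"env": {"VAR1": "value1"}, "arg1": 23}))
# SketchArguments(arg1=23, arg2='beta')
```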
</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>parse_args_into_dataclasses</name><anchor>trl.TrlParser.parse_args_into_dataclasses</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/transformers/hf_argparser.py#L272</source><parameters>[{"name": "args", "val": " = None"}, {"name": "return_remaining_strings", "val": " = False"}, {"name": "look_for_args_file", "val": " = True"}, {"name": "args_filename", "val": " = None"}, {"name": "args_file_flag", "val": " = None"}]</parameters><paramsdesc>- **args** --
  List of strings to parse. Defaults to `sys.argv` (same as `argparse.ArgumentParser`).
- **return_remaining_strings** --
  If `True`, also return a list of remaining argument strings.
- **look_for_args_file** --
  If `True`, look for a ".args" file with the same base name as the entry point script for this
  process, and append its content, if any, to the command-line args.
- **args_filename** --
  If not `None`, use this file instead of the ".args" file specified in the previous argument.
- **args_file_flag** --
  If not `None`, look for a file in the command-line args specified with this flag. The flag can be
  specified multiple times and precedence is determined by the order (last one wins).</paramsdesc><paramgroups>0</paramgroups><rettype>Tuple consisting of</rettype><retdesc>- The dataclass instances, in the same order as they were passed to the initializer.
- If applicable, an additional namespace for more (non-dataclass backed) arguments added to the parser
  after initialization.
- If `return_remaining_strings` is `True`, the list of remaining argument strings (same as `argparse.ArgumentParser.parse_known_args`).</retdesc></docstring>

Parse command-line args into instances of the specified dataclass types.

This relies on argparse's `ArgumentParser.parse_known_args`. See the
[argparse documentation](https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.parse_args).
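Here is a rough sketch of the idea behind `parse_args_into_dataclasses`. It registers one `--flag` per dataclass field, calls `parse_known_args`, and splits the resulting namespace back into one instance per dataclass. `ModelArgs`, `DataArgs`, and `parse_into_dataclasses` are hypothetical. Unlike `HfArgumentParser`, the sketch only handles plain scalar fields (no booleans, `Optional`, lists, or `.args` files).

```python
import argparse
from dataclasses import MISSING, dataclass, fields


@dataclass
class ModelArgs:  # hypothetical dataclass
    model_name: str
    lr: float = 1e-5


@dataclass
class DataArgs:  # hypothetical dataclass
    dataset_name: str = "trl-lib/tldr"


def parse_into_dataclasses(dataclass_types, args):
    """Build one parser from several dataclasses, then split the parsed values back out."""
    parser = argparse.ArgumentParser()
    for dtype in dataclass_types:
        for f in fields(dtype):
            required = f.default is MISSING  # no default means the flag must be given
            parser.add_argument(
                f"--{f.name}", type=f.type, required=required, default=None if required else f.default
            )
    # parse_known_args returns unrecognized strings instead of raising an error.
    namespace, remaining = parser.parse_known_args(args)
    values = vars(namespace)
    instances = [dtype(**{f.name: values[f.name] for f in fields(dtype)}) for dtype in dataclass_types]
    return (*instances, remaining)


model_args, data_args, remaining = parse_into_dataclasses(
    [ModelArgs, DataArgs], ["--model_name", "gpt2", "--lr", "3e-4", "--unknown_flag"]
)
print(model_args, data_args, remaining)
# ModelArgs(model_name='gpt2', lr=0.0003) DataArgs(dataset_name='trl-lib/tldr') ['--unknown_flag']
```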
</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>set_defaults_with_config</name><anchor>trl.TrlParser.set_defaults_with_config</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L368</source><parameters>[{"name": "**kwargs", "val": ""}]</parameters></docstring>

Overrides the parser's default values with those provided via keyword arguments, including for subparsers.

Any argument with an updated default will also be marked as not required if it was previously required.

Returns a list of strings that were not consumed by the parser.
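The behavior described above can be approximated as shown below. This is a hypothetical sketch, and `set_defaults_from_config` is not a TRL function. It reads argparse's private `_actions` list, ignores subparsers, and assumes that keys with no matching argument are returned as `--key value` strings.

```python
import argparse


def set_defaults_from_config(parser, **config):
    """Use config values as parser defaults; return unmatched keys as CLI-style strings."""
    for action in parser._actions:  # private argparse attribute, used here for illustration
        if action.dest in config:
            action.default = config.pop(action.dest)
            # A value from the config satisfies an otherwise required argument.
            action.required = False
    # Keys that match no argument are handed back as "--key value" pairs.
    return [item for key, value in config.items() for item in (f"--{key}", str(value))]


parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)
parser.add_argument("--lr", type=float, default=1e-5)
remaining = set_defaults_from_config(parser, model_name="gpt2", lr=3e-4, seed=7)
args = parser.parse_args([])  # no longer fails: model_name now has a default
print(args.model_name, args.lr, remaining)  # gpt2 0.0003 ['--seed', '7']
```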
</div></div>

## get_dataset[[trl.get_dataset]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>trl.get_dataset</name><anchor>trl.get_dataset</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L421</source><parameters>[{"name": "mixture_config", "val": ": DatasetMixtureConfig"}]</parameters><paramsdesc>- **mixture_config** ([DatasetMixtureConfig](/docs/trl/pr_4305/en/script_utils#trl.DatasetMixtureConfig)) --
  Mixture configuration describing the datasets to load.</paramsdesc><paramgroups>0</paramgroups><rettype>[DatasetDict](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.DatasetDict)</rettype><retdesc>Combined dataset(s) from the mixture configuration, with an optional train/test split if `test_split_size` is
set.</retdesc></docstring>

Load a mixture of datasets based on the configuration.

<ExampleCodeBlock anchor="trl.get_dataset.example">

Example:

```python
from trl import DatasetMixtureConfig, get_dataset
from trl.scripts.utils import DatasetConfig

mixture_config = DatasetMixtureConfig(datasets=[DatasetConfig(path="trl-lib/tldr")])
dataset = get_dataset(mixture_config)
print(dataset)
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="trl.get_dataset.example-2">

```
DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 116722
    })
})
```

</ExampleCodeBlock>
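The mixture logic (load each dataset, keep the selected columns, concatenate, and optionally split) can be illustrated without the `datasets` library. In this hypothetical sketch, lists of row dicts stand in for loaded datasets. `mix_datasets` shuffles with a fixed seed and rounds the test size up. These choices are assumptions and need not match `Dataset.train_test_split` exactly.

```python
import math
import random


def mix_datasets(sources, test_split_size=None, seed=0):
    """Concatenate several datasets, keep the requested columns, and optionally split.

    `sources` is a list of (rows, columns) pairs: `rows` is a list of dicts standing in for a
    loaded dataset, and `columns` lists the column names to keep (None keeps all of them).
    """
    combined = []
    for rows, columns in sources:
        for row in rows:
            combined.append({k: v for k, v in row.items() if columns is None or k in columns})
    if test_split_size is None:
        return {"train": combined}
    # Shuffle a copy, then carve off the test fraction (rounded up in this sketch).
    shuffled = list(combined)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_split_size * len(shuffled))
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}


first = [{"prompt": f"p{i}", "completion": f"c{i}", "id": i} for i in range(6)]
second = [{"prompt": "q", "completion": "a"}, {"prompt": "r", "completion": "b"}]
splits = mix_datasets([(first, ["prompt", "completion"]), (second, None)], test_split_size=0.25)
print({name: len(rows) for name, rows in splits.items()})
# {'train': 6, 'test': 2}
```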
</div>

## DatasetConfig[[trl.scripts.utils.DatasetConfig]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class trl.scripts.utils.DatasetConfig</name><anchor>trl.scripts.utils.DatasetConfig</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L58</source><parameters>[{"name": "path", "val": ": str"}, {"name": "name", "val": ": typing.Optional[str] = None"}, {"name": "data_dir", "val": ": typing.Optional[str] = None"}, {"name": "data_files", "val": ": typing.Union[str, list[str], dict[str, str], NoneType] = None"}, {"name": "split", "val": ": str = 'train'"}, {"name": "columns", "val": ": typing.Optional[list[str]] = None"}]</parameters><paramsdesc>- **path** (`str`) --
  Path or name of the dataset.
- **name** (`str`, *optional*) --
  Name of the dataset configuration.
- **data_dir** (`str`, *optional*) --
  The `data_dir` of the dataset configuration. If specified for the generic builders (csv, text, etc.)
  or the Hub datasets and `data_files` is `None`, the behavior is equivalent to passing
  `os.path.join(data_dir, "**")` as `data_files` to reference all the files in the directory.
- **data_files** (`str` or `Sequence` or `Mapping`, *optional*) --
  Path(s) to source data file(s).
- **split** (`str`, *optional*, defaults to `"train"`) --
  Which split of the data to load.
- **columns** (`list[str]`, *optional*) --
  List of column names to select from the dataset. If `None`, all columns are selected.</paramsdesc><paramgroups>0</paramgroups></docstring>

Configuration for a dataset.

The fields `path`, `name`, `data_dir`, `data_files`, and `split` match the arguments of [load_dataset](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) and are used
directly in the [load_dataset](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) call, while `columns` selects a subset of columns from the loaded dataset. Refer to the
[load_dataset](https://huggingface.co/docs/datasets/main/en/package_reference/loading_methods#datasets.load_dataset) documentation for more details.
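As a sketch of how these fields relate to `load_dataset`, the code below separates the arguments that would be forwarded to it from `columns`, which this sketch assumes is applied after loading. It also spells out the documented `data_dir` equivalence. `DatasetConfigSketch`, `to_load_dataset_kwargs`, and `effective_data_files` are hypothetical helpers, and the actual loading call appears only in a comment.

```python
import os
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass
class DatasetConfigSketch:  # hypothetical stand-in with the documented fields
    path: str
    name: Optional[str] = None
    data_dir: Optional[str] = None
    data_files: Union[str, list, dict, None] = None
    split: str = "train"
    columns: Optional[list] = None


def to_load_dataset_kwargs(cfg):
    """Separate the load_dataset arguments from the column selection applied afterwards."""
    kwargs = asdict(cfg)
    columns = kwargs.pop("columns")
    return kwargs, columns


def effective_data_files(cfg):
    """Documented equivalence: data_dir with data_files=None acts like data_files=join(data_dir, "**")."""
    if cfg.data_dir is not None and cfg.data_files is None:
        return os.path.join(cfg.data_dir, "**")
    return cfg.data_files


cfg = DatasetConfigSketch(path="csv", data_dir="data", columns=["prompt", "completion"])
kwargs, columns = to_load_dataset_kwargs(cfg)
print(kwargs)
# {'path': 'csv', 'name': None, 'data_dir': 'data', 'data_files': None, 'split': 'train'}
# With `datasets` installed, loading would then be roughly:
#   datasets.load_dataset(**kwargs).select_columns(columns)
```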
</div>

## DatasetMixtureConfig[[trl.DatasetMixtureConfig]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class trl.DatasetMixtureConfig</name><anchor>trl.DatasetMixtureConfig</anchor><source>https://github.com/huggingface/trl/blob/vr_4305/trl/scripts/utils.py#L92</source><parameters>[{"name": "datasets", "val": ": list = <factory>"}, {"name": "streaming", "val": ": bool = False"}, {"name": "test_split_size", "val": ": typing.Optional[float] = None"}]</parameters><paramsdesc>- **datasets** (`list[DatasetConfig]`) --
  List of dataset configurations to include in the mixture.
- **streaming** (`bool`, *optional*, defaults to `False`) --
  Whether to stream the datasets. If `True`, the datasets will be loaded in streaming mode.
- **test_split_size** (`float`, *optional*) --
  Size of the test split. Refer to the `test_size` parameter of `datasets.Dataset.train_test_split`
  for more details. If `None`, the dataset will not be split into train and test sets.</paramsdesc><paramgroups>0</paramgroups></docstring>

Configuration class for a mixture of datasets.

Using [HfArgumentParser](https://huggingface.co/docs/transformers/main/en/internal/trainer_utils#transformers.HfArgumentParser), this class can be turned into
[argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
command line.

Usage:

<ExampleCodeBlock anchor="trl.DatasetMixtureConfig.example">

When using the CLI, you can add the following section to your YAML config file:

```yaml
datasets:
  - path: ...
    name: ...
    data_dir: ...
    data_files: ...
    split: ...
    columns: ...
  - path: ...
    name: ...
    data_dir: ...
    data_files: ...
    split: ...
    columns: ...
streaming: ...
test_split_size: ...
```

</ExampleCodeBlock>
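A YAML section like the one above is parsed into plain dicts and lists, which then have to become nested dataclasses. One way to do that is a `__post_init__` hook. The sketch below uses the hypothetical stand-ins `DatasetEntry` and `MixtureSketch` to illustrate the mapping and does not claim to reproduce how TRL converts the section. The file name `extra.json` is a placeholder and is never read.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class DatasetEntry:  # hypothetical stand-in for DatasetConfig
    path: str
    name: Optional[str] = None
    data_dir: Optional[str] = None
    data_files: Union[str, list, dict, None] = None
    split: str = "train"
    columns: Optional[list] = None


@dataclass
class MixtureSketch:  # hypothetical stand-in for DatasetMixtureConfig
    datasets: list = field(default_factory=list)
    streaming: bool = False
    test_split_size: Optional[float] = None

    def __post_init__(self):
        # A YAML loader yields plain dicts; turn each one into a typed entry.
        self.datasets = [DatasetEntry(**d) if isinstance(d, dict) else d for d in self.datasets]


# Roughly what a YAML loader would return for a filled-in version of the section above.
section = {
    "datasets": [
        {"path": "trl-lib/tldr", "split": "train", "columns": ["prompt", "completion"]},
        {"path": "json", "data_files": "extra.json"},
    ],
    "test_split_size": 0.05,
}
mixture = MixtureSketch(**section)
print(mixture.datasets[1].path, mixture.datasets[1].split, mixture.test_split_size)
# json train 0.05
```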
</div>