BryanW committed on Mar 23

Commit

1d230da

verified ·

1 Parent(s): 2ce59c4

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.

Files changed (50)

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/data_loader.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/inference.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/local_sgd.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/logging.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/parallelism_config.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/scheduler.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/tracking.cpython-312.pyc +0 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/__init__.py +13 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py +54 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/env.py +131 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/estimate.py +316 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/launch.py +1291 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/merge.py +69 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/test.py +65 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/to_fsdp2.py +172 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/utils.py +123 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/__init__.py +66 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/examples.py +148 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/testing.py +881 -0
Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/training.py +148 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/__init__.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/abstract_nodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/algorithms.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/approximations.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/ast.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cfunctions.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cnodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cutils.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cxxnodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/fnodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/futils.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/matrix_nodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/numpy_nodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/pynodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/pyutils.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/rewriting.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/scipy_nodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__init__.py +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/__init__.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_abstract_nodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_algorithms.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_applications.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_approximations.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_ast.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_cfunctions.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_cnodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_cxxnodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_fnodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_matrix_nodes.cpython-312.pyc +0 -0
URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_numpy_nodes.cpython-312.pyc +0 -0

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/data_loader.cpython-312.pyc ADDED

Binary file (65.1 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/inference.cpython-312.pyc ADDED

Binary file (7.56 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/local_sgd.cpython-312.pyc ADDED

Binary file (5.14 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/logging.cpython-312.pyc ADDED

Binary file (5.76 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/parallelism_config.cpython-312.pyc ADDED

Binary file (19.7 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/scheduler.cpython-312.pyc ADDED

Binary file (4.69 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/__pycache__/tracking.cpython-312.pyc ADDED

Binary file (64.1 kB).

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/__init__.py ADDED

	@@ -0,0 +1,13 @@

+# Copyright 2020 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py ADDED

	@@ -0,0 +1,54 @@

+#!/usr/bin/env python
+# Copyright 2021 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from accelerate.commands.config import get_config_parser
+from accelerate.commands.env import env_command_parser
+from accelerate.commands.estimate import estimate_command_parser
+from accelerate.commands.launch import launch_command_parser
+from accelerate.commands.merge import merge_command_parser
+from accelerate.commands.test import test_command_parser
+from accelerate.commands.to_fsdp2 import to_fsdp2_command_parser
+from accelerate.commands.tpu import tpu_command_parser
+from accelerate.commands.utils import CustomArgumentParser
+def main():
+    parser = CustomArgumentParser("Accelerate CLI tool", usage="accelerate <command> [<args>]", allow_abbrev=False)
+    subparsers = parser.add_subparsers(help="accelerate command helpers")
+    # Register commands
+    get_config_parser(subparsers=subparsers)
+    estimate_command_parser(subparsers=subparsers)
+    env_command_parser(subparsers=subparsers)
+    launch_command_parser(subparsers=subparsers)
+    merge_command_parser(subparsers=subparsers)
+    tpu_command_parser(subparsers=subparsers)
+    test_command_parser(subparsers=subparsers)
+    to_fsdp2_command_parser(subparsers=subparsers)
+    # Let's go
+    args = parser.parse_args()
+    if not hasattr(args, "func"):
+        parser.print_help()
+        exit(1)
+    # Run
+    args.func(args)
+if __name__ == "__main__":
+    main()
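
Note on `accelerate_cli.py`: the file above registers each subcommand through a `*_command_parser(subparsers=...)` helper, and each helper attaches its handler with `parser.set_defaults(func=...)`. The dispatch pattern can be sketched in isolation as below. This is a minimal illustration using only the standard library; the `hello` subcommand, `hello_command`, `build_parser`, and `run` are hypothetical names and are not part of `accelerate`.

```python
import argparse


def hello_command(args):
    # Hypothetical handler; it returns a string so the result is easy to inspect.
    return f"hello {args.name}"


def hello_command_parser(subparsers):
    # Mirrors the registration style above: add a subparser, then bind its handler.
    parser = subparsers.add_parser("hello")
    parser.add_argument("--name", default="world")
    parser.set_defaults(func=hello_command)
    return parser


def build_parser():
    parser = argparse.ArgumentParser("demo", usage="demo <command> [<args>]", allow_abbrev=False)
    subparsers = parser.add_subparsers(help="demo command helpers")
    hello_command_parser(subparsers)
    return parser


def run(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # No subcommand given: `func` was never set, so show the help text instead.
    if not hasattr(args, "func"):
        parser.print_help()
        return None
    return args.func(args)
```

Calling `run(["hello", "--name", "x"])` dispatches to `hello_command`, while `run([])` falls through to the help branch, which is the same check the `main()` above performs before calling `args.func(args)`.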

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/env.py ADDED

	@@ -0,0 +1,131 @@

+#!/usr/bin/env python
+# Copyright 2022 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+import platform
+import subprocess
+import numpy as np
+import psutil
+import torch
+from accelerate import __version__ as version
+from accelerate.commands.config import default_config_file, load_config_from_file
+from ..utils import is_mlu_available, is_musa_available, is_npu_available, is_sdaa_available, is_xpu_available
+def env_command_parser(subparsers=None):
+    if subparsers is not None:
+        parser = subparsers.add_parser("env")
+    else:
+        parser = argparse.ArgumentParser("Accelerate env command")
+    parser.add_argument(
+        "--config_file", default=None, help="The config file to use for the default values in the launching script."
+    )
+    if subparsers is not None:
+        parser.set_defaults(func=env_command)
+    return parser
+def env_command(args):
+    pt_version = torch.__version__
+    pt_cuda_available = torch.cuda.is_available()
+    pt_xpu_available = is_xpu_available()
+    pt_mlu_available = is_mlu_available()
+    pt_sdaa_available = is_sdaa_available()
+    pt_musa_available = is_musa_available()
+    pt_npu_available = is_npu_available()
+    accelerator = "N/A"
+    if pt_cuda_available:
+        accelerator = "CUDA"
+    elif pt_xpu_available:
+        accelerator = "XPU"
+    elif pt_mlu_available:
+        accelerator = "MLU"
+    elif pt_sdaa_available:
+        accelerator = "SDAA"
+    elif pt_musa_available:
+        accelerator = "MUSA"
+    elif pt_npu_available:
+        accelerator = "NPU"
+    accelerate_config = "Not found"
+    # Get the default from the config file.
+    if args.config_file is not None or os.path.isfile(default_config_file):
+        accelerate_config = load_config_from_file(args.config_file).to_dict()
+    # if we can run which, get it
+    command = None
+    bash_location = "Not found"
+    if os.name == "nt":
+        command = ["where", "accelerate"]
+    elif os.name == "posix":
+        command = ["which", "accelerate"]
+    if command is not None:
+        bash_location = subprocess.check_output(command, text=True, stderr=subprocess.STDOUT).strip()
+    info = {
+        "`Accelerate` version": version,
+        "Platform": platform.platform(),
+        "`accelerate` bash location": bash_location,
+        "Python version": platform.python_version(),
+        "Numpy version": np.__version__,
+        "PyTorch version": f"{pt_version}",
+        "PyTorch accelerator": accelerator,
+        "System RAM": f"{psutil.virtual_memory().total / 1024**3:.2f} GB",
+    }
+    if pt_cuda_available:
+        info["GPU type"] = torch.cuda.get_device_name()
+    elif pt_xpu_available:
+        info["XPU type"] = torch.xpu.get_device_name()
+    elif pt_mlu_available:
+        info["MLU type"] = torch.mlu.get_device_name()
+    elif pt_sdaa_available:
+        info["SDAA type"] = torch.sdaa.get_device_name()
+    elif pt_musa_available:
+        info["MUSA type"] = torch.musa.get_device_name()
+    elif pt_npu_available:
+        info["CANN version"] = torch.version.cann
+    print("\nCopy-and-paste the text below in your GitHub issue\n")
+    print("\n".join([f"- {prop}: {val}" for prop, val in info.items()]))
+    print("- `Accelerate` default config:" if args.config_file is None else "- `Accelerate` config passed:")
+    accelerate_config_str = (
+        "\n".join([f"\t- {prop}: {val}" for prop, val in accelerate_config.items()])
+        if isinstance(accelerate_config, dict)
+        else f"\t{accelerate_config}"
+    )
+    print(accelerate_config_str)
+    info["`Accelerate` configs"] = accelerate_config
+    return info
+def main() -> int:
+    parser = env_command_parser()
+    args = parser.parse_args()
+    env_command(args)
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
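
Note on `env.py`: `env_command` reports a single accelerator by walking a fixed `if`/`elif` chain (CUDA, XPU, MLU, SDAA, MUSA, NPU) and falling back to `"N/A"`. The same first-match-wins selection can be sketched as a loop over ordered checks. This is only an illustration of the priority logic; `pick_accelerator` and the lambda probes are hypothetical stand-ins for `torch.cuda.is_available()` and the `is_*_available()` helpers imported in the file.

```python
def pick_accelerator(checks):
    """Return the name of the first backend whose probe reports True.

    `checks` is an ordered list of (name, probe) pairs; order encodes priority.
    """
    for name, is_available in checks:
        if is_available():
            return name
    # Same fallback label that env.py uses when nothing is detected.
    return "N/A"


# Hypothetical probes standing in for the real availability helpers.
example_checks = [
    ("CUDA", lambda: False),
    ("XPU", lambda: True),
    ("NPU", lambda: True),
]
selected = pick_accelerator(example_checks)
```

With these probes `selected` is `"XPU"`: later backends are never reported once an earlier one matches, which is also why the file's second `if`/`elif` chain adds only one device-type entry to `info`.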

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/estimate.py ADDED

	@@ -0,0 +1,316 @@

+#!/usr/bin/env python
+# Copyright 2023 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Optional
+import torch
+from huggingface_hub import model_info
+from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError
+from accelerate import init_empty_weights
+from accelerate.commands.utils import CustomArgumentParser
+from accelerate.utils import (
+    calculate_maximum_sizes,
+    convert_bytes,
+    is_timm_available,
+    is_transformers_available,
+)
+if is_transformers_available():
+    import transformers
+    from transformers import AutoConfig, AutoModel
+if is_timm_available():
+    import timm
+def verify_on_hub(repo: str, token: Optional[str] = None):
+    "Verifies that the model is on the hub and returns the model info."
+    try:
+        return model_info(repo, token=token)
+    except (OSError, GatedRepoError):
+        return "gated"
+    except RepositoryNotFoundError:
+        return "repo"
+def check_has_model(error):
+    """
+    Checks what library spawned `error` when a model is not found
+    """
+    if is_timm_available() and isinstance(error, RuntimeError) and "Unknown model" in error.args[0]:
+        return "timm"
+    elif (
+        is_transformers_available()
+        and isinstance(error, OSError)
+        and "does not appear to have a file named" in error.args[0]
+    ):
+        return "transformers"
+    else:
+        return "unknown"
+def create_empty_model(
+    model_name: str, library_name: str, trust_remote_code: bool = False, access_token: Optional[str] = None
+):
+    """
+    Creates an empty model in full precision from its parent library on the `Hub` to calculate the overall memory
+    consumption.
+    Args:
+        model_name (`str`):
+            The model name on the Hub
+        library_name (`str`):
+            The library the model has an integration with, such as `transformers`. Will be used if `model_name` has no
+            metadata on the Hub to determine the library.
+        trust_remote_code (`bool`, `optional`, defaults to `False`):
+            Whether or not to allow for custom models defined on the Hub in their own modeling files. This option
+            should only be set to `True` for repositories you trust and in which you have read the code, as it will
+            execute code present on the Hub on your local machine.
+        access_token (`str`, `optional`, defaults to `None`):
+            The access token to use to access private or gated models on the Hub. (for use on the Gradio app)
+    Returns:
+        `torch.nn.Module`: The torch model that has been initialized on the `meta` device.
+    """
+    model_info = verify_on_hub(model_name, access_token)
+    # Simplified errors
+    if model_info == "gated":
+        raise GatedRepoError(
+            f"Repo for model `{model_name}` is gated. You must be authenticated to access it. Please run `huggingface-cli login`."
+        )
+    elif model_info == "repo":
+        raise RepositoryNotFoundError(
+            f"Repo for model `{model_name}` does not exist on the Hub. If you are trying to access a private repo,"
+            " make sure you are authenticated via `huggingface-cli login` and have access."
+        )
+    if library_name is None:
+        library_name = getattr(model_info, "library_name", False)
+        if not library_name:
+            raise ValueError(
+                f"Model `{model_name}` does not have any library metadata on the Hub, please manually pass in a `--library_name` to use (such as `transformers`)"
+            )
+    if library_name == "transformers":
+        if not is_transformers_available():
+            raise ImportError(
+                f"To check `{model_name}`, `transformers` must be installed. Please install it via `pip install transformers`"
+            )
+        print(f"Loading pretrained config for `{model_name}` from `transformers`...")
+        if model_info.config is None:
+            raise RuntimeError(f"Tried to load `{model_name}` with `transformers` but it does not have any metadata.")
+        auto_map = model_info.config.get("auto_map", False)
+        config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code, token=access_token)
+        with init_empty_weights():
+            # remote code could specify a specific `AutoModel` class in the `auto_map`
+            constructor = AutoModel
+            if isinstance(auto_map, dict):
+                value = None
+                for key in auto_map.keys():
+                    if key.startswith("AutoModelFor"):
+                        value = key
+                        break
+                if value is not None:
+                    constructor = getattr(transformers, value)
+            # we need to pass the dtype, otherwise it is going to use the torch_dtype that is saved in the config
+            model = constructor.from_config(config, torch_dtype=torch.float32, trust_remote_code=trust_remote_code)
+    elif library_name == "timm":
+        if not is_timm_available():
+            raise ImportError(
+                f"To check `{model_name}`, `timm` must be installed. Please install it via `pip install timm`"
+            )
+        print(f"Loading pretrained config for `{model_name}` from `timm`...")
+        with init_empty_weights():
+            model = timm.create_model(model_name, pretrained=False)
+    else:
+        raise ValueError(
+            f"Library `{library_name}` is not supported yet, please open an issue on GitHub for us to add support."
+        )
+    return model
+def create_ascii_table(headers: list, rows: list, title: str):
+    "Creates a pretty table from a list of rows, minimal version of `tabulate`."
+    sep_char, in_between = "│", "─"
+    column_widths = []
+    for i in range(len(headers)):
+        column_values = [row[i] for row in rows] + [headers[i]]
+        max_column_width = max(len(value) for value in column_values)
+        column_widths.append(max_column_width)
+    formats = [f"%{column_widths[i]}s" for i in range(len(rows[0]))]
+    pattern = f"{sep_char}{sep_char.join(formats)}{sep_char}"
+    diff = 0
+    def make_row(left_char, middle_char, right_char):
+        return f"{left_char}{middle_char.join([in_between * n for n in column_widths])}{in_between * diff}{right_char}"
+    separator = make_row("├", "┼", "┤")
+    if len(title) > sum(column_widths):
+        diff = abs(len(title) - len(separator))
+        column_widths[-1] += diff
+    # Update with diff
+    separator = make_row("├", "┼", "┤")
+    initial_rows = [
+        make_row("┌", in_between, "┐"),
+        f"{sep_char}{title.center(len(separator) - 2)}{sep_char}",
+        make_row("├", "┬", "┤"),
+    ]
+    table = "\n".join(initial_rows) + "\n"
+    column_widths[-1] += diff
+    centered_line = [text.center(column_widths[i]) for i, text in enumerate(headers)]
+    table += f"{pattern % tuple(centered_line)}\n{separator}\n"
+    for i, line in enumerate(rows):
+        centered_line = [t.center(column_widths[i]) for i, t in enumerate(line)]
+        table += f"{pattern % tuple(centered_line)}\n"
+    table += f"└{'┴'.join([in_between * n for n in column_widths])}┘"
+    return table
+def estimate_command_parser(subparsers=None):
+    if subparsers is not None:
+        parser = subparsers.add_parser("estimate-memory")
+    else:
+        parser = CustomArgumentParser(description="Model size estimator for fitting a model onto CUDA memory.")
+    parser.add_argument("model_name", type=str, help="The model name on the Hugging Face Hub.")
+    parser.add_argument(
+        "--library_name",
+        type=str,
+        help="The library the model has an integration with, such as `transformers`, needed only if this information is not stored on the Hub.",
+        choices=["timm", "transformers"],
+    )
+    parser.add_argument(
+        "--dtypes",
+        type=str,
+        nargs="+",
+        default=["float32", "float16", "int8", "int4"],
+        help="The dtypes to use for the model, must be one (or many) of `float32`, `float16`, `int8`, and `int4`",
+        choices=["float32", "float16", "int8", "int4"],
+    )
+    parser.add_argument(
+        "--trust_remote_code",
+        action="store_true",
+        help="""Whether or not to allow for custom models defined on the Hub in their own modeling files. This flag
+                should only be used for repositories you trust and in which you have read the code, as it will execute
+                code present on the Hub on your local machine.""",
+        default=False,
+    )
+    if subparsers is not None:
+        parser.set_defaults(func=estimate_command)
+    return parser
+def estimate_training_usage(bytes: int, mixed_precision: str, msamp_config: Optional[str] = None) -> dict:
+    """
+    Given an amount of `bytes` and `mixed_precision`, calculates how much training memory is needed for a batch size of
+    1.
+    Args:
+        bytes (`int`):
+            The size of the model being trained.
+        mixed_precision (`str`):
+            The mixed precision that would be run.
+        msamp_config (`str`):
+            The msamp config to estimate the training memory for if `mixed_precision` is set to `"fp8"`.
+    """
+    memory_sizes = {"model": -1, "optimizer": -1, "gradients": -1, "step": -1}
+    fp32_size = bytes
+    fp16_size = bytes // 2
+    if mixed_precision == "float32":
+        memory_sizes["model"] = fp32_size
+        memory_sizes["gradients"] = fp32_size
+        memory_sizes["optimizer"] = fp32_size * 2
+        memory_sizes["step"] = fp32_size * 4
+    elif mixed_precision in ("float16", "bfloat16") or (mixed_precision == "fp8" and msamp_config is None):
+        # With native `TransformerEngine`, there are no memory savings with FP8
+        # With mixed precision training, the model has weights stored
+        # in FP16 and FP32
+        memory_sizes["model"] = fp32_size
+        # 1.5 from weight gradient + computation (GEMM)
+        memory_sizes["gradients"] = fp32_size + fp16_size
+        # 2x from optimizer states
+        memory_sizes["optimizer"] = fp32_size * 2  # Optimizer states
+        memory_sizes["step"] = memory_sizes["optimizer"]
+    return memory_sizes
+def gather_data(args):
+    "Creates an empty model and gathers the data for the sizes"
+    try:
+        model = create_empty_model(
+            args.model_name, library_name=args.library_name, trust_remote_code=args.trust_remote_code
+        )
+    except (RuntimeError, OSError) as e:
+        library = check_has_model(e)
+        if library != "unknown":
+            raise RuntimeError(
+                f"Tried to load `{args.model_name}` with `{library}` but a possible model to load was not found inside the repo."
+            )
+        raise e
+    total_size, largest_layer = calculate_maximum_sizes(model)
+    data = []
+    for dtype in args.dtypes:
+        dtype_total_size = total_size
+        dtype_largest_layer = largest_layer[0]
+        dtype_training_size = estimate_training_usage(dtype_total_size, dtype)
+        if dtype == "float16":
+            dtype_total_size /= 2
+            dtype_largest_layer /= 2
+        elif dtype == "int8":
+            dtype_total_size /= 4
+            dtype_largest_layer /= 4
+        elif dtype == "int4":
+            dtype_total_size /= 8
+            dtype_largest_layer /= 8
+        data.append([dtype, dtype_largest_layer, dtype_total_size, dtype_training_size])
+    return data
+def estimate_command(args):
+    data = gather_data(args)
+    for row in data:
+        for i, item in enumerate(row):
+            if isinstance(item, (int, float)):
+                row[i] = convert_bytes(item)
+            elif isinstance(item, dict):
+                training_usage = max(item.values())
+                row[i] = convert_bytes(training_usage) if training_usage != -1 else "N/A"
+    headers = ["dtype", "Largest Layer", "Total Size", "Training using Adam"]
+    title = f"Memory Usage for loading `{args.model_name}`"
+    table = create_ascii_table(headers, data, title)
+    print(table)
+def main():
+    parser = estimate_command_parser()
+    args = parser.parse_args()
+    estimate_command(args)
+if __name__ == "__main__":
+    main()
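
Note on `estimate.py`: `gather_data` computes the training estimate from the float32 total before it divides the size by 2, 4, or 8 for `float16`, `int8`, and `int4`, and `estimate_command` then reports the largest entry of the dict returned by `estimate_training_usage`. The arithmetic can be sketched as below. This is a simplified restatement under stated assumptions, not the library's API: `training_peak_bytes`, `loaded_size_bytes`, and `DTYPE_DIVISOR` are hypothetical names, and the `fp8`/`msamp_config` branch is left out.

```python
GIB = 1024**3

# Divisors applied to the float32 size when only loading the model.
DTYPE_DIVISOR = {"float32": 1, "float16": 2, "int8": 4, "int4": 8}


def loaded_size_bytes(fp32_bytes, dtype):
    # Size of the weights alone in the requested dtype.
    return fp32_bytes / DTYPE_DIVISOR[dtype]


def training_peak_bytes(fp32_bytes, mixed_precision):
    # Mirrors the two branches above; returns None where the table prints "N/A".
    if mixed_precision == "float32":
        sizes = {
            "model": fp32_bytes,
            "gradients": fp32_bytes,
            "optimizer": fp32_bytes * 2,
            "step": fp32_bytes * 4,
        }
    elif mixed_precision in ("float16", "bfloat16"):
        fp16_bytes = fp32_bytes // 2
        sizes = {
            "model": fp32_bytes,
            "gradients": fp32_bytes + fp16_bytes,
            "optimizer": fp32_bytes * 2,
        }
        sizes["step"] = sizes["optimizer"]
    else:
        return None
    return max(sizes.values())
```

For a model whose float32 weights total 1 GiB, this gives a reported training peak of 4 GiB for `float32` and 2 GiB for `float16`, while `int8` and `int4` have no training estimate and only a loaded size (0.25 GiB and 0.125 GiB respectively).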

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/launch.py ADDED

	@@ -0,0 +1,1291 @@

+#!/usr/bin/env python
+# Copyright 2021 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import importlib
+import logging
+import os
+import subprocess
+import sys
+from pathlib import Path
+import psutil
+import torch
+from accelerate.commands.config import default_config_file, load_config_from_file
+from accelerate.commands.config.config_args import SageMakerConfig
+from accelerate.commands.config.config_utils import DYNAMO_BACKENDS
+from accelerate.commands.utils import CustomArgumentParser
+from accelerate.state import get_int_from_env
+from accelerate.utils import (
+    ComputeEnvironment,
+    DistributedType,
+    PrepareForLaunch,
+    _filter_args,
+    check_cuda_p2p_ib_support,
+    convert_dict_to_env_variables,
+    is_bf16_available,
+    is_deepspeed_available,
+    is_hpu_available,
+    is_mlu_available,
+    is_musa_available,
+    is_npu_available,
+    is_rich_available,
+    is_sagemaker_available,
+    is_sdaa_available,
+    is_torch_xla_available,
+    is_xpu_available,
+    patch_environment,
+    prepare_deepspeed_cmd_env,
+    prepare_multi_gpu_env,
+    prepare_sagemager_args_inputs,
+    prepare_simple_launcher_cmd_env,
+    prepare_tpu,
+    str_to_bool,
+)
+from accelerate.utils.constants import DEEPSPEED_MULTINODE_LAUNCHERS, TORCH_DYNAMO_MODES
+if is_rich_available():
+    from rich import get_console
+    from rich.logging import RichHandler
+    FORMAT = "%(message)s"
+    logging.basicConfig(format=FORMAT, datefmt="[%X]", handlers=[RichHandler()])
+logger = logging.getLogger(__name__)
+options_to_group = {
+    "multi_gpu": "Distributed GPUs",
+    "tpu": "TPU",
+    "use_deepspeed": "DeepSpeed Arguments",
+    "use_fsdp": "FSDP Arguments",
+    "use_megatron_lm": "Megatron-LM Arguments",
+    "fp8_backend": "FP8 Arguments",
+}
+def clean_option(option):
+    "Finds all cases of - after the first two characters and changes them to _"
+    if "fp8_backend" in option:
+        option = "--fp8_backend"
+    if option.startswith("--"):
+        return option[2:].replace("-", "_")
+class CustomHelpFormatter(argparse.HelpFormatter):
+    """
+    This is a custom help formatter that will hide all arguments that are not used in the command line when the help is
+    called. This is useful for the case where the user is using a specific platform and only wants to see the arguments
+    for that platform.
+    """
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.titles = [
+            "Hardware Selection Arguments",
+            "Resource Selection Arguments",
+            "Training Paradigm Arguments",
+            "positional arguments",
+            "optional arguments",
+        ]
+    def add_argument(self, action: argparse.Action):
+        if "accelerate" in sys.argv[0] and "launch" in sys.argv[1:]:
+            args = sys.argv[2:]
+        else:
+            args = sys.argv[1:]
+        if len(args) > 1:
+            args = list(map(clean_option, args))
+            used_platforms = [arg for arg in args if arg in options_to_group.keys()]
+            used_titles = [options_to_group[o] for o in used_platforms]
+            if action.container.title not in self.titles + used_titles:
+                action.help = argparse.SUPPRESS
+            elif action.container.title == "Hardware Selection Arguments":
+                if set(action.option_strings).isdisjoint(set(args)):
+                    action.help = argparse.SUPPRESS
+                else:
+                    action.help = action.help + " (currently selected)"
+            elif action.container.title == "Training Paradigm Arguments":
+                if set(action.option_strings).isdisjoint(set(args)):
+                    action.help = argparse.SUPPRESS
+                else:
+                    action.help = action.help + " (currently selected)"
+        action.option_strings = [s for s in action.option_strings if "-" not in s[2:]]
+        super().add_argument(action)
+    def end_section(self):
+        if len(self._current_section.items) < 2:
+            self._current_section.items = []
+            self._current_section.heading = ""
+        super().end_section()
+def launch_command_parser(subparsers=None):
+    description = "Launch a python script in a distributed scenario. Arguments can be passed in with either hyphens (`--num-processes=2`) or underscores (`--num_processes=2`)"
+    if subparsers is not None:
+        parser = subparsers.add_parser(
+            "launch", description=description, add_help=False, allow_abbrev=False, formatter_class=CustomHelpFormatter
+        )
+    else:
+        parser = CustomArgumentParser(
+            "Accelerate launch command",
+            description=description,
+            add_help=False,
+            allow_abbrev=False,
+            formatter_class=CustomHelpFormatter,
+        )
+    parser.add_argument("-h", "--help", action="help", help="Show this help message and exit.")
+    parser.add_argument(
+        "--config_file",
+        default=None,
+        help="The config file to use for the default values in the launching script.",
+    )
+    parser.add_argument(
+        "--quiet",
+        "-q",
+        action="store_true",
+        help="Silence subprocess errors from the launch stack trace and only show the relevant tracebacks. (Only applicable to DeepSpeed and single-process configurations)",
+    )
+    # Hardware selection arguments
+    hardware_args = parser.add_argument_group(
+        "Hardware Selection Arguments", "Arguments for selecting the hardware to be used."
+    )
+    hardware_args.add_argument(
+        "--cpu", default=False, action="store_true", help="Whether or not to force the training on the CPU."
+    )
+    hardware_args.add_argument(
+        "--multi_gpu",
+        default=False,
+        action="store_true",
+        help="Whether or not this should launch a distributed GPU training.",
+    )
+    hardware_args.add_argument(
+        "--tpu", default=False, action="store_true", help="Whether or not this should launch a TPU training."
+    )
+    # Resource selection arguments
+    resource_args = parser.add_argument_group(
+        "Resource Selection Arguments", "Arguments for fine-tuning how available hardware should be used."
+    )
+    resource_args.add_argument(
+        "--mixed_precision",
+        type=str,
+        choices=["no", "fp16", "bf16", "fp8"],
+        help="Whether or not to use mixed precision training. "
+        "Choose between FP16, BF16 (bfloat16), and FP8 training. "
+        "BF16 training is only supported on Nvidia Ampere GPUs and PyTorch 1.10 or later.",
+    )
+    resource_args.add_argument(
+        "--num_processes", type=int, default=None, help="The total number of processes to be launched in parallel."
+    )
+    resource_args.add_argument(
+        "--num_machines", type=int, default=None, help="The total number of machines used in this training."
+    )
+    resource_args.add_argument(
+        "--num_cpu_threads_per_process",
+        type=int,
+        default=None,
+        help="The number of CPU threads per process. Can be tuned for optimal performance.",
+    )
+    resource_args.add_argument(
+        "--enable_cpu_affinity",
+        default=False,
+        action="store_true",
+        help="Whether or not CPU affinity and balancing should be enabled. Currently only supported on NVIDIA hardware.",
+    )
+    # Dynamo arguments
+    resource_args.add_argument(
+        "--dynamo_backend",
+        type=str,
+        choices=["no"] + [b.lower() for b in DYNAMO_BACKENDS],
+        help="Choose a backend to optimize your training with dynamo, see more at "
+        "https://github.com/pytorch/torchdynamo.",
+    )
+    resource_args.add_argument(
+        "--dynamo_mode",
+        type=str,
+        default="default",
+        choices=TORCH_DYNAMO_MODES,
+        help="Choose a mode to optimize your training with dynamo.",
+    )
+    resource_args.add_argument(
+        "--dynamo_use_fullgraph",
+        default=False,
+        action="store_true",
+        help="Whether to use full graph mode for dynamo, or whether it is OK to break the model into several subgraphs.",
+    )
+    resource_args.add_argument(
+        "--dynamo_use_dynamic",
+        default=False,
+        action="store_true",
+        help="Whether to enable dynamic shape tracing.",
+    )
+    resource_args.add_argument(
+        "--dynamo_use_regional_compilation",
+        default=False,
+        action="store_true",
+        help="Whether to enable regional compilation.",
+    )
+    # Training Paradigm arguments
+    paradigm_args = parser.add_argument_group(
+        "Training Paradigm Arguments", "Arguments for selecting which training paradigm to be used."
+    )
+    paradigm_args.add_argument(
+        "--use_deepspeed",
+        default=False,
+        action="store_true",
+        help="Whether to use deepspeed.",
+    )
+    paradigm_args.add_argument(
+        "--use_fsdp",
+        default=False,
+        action="store_true",
+        help="Whether to use fsdp.",
+    )
+    paradigm_args.add_argument(
+        "--use_parallelism_config",
+        default=False,
+        action="store_true",
+        help="Whether to use the parallelism config to configure the N-d distributed training.",
+    )
+    paradigm_args.add_argument(
+        "--use_megatron_lm",
+        default=False,
+        action="store_true",
+        help="Whether to use Megatron-LM.",
+    )
+    paradigm_args.add_argument(
+        "--use_xpu",
+        default=None,
+        action="store_true",
+        help="Whether to use IPEX plugin to speed up training on XPU specifically. This argument is deprecated and ignored, and will be removed in Accelerate v1.20.",
+    )
+    # distributed GPU training arguments
+    distributed_args = parser.add_argument_group("Distributed GPUs", "Arguments related to distributed GPU training.")
+    distributed_args.add_argument(
+        "--gpu_ids",
+        default=None,
+        help="What GPUs (by id) should be used for training on this machine as a comma-separated list",
+    )
+    distributed_args.add_argument(
+        "--same_network",
+        default=False,
+        action="store_true",
+        help="Whether all machines used for multinode training exist on the same local network.",
+    )
+    distributed_args.add_argument(
+        "--machine_rank", type=int, default=None, help="The rank of the machine on which this script is launched."
+    )
+    distributed_args.add_argument(
+        "--main_process_ip", type=str, default=None, help="The IP address of the machine of rank 0."
+    )
+    distributed_args.add_argument(
+        "--main_process_port",
+        type=int,
+        default=None,
+        help="The port to use to communicate with the machine of rank 0.",
+    )
+    distributed_args.add_argument(
+        "-t",
+        "--tee",
+        default="0",
+        type=str,
+        help="Tee std streams into a log file and also to console.",
+    )
+    distributed_args.add_argument(
+        "--log_dir",
+        type=str,
+        default=None,
+        help=(
+            "Base directory to use for log files when using torchrun/torch.distributed.run as launcher. "
+            "Use with --tee to redirect std streams into log files."
+        ),
+    )
+    distributed_args.add_argument(
+        "--role",
+        type=str,
+        default="default",
+        help="User-defined role for the workers.",
+    )
+    # Rendezvous related arguments
+    distributed_args.add_argument(
+        "--rdzv_backend",
+        type=str,
+        default="static",
+        help="The rendezvous method to use, such as 'static' (the default) or 'c10d'",
+    )
+    distributed_args.add_argument(
+        "--rdzv_conf",
+        type=str,
+        default="",
+        help="Additional rendezvous configuration (<key1>=<value1>,<key2>=<value2>,...).",
+    )
+    distributed_args.add_argument(
+        "--max_restarts",
+        type=int,
+        default=0,
+        help="Maximum number of worker group restarts before failing.",
+    )
+    distributed_args.add_argument(
+        "--monitor_interval",
+        type=float,
+        default=0.1,
+        help="Interval, in seconds, to monitor the state of workers.",
+    )
+    parser.add_argument(
+        "-m",
+        "--module",
+        action="store_true",
+        help="Change each process to interpret the launch script as a Python module, executing with the same behavior as 'python -m'.",
+    )
+    parser.add_argument(
+        "--no_python",
+        action="store_true",
+        help="Skip prepending the training script with 'python' - just execute it directly. Useful when the script is not a Python script.",
+    )
+    # TPU arguments
+    tpu_args = parser.add_argument_group("TPU", "Arguments related to TPU.")
+    tpu_args.add_argument(
+        "--tpu_cluster",
+        action="store_true",
+        dest="tpu_use_cluster",
+        help="Whether to use a GCP TPU pod for training.",
+    )
+    tpu_args.add_argument(
+        "--no_tpu_cluster",
+        action="store_false",
+        dest="tpu_use_cluster",
+        help="Should not be passed explicitly, this is for internal use only.",
+    )
+    tpu_args.add_argument(
+        "--tpu_use_sudo",
+        action="store_true",
+        help="Whether to use `sudo` when running the TPU training script in each pod.",
+    )
+    tpu_args.add_argument(
+        "--vm",
+        type=str,
+        action="append",
+        help=(
+            "List of single Compute VM instance names. "
+            "If not provided we assume usage of instance groups. For TPU pods."
+        ),
+    )
+    tpu_args.add_argument(
+        "--env",
+        type=str,
+        action="append",
+        help="List of environment variables to set on the Compute VM instances. For TPU pods.",
+    )
+    tpu_args.add_argument(
+        "--main_training_function",
+        type=str,
+        default=None,
+        help="The name of the main function to be executed in your script (only for TPU training).",
+    )
+    tpu_args.add_argument(
+        "--downcast_bf16",
+        action="store_true",
+        help="When using bf16 precision on TPUs, whether both float and double tensors are cast to bfloat16, or whether double tensors remain as float32.",
+    )
+    # DeepSpeed arguments
+    deepspeed_args = parser.add_argument_group("DeepSpeed Arguments", "Arguments related to DeepSpeed.")
+    deepspeed_args.add_argument(
+        "--deepspeed_config_file",
+        default=None,
+        type=str,
+        help="DeepSpeed config file.",
+    )
+    deepspeed_args.add_argument(
+        "--zero_stage",
+        default=None,
+        type=int,
+        help="DeepSpeed's ZeRO optimization stage (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to `2`.",
+    )
+    deepspeed_args.add_argument(
+        "--offload_optimizer_device",
+        default=None,
+        type=str,
+        help="Decides where (none|cpu|nvme) to offload optimizer states (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to 'none'.",
+    )
+    deepspeed_args.add_argument(
+        "--offload_param_device",
+        default=None,
+        type=str,
+        help="Decides where (none|cpu|nvme) to offload parameters (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to 'none'.",
+    )
+    deepspeed_args.add_argument(
+        "--offload_optimizer_nvme_path",
+        default=None,
+        type=str,
+        help="Decides Nvme Path to offload optimizer states (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to 'none'.",
+    )
+    deepspeed_args.add_argument(
+        "--offload_param_nvme_path",
+        default=None,
+        type=str,
+        help="Decides Nvme Path to offload parameters (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to 'none'.",
+    )
+    deepspeed_args.add_argument(
+        "--gradient_accumulation_steps",
+        default=None,
+        type=int,
+        help="Number of gradient accumulation steps used in your training script (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to `1`.",
+    )
+    deepspeed_args.add_argument(
+        "--gradient_clipping",
+        default=None,
+        type=float,
+        help="gradient clipping value used in your training script (useful only when `use_deepspeed` flag is passed). "
+        "If unspecified, will default to `1.0`.",
+    )
+    deepspeed_args.add_argument(
+        "--zero3_init_flag",
+        default=None,
+        type=str,
+        help="Decides Whether (true|false) to enable `deepspeed.zero.Init` for constructing massive models. "
+        "Only applicable with DeepSpeed ZeRO Stage-3. If unspecified, will default to `true`.",
+    )
+    deepspeed_args.add_argument(
+        "--zero3_save_16bit_model",
+        default=None,
+        type=str,
+        help="Decides Whether (true|false) to save 16-bit model weights when using ZeRO Stage-3. "
+        "Only applicable with DeepSpeed ZeRO Stage-3. If unspecified, will default to `false`.",
+    )
+    deepspeed_args.add_argument(
+        "--deepspeed_hostfile",
+        default=None,
+        type=str,
+        help="DeepSpeed hostfile for configuring multi-node compute resources.",
+    )
+    deepspeed_args.add_argument(
+        "--deepspeed_exclusion_filter",
+        default=None,
+        type=str,
+        help="DeepSpeed exclusion filter string when using multi-node setup.",
+    )
+    deepspeed_args.add_argument(
+        "--deepspeed_inclusion_filter",
+        default=None,
+        type=str,
+        help="DeepSpeed inclusion filter string when using multi-node setup.",
+    )
+    deepspeed_args.add_argument(
+        "--deepspeed_multinode_launcher",
+        default=None,
+        type=str,
+        help="DeepSpeed multi-node launcher to use, e.g. `pdsh`, `standard`, `openmpi`, `mvapich`, `mpich`, `slurm`, `nossh` (requires DeepSpeed >= 0.14.5). If unspecified, will default to `pdsh`.",
+    )
+    deepspeed_args.add_argument(
+        "--deepspeed_moe_layer_cls_names",
+        default=None,
+        type=str,
+        help="Comma-separated list of transformer MoE layer class names (case-sensitive) to wrap, e.g., `MixtralSparseMoeBlock`, `Qwen2MoeSparseMoeBlock`, `JetMoEAttention,JetMoEBlock` ..."
+        " (useful only when `use_deepspeed` flag is passed).",
+    )
+    # fsdp arguments
+    fsdp_args = parser.add_argument_group("FSDP Arguments", "Arguments related to Fully Sharded Data Parallelism.")
+    fsdp_args.add_argument(
+        "--fsdp_version",
+        type=str,
+        default="1",
+        choices=["1", "2"],
+        help="FSDP version to use. (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_offload_params",
+        default="false",
+        type=str,
+        help="Decides Whether (true|false) to offload parameters and gradients to CPU. (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_min_num_params",
+        type=int,
+        default=int(1e8),
+        help="FSDP's minimum number of parameters for Default Auto Wrapping. (useful only when `use_fsdp` flag is passed).",
+    )
+    # We enable this for backwards compatibility, throw a warning if this is set in `FullyShardedDataParallelPlugin`
+    fsdp_args.add_argument(
+        "--fsdp_sharding_strategy",
+        type=str,
+        default="FULL_SHARD",
+        help="FSDP's sharding strategy. (useful only when `use_fsdp` flag is passed and `fsdp_version=1`).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_reshard_after_forward",
+        type=str,
+        default="true",
+        help="FSDP's Reshard After Forward Strategy. (useful only when `use_fsdp` flag is passed). Supports either boolean (FSDP2) or `FULL_SHARD | SHARD_GRAD_OP | NO_RESHARD` (FSDP1).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_auto_wrap_policy",
+        type=str,
+        default=None,
+        help="FSDP's auto wrap policy. (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_transformer_layer_cls_to_wrap",
+        default=None,
+        type=str,
+        help="Transformer layer class name (case-sensitive) to wrap, e.g., `BertLayer`, `GPTJBlock`, `T5Block` ... "
+        "(useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_backward_prefetch",
+        default=None,
+        type=str,
+        help="FSDP's backward prefetch policy. (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_state_dict_type",
+        default=None,
+        type=str,
+        help="FSDP's state dict type. (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_forward_prefetch",
+        default="false",
+        type=str,
+        help="If True, then FSDP explicitly prefetches the next upcoming "
+        "all-gather while executing in the forward pass (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_use_orig_params",
+        default="true",
+        type=str,
+        help="If True, allows non-uniform `requires_grad` during init, which means support for interspersed frozen and trainable parameters."
+        " (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_cpu_ram_efficient_loading",
+        default="true",
+        type=str,
+        help="If True, only the first process loads the pretrained model checkpoint while all other processes have empty weights. "
+        "Only applicable for 🤗 Transformers. When using this, `--fsdp_sync_module_states` needs to be True. "
+        "(useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_sync_module_states",
+        default="true",
+        type=str,
+        help="If True, each individually wrapped FSDP unit will broadcast module parameters from rank 0."
+        " (useful only when `use_fsdp` flag is passed).",
+    )
+    fsdp_args.add_argument(
+        "--fsdp_activation_checkpointing",
+        default="false",
+        type=str,
+        help="Decides Whether (true|false) intermediate activations are freed during the forward pass, and a checkpoint is left as a placeholder. (useful only when `use_fsdp` flag is passed).",
+    )
+    # megatron_lm args
+    megatron_lm_args = parser.add_argument_group("Megatron-LM Arguments", "Arguments related to Megatron-LM.")
+    megatron_lm_args.add_argument(
+        "--megatron_lm_tp_degree",
+        type=int,
+        default=1,
+        help="Megatron-LM's Tensor Parallelism (TP) degree. (useful only when `use_megatron_lm` flag is passed).",
+    )
+    megatron_lm_args.add_argument(
+        "--megatron_lm_pp_degree",
+        type=int,
+        default=1,
+        help="Megatron-LM's Pipeline Parallelism (PP) degree. (useful only when `use_megatron_lm` flag is passed).",
+    )
+    megatron_lm_args.add_argument(
+        "--megatron_lm_num_micro_batches",
+        type=int,
+        default=None,
+        help="Megatron-LM's number of micro batches when PP degree > 1. (useful only when `use_megatron_lm` flag is passed).",
+    )
+    megatron_lm_args.add_argument(
+        "--megatron_lm_sequence_parallelism",
+        default=None,
+        type=str,
+        help="Decides Whether (true|false) to enable Sequence Parallelism when TP degree > 1. "
+        "(useful only when `use_megatron_lm` flag is passed).",
+    )
+    megatron_lm_args.add_argument(
+        "--megatron_lm_recompute_activations",
+        default=None,
+        type=str,
+        help="Decides Whether (true|false) to enable Selective Activation Recomputation. "
+        "(useful only when `use_megatron_lm` flag is passed).",
+    )
+    megatron_lm_args.add_argument(
+        "--megatron_lm_use_distributed_optimizer",
+        default=None,
+        type=str,
+        help="Decides Whether (true|false) to use distributed optimizer "
+        "which shards optimizer state and gradients across Data Parallel (DP) ranks. "
+        "(useful only when `use_megatron_lm` flag is passed).",
+    )
+    megatron_lm_args.add_argument(
+        "--megatron_lm_gradient_clipping",
+        default=1.0,
+        type=float,
+        help="Megatron-LM's gradient clipping value based on global L2 Norm (0 to disable). "
+        "(useful only when `use_megatron_lm` flag is passed).",
+    )
+    # FP8 arguments
+    fp8_args = parser.add_argument_group(
+        "FP8 Arguments", "Arguments related to FP8 training (requires `--mixed_precision=fp8`)"
+    )
+    fp8_args.add_argument(
+        "--fp8_backend",
+        type=str,
+        choices=["te", "msamp"],
+        help="Choose a backend to train with FP8 (te: TransformerEngine, msamp: MS-AMP)",
+    )
+    fp8_args.add_argument(
+        "--fp8_use_autocast_during_eval",
+        default=False,
+        action="store_true",
+        help="Whether to use FP8 autocast during eval mode (useful only when `--fp8_backend=te` is passed). Generally better metrics are found when this is not passed.",
+    )
+    fp8_args.add_argument(
+        "--fp8_margin",
+        type=int,
+        default=0,
+        help="The margin to use for the gradient scaling (useful only when `--fp8_backend=te` is passed).",
+    )
+    fp8_args.add_argument(
+        "--fp8_interval",
+        type=int,
+        default=1,
+        help="The interval to use for how often the scaling factor is recomputed (useful only when `--fp8_backend=te` is passed).",
+    )
+    fp8_args.add_argument(
+        "--fp8_format",
+        type=str,
+        default="HYBRID",
+        choices=["HYBRID", "E4M3", "E5M2"],
+        help="The format to use for the FP8 recipe (useful only when `--fp8_backend=te` is passed).",
+    )
+    fp8_args.add_argument(
+        "--fp8_amax_history_len",
+        type=int,
+        default=1024,
+        help="The length of the history to use for the scaling factor computation (useful only when `--fp8_backend=te` is passed).",
+    )
+    fp8_args.add_argument(
+        "--fp8_amax_compute_algo",
+        type=str,
+        default="most_recent",
+        choices=["max", "most_recent"],
+        help="The algorithm to use for the scaling factor computation. (useful only when `--fp8_backend=te` is passed).",
+    )
+    fp8_args.add_argument(
+        "--fp8_override_linear_precision",
+        type=lambda x: tuple(map(str_to_bool, x.split(","))),
+        default=(False, False, False),
+        help="Whether or not to execute `fprop`, `dgrad`, and `wgrad` GEMMS in higher precision. Should be passed in a comma-separated string of booleans (useful only when `--fp8_backend=te` is passed).",
+    )
+    fp8_args.add_argument(
+        "--fp8_opt_level",
+        type=str,
+        default="O2",
+        choices=["O1", "O2"],
+        help="What level of 8-bit collective communication should be used with MS-AMP (useful only when `--fp8_backend=msamp` is passed).",
+    )
+    # AWS arguments
+    aws_args = parser.add_argument_group("AWS Arguments", "Arguments related to AWS.")
+    aws_args.add_argument(
+        "--aws_access_key_id",
+        type=str,
+        default=None,
+        help="The AWS_ACCESS_KEY_ID used to launch the Amazon SageMaker training job",
+    )
+    aws_args.add_argument(
+        "--aws_secret_access_key",
+        type=str,
+        default=None,
+        help="The AWS_SECRET_ACCESS_KEY used to launch the Amazon SageMaker training job.",
+    )
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        help="Whether to print out the torch.distributed stack trace when something fails.",
+    )
+    parser.add_argument(
+        "training_script",
+        type=str,
+        help=(
+            "The full path to the script to be launched in parallel, followed by all the arguments for the training "
+            "script."
+        ),
+    )
+    # MPI arguments
+    mpirun_args = parser.add_argument_group("MPI Arguments", "Arguments related to mpirun for Multi-CPU")
+    mpirun_args.add_argument(
+        "--mpirun_hostfile",
+        type=str,
+        default=None,
+        help="Location for a hostfile for using Accelerate to launch a multi-CPU training job with mpirun. This will "
+        "get passed to the MPI --hostfile or -f parameter, depending on which MPI program is installed.",
+    )
+    mpirun_args.add_argument(
+        "--mpirun_ccl",
+        type=int,
+        default=1,
+        help="The number of oneCCL worker threads when using Accelerate to launch multi-CPU training with mpirun.",
+    )
+    # ParallelismConfig arguments
+    parallelism_config_args = parser.add_argument_group(
+        "ParallelismConfig Arguments",
+        "Arguments related to the ParallelismConfig used for distributed training.",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_dp_replicate_size",
+        type=int,
+        default=1,
+        help="The number of processes for data parallel training. Defaults to 1 (no data parallelism).",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_dp_shard_size",
+        type=int,
+        default=1,
+        help="The number of processes for FSDP sharding. Defaults to 1 (No FSDP sharding).",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_tp_size",
+        type=int,
+        default=1,
+        help="The number of processes for tensor parallel training. Defaults to 1 (no tensor parallelism).",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_cp_size",
+        type=int,
+        default=1,
+        help="The number of processes for context parallel training. Defaults to 1 (no context parallelism).",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_cp_backend",
+        type=str,
+        choices=["torch"],
+        default="torch",
+        help="Context Parallelism backend: torch (FSDP2)",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_cp_comm_strategy",
+        type=str,
+        default="allgather",
+        help="The communication strategy for context parallel training. Defaults to 'allgather'. The other option is 'alltoall'.",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_sp_size",
+        type=int,
+        default=1,
+        help="The number of processes for sequence parallel training. Defaults to 1 (no sequence parallelism).",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_sp_backend",
+        type=str,
+        choices=["deepspeed"],
+        default="deepspeed",
+        help="Sequence Parallelism backend: deepspeed (ALST/Ulysses)",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_sp_seq_length",
+        type=str,
+        default=None,
+        help="Sequence length for when batches are all of the same length. For variable sequence lengths across batches set `parallelism_config_sp_seq_length_is_variable=True`",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_sp_seq_length_is_variable",
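+        # `type=bool` would treat any non-empty string (including "False") as True, so parse the text explicitly.
+        type=lambda x: bool(str_to_bool(x)),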
+        default=True,
+        help="If `True` will work with a sequence length that may change between batches, in which case `parallelism_config_sp_seq_length` value can be set to anything divisible by sp size or remain unset. If `False` then `parallelism_config_sp_seq_length` needs to match the batch's sequence length dimension. The default is `True`.",
+    )
+    parallelism_config_args.add_argument(
+        "--parallelism_config_sp_attn_implementation",
+        type=str,
+        default="sdpa",
+        help="Attention implementation to use. Can be one of 'flash_attention_2', 'flash_attention_3' or 'sdpa'. Defaults to `sdpa`.",
+    )
+    # Other arguments of the training scripts
+    parser.add_argument("training_script_args", nargs=argparse.REMAINDER, help="Arguments of the training script.")
+    if subparsers is not None:
+        parser.set_defaults(func=launch_command)
+    return parser
+def simple_launcher(args):
+    cmd, current_env = prepare_simple_launcher_cmd_env(args)
+    process = subprocess.Popen(cmd, env=current_env)
+    process.wait()
+    if process.returncode != 0:
+        if not args.quiet:
+            raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
+        else:
+            sys.exit(1)
+def multi_gpu_launcher(args):
+    import torch.distributed.run as distrib_run
+    current_env = prepare_multi_gpu_env(args)
+    if not check_cuda_p2p_ib_support():
+        message = "Using RTX 4000 series which doesn't support faster communication speedups. Ensuring P2P and IB communications are disabled."
+        warn = False
+        if "NCCL_P2P_DISABLE" not in current_env:
+            current_env["NCCL_P2P_DISABLE"] = "1"
+            warn = True
+        if "NCCL_IB_DISABLE" not in current_env:
+            current_env["NCCL_IB_DISABLE"] = "1"
+            warn = True
+        if warn:
+            logger.warning(message)
+    debug = getattr(args, "debug", False)
+    args = _filter_args(
+        args,
+        distrib_run.get_args_parser(),
+        ["--training_script", args.training_script, "--training_script_args", args.training_script_args],
+    )
+    with patch_environment(**current_env):
+        try:
+            distrib_run.run(args)
+        except Exception:
+            if is_rich_available() and debug:
+                console = get_console()
+                console.print("\n[bold red]Using --debug, `torch.distributed` Stack Trace:[/bold red]")
+                console.print_exception(suppress=[__file__], show_locals=False)
+            else:
+                raise
+def deepspeed_launcher(args):
+    import torch.distributed.run as distrib_run
+    if not is_deepspeed_available():
+        raise ImportError("DeepSpeed is not installed => run `pip3 install deepspeed` or build it from source.")
+    else:
+        from deepspeed.launcher.runner import DEEPSPEED_ENVIRONMENT_NAME
+    cmd, current_env = prepare_deepspeed_cmd_env(args)
+    if not check_cuda_p2p_ib_support():
+        message = "Using RTX 4000 series which doesn't support faster communication speedups. Ensuring P2P and IB communications are disabled."
+        warn = False
+        if "NCCL_P2P_DISABLE" not in current_env:
+            current_env["NCCL_P2P_DISABLE"] = "1"
+            warn = True
+        if "NCCL_IB_DISABLE" not in current_env:
+            current_env["NCCL_IB_DISABLE"] = "1"
+            warn = True
+        if warn:
+            logger.warning(message)
+    if args.num_machines > 1 and args.deepspeed_multinode_launcher != DEEPSPEED_MULTINODE_LAUNCHERS[1]:
+        with open(DEEPSPEED_ENVIRONMENT_NAME, "a") as f:
+            valid_env_items = convert_dict_to_env_variables(current_env)
+            if len(valid_env_items) > 1:
+                f.writelines(valid_env_items)
+        process = subprocess.Popen(cmd, env=current_env)
+        process.wait()
+        if process.returncode != 0:
+            if not args.quiet:
+                raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
+            else:
+                sys.exit(1)
+    else:
+        debug = getattr(args, "debug", False)
+        args = _filter_args(
+            args,
+            distrib_run.get_args_parser(),
+            ["--training_script", args.training_script, "--training_script_args", args.training_script_args],
+        )
+        with patch_environment(**current_env):
+            try:
+                distrib_run.run(args)
+            except Exception:
+                if is_rich_available() and debug:
+                    console = get_console()
+                    console.print("\n[bold red]Using --debug, `torch.distributed` Stack Trace:[/bold red]")
+                    console.print_exception(suppress=[__file__], show_locals=False)
+                else:
+                    raise
+def tpu_launcher(args):
+    import torch_xla.distributed.xla_multiprocessing as xmp
+    if args.no_python:
+        raise ValueError("--no_python cannot be used with TPU launcher")
+    args, current_env = prepare_tpu(args, {})
+    if args.module:
+        mod_name = args.training_script
+    else:
+        # Import training_script as a module
+        script_path = Path(args.training_script)
+        sys.path.append(str(script_path.parent.resolve()))
+        mod_name = script_path.stem
+    mod = importlib.import_module(mod_name)
+    if not hasattr(mod, args.main_training_function):
+        raise ValueError(
+            f"Your training script should have a function named {args.main_training_function}, or you should pass a "
+            "different value to `--main_training_function`."
+        )
+    # Patch sys.argv
+    sys.argv = [mod.__file__] + args.training_script_args
+    main_function = getattr(mod, args.main_training_function)
+    with patch_environment(**current_env):
+        xmp.spawn(PrepareForLaunch(main_function), args=())
+def tpu_pod_launcher(args):
+    from torch_xla.distributed import xla_dist
+    current_env = {}
+    args, current_env = prepare_tpu(args, current_env, True)
+    debug = getattr(args, "debug", False)
+    training_script = args.training_script
+    training_script_args = args.training_script_args
+    new_args = _filter_args(
+        args, xla_dist.get_args_parser(), ["--tpu", args.tpu_name, "--positional", "", "--restart-tpuvm-pod-server"]
+    )
+    if args.tpu_use_sudo:
+        new_cmd = ["sudo"]
+    else:
+        new_cmd = []
+    new_cmd += [
+        "accelerate-launch",
+        "--tpu",
+        "--no_tpu_cluster",
+        "--num_machines",
+        "1",
+        "--mixed_precision",
+        "no",
+        "--dynamo_backend",
+        "no",
+        "--num_processes",
+        str(args.num_processes),
+        "--main_training_function",
+        str(args.main_training_function),
+        training_script,
+    ] + training_script_args
+    new_args.positional = new_cmd
+    bad_flags = ""
+    for arg in vars(new_args):
+        if arg.startswith("docker_"):
+            value = getattr(new_args, arg)
+            if value != "" and value is not None:
+                bad_flags += f'{arg}="{value}"\n'
+    if bad_flags != "":
+        raise ValueError(
+            f"Docker containers are not supported for TPU pod launcher currently, please remove the following flags:\n{bad_flags}"
+        )
+    new_args.env = [f"{k}={v}" for k, v in current_env.items()]
+    new_args.env.append("ACCELERATE_IN_TPU_POD=1")
+    try:
+        xla_dist.resolve_and_execute(new_args)
+    except Exception:
+        if is_rich_available() and debug:
+            console = get_console()
+            console.print("\n[bold red]Using --debug, `torch_xla.xla_dist` Stack Trace:[/bold red]")
+            console.print_exception(suppress=[__file__], show_locals=False)
+        else:
+            raise
+def sagemaker_launcher(sagemaker_config: SageMakerConfig, args):
+    if not is_sagemaker_available():
+        raise ImportError(
+            "Please install sagemaker to be able to launch training on Amazon SageMaker with `pip install accelerate[sagemaker]`"
+        )
+    if args.module or args.no_python:
+        raise ValueError(
+            "SageMaker requires a python training script file and cannot be used with --module or --no_python"
+        )
+    from sagemaker.huggingface import HuggingFace
+    args, sagemaker_inputs = prepare_sagemager_args_inputs(sagemaker_config, args)
+    huggingface_estimator = HuggingFace(**args)
+    huggingface_estimator.fit(inputs=sagemaker_inputs)
+    print(f"You can find your model data at: {huggingface_estimator.model_data}")
+def _validate_launch_command(args):
+    # Sanity checks
+    if sum([args.multi_gpu, args.cpu, args.tpu, args.use_deepspeed, args.use_fsdp]) > 1:
+        raise ValueError(
+            "You can only use one of `--cpu`, `--multi_gpu`, `--tpu`, `--use_deepspeed`, `--use_fsdp` at a time."
+        )
+    if args.multi_gpu and (args.num_processes is not None) and (args.num_processes < 2):
+        raise ValueError("You need to use at least 2 processes to use `--multi_gpu`.")
+    if (not args.use_fsdp or args.fsdp_version == 1) and args.use_parallelism_config:
+        raise ValueError("You cannot use `--use_parallelism_config` without `--use_fsdp` and `--fsdp_version=2`. ")
+    defaults = None
+    warned = []
+    mp_from_config_flag = False
+    # Get the default from the config file.
+    if args.config_file is not None or os.path.isfile(default_config_file) and not args.cpu:
+        defaults = load_config_from_file(args.config_file)
+        if (
+            not args.multi_gpu
+            and not args.tpu
+            and not args.tpu_use_cluster
+            and not args.use_deepspeed
+            and not args.use_fsdp
+            and not args.use_megatron_lm
+        ):
+            args.use_deepspeed = defaults.distributed_type == DistributedType.DEEPSPEED
+            args.multi_gpu = (
+                True
+                if defaults.distributed_type
+                in (
+                    DistributedType.MULTI_GPU,
+                    DistributedType.MULTI_NPU,
+                    DistributedType.MULTI_MLU,
+                    DistributedType.MULTI_SDAA,
+                    DistributedType.MULTI_MUSA,
+                    DistributedType.MULTI_XPU,
+                    DistributedType.MULTI_HPU,
+                )
+                else False
+            )
+            args.tpu = defaults.distributed_type == DistributedType.XLA
+            args.use_fsdp = defaults.distributed_type == DistributedType.FSDP
+            args.use_megatron_lm = defaults.distributed_type == DistributedType.MEGATRON_LM
+            args.tpu_use_cluster = defaults.tpu_use_cluster if args.tpu else False
+            args.use_parallelism_config = defaults.parallelism_config != {}
+        if args.gpu_ids is None:
+            if defaults.gpu_ids is not None:
+                args.gpu_ids = defaults.gpu_ids
+            else:
+                args.gpu_ids = "all"
+        if args.multi_gpu and args.num_machines is None:
+            args.num_machines = defaults.num_machines
+        if len(args.gpu_ids.split(",")) < 2 and (args.gpu_ids != "all") and args.multi_gpu and args.num_machines <= 1:
+            raise ValueError(
+                "Less than two GPU ids were configured and tried to run on multiple GPUs. "
+                "Please ensure at least two are specified for `--gpu_ids`, or use `--gpu_ids='all'`."
+            )
+        if defaults.compute_environment == ComputeEnvironment.LOCAL_MACHINE:
+            # Update args with the defaults
+            for name, attr in defaults.__dict__.items():
+                if isinstance(attr, dict):
+                    # Copy defaults.somedict.somearg to args.somearg and
+                    # defaults.fsdp_config.x to args.fsdp_x
+                    for key, value in attr.items():
+                        if name == "fsdp_config" and not key.startswith("fsdp"):
+                            key = "fsdp_" + key
+                        elif name == "fp8_config" and not key.startswith("fp8"):
+                            key = "fp8_" + key
+                        if hasattr(args, "nondefault") and key not in args.nondefault:
+                            setattr(args, key, value)
+                elif (
+                    name not in ["compute_environment", "mixed_precision", "distributed_type"]
+                    and getattr(args, name, None) is None
+                ):
+                    # Those args are handled separately
+                    setattr(args, name, attr)
+        if not args.debug:
+            args.debug = defaults.debug
+        if not args.mixed_precision:
+            if defaults.mixed_precision is None:
+                args.mixed_precision = "no"
+            else:
+                args.mixed_precision = defaults.mixed_precision
+                mp_from_config_flag = True
+        else:
+            native_amp = is_bf16_available(True)
+            if (
+                args.mixed_precision == "bf16"
+                and not native_amp
+                and not (args.tpu and is_torch_xla_available(check_is_tpu=True))
+            ):
+                raise ValueError("bf16 mixed precision requires PyTorch >= 1.10 and a supported device.")
+        # Silently set the default here
+        if args.dynamo_backend is None:
+            args.dynamo_backend = "no"
+        if args.num_processes == -1:
+            raise ValueError("You need to manually pass in `--num_processes` using this config yaml.")
+    else:
+        if args.num_processes is None:
+            if is_xpu_available():
+                args.num_processes = torch.xpu.device_count()
+            elif is_mlu_available():
+                args.num_processes = torch.mlu.device_count()
+            elif is_sdaa_available():
+                args.num_processes = torch.sdaa.device_count()
+            elif is_musa_available():
+                args.num_processes = torch.musa.device_count()
+            elif is_npu_available():
+                args.num_processes = torch.npu.device_count()
+            elif is_hpu_available():
+                args.num_processes = torch.hpu.device_count()
+            else:
+                args.num_processes = torch.cuda.device_count()
+            warned.append(f"\t`--num_processes` was set to a value of `{args.num_processes}`")
+        if args.debug is None:
+            args.debug = False
+        if (
+            not args.multi_gpu
+            and args.num_processes > 1
+            and (
+                (is_xpu_available() and torch.xpu.device_count() > 1)
+                or (is_npu_available() and torch.npu.device_count() > 1)
+                or (is_hpu_available() and torch.hpu.device_count() > 1)
+                or (is_mlu_available() and torch.mlu.device_count() > 1)
+                or (is_sdaa_available() and torch.sdaa.device_count() > 1)
+                or (is_musa_available() and torch.musa.device_count() > 1)
+                or (torch.cuda.is_available() and torch.cuda.device_count() > 1)
+            )
+        ):
+            warned.append(
+                "\t\tMore than one GPU was found, enabling multi-GPU training.\n"
+                "\t\tIf this was unintended please pass in `--num_processes=1`."
+            )
+            args.multi_gpu = True
+        if args.num_machines is None:
+            warned.append("\t`--num_machines` was set to a value of `1`")
+            args.num_machines = 1
+        if args.mixed_precision is None:
+            warned.append("\t`--mixed_precision` was set to a value of `'no'`")
+            args.mixed_precision = "no"
+        if not hasattr(args, "use_cpu"):
+            args.use_cpu = args.cpu
+        if args.dynamo_backend is None:
+            warned.append("\t`--dynamo_backend` was set to a value of `'no'`")
+            args.dynamo_backend = "no"
+    if args.debug:
+        logger.debug("Running script in debug mode, expect distributed operations to be slightly slower.")
+    is_aws_env_disabled = defaults is None or (
+        defaults is not None and defaults.compute_environment != ComputeEnvironment.AMAZON_SAGEMAKER
+    )
+    if is_aws_env_disabled and args.num_cpu_threads_per_process is None:
+        args.num_cpu_threads_per_process = get_int_from_env(["OMP_NUM_THREADS"], 1)
+        if args.use_cpu and args.num_processes >= 1 and get_int_from_env(["OMP_NUM_THREADS"], 0) == 0:
+            local_size = get_int_from_env(
+                ["MPI_LOCALNRANKS", "OMPI_COMM_WORLD_LOCAL_SIZE", "MV2_COMM_WORLD_LOCAL_SIZE"],
+                max(int(args.num_processes / args.num_machines), 1),
+            )
+            threads_per_process = int(psutil.cpu_count(logical=False) / local_size)
+            if threads_per_process > 1:
+                args.num_cpu_threads_per_process = threads_per_process
+                warned.append(
+                    f"\t`--num_cpu_threads_per_process` was set to `{args.num_cpu_threads_per_process}` to improve out-of-box performance when training on CPUs"
+                )
+    if args.use_xpu is not None:
+        logger.warning(
+            "use_xpu is deprecated and ignored, will be removed in Accelerate v1.20. "
+            "XPU is now natively supported by PyTorch, so no extra argument is needed to enable it."
+        )
+    if any(warned):
+        message = "The following values were not passed to `accelerate launch` and had defaults used instead:\n"
+        message += "\n".join(warned)
+        message += (
+            "\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`."
+        )
+        logger.warning(message)
+    return args, defaults, mp_from_config_flag
+def launch_command(args):
+    args, defaults, mp_from_config_flag = _validate_launch_command(args)
+    # Use the proper launcher
+    if args.use_deepspeed and not args.cpu:
+        args.deepspeed_fields_from_accelerate_config = list(defaults.deepspeed_config.keys()) if defaults else []
+        if mp_from_config_flag:
+            args.deepspeed_fields_from_accelerate_config.append("mixed_precision")
+        args.deepspeed_fields_from_accelerate_config = ",".join(args.deepspeed_fields_from_accelerate_config)
+        deepspeed_launcher(args)
+    elif args.use_fsdp and not args.cpu:
+        multi_gpu_launcher(args)
+    elif args.use_megatron_lm and not args.cpu:
+        multi_gpu_launcher(args)
+    elif args.multi_gpu and not args.cpu:
+        multi_gpu_launcher(args)
+    elif args.tpu and not args.cpu:
+        if args.tpu_use_cluster:
+            tpu_pod_launcher(args)
+        else:
+            tpu_launcher(args)
+    elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMAZON_SAGEMAKER:
+        sagemaker_launcher(defaults, args)
+    else:
+        simple_launcher(args)
+def main():
+    parser = launch_command_parser()
+    args = parser.parse_args()
+    launch_command(args)
+if __name__ == "__main__":
+    main()
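
The CPU thread heuristic in `_validate_launch_command` can be sketched in isolation. This is a minimal sketch, not the library's code: `cpu_threads_per_process` and its parameters are hypothetical names, and it assumes `OMP_NUM_THREADS` and the MPI local-size variables are unset, so the local process count falls back to `num_processes / num_machines`.

```python
def cpu_threads_per_process(physical_cores: int, num_processes: int, num_machines: int, fallback: int = 1) -> int:
    """Hypothetical helper mirroring the heuristic above (environment lookups omitted)."""
    # Processes that share one machine; at least one.
    local_size = max(num_processes // num_machines, 1)
    # Split the physical cores evenly across the local processes.
    threads = physical_cores // local_size
    # Only override the fallback when each process gets more than one thread.
    return threads if threads > 1 else fallback


single_node = cpu_threads_per_process(physical_cores=16, num_processes=4, num_machines=1)
oversubscribed = cpu_threads_per_process(physical_cores=8, num_processes=16, num_machines=2)
```

With 16 physical cores and 4 local processes each process gets 4 threads; when the cores cannot give every process more than one thread, the fallback value is kept.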

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/merge.py ADDED

	@@ -0,0 +1,69 @@

+#!/usr/bin/env python
+# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from accelerate.commands.utils import CustomArgumentParser
+from accelerate.utils import merge_fsdp_weights
+description = """Utility to merge the weights from multiple FSDP checkpoints into a single combined checkpoint. Should be used if
+`SHARDED_STATE_DICT` was used for the model. Weights will be saved to `{output_path}`.
+This is a CPU-bound process and requires enough RAM to load the entire model state dict."""
+def merge_command(args):
+    merge_fsdp_weights(
+        args.checkpoint_directory, args.output_path, not args.unsafe_serialization, args.remove_checkpoint_dir
+    )
+def merge_command_parser(subparsers=None):
+    if subparsers is not None:
+        parser = subparsers.add_parser("merge-weights", description=description)
+    else:
+        parser = CustomArgumentParser(description=description)
+    parser.add_argument("checkpoint_directory", type=str, help="A directory containing sharded weights saved by FSDP.")
+    parser.add_argument(
+        "output_path",
+        type=str,
+        help="The path to save the merged weights.",
+    )
+    parser.add_argument(
+        "--unsafe_serialization",
+        action="store_true",
+        default=False,
+        help="Whether to save the merged weights as `.bin` rather than `.safetensors` (not recommended).",
+    )
+    parser.add_argument(
+        "--remove_checkpoint_dir",
+        action="store_true",
+        help="Whether to remove the checkpoint directory after merging.",
+        default=False,
+    )
+    if subparsers is not None:
+        parser.set_defaults(func=merge_command)
+    return parser
+def main():
+    parser = merge_command_parser()
+    args = parser.parse_args()
+    merge_command(args)
+if __name__ == "__main__":
+    main()
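
How the parsed arguments map onto the `merge_fsdp_weights` call can be sketched with plain `argparse`. This is an illustrative sketch only: it uses the standard parser rather than `CustomArgumentParser`, the checkpoint and output paths are made-up values, and it builds the positional tuple without calling the library.

```python
import argparse

# Stand-in parser with the same four arguments as `merge_command_parser`.
parser = argparse.ArgumentParser(prog="merge-weights")
parser.add_argument("checkpoint_directory", type=str)
parser.add_argument("output_path", type=str)
parser.add_argument("--unsafe_serialization", action="store_true", default=False)
parser.add_argument("--remove_checkpoint_dir", action="store_true", default=False)

# Hypothetical paths, for illustration only.
args = parser.parse_args(["ckpt/pytorch_model_fsdp_0", "merged"])

# `merge_command` passes the negated flag as the third positional value,
# so safetensors output is used unless `--unsafe_serialization` is given.
call_args = (
    args.checkpoint_directory,
    args.output_path,
    not args.unsafe_serialization,
    args.remove_checkpoint_dir,
)
```

Both positional arguments are required, and omitting both flags yields `True` for the third value and `False` for the fourth.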

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/test.py ADDED

	@@ -0,0 +1,65 @@

+#!/usr/bin/env python
+# Copyright 2021 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+from accelerate.test_utils import execute_subprocess_async, path_in_accelerate_package
+def test_command_parser(subparsers=None):
+    if subparsers is not None:
+        parser = subparsers.add_parser("test")
+    else:
+        parser = argparse.ArgumentParser("Accelerate test command")
+    parser.add_argument(
+        "--config_file",
+        default=None,
+        help=(
+            "The path to use to store the config file. Will default to a file named default_config.yaml in the cache "
+            "location, which is the content of the environment `HF_HOME` suffixed with 'accelerate', or if you don't have "
+            "such an environment variable, your cache directory ('~/.cache' or the content of `XDG_CACHE_HOME`) suffixed "
+            "with 'huggingface'."
+        ),
+    )
+    if subparsers is not None:
+        parser.set_defaults(func=test_command)
+    return parser
+def test_command(args):
+    script_name = path_in_accelerate_package("test_utils", "scripts", "test_script.py")
+    if args.config_file is None:
+        test_args = [script_name]
+    else:
+        # Build the list directly so that paths containing spaces are not split into separate arguments.
+        test_args = [f"--config_file={args.config_file}", script_name]
+    cmd = ["accelerate-launch"] + test_args
+    result = execute_subprocess_async(cmd)
+    if result.returncode == 0:
+        print("Test is a success! You are ready for your distributed training!")
+def main():
+    parser = test_command_parser()
+    args = parser.parse_args()
+    test_command(args)
+if __name__ == "__main__":
+    main()

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/to_fsdp2.py ADDED

	@@ -0,0 +1,172 @@

+#!/usr/bin/env python
+# Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import enum
+import logging
+from pathlib import Path
+import yaml
+from accelerate.commands.utils import CustomArgumentParser
+class ConversionStatus(enum.Enum):
+    NOT_YET_IMPLEMENTED = 0
+    REMOVED = -1
+ARGUMENT_KEY_MAPPING = {
+    # New keys in FSDP2
+    "fsdp_version": "fsdp_version",
+    "fsdp_reshard_after_forward": "fsdp_reshard_after_forward",
+    # https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md
+    # https://huggingface.co/docs/accelerate/en/usage_guides/fsdp
+    "fsdp_auto_wrap_policy": "fsdp_auto_wrap_policy",
+    "fsdp_backward_prefetch": ConversionStatus.REMOVED,
+    "fsdp_forward_prefetch": ConversionStatus.NOT_YET_IMPLEMENTED,
+    "fsdp_cpu_ram_efficient_loading": "fsdp_cpu_ram_efficient_loading",
+    "fsdp_offload_params": "fsdp_offload_params",
+    "fsdp_sharding_strategy": "fsdp_reshard_after_forward",
+    "fsdp_state_dict_type": "fsdp_state_dict_type",
+    "fsdp_sync_module_states": ConversionStatus.REMOVED,
+    "fsdp_transformer_layer_cls_to_wrap": "fsdp_transformer_layer_cls_to_wrap",
+    "fsdp_min_num_params": "fsdp_min_num_params",
+    "fsdp_use_orig_params": ConversionStatus.REMOVED,
+    "fsdp_activation_checkpointing": "fsdp_activation_checkpointing",
+}
+ARGUMENT_VALUE_MAPPING = {
+    "fsdp_sharding_strategy": {
+        "FULL_SHARD": True,
+        "SHARD_GRAD_OP": False,
+        "HYBRID_SHARD": True,
+        "HYBRID_SHARD_ZERO2": False,
+        "NO_SHARD": False,
+    },
+    "fsdp_reshard_after_forward": {  # Needed to convert newly created configs using FSDP1 to FSDP2
+        "FULL_SHARD": True,
+        "SHARD_GRAD_OP": False,
+        "HYBRID_SHARD": True,
+        "HYBRID_SHARD_ZERO2": False,
+        "NO_SHARD": False,
+    },
+}
+logger = logging.getLogger(__name__)
+def _validate_to_fsdp2_args(args):
+    if not Path(args.config_file).exists():
+        raise FileNotFoundError(f"Config file {args.config_file} not found")
+    if not args.overwrite and args.output_file is None:
+        raise ValueError("If --overwrite is not set, --output_file must be provided")
+    if not args.overwrite and Path(args.output_file).exists():
+        raise FileExistsError(f"Output file {args.output_file} already exists and --overwrite is not set")
+def convert_config_to_fsdp2(config: dict) -> dict:
+    fsdp_config = config.get("fsdp_config", {})
+    if not fsdp_config:
+        logger.info("No FSDP config found in the config file, skipping conversion...")
+        return config
+    new_fsdp_config = {}
+    if fsdp_config.get("fsdp_version", 1) == 2:
+        logger.warning("Config already specifies FSDP2, skipping conversion...")
+        logger.warning(
+            "If the config doesn't use new argument names, change `fsdp_version` to `1` and rerun the command."
+        )
+        return config
+    for key, value in fsdp_config.items():
+        conversion_status = ARGUMENT_KEY_MAPPING.get(key, None)
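+        # NOTE: keys mapped to a `ConversionStatus`, and keys absent from the mapping, are copied through
+        # unchanged here, so the REMOVED / NOT_YET_IMPLEMENTED / None branches below are never reached.
+        if isinstance(conversion_status, ConversionStatus) or conversion_status is None: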
+            conversion_status = key
+            new_fsdp_config[conversion_status] = value
+            continue
+        if conversion_status == ConversionStatus.REMOVED:
+            logger.warning(f"Argument {key} has been removed in FSDP2, skipping this key...")
+            continue
+        if conversion_status == ConversionStatus.NOT_YET_IMPLEMENTED:
+            logger.warning(f"Argument {key} is not yet implemented in FSDP2, skipping this key...")
+            continue
+        if conversion_status is None:
+            logger.warning(f"Argument {key} is not being converted, skipping this key...")
+            new_fsdp_config[key] = value
+        else:
+            if key in ARGUMENT_VALUE_MAPPING:
+                value = ARGUMENT_VALUE_MAPPING[key].get(value, value)
+            new_fsdp_config[ARGUMENT_KEY_MAPPING[key]] = value
+    new_fsdp_config["fsdp_version"] = 2
+    config["fsdp_config"] = new_fsdp_config
+    return config
+def to_fsdp2_command_parser(subparsers=None):
+    description = "Convert an Accelerate config from FSDP1 to FSDP2"
+    if subparsers is not None:
+        parser = subparsers.add_parser("to-fsdp2", description=description)
+    else:
+        parser = CustomArgumentParser(description=description)
+    parser.add_argument("--config_file", type=str, help="The config file to convert to FSDP2", required=True)
+    parser.add_argument(
+        "--overwrite",
+        action="store_true",
+        help="Overwrite the config file if it exists",
+        default=False,
+    )
+    parser.add_argument(
+        "--output_file",
+        type=str,
+        help="The path to the output file to write the converted config to. If not provided, the input file will be overwritten (if --overwrite is set)",
+        default=None,
+    )
+    if subparsers is not None:
+        parser.set_defaults(func=to_fsdp2_command)
+    return parser
+def load_config(config_file: str) -> dict:
+    with open(config_file) as f:
+        config = yaml.safe_load(f)
+    if not config:
+        raise ValueError("Config file is empty")
+    return config
+def to_fsdp2_command(args):
+    _validate_to_fsdp2_args(args)
+    config = load_config(args.config_file)
+    if args.overwrite and args.output_file is None:
+        args.output_file = args.config_file
+    new_config = convert_config_to_fsdp2(config)
+    with open(args.output_file, "w") as f:
+        yaml.dump(new_config, f)
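
The key and value mapping tables can be exercised with a small standalone sketch. It assumes that keys flagged as removed or not yet implemented should be dropped, as the warning messages suggest; the listed `convert_config_to_fsdp2` copies them through instead. `Status`, `KEY_MAP`, `VALUE_MAP`, `convert` and `custom_key` are hypothetical, trimmed-down stand-ins, not the library's names.

```python
import enum


class Status(enum.Enum):  # hypothetical stand-in for ConversionStatus
    NOT_YET_IMPLEMENTED = 0
    REMOVED = -1


# Trimmed-down copies of the mapping tables, for illustration only.
KEY_MAP = {
    "fsdp_sharding_strategy": "fsdp_reshard_after_forward",
    "fsdp_backward_prefetch": Status.REMOVED,
    "fsdp_forward_prefetch": Status.NOT_YET_IMPLEMENTED,
    "fsdp_offload_params": "fsdp_offload_params",
}
VALUE_MAP = {"fsdp_sharding_strategy": {"FULL_SHARD": True, "SHARD_GRAD_OP": False}}


def convert(fsdp_config: dict) -> dict:
    new_config = {}
    for key, value in fsdp_config.items():
        target = KEY_MAP.get(key)
        if isinstance(target, Status):
            continue  # assumed behaviour: flagged keys are dropped
        if target is None:
            new_config[key] = value  # unknown keys pass through unchanged
            continue
        # Translate the value when a table exists for this key, then rename the key.
        new_config[target] = VALUE_MAP.get(key, {}).get(value, value)
    new_config["fsdp_version"] = 2
    return new_config


old = {
    "fsdp_sharding_strategy": "FULL_SHARD",
    "fsdp_backward_prefetch": "BACKWARD_PRE",
    "fsdp_offload_params": False,
    "custom_key": 1,
}
new = convert(old)
```

In this sketch `FULL_SHARD` becomes `fsdp_reshard_after_forward: True`, the prefetch key disappears, and the unknown `custom_key` is preserved.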

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/commands/utils.py ADDED

	@@ -0,0 +1,123 @@

+# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+class _StoreAction(argparse.Action):
+    """
+    Custom action that allows for `-` or `_` to be passed in for an argument.
+    """
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        new_option_strings = []
+        for option_string in self.option_strings:
+            new_option_strings.append(option_string)
+            if "_" in option_string[2:]:
+                # Add `-` version to the option string
+                new_option_strings.append(option_string.replace("_", "-"))
+        self.option_strings = new_option_strings
+    def __call__(self, parser, namespace, values, option_string=None):
+        setattr(namespace, self.dest, values)
+        if not hasattr(namespace, "nondefault"):
+            namespace.nondefault = set()
+        namespace.nondefault.add(self.dest)
+class _StoreConstAction(_StoreAction):
+    """
+    Same as `argparse._StoreConstAction` but uses the custom `_StoreAction`.
+    """
+    def __init__(self, option_strings, dest, const, default=None, required=False, help=None):
+        super().__init__(
+            option_strings=option_strings,
+            dest=dest,
+            nargs=0,
+            const=const,
+            default=default,
+            required=required,
+            help=help,
+        )
+    def __call__(self, parser, namespace, values, option_string=None):
+        super().__call__(parser, namespace, self.const, option_string)
+class _StoreTrueAction(_StoreConstAction):
+    """
+    Same as `argparse._StoreTrueAction` but uses the custom `_StoreConstAction`.
+    """
+    def __init__(
+        self,
+        option_strings,
+        dest,
+        default=None,
+        required=False,
+        help=None,
+    ):
+        super().__init__(
+            option_strings=option_strings, dest=dest, const=True, default=default, required=required, help=help
+        )
+class CustomArgumentGroup(argparse._ArgumentGroup):
+    """
+    Custom argument group that allows for the use of `-` or `_` in arguments passed and overrides the help for each
+    when applicable.
+    """
+    def _add_action(self, action):
+        args = vars(action)
+        if isinstance(action, argparse._StoreTrueAction):
+            action = _StoreTrueAction(
+                args["option_strings"], args["dest"], args["default"], args["required"], args["help"]
+            )
+        elif isinstance(action, argparse._StoreConstAction):
+            action = _StoreConstAction(
+                args["option_strings"],
+                args["dest"],
+                args["const"],
+                args["default"],
+                args["required"],
+                args["help"],
+            )
+        elif isinstance(action, argparse._StoreAction):
+            action = _StoreAction(**args)
+        action = super()._add_action(action)
+        return action
+class CustomArgumentParser(argparse.ArgumentParser):
+    """
+    Custom argument parser that allows for the use of `-` or `_` in arguments passed and overrides the help for each
+    when applicable.
+    """
+    def add_argument(self, *args, **kwargs):
+        if "action" in kwargs:
+            # Translate action -> class
+            if kwargs["action"] == "store_true":
+                kwargs["action"] = _StoreTrueAction
+        else:
+            kwargs["action"] = _StoreAction
+        super().add_argument(*args, **kwargs)
+    def add_argument_group(self, *args, **kwargs):
+        group = CustomArgumentGroup(self, *args, **kwargs)
+        self._action_groups.append(group)
+        return group
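
The dash/underscore aliasing and the `nondefault` bookkeeping can be tried with the standard library alone. The sketch below is a simplified illustration: `DashOrUnderscoreStore` is a hypothetical name that repeats the logic of `_StoreAction`, and it is passed explicitly to a stock `argparse.ArgumentParser` rather than through `CustomArgumentParser`.

```python
import argparse


class DashOrUnderscoreStore(argparse.Action):
    """Hypothetical stand-in for `_StoreAction`: accepts `--a_b` and `--a-b`."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        expanded = []
        for option_string in self.option_strings:
            expanded.append(option_string)
            if "_" in option_string[2:]:
                # Register a dashed alias next to the underscore spelling.
                expanded.append(option_string.replace("_", "-"))
        self.option_strings = expanded

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, values)
        # Record which destinations were set explicitly on the command line.
        if not hasattr(namespace, "nondefault"):
            namespace.nondefault = set()
        namespace.nondefault.add(self.dest)


parser = argparse.ArgumentParser()
parser.add_argument("--num_processes", type=int, default=None, action=DashOrUnderscoreStore)
parser.add_argument("--mixed_precision", default=None, action=DashOrUnderscoreStore)

args = parser.parse_args(["--num-processes", "4"])
```

The `nondefault` set is what lets `_validate_launch_command` tell a value typed by the user apart from one that should be filled in from the config file.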

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/__init__.py ADDED

	@@ -0,0 +1,66 @@

+# Copyright 2020 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .testing import (
+    DEFAULT_LAUNCH_COMMAND,
+    are_the_same_tensors,
+    assert_exception,
+    capture_call_output,
+    device_count,
+    execute_subprocess_async,
+    get_launch_command,
+    get_torch_dist_unique_port,
+    memory_allocated_func,
+    path_in_accelerate_package,
+    pytest_xdist_worker_id,
+    require_bnb,
+    require_cpu,
+    require_cuda,
+    require_cuda_or_hpu,
+    require_cuda_or_xpu,
+    require_fp8,
+    require_fp16,
+    require_huggingface_suite,
+    require_mlu,
+    require_mps,
+    require_multi_device,
+    require_multi_gpu,
+    require_multi_gpu_or_xpu,
+    require_multi_xpu,
+    require_musa,
+    require_non_cpu,
+    require_non_hpu,
+    require_non_torch_xla,
+    require_non_xpu,
+    require_npu,
+    require_pippy,
+    require_sdaa,
+    require_single_device,
+    require_single_gpu,
+    require_single_xpu,
+    require_torch_min_version,
+    require_torchao,
+    require_torchvision,
+    require_tpu,
+    require_transformer_engine,
+    require_transformer_engine_mxfp8,
+    require_xpu,
+    run_first,
+    skip,
+    slow,
+    torch_device,
+)
+from .training import RegressionDataset, RegressionModel
+from .scripts import test_script, test_sync, test_ops  # isort: skip

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/examples.py ADDED

	@@ -0,0 +1,148 @@

+#!/usr/bin/env python
+# Copyright 2022 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+A collection of utilities for comparing `examples/complete_*_example.py` scripts with the capabilities inside of each
+`examples/by_feature` example. `compare_against_test` is the main function that should be used when testing, while the
+others are used either to get the code that matters or to preprocess it (such as stripping comments).
+"""
+import os
+from typing import Optional
+def get_function_contents_by_name(lines: list[str], name: str):
+    """
+    Extracts a function from `lines` of segmented source code with the name `name`.
+    Args:
+        lines (`List[str]`):
+            Source code of a script separated by line.
+        name (`str`):
+            The name of the function to extract. Should be either `training_function` or `main`
+    """
+    if name != "training_function" and name != "main":
+        raise ValueError(f"Incorrect function name passed: {name}, choose either 'main' or 'training_function'")
+    good_lines, found_start = [], False
+    for line in lines:
+        if not found_start and f"def {name}" in line:
+            found_start = True
+            good_lines.append(line)
+            continue
+        if found_start:
+            if name == "training_function" and "def main" in line:
+                return good_lines
+            if name == "main" and "if __name__" in line:
+                return good_lines
+            good_lines.append(line)
+def clean_lines(lines: list[str]):
+    """
+    Filters `lines` and removes any entries that start with a comment ('#') or are just a newline ('\n').
+    Args:
+        lines (`List[str]`):
+            Source code of a script separated by line.
+    """
+    return [line for line in lines if not line.lstrip().startswith("#") and line != "\n"]
+def compare_against_test(
+    base_filename: str, feature_filename: str, parser_only: bool, secondary_filename: Optional[str] = None
+):
+    """
+    Tests whether the additional code inside of `feature_filename` was implemented in `base_filename`. This should be
+    used when testing to see if `complete_*_example.py` examples have all of the implementations from each of the
+    `examples/by_feature/*` scripts.
+    It utilizes `nlp_example.py` to extract out all of the repeated training code, so that only the new additional code
+    is examined and checked. If something *other* than `nlp_example.py` should be used, such as `cv_example.py` for the
+    `complete_cv_example.py` script, it should be passed in for the `secondary_filename` parameter.
+    Args:
+        base_filename (`str` or `os.PathLike`):
+            The filepath of a single "complete" example script to test, such as `examples/complete_cv_example.py`
+        feature_filename (`str` or `os.PathLike`):
+            The filepath of a single feature example script. The contents of this script are checked to see if they
+            exist in `base_filename`
+        parser_only (`bool`):
+            Whether to compare only the `main()` sections in both files, or to compare the contents of
+            `training_function()`
+        secondary_filename (`str`, *optional*):
+            A potential secondary filepath that should be included in the check. This function extracts the base
+            functionalities off of "examples/nlp_example.py", so if `base_filename` is a script other than
+            `complete_nlp_example.py`, the template script should be included here, such as `examples/cv_example.py`.
+    """
+    with open(base_filename) as f:
+        base_file_contents = f.readlines()
+    with open(os.path.abspath(os.path.join("examples", "nlp_example.py"))) as f:
+        full_file_contents = f.readlines()
+    with open(feature_filename) as f:
+        feature_file_contents = f.readlines()
+    if secondary_filename is not None:
+        with open(secondary_filename) as f:
+            secondary_file_contents = f.readlines()
+    # This is our base, we remove all the code from here in our `full_filename` and `feature_filename` to find the new content
+    if parser_only:
+        base_file_func = clean_lines(get_function_contents_by_name(base_file_contents, "main"))
+        full_file_func = clean_lines(get_function_contents_by_name(full_file_contents, "main"))
+        feature_file_func = clean_lines(get_function_contents_by_name(feature_file_contents, "main"))
+        if secondary_filename is not None:
+            secondary_file_func = clean_lines(get_function_contents_by_name(secondary_file_contents, "main"))
+    else:
+        base_file_func = clean_lines(get_function_contents_by_name(base_file_contents, "training_function"))
+        full_file_func = clean_lines(get_function_contents_by_name(full_file_contents, "training_function"))
+        feature_file_func = clean_lines(get_function_contents_by_name(feature_file_contents, "training_function"))
+        if secondary_filename is not None:
+            secondary_file_func = clean_lines(
+                get_function_contents_by_name(secondary_file_contents, "training_function")
+            )
+    _dl_line = "train_dataloader, eval_dataloader = get_dataloaders(accelerator, batch_size)\n"
+    # Specific code in our script that differs from the full version, aka what is new
+    new_feature_code = []
+    passed_idxs = []  # We keep track of the idxs just in case it's a repeated statement
+    it = iter(feature_file_func)
+    for i in range(len(feature_file_func) - 1):
+        if i not in passed_idxs:
+            line = next(it)
+            if (line not in full_file_func) and (line.lstrip() != _dl_line):
+                if "TESTING_MOCKED_DATALOADERS" not in line:
+                    new_feature_code.append(line)
+                    passed_idxs.append(i)
+                else:
+                    # Skip over the `config['num_epochs'] = 2` statement
+                    _ = next(it)
+    # Extract out just the new parts from `base_file_func` (the "complete" example)
+    new_full_example_parts = []
+    passed_idxs = []  # We keep track of the idxs just in case it's a repeated statement
+    for i, line in enumerate(base_file_func):
+        if i not in passed_idxs:
+            if (line not in full_file_func) and (line.lstrip() != _dl_line):
+                if "TESTING_MOCKED_DATALOADERS" not in line:
+                    new_full_example_parts.append(line)
+                    passed_idxs.append(i)
+    # Finally, get the overall diff
+    diff_from_example = [line for line in new_feature_code if line not in new_full_example_parts]
+    if secondary_filename is not None:
+        diff_from_two = [line for line in full_file_contents if line not in secondary_file_func]
+        diff_from_example = [line for line in diff_from_example if line not in diff_from_two]
+    return diff_from_example
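
To make the extraction step concrete, here is a minimal sketch that copies `clean_lines` and `get_function_contents_by_name` from the file above and runs them on a small in-memory script. The `SAMPLE` script and its contents are hypothetical, used only for illustration.

```python
def clean_lines(lines):
    # Same filter as above: drop comment-only lines and bare newlines.
    return [line for line in lines if not line.lstrip().startswith("#") and line != "\n"]


def get_function_contents_by_name(lines, name):
    # Same scan as above: collect from `def <name>` up to the next marker line.
    if name != "training_function" and name != "main":
        raise ValueError(f"Incorrect function name passed: {name}, choose either 'main' or 'training_function'")
    good_lines, found_start = [], False
    for line in lines:
        if not found_start and f"def {name}" in line:
            found_start = True
            good_lines.append(line)
            continue
        if found_start:
            if name == "training_function" and "def main" in line:
                return good_lines
            if name == "main" and "if __name__" in line:
                return good_lines
            good_lines.append(line)


# Hypothetical example script, split into lines the way `f.readlines()` would.
SAMPLE = '''def training_function(config, args):
    # set up the accelerator
    accelerator = Accelerator()

    return accelerator


def main():
    training_function({}, None)


if __name__ == "__main__":
    main()
'''
lines = SAMPLE.splitlines(keepends=True)

training = clean_lines(get_function_contents_by_name(lines, "training_function"))
main_fn = clean_lines(get_function_contents_by_name(lines, "main"))
print(training)
print(main_fn)

# Without a terminating `if __name__` line the loop falls through and returns None.
unterminated = get_function_contents_by_name(["def main():\n", "    pass\n"], "main")
print(unterminated)
```

The last call shows a caveat of the scan: a script that lacks the expected end marker yields `None`, which `clean_lines` cannot iterate over, so the compared scripts need to follow the `training_function` / `main` / `if __name__` layout.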

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/testing.py ADDED

	@@ -0,0 +1,881 @@

+# Copyright 2021 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import asyncio
+import inspect
+import io
+import os
+import re
+import shutil
+import subprocess
+import sys
+import tempfile
+import unittest
+from contextlib import contextmanager
+from functools import partial
+from pathlib import Path
+from typing import Optional, Union
+from unittest import mock
+import torch
+import accelerate
+from ..state import AcceleratorState
+from ..utils import (
+    check_cuda_fp8_capability,
+    compare_versions,
+    gather,
+    is_aim_available,
+    is_bnb_available,
+    is_clearml_available,
+    is_comet_ml_available,
+    is_cuda_available,
+    is_datasets_available,
+    is_deepspeed_available,
+    is_dvclive_available,
+    is_fp8_available,
+    is_fp16_available,
+    is_habana_gaudi1,
+    is_hpu_available,
+    is_import_timer_available,
+    is_matplotlib_available,
+    is_mlflow_available,
+    is_mlu_available,
+    is_mps_available,
+    is_musa_available,
+    is_npu_available,
+    is_pandas_available,
+    is_pippy_available,
+    is_pytest_available,
+    is_schedulefree_available,
+    is_sdaa_available,
+    is_swanlab_available,
+    is_tensorboard_available,
+    is_timm_available,
+    is_torch_version,
+    is_torch_xla_available,
+    is_torchao_available,
+    is_torchdata_stateful_dataloader_available,
+    is_torchvision_available,
+    is_trackio_available,
+    is_transformer_engine_available,
+    is_transformer_engine_mxfp8_available,
+    is_transformers_available,
+    is_triton_available,
+    is_wandb_available,
+    is_xpu_available,
+    str_to_bool,
+)
+def get_backend():
+    if is_torch_xla_available():
+        return "xla", torch.cuda.device_count(), torch.cuda.memory_allocated
+    elif is_cuda_available():
+        return "cuda", torch.cuda.device_count(), torch.cuda.memory_allocated
+    elif is_mps_available(min_version="2.0"):
+        return "mps", 1, torch.mps.current_allocated_memory
+    elif is_mps_available():
+        return "mps", 1, lambda: 0
+    elif is_mlu_available():
+        return "mlu", torch.mlu.device_count(), torch.mlu.memory_allocated
+    elif is_sdaa_available():
+        return "sdaa", torch.sdaa.device_count(), torch.sdaa.memory_allocated
+    elif is_musa_available():
+        return "musa", torch.musa.device_count(), torch.musa.memory_allocated
+    elif is_npu_available():
+        return "npu", torch.npu.device_count(), torch.npu.memory_allocated
+    elif is_xpu_available():
+        return "xpu", torch.xpu.device_count(), torch.xpu.memory_allocated
+    elif is_hpu_available():
+        return "hpu", torch.hpu.device_count(), torch.hpu.memory_allocated
+    else:
+        return "cpu", 1, lambda: 0
+torch_device, device_count, memory_allocated_func = get_backend()
+def get_launch_command(**kwargs) -> list:
+    """
+    Wraps around `kwargs` to help simplify launching from `subprocess`.
+    Example:
+    ```python
+    # returns ['accelerate', 'launch', '--num_processes=2', '--device_count=2']
+    get_launch_command(num_processes=2, device_count=2)
+    ```
+    """
+    command = ["accelerate", "launch"]
+    for k, v in kwargs.items():
+        if isinstance(v, bool) and v:
+            command.append(f"--{k}")
+        elif v is not None:
+            command.append(f"--{k}={v}")
+    return command
+DEFAULT_LAUNCH_COMMAND = get_launch_command(num_processes=device_count, monitor_interval=0.1)
+def parse_flag_from_env(key, default=False):
+    try:
+        value = os.environ[key]
+    except KeyError:
+        # KEY isn't set, default to `default`.
+        _value = default
+    else:
+        # KEY is set, convert it to True or False.
+        try:
+            _value = str_to_bool(value)
+        except ValueError:
+            # More values are supported, but let's keep the message simple.
+            raise ValueError(f"If set, {key} must be yes or no.")
+    return _value
+_run_slow_tests = parse_flag_from_env("RUN_SLOW", default=False)
+def skip(test_case):
+    "Decorator that skips a test unconditionally"
+    return unittest.skip("Test was skipped")(test_case)
+def slow(test_case):
+    """
+    Decorator marking a test as slow. Slow tests are skipped by default. Set the RUN_SLOW environment variable to a
+    truthy value to run them.
+    """
+    return unittest.skipUnless(_run_slow_tests, "test is slow")(test_case)
+def require_cpu(test_case):
+    """
+    Decorator marking a test that must only be run on the CPU. These tests are skipped when a hardware accelerator
+    is available.
+    """
+    return unittest.skipUnless(torch_device == "cpu", "test requires only a CPU")(test_case)
+def require_non_cpu(test_case):
+    """
+    Decorator marking a test that requires a hardware accelerator backend. These tests are skipped when no hardware
+    accelerator is available.
+    """
+    return unittest.skipUnless(torch_device != "cpu", "test requires a GPU")(test_case)
+def require_cuda(test_case):
+    """
+    Decorator marking a test that requires CUDA. These tests are skipped when no GPU is available or when
+    TorchXLA is available.
+    """
+    return unittest.skipUnless(is_cuda_available() and not is_torch_xla_available(), "test requires a GPU")(test_case)
+def require_cuda_or_hpu(test_case):
+    """
+    Decorator marking a test that requires CUDA or HPU. These tests are skipped when neither a GPU (without
+    TorchXLA) nor an HPU is available.
+    """
+    return unittest.skipUnless(
+        (is_cuda_available() and not is_torch_xla_available()) or is_hpu_available(), "test requires a GPU or HPU"
+    )(test_case)
+def require_xpu(test_case):
+    """
+    Decorator marking a test that requires XPU. These tests are skipped when no XPU is available.
+    """
+    return unittest.skipUnless(is_xpu_available(), "test requires a XPU")(test_case)
+def require_cuda_or_xpu(test_case):
+    """
+    Decorator marking a test that requires CUDA or XPU. These tests are skipped when neither a GPU (without
+    TorchXLA) nor an XPU is available.
+    """
+    cuda_condition = is_cuda_available() and not is_torch_xla_available()
+    xpu_condition = is_xpu_available()
+    return unittest.skipUnless(cuda_condition or xpu_condition, "test requires a CUDA GPU or XPU")(test_case)
+def require_non_xpu(test_case):
+    """
+    Decorator marking a test that should be skipped for XPU.
+    """
+    return unittest.skipUnless(torch_device != "xpu", "test requires a non-XPU")(test_case)
+def require_non_hpu(test_case):
+    """
+    Decorator marking a test that should be skipped for HPU.
+    """
+    return unittest.skipUnless(torch_device != "hpu", "test requires a non-HPU")(test_case)
+def require_fp16(test_case):
+    """
+    Decorator marking a test that requires FP16. These tests are skipped when FP16 is not supported.
+    """
+    return unittest.skipUnless(is_fp16_available(), "test requires FP16 support")(test_case)
+def require_fp8(test_case):
+    """
+    Decorator marking a test that requires FP8. These tests are skipped when FP8 is not supported.
+    """
+    # is_fp8_available only checks for libraries
+    # ideally it should check for device capability as well
+    fp8_is_available = is_fp8_available()
+    if torch.cuda.is_available() and not check_cuda_fp8_capability():
+        fp8_is_available = False
+    if is_hpu_available() and is_habana_gaudi1():
+        fp8_is_available = False
+    return unittest.skipUnless(fp8_is_available, "test requires FP8 support")(test_case)
+def require_fsdp2(test_case):
+    return unittest.skipUnless(is_torch_version(">=", "2.5.0"), "test requires FSDP2 (torch >= 2.5.0)")(test_case)
+def require_mlu(test_case):
+    """
+    Decorator marking a test that requires MLU. These tests are skipped when no MLU is available.
+    """
+    return unittest.skipUnless(is_mlu_available(), "test require a MLU")(test_case)
+def require_sdaa(test_case):
+    """
+    Decorator marking a test that requires SDAA. These tests are skipped when no SDAA device is available.
+    """
+    return unittest.skipUnless(is_sdaa_available(), "test require a SDAA")(test_case)
+def require_musa(test_case):
+    """
+    Decorator marking a test that requires MUSA. These tests are skipped when no MUSA device is available.
+    """
+    return unittest.skipUnless(is_musa_available(), "test require a MUSA")(test_case)
+def require_npu(test_case):
+    """
+    Decorator marking a test that requires NPU. These tests are skipped when no NPU is available.
+    """
+    return unittest.skipUnless(is_npu_available(), "test require a NPU")(test_case)
+def require_mps(test_case):
+    """
+    Decorator marking a test that requires MPS backend. These tests are skipped when torch doesn't support `mps`
+    backend.
+    """
+    return unittest.skipUnless(is_mps_available(), "test requires a `mps` backend support in `torch`")(test_case)
+def require_huggingface_suite(test_case):
+    """
+    Decorator marking a test that requires transformers and datasets. These tests are skipped when either library is
+    not installed.
+    """
+    return unittest.skipUnless(
+        is_transformers_available() and is_datasets_available(),
+        "test requires the Hugging Face suite",
+    )(test_case)
+def require_transformers(test_case):
+    """
+    Decorator marking a test that requires transformers. These tests are skipped when it is not installed.
+    """
+    return unittest.skipUnless(is_transformers_available(), "test requires the transformers library")(test_case)
+def require_timm(test_case):
+    """
+    Decorator marking a test that requires timm. These tests are skipped when it is not installed.
+    """
+    return unittest.skipUnless(is_timm_available(), "test requires the timm library")(test_case)
+def require_torchvision(test_case):
+    """
+    Decorator marking a test that requires torchvision. These tests are skipped when it is not installed.
+    """
+    return unittest.skipUnless(is_torchvision_available(), "test requires the torchvision library")(test_case)
+def require_triton(test_case):
+    """
+    Decorator marking a test that requires triton. These tests are skipped when it is not installed.
+    """
+    return unittest.skipUnless(is_triton_available(), "test requires the triton library")(test_case)
+def require_schedulefree(test_case):
+    """
+    Decorator marking a test that requires schedulefree. These tests are skipped when it is not installed.
+    """
+    return unittest.skipUnless(is_schedulefree_available(), "test requires the schedulefree library")(test_case)
+def require_bnb(test_case):
+    """
+    Decorator marking a test that requires bitsandbytes. These tests are skipped when it is not installed.
+    """
+    return unittest.skipUnless(is_bnb_available(), "test requires the bitsandbytes library")(test_case)
+def require_tpu(test_case):
+    """
+    Decorator marking a test that requires TPUs. These tests are skipped when no TPU is available.
+    """
+    return unittest.skipUnless(is_torch_xla_available(check_is_tpu=True), "test requires TPU")(test_case)
+def require_non_torch_xla(test_case):
+    """
+    Decorator marking a test as requiring an environment without TorchXLA. These tests are skipped when TorchXLA is
+    available.
+    """
+    return unittest.skipUnless(not is_torch_xla_available(), "test requires an env without TorchXLA")(test_case)
+def require_single_device(test_case):
+    """
+    Decorator marking a test that requires a single device. These tests are skipped when there is no hardware
+    accelerator available or number of devices is more than one.
+    """
+    return unittest.skipUnless(
+        torch_device != "cpu" and device_count == 1, "test requires a single device accelerator"
+    )(test_case)
+def require_single_gpu(test_case):
+    """
+    Decorator marking a test that requires CUDA on a single GPU. These tests are skipped when no GPU is
+    available or when more than one GPU is present.
+    """
+    return unittest.skipUnless(torch.cuda.device_count() == 1, "test requires a GPU")(test_case)
+def require_single_xpu(test_case):
+    """
+    Decorator marking a test that requires a single XPU. These tests are skipped when no XPU is
+    available or when more than one XPU is present.
+    """
+    return unittest.skipUnless(torch.xpu.device_count() == 1, "test requires a XPU")(test_case)
+def require_multi_device(test_case):
+    """
+    Decorator marking a test that requires a multi-device setup. These tests are skipped on a machine without multiple
+    devices.
+    """
+    return unittest.skipUnless(device_count > 1, "test requires multiple hardware accelerators")(test_case)
+def require_multi_gpu(test_case):
+    """
+    Decorator marking a test that requires a multi-GPU setup. These tests are skipped on a machine without multiple
+    GPUs.
+    """
+    return unittest.skipUnless(torch.cuda.device_count() > 1, "test requires multiple GPUs")(test_case)
+def require_multi_xpu(test_case):
+    """
+    Decorator marking a test that requires a multi-XPU setup. These tests are skipped on a machine without multiple
+    XPUs.
+    """
+    return unittest.skipUnless(torch.xpu.device_count() > 1, "test requires multiple XPUs")(test_case)
+def require_multi_gpu_or_xpu(test_case):
+    """
+    Decorator marking a test that requires a multi-GPU or multi-XPU setup. These tests are skipped on a machine
+    without multiple GPUs or XPUs.
+    """
+    return unittest.skipUnless(
+        (is_cuda_available() or is_xpu_available()) and device_count > 1, "test requires multiple GPUs or XPUs"
+    )(test_case)
+def require_deepspeed(test_case):
+    """
+    Decorator marking a test that requires DeepSpeed installed. These tests are skipped when DeepSpeed isn't installed
+    """
+    return unittest.skipUnless(is_deepspeed_available(), "test requires DeepSpeed")(test_case)
+def require_tp(test_case):
+    """
+    Decorator marking a test that requires tensor parallelism (TP) support. These tests are skipped unless
+    torch >= 2.3.0 and transformers >= 4.52.0 are installed.
+    """
+    return unittest.skipUnless(
+        is_torch_version(">=", "2.3.0") and compare_versions("transformers", ">=", "4.52.0"),
+        "test requires torch version >= 2.3.0 and transformers version >= 4.52.0",
+    )(test_case)
+def require_torch_min_version(test_case=None, version=None):
+    """
+    Decorator marking that a test requires a particular torch version to be tested. These tests are skipped when an
+    installed torch version is less than the required one.
+    """
+    if test_case is None:
+        return partial(require_torch_min_version, version=version)
+    return unittest.skipUnless(is_torch_version(">=", version), f"test requires torch version >= {version}")(test_case)
+def require_tensorboard(test_case):
+    """
+    Decorator marking a test that requires tensorboard installed. These tests are skipped when tensorboard isn't
+    installed
+    """
+    return unittest.skipUnless(is_tensorboard_available(), "test requires Tensorboard")(test_case)
+def require_wandb(test_case):
+    """
+    Decorator marking a test that requires wandb installed. These tests are skipped when wandb isn't installed
+    """
+    return unittest.skipUnless(is_wandb_available(), "test requires wandb")(test_case)
+def require_trackio(test_case):
+    """
+    Decorator marking a test that requires trackio installed. These tests are skipped when trackio isn't installed
+    """
+    return unittest.skipUnless(is_trackio_available(), "test requires trackio")(test_case)
+def require_comet_ml(test_case):
+    """
+    Decorator marking a test that requires comet_ml installed. These tests are skipped when comet_ml isn't installed
+    """
+    return unittest.skipUnless(is_comet_ml_available(), "test requires comet_ml")(test_case)
+def require_aim(test_case):
+    """
+    Decorator marking a test that requires aim installed. These tests are skipped when aim isn't installed
+    """
+    return unittest.skipUnless(is_aim_available(), "test requires aim")(test_case)
+def require_clearml(test_case):
+    """
+    Decorator marking a test that requires clearml installed. These tests are skipped when clearml isn't installed
+    """
+    return unittest.skipUnless(is_clearml_available(), "test requires clearml")(test_case)
+def require_dvclive(test_case):
+    """
+    Decorator marking a test that requires dvclive installed. These tests are skipped when dvclive isn't installed
+    """
+    return unittest.skipUnless(is_dvclive_available(), "test requires dvclive")(test_case)
+def require_swanlab(test_case):
+    """
+    Decorator marking a test that requires swanlab installed. These tests are skipped when swanlab isn't installed
+    """
+    return unittest.skipUnless(is_swanlab_available(), "test requires swanlab")(test_case)
+def require_pandas(test_case):
+    """
+    Decorator marking a test that requires pandas installed. These tests are skipped when pandas isn't installed
+    """
+    return unittest.skipUnless(is_pandas_available(), "test requires pandas")(test_case)
+def require_mlflow(test_case):
+    """
+    Decorator marking a test that requires mlflow installed. These tests are skipped when mlflow isn't installed
+    """
+    return unittest.skipUnless(is_mlflow_available(), "test requires mlflow")(test_case)
+def require_pippy(test_case):
+    """
+    Decorator marking a test that requires pippy installed. These tests are skipped when pippy isn't installed. They
+    are also skipped when running on a Gaudi1 device, which doesn't support pippy.
+    """
+    return unittest.skipUnless(is_pippy_available() and not is_habana_gaudi1(), "test requires pippy")(test_case)
+def require_import_timer(test_case):
+    """
+    Decorator marking a test that requires tuna interpreter installed. These tests are skipped when tuna isn't
+    installed
+    """
+    return unittest.skipUnless(is_import_timer_available(), "test requires tuna interpreter")(test_case)
+def require_transformer_engine(test_case):
+    """
+    Decorator marking a test that requires Transformer Engine installed. These tests are skipped when Transformer
+    Engine isn't installed.
+    """
+    return unittest.skipUnless(is_transformer_engine_available(), "test requires transformers engine")(test_case)
+def require_transformer_engine_mxfp8(test_case):
+    """
+    Decorator marking a test that requires Transformer Engine MXFP8 block scaling to be available. These tests are
+    skipped when Transformer Engine MXFP8 block scaling isn't available.
+    """
+    return unittest.skipUnless(
+        is_transformer_engine_mxfp8_available(), "test requires transformers engine MXFP8 block scaling"
+    )(test_case)
+def require_torchao(test_case):
+    """
+    Decorator marking a test that requires torchao installed. These tests are skipped when torchao isn't installed
+    """
+    return unittest.skipUnless(is_torchao_available(), "test requires torchao")(test_case)
+def require_matplotlib(test_case):
+    """
+    Decorator marking a test that requires matplotlib installed. These tests are skipped when matplotlib isn't
+    installed
+    """
+    return unittest.skipUnless(is_matplotlib_available(), "test requires matplotlib")(test_case)
+_atleast_one_tracker_available = (
+    any([is_wandb_available(), is_tensorboard_available(), is_trackio_available(), is_swanlab_available()])
+    and not is_comet_ml_available()
+)
+def require_trackers(test_case):
+    """
+    Decorator marking that a test requires at least one tracking library installed. These tests are skipped when none
+    of wandb, tensorboard, trackio, or swanlab is installed, or when `comet_ml` is installed.
+    """
+    return unittest.skipUnless(
+        _atleast_one_tracker_available,
+        "test requires at least one tracker to be available and for `comet_ml` to not be installed",
+    )(test_case)
+def require_torchdata_stateful_dataloader(test_case):
+    """
+    Decorator marking a test that requires torchdata.stateful_dataloader.
+    These tests are skipped when torchdata with stateful_dataloader module isn't installed.
+    """
+    return unittest.skipUnless(
+        is_torchdata_stateful_dataloader_available(), "test requires torchdata.stateful_dataloader"
+    )(test_case)
+def run_first(test_case):
+    """
+    Decorator marking a test with order(1). When pytest-order plugin is installed, tests marked with this decorator are
+    guaranteed to run first.
+    This is especially useful in some test settings like on a Gaudi instance where a Gaudi device can only be used by a
+    single process at a time. So we make sure all tests that run in a subprocess are launched first, to avoid device
+    allocation conflicts.
+    If pytest is not installed, test will be returned as is.
+    """
+    if is_pytest_available():
+        import pytest
+        return pytest.mark.order(1)(test_case)
+    return test_case
+class TempDirTestCase(unittest.TestCase):
+    """
+    A TestCase class that keeps a single `tempfile.TemporaryDirectory` open for the duration of the class, wipes its
+    data at the start of a test, and then destroys it at the end of the TestCase.
+    Useful for when a class or API requires a single constant folder throughout its use, such as Weights and Biases.
+    The temporary directory location will be stored in `self.tmpdir`.
+    """
+    clear_on_setup = True
+    @classmethod
+    def setUpClass(cls):
+        "Creates a `tempfile.TemporaryDirectory` and stores it in `cls.tmpdir`"
+        cls.tmpdir = Path(tempfile.mkdtemp())
+    @classmethod
+    def tearDownClass(cls):
+        "Remove `cls.tmpdir` after test suite has finished"
+        if os.path.exists(cls.tmpdir):
+            shutil.rmtree(cls.tmpdir)
+    def setUp(self):
+        "Destroy all contents in `self.tmpdir`, but not `self.tmpdir`"
+        if self.clear_on_setup:
+            for path in self.tmpdir.glob("**/*"):
+                if path.is_file():
+                    path.unlink()
+                elif path.is_dir():
+                    shutil.rmtree(path)
+class AccelerateTestCase(unittest.TestCase):
+    """
+    A TestCase class that will reset the accelerator state at the end of every test. Every test that checks or utilizes
+    the `AcceleratorState` class should inherit from this to avoid silent failures due to state being shared between
+    tests.
+    """
+    def tearDown(self):
+        super().tearDown()
+        # Reset the state of the AcceleratorState singleton.
+        AcceleratorState._reset_state(True)
+class MockingTestCase(unittest.TestCase):
+    """
+    A TestCase class designed to dynamically add various mockers that should be used in every test, mimicking the
+    behavior of a class-wide mock when defining one normally will not do.
+    Useful when a mock requires specific information available only initialized after `TestCase.setUpClass`, such as
+    setting an environment variable with that information.
+    The `add_mocks` function should be run at the end of a `TestCase`'s `setUp` function, after a call to
+    `super().setUp()`, such as:
+    ```python
+    def setUp(self):
+        super().setUp()
+        mocks = mock.patch.dict(os.environ, {"SOME_ENV_VAR": "SOME_VALUE"})
+        self.add_mocks(mocks)
+    ```
+    """
+    def add_mocks(self, mocks: Union[mock.Mock, list[mock.Mock]]):
+        """
+        Add custom mocks for tests that should be repeated on each test. Should be called during
+        `MockingTestCase.setUp`, after `super().setUp()`.
+        Args:
+            mocks (`mock.Mock` or list of `mock.Mock`):
+                Mocks that should be added to the `TestCase` after `TestCase.setUpClass` has been run
+        """
+        self.mocks = mocks if isinstance(mocks, (tuple, list)) else [mocks]
+        for m in self.mocks:
+            m.start()
+            self.addCleanup(m.stop)
+def are_the_same_tensors(tensor):
+    state = AcceleratorState()
+    tensor = tensor[None].clone().to(state.device)
+    tensors = gather(tensor).cpu()
+    tensor = tensor[0].cpu()
+    for i in range(tensors.shape[0]):
+        if not torch.equal(tensors[i], tensor):
+            return False
+    return True
+class _RunOutput:
+    def __init__(self, returncode, stdout, stderr):
+        self.returncode = returncode
+        self.stdout = stdout
+        self.stderr = stderr
+async def _read_stream(stream, callback):
+    while True:
+        line = await stream.readline()
+        if line:
+            callback(line)
+        else:
+            break
+async def _stream_subprocess(cmd, env=None, stdin=None, timeout=None, quiet=False, echo=False) -> _RunOutput:
+    if echo:
+        print("\nRunning: ", " ".join(cmd))
+    p = await asyncio.create_subprocess_exec(
+        cmd[0],
+        *cmd[1:],
+        stdin=stdin,
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+        env=env,
+    )
+    # note: there is a warning for a possible deadlock when using `wait` with huge amounts of data in the pipe
+    # https://docs.python.org/3/library/asyncio-subprocess.html#asyncio.asyncio.subprocess.Process.wait
+    #
+    # If it starts hanging, will need to switch to the following code. The problem is that no data
+    # will be seen until it's done and if it hangs for example there will be no debug info.
+    # out, err = await p.communicate()
+    # return _RunOutput(p.returncode, out, err)
+    out = []
+    err = []
+    def tee(line, sink, pipe, label=""):
+        line = line.decode("utf-8").rstrip()
+        sink.append(line)
+        if not quiet:
+            print(label, line, file=pipe)
+    # XXX: the timeout doesn't seem to make any difference here
+    await asyncio.wait(
+        [
+            asyncio.create_task(_read_stream(p.stdout, lambda l: tee(l, out, sys.stdout, label="stdout:"))),
+            asyncio.create_task(_read_stream(p.stderr, lambda l: tee(l, err, sys.stderr, label="stderr:"))),
+        ],
+        timeout=timeout,
+    )
+    return _RunOutput(await p.wait(), out, err)
+def execute_subprocess_async(cmd: list, env=None, stdin=None, timeout=180, quiet=False, echo=True) -> _RunOutput:
+    # Cast every path in `cmd` to a string
+    for i, c in enumerate(cmd):
+        if isinstance(c, Path):
+            cmd[i] = str(c)
+    loop = asyncio.get_event_loop()
+    result = loop.run_until_complete(
+        _stream_subprocess(cmd, env=env, stdin=stdin, timeout=timeout, quiet=quiet, echo=echo)
+    )
+    cmd_str = " ".join(cmd)
+    if result.returncode > 0:
+        stderr = "\n".join(result.stderr)
+        raise RuntimeError(
+            f"'{cmd_str}' failed with returncode {result.returncode}\n\n"
+            f"The combined stderr from workers follows:\n{stderr}"
+        )
+    return result
+def pytest_xdist_worker_id():
+    """
+    Returns an int value of worker's numerical id under `pytest-xdist`'s concurrent workers `pytest -n N` regime, or 0
+    if `-n 1` or `pytest-xdist` isn't being used.
+    """
+    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
+    worker = re.sub(r"^gw", "", worker, count=0, flags=re.M)
+    return int(worker)
+def get_torch_dist_unique_port():
+    """
+    Returns a port number that can be fed to `torch.distributed.launch`'s `--master_port` argument.
+    Under `pytest-xdist` it adds a delta number based on a worker id so that concurrent tests don't try to use the same
+    port at once.
+    """
+    port = 29500
+    uniq_delta = pytest_xdist_worker_id()
+    return port + uniq_delta
+class SubprocessCallException(Exception):
+    pass
+def run_command(command: list[str], return_stdout=False, env=None):
+    """
+    Runs `command` with `subprocess.check_output` and returns the decoded `stdout` when `return_stdout=True`. If the
+    command fails, the error is re-raised as a `SubprocessCallException` that includes the command's output.
+    """
+    # Cast every path in `command` to a string
+    for i, c in enumerate(command):
+        if isinstance(c, Path):
+            command[i] = str(c)
+    if env is None:
+        env = os.environ.copy()
+    try:
+        output = subprocess.check_output(command, stderr=subprocess.STDOUT, env=env)
+        if return_stdout:
+            if hasattr(output, "decode"):
+                output = output.decode("utf-8")
+            return output
+    except subprocess.CalledProcessError as e:
+        raise SubprocessCallException(
+            f"Command `{' '.join(command)}` failed with the following error:\n\n{e.output.decode()}"
+        ) from e
+def path_in_accelerate_package(*components: str) -> Path:
+    """
+    Get a path within the `accelerate` package's directory.
+    Args:
+        *components: Components of the path to join after the package directory.
+    Returns:
+        `Path`: The path to the requested file or directory.
+    """
+    accelerate_package_dir = Path(inspect.getfile(accelerate)).parent
+    return accelerate_package_dir.joinpath(*components)
+@contextmanager
+def assert_exception(exception_class: Exception, msg: Optional[str] = None) -> bool:
+    """
+    Context manager to assert that the right `Exception` class was raised.
+    If `msg` is provided, will check that the message is contained in the raised exception.
+    """
+    was_ran = False
+    try:
+        yield
+        was_ran = True
+    except Exception as e:
+        assert isinstance(e, exception_class), f"Expected exception of type {exception_class} but got {type(e)}"
+        if msg is not None:
+            assert msg in str(e), f"Expected message '{msg}' to be in exception but got '{str(e)}'"
+    if was_ran:
+        raise AssertionError(f"Expected exception of type {exception_class} but ran without issue.")
+def capture_call_output(func, *args, **kwargs):
+    """
+    Takes in a `func` with `args` and `kwargs` and returns the captured stdout as a string
+    """
+    captured_output = io.StringIO()
+    original_stdout = sys.stdout
+    try:
+        sys.stdout = captured_output
+        func(*args, **kwargs)
+    except Exception as e:
+        raise e
+    finally:
+        sys.stdout = original_stdout
+    return captured_output.getvalue()
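
The worker-id-to-port mapping used by `pytest_xdist_worker_id` and `get_torch_dist_unique_port` above can be tried in isolation. The following is a minimal standalone sketch, not the library's code: the names `xdist_worker_id`, `unique_port`, and `BASE_PORT` are hypothetical, and the `environ` parameter is an assumption added so the logic can be exercised without touching the real environment.

```python
import os
import re

BASE_PORT = 29500  # same base value that appears in the diff above


def xdist_worker_id(environ=None) -> int:
    """Parse the numeric id from PYTEST_XDIST_WORKER (e.g. "gw3" -> 3); default to 0."""
    environ = os.environ if environ is None else environ
    worker = environ.get("PYTEST_XDIST_WORKER", "gw0")
    # Strip the leading "gw" prefix that pytest-xdist puts on worker names.
    return int(re.sub(r"^gw", "", worker))


def unique_port(environ=None) -> int:
    """Offset the base port by the worker id so concurrent workers do not collide."""
    return BASE_PORT + xdist_worker_id(environ)


if __name__ == "__main__":
    print(unique_port({}))                               # no xdist: base port
    print(unique_port({"PYTEST_XDIST_WORKER": "gw3"}))   # worker 3: base port + 3
```

Passing a plain dictionary instead of `os.environ` keeps the sketch free of side effects; the functions in the diff read the process environment directly.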

Prism/LLaDA/LLaDA_Prism/.venv/lib/python3.12/site-packages/accelerate/test_utils/training.py ADDED

	@@ -0,0 +1,148 @@

+# Copyright 2021 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import numpy as np
+import torch
+from torch.utils.data import DataLoader
+from accelerate.utils.dataclasses import DistributedType
+class RegressionDataset:
+    def __init__(self, a=2, b=3, length=64, seed=None):
+        rng = np.random.default_rng(seed)
+        self.length = length
+        self.x = rng.normal(size=(length,)).astype(np.float32)
+        self.y = a * self.x + b + rng.normal(scale=0.1, size=(length,)).astype(np.float32)
+    def __len__(self):
+        return self.length
+    def __getitem__(self, i):
+        return {"x": self.x[i], "y": self.y[i]}
+class RegressionModel(torch.nn.Module):
+    def __init__(self, a=0, b=0, double_output=False):
+        super().__init__()
+        self.a = torch.nn.Parameter(torch.tensor(a).float())
+        self.b = torch.nn.Parameter(torch.tensor(b).float())
+        self.first_batch = True
+    def forward(self, x=None):
+        if self.first_batch:
+            print(f"Model dtype: {self.a.dtype}, {self.b.dtype}. Input dtype: {x.dtype}")
+            self.first_batch = False
+        return x * self.a + self.b
+def mocked_dataloaders(accelerator, batch_size: int = 16):
+    from datasets import load_dataset
+    from transformers import AutoTokenizer
+    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
+    data_files = {"train": "tests/test_samples/MRPC/train.csv", "validation": "tests/test_samples/MRPC/dev.csv"}
+    datasets = load_dataset("csv", data_files=data_files)
+    label_list = datasets["train"].unique("label")
+    label_to_id = {v: i for i, v in enumerate(label_list)}
+    def tokenize_function(examples):
+        # max_length=None => use the model max length (it's actually the default)
+        outputs = tokenizer(
+            examples["sentence1"], examples["sentence2"], truncation=True, max_length=None, padding="max_length"
+        )
+        if "label" in examples:
+            outputs["labels"] = [label_to_id[l] for l in examples["label"]]
+        return outputs
+    # Apply the method we just defined to all the examples in all the splits of the dataset
+    tokenized_datasets = datasets.map(
+        tokenize_function,
+        batched=True,
+        remove_columns=["sentence1", "sentence2", "label"],
+    )
+    def collate_fn(examples):
+        # On TPU it's best to pad everything to the same length or training will be very slow.
+        if accelerator.distributed_type == DistributedType.XLA:
+            return tokenizer.pad(examples, padding="max_length", max_length=128, return_tensors="pt")
+        return tokenizer.pad(examples, padding="longest", return_tensors="pt")
+    # Instantiate dataloaders.
+    train_dataloader = DataLoader(tokenized_datasets["train"], shuffle=True, collate_fn=collate_fn, batch_size=2)
+    eval_dataloader = DataLoader(tokenized_datasets["validation"], shuffle=False, collate_fn=collate_fn, batch_size=1)
+    return train_dataloader, eval_dataloader
+def mocked_dataloaders_for_autoregressive_models(accelerator, batch_size: int = 16):
+    from datasets import load_dataset
+    from transformers import AutoTokenizer
+    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-360M")
+    tokenizer.pad_token = tokenizer.eos_token
+    data_files = {"train": "tests/test_samples/MRPC/train.csv", "validation": "tests/test_samples/MRPC/dev.csv"}
+    datasets = load_dataset("csv", data_files=data_files)
+    def tokenize_function(examples):
+        # max_length=None => use the model max length (it's actually the default)
+        outputs = tokenizer(examples["sentence1"], truncation=True, max_length=None, return_attention_mask=False)
+        return outputs
+    # Apply the method we just defined to all the examples in all the splits of the dataset
+    # starting with the main process first:
+    with accelerator.main_process_first():
+        tokenized_datasets = datasets.map(
+            tokenize_function,
+            batched=True,
+            remove_columns=["sentence1", "sentence2", "label"],
+        )
+    def collate_fn(examples):
+        # On TPU it's best to pad everything to the same length or training will be very slow.
+        max_length = (
+            128
+            if accelerator.distributed_type == DistributedType.XLA
+            else max([len(e["input_ids"]) for e in examples])
+        )
+        # When using mixed precision we want round multiples of 8/16
+        if accelerator.mixed_precision == "fp8":
+            pad_to_multiple_of = 16
+        elif accelerator.mixed_precision != "no":
+            pad_to_multiple_of = 8
+        else:
+            pad_to_multiple_of = None
+        batch = tokenizer.pad(
+            examples,
+            padding="max_length",
+            max_length=max_length + 1,
+            pad_to_multiple_of=pad_to_multiple_of,
+            return_tensors="pt",
+        )
+        batch["labels"] = batch["input_ids"][:, 1:]
+        batch["input_ids"] = batch["input_ids"][:, :-1]
+        batch["labels"] = torch.where(batch["labels"] == tokenizer.pad_token_id, -100, batch["labels"])
+        return batch
+    # Instantiate dataloaders.
+    train_dataloader = DataLoader(tokenized_datasets["train"], shuffle=False, collate_fn=collate_fn, batch_size=2)
+    eval_dataloader = DataLoader(tokenized_datasets["validation"], shuffle=False, collate_fn=collate_fn, batch_size=1)
+    return train_dataloader, eval_dataloader
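
The one-position shift that `collate_fn` applies in `mocked_dataloaders_for_autoregressive_models` above can be illustrated on a single sequence. This is a minimal pure-Python sketch under stated assumptions: it works on a plain list rather than a padded tensor batch, and the names `shift_for_causal_lm` and `IGNORE_INDEX` are hypothetical.

```python
IGNORE_INDEX = -100  # label value the diff uses to mask padded positions


def shift_for_causal_lm(padded_ids, pad_token_id):
    """Split one padded sequence into (input_ids, labels), shifted by one position."""
    # Inputs drop the last token; labels drop the first, so labels[i] is the
    # token that follows input_ids[i].
    input_ids = padded_ids[:-1]
    labels = [IGNORE_INDEX if t == pad_token_id else t for t in padded_ids[1:]]
    return input_ids, labels


if __name__ == "__main__":
    pad = 0
    ids = [5, 6, 7, pad, pad]  # already padded to max_length + 1
    inputs, labels = shift_for_causal_lm(ids, pad)
    print(inputs)  # [5, 6, 7, 0]
    print(labels)  # [6, 7, -100, -100]
```

Padding to `max_length + 1` before the shift, as the diff does, leaves both `input_ids` and `labels` at `max_length` tokens. Because the diff sets `tokenizer.pad_token = tokenizer.eos_token`, any real end-of-sequence token in a batch would be masked by the same comparison.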

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/__init__.cpython-312.pyc ADDED

Binary file (1.14 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/abstract_nodes.cpython-312.pyc ADDED

Binary file (1.14 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/algorithms.cpython-312.pyc ADDED

Binary file (8.64 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/approximations.cpython-312.pyc ADDED

Binary file (9.03 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/ast.cpython-312.pyc ADDED

Binary file (74 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cfunctions.cpython-312.pyc ADDED

Binary file (19.1 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cnodes.cpython-312.pyc ADDED

Binary file (5.62 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cutils.cpython-312.pyc ADDED

Binary file (813 Bytes).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/cxxnodes.cpython-312.pyc ADDED

Binary file (798 Bytes).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/fnodes.cpython-312.pyc ADDED

Binary file (26.2 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/futils.cpython-312.pyc ADDED

Binary file (2.52 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/matrix_nodes.cpython-312.pyc ADDED

Binary file (3.12 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/numpy_nodes.cpython-312.pyc ADDED

Binary file (7.93 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/pynodes.cpython-312.pyc ADDED

Binary file (744 Bytes).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/pyutils.cpython-312.pyc ADDED

Binary file (1.49 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/rewriting.cpython-312.pyc ADDED

Binary file (18.4 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/__pycache__/scipy_nodes.cpython-312.pyc ADDED

Binary file (4.44 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__init__.py ADDED

File without changes

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/__init__.cpython-312.pyc ADDED

Binary file (211 Bytes).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_abstract_nodes.cpython-312.pyc ADDED

Binary file (1.31 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_algorithms.cpython-312.pyc ADDED

Binary file (10.9 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_applications.cpython-312.pyc ADDED

Binary file (3.57 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_approximations.cpython-312.pyc ADDED

Binary file (3.54 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_ast.cpython-312.pyc ADDED

Binary file (48.4 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_cfunctions.cpython-312.pyc ADDED

Binary file (12.5 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_cnodes.cpython-312.pyc ADDED

Binary file (6.72 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_cxxnodes.cpython-312.pyc ADDED

Binary file (846 Bytes).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_fnodes.cpython-312.pyc ADDED

Binary file (11 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_matrix_nodes.cpython-312.pyc ADDED

Binary file (4.14 kB).

URSA/.venv_ursa/lib/python3.12/site-packages/sympy/codegen/tests/__pycache__/test_numpy_nodes.cpython-312.pyc ADDED

Binary file (4.72 kB).