Mingke977 commited on
Commit
f0c5df7
·
verified ·
1 Parent(s): ca4b037

Add files using upload-large-folder tool

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. venv/lib/python3.10/site-packages/fsspec/__pycache__/__init__.cpython-310.pyc +0 -0
  2. venv/lib/python3.10/site-packages/fsspec/__pycache__/_version.cpython-310.pyc +0 -0
  3. venv/lib/python3.10/site-packages/fsspec/__pycache__/archive.cpython-310.pyc +0 -0
  4. venv/lib/python3.10/site-packages/fsspec/__pycache__/asyn.cpython-310.pyc +0 -0
  5. venv/lib/python3.10/site-packages/fsspec/__pycache__/caching.cpython-310.pyc +0 -0
  6. venv/lib/python3.10/site-packages/fsspec/__pycache__/callbacks.cpython-310.pyc +0 -0
  7. venv/lib/python3.10/site-packages/fsspec/__pycache__/compression.cpython-310.pyc +0 -0
  8. venv/lib/python3.10/site-packages/fsspec/__pycache__/config.cpython-310.pyc +0 -0
  9. venv/lib/python3.10/site-packages/fsspec/__pycache__/conftest.cpython-310.pyc +0 -0
  10. venv/lib/python3.10/site-packages/fsspec/__pycache__/core.cpython-310.pyc +0 -0
  11. venv/lib/python3.10/site-packages/fsspec/__pycache__/dircache.cpython-310.pyc +0 -0
  12. venv/lib/python3.10/site-packages/fsspec/__pycache__/exceptions.cpython-310.pyc +0 -0
  13. venv/lib/python3.10/site-packages/fsspec/__pycache__/fuse.cpython-310.pyc +0 -0
  14. venv/lib/python3.10/site-packages/fsspec/__pycache__/generic.cpython-310.pyc +0 -0
  15. venv/lib/python3.10/site-packages/fsspec/__pycache__/gui.cpython-310.pyc +0 -0
  16. venv/lib/python3.10/site-packages/fsspec/__pycache__/json.cpython-310.pyc +0 -0
  17. venv/lib/python3.10/site-packages/fsspec/__pycache__/mapping.cpython-310.pyc +0 -0
  18. venv/lib/python3.10/site-packages/fsspec/__pycache__/parquet.cpython-310.pyc +0 -0
  19. venv/lib/python3.10/site-packages/fsspec/__pycache__/registry.cpython-310.pyc +0 -0
  20. venv/lib/python3.10/site-packages/fsspec/__pycache__/spec.cpython-310.pyc +0 -0
  21. venv/lib/python3.10/site-packages/fsspec/__pycache__/transaction.cpython-310.pyc +0 -0
  22. venv/lib/python3.10/site-packages/fsspec/__pycache__/utils.cpython-310.pyc +0 -0
  23. venv/lib/python3.10/site-packages/fsspec/implementations/__init__.py +0 -0
  24. venv/lib/python3.10/site-packages/fsspec/implementations/arrow.py +312 -0
  25. venv/lib/python3.10/site-packages/fsspec/implementations/asyn_wrapper.py +124 -0
  26. venv/lib/python3.10/site-packages/fsspec/implementations/cache_mapper.py +75 -0
  27. venv/lib/python3.10/site-packages/fsspec/implementations/cache_metadata.py +231 -0
  28. venv/lib/python3.10/site-packages/fsspec/implementations/cached.py +1021 -0
  29. venv/lib/python3.10/site-packages/fsspec/implementations/chained.py +23 -0
  30. venv/lib/python3.10/site-packages/fsspec/implementations/dask.py +152 -0
  31. venv/lib/python3.10/site-packages/fsspec/implementations/data.py +57 -0
  32. venv/lib/python3.10/site-packages/fsspec/implementations/dbfs.py +496 -0
  33. venv/lib/python3.10/site-packages/fsspec/implementations/dirfs.py +389 -0
  34. venv/lib/python3.10/site-packages/fsspec/implementations/ftp.py +437 -0
  35. venv/lib/python3.10/site-packages/fsspec/implementations/gist.py +241 -0
  36. venv/lib/python3.10/site-packages/fsspec/implementations/git.py +114 -0
  37. venv/lib/python3.10/site-packages/fsspec/implementations/github.py +333 -0
  38. venv/lib/python3.10/site-packages/fsspec/implementations/http.py +897 -0
  39. venv/lib/python3.10/site-packages/fsspec/implementations/http_sync.py +937 -0
  40. venv/lib/python3.10/site-packages/fsspec/implementations/libarchive.py +213 -0
  41. venv/lib/python3.10/site-packages/fsspec/parquet.py +572 -0
  42. venv/lib/python3.10/site-packages/fsspec/registry.py +333 -0
  43. venv/lib/python3.10/site-packages/fsspec/spec.py +2281 -0
  44. venv/lib/python3.10/site-packages/fsspec/transaction.py +90 -0
  45. venv/lib/python3.10/site-packages/fsspec/utils.py +748 -0
  46. venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/INSTALLER +1 -0
  47. venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/METADATA +625 -0
  48. venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/RECORD +68 -0
  49. venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/WHEEL +4 -0
  50. venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/licenses/LICENSE.md +27 -0
venv/lib/python3.10/site-packages/fsspec/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (1.6 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/_version.cpython-310.pyc ADDED
Binary file (763 Bytes). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/archive.cpython-310.pyc ADDED
Binary file (2.99 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/asyn.cpython-310.pyc ADDED
Binary file (29.6 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/caching.cpython-310.pyc ADDED
Binary file (26.4 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/callbacks.cpython-310.pyc ADDED
Binary file (11 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/compression.cpython-310.pyc ADDED
Binary file (5.38 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/config.cpython-310.pyc ADDED
Binary file (3.93 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/conftest.cpython-310.pyc ADDED
Binary file (3.86 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/core.cpython-310.pyc ADDED
Binary file (22.6 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/dircache.cpython-310.pyc ADDED
Binary file (3.52 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/exceptions.cpython-310.pyc ADDED
Binary file (841 Bytes). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/fuse.cpython-310.pyc ADDED
Binary file (10.3 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/generic.cpython-310.pyc ADDED
Binary file (12.6 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/gui.cpython-310.pyc ADDED
Binary file (14.7 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/json.cpython-310.pyc ADDED
Binary file (4.65 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/mapping.cpython-310.pyc ADDED
Binary file (9.17 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/parquet.cpython-310.pyc ADDED
Binary file (14.5 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/registry.cpython-310.pyc ADDED
Binary file (9.65 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/spec.cpython-310.pyc ADDED
Binary file (67.7 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/transaction.cpython-310.pyc ADDED
Binary file (3.31 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/__pycache__/utils.cpython-310.pyc ADDED
Binary file (21 kB). View file
 
venv/lib/python3.10/site-packages/fsspec/implementations/__init__.py ADDED
File without changes
venv/lib/python3.10/site-packages/fsspec/implementations/arrow.py ADDED
@@ -0,0 +1,312 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import errno
2
+ import io
3
+ import os
4
+ import secrets
5
+ import shutil
6
+ from contextlib import suppress
7
+ from functools import cached_property, wraps
8
+ from urllib.parse import parse_qs
9
+
10
+ from fsspec.spec import AbstractFileSystem
11
+ from fsspec.utils import (
12
+ get_package_version_without_import,
13
+ infer_storage_options,
14
+ mirror_from,
15
+ tokenize,
16
+ )
17
+
18
+
19
def wrap_exceptions(func):
    """Decorator translating pyarrow's generic OSError into FileNotFoundError.

    pyarrow raises a plain ``OSError`` whose message contains
    "does not exist" for missing paths; fsspec callers expect
    ``FileNotFoundError``, so that flavour is re-raised accordingly.
    Any other ``OSError`` propagates unchanged.
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.args:
                first = exc.args[0]
                # Only the "missing path" message is translated.
                if isinstance(first, str) and "does not exist" in first:
                    raise FileNotFoundError(errno.ENOENT, first) from exc
            raise

    return wrapper
35
+
36
+
37
# Installed pyarrow version string; stays None until an ArrowFSWrapper is
# constructed (its __init__ populates this via
# get_package_version_without_import).
PYARROW_VERSION = None
38
+
39
+
40
class ArrowFSWrapper(AbstractFileSystem):
    """FSSpec-compatible wrapper of pyarrow.fs.FileSystem.

    Parameters
    ----------
    fs : pyarrow.fs.FileSystem
        The pyarrow filesystem instance exposed through the fsspec API.
    """

    root_marker = "/"

    def __init__(self, fs, **kwargs):
        global PYARROW_VERSION
        PYARROW_VERSION = get_package_version_without_import("pyarrow")
        self.fs = fs
        super().__init__(**kwargs)

    @property
    def protocol(self):
        # Delegated to the wrapped filesystem, e.g. "hdfs".
        return self.fs.type_name

    @cached_property
    def fsid(self):
        return "hdfs_" + tokenize(self.fs.host, self.fs.port)

    @classmethod
    def _strip_protocol(cls, path):
        ops = infer_storage_options(path)
        path = ops["path"]
        if path.startswith("//"):
            # special case for "hdfs://path" (without the triple slash)
            path = path[1:]
        return path

    def ls(self, path, detail=False, **kwargs):
        """List entries under ``path``; dicts of metadata when ``detail``."""
        path = self._strip_protocol(path)
        from pyarrow.fs import FileSelector

        try:
            entries = [
                self._make_entry(entry)
                for entry in self.fs.get_file_info(FileSelector(path))
            ]
        except (FileNotFoundError, NotADirectoryError):
            # path may name a single file rather than a directory
            entries = [self.info(path, **kwargs)]
        if detail:
            return entries
        else:
            return [entry["name"] for entry in entries]

    def info(self, path, **kwargs):
        """Return the fsspec info dict for a single path.

        Raises FileNotFoundError if the path does not exist.
        """
        path = self._strip_protocol(path)
        [info] = self.fs.get_file_info([path])
        return self._make_entry(info)

    def exists(self, path):
        path = self._strip_protocol(path)
        try:
            self.info(path)
        except FileNotFoundError:
            return False
        else:
            return True

    def _make_entry(self, info):
        """Convert a pyarrow ``FileInfo`` into an fsspec info dict.

        Raises FileNotFoundError for ``FileType.NotFound`` entries.
        """
        from pyarrow.fs import FileType

        if info.type is FileType.Directory:
            kind = "directory"
        elif info.type is FileType.File:
            kind = "file"
        elif info.type is FileType.NotFound:
            raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), info.path)
        else:
            kind = "other"

        return {
            "name": info.path,
            "size": info.size,
            "type": kind,
            "mtime": info.mtime,
        }

    @wrap_exceptions
    def cp_file(self, path1, path2, **kwargs):
        """Copy a single file within this filesystem.

        Writes to a temporary name and then moves it into place, so a
        partially-copied file is never visible at the destination.
        """
        path1 = self._strip_protocol(path1).rstrip("/")
        path2 = self._strip_protocol(path2).rstrip("/")

        with self._open(path1, "rb") as lstream:
            tmp_fname = f"{path2}.tmp.{secrets.token_hex(6)}"
            try:
                with self.open(tmp_fname, "wb") as rstream:
                    shutil.copyfileobj(lstream, rstream)
                self.fs.move(tmp_fname, path2)
            except BaseException:
                # Best-effort cleanup of the temporary file on any failure.
                with suppress(FileNotFoundError):
                    self.fs.delete_file(tmp_fname)
                raise

    @wrap_exceptions
    def mv(self, path1, path2, **kwargs):
        path1 = self._strip_protocol(path1).rstrip("/")
        path2 = self._strip_protocol(path2).rstrip("/")
        self.fs.move(path1, path2)

    @wrap_exceptions
    def rm_file(self, path):
        path = self._strip_protocol(path)
        self.fs.delete_file(path)

    @wrap_exceptions
    def rm(self, path, recursive=False, maxdepth=None):
        """Delete a file, or a directory tree when ``recursive=True``."""
        path = self._strip_protocol(path).rstrip("/")
        if self.isdir(path):
            if recursive:
                self.fs.delete_dir(path)
            else:
                # Fixed message: previously said "recursive=False", which
                # inverted the actual requirement.
                raise ValueError("Can't delete directories without recursive=True")
        else:
            self.fs.delete_file(path)

    @wrap_exceptions
    def _open(self, path, mode="rb", block_size=None, seekable=True, **kwargs):
        """Open a pyarrow stream for ``path`` and wrap it in an ArrowFile."""
        if mode == "rb":
            # Seekable reads need a random-access file; otherwise a plain
            # (cheaper) input stream is sufficient.
            if seekable:
                method = self.fs.open_input_file
            else:
                method = self.fs.open_input_stream
        elif mode == "wb":
            method = self.fs.open_output_stream
        elif mode == "ab":
            method = self.fs.open_append_stream
        else:
            raise ValueError(f"unsupported mode for Arrow filesystem: {mode!r}")

        _kwargs = {}
        if mode != "rb" or not seekable:
            if int(PYARROW_VERSION.split(".")[0]) >= 4:
                # disable compression auto-detection
                _kwargs["compression"] = None
        stream = method(path, **_kwargs)

        return ArrowFile(self, stream, path, mode, block_size, **kwargs)

    @wrap_exceptions
    def mkdir(self, path, create_parents=True, **kwargs):
        path = self._strip_protocol(path)
        if create_parents:
            self.makedirs(path, exist_ok=True)
        else:
            self.fs.create_dir(path, recursive=False)

    @wrap_exceptions
    def makedirs(self, path, exist_ok=False):
        # NOTE(review): pyarrow's create_dir(recursive=True) does not raise
        # for pre-existing directories, so ``exist_ok`` is effectively
        # ignored here - confirm whether strict behaviour is desired.
        path = self._strip_protocol(path)
        self.fs.create_dir(path, recursive=True)

    @wrap_exceptions
    def rmdir(self, path):
        path = self._strip_protocol(path)
        self.fs.delete_dir(path)

    @wrap_exceptions
    def modified(self, path):
        path = self._strip_protocol(path)
        return self.fs.get_file_info(path).mtime

    def cat_file(self, path, start=None, end=None, **kwargs):
        # A non-zero start offset requires a seekable stream; otherwise the
        # cheaper non-seekable input stream is used.
        kwargs.setdefault("seekable", start not in [None, 0])
        # Forward the requested byte range; previously start/end were
        # replaced by None here, so ranged reads returned the whole file.
        return super().cat_file(path, start=start, end=end, **kwargs)

    def get_file(self, rpath, lpath, **kwargs):
        # Sequential download: a non-seekable stream is sufficient.
        kwargs.setdefault("seekable", False)
        super().get_file(rpath, lpath, **kwargs)
214
+
215
+
216
@mirror_from(
    "stream",
    [
        "read",
        "seek",
        "tell",
        "write",
        "readable",
        "writable",
        "close",
        "seekable",
    ],
)
class ArrowFile(io.IOBase):
    """File-like object over a pyarrow stream.

    The I/O methods listed in the ``@mirror_from`` decorator are forwarded
    directly to ``self.stream``; only the attributes and the context-manager
    protocol are defined here.
    """

    def __init__(self, fs, stream, path, mode, block_size=None, **kwargs):
        """
        Parameters
        ----------
        fs : ArrowFSWrapper
            The filesystem that created this file.
        stream
            Underlying pyarrow input/output stream being delegated to.
        path : str
            Path of the file within ``fs``.
        mode : str
            Mode the stream was opened with ("rb", "wb" or "ab").
        block_size : int, optional
            Recorded for caller inspection; not used internally here.
        """
        self.path = path
        self.mode = mode

        self.fs = fs
        self.stream = stream

        # Both spellings kept so code expecting either attribute name works.
        self.blocksize = self.block_size = block_size
        self.kwargs = kwargs

    def __enter__(self):
        return self

    @property
    def size(self):
        # Only seekable pyarrow streams expose a size; otherwise unknown.
        if self.stream.seekable():
            return self.stream.size()
        return None

    def __exit__(self, *args):
        return self.close()
251
+
252
+
253
class HadoopFileSystem(ArrowFSWrapper):
    """Expose ``pyarrow.fs.HadoopFileSystem`` through the fsspec interface."""

    protocol = "hdfs"

    def __init__(
        self,
        host="default",
        port=0,
        user=None,
        kerb_ticket=None,
        replication=3,
        extra_conf=None,
        **kwargs,
    ):
        """
        Parameters
        ----------
        host: str
            Hostname, IP or "default" to try to read from Hadoop config
        port: int
            Port to connect on, or default from Hadoop config if 0
        user: str or None
            If given, connect as this username
        kerb_ticket: str or None
            If given, use this ticket for authentication
        replication: int
            set replication factor of file for write operations. default value is 3.
        extra_conf: None or dict
            Passed on to HadoopFileSystem
        """
        from pyarrow.fs import HadoopFileSystem

        hdfs = HadoopFileSystem(
            host=host,
            port=port,
            user=user,
            kerb_ticket=kerb_ticket,
            replication=replication,
            extra_conf=extra_conf,
        )
        super().__init__(fs=hdfs, **kwargs)

    @staticmethod
    def _get_kwargs_from_urls(path):
        """Derive constructor kwargs (host/user/port/replication) from a URL."""
        opts = infer_storage_options(path)
        # Map URL components onto constructor keyword names, skipping any
        # that are absent or empty.
        out = {
            kwarg: opts[src]
            for kwarg, src in (("host", "host"), ("user", "username"), ("port", "port"))
            if opts.get(src)
        }
        query = opts.get("url_query")
        if query:
            replication = parse_qs(query).get("replication")
            if replication:
                out["replication"] = int(replication[0])
        return out
venv/lib/python3.10/site-packages/fsspec/implementations/asyn_wrapper.py ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import asyncio
2
+ import functools
3
+ import inspect
4
+
5
+ import fsspec
6
+ from fsspec.asyn import AsyncFileSystem, running_async
7
+
8
+ from .chained import ChainedFileSystem
9
+
10
+
11
def async_wrapper(func, obj=None, semaphore=None):
    """
    Wraps a synchronous function to make it awaitable.

    The call is delegated to a worker thread via ``asyncio.to_thread`` so the
    event loop is never blocked.

    Parameters
    ----------
    func : callable
        The synchronous function to wrap.
    obj : object, optional
        Accepted for API compatibility; not used by the wrapper itself.
    semaphore : asyncio.Semaphore, optional
        A semaphore to limit concurrent calls.

    Returns
    -------
    coroutine
        An awaitable version of the function.
    """

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        if semaphore is None:
            return await asyncio.to_thread(func, *args, **kwargs)
        async with semaphore:
            return await asyncio.to_thread(func, *args, **kwargs)

    return wrapper
38
+
39
+
40
class AsyncFileSystemWrapper(AsyncFileSystem, ChainedFileSystem):
    """
    A wrapper class to convert a synchronous filesystem into an asynchronous one.

    This class takes an existing synchronous filesystem implementation and wraps all
    its methods to provide an asynchronous interface.

    Parameters
    ----------
    fs : AbstractFileSystem, optional
        The synchronous filesystem instance to wrap. If omitted, one is
        created from ``target_protocol``/``target_options``.
    """

    protocol = "asyncwrapper", "async_wrapper"
    # Instances bind dynamically-wrapped methods, so they must not be shared
    # via the fsspec instance cache.
    cachable = False

    def __init__(
        self,
        fs=None,
        asynchronous=None,
        target_protocol=None,
        target_options=None,
        semaphore=None,
        max_concurrent_tasks=None,  # accepted but not used in this class
        **kwargs,
    ):
        # Default to async mode when already inside a running event loop.
        if asynchronous is None:
            asynchronous = running_async()
        super().__init__(asynchronous=asynchronous, **kwargs)
        if fs is not None:
            self.sync_fs = fs
        else:
            # NOTE(review): when fs is None, target_options must be a dict
            # (it is unpacked here) - confirm callers always supply it.
            self.sync_fs = fsspec.filesystem(target_protocol, **target_options)
        # Mirror the wrapped filesystem's protocol on the instance.
        self.protocol = self.sync_fs.protocol
        self.semaphore = semaphore
        self._wrap_all_sync_methods()

    @property
    def fsid(self):
        return f"async_{self.sync_fs.fsid}"

    def _wrap_all_sync_methods(self):
        """
        Wrap all synchronous methods of the underlying filesystem with asynchronous versions.
        """
        # "open" is deliberately excluded from wrapping.
        excluded_methods = {"open"}
        for method_name in dir(self.sync_fs):
            if method_name.startswith("_") or method_name in excluded_methods:
                continue

            # Inspect the raw class attribute so that properties are not
            # triggered by the lookup.
            attr = inspect.getattr_static(self.sync_fs, method_name)
            if isinstance(attr, property):
                continue

            method = getattr(self.sync_fs, method_name)
            if callable(method) and not inspect.iscoroutinefunction(method):
                # Installed under an underscore-prefixed name, the convention
                # AsyncFileSystem uses for its async implementations.
                async_method = async_wrapper(method, obj=self, semaphore=self.semaphore)
                setattr(self, f"_{method_name}", async_method)

    @classmethod
    def wrap_class(cls, sync_fs_class):
        """
        Create a new class that can be used to instantiate an AsyncFileSystemWrapper
        with lazy instantiation of the underlying synchronous filesystem.

        Parameters
        ----------
        sync_fs_class : type
            The class of the synchronous filesystem to wrap.

        Returns
        -------
        type
            A new class that wraps the provided synchronous filesystem class.
        """

        class GeneratedAsyncFileSystemWrapper(cls):
            # Constructor args are forwarded to the sync class; the wrapper
            # itself is built around the resulting instance.
            def __init__(self, *args, **kwargs):
                sync_fs = sync_fs_class(*args, **kwargs)
                super().__init__(sync_fs)

        GeneratedAsyncFileSystemWrapper.__name__ = (
            f"Async{sync_fs_class.__name__}Wrapper"
        )
        return GeneratedAsyncFileSystemWrapper
venv/lib/python3.10/site-packages/fsspec/implementations/cache_mapper.py ADDED
@@ -0,0 +1,75 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import abc
4
+ import hashlib
5
+
6
+ from fsspec.implementations.local import make_path_posix
7
+
8
+
9
class AbstractCacheMapper(abc.ABC):
    """Base class for mappers that turn a remote URL into the basename used
    for its locally cached copy.
    """

    @abc.abstractmethod
    def __call__(self, path: str) -> str:
        """Return the cached-file basename for *path*."""

    def __eq__(self, other: object) -> bool:
        # Mappers are interchangeable whenever they are of the same class;
        # subclasses carrying configuration must extend this check.
        return isinstance(other, type(self))

    def __hash__(self) -> int:
        # Must stay consistent with __eq__: identity is the class alone.
        return hash(type(self))
26
+
27
+
28
class BasenameCacheMapper(AbstractCacheMapper):
    """Cache mapper built from the remote URL's basename plus a configurable
    number of parent directory levels.

    With the default of zero directory levels, any two paths that share a
    basename map to the same cached name.
    """

    def __init__(self, directory_levels: int = 0):
        if directory_levels < 0:
            raise ValueError(
                "BasenameCacheMapper requires zero or positive directory_levels"
            )
        self.directory_levels = directory_levels

        # Stand-in for "/" when directory components are folded into the name.
        self._separator = "_@_"

    def __call__(self, path: str) -> str:
        posix = make_path_posix(path)
        head, *tail = posix.rsplit("/", self.directory_levels + 1)
        # No "/" found: the path already is a bare name.
        return self._separator.join(tail) if tail else head

    def __eq__(self, other: object) -> bool:
        # Same class AND same number of retained directory levels.
        return super().__eq__(other) and self.directory_levels == other.directory_levels

    def __hash__(self) -> int:
        return super().__hash__() ^ hash(self.directory_levels)
59
+
60
+
61
class HashCacheMapper(AbstractCacheMapper):
    """Cache mapper naming cached files by the SHA-256 hex digest of the URL."""

    def __call__(self, path: str) -> str:
        digest = hashlib.sha256(path.encode())
        return digest.hexdigest()
66
+
67
+
68
def create_cache_mapper(same_names: bool) -> AbstractCacheMapper:
    """Factory method to create cache mapper for backward compatibility with
    ``CachingFileSystem`` constructor using ``same_names`` kwarg.
    """
    return BasenameCacheMapper() if same_names else HashCacheMapper()
venv/lib/python3.10/site-packages/fsspec/implementations/cache_metadata.py ADDED
@@ -0,0 +1,231 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import os
4
+ import pickle
5
+ import time
6
+ from typing import TYPE_CHECKING
7
+
8
+ from fsspec.utils import atomic_write
9
+
10
+ try:
11
+ import ujson as json
12
+ except ImportError:
13
+ if not TYPE_CHECKING:
14
+ import json
15
+
16
+ if TYPE_CHECKING:
17
+ from collections.abc import Iterator
18
+ from typing import Any, Literal, TypeAlias
19
+
20
+ from .cached import CachingFileSystem
21
+
22
+ Detail: TypeAlias = dict[str, Any]
23
+
24
+
25
class CacheMetadata:
    """Cache metadata.

    All reading and writing of cache metadata is performed by this class,
    accessing the cached files and blocks is not.

    Metadata is stored in a single file per storage directory in JSON format.
    For backward compatibility, also reads metadata stored in pickle format
    which is converted to JSON when next saved.
    """

    def __init__(self, storage: list[str]):
        """

        Parameters
        ----------
        storage: list[str]
            Directories containing cached files, must be at least one. Metadata
            is stored in the last of these directories by convention.
        """
        if not storage:
            raise ValueError("CacheMetadata expects at least one storage location")

        self._storage = storage
        # One metadata dict per storage location, same order as self._storage.
        self.cached_files: list[Detail] = [{}]

        # Private attribute to force saving of metadata in pickle format rather than
        # JSON for use in tests to confirm can read both pickle and JSON formats.
        self._force_save_pickle = False

    def _load(self, fn: str) -> Detail:
        """Low-level function to load metadata from specific file"""
        try:
            with open(fn, "r") as f:
                loaded = json.load(f)
        except ValueError:
            # Not valid JSON: fall back to the legacy pickle format.
            with open(fn, "rb") as f:
                loaded = pickle.load(f)
        for c in loaded.values():
            # JSON stores block indices as lists; restore the in-memory sets.
            if isinstance(c.get("blocks"), list):
                c["blocks"] = set(c["blocks"])
        return loaded

    def _save(self, metadata_to_save: Detail, fn: str) -> None:
        """Low-level function to save metadata to specific file"""
        if self._force_save_pickle:
            with atomic_write(fn) as f:
                pickle.dump(metadata_to_save, f)
        else:
            with atomic_write(fn, mode="w") as f:
                json.dump(metadata_to_save, f)

    def _scan_locations(
        self, writable_only: bool = False
    ) -> Iterator[tuple[str, str, bool]]:
        """Yield locations (filenames) where metadata is stored, and whether
        writable or not.

        Parameters
        ----------
        writable_only: bool
            Set to True to only yield writable locations.

        Returns
        -------
        Yields (str, str, bool)
        """
        n = len(self._storage)
        for i, storage in enumerate(self._storage):
            # Only the last storage location is writable, by convention.
            writable = i == n - 1
            if writable_only and not writable:
                continue
            yield os.path.join(storage, "cache"), storage, writable

    def check_file(
        self, path: str, cfs: CachingFileSystem | None
    ) -> Literal[False] | tuple[Detail, str]:
        """If path is in cache return its details, otherwise return ``False``.

        If the optional CachingFileSystem is specified then it is used to
        perform extra checks to reject possible matches, such as if they are
        too old.
        """
        for (fn, base, _), cache in zip(self._scan_locations(), self.cached_files):
            if path not in cache:
                continue
            detail = cache[path].copy()

            if cfs is not None:
                if cfs.check_files and detail["uid"] != cfs.fs.ukey(path):
                    # Wrong file as determined by hash of file properties
                    continue
                if cfs.expiry and time.time() - detail["time"] > cfs.expiry:
                    # Cached file has expired
                    continue

            fn = os.path.join(base, detail["fn"])
            # Metadata may outlive the data file; only return a live match.
            if os.path.exists(fn):
                return detail, fn
        return False

    def clear_expired(self, expiry_time: int) -> tuple[list[str], bool]:
        """Remove expired metadata from the cache.

        Returns names of files corresponding to expired metadata and a boolean
        flag indicating whether the writable cache is empty. Caller is
        responsible for deleting the expired files.
        """
        expired_files = []
        # Only the writable (last) location is pruned; iterate a copy since
        # entries are popped during the loop.
        for path, detail in self.cached_files[-1].copy().items():
            if time.time() - detail["time"] > expiry_time:
                fn = detail.get("fn", "")
                if not fn:
                    raise RuntimeError(
                        f"Cache metadata does not contain 'fn' for {path}"
                    )
                fn = os.path.join(self._storage[-1], fn)
                expired_files.append(fn)
                self.cached_files[-1].pop(path)

        if self.cached_files[-1]:
            cache_path = os.path.join(self._storage[-1], "cache")
            self._save(self.cached_files[-1], cache_path)

        writable_cache_empty = not self.cached_files[-1]
        return expired_files, writable_cache_empty

    def load(self) -> None:
        """Load all metadata from disk and store in ``self.cached_files``"""
        cached_files = []
        for fn, _, _ in self._scan_locations():
            if os.path.exists(fn):
                # TODO: consolidate blocks here
                cached_files.append(self._load(fn))
            else:
                cached_files.append({})
        self.cached_files = cached_files or [{}]

    def on_close_cached_file(self, f: Any, path: str) -> None:
        """Perform side-effect actions on closing a cached file.

        The actual closing of the file is the responsibility of the caller.
        """
        # File must be writable, so in self.cached_files[-1]
        c = self.cached_files[-1][path]
        # If every block has been fetched, collapse the block set to True
        # ("whole file cached").
        if c["blocks"] is not True and len(c["blocks"]) * f.blocksize >= f.size:
            c["blocks"] = True

    def pop_file(self, path: str) -> str | None:
        """Remove metadata of cached file.

        If path is in the cache, return the filename of the cached file,
        otherwise return ``None``. Caller is responsible for deleting the
        cached file.
        """
        details = self.check_file(path, None)
        if not details:
            return None
        _, fn = details
        if fn.startswith(self._storage[-1]):
            self.cached_files[-1].pop(path)
            self.save()
        else:
            raise PermissionError(
                "Can only delete cached file in last, writable cache location"
            )
        return fn

    def save(self) -> None:
        """Save metadata to disk"""
        for (fn, _, writable), cache in zip(self._scan_locations(), self.cached_files):
            if not writable:
                continue

            if os.path.exists(fn):
                # Merge with what is already on disk (another process may
                # have written since we loaded).
                cached_files = self._load(fn)
                for k, c in cached_files.items():
                    if k in cache:
                        if c["blocks"] is True or cache[k]["blocks"] is True:
                            c["blocks"] = True
                        else:
                            # self.cached_files[*][*]["blocks"] must continue to
                            # point to the same set object so that updates
                            # performed by MMapCache are propagated back to
                            # self.cached_files.
                            blocks = cache[k]["blocks"]
                            blocks.update(c["blocks"])
                            c["blocks"] = blocks
                        c["time"] = max(c["time"], cache[k]["time"])
                        c["uid"] = cache[k]["uid"]

                # Files can be added to cache after it was written once
                for k, c in cache.items():
                    if k not in cached_files:
                        cached_files[k] = c
            else:
                cached_files = cache
            # Serialize a copy: sets become lists for JSON, leaving the
            # in-memory structures untouched.
            cache = {k: v.copy() for k, v in cached_files.items()}
            for c in cache.values():
                if isinstance(c["blocks"], set):
                    c["blocks"] = list(c["blocks"])
            self._save(cache, fn)
            self.cached_files[-1] = cached_files

    def update_file(self, path: str, detail: Detail) -> None:
        """Update metadata for specific file in memory, do not save"""
        self.cached_files[-1][path] = detail
venv/lib/python3.10/site-packages/fsspec/implementations/cached.py ADDED
@@ -0,0 +1,1021 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import inspect
4
+ import logging
5
+ import os
6
+ import tempfile
7
+ import time
8
+ import weakref
9
+ from collections.abc import Callable
10
+ from shutil import rmtree
11
+ from typing import TYPE_CHECKING, Any, ClassVar
12
+
13
+ from fsspec import filesystem
14
+ from fsspec.callbacks import DEFAULT_CALLBACK
15
+ from fsspec.compression import compr
16
+ from fsspec.core import BaseCache, MMapCache
17
+ from fsspec.exceptions import BlocksizeMismatchError
18
+ from fsspec.implementations.cache_mapper import create_cache_mapper
19
+ from fsspec.implementations.cache_metadata import CacheMetadata
20
+ from fsspec.implementations.chained import ChainedFileSystem
21
+ from fsspec.implementations.local import LocalFileSystem
22
+ from fsspec.spec import AbstractBufferedFile
23
+ from fsspec.transaction import Transaction
24
+ from fsspec.utils import infer_compression
25
+
26
+ if TYPE_CHECKING:
27
+ from fsspec.implementations.cache_mapper import AbstractCacheMapper
28
+
29
+ logger = logging.getLogger("fsspec.cached")
30
+
31
+
32
+ class WriteCachedTransaction(Transaction):
33
+ def complete(self, commit=True):
34
+ rpaths = [f.path for f in self.files]
35
+ lpaths = [f.fn for f in self.files]
36
+ if commit:
37
+ self.fs.put(lpaths, rpaths)
38
+ self.files.clear()
39
+ self.fs._intrans = False
40
+ self.fs._transaction = None
41
+ self.fs = None # break cycle
42
+
43
+
44
+ class CachingFileSystem(ChainedFileSystem):
45
+ """Locally caching filesystem, layer over any other FS
46
+
47
+ This class implements chunk-wise local storage of remote files, for quick
48
+ access after the initial download. The files are stored in a given
49
+ directory with hashes of URLs for the filenames. If no directory is given,
50
+ a temporary one is used, which should be cleaned up by the OS after the
51
+ process ends. The files themselves are sparse (as implemented in
52
+ :class:`~fsspec.caching.MMapCache`), so only the data which is accessed
53
+ takes up space.
54
+
55
+ Restrictions:
56
+
57
+ - the block-size must be the same for each access of a given file, unless
58
+ all blocks of the file have already been read
59
+ - caching can only be applied to file-systems which produce files
60
+ derived from fsspec.spec.AbstractBufferedFile ; LocalFileSystem is also
61
+ allowed, for testing
62
+ """
63
+
64
+ protocol: ClassVar[str | tuple[str, ...]] = ("blockcache", "cached")
65
+ _strip_tokenize_options = ("fo",)
66
+
67
+ def __init__(
68
+ self,
69
+ target_protocol=None,
70
+ cache_storage="TMP",
71
+ cache_check=10,
72
+ check_files=False,
73
+ expiry_time=604800,
74
+ target_options=None,
75
+ fs=None,
76
+ same_names: bool | None = None,
77
+ compression=None,
78
+ cache_mapper: AbstractCacheMapper | None = None,
79
+ **kwargs,
80
+ ):
81
+ """
82
+
83
+ Parameters
84
+ ----------
85
+ target_protocol: str (optional)
86
+ Target filesystem protocol. Provide either this or ``fs``.
87
+ cache_storage: str or list(str)
88
+ Location to store files. If "TMP", this is a temporary directory,
89
+ and will be cleaned up by the OS when this process ends (or later).
90
+ If a list, each location will be tried in the order given, but
91
+ only the last will be considered writable.
92
+ cache_check: int
93
+ Number of seconds between reload of cache metadata
94
+ check_files: bool
95
+ Whether to explicitly see if the UID of the remote file matches
96
+ the stored one before using. Warning: some file systems such as
97
+ HTTP cannot reliably give a unique hash of the contents of some
98
+ path, so be sure to set this option to False.
99
+ expiry_time: int
100
+ The time in seconds after which a local copy is considered useless.
101
+ Set to falsy to prevent expiry. The default is equivalent to one
102
+ week.
103
+ target_options: dict or None
104
+ Passed to the instantiation of the FS, if fs is None.
105
+ fs: filesystem instance
106
+ The target filesystem to run against. Provide this or ``protocol``.
107
+ same_names: bool (optional)
108
+ By default, target URLs are hashed using a ``HashCacheMapper`` so
109
+ that files from different backends with the same basename do not
110
+ conflict. If this argument is ``true``, a ``BasenameCacheMapper``
111
+ is used instead. Other cache mapper options are available by using
112
+ the ``cache_mapper`` keyword argument. Only one of this and
113
+ ``cache_mapper`` should be specified.
114
+ compression: str (optional)
115
+ To decompress on download. Can be 'infer' (guess from the URL name),
116
+ one of the entries in ``fsspec.compression.compr``, or None for no
117
+ decompression.
118
+ cache_mapper: AbstractCacheMapper (optional)
119
+ The object use to map from original filenames to cached filenames.
120
+ Only one of this and ``same_names`` should be specified.
121
+ """
122
+ super().__init__(**kwargs)
123
+ if fs is None and target_protocol is None:
124
+ raise ValueError(
125
+ "Please provide filesystem instance(fs) or target_protocol"
126
+ )
127
+ if not (fs is None) ^ (target_protocol is None):
128
+ raise ValueError(
129
+ "Both filesystems (fs) and target_protocol may not be both given."
130
+ )
131
+ if cache_storage == "TMP":
132
+ tempdir = tempfile.mkdtemp()
133
+ storage = [tempdir]
134
+ weakref.finalize(self, self._remove_tempdir, tempdir)
135
+ else:
136
+ if isinstance(cache_storage, str):
137
+ storage = [cache_storage]
138
+ else:
139
+ storage = cache_storage
140
+ os.makedirs(storage[-1], exist_ok=True)
141
+ self.storage = storage
142
+ self.kwargs = target_options or {}
143
+ self.cache_check = cache_check
144
+ self.check_files = check_files
145
+ self.expiry = expiry_time
146
+ self.compression = compression
147
+
148
+ # Size of cache in bytes. If None then the size is unknown and will be
149
+ # recalculated the next time cache_size() is called. On writes to the
150
+ # cache this is reset to None.
151
+ self._cache_size = None
152
+
153
+ if same_names is not None and cache_mapper is not None:
154
+ raise ValueError(
155
+ "Cannot specify both same_names and cache_mapper in "
156
+ "CachingFileSystem.__init__"
157
+ )
158
+ if cache_mapper is not None:
159
+ self._mapper = cache_mapper
160
+ else:
161
+ self._mapper = create_cache_mapper(
162
+ same_names if same_names is not None else False
163
+ )
164
+
165
+ self.target_protocol = (
166
+ target_protocol
167
+ if isinstance(target_protocol, str)
168
+ else (fs.protocol if isinstance(fs.protocol, str) else fs.protocol[0])
169
+ )
170
+ self._metadata = CacheMetadata(self.storage)
171
+ self.load_cache()
172
+ self.fs = fs if fs is not None else filesystem(target_protocol, **self.kwargs)
173
+
174
+ def _strip_protocol(path):
175
+ # acts as a method, since each instance has a difference target
176
+ return self.fs._strip_protocol(type(self)._strip_protocol(path))
177
+
178
+ self._strip_protocol: Callable = _strip_protocol
179
+
180
+ @staticmethod
181
+ def _remove_tempdir(tempdir):
182
+ try:
183
+ rmtree(tempdir)
184
+ except Exception:
185
+ pass
186
+
187
+ def _mkcache(self):
188
+ os.makedirs(self.storage[-1], exist_ok=True)
189
+
190
+ def cache_size(self):
191
+ """Return size of cache in bytes.
192
+
193
+ If more than one cache directory is in use, only the size of the last
194
+ one (the writable cache directory) is returned.
195
+ """
196
+ if self._cache_size is None:
197
+ cache_dir = self.storage[-1]
198
+ self._cache_size = filesystem("file").du(cache_dir, withdirs=True)
199
+ return self._cache_size
200
+
201
+ def load_cache(self):
202
+ """Read set of stored blocks from file"""
203
+ self._metadata.load()
204
+ self._mkcache()
205
+ self.last_cache = time.time()
206
+
207
+ def save_cache(self):
208
+ """Save set of stored blocks from file"""
209
+ self._mkcache()
210
+ self._metadata.save()
211
+ self.last_cache = time.time()
212
+ self._cache_size = None
213
+
214
+ def _check_cache(self):
215
+ """Reload caches if time elapsed or any disappeared"""
216
+ self._mkcache()
217
+ if not self.cache_check:
218
+ # explicitly told not to bother checking
219
+ return
220
+ timecond = time.time() - self.last_cache > self.cache_check
221
+ existcond = all(os.path.exists(storage) for storage in self.storage)
222
+ if timecond or not existcond:
223
+ self.load_cache()
224
+
225
+ def _check_file(self, path):
226
+ """Is path in cache and still valid"""
227
+ path = self._strip_protocol(path)
228
+ self._check_cache()
229
+ return self._metadata.check_file(path, self)
230
+
231
+ def clear_cache(self):
232
+ """Remove all files and metadata from the cache
233
+
234
+ In the case of multiple cache locations, this clears only the last one,
235
+ which is assumed to be the read/write one.
236
+ """
237
+ rmtree(self.storage[-1])
238
+ self.load_cache()
239
+ self._cache_size = None
240
+
241
+ def clear_expired_cache(self, expiry_time=None):
242
+ """Remove all expired files and metadata from the cache
243
+
244
+ In the case of multiple cache locations, this clears only the last one,
245
+ which is assumed to be the read/write one.
246
+
247
+ Parameters
248
+ ----------
249
+ expiry_time: int
250
+ The time in seconds after which a local copy is considered useless.
251
+ If not defined the default is equivalent to the attribute from the
252
+ file caching instantiation.
253
+ """
254
+
255
+ if not expiry_time:
256
+ expiry_time = self.expiry
257
+
258
+ self._check_cache()
259
+
260
+ expired_files, writable_cache_empty = self._metadata.clear_expired(expiry_time)
261
+ for fn in expired_files:
262
+ if os.path.exists(fn):
263
+ os.remove(fn)
264
+
265
+ if writable_cache_empty:
266
+ rmtree(self.storage[-1])
267
+ self.load_cache()
268
+
269
+ self._cache_size = None
270
+
271
+ def pop_from_cache(self, path):
272
+ """Remove cached version of given file
273
+
274
+ Deletes local copy of the given (remote) path. If it is found in a cache
275
+ location which is not the last, it is assumed to be read-only, and
276
+ raises PermissionError
277
+ """
278
+ path = self._strip_protocol(path)
279
+ fn = self._metadata.pop_file(path)
280
+ if fn is not None:
281
+ os.remove(fn)
282
+ self._cache_size = None
283
+
284
+ def _open(
285
+ self,
286
+ path,
287
+ mode="rb",
288
+ block_size=None,
289
+ autocommit=True,
290
+ cache_options=None,
291
+ **kwargs,
292
+ ):
293
+ """Wrap the target _open
294
+
295
+ If the whole file exists in the cache, just open it locally and
296
+ return that.
297
+
298
+ Otherwise, open the file on the target FS, and make it have a mmap
299
+ cache pointing to the location which we determine, in our cache.
300
+ The ``blocks`` instance is shared, so as the mmap cache instance
301
+ updates, so does the entry in our ``cached_files`` attribute.
302
+ We monkey-patch this file, so that when it closes, we call
303
+ ``close_and_update`` to save the state of the blocks.
304
+ """
305
+ path = self._strip_protocol(path)
306
+
307
+ path = self.fs._strip_protocol(path)
308
+ if "r" not in mode:
309
+ return self.fs._open(
310
+ path,
311
+ mode=mode,
312
+ block_size=block_size,
313
+ autocommit=autocommit,
314
+ cache_options=cache_options,
315
+ **kwargs,
316
+ )
317
+ detail = self._check_file(path)
318
+ if detail:
319
+ # file is in cache
320
+ detail, fn = detail
321
+ hash, blocks = detail["fn"], detail["blocks"]
322
+ if blocks is True:
323
+ # stored file is complete
324
+ logger.debug("Opening local copy of %s", path)
325
+ return open(fn, mode)
326
+ # TODO: action where partial file exists in read-only cache
327
+ logger.debug("Opening partially cached copy of %s", path)
328
+ else:
329
+ hash = self._mapper(path)
330
+ fn = os.path.join(self.storage[-1], hash)
331
+ blocks = set()
332
+ detail = {
333
+ "original": path,
334
+ "fn": hash,
335
+ "blocks": blocks,
336
+ "time": time.time(),
337
+ "uid": self.fs.ukey(path),
338
+ }
339
+ self._metadata.update_file(path, detail)
340
+ logger.debug("Creating local sparse file for %s", path)
341
+
342
+ # explicitly submitting the size to the open call will avoid extra
343
+ # operations when opening. This is particularly relevant
344
+ # for any file that is read over a network, e.g. S3.
345
+ size = detail.get("size")
346
+
347
+ # call target filesystems open
348
+ self._mkcache()
349
+ f = self.fs._open(
350
+ path,
351
+ mode=mode,
352
+ block_size=block_size,
353
+ autocommit=autocommit,
354
+ cache_options=cache_options,
355
+ cache_type="none",
356
+ size=size,
357
+ **kwargs,
358
+ )
359
+
360
+ # set size if not already set
361
+ if size is None:
362
+ detail["size"] = f.size
363
+ self._metadata.update_file(path, detail)
364
+
365
+ if self.compression:
366
+ comp = (
367
+ infer_compression(path)
368
+ if self.compression == "infer"
369
+ else self.compression
370
+ )
371
+ f = compr[comp](f, mode="rb")
372
+ if "blocksize" in detail:
373
+ if detail["blocksize"] != f.blocksize:
374
+ raise BlocksizeMismatchError(
375
+ f"Cached file must be reopened with same block"
376
+ f" size as original (old: {detail['blocksize']},"
377
+ f" new {f.blocksize})"
378
+ )
379
+ else:
380
+ detail["blocksize"] = f.blocksize
381
+
382
+ def _fetch_ranges(ranges):
383
+ return self.fs.cat_ranges(
384
+ [path] * len(ranges),
385
+ [r[0] for r in ranges],
386
+ [r[1] for r in ranges],
387
+ **kwargs,
388
+ )
389
+
390
+ multi_fetcher = None if self.compression else _fetch_ranges
391
+ f.cache = MMapCache(
392
+ f.blocksize, f._fetch_range, f.size, fn, blocks, multi_fetcher=multi_fetcher
393
+ )
394
+ close = f.close
395
+ f.close = lambda: self.close_and_update(f, close)
396
+ self.save_cache()
397
+ return f
398
+
399
+ def _parent(self, path):
400
+ return self.fs._parent(path)
401
+
402
+ def hash_name(self, path: str, *args: Any) -> str:
403
+ # Kept for backward compatibility with downstream libraries.
404
+ # Ignores extra arguments, previously same_name boolean.
405
+ return self._mapper(path)
406
+
407
+ def close_and_update(self, f, close):
408
+ """Called when a file is closing, so store the set of blocks"""
409
+ if f.closed:
410
+ return
411
+ path = self._strip_protocol(f.path)
412
+ self._metadata.on_close_cached_file(f, path)
413
+ try:
414
+ logger.debug("going to save")
415
+ self.save_cache()
416
+ logger.debug("saved")
417
+ except OSError:
418
+ logger.debug("Cache saving failed while closing file")
419
+ except NameError:
420
+ logger.debug("Cache save failed due to interpreter shutdown")
421
+ close()
422
+ f.closed = True
423
+
424
+ def ls(self, path, detail=True):
425
+ return self.fs.ls(path, detail)
426
+
427
+ def __getattribute__(self, item):
428
+ if item in {
429
+ "load_cache",
430
+ "_get_cached_file_before_open",
431
+ "_open",
432
+ "save_cache",
433
+ "close_and_update",
434
+ "__init__",
435
+ "__getattribute__",
436
+ "__reduce__",
437
+ "_make_local_details",
438
+ "open",
439
+ "cat",
440
+ "cat_file",
441
+ "_cat_file",
442
+ "cat_ranges",
443
+ "_cat_ranges",
444
+ "get",
445
+ "read_block",
446
+ "tail",
447
+ "head",
448
+ "info",
449
+ "ls",
450
+ "exists",
451
+ "isfile",
452
+ "isdir",
453
+ "_check_file",
454
+ "_check_cache",
455
+ "_mkcache",
456
+ "clear_cache",
457
+ "clear_expired_cache",
458
+ "pop_from_cache",
459
+ "local_file",
460
+ "_paths_from_path",
461
+ "get_mapper",
462
+ "open_many",
463
+ "commit_many",
464
+ "hash_name",
465
+ "__hash__",
466
+ "__eq__",
467
+ "to_json",
468
+ "to_dict",
469
+ "cache_size",
470
+ "pipe_file",
471
+ "pipe",
472
+ "start_transaction",
473
+ "end_transaction",
474
+ }:
475
+ # all the methods defined in this class. Note `open` here, since
476
+ # it calls `_open`, but is actually in superclass
477
+ return lambda *args, **kw: getattr(type(self), item).__get__(self)(
478
+ *args, **kw
479
+ )
480
+ if item in ["__reduce_ex__"]:
481
+ raise AttributeError
482
+ if item in ["transaction"]:
483
+ # property
484
+ return type(self).transaction.__get__(self)
485
+ if item in {"_cache", "transaction_type", "protocol"}:
486
+ # class attributes
487
+ return getattr(type(self), item)
488
+ if item == "__class__":
489
+ return type(self)
490
+ d = object.__getattribute__(self, "__dict__")
491
+ fs = d.get("fs", None) # fs is not immediately defined
492
+ if item in d:
493
+ return d[item]
494
+ elif fs is not None:
495
+ if item in fs.__dict__:
496
+ # attribute of instance
497
+ return fs.__dict__[item]
498
+ # attributed belonging to the target filesystem
499
+ cls = type(fs)
500
+ m = getattr(cls, item)
501
+ if (inspect.isfunction(m) or inspect.isdatadescriptor(m)) and (
502
+ not hasattr(m, "__self__") or m.__self__ is None
503
+ ):
504
+ # instance method
505
+ return m.__get__(fs, cls)
506
+ return m # class method or attribute
507
+ else:
508
+ # attributes of the superclass, while target is being set up
509
+ return super().__getattribute__(item)
510
+
511
+ def __eq__(self, other):
512
+ """Test for equality."""
513
+ if self is other:
514
+ return True
515
+ if not isinstance(other, type(self)):
516
+ return False
517
+ return (
518
+ self.storage == other.storage
519
+ and self.kwargs == other.kwargs
520
+ and self.cache_check == other.cache_check
521
+ and self.check_files == other.check_files
522
+ and self.expiry == other.expiry
523
+ and self.compression == other.compression
524
+ and self._mapper == other._mapper
525
+ and self.target_protocol == other.target_protocol
526
+ )
527
+
528
+ def __hash__(self):
529
+ """Calculate hash."""
530
+ return (
531
+ hash(tuple(self.storage))
532
+ ^ hash(str(self.kwargs))
533
+ ^ hash(self.cache_check)
534
+ ^ hash(self.check_files)
535
+ ^ hash(self.expiry)
536
+ ^ hash(self.compression)
537
+ ^ hash(self._mapper)
538
+ ^ hash(self.target_protocol)
539
+ )
540
+
541
+
542
+ class WholeFileCacheFileSystem(CachingFileSystem):
543
+ """Caches whole remote files on first access
544
+
545
+ This class is intended as a layer over any other file system, and
546
+ will make a local copy of each file accessed, so that all subsequent
547
+ reads are local. This is similar to ``CachingFileSystem``, but without
548
+ the block-wise functionality and so can work even when sparse files
549
+ are not allowed. See its docstring for definition of the init
550
+ arguments.
551
+
552
+ The class still needs access to the remote store for listing files,
553
+ and may refresh cached files.
554
+ """
555
+
556
+ protocol = "filecache"
557
+ local_file = True
558
+
559
+ def open_many(self, open_files, **kwargs):
560
+ paths = [of.path for of in open_files]
561
+ if "r" in open_files.mode:
562
+ self._mkcache()
563
+ else:
564
+ return [
565
+ LocalTempFile(
566
+ self.fs,
567
+ path,
568
+ mode=open_files.mode,
569
+ fn=os.path.join(self.storage[-1], self._mapper(path)),
570
+ **kwargs,
571
+ )
572
+ for path in paths
573
+ ]
574
+
575
+ if self.compression:
576
+ raise NotImplementedError
577
+ details = [self._check_file(sp) for sp in paths]
578
+ downpath = [p for p, d in zip(paths, details) if not d]
579
+ downfn0 = [
580
+ os.path.join(self.storage[-1], self._mapper(p))
581
+ for p, d in zip(paths, details)
582
+ ] # keep these path names for opening later
583
+ downfn = [fn for fn, d in zip(downfn0, details) if not d]
584
+ if downpath:
585
+ # skip if all files are already cached and up to date
586
+ self.fs.get(downpath, downfn)
587
+
588
+ # update metadata - only happens when downloads are successful
589
+ newdetail = [
590
+ {
591
+ "original": path,
592
+ "fn": self._mapper(path),
593
+ "blocks": True,
594
+ "time": time.time(),
595
+ "uid": self.fs.ukey(path),
596
+ }
597
+ for path in downpath
598
+ ]
599
+ for path, detail in zip(downpath, newdetail):
600
+ self._metadata.update_file(path, detail)
601
+ self.save_cache()
602
+
603
+ def firstpart(fn):
604
+ # helper to adapt both whole-file and simple-cache
605
+ return fn[1] if isinstance(fn, tuple) else fn
606
+
607
+ return [
608
+ open(firstpart(fn0) if fn0 else fn1, mode=open_files.mode)
609
+ for fn0, fn1 in zip(details, downfn0)
610
+ ]
611
+
612
+ def commit_many(self, open_files):
613
+ self.fs.put([f.fn for f in open_files], [f.path for f in open_files])
614
+ [f.close() for f in open_files]
615
+ for f in open_files:
616
+ # in case autocommit is off, and so close did not already delete
617
+ try:
618
+ os.remove(f.name)
619
+ except FileNotFoundError:
620
+ pass
621
+ self._cache_size = None
622
+
623
+ def _make_local_details(self, path):
624
+ hash = self._mapper(path)
625
+ fn = os.path.join(self.storage[-1], hash)
626
+ detail = {
627
+ "original": path,
628
+ "fn": hash,
629
+ "blocks": True,
630
+ "time": time.time(),
631
+ "uid": self.fs.ukey(path),
632
+ }
633
+ self._metadata.update_file(path, detail)
634
+ logger.debug("Copying %s to local cache", path)
635
+ return fn
636
+
637
+ def cat(
638
+ self,
639
+ path,
640
+ recursive=False,
641
+ on_error="raise",
642
+ callback=DEFAULT_CALLBACK,
643
+ **kwargs,
644
+ ):
645
+ paths = self.expand_path(
646
+ path, recursive=recursive, maxdepth=kwargs.get("maxdepth")
647
+ )
648
+ getpaths = []
649
+ storepaths = []
650
+ fns = []
651
+ out = {}
652
+ for p in paths.copy():
653
+ try:
654
+ detail = self._check_file(p)
655
+ if not detail:
656
+ fn = self._make_local_details(p)
657
+ getpaths.append(p)
658
+ storepaths.append(fn)
659
+ else:
660
+ detail, fn = detail if isinstance(detail, tuple) else (None, detail)
661
+ fns.append(fn)
662
+ except Exception as e:
663
+ if on_error == "raise":
664
+ raise
665
+ if on_error == "return":
666
+ out[p] = e
667
+ paths.remove(p)
668
+
669
+ if getpaths:
670
+ self.fs.get(getpaths, storepaths)
671
+ self.save_cache()
672
+
673
+ callback.set_size(len(paths))
674
+ for p, fn in zip(paths, fns):
675
+ with open(fn, "rb") as f:
676
+ out[p] = f.read()
677
+ callback.relative_update(1)
678
+ if isinstance(path, str) and len(paths) == 1 and recursive is False:
679
+ out = out[paths[0]]
680
+ return out
681
+
682
+ def _get_cached_file_before_open(self, path, **kwargs):
683
+ fn = self._make_local_details(path)
684
+ # call target filesystems open
685
+ self._mkcache()
686
+ if self.compression:
687
+ with self.fs._open(path, mode="rb", **kwargs) as f, open(fn, "wb") as f2:
688
+ if isinstance(f, AbstractBufferedFile):
689
+ # want no type of caching if just downloading whole thing
690
+ f.cache = BaseCache(0, f.cache.fetcher, f.size)
691
+ comp = (
692
+ infer_compression(path)
693
+ if self.compression == "infer"
694
+ else self.compression
695
+ )
696
+ f = compr[comp](f, mode="rb")
697
+ data = True
698
+ while data:
699
+ block = getattr(f, "blocksize", 5 * 2**20)
700
+ data = f.read(block)
701
+ f2.write(data)
702
+ else:
703
+ self.fs.get_file(path, fn)
704
+ self.save_cache()
705
+
706
+ def _open(self, path, mode="rb", **kwargs):
707
+ path = self._strip_protocol(path)
708
+ # For read (or append), (try) download from remote
709
+ if "r" in mode or "a" in mode:
710
+ if not self._check_file(path):
711
+ if self.fs.exists(path):
712
+ self._get_cached_file_before_open(path, **kwargs)
713
+ elif "r" in mode:
714
+ raise FileNotFoundError(path)
715
+
716
+ detail, fn = self._check_file(path)
717
+ _, blocks = detail["fn"], detail["blocks"]
718
+ if blocks is True:
719
+ logger.debug("Opening local copy of %s", path)
720
+ else:
721
+ raise ValueError(
722
+ f"Attempt to open partially cached file {path}"
723
+ f" as a wholly cached file"
724
+ )
725
+
726
+ # Just reading does not need special file handling
727
+ if "r" in mode and "+" not in mode:
728
+ # In order to support downstream filesystems to be able to
729
+ # infer the compression from the original filename, like
730
+ # the `TarFileSystem`, let's extend the `io.BufferedReader`
731
+ # fileobject protocol by adding a dedicated attribute
732
+ # `original`.
733
+ f = open(fn, mode)
734
+ f.original = detail.get("original")
735
+ return f
736
+
737
+ hash = self._mapper(path)
738
+ fn = os.path.join(self.storage[-1], hash)
739
+ user_specified_kwargs = {
740
+ k: v
741
+ for k, v in kwargs.items()
742
+ # those kwargs were added by open(), we don't want them
743
+ if k not in ["autocommit", "block_size", "cache_options"]
744
+ }
745
+ return LocalTempFile(self, path, mode=mode, fn=fn, **user_specified_kwargs)
746
+
747
+
748
+ class SimpleCacheFileSystem(WholeFileCacheFileSystem):
749
+ """Caches whole remote files on first access
750
+
751
+ This class is intended as a layer over any other file system, and
752
+ will make a local copy of each file accessed, so that all subsequent
753
+ reads are local. This implementation only copies whole files, and
754
+ does not keep any metadata about the download time or file details.
755
+ It is therefore safer to use in multi-threaded/concurrent situations.
756
+
757
+ This is the only of the caching filesystems that supports write: you will
758
+ be given a real local open file, and upon close and commit, it will be
759
+ uploaded to the target filesystem; the writability or the target URL is
760
+ not checked until that time.
761
+
762
+ """
763
+
764
+ protocol = "simplecache"
765
+ local_file = True
766
+ transaction_type = WriteCachedTransaction
767
+
768
+ def __init__(self, **kwargs):
769
+ kw = kwargs.copy()
770
+ for key in ["cache_check", "expiry_time", "check_files"]:
771
+ kw[key] = False
772
+ super().__init__(**kw)
773
+ for storage in self.storage:
774
+ if not os.path.exists(storage):
775
+ os.makedirs(storage, exist_ok=True)
776
+
777
+ def _check_file(self, path):
778
+ self._check_cache()
779
+ sha = self._mapper(path)
780
+ for storage in self.storage:
781
+ fn = os.path.join(storage, sha)
782
+ if os.path.exists(fn):
783
+ return fn
784
+
785
+ def save_cache(self):
786
+ pass
787
+
788
+ def load_cache(self):
789
+ pass
790
+
791
+ def pipe_file(self, path, value=None, **kwargs):
792
+ if self._intrans:
793
+ with self.open(path, "wb") as f:
794
+ f.write(value)
795
+ else:
796
+ super().pipe_file(path, value)
797
+
798
+ def ls(self, path, detail=True, **kwargs):
799
+ path = self._strip_protocol(path)
800
+ details = []
801
+ try:
802
+ details = self.fs.ls(
803
+ path, detail=True, **kwargs
804
+ ).copy() # don't edit original!
805
+ except FileNotFoundError as e:
806
+ ex = e
807
+ else:
808
+ ex = None
809
+ if self._intrans:
810
+ path1 = path.rstrip("/") + "/"
811
+ for f in self.transaction.files:
812
+ if f.path == path:
813
+ details.append(
814
+ {"name": path, "size": f.size or f.tell(), "type": "file"}
815
+ )
816
+ elif f.path.startswith(path1):
817
+ if f.path.count("/") == path1.count("/"):
818
+ details.append(
819
+ {"name": f.path, "size": f.size or f.tell(), "type": "file"}
820
+ )
821
+ else:
822
+ dname = "/".join(f.path.split("/")[: path1.count("/") + 1])
823
+ details.append({"name": dname, "size": 0, "type": "directory"})
824
+ if ex is not None and not details:
825
+ raise ex
826
+ if detail:
827
+ return details
828
+ return sorted(_["name"] for _ in details)
829
+
830
+ def info(self, path, **kwargs):
831
+ path = self._strip_protocol(path)
832
+ if self._intrans:
833
+ f = [_ for _ in self.transaction.files if _.path == path]
834
+ if f:
835
+ size = os.path.getsize(f[0].fn) if f[0].closed else f[0].tell()
836
+ return {"name": path, "size": size, "type": "file"}
837
+ f = any(_.path.startswith(path + "/") for _ in self.transaction.files)
838
+ if f:
839
+ return {"name": path, "size": 0, "type": "directory"}
840
+ return self.fs.info(path, **kwargs)
841
+
842
+ def pipe(self, path, value=None, **kwargs):
843
+ if isinstance(path, str):
844
+ self.pipe_file(self._strip_protocol(path), value, **kwargs)
845
+ elif isinstance(path, dict):
846
+ for k, v in path.items():
847
+ self.pipe_file(self._strip_protocol(k), v, **kwargs)
848
+ else:
849
+ raise ValueError("path must be str or dict")
850
+
851
+ async def _cat_file(self, path, start=None, end=None, **kwargs):
852
+ logger.debug("async cat_file %s", path)
853
+ path = self._strip_protocol(path)
854
+ sha = self._mapper(path)
855
+ fn = self._check_file(path)
856
+
857
+ if not fn:
858
+ fn = os.path.join(self.storage[-1], sha)
859
+ await self.fs._get_file(path, fn, **kwargs)
860
+
861
+ with open(fn, "rb") as f: # noqa ASYNC230
862
+ if start:
863
+ f.seek(start)
864
+ size = -1 if end is None else end - f.tell()
865
+ return f.read(size)
866
+
867
+ async def _cat_ranges(
868
+ self, paths, starts, ends, max_gap=None, on_error="return", **kwargs
869
+ ):
870
+ logger.debug("async cat ranges %s", paths)
871
+ lpaths = []
872
+ rset = set()
873
+ download = []
874
+ rpaths = []
875
+ for p in paths:
876
+ fn = self._check_file(p)
877
+ if fn is None and p not in rset:
878
+ sha = self._mapper(p)
879
+ fn = os.path.join(self.storage[-1], sha)
880
+ download.append(fn)
881
+ rset.add(p)
882
+ rpaths.append(p)
883
+ lpaths.append(fn)
884
+ if download:
885
+ await self.fs._get(rpaths, download, on_error=on_error)
886
+
887
+ return LocalFileSystem().cat_ranges(
888
+ lpaths, starts, ends, max_gap=max_gap, on_error=on_error, **kwargs
889
+ )
890
+
891
+ def cat_ranges(
892
+ self, paths, starts, ends, max_gap=None, on_error="return", **kwargs
893
+ ):
894
+ logger.debug("cat ranges %s", paths)
895
+ lpaths = [self._check_file(p) for p in paths]
896
+ rpaths = [p for l, p in zip(lpaths, paths) if l is False]
897
+ lpaths = [l for l, p in zip(lpaths, paths) if l is False]
898
+ self.fs.get(rpaths, lpaths)
899
+ paths = [self._check_file(p) for p in paths]
900
+ return LocalFileSystem().cat_ranges(
901
+ paths, starts, ends, max_gap=max_gap, on_error=on_error, **kwargs
902
+ )
903
+
904
+ def _get_cached_file_before_open(self, path, **kwargs):
905
+ sha = self._mapper(path)
906
+ fn = os.path.join(self.storage[-1], sha)
907
+ logger.debug("Copying %s to local cache", path)
908
+
909
+ self._mkcache()
910
+ self._cache_size = None
911
+
912
+ if self.compression:
913
+ with self.fs._open(path, mode="rb", **kwargs) as f, open(fn, "wb") as f2:
914
+ if isinstance(f, AbstractBufferedFile):
915
+ # want no type of caching if just downloading whole thing
916
+ f.cache = BaseCache(0, f.cache.fetcher, f.size)
917
+ comp = (
918
+ infer_compression(path)
919
+ if self.compression == "infer"
920
+ else self.compression
921
+ )
922
+ f = compr[comp](f, mode="rb")
923
+ data = True
924
+ while data:
925
+ block = getattr(f, "blocksize", 5 * 2**20)
926
+ data = f.read(block)
927
+ f2.write(data)
928
+ else:
929
+ self.fs.get_file(path, fn)
930
+
931
+ def _open(self, path, mode="rb", **kwargs):
932
+ path = self._strip_protocol(path)
933
+ sha = self._mapper(path)
934
+
935
+ # For read (or append), (try) download from remote
936
+ if "r" in mode or "a" in mode:
937
+ if not self._check_file(path):
938
+ # append does not require an existing file but read does
939
+ if self.fs.exists(path):
940
+ self._get_cached_file_before_open(path, **kwargs)
941
+ elif "r" in mode:
942
+ raise FileNotFoundError(path)
943
+
944
+ fn = self._check_file(path)
945
+ # Just reading does not need special file handling
946
+ if "r" in mode and "+" not in mode:
947
+ return open(fn, mode)
948
+
949
+ fn = os.path.join(self.storage[-1], sha)
950
+ user_specified_kwargs = {
951
+ k: v
952
+ for k, v in kwargs.items()
953
+ if k not in ["autocommit", "block_size", "cache_options"]
954
+ } # those were added by open()
955
+ return LocalTempFile(
956
+ self,
957
+ path,
958
+ mode=mode,
959
+ autocommit=not self._intrans,
960
+ fn=fn,
961
+ **user_specified_kwargs,
962
+ )
963
+
964
+
965
+ class LocalTempFile:
966
+ """A temporary local file, which will be uploaded on commit"""
967
+
968
+ def __init__(self, fs, path, fn, mode="wb", autocommit=True, seek=0, **kwargs):
969
+ self.fn = fn
970
+ self.fh = open(fn, mode)
971
+ self.mode = mode
972
+ if seek:
973
+ self.fh.seek(seek)
974
+ self.path = path
975
+ self.size = None
976
+ self.fs = fs
977
+ self.closed = False
978
+ self.autocommit = autocommit
979
+ self.kwargs = kwargs
980
+
981
+ def __reduce__(self):
982
+ # always open in r+b to allow continuing writing at a location
983
+ return (
984
+ LocalTempFile,
985
+ (self.fs, self.path, self.fn, "r+b", self.autocommit, self.tell()),
986
+ )
987
+
988
+ def __enter__(self):
989
+ return self.fh
990
+
991
+ def __exit__(self, exc_type, exc_val, exc_tb):
992
+ self.close()
993
+
994
+ def close(self):
995
+ # self.size = self.fh.tell()
996
+ if self.closed:
997
+ return
998
+ self.fh.close()
999
+ self.closed = True
1000
+ if self.autocommit:
1001
+ self.commit()
1002
+
1003
+ def discard(self):
1004
+ self.fh.close()
1005
+ os.remove(self.fn)
1006
+
1007
+ def commit(self):
1008
+ # calling put() with list arguments avoids path expansion and additional operations
1009
+ # like isdir()
1010
+ self.fs.put([self.fn], [self.path], **self.kwargs)
1011
+ # we do not delete the local copy, it's still in the cache.
1012
+
1013
+ @property
1014
+ def name(self):
1015
+ return self.fn
1016
+
1017
+ def __repr__(self) -> str:
1018
+ return f"LocalTempFile: {self.path}"
1019
+
1020
+ def __getattr__(self, item):
1021
+ return getattr(self.fh, item)
venv/lib/python3.10/site-packages/fsspec/implementations/chained.py ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from typing import ClassVar
2
+
3
+ from fsspec import AbstractFileSystem
4
+
5
+ __all__ = ("ChainedFileSystem",)
6
+
7
+
8
class ChainedFileSystem(AbstractFileSystem):
    """Marker base class for filesystems that layer over another filesystem.

    A "chained" filesystem wraps an inner FS and adds behaviour on top of
    it (caching is the canonical example). The class itself implements
    almost nothing; its purpose is to signal that instances are designed
    for chaining.

    Currently only ``url_to_fs`` consults this marker, using it to supply
    the path argument (``fo``) of the chained filesystem from the
    underlying filesystem. More functionality may be added here later.
    """

    protocol: ClassVar[str] = "chained"
venv/lib/python3.10/site-packages/fsspec/implementations/dask.py ADDED
@@ -0,0 +1,152 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import dask
2
+ from distributed.client import Client, _get_global_client
3
+ from distributed.worker import Worker
4
+
5
+ from fsspec import filesystem
6
+ from fsspec.spec import AbstractBufferedFile, AbstractFileSystem
7
+ from fsspec.utils import infer_storage_options
8
+
9
+
10
def _get_client(client):
    """Resolve *client* to a live distributed ``Client``.

    ``None`` falls back to the current global client; an existing
    ``Client`` instance is passed through unchanged; anything else
    (e.g. a scheduler connection string) is used to construct a new one.
    """
    if client is None:
        return _get_global_client()
    if isinstance(client, Client):
        return client
    # e.g., connection string
    return Client(client)
18
+
19
+
20
def _in_worker():
    """True when this process is running inside a dask worker."""
    return bool(Worker._instances)
22
+
23
+
24
class DaskWorkerFileSystem(AbstractFileSystem):
    """View files accessible to a worker as any other remote file-system

    When instances are run on the worker, uses the real filesystem. When
    run on the client, they call the worker to provide information or data.

    **Warning** this implementation is experimental, and read-only for now.
    """

    def __init__(
        self, target_protocol=None, target_options=None, fs=None, client=None, **kwargs
    ):
        super().__init__(**kwargs)
        # Exactly one of `fs` / `target_protocol` must be supplied.
        if not (fs is None) ^ (target_protocol is None):
            raise ValueError(
                "Please provide one of filesystem instance (fs) or"
                " target_protocol, not both"
            )
        self.target_protocol = target_protocol
        self.target_options = target_options
        self.worker = None
        self.client = client
        self.fs = fs
        self._determine_worker()

    @staticmethod
    def _get_kwargs_from_urls(path):
        # A host:port pair embedded in the URL becomes the client address.
        so = infer_storage_options(path)
        if "host" in so and "port" in so:
            return {"client": f"{so['host']}:{so['port']}"}
        return {}

    def _determine_worker(self):
        # On a worker we talk to the real filesystem directly; on the
        # client every call is proxied through dask.delayed to run remotely.
        self.worker = _in_worker()
        if self.worker:
            if self.fs is None:
                self.fs = filesystem(
                    self.target_protocol, **(self.target_options or {})
                )
        else:
            self.client = _get_client(self.client)
            self.rfs = dask.delayed(self)

    def _dispatch(self, method, *args, **kwargs):
        # Run locally when on a worker, otherwise round-trip via the cluster.
        if self.worker:
            return getattr(self.fs, method)(*args, **kwargs)
        return getattr(self.rfs, method)(*args, **kwargs).compute()

    def mkdir(self, *args, **kwargs):
        self._dispatch("mkdir", *args, **kwargs)

    def rm(self, *args, **kwargs):
        self._dispatch("rm", *args, **kwargs)

    def copy(self, *args, **kwargs):
        self._dispatch("copy", *args, **kwargs)

    def mv(self, *args, **kwargs):
        self._dispatch("mv", *args, **kwargs)

    def ls(self, *args, **kwargs):
        return self._dispatch("ls", *args, **kwargs)

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        autocommit=True,
        cache_options=None,
        **kwargs,
    ):
        common = dict(
            path=path,
            mode=mode,
            block_size=block_size,
            autocommit=autocommit,
            cache_options=cache_options,
            **kwargs,
        )
        if self.worker:
            return self.fs._open(**common)
        # On the client, hand back a lazy file that fetches ranges remotely.
        return DaskFile(fs=self, **common)

    def fetch_range(self, path, mode, start, end):
        if not self.worker:
            return self.rfs.fetch_range(path, mode, start, end).compute()
        with self._open(path, mode) as f:
            f.seek(start)
            return f.read(end - start)
135
+
136
+
137
class DaskFile(AbstractBufferedFile):
    """Buffered file whose byte ranges are fetched through a dask worker."""

    def __init__(self, mode="rb", **kwargs):
        # The dask-backed filesystem is read-only for now.
        if mode != "rb":
            raise ValueError('Remote dask files can only be opened in "rb" mode')
        super().__init__(**kwargs)

    def _initiate_upload(self):
        """Create remote file/upload"""
        # read-only: nothing to initiate
        pass

    def _upload_chunk(self, final=False):
        # read-only: nothing to upload
        pass

    def _fetch_range(self, start, end):
        """Get the specified set of bytes from remote"""
        return self.fs.fetch_range(self.path, self.mode, start, end)
venv/lib/python3.10/site-packages/fsspec/implementations/data.py ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import base64
2
+ import io
3
+ from urllib.parse import unquote
4
+
5
+ from fsspec import AbstractFileSystem
6
+
7
+
8
class DataFileSystem(AbstractFileSystem):
    """A handy decoder for data-URLs

    Example
    -------
    >>> with fsspec.open("data:,Hello%2C%20World%21") as f:
    ...     print(f.read())
    b"Hello, World!"

    See https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
    """

    protocol = "data"

    def __init__(self, **kwargs):
        """No parameters for this filesystem"""
        super().__init__(**kwargs)

    def cat_file(self, path, start=None, end=None, **kwargs):
        # Everything after the first comma is the payload; the prefix
        # decides between base64 and percent-encoded text.
        header, payload = path.split(",", 1)
        if header.endswith("base64"):
            raw = base64.b64decode(payload)
        else:
            raw = unquote(payload).encode()
        return raw[start:end]

    def info(self, path, **kwargs):
        header, name = path.split(",", 1)
        data = self.cat_file(path)
        # The mediatype sits between "data:" and the first ";" (if any).
        mime = header.split(":", 1)[1].split(";", 1)[0]
        return {"name": name, "size": len(data), "type": "file", "mimetype": mime}

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        autocommit=True,
        cache_options=None,
        **kwargs,
    ):
        if "r" not in mode:
            raise ValueError("Read only filesystem")
        return io.BytesIO(self.cat_file(path))

    @staticmethod
    def encode(data: bytes, mime: str | None = None):
        """Format the given data into data-URL syntax

        This version always base64 encodes, even when the data is ascii/url-safe.
        """
        payload = base64.b64encode(data).decode()
        return f"data:{mime or ''};base64,{payload}"
venv/lib/python3.10/site-packages/fsspec/implementations/dbfs.py ADDED
@@ -0,0 +1,496 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import base64
4
+ import urllib
5
+
6
+ import requests
7
+ from requests.adapters import HTTPAdapter, Retry
8
+ from typing_extensions import override
9
+
10
+ from fsspec import AbstractFileSystem
11
+ from fsspec.spec import AbstractBufferedFile
12
+
13
+
14
class DatabricksException(Exception):
    """
    Helper class for exceptions raised in this module.
    """

    def __init__(self, error_code, message, details=None):
        """Create a new DatabricksException"""
        super().__init__(message)
        # machine-readable DBFS error identifier, e.g. "RESOURCE_DOES_NOT_EXIST"
        self.error_code = error_code
        # human-readable description returned by the API
        self.message = message
        # optional extra payload from the API response
        self.details = details
26
+
27
+
28
class DatabricksFileSystem(AbstractFileSystem):
    """
    Get access to the Databricks filesystem implementation over HTTP.
    Can be used inside and outside of a databricks cluster.
    """

    def __init__(self, instance, token, **kwargs):
        """
        Create a new DatabricksFileSystem.

        Parameters
        ----------
        instance: str
            The instance URL of the databricks cluster.
            For example for an Azure databricks cluster, this
            has the form adb-<some-number>.<two digits>.azuredatabricks.net.
        token: str
            Your personal token. Find out more
            here: https://docs.databricks.com/dev-tools/api/latest/authentication.html
        """
        self.instance = instance
        self.token = token
        self.session = requests.Session()
        # Retry transient server-side failures with a small backoff.
        self.retries = Retry(
            total=10,
            backoff_factor=0.05,
            status_forcelist=[408, 429, 500, 502, 503, 504],
        )

        self.session.mount("https://", HTTPAdapter(max_retries=self.retries))
        self.session.headers.update({"Authorization": f"Bearer {self.token}"})

        super().__init__(**kwargs)

    @override
    def _ls_from_cache(self, path) -> list[dict[str, str | int]] | None:
        """Check cache for listing

        Returns listing, if found (may be empty list for a directory that
        exists but contains nothing), None if not in cache.

        Raises FileNotFoundError if the parent is cached but does not
        contain ``path``.
        """
        # A direct cache entry for `path` is always dropped, so listings for
        # `path` itself are re-fetched; only the parent's entry is consulted.
        self.dircache.pop(path.rstrip("/"), None)

        parent = self._parent(path)
        if parent in self.dircache:
            for entry in self.dircache[parent]:
                if entry["name"] == path.rstrip("/"):
                    if entry["type"] != "directory":
                        return [entry]
                    # it is a directory: force a real listing of its contents
                    return []
            raise FileNotFoundError(path)

    def ls(self, path, detail=True, **kwargs):
        """
        List the contents of the given path.

        Parameters
        ----------
        path: str
            Absolute path
        detail: bool
            Return not only the list of filenames,
            but also additional information on file sizes
            and types.
        """
        try:
            out = self._ls_from_cache(path)
        except FileNotFoundError:
            # This happens if the `path`'s parent was cached, but `path` is not
            # there. This suggests that `path` is new since the parent was
            # cached. Attempt to invalidate parent's cache before continuing.
            self.dircache.pop(self._parent(path), None)
            out = None

        if not out:
            try:
                r = self._send_to_api(
                    method="get", endpoint="list", json={"path": path}
                )
            except DatabricksException as e:
                if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                    raise FileNotFoundError(e.message) from e

                raise
            files = r.get("files", [])
            out = [
                {
                    "name": o["path"],
                    "type": "directory" if o["is_dir"] else "file",
                    "size": o["file_size"],
                }
                for o in files
            ]
            self.dircache[path] = out

        if detail:
            return out
        return [o["name"] for o in out]

    def makedirs(self, path, exist_ok=True):
        """
        Create a given absolute path and all of its parents.

        Parameters
        ----------
        path: str
            Absolute path to create
        exist_ok: bool
            If false, checks if the folder
            exists before creating it (and raises an
            Exception if this is the case)
        """
        if not exist_ok:
            try:
                # If the following succeeds, the path is already present
                self._send_to_api(
                    method="get", endpoint="get-status", json={"path": path}
                )
                raise FileExistsError(f"Path {path} already exists")
            except DatabricksException as e:
                # RESOURCE_DOES_NOT_EXIST means the path is free, which is
                # what we want. NOTE: other API errors are deliberately
                # ignored here (best effort); the mkdirs call below will
                # surface any persistent problem.
                if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                    pass

        try:
            self._send_to_api(method="post", endpoint="mkdirs", json={"path": path})
        except DatabricksException as e:
            if e.error_code == "RESOURCE_ALREADY_EXISTS":
                raise FileExistsError(e.message) from e

            raise
        self.invalidate_cache(self._parent(path))

    def mkdir(self, path, create_parents=True, **kwargs):
        """
        Create a given absolute path and all of its parents.

        Parameters
        ----------
        path: str
            Absolute path to create
        create_parents: bool
            Whether to create all parents or not.
            "False" is not implemented so far.
        """
        if not create_parents:
            raise NotImplementedError

        self.mkdirs(path, **kwargs)

    def rm(self, path, recursive=False, **kwargs):
        """
        Remove the file or folder at the given absolute path.

        Parameters
        ----------
        path: str
            Absolute path what to remove
        recursive: bool
            Recursively delete all files in a folder.
        """
        try:
            self._send_to_api(
                method="post",
                endpoint="delete",
                json={"path": path, "recursive": recursive},
            )
        except DatabricksException as e:
            if e.error_code == "PARTIAL_DELETE":
                # This is not really an exception, it just means the API
                # timed out before everything was deleted. Retry until the
                # delete completes; the recursive call also invalidates the
                # cache, so return here instead of falling through to the
                # unconditional re-raise below (which would wrongly raise
                # the already-handled PARTIAL_DELETE after a successful
                # retry).
                self.rm(path=path, recursive=recursive)
                return
            elif e.error_code == "IO_ERROR":
                # Using the same exception as the os module would use here
                raise OSError(e.message) from e

            raise
        self.invalidate_cache(self._parent(path))

    def mv(
        self, source_path, destination_path, recursive=False, maxdepth=None, **kwargs
    ):
        """
        Move a source to a destination path.

        A note from the original [databricks API manual]
        (https://docs.databricks.com/dev-tools/api/latest/dbfs.html#move).

        When moving a large number of files the API call will time out after
        approximately 60s, potentially resulting in partially moved data.
        Therefore, for operations that move more than 10k files, we strongly
        discourage using the DBFS REST API.

        Parameters
        ----------
        source_path: str
            From where to move (absolute path)
        destination_path: str
            To where to move (absolute path)
        recursive: bool
            Not implemented to far.
        maxdepth:
            Not implemented to far.
        """
        if recursive:
            raise NotImplementedError
        if maxdepth:
            raise NotImplementedError

        try:
            self._send_to_api(
                method="post",
                endpoint="move",
                json={"source_path": source_path, "destination_path": destination_path},
            )
        except DatabricksException as e:
            if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                raise FileNotFoundError(e.message) from e
            elif e.error_code == "RESOURCE_ALREADY_EXISTS":
                raise FileExistsError(e.message) from e

            raise
        self.invalidate_cache(self._parent(source_path))
        self.invalidate_cache(self._parent(destination_path))

    def _open(self, path, mode="rb", block_size="default", **kwargs):
        """
        Overwrite the base class method to make sure to create a DBFile.
        All arguments are copied from the base method.

        Only the default blocksize is allowed.
        """
        return DatabricksFile(self, path, mode=mode, block_size=block_size, **kwargs)

    def _send_to_api(self, method, endpoint, json):
        """
        Send the given json to the DBFS API
        using a get or post request (specified by the argument `method`).

        Parameters
        ----------
        method: str
            Which http method to use for communication; "get" or "post".
        endpoint: str
            Where to send the request to (last part of the API URL)
        json: dict
            Dictionary of information to send

        Raises
        ------
        DatabricksException
            When the API answers with an error payload.
        """
        if method == "post":
            session_call = self.session.post
        elif method == "get":
            session_call = self.session.get
        else:
            raise ValueError(f"Do not understand method {method}")

        url = urllib.parse.urljoin(f"https://{self.instance}/api/2.0/dbfs/", endpoint)

        r = session_call(url, json=json)

        # The DBFS API will return a json, also in case of an exception.
        # We want to preserve this information as good as possible.
        try:
            r.raise_for_status()
        except requests.HTTPError as e:
            # try to extract json error message
            # if that fails, fall back to the original exception
            try:
                exception_json = e.response.json()
            except Exception:
                raise e from None

            raise DatabricksException(**exception_json) from e

        return r.json()

    def _create_handle(self, path, overwrite=True):
        """
        Internal function to create a handle, which can be used to
        write blocks of a file to DBFS.
        A handle has a unique identifier which needs to be passed
        whenever written during this transaction.
        The handle is active for 10 minutes - after that a new
        write transaction needs to be created.
        Make sure to close the handle after you are finished.

        Parameters
        ----------
        path: str
            Absolute path for this file.
        overwrite: bool
            If a file already exist at this location, either overwrite
            it or raise an exception.
        """
        try:
            r = self._send_to_api(
                method="post",
                endpoint="create",
                json={"path": path, "overwrite": overwrite},
            )
            return r["handle"]
        except DatabricksException as e:
            if e.error_code == "RESOURCE_ALREADY_EXISTS":
                raise FileExistsError(e.message) from e

            raise

    def _close_handle(self, handle):
        """
        Close a handle, which was opened by :func:`_create_handle`.

        Parameters
        ----------
        handle: str
            Which handle to close.
        """
        try:
            self._send_to_api(method="post", endpoint="close", json={"handle": handle})
        except DatabricksException as e:
            if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                raise FileNotFoundError(e.message) from e

            raise

    def _add_data(self, handle, data):
        """
        Upload data to an already opened file handle
        (opened by :func:`_create_handle`).
        The maximal allowed data size is 1MB after
        conversion to base64.
        Remember to close the handle when you are finished.

        Parameters
        ----------
        handle: str
            Which handle to upload data to.
        data: bytes
            Block of data to add to the handle.
        """
        data = base64.b64encode(data).decode()
        try:
            self._send_to_api(
                method="post",
                endpoint="add-block",
                json={"handle": handle, "data": data},
            )
        except DatabricksException as e:
            if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                raise FileNotFoundError(e.message) from e
            elif e.error_code == "MAX_BLOCK_SIZE_EXCEEDED":
                raise ValueError(e.message) from e

            raise

    def _get_data(self, path, start, end):
        """
        Download data in bytes from a given absolute path in a block
        from [start, start+length].
        The maximum number of allowed bytes to read is 1MB.

        Parameters
        ----------
        path: str
            Absolute path to download data from
        start: int
            Start position of the block
        end: int
            End position of the block
        """
        try:
            r = self._send_to_api(
                method="get",
                endpoint="read",
                json={"path": path, "offset": start, "length": end - start},
            )
            return base64.b64decode(r["data"])
        except DatabricksException as e:
            if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                raise FileNotFoundError(e.message) from e
            elif e.error_code in ["INVALID_PARAMETER_VALUE", "MAX_READ_SIZE_EXCEEDED"]:
                raise ValueError(e.message) from e

            raise

    def invalidate_cache(self, path=None):
        # Drop one entry, or everything when no path is given.
        if path is None:
            self.dircache.clear()
        else:
            self.dircache.pop(path, None)
        super().invalidate_cache(path)
+ super().invalidate_cache(path)
416
+
417
+
418
class DatabricksFile(AbstractBufferedFile):
    """
    Helper class for files referenced in the DatabricksFileSystem.
    """

    DEFAULT_BLOCK_SIZE = 1 * 2**20  # only allowed block size

    def __init__(
        self,
        fs,
        path,
        mode="rb",
        block_size="default",
        autocommit=True,
        cache_type="readahead",
        cache_options=None,
        **kwargs,
    ):
        """
        Create a new instance of the DatabricksFile.

        The blocksize needs to be the default one.
        """
        if block_size is None or block_size == "default":
            block_size = self.DEFAULT_BLOCK_SIZE

        assert block_size == self.DEFAULT_BLOCK_SIZE, (
            f"Only the default block size is allowed, not {block_size}"
        )

        super().__init__(
            fs,
            path,
            mode=mode,
            block_size=block_size,
            autocommit=autocommit,
            cache_type=cache_type,
            cache_options=cache_options or {},
            **kwargs,
        )

    def _initiate_upload(self):
        """Internal function to start a file upload"""
        self.handle = self.fs._create_handle(self.path)

    def _upload_chunk(self, final=False):
        """Internal function to add a chunk of data to a started upload"""
        self.buffer.seek(0)
        data = self.buffer.getvalue()

        # The API limits each add-block call, so slice into block-sized pieces.
        for chunk_start, chunk_end in self._to_sized_blocks(len(data)):
            self.fs._add_data(handle=self.handle, data=data[chunk_start:chunk_end])

        if final:
            self.fs._close_handle(handle=self.handle)
            return True

    def _fetch_range(self, start, end):
        """Internal function to download a block of data"""
        pieces = [
            self.fs._get_data(path=self.path, start=chunk_start, end=chunk_end)
            for chunk_start, chunk_end in self._to_sized_blocks(end - start, start)
        ]
        return b"".join(pieces)

    def _to_sized_blocks(self, length, start=0):
        """Helper function to split a range from 0 to total_length into blocksizes"""
        stop = start + length
        for block_start in range(start, stop, self.blocksize):
            yield block_start, min(stop, block_start + self.blocksize)
venv/lib/python3.10/site-packages/fsspec/implementations/dirfs.py ADDED
@@ -0,0 +1,389 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from .. import filesystem
2
+ from ..asyn import AsyncFileSystem
3
+ from .chained import ChainedFileSystem
4
+
5
+
6
class DirFileSystem(AsyncFileSystem, ChainedFileSystem):
    """Directory prefix filesystem

    The DirFileSystem is a filesystem-wrapper. It assumes every path it is dealing with
    is relative to the `path`. After performing the necessary paths operation it
    delegates everything to the wrapped filesystem.
    """

    protocol = "dir"

    def __init__(
        self,
        path=None,
        fs=None,
        fo=None,
        target_protocol=None,
        target_options=None,
        **storage_options,
    ):
        """
        Parameters
        ----------
        path: str
            Path to the directory.
        fs: AbstractFileSystem
            An instantiated filesystem to wrap.
        target_protocol, target_options:
            if fs is none, construct it from these
        fo: str
            Alternate for path; do not provide both
        """
        super().__init__(**storage_options)
        if fs is None:
            fs = filesystem(protocol=target_protocol, **(target_options or {}))
        path = path or fo

        # The wrapper and the wrapped FS must agree on (a)sync mode.
        if self.asynchronous and not fs.async_impl:
            raise ValueError("can't use asynchronous with non-async fs")
        if fs.async_impl and self.asynchronous != fs.asynchronous:
            raise ValueError("both dirfs and fs should be in the same sync/async mode")

        self.path = fs._strip_protocol(path)
        self.fs = fs

    def _join(self, path):
        """Prepend the directory prefix to a path (str, list or dict keys)."""
        if isinstance(path, str):
            if not self.path:
                return path
            if not path:
                return self.path
            return self.fs.sep.join((self.path, self._strip_protocol(path)))
        if isinstance(path, dict):
            return {self._join(key): value for key, value in path.items()}
        return [self._join(p) for p in path]

    def _relpath(self, path):
        """Strip the directory prefix from a path (str or list)."""
        if isinstance(path, str):
            if not self.path:
                return path
            # We need to account for S3FileSystem returning paths that do not
            # start with a '/'
            if path == self.path or (
                self.path.startswith(self.fs.sep) and path == self.path[1:]
            ):
                return ""
            prefix = self.path + self.fs.sep
            if self.path.startswith(self.fs.sep) and not path.startswith(self.fs.sep):
                prefix = prefix[1:]
            assert path.startswith(prefix)
            return path[len(prefix) :]
        return [self._relpath(p) for p in path]

    # Wrappers below

    @property
    def sep(self):
        return self.fs.sep

    async def set_session(self, *args, **kwargs):
        return await self.fs.set_session(*args, **kwargs)

    async def _rm_file(self, path, **kwargs):
        return await self.fs._rm_file(self._join(path), **kwargs)

    def rm_file(self, path, **kwargs):
        return self.fs.rm_file(self._join(path), **kwargs)

    async def _rm(self, path, *args, **kwargs):
        return await self.fs._rm(self._join(path), *args, **kwargs)

    def rm(self, path, *args, **kwargs):
        return self.fs.rm(self._join(path), *args, **kwargs)

    async def _cp_file(self, path1, path2, **kwargs):
        return await self.fs._cp_file(self._join(path1), self._join(path2), **kwargs)

    def cp_file(self, path1, path2, **kwargs):
        return self.fs.cp_file(self._join(path1), self._join(path2), **kwargs)

    async def _copy(self, path1, path2, *args, **kwargs):
        return await self.fs._copy(
            self._join(path1), self._join(path2), *args, **kwargs
        )

    def copy(self, path1, path2, *args, **kwargs):
        return self.fs.copy(self._join(path1), self._join(path2), *args, **kwargs)

    async def _pipe(self, path, *args, **kwargs):
        return await self.fs._pipe(self._join(path), *args, **kwargs)

    def pipe(self, path, *args, **kwargs):
        return self.fs.pipe(self._join(path), *args, **kwargs)

    async def _pipe_file(self, path, *args, **kwargs):
        return await self.fs._pipe_file(self._join(path), *args, **kwargs)

    def pipe_file(self, path, *args, **kwargs):
        return self.fs.pipe_file(self._join(path), *args, **kwargs)

    async def _cat_file(self, path, *args, **kwargs):
        return await self.fs._cat_file(self._join(path), *args, **kwargs)

    def cat_file(self, path, *args, **kwargs):
        return self.fs.cat_file(self._join(path), *args, **kwargs)

    async def _cat(self, path, *args, **kwargs):
        ret = await self.fs._cat(self._join(path), *args, **kwargs)
        # Dict results are keyed by full path; translate back to relative.
        if isinstance(ret, dict):
            return {self._relpath(key): value for key, value in ret.items()}
        return ret

    def cat(self, path, *args, **kwargs):
        ret = self.fs.cat(self._join(path), *args, **kwargs)
        if isinstance(ret, dict):
            return {self._relpath(key): value for key, value in ret.items()}
        return ret

    async def _put_file(self, lpath, rpath, **kwargs):
        return await self.fs._put_file(lpath, self._join(rpath), **kwargs)

    def put_file(self, lpath, rpath, **kwargs):
        return self.fs.put_file(lpath, self._join(rpath), **kwargs)

    async def _put(self, lpath, rpath, *args, **kwargs):
        return await self.fs._put(lpath, self._join(rpath), *args, **kwargs)

    def put(self, lpath, rpath, *args, **kwargs):
        return self.fs.put(lpath, self._join(rpath), *args, **kwargs)

    async def _get_file(self, rpath, lpath, **kwargs):
        return await self.fs._get_file(self._join(rpath), lpath, **kwargs)

    def get_file(self, rpath, lpath, **kwargs):
        return self.fs.get_file(self._join(rpath), lpath, **kwargs)

    async def _get(self, rpath, *args, **kwargs):
        return await self.fs._get(self._join(rpath), *args, **kwargs)

    def get(self, rpath, *args, **kwargs):
        return self.fs.get(self._join(rpath), *args, **kwargs)

    async def _isfile(self, path):
        return await self.fs._isfile(self._join(path))

    def isfile(self, path):
        return self.fs.isfile(self._join(path))

    async def _isdir(self, path):
        return await self.fs._isdir(self._join(path))

    def isdir(self, path):
        return self.fs.isdir(self._join(path))

    async def _size(self, path):
        return await self.fs._size(self._join(path))

    def size(self, path):
        return self.fs.size(self._join(path))

    async def _exists(self, path):
        return await self.fs._exists(self._join(path))

    def exists(self, path):
        return self.fs.exists(self._join(path))

    async def _info(self, path, **kwargs):
        info = dict(await self.fs._info(self._join(path), **kwargs))
        info["name"] = self._relpath(info["name"])
        return info

    def info(self, path, **kwargs):
        info = dict(self.fs.info(self._join(path), **kwargs))
        info["name"] = self._relpath(info["name"])
        return info

    async def _ls(self, path, detail=True, **kwargs):
        ret = (await self.fs._ls(self._join(path), detail=detail, **kwargs)).copy()
        if detail:
            # rewrite each entry's name to be relative to the prefix
            return [{**entry, "name": self._relpath(entry["name"])} for entry in ret]
        return self._relpath(ret)

    def ls(self, path, detail=True, **kwargs):
        ret = self.fs.ls(self._join(path), detail=detail, **kwargs).copy()
        if detail:
            return [{**entry, "name": self._relpath(entry["name"])} for entry in ret]
        return self._relpath(ret)

    async def _walk(self, path, *args, **kwargs):
        async for root, dirs, files in self.fs._walk(self._join(path), *args, **kwargs):
            yield self._relpath(root), dirs, files

    def walk(self, path, *args, **kwargs):
        for root, dirs, files in self.fs.walk(self._join(path), *args, **kwargs):
            yield self._relpath(root), dirs, files

    async def _glob(self, path, **kwargs):
        detail = kwargs.get("detail", False)
        ret = await self.fs._glob(self._join(path), **kwargs)
        if detail:
            return {self._relpath(p): info for p, info in ret.items()}
        return self._relpath(ret)

    def glob(self, path, **kwargs):
        detail = kwargs.get("detail", False)
        ret = self.fs.glob(self._join(path), **kwargs)
        if detail:
            return {self._relpath(p): info for p, info in ret.items()}
        return self._relpath(ret)

    async def _du(self, path, *args, **kwargs):
        total = kwargs.get("total", True)
        ret = await self.fs._du(self._join(path), *args, **kwargs)
        if total:
            return ret
        return {self._relpath(p): size for p, size in ret.items()}

    def du(self, path, *args, **kwargs):
        total = kwargs.get("total", True)
        ret = self.fs.du(self._join(path), *args, **kwargs)
        if total:
            return ret
        return {self._relpath(p): size for p, size in ret.items()}

    async def _find(self, path, *args, **kwargs):
        detail = kwargs.get("detail", False)
        ret = await self.fs._find(self._join(path), *args, **kwargs)
        if detail:
            return {self._relpath(p): info for p, info in ret.items()}
        return self._relpath(ret)

    def find(self, path, *args, **kwargs):
        detail = kwargs.get("detail", False)
        ret = self.fs.find(self._join(path), *args, **kwargs)
        if detail:
            return {self._relpath(p): info for p, info in ret.items()}
        return self._relpath(ret)

    async def _expand_path(self, path, *args, **kwargs):
        expanded = await self.fs._expand_path(self._join(path), *args, **kwargs)
        return self._relpath(expanded)

    def expand_path(self, path, *args, **kwargs):
        return self._relpath(self.fs.expand_path(self._join(path), *args, **kwargs))

    async def _mkdir(self, path, *args, **kwargs):
        return await self.fs._mkdir(self._join(path), *args, **kwargs)

    def mkdir(self, path, *args, **kwargs):
        return self.fs.mkdir(self._join(path), *args, **kwargs)

    async def _makedirs(self, path, *args, **kwargs):
        return await self.fs._makedirs(self._join(path), *args, **kwargs)

    def makedirs(self, path, *args, **kwargs):
        return self.fs.makedirs(self._join(path), *args, **kwargs)

    def rmdir(self, path):
        return self.fs.rmdir(self._join(path))

    def mv(self, path1, path2, **kwargs):
        return self.fs.mv(self._join(path1), self._join(path2), **kwargs)

    def touch(self, path, **kwargs):
        return self.fs.touch(self._join(path), **kwargs)

    def created(self, path):
        return self.fs.created(self._join(path))

    def modified(self, path):
        return self.fs.modified(self._join(path))

    def sign(self, path, *args, **kwargs):
        return self.fs.sign(self._join(path), *args, **kwargs)

    def __repr__(self):
        return f"{self.__class__.__qualname__}(path='{self.path}', fs={self.fs})"

    def open(self, path, *args, **kwargs):
        return self.fs.open(self._join(path), *args, **kwargs)

    async def open_async(self, path, *args, **kwargs):
        return await self.fs.open_async(self._join(path), *args, **kwargs)
venv/lib/python3.10/site-packages/fsspec/implementations/ftp.py ADDED
@@ -0,0 +1,437 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import ssl
3
+ import uuid
4
+ from ftplib import FTP, FTP_TLS, Error, error_perm
5
+ from typing import Any
6
+
7
+ from ..spec import AbstractBufferedFile, AbstractFileSystem
8
+ from ..utils import infer_storage_options, isfilelike
9
+
10
# Mapping from the string values accepted by ``FTPFileSystem``'s ``tls``
# argument to the corresponding ``ssl`` protocol constants used when
# establishing an implicit FTPS connection.
SECURITY_PROTOCOL_MAP = {
    "tls": ssl.PROTOCOL_TLS,
    "tlsv1": ssl.PROTOCOL_TLSv1,
    "tlsv1_1": ssl.PROTOCOL_TLSv1_1,
    "tlsv1_2": ssl.PROTOCOL_TLSv1_2,
    "sslv23": ssl.PROTOCOL_SSLv23,
}
17
+
18
+
19
class ImplicitFTPTLS(FTP_TLS):
    """FTP_TLS subclass for *implicit* FTPS.

    Implicit FTPS expects the connection to be TLS-encrypted from the first
    byte, so every socket assigned to this object is wrapped in SSL as soon
    as it is set, instead of waiting for an explicit AUTH TLS handshake.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._sock = None

    @property
    def sock(self):
        """The underlying (always SSL-wrapped) socket."""
        return self._sock

    @sock.setter
    def sock(self, value):
        # Plain sockets are wrapped before being stored; ``None`` and
        # already-wrapped sockets pass through untouched.
        if value is None or isinstance(value, ssl.SSLSocket):
            self._sock = value
        else:
            self._sock = self.context.wrap_socket(value)
40
+
41
+
42
class FTPFileSystem(AbstractFileSystem):
    """A filesystem over classic FTP"""

    root_marker = "/"
    # Instances hold a live, stateful FTP connection, so they must not be
    # shared via the instance cache.
    cachable = False
    protocol = "ftp"

    def __init__(
        self,
        host,
        port=21,
        username=None,
        password=None,
        acct=None,
        block_size=None,
        tempdir=None,
        timeout=30,
        encoding="utf-8",
        tls=False,
        **kwargs,
    ):
        """
        You can use _get_kwargs_from_urls to get some kwargs from
        a reasonable FTP url.

        Authentication will be anonymous if username/password are not
        given.

        Parameters
        ----------
        host: str
            The remote server name/ip to connect to
        port: int
            Port to connect with
        username: str or None
            If authenticating, the user's identifier
        password: str or None
            User's password on the server, if using
        acct: str or None
            Some servers also need an "account" string for auth
        block_size: int or None
            If given, the read-ahead or write buffer size.
        tempdir: str
            Directory on remote to put temporary files when in a transaction
        timeout: int
            Timeout of the ftp connection in seconds
        encoding: str
            Encoding to use for directories and filenames in FTP connection
        tls: bool or str
            Enable FTP-TLS for secure connections:
            - False: Plain FTP (default)
            - True: Explicit TLS (FTPS with AUTH TLS command)
            - "tls": Auto-negotiate highest protocol
            - "tlsv1": TLS v1.0
            - "tlsv1_1": TLS v1.1
            - "tlsv1_2": TLS v1.2
        """
        super().__init__(**kwargs)
        self.host = host
        self.port = port
        self.tempdir = tempdir or "/tmp"
        # (user, passwd, acct) triple passed directly to ftplib login();
        # empty strings mean anonymous login
        self.cred = username or "", password or "", acct or ""
        self.timeout = timeout
        self.encoding = encoding
        if block_size is not None:
            self.blocksize = block_size
        else:
            self.blocksize = 2**16  # 64kB default transfer block
        self.tls = tls
        self._connect()
        # tls=True (bool) means explicit FTPS: after login, switch the data
        # channel to an encrypted connection
        if isinstance(self.tls, bool) and self.tls:
            self.ftp.prot_p()

    def _connect(self):
        """Open the ftplib connection according to the ``tls`` setting."""
        security = None
        if self.tls:
            if isinstance(self.tls, str):
                # string value selects implicit FTPS with a concrete protocol
                ftp_cls = ImplicitFTPTLS
                security = SECURITY_PROTOCOL_MAP.get(
                    self.tls,
                    f"Not supported {self.tls} protocol",
                )
                # .get() returned the error message when the key was unknown
                if isinstance(security, str):
                    raise ValueError(security)
            else:
                # boolean True selects explicit FTPS (AUTH TLS after connect)
                ftp_cls = FTP_TLS
        else:
            ftp_cls = FTP
        self.ftp = ftp_cls(timeout=self.timeout, encoding=self.encoding)
        if security:
            self.ftp.ssl_version = security
        self.ftp.connect(self.host, self.port)
        self.ftp.login(*self.cred)

    @classmethod
    def _strip_protocol(cls, path):
        # normalise to an absolute path without trailing slash
        return "/" + infer_storage_options(path)["path"].lstrip("/").rstrip("/")

    @staticmethod
    def _get_kwargs_from_urls(urlpath):
        # host/port/username/password come from the URL; path and protocol
        # are consumed elsewhere
        out = infer_storage_options(urlpath)
        out.pop("path", None)
        out.pop("protocol", None)
        return out

    def ls(self, path, detail=True, **kwargs):
        """List ``path``, using MLSD with a ``dir``-parsing fallback.

        Results are cached in ``self.dircache``; listing a plain file
        returns its single info entry.
        """
        path = self._strip_protocol(path)
        out = []
        if path not in self.dircache:
            try:
                try:
                    # drop self/parent entries from the MLSD listing
                    out = [
                        (fn, details)
                        for (fn, details) in self.ftp.mlsd(path)
                        if fn not in [".", ".."]
                        and details["type"] not in ["pdir", "cdir"]
                    ]
                except error_perm:
                    out = _mlsd2(self.ftp, path)  # Not platform independent
                for fn, details in out:
                    # convert to fsspec-style info dicts with full paths
                    details["name"] = "/".join(
                        ["" if path == "/" else path, fn.lstrip("/")]
                    )
                    if details["type"] == "file":
                        details["size"] = int(details["size"])
                    else:
                        details["size"] = 0
                    if details["type"] == "dir":
                        details["type"] = "directory"
                self.dircache[path] = out
            except Error:
                # listing failed entirely; path may be a plain file
                try:
                    info = self.info(path)
                    if info["type"] == "file":
                        out = [(path, info)]
                except (Error, IndexError) as exc:
                    raise FileNotFoundError(path) from exc
        files = self.dircache.get(path, out)
        if not detail:
            return sorted([fn for fn, details in files])
        return [details for fn, details in files]

    def info(self, path, **kwargs):
        # implement with direct method
        path = self._strip_protocol(path)
        if path == "/":
            # special case, since this dir has no real entry
            return {"name": "/", "size": 0, "type": "directory"}
        # look the entry up in its parent's listing
        files = self.ls(self._parent(path).lstrip("/"), True)
        try:
            out = next(f for f in files if f["name"] == path)
        except StopIteration as exc:
            raise FileNotFoundError(path) from exc
        return out

    def get_file(self, rpath, lpath, **kwargs):
        """Download ``rpath`` to local ``lpath`` (file path or file-like)."""
        if self.isdir(rpath):
            # directories just need a local counterpart, nothing to transfer
            if not os.path.exists(lpath):
                os.mkdir(lpath)
            return
        if isfilelike(lpath):
            outfile = lpath
        else:
            outfile = open(lpath, "wb")

        def cb(x):
            outfile.write(x)

        self.ftp.retrbinary(
            f"RETR {rpath}",
            blocksize=self.blocksize,
            callback=cb,
        )
        # only close the handle we opened ourselves
        if not isfilelike(lpath):
            outfile.close()

    def cat_file(self, path, start=None, end=None, **kwargs):
        """Return file contents as bytes, optionally from byte ``start``."""
        if end is not None:
            # ranged reads with an end bound go through the buffered-file
            # machinery, which can abort the transfer mid-stream
            return super().cat_file(path, start, end, **kwargs)
        out = []

        def cb(x):
            out.append(x)

        try:
            # REST (rest=start) asks the server to begin at that offset
            self.ftp.retrbinary(
                f"RETR {path}",
                blocksize=self.blocksize,
                rest=start,
                callback=cb,
            )
        except (Error, error_perm) as orig_exc:
            raise FileNotFoundError(path) from orig_exc
        return b"".join(out)

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        cache_options=None,
        autocommit=True,
        **kwargs,
    ):
        """Create an :class:`FTPFile` for buffered read/write access."""
        path = self._strip_protocol(path)
        block_size = block_size or self.blocksize
        return FTPFile(
            self,
            path,
            mode=mode,
            block_size=block_size,
            tempdir=self.tempdir,
            autocommit=autocommit,
            cache_options=cache_options,
        )

    def _rm(self, path):
        # delete a single file and invalidate the parent listing
        path = self._strip_protocol(path)
        self.ftp.delete(path)
        self.invalidate_cache(self._parent(path))

    def rm(self, path, recursive=False, maxdepth=None):
        """Delete files/directories; children are removed before parents."""
        paths = self.expand_path(path, recursive=recursive, maxdepth=maxdepth)
        # reversed() ensures deepest entries go first so directories are
        # empty by the time rmdir is called on them
        for p in reversed(paths):
            if self.isfile(p):
                self.rm_file(p)
            else:
                self.rmdir(p)

    def mkdir(self, path: str, create_parents: bool = True, **kwargs: Any) -> None:
        """Create a directory, optionally creating missing parents first."""
        path = self._strip_protocol(path)
        parent = self._parent(path)
        # recurse upwards until an existing ancestor (or root) is reached
        if parent != self.root_marker and not self.exists(parent) and create_parents:
            self.mkdir(parent, create_parents=create_parents)

        self.ftp.mkd(path)
        self.invalidate_cache(self._parent(path))

    def makedirs(self, path: str, exist_ok: bool = False) -> None:
        """Recursively create ``path``; optionally tolerate its existence."""
        path = self._strip_protocol(path)
        if self.exists(path):
            # NB: "/" does not "exist" as it has no directory entry
            if not exist_ok:
                raise FileExistsError(f"{path} exists without `exist_ok`")
            # exists_ok=True -> no-op
        else:
            self.mkdir(path, create_parents=True)

    def rmdir(self, path):
        """Remove an (empty) directory."""
        path = self._strip_protocol(path)
        self.ftp.rmd(path)
        self.invalidate_cache(self._parent(path))

    def mv(self, path1, path2, **kwargs):
        """Rename/move ``path1`` to ``path2`` on the server."""
        path1 = self._strip_protocol(path1)
        path2 = self._strip_protocol(path2)
        self.ftp.rename(path1, path2)
        # both source and destination directory listings are now stale
        self.invalidate_cache(self._parent(path1))
        self.invalidate_cache(self._parent(path2))

    def __del__(self):
        # best-effort close of the control connection on garbage collection
        self.ftp.close()

    def invalidate_cache(self, path=None):
        """Drop cached listings for ``path`` (or everything if ``None``)."""
        if path is None:
            self.dircache.clear()
        else:
            self.dircache.pop(path, None)
        super().invalidate_cache(path)
311
+
312
+
313
class TransferDone(Exception):
    """Internal exception raised from a retrieve callback to abort an FTP
    transfer once enough bytes have been received for the requested range."""
317
+
318
+
319
class FTPFile(AbstractBufferedFile):
    """Interact with a remote FTP file with read/write buffering"""

    def __init__(
        self,
        fs,
        path,
        mode="rb",
        block_size="default",
        autocommit=True,
        cache_type="readahead",
        cache_options=None,
        **kwargs,
    ):
        super().__init__(
            fs,
            path,
            mode=mode,
            block_size=block_size,
            autocommit=autocommit,
            cache_type=cache_type,
            cache_options=cache_options,
            **kwargs,
        )
        # Inside a transaction, write to a unique temporary path and only
        # move it to the real target on commit()
        if not autocommit:
            self.target = self.path
            self.path = "/".join([kwargs["tempdir"], str(uuid.uuid4())])

    def commit(self):
        """Publish the temporary upload by moving it to its final target."""
        self.fs.mv(self.path, self.target)

    def discard(self):
        """Drop the temporary upload without publishing it."""
        self.fs.rm(self.path)

    def _fetch_range(self, start, end):
        """Get bytes between given byte limits

        Implemented by raising an exception in the fetch callback when the
        number of bytes received reaches the requested amount.

        Will fail if the server does not respect the REST command on
        retrieve requests.
        """
        out = []
        total = [0]  # mutable running byte count shared with the closure

        def callback(x):
            total[0] += len(x)
            if total[0] > end - start:
                # received more than requested: the negative slice index
                # trims the surplus bytes from this chunk
                out.append(x[: (end - start) - total[0]])
                if end < self.size:
                    raise TransferDone
            else:
                out.append(x)

            if total[0] == end - start and end < self.size:
                raise TransferDone

        try:
            # rest=start asks the server to begin the transfer at ``start``
            self.fs.ftp.retrbinary(
                f"RETR {self.path}",
                blocksize=self.blocksize,
                rest=start,
                callback=callback,
            )
        except TransferDone:
            try:
                # stop transfer, we got enough bytes for this block
                self.fs.ftp.abort()
                self.fs.ftp.getmultiline()
            except Error:
                # the connection is in an undefined state after a failed
                # abort, so start a fresh one
                self.fs._connect()

        return b"".join(out)

    def _upload_chunk(self, final=False):
        # send the current write buffer; rest=offset resumes the upload at
        # the position already written by previous chunks
        self.buffer.seek(0)
        self.fs.ftp.storbinary(
            f"STOR {self.path}", self.buffer, blocksize=self.blocksize, rest=self.offset
        )
        return True
400
+
401
+
402
+ def _mlsd2(ftp, path="."):
403
+ """
404
+ Fall back to using `dir` instead of `mlsd` if not supported.
405
+
406
+ This parses a Linux style `ls -l` response to `dir`, but the response may
407
+ be platform dependent.
408
+
409
+ Parameters
410
+ ----------
411
+ ftp: ftplib.FTP
412
+ path: str
413
+ Expects to be given path, but defaults to ".".
414
+ """
415
+ lines = []
416
+ minfo = []
417
+ ftp.dir(path, lines.append)
418
+ for line in lines:
419
+ split_line = line.split()
420
+ if len(split_line) < 9:
421
+ continue
422
+ this = (
423
+ split_line[-1],
424
+ {
425
+ "modify": " ".join(split_line[5:8]),
426
+ "unix.owner": split_line[2],
427
+ "unix.group": split_line[3],
428
+ "unix.mode": split_line[0],
429
+ "size": split_line[4],
430
+ },
431
+ )
432
+ if this[1]["unix.mode"][0] == "d":
433
+ this[1]["type"] = "dir"
434
+ else:
435
+ this[1]["type"] = "file"
436
+ minfo.append(this)
437
+ return minfo
venv/lib/python3.10/site-packages/fsspec/implementations/gist.py ADDED
@@ -0,0 +1,241 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import requests
2
+
3
+ from ..spec import AbstractFileSystem
4
+ from ..utils import infer_storage_options
5
+ from .memory import MemoryFile
6
+
7
+
8
class GistFileSystem(AbstractFileSystem):
    """
    Interface to files in a single GitHub Gist.

    Provides read-only access to a gist's files. Gists do not contain
    subdirectories, so file listing is straightforward.

    Parameters
    ----------
    gist_id: str
        The ID of the gist you want to access (the long hex value from the URL).
    filenames: list[str] (optional)
        If provided, only make a file system representing these files, and do not fetch
        the list of all files for this gist.
    sha: str (optional)
        If provided, fetch a particular revision of the gist. If omitted,
        the latest revision is used.
    username: str (optional)
        GitHub username for authentication.
    token: str (optional)
        GitHub personal access token (required if username is given).
    timeout: (float, float) or float, optional
        Connect and read timeouts for requests (default 60s each).
    kwargs: dict
        Stored on `self.request_kw` and passed to `requests.get` when fetching Gist
        metadata or reading ("opening") a file.
    """

    protocol = "gist"
    # URL templates for the latest revision and for a specific revision
    gist_url = "https://api.github.com/gists/{gist_id}"
    gist_rev_url = "https://api.github.com/gists/{gist_id}/{sha}"

    def __init__(
        self,
        gist_id,
        filenames=None,
        sha=None,
        username=None,
        token=None,
        timeout=None,
        **kwargs,
    ):
        super().__init__()
        self.gist_id = gist_id
        self.filenames = filenames
        self.sha = sha  # revision of the gist (optional)
        # a username without a token cannot authenticate
        if username is not None and token is None:
            raise ValueError("User auth requires a token")
        self.username = username
        self.token = token
        self.request_kw = kwargs
        # Default timeouts to 60s connect/read if none provided
        self.timeout = timeout if timeout is not None else (60, 60)

        # We use a single-level "directory" cache, because a gist is essentially flat
        self.dircache[""] = self._fetch_file_list()

    @property
    def kw(self):
        """Auth parameters passed to 'requests' if we have username/token."""
        kw = {
            "headers": {
                "Accept": "application/vnd.github+json",
                "X-GitHub-Api-Version": "2022-11-28",
            }
        }
        # user-supplied kwargs may override/extend the default headers
        kw.update(self.request_kw)
        if self.username and self.token:
            # basic auth when both credentials are given
            kw["auth"] = (self.username, self.token)
        elif self.token:
            # token-only auth uses a bearer header
            kw["headers"]["Authorization"] = f"Bearer {self.token}"
        return kw

    def _fetch_gist_metadata(self):
        """
        Fetch the JSON metadata for this gist (possibly for a specific revision).
        """
        if self.sha:
            url = self.gist_rev_url.format(gist_id=self.gist_id, sha=self.sha)
        else:
            url = self.gist_url.format(gist_id=self.gist_id)

        r = requests.get(url, timeout=self.timeout, **self.kw)
        if r.status_code == 404:
            raise FileNotFoundError(
                f"Gist not found: {self.gist_id}@{self.sha or 'latest'}"
            )
        r.raise_for_status()
        return r.json()

    def _fetch_file_list(self):
        """
        Returns a list of dicts describing each file in the gist. These get stored
        in self.dircache[""].
        """
        meta = self._fetch_gist_metadata()
        if self.filenames:
            # restrict the listing to the explicitly requested files
            available_files = meta.get("files", {})
            files = {}
            for fn in self.filenames:
                if fn not in available_files:
                    raise FileNotFoundError(fn)
                files[fn] = available_files[fn]
        else:
            files = meta.get("files", {})

        out = []
        for fname, finfo in files.items():
            if finfo is None:
                # Occasionally GitHub returns a file entry with null if it was deleted
                continue
            # Build a directory entry
            out.append(
                {
                    "name": fname,  # file's name
                    "type": "file",  # gists have no subdirectories
                    "size": finfo.get("size", 0),  # file size in bytes
                    "raw_url": finfo.get("raw_url"),
                }
            )
        return out

    @classmethod
    def _strip_protocol(cls, path):
        """
        Remove 'gist://' from the path, if present.
        """
        # The default infer_storage_options can handle gist://username:token@id/file
        # or gist://id/file, but let's ensure we handle a normal usage too.
        # We'll just strip the protocol prefix if it exists.
        path = infer_storage_options(path).get("path", path)
        return path.lstrip("/")

    @staticmethod
    def _get_kwargs_from_urls(path):
        """
        Parse 'gist://' style URLs into GistFileSystem constructor kwargs.
        For example:
          gist://:TOKEN@<gist_id>/file.txt
          gist://username:TOKEN@<gist_id>/file.txt
        """
        so = infer_storage_options(path)
        out = {}
        if "username" in so and so["username"]:
            out["username"] = so["username"]
        if "password" in so and so["password"]:
            out["token"] = so["password"]
        if "host" in so and so["host"]:
            # We interpret 'host' as the gist ID
            out["gist_id"] = so["host"]

        # Extract SHA and filename from path
        if "path" in so and so["path"]:
            # take at most the last two path segments: [sha, filename]
            path_parts = so["path"].rsplit("/", 2)[-2:]
            if len(path_parts) == 2:
                if path_parts[0]:  # SHA present
                    out["sha"] = path_parts[0]
                    if path_parts[1]:  # filename also present
                        out["filenames"] = [path_parts[1]]

        return out

    def ls(self, path="", detail=False, **kwargs):
        """
        List files in the gist. Gists are single-level, so any 'path' is basically
        the filename, or empty for all files.

        Parameters
        ----------
        path : str, optional
            The filename to list. If empty, returns all files in the gist.
        detail : bool, default False
            If True, return a list of dicts; if False, return a list of filenames.
        """
        path = self._strip_protocol(path or "")
        # If path is empty, return all
        if path == "":
            results = self.dircache[""]
        else:
            # We want just the single file with this name
            all_files = self.dircache[""]
            results = [f for f in all_files if f["name"] == path]
            if not results:
                raise FileNotFoundError(path)
        if detail:
            return results
        else:
            return sorted(f["name"] for f in results)

    def _open(self, path, mode="rb", block_size=None, **kwargs):
        """
        Read a single file from the gist.
        """
        if mode != "rb":
            raise NotImplementedError("GitHub Gist FS is read-only (no write).")

        path = self._strip_protocol(path)
        # Find the file entry in our dircache
        matches = [f for f in self.dircache[""] if f["name"] == path]
        if not matches:
            raise FileNotFoundError(path)
        finfo = matches[0]

        raw_url = finfo.get("raw_url")
        if not raw_url:
            raise FileNotFoundError(f"No raw_url for file: {path}")

        # download the whole file eagerly and serve it from memory
        r = requests.get(raw_url, timeout=self.timeout, **self.kw)
        if r.status_code == 404:
            raise FileNotFoundError(path)
        r.raise_for_status()
        return MemoryFile(path, None, r.content)

    def cat(self, path, recursive=False, on_error="raise", **kwargs):
        """
        Return {path: contents} for the given file or files. If 'recursive' is True,
        and path is empty, returns all files in the gist.
        """
        paths = self.expand_path(path, recursive=recursive)
        out = {}
        for p in paths:
            try:
                with self.open(p, "rb") as f:
                    out[p] = f.read()
            except FileNotFoundError as e:
                if on_error == "raise":
                    raise e
                elif on_error == "omit":
                    pass  # skip
                else:
                    # on_error="return": store the exception in place of data
                    out[p] = e
        # single, non-expanded path: return the bytes directly, not a dict
        if len(paths) == 1 and paths[0] == path:
            return out[path]
        return out
venv/lib/python3.10/site-packages/fsspec/implementations/git.py ADDED
@@ -0,0 +1,114 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+
3
+ import pygit2
4
+
5
+ from fsspec.spec import AbstractFileSystem
6
+
7
+ from .memory import MemoryFile
8
+
9
+
10
class GitFileSystem(AbstractFileSystem):
    """Browse the files of a local git repo at any hash/tag/branch

    (experimental backend)
    """

    root_marker = ""
    cachable = True

    def __init__(self, path=None, fo=None, ref=None, **kwargs):
        """

        Parameters
        ----------
        path: str (optional)
            Local location of the repo (uses current directory if not given).
            May be deprecated in favour of ``fo``. When used with a higher
            level function such as fsspec.open(), may be of the form
            "git://[path-to-repo[:]][ref@]path/to/file" (but the actual
            file path should not contain "@" or ":").
        fo: str (optional)
            Same as ``path``, but passed as part of a chained URL. This one
            takes precedence if both are given.
        ref: str (optional)
            Reference to work with, could be a hash, tag or branch name. Defaults
            to current working tree. Note that ``ls`` and ``open`` also take hash,
            so this becomes the default for those operations
        kwargs
        """
        super().__init__(**kwargs)
        self.repo = pygit2.Repository(fo or path or os.getcwd())
        # NOTE(review): falls back to a literal "master" branch when no ref
        # is given; repos whose default branch is e.g. "main" must pass ref
        self.ref = ref or "master"

    @classmethod
    def _strip_protocol(cls, path):
        # drop the repo-path ("...:") and ref ("...@") prefixes, keeping
        # only the in-repo file path
        path = super()._strip_protocol(path).lstrip("/")
        if ":" in path:
            path = path.split(":", 1)[1]
        if "@" in path:
            path = path.split("@", 1)[1]
        return path.lstrip("/")

    def _path_to_object(self, path, ref):
        """Resolve ``path`` at ``ref`` to a pygit2 Tree or Blob object."""
        comm, ref = self.repo.resolve_refish(ref or self.ref)
        parts = path.split("/")
        tree = comm.tree
        # walk down the tree one path component at a time
        for part in parts:
            if part and isinstance(tree, pygit2.Tree):
                if part not in tree:
                    raise FileNotFoundError(path)
                tree = tree[part]
        return tree

    @staticmethod
    def _get_kwargs_from_urls(path):
        # split "git://repo-path:ref@file" into constructor kwargs
        path = path.removeprefix("git://")
        out = {}
        if ":" in path:
            out["path"], path = path.split(":", 1)
        if "@" in path:
            out["ref"], path = path.split("@", 1)
        return out

    @staticmethod
    def _object_to_info(obj, path=None):
        """Convert a pygit2 object into an fsspec-style info dict."""
        # obj.name and obj.filemode are None for the root tree!
        is_dir = isinstance(obj, pygit2.Tree)
        return {
            "type": "directory" if is_dir else "file",
            "name": (
                "/".join([path, obj.name or ""]).lstrip("/") if path else obj.name
            ),
            "hex": str(obj.id),
            "mode": "100644" if obj.filemode is None else f"{obj.filemode:o}",
            "size": 0 if is_dir else obj.size,
        }

    def ls(self, path, detail=True, ref=None, **kwargs):
        """List a directory (tree) or single file (blob) at ``ref``."""
        tree = self._path_to_object(self._strip_protocol(path), ref)
        # a blob is listed as a single entry; a tree is iterated
        return [
            GitFileSystem._object_to_info(obj, path)
            if detail
            else GitFileSystem._object_to_info(obj, path)["name"]
            for obj in (tree if isinstance(tree, pygit2.Tree) else [tree])
        ]

    def info(self, path, ref=None, **kwargs):
        """Info dict for the object at ``path`` (at ``ref`` or the default)."""
        tree = self._path_to_object(self._strip_protocol(path), ref)
        return GitFileSystem._object_to_info(tree, path)

    def ukey(self, path, ref=None):
        # the git object hash uniquely identifies the content
        return self.info(path, ref=ref)["hex"]

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        autocommit=True,
        cache_options=None,
        ref=None,
        **kwargs,
    ):
        # NOTE(review): mode/block_size/cache arguments are ignored here —
        # the object's bytes are served read-only from memory; ``path`` is
        # assumed to resolve to a blob
        obj = self._path_to_object(path, ref or self.ref)
        return MemoryFile(data=obj.data)
venv/lib/python3.10/site-packages/fsspec/implementations/github.py ADDED
@@ -0,0 +1,333 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import base64
2
+ import re
3
+
4
+ import requests
5
+
6
+ from ..spec import AbstractFileSystem
7
+ from ..utils import infer_storage_options
8
+ from .memory import MemoryFile
9
+
10
+
11
+ class GithubFileSystem(AbstractFileSystem):
12
+ """Interface to files in github
13
+
14
+ An instance of this class provides the files residing within a remote github
15
+ repository. You may specify a point in the repos history, by SHA, branch
16
+ or tag (default is current master).
17
+
18
+ For files less than 1 MB in size, file content is returned directly in a
19
+ MemoryFile. For larger files, or for files tracked by git-lfs, file content
20
+ is returned as an HTTPFile wrapping the ``download_url`` provided by the
21
+ GitHub API.
22
+
23
+ When using fsspec.open, allows URIs of the form:
24
+
25
+ - "github://path/file", in which case you must specify org, repo and
26
+ may specify sha in the extra args
27
+ - 'github://org:repo@/precip/catalog.yml', where the org and repo are
28
+ part of the URI
29
+ - 'github://org:repo@sha/precip/catalog.yml', where the sha is also included
30
+
31
+ ``sha`` can be the full or abbreviated hex of the commit you want to fetch
32
+ from, or a branch or tag name (so long as it doesn't contain special characters
33
+ like "/", "?", which would have to be HTTP-encoded).
34
+
35
+ For authorised access, you must provide username and token, which can be made
36
+ at https://github.com/settings/tokens
37
+ """
38
+
39
+ url = "https://api.github.com/repos/{org}/{repo}/git/trees/{sha}"
40
+ content_url = "https://api.github.com/repos/{org}/{repo}/contents/{path}?ref={sha}"
41
+ protocol = "github"
42
+ timeout = (60, 60) # connect, read timeouts
43
+
44
+ def __init__(
45
+ self, org, repo, sha=None, username=None, token=None, timeout=None, **kwargs
46
+ ):
47
+ super().__init__(**kwargs)
48
+ self.org = org
49
+ self.repo = repo
50
+ if (username is None) ^ (token is None):
51
+ raise ValueError("Auth required both username and token")
52
+ self.username = username
53
+ self.token = token
54
+ if timeout is not None:
55
+ self.timeout = timeout
56
+ if sha is None:
57
+ # look up default branch (not necessarily "master")
58
+ u = "https://api.github.com/repos/{org}/{repo}"
59
+ r = requests.get(
60
+ u.format(org=org, repo=repo), timeout=self.timeout, **self.kw
61
+ )
62
+ r.raise_for_status()
63
+ sha = r.json()["default_branch"]
64
+
65
+ self.root = sha
66
+ self.ls("")
67
+ try:
68
+ from .http import HTTPFileSystem
69
+
70
+ self.http_fs = HTTPFileSystem(**kwargs)
71
+ except ImportError:
72
+ self.http_fs = None
73
+
74
+ @property
75
+ def kw(self):
76
+ if self.username:
77
+ return {"auth": (self.username, self.token)}
78
+ return {}
79
+
80
+ @classmethod
81
+ def repos(cls, org_or_user, is_org=True):
82
+ """List repo names for given org or user
83
+
84
+ This may become the top level of the FS
85
+
86
+ Parameters
87
+ ----------
88
+ org_or_user: str
89
+ Name of the github org or user to query
90
+ is_org: bool (default True)
91
+ Whether the name is an organisation (True) or user (False)
92
+
93
+ Returns
94
+ -------
95
+ List of string
96
+ """
97
+ r = requests.get(
98
+ f"https://api.github.com/{['users', 'orgs'][is_org]}/{org_or_user}/repos",
99
+ timeout=cls.timeout,
100
+ )
101
+ r.raise_for_status()
102
+ return [repo["name"] for repo in r.json()]
103
+
104
+ @property
105
+ def tags(self):
106
+ """Names of tags in the repo"""
107
+ r = requests.get(
108
+ f"https://api.github.com/repos/{self.org}/{self.repo}/tags",
109
+ timeout=self.timeout,
110
+ **self.kw,
111
+ )
112
+ r.raise_for_status()
113
+ return [t["name"] for t in r.json()]
114
+
115
+ @property
116
+ def branches(self):
117
+ """Names of branches in the repo"""
118
+ r = requests.get(
119
+ f"https://api.github.com/repos/{self.org}/{self.repo}/branches",
120
+ timeout=self.timeout,
121
+ **self.kw,
122
+ )
123
+ r.raise_for_status()
124
+ return [t["name"] for t in r.json()]
125
+
126
+ @property
127
+ def refs(self):
128
+ """Named references, tags and branches"""
129
+ return {"tags": self.tags, "branches": self.branches}
130
+
131
+ def ls(self, path, detail=False, sha=None, _sha=None, **kwargs):
132
+ """List files at given path
133
+
134
+ Parameters
135
+ ----------
136
+ path: str
137
+ Location to list, relative to repo root
138
+ detail: bool
139
+ If True, returns list of dicts, one per file; if False, returns
140
+ list of full filenames only
141
+ sha: str (optional)
142
+ List at the given point in the repo history, branch or tag name or commit
143
+ SHA
144
+ _sha: str (optional)
145
+ List this specific tree object (used internally to descend into trees)
146
+ """
147
+ path = self._strip_protocol(path)
148
+ if path == "":
149
+ _sha = sha or self.root
150
+ if _sha is None:
151
+ parts = path.rstrip("/").split("/")
152
+ so_far = ""
153
+ _sha = sha or self.root
154
+ for part in parts:
155
+ out = self.ls(so_far, True, sha=sha, _sha=_sha)
156
+ so_far += "/" + part if so_far else part
157
+ out = [o for o in out if o["name"] == so_far]
158
+ if not out:
159
+ raise FileNotFoundError(path)
160
+ out = out[0]
161
+ if out["type"] == "file":
162
+ if detail:
163
+ return [out]
164
+ else:
165
+ return path
166
+ _sha = out["sha"]
167
+ if path not in self.dircache or sha not in [self.root, None]:
168
+ r = requests.get(
169
+ self.url.format(org=self.org, repo=self.repo, sha=_sha),
170
+ timeout=self.timeout,
171
+ **self.kw,
172
+ )
173
+ if r.status_code == 404:
174
+ raise FileNotFoundError(path)
175
+ r.raise_for_status()
176
+ types = {"blob": "file", "tree": "directory"}
177
+ out = [
178
+ {
179
+ "name": path + "/" + f["path"] if path else f["path"],
180
+ "mode": f["mode"],
181
+ "type": types[f["type"]],
182
+ "size": f.get("size", 0),
183
+ "sha": f["sha"],
184
+ }
185
+ for f in r.json()["tree"]
186
+ if f["type"] in types
187
+ ]
188
+ if sha in [self.root, None]:
189
+ self.dircache[path] = out
190
+ else:
191
+ out = self.dircache[path]
192
+ if detail:
193
+ return out
194
+ else:
195
+ return sorted([f["name"] for f in out])
196
+
197
+ def invalidate_cache(self, path=None):
198
+ self.dircache.clear()
199
+
200
+ @classmethod
201
+ def _strip_protocol(cls, path):
202
+ opts = infer_storage_options(path)
203
+ if "username" not in opts:
204
+ return super()._strip_protocol(path)
205
+ return opts["path"].lstrip("/")
206
+
207
+ @staticmethod
208
+ def _get_kwargs_from_urls(path):
209
+ opts = infer_storage_options(path)
210
+ if "username" not in opts:
211
+ return {}
212
+ out = {"org": opts["username"], "repo": opts["password"]}
213
+ if opts["host"]:
214
+ out["sha"] = opts["host"]
215
+ return out
216
+
217
+ def _open(
218
+ self,
219
+ path,
220
+ mode="rb",
221
+ block_size=None,
222
+ cache_options=None,
223
+ sha=None,
224
+ **kwargs,
225
+ ):
226
+ if mode != "rb":
227
+ raise NotImplementedError
228
+
229
+ # construct a url to hit the GitHub API's repo contents API
230
+ url = self.content_url.format(
231
+ org=self.org, repo=self.repo, path=path, sha=sha or self.root
232
+ )
233
+
234
+ # make a request to this API, and parse the response as JSON
235
+ r = requests.get(url, timeout=self.timeout, **self.kw)
236
+ if r.status_code == 404:
237
+ raise FileNotFoundError(path)
238
+ r.raise_for_status()
239
+ content_json = r.json()
240
+
241
+ # if the response's content key is not empty, try to parse it as base64
242
+ if content_json["content"]:
243
+ content = base64.b64decode(content_json["content"])
244
+
245
+ # as long as the content does not start with the string
246
+ # "version https://git-lfs.github.com/"
247
+ # then it is probably not a git-lfs pointer and we can just return
248
+ # the content directly
249
+ if not content.startswith(b"version https://git-lfs.github.com/"):
250
+ return MemoryFile(None, None, content)
251
+
252
+ # we land here if the content was not present in the first response
253
+ # (regular file over 1MB or git-lfs tracked file)
254
+ # in this case, we get let the HTTPFileSystem handle the download
255
+ if self.http_fs is None:
256
+ raise ImportError(
257
+ "Please install fsspec[http] to access github files >1 MB "
258
+ "or git-lfs tracked files."
259
+ )
260
+ return self.http_fs.open(
261
+ content_json["download_url"],
262
+ mode=mode,
263
+ block_size=block_size,
264
+ cache_options=cache_options,
265
+ **kwargs,
266
+ )
267
+
268
+ def rm(self, path, recursive=False, maxdepth=None, message=None):
269
+ path = self.expand_path(path, recursive=recursive, maxdepth=maxdepth)
270
+ for p in reversed(path):
271
+ self.rm_file(p, message=message)
272
+
273
+ def rm_file(self, path, message=None, **kwargs):
274
+ """
275
+ Remove a file from a specified branch using a given commit message.
276
+
277
+ Since Github DELETE operation requires a branch name, and we can't reliably
278
+ determine whether the provided SHA refers to a branch, tag, or commit, we
279
+ assume it's a branch. If it's not, the user will encounter an error when
280
+ attempting to retrieve the file SHA or delete the file.
281
+
282
+ Parameters
283
+ ----------
284
+ path: str
285
+ The file's location relative to the repository root.
286
+ message: str, optional
287
+ The commit message for the deletion.
288
+ """
289
+
290
+ if not self.username:
291
+ raise ValueError("Authentication required")
292
+
293
+ path = self._strip_protocol(path)
294
+
295
+ # Attempt to get SHA from cache or Github API
296
+ sha = self._get_sha_from_cache(path)
297
+ if not sha:
298
+ url = self.content_url.format(
299
+ org=self.org, repo=self.repo, path=path.lstrip("/"), sha=self.root
300
+ )
301
+ r = requests.get(url, timeout=self.timeout, **self.kw)
302
+ if r.status_code == 404:
303
+ raise FileNotFoundError(path)
304
+ r.raise_for_status()
305
+ sha = r.json()["sha"]
306
+
307
+ # Delete the file
308
+ delete_url = self.content_url.format(
309
+ org=self.org, repo=self.repo, path=path, sha=self.root
310
+ )
311
+ branch = self.root
312
+ data = {
313
+ "message": message or f"Delete {path}",
314
+ "sha": sha,
315
+ **({"branch": branch} if branch else {}),
316
+ }
317
+
318
+ r = requests.delete(delete_url, json=data, timeout=self.timeout, **self.kw)
319
+ error_message = r.json().get("message", "")
320
+ if re.search(r"Branch .+ not found", error_message):
321
+ error = "Remove only works when the filesystem is initialised from a branch or default (None)"
322
+ raise ValueError(error)
323
+ r.raise_for_status()
324
+
325
+ self.invalidate_cache(path)
326
+
327
+ def _get_sha_from_cache(self, path):
328
+ for entries in self.dircache.values():
329
+ for entry in entries:
330
+ entry_path = entry.get("name")
331
+ if entry_path and entry_path == path and "sha" in entry:
332
+ return entry["sha"]
333
+ return None
venv/lib/python3.10/site-packages/fsspec/implementations/http.py ADDED
@@ -0,0 +1,897 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import asyncio
2
+ import io
3
+ import logging
4
+ import re
5
+ import weakref
6
+ from copy import copy
7
+ from urllib.parse import urlparse
8
+
9
+ import aiohttp
10
+ import yarl
11
+
12
+ from fsspec.asyn import AbstractAsyncStreamedFile, AsyncFileSystem, sync, sync_wrapper
13
+ from fsspec.callbacks import DEFAULT_CALLBACK
14
+ from fsspec.exceptions import FSTimeoutError
15
+ from fsspec.spec import AbstractBufferedFile
16
+ from fsspec.utils import (
17
+ DEFAULT_BLOCK_SIZE,
18
+ glob_translate,
19
+ isfilelike,
20
+ nullcontext,
21
+ tokenize,
22
+ )
23
+
24
+ from ..caching import AllBytes
25
+
26
+ # https://stackoverflow.com/a/15926317/3821154
27
+ ex = re.compile(r"""<(a|A)\s+(?:[^>]*?\s+)?(href|HREF)=["'](?P<url>[^"']+)""")
28
+ ex2 = re.compile(r"""(?P<url>http[s]?://[-a-zA-Z0-9@:%_+.~#?&/=]+)""")
29
+ logger = logging.getLogger("fsspec.http")
30
+
31
+
32
+ async def get_client(**kwargs):
33
+ return aiohttp.ClientSession(**kwargs)
34
+
35
+
36
+ class HTTPFileSystem(AsyncFileSystem):
37
+ """
38
+ Simple File-System for fetching data via HTTP(S)
39
+
40
+ ``ls()`` is implemented by loading the parent page and doing a regex
41
+ match on the result. If simple_link=True, anything of the form
42
+ "http(s)://server.com/stuff?thing=other"; otherwise only links within
43
+ HTML href tags will be used.
44
+ """
45
+
46
+ protocol = ("http", "https")
47
+ sep = "/"
48
+
49
+ def __init__(
50
+ self,
51
+ simple_links=True,
52
+ block_size=None,
53
+ same_scheme=True,
54
+ size_policy=None,
55
+ cache_type="bytes",
56
+ cache_options=None,
57
+ asynchronous=False,
58
+ loop=None,
59
+ client_kwargs=None,
60
+ get_client=get_client,
61
+ encoded=False,
62
+ **storage_options,
63
+ ):
64
+ """
65
+ NB: if this is called async, you must await set_client
66
+
67
+ Parameters
68
+ ----------
69
+ block_size: int
70
+ Blocks to read bytes; if 0, will default to raw requests file-like
71
+ objects instead of HTTPFile instances
72
+ simple_links: bool
73
+ If True, will consider both HTML <a> tags and anything that looks
74
+ like a URL; if False, will consider only the former.
75
+ same_scheme: True
76
+ When doing ls/glob, if this is True, only consider paths that have
77
+ http/https matching the input URLs.
78
+ size_policy: this argument is deprecated
79
+ client_kwargs: dict
80
+ Passed to aiohttp.ClientSession, see
81
+ https://docs.aiohttp.org/en/stable/client_reference.html
82
+ For example, ``{'auth': aiohttp.BasicAuth('user', 'pass')}``
83
+ get_client: Callable[..., aiohttp.ClientSession]
84
+ A callable, which takes keyword arguments and constructs
85
+ an aiohttp.ClientSession. Its state will be managed by
86
+ the HTTPFileSystem class.
87
+ storage_options: key-value
88
+ Any other parameters passed on to requests
89
+ cache_type, cache_options: defaults used in open()
90
+ """
91
+ super().__init__(self, asynchronous=asynchronous, loop=loop, **storage_options)
92
+ self.block_size = block_size if block_size is not None else DEFAULT_BLOCK_SIZE
93
+ self.simple_links = simple_links
94
+ self.same_schema = same_scheme
95
+ self.cache_type = cache_type
96
+ self.cache_options = cache_options
97
+ self.client_kwargs = client_kwargs or {}
98
+ self.get_client = get_client
99
+ self.encoded = encoded
100
+ self.kwargs = storage_options
101
+ self._session = None
102
+
103
+ # Clean caching-related parameters from `storage_options`
104
+ # before propagating them as `request_options` through `self.kwargs`.
105
+ # TODO: Maybe rename `self.kwargs` to `self.request_options` to make
106
+ # it clearer.
107
+ request_options = copy(storage_options)
108
+ self.use_listings_cache = request_options.pop("use_listings_cache", False)
109
+ request_options.pop("listings_expiry_time", None)
110
+ request_options.pop("max_paths", None)
111
+ request_options.pop("skip_instance_cache", None)
112
+ self.kwargs = request_options
113
+
114
+ @property
115
+ def fsid(self):
116
+ return "http"
117
+
118
+ def encode_url(self, url):
119
+ return yarl.URL(url, encoded=self.encoded)
120
+
121
+ @staticmethod
122
+ def close_session(loop, session):
123
+ if loop is not None and loop.is_running():
124
+ try:
125
+ sync(loop, session.close, timeout=0.1)
126
+ return
127
+ except (TimeoutError, FSTimeoutError, NotImplementedError):
128
+ pass
129
+ connector = getattr(session, "_connector", None)
130
+ if connector is not None:
131
+ # close after loop is dead
132
+ connector._close()
133
+
134
+ async def set_session(self):
135
+ if self._session is None:
136
+ self._session = await self.get_client(loop=self.loop, **self.client_kwargs)
137
+ if not self.asynchronous:
138
+ weakref.finalize(self, self.close_session, self.loop, self._session)
139
+ return self._session
140
+
141
+ @classmethod
142
+ def _strip_protocol(cls, path):
143
+ """For HTTP, we always want to keep the full URL"""
144
+ return path
145
+
146
+ @classmethod
147
+ def _parent(cls, path):
148
+ # override, since _strip_protocol is different for URLs
149
+ par = super()._parent(path)
150
+ if len(par) > 7: # "http://..."
151
+ return par
152
+ return ""
153
+
154
+ async def _ls_real(self, url, detail=True, **kwargs):
155
+ # ignoring URL-encoded arguments
156
+ kw = self.kwargs.copy()
157
+ kw.update(kwargs)
158
+ logger.debug(url)
159
+ session = await self.set_session()
160
+ async with session.get(self.encode_url(url), **self.kwargs) as r:
161
+ self._raise_not_found_for_status(r, url)
162
+
163
+ if "Content-Type" in r.headers:
164
+ mimetype = r.headers["Content-Type"].partition(";")[0]
165
+ else:
166
+ mimetype = None
167
+
168
+ if mimetype in ("text/html", None):
169
+ try:
170
+ text = await r.text(errors="ignore")
171
+ if self.simple_links:
172
+ links = ex2.findall(text) + [u[2] for u in ex.findall(text)]
173
+ else:
174
+ links = [u[2] for u in ex.findall(text)]
175
+ except UnicodeDecodeError:
176
+ links = [] # binary, not HTML
177
+ else:
178
+ links = []
179
+
180
+ out = set()
181
+ parts = urlparse(url)
182
+ for l in links:
183
+ if isinstance(l, tuple):
184
+ l = l[1]
185
+ if l.startswith("/") and len(l) > 1:
186
+ # absolute URL on this server
187
+ l = f"{parts.scheme}://{parts.netloc}{l}"
188
+ if l.startswith("http"):
189
+ if self.same_schema and l.startswith(url.rstrip("/") + "/"):
190
+ out.add(l)
191
+ elif l.replace("https", "http").startswith(
192
+ url.replace("https", "http").rstrip("/") + "/"
193
+ ):
194
+ # allowed to cross http <-> https
195
+ out.add(l)
196
+ else:
197
+ if l not in ["..", "../"]:
198
+ # Ignore FTP-like "parent"
199
+ out.add("/".join([url.rstrip("/"), l.lstrip("/")]))
200
+ if not out and url.endswith("/"):
201
+ out = await self._ls_real(url.rstrip("/"), detail=False)
202
+ if detail:
203
+ return [
204
+ {
205
+ "name": u,
206
+ "size": None,
207
+ "type": "directory" if u.endswith("/") else "file",
208
+ }
209
+ for u in out
210
+ ]
211
+ else:
212
+ return sorted(out)
213
+
214
+ async def _ls(self, url, detail=True, **kwargs):
215
+ if self.use_listings_cache and url in self.dircache:
216
+ out = self.dircache[url]
217
+ else:
218
+ out = await self._ls_real(url, detail=detail, **kwargs)
219
+ self.dircache[url] = out
220
+ return out
221
+
222
+ ls = sync_wrapper(_ls)
223
+
224
+ def _raise_not_found_for_status(self, response, url):
225
+ """
226
+ Raises FileNotFoundError for 404s, otherwise uses raise_for_status.
227
+ """
228
+ if response.status == 404:
229
+ raise FileNotFoundError(url)
230
+ response.raise_for_status()
231
+
232
+ async def _cat_file(self, url, start=None, end=None, **kwargs):
233
+ kw = self.kwargs.copy()
234
+ kw.update(kwargs)
235
+ logger.debug(url)
236
+
237
+ if start is not None or end is not None:
238
+ if start == end:
239
+ return b""
240
+ headers = kw.pop("headers", {}).copy()
241
+
242
+ headers["Range"] = await self._process_limits(url, start, end)
243
+ kw["headers"] = headers
244
+ session = await self.set_session()
245
+ async with session.get(self.encode_url(url), **kw) as r:
246
+ out = await r.read()
247
+ self._raise_not_found_for_status(r, url)
248
+ return out
249
+
250
+ async def _get_file(
251
+ self, rpath, lpath, chunk_size=5 * 2**20, callback=DEFAULT_CALLBACK, **kwargs
252
+ ):
253
+ kw = self.kwargs.copy()
254
+ kw.update(kwargs)
255
+ logger.debug(rpath)
256
+ session = await self.set_session()
257
+ async with session.get(self.encode_url(rpath), **kw) as r:
258
+ try:
259
+ size = int(r.headers["content-length"])
260
+ except (ValueError, KeyError):
261
+ size = None
262
+
263
+ callback.set_size(size)
264
+ self._raise_not_found_for_status(r, rpath)
265
+ if isfilelike(lpath):
266
+ outfile = lpath
267
+ else:
268
+ outfile = open(lpath, "wb") # noqa: ASYNC230
269
+
270
+ try:
271
+ chunk = True
272
+ while chunk:
273
+ chunk = await r.content.read(chunk_size)
274
+ outfile.write(chunk)
275
+ callback.relative_update(len(chunk))
276
+ finally:
277
+ if not isfilelike(lpath):
278
+ outfile.close()
279
+
280
+ async def _put_file(
281
+ self,
282
+ lpath,
283
+ rpath,
284
+ chunk_size=5 * 2**20,
285
+ callback=DEFAULT_CALLBACK,
286
+ method="post",
287
+ mode="overwrite",
288
+ **kwargs,
289
+ ):
290
+ if mode != "overwrite":
291
+ raise NotImplementedError("Exclusive write")
292
+
293
+ async def gen_chunks():
294
+ # Support passing arbitrary file-like objects
295
+ # and use them instead of streams.
296
+ if isinstance(lpath, io.IOBase):
297
+ context = nullcontext(lpath)
298
+ use_seek = False # might not support seeking
299
+ else:
300
+ context = open(lpath, "rb") # noqa: ASYNC230
301
+ use_seek = True
302
+
303
+ with context as f:
304
+ if use_seek:
305
+ callback.set_size(f.seek(0, 2))
306
+ f.seek(0)
307
+ else:
308
+ callback.set_size(getattr(f, "size", None))
309
+
310
+ chunk = f.read(chunk_size)
311
+ while chunk:
312
+ yield chunk
313
+ callback.relative_update(len(chunk))
314
+ chunk = f.read(chunk_size)
315
+
316
+ kw = self.kwargs.copy()
317
+ kw.update(kwargs)
318
+ session = await self.set_session()
319
+
320
+ method = method.lower()
321
+ if method not in ("post", "put"):
322
+ raise ValueError(
323
+ f"method has to be either 'post' or 'put', not: {method!r}"
324
+ )
325
+
326
+ meth = getattr(session, method)
327
+ async with meth(self.encode_url(rpath), data=gen_chunks(), **kw) as resp:
328
+ self._raise_not_found_for_status(resp, rpath)
329
+
330
+ async def _exists(self, path, strict=False, **kwargs):
331
+ kw = self.kwargs.copy()
332
+ kw.update(kwargs)
333
+ try:
334
+ logger.debug(path)
335
+ session = await self.set_session()
336
+ r = await session.get(self.encode_url(path), **kw)
337
+ async with r:
338
+ if strict:
339
+ self._raise_not_found_for_status(r, path)
340
+ return r.status < 400
341
+ except FileNotFoundError:
342
+ return False
343
+ except aiohttp.ClientError:
344
+ if strict:
345
+ raise
346
+ return False
347
+
348
+ async def _isfile(self, path, **kwargs):
349
+ return await self._exists(path, **kwargs)
350
+
351
+ def _open(
352
+ self,
353
+ path,
354
+ mode="rb",
355
+ block_size=None,
356
+ autocommit=None, # XXX: This differs from the base class.
357
+ cache_type=None,
358
+ cache_options=None,
359
+ size=None,
360
+ **kwargs,
361
+ ):
362
+ """Make a file-like object
363
+
364
+ Parameters
365
+ ----------
366
+ path: str
367
+ Full URL with protocol
368
+ mode: string
369
+ must be "rb"
370
+ block_size: int or None
371
+ Bytes to download in one request; use instance value if None. If
372
+ zero, will return a streaming Requests file-like instance.
373
+ kwargs: key-value
374
+ Any other parameters, passed to requests calls
375
+ """
376
+ if mode != "rb":
377
+ raise NotImplementedError
378
+ block_size = block_size if block_size is not None else self.block_size
379
+ kw = self.kwargs.copy()
380
+ kw["asynchronous"] = self.asynchronous
381
+ kw.update(kwargs)
382
+ info = {}
383
+ size = size or info.update(self.info(path, **kwargs)) or info["size"]
384
+ session = sync(self.loop, self.set_session)
385
+ if block_size and size and info.get("partial", True):
386
+ return HTTPFile(
387
+ self,
388
+ path,
389
+ session=session,
390
+ block_size=block_size,
391
+ mode=mode,
392
+ size=size,
393
+ cache_type=cache_type or self.cache_type,
394
+ cache_options=cache_options or self.cache_options,
395
+ loop=self.loop,
396
+ **kw,
397
+ )
398
+ else:
399
+ return HTTPStreamFile(
400
+ self,
401
+ path,
402
+ mode=mode,
403
+ loop=self.loop,
404
+ session=session,
405
+ **kw,
406
+ )
407
+
408
+ async def open_async(self, path, mode="rb", size=None, **kwargs):
409
+ session = await self.set_session()
410
+ if size is None:
411
+ try:
412
+ size = (await self._info(path, **kwargs))["size"]
413
+ except FileNotFoundError:
414
+ pass
415
+ return AsyncStreamFile(
416
+ self,
417
+ path,
418
+ loop=self.loop,
419
+ session=session,
420
+ size=size,
421
+ **kwargs,
422
+ )
423
+
424
+ def ukey(self, url):
425
+ """Unique identifier; assume HTTP files are static, unchanging"""
426
+ return tokenize(url, self.kwargs, self.protocol)
427
+
428
+ async def _info(self, url, **kwargs):
429
+ """Get info of URL
430
+
431
+ Tries to access location via HEAD, and then GET methods, but does
432
+ not fetch the data.
433
+
434
+ It is possible that the server does not supply any size information, in
435
+ which case size will be given as None (and certain operations on the
436
+ corresponding file will not work).
437
+ """
438
+ info = {}
439
+ session = await self.set_session()
440
+
441
+ for policy in ["head", "get"]:
442
+ try:
443
+ info.update(
444
+ await _file_info(
445
+ self.encode_url(url),
446
+ size_policy=policy,
447
+ session=session,
448
+ **self.kwargs,
449
+ **kwargs,
450
+ )
451
+ )
452
+ if info.get("size") is not None:
453
+ break
454
+ except Exception as exc:
455
+ if policy == "get":
456
+ # If get failed, then raise a FileNotFoundError
457
+ raise FileNotFoundError(url) from exc
458
+ logger.debug("", exc_info=exc)
459
+
460
+ return {"name": url, "size": None, **info, "type": "file"}
461
+
462
+ async def _glob(self, path, maxdepth=None, **kwargs):
463
+ """
464
+ Find files by glob-matching.
465
+
466
+ This implementation is idntical to the one in AbstractFileSystem,
467
+ but "?" is not considered as a character for globbing, because it is
468
+ so common in URLs, often identifying the "query" part.
469
+ """
470
+ if maxdepth is not None and maxdepth < 1:
471
+ raise ValueError("maxdepth must be at least 1")
472
+ import re
473
+
474
+ ends_with_slash = path.endswith("/") # _strip_protocol strips trailing slash
475
+ path = self._strip_protocol(path)
476
+ append_slash_to_dirname = ends_with_slash or path.endswith(("/**", "/*"))
477
+ idx_star = path.find("*") if path.find("*") >= 0 else len(path)
478
+ idx_brace = path.find("[") if path.find("[") >= 0 else len(path)
479
+
480
+ min_idx = min(idx_star, idx_brace)
481
+
482
+ detail = kwargs.pop("detail", False)
483
+
484
+ if not has_magic(path):
485
+ if await self._exists(path, **kwargs):
486
+ if not detail:
487
+ return [path]
488
+ else:
489
+ return {path: await self._info(path, **kwargs)}
490
+ else:
491
+ if not detail:
492
+ return [] # glob of non-existent returns empty
493
+ else:
494
+ return {}
495
+ elif "/" in path[:min_idx]:
496
+ min_idx = path[:min_idx].rindex("/")
497
+ root = path[: min_idx + 1]
498
+ depth = path[min_idx + 1 :].count("/") + 1
499
+ else:
500
+ root = ""
501
+ depth = path[min_idx + 1 :].count("/") + 1
502
+
503
+ if "**" in path:
504
+ if maxdepth is not None:
505
+ idx_double_stars = path.find("**")
506
+ depth_double_stars = path[idx_double_stars:].count("/") + 1
507
+ depth = depth - depth_double_stars + maxdepth
508
+ else:
509
+ depth = None
510
+
511
+ allpaths = await self._find(
512
+ root, maxdepth=depth, withdirs=True, detail=True, **kwargs
513
+ )
514
+
515
+ pattern = glob_translate(path + ("/" if ends_with_slash else ""))
516
+ pattern = re.compile(pattern)
517
+
518
+ out = {
519
+ (
520
+ p.rstrip("/")
521
+ if not append_slash_to_dirname
522
+ and info["type"] == "directory"
523
+ and p.endswith("/")
524
+ else p
525
+ ): info
526
+ for p, info in sorted(allpaths.items())
527
+ if pattern.match(p.rstrip("/"))
528
+ }
529
+
530
+ if detail:
531
+ return out
532
+ else:
533
+ return list(out)
534
+
535
+ async def _isdir(self, path):
536
+ # override, since all URLs are (also) files
537
+ try:
538
+ return bool(await self._ls(path))
539
+ except (FileNotFoundError, ValueError):
540
+ return False
541
+
542
+ async def _pipe_file(self, path, value, mode="overwrite", **kwargs):
543
+ """
544
+ Write bytes to a remote file over HTTP.
545
+
546
+ Parameters
547
+ ----------
548
+ path : str
549
+ Target URL where the data should be written
550
+ value : bytes
551
+ Data to be written
552
+ mode : str
553
+ How to write to the file - 'overwrite' or 'append'
554
+ **kwargs : dict
555
+ Additional parameters to pass to the HTTP request
556
+ """
557
+ url = self._strip_protocol(path)
558
+ headers = kwargs.pop("headers", {})
559
+ headers["Content-Length"] = str(len(value))
560
+
561
+ session = await self.set_session()
562
+
563
+ async with session.put(url, data=value, headers=headers, **kwargs) as r:
564
+ r.raise_for_status()
565
+
566
+
567
+ class HTTPFile(AbstractBufferedFile):
568
+ """
569
+ A file-like object pointing to a remote HTTP(S) resource
570
+
571
+ Supports only reading, with read-ahead of a predetermined block-size.
572
+
573
+ In the case that the server does not supply the filesize, only reading of
574
+ the complete file in one go is supported.
575
+
576
+ Parameters
577
+ ----------
578
+ url: str
579
+ Full URL of the remote resource, including the protocol
580
+ session: aiohttp.ClientSession or None
581
+ All calls will be made within this session, to avoid restarting
582
+ connections where the server allows this
583
+ block_size: int or None
584
+ The amount of read-ahead to do, in bytes. Default is 5MB, or the value
585
+ configured for the FileSystem creating this file
586
+ size: None or int
587
+ If given, this is the size of the file in bytes, and we don't attempt
588
+ to call the server to find the value.
589
+ kwargs: all other key-values are passed to requests calls.
590
+ """
591
+
592
+ def __init__(
593
+ self,
594
+ fs,
595
+ url,
596
+ session=None,
597
+ block_size=None,
598
+ mode="rb",
599
+ cache_type="bytes",
600
+ cache_options=None,
601
+ size=None,
602
+ loop=None,
603
+ asynchronous=False,
604
+ **kwargs,
605
+ ):
606
+ if mode != "rb":
607
+ raise NotImplementedError("File mode not supported")
608
+ self.asynchronous = asynchronous
609
+ self.loop = loop
610
+ self.url = url
611
+ self.session = session
612
+ self.details = {"name": url, "size": size, "type": "file"}
613
+ super().__init__(
614
+ fs=fs,
615
+ path=url,
616
+ mode=mode,
617
+ block_size=block_size,
618
+ cache_type=cache_type,
619
+ cache_options=cache_options,
620
+ **kwargs,
621
+ )
622
+
623
+ def read(self, length=-1):
624
+ """Read bytes from file
625
+
626
+ Parameters
627
+ ----------
628
+ length: int
629
+ Read up to this many bytes. If negative, read all content to end of
630
+ file. If the server has not supplied the filesize, attempting to
631
+ read only part of the data will raise a ValueError.
632
+ """
633
+ if (
634
+ (length < 0 and self.loc == 0) # explicit read all
635
+ # but not when the size is known and fits into a block anyways
636
+ and not (self.size is not None and self.size <= self.blocksize)
637
+ ):
638
+ self._fetch_all()
639
+ if self.size is None:
640
+ if length < 0:
641
+ self._fetch_all()
642
+ else:
643
+ length = min(self.size - self.loc, length)
644
+ return super().read(length)
645
+
646
+ async def async_fetch_all(self):
647
+ """Read whole file in one shot, without caching
648
+
649
+ This is only called when position is still at zero,
650
+ and read() is called without a byte-count.
651
+ """
652
+ logger.debug(f"Fetch all for {self}")
653
+ if not isinstance(self.cache, AllBytes):
654
+ r = await self.session.get(self.fs.encode_url(self.url), **self.kwargs)
655
+ async with r:
656
+ r.raise_for_status()
657
+ out = await r.read()
658
+ self.cache = AllBytes(
659
+ size=len(out), fetcher=None, blocksize=None, data=out
660
+ )
661
+ self.size = len(out)
662
+
663
+ _fetch_all = sync_wrapper(async_fetch_all)
664
+
665
+ def _parse_content_range(self, headers):
666
+ """Parse the Content-Range header"""
667
+ s = headers.get("Content-Range", "")
668
+ m = re.match(r"bytes (\d+-\d+|\*)/(\d+|\*)", s)
669
+ if not m:
670
+ return None, None, None
671
+
672
+ if m[1] == "*":
673
+ start = end = None
674
+ else:
675
+ start, end = [int(x) for x in m[1].split("-")]
676
+ total = None if m[2] == "*" else int(m[2])
677
+ return start, end, total
678
+
679
+ async def async_fetch_range(self, start, end):
680
+ """Download a block of data
681
+
682
+ The expectation is that the server returns only the requested bytes,
683
+ with HTTP code 206. If this is not the case, we first check the headers,
684
+ and then stream the output - if the data size is bigger than we
685
+ requested, an exception is raised.
686
+ """
687
+ logger.debug(f"Fetch range for {self}: {start}-{end}")
688
+ kwargs = self.kwargs.copy()
689
+ headers = kwargs.pop("headers", {}).copy()
690
+ headers["Range"] = f"bytes={start}-{end - 1}"
691
+ logger.debug(f"{self.url} : {headers['Range']}")
692
+ r = await self.session.get(
693
+ self.fs.encode_url(self.url), headers=headers, **kwargs
694
+ )
695
+ async with r:
696
+ if r.status == 416:
697
+ # range request outside file
698
+ return b""
699
+ r.raise_for_status()
700
+
701
+ # If the server has handled the range request, it should reply
702
+ # with status 206 (partial content). But we'll guess that a suitable
703
+ # Content-Range header or a Content-Length no more than the
704
+ # requested range also mean we have got the desired range.
705
+ response_is_range = (
706
+ r.status == 206
707
+ or self._parse_content_range(r.headers)[0] == start
708
+ or int(r.headers.get("Content-Length", end + 1)) <= end - start
709
+ )
710
+
711
+ if response_is_range:
712
+ # partial content, as expected
713
+ out = await r.read()
714
+ elif start > 0:
715
+ raise ValueError(
716
+ "The HTTP server doesn't appear to support range requests. "
717
+ "Only reading this file from the beginning is supported. "
718
+ "Open with block_size=0 for a streaming file interface."
719
+ )
720
+ else:
721
+ # Response is not a range, but we want the start of the file,
722
+ # so we can read the required amount anyway.
723
+ cl = 0
724
+ out = []
725
+ while True:
726
+ chunk = await r.content.read(2**20)
727
+ # data size unknown, let's read until we have enough
728
+ if chunk:
729
+ out.append(chunk)
730
+ cl += len(chunk)
731
+ if cl > end - start:
732
+ break
733
+ else:
734
+ break
735
+ out = b"".join(out)[: end - start]
736
+ return out
737
+
738
+ _fetch_range = sync_wrapper(async_fetch_range)
739
+
740
+
741
+ magic_check = re.compile("([*[])")
742
+
743
+
744
+ def has_magic(s):
745
+ match = magic_check.search(s)
746
+ return match is not None
747
+
748
+
749
+ class HTTPStreamFile(AbstractBufferedFile):
750
+ def __init__(self, fs, url, mode="rb", loop=None, session=None, **kwargs):
751
+ self.asynchronous = kwargs.pop("asynchronous", False)
752
+ self.url = url
753
+ self.loop = loop
754
+ self.session = session
755
+ if mode != "rb":
756
+ raise ValueError
757
+ self.details = {"name": url, "size": None}
758
+ super().__init__(fs=fs, path=url, mode=mode, cache_type="none", **kwargs)
759
+
760
+ async def cor():
761
+ r = await self.session.get(self.fs.encode_url(url), **kwargs).__aenter__()
762
+ self.fs._raise_not_found_for_status(r, url)
763
+ return r
764
+
765
+ self.r = sync(self.loop, cor)
766
+ self.loop = fs.loop
767
+
768
+ def seek(self, loc, whence=0):
769
+ if loc == 0 and whence == 1:
770
+ return
771
+ if loc == self.loc and whence == 0:
772
+ return
773
+ raise ValueError("Cannot seek streaming HTTP file")
774
+
775
+ async def _read(self, num=-1):
776
+ out = await self.r.content.read(num)
777
+ self.loc += len(out)
778
+ return out
779
+
780
+ read = sync_wrapper(_read)
781
+
782
+ async def _close(self):
783
+ self.r.close()
784
+
785
+ def close(self):
786
+ asyncio.run_coroutine_threadsafe(self._close(), self.loop)
787
+ super().close()
788
+
789
+
790
+ class AsyncStreamFile(AbstractAsyncStreamedFile):
791
+ def __init__(
792
+ self, fs, url, mode="rb", loop=None, session=None, size=None, **kwargs
793
+ ):
794
+ self.url = url
795
+ self.session = session
796
+ self.r = None
797
+ if mode != "rb":
798
+ raise ValueError
799
+ self.details = {"name": url, "size": None}
800
+ self.kwargs = kwargs
801
+ super().__init__(fs=fs, path=url, mode=mode, cache_type="none")
802
+ self.size = size
803
+
804
+ async def read(self, num=-1):
805
+ if self.r is None:
806
+ r = await self.session.get(
807
+ self.fs.encode_url(self.url), **self.kwargs
808
+ ).__aenter__()
809
+ self.fs._raise_not_found_for_status(r, self.url)
810
+ self.r = r
811
+ out = await self.r.content.read(num)
812
+ self.loc += len(out)
813
+ return out
814
+
815
+ async def close(self):
816
+ if self.r is not None:
817
+ self.r.close()
818
+ self.r = None
819
+ await super().close()
820
+
821
+
822
+ async def get_range(session, url, start, end, file=None, **kwargs):
823
+ # explicit get a range when we know it must be safe
824
+ kwargs = kwargs.copy()
825
+ headers = kwargs.pop("headers", {}).copy()
826
+ headers["Range"] = f"bytes={start}-{end - 1}"
827
+ r = await session.get(url, headers=headers, **kwargs)
828
+ r.raise_for_status()
829
+ async with r:
830
+ out = await r.read()
831
+ if file:
832
+ with open(file, "r+b") as f: # noqa: ASYNC230
833
+ f.seek(start)
834
+ f.write(out)
835
+ else:
836
+ return out
837
+
838
+
839
+ async def _file_info(url, session, size_policy="head", **kwargs):
840
+ """Call HEAD on the server to get details about the file (size/checksum etc.)
841
+
842
+ Default operation is to explicitly allow redirects and use encoding
843
+ 'identity' (no compression) to get the true size of the target.
844
+ """
845
+ logger.debug("Retrieve file size for %s", url)
846
+ kwargs = kwargs.copy()
847
+ ar = kwargs.pop("allow_redirects", True)
848
+ head = kwargs.get("headers", {}).copy()
849
+ head["Accept-Encoding"] = "identity"
850
+ kwargs["headers"] = head
851
+
852
+ info = {}
853
+ if size_policy == "head":
854
+ r = await session.head(url, allow_redirects=ar, **kwargs)
855
+ elif size_policy == "get":
856
+ r = await session.get(url, allow_redirects=ar, **kwargs)
857
+ else:
858
+ raise TypeError(f'size_policy must be "head" or "get", got {size_policy}')
859
+ async with r:
860
+ r.raise_for_status()
861
+
862
+ if "Content-Length" in r.headers:
863
+ # Some servers may choose to ignore Accept-Encoding and return
864
+ # compressed content, in which case the returned size is unreliable.
865
+ if "Content-Encoding" not in r.headers or r.headers["Content-Encoding"] in [
866
+ "identity",
867
+ "",
868
+ ]:
869
+ info["size"] = int(r.headers["Content-Length"])
870
+ elif "Content-Range" in r.headers:
871
+ info["size"] = int(r.headers["Content-Range"].split("/")[1])
872
+
873
+ if "Content-Type" in r.headers:
874
+ info["mimetype"] = r.headers["Content-Type"].partition(";")[0]
875
+
876
+ if r.headers.get("Accept-Ranges") == "none":
877
+ # Some servers may explicitly discourage partial content requests, but
878
+ # the lack of "Accept-Ranges" does not always indicate they would fail
879
+ info["partial"] = False
880
+
881
+ info["url"] = str(r.url)
882
+
883
+ for checksum_field in ["ETag", "Content-MD5", "Digest", "Last-Modified"]:
884
+ if r.headers.get(checksum_field):
885
+ info[checksum_field] = r.headers[checksum_field]
886
+
887
+ return info
888
+
889
+
890
+ async def _file_size(url, session=None, *args, **kwargs):
891
+ if session is None:
892
+ session = await get_client()
893
+ info = await _file_info(url, session=session, *args, **kwargs)
894
+ return info.get("size")
895
+
896
+
897
+ file_size = sync_wrapper(_file_size)
venv/lib/python3.10/site-packages/fsspec/implementations/http_sync.py ADDED
@@ -0,0 +1,937 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """This file is largely copied from http.py"""
2
+
3
+ import io
4
+ import logging
5
+ import re
6
+ import urllib.error
7
+ import urllib.parse
8
+ from copy import copy
9
+ from json import dumps, loads
10
+ from urllib.parse import urlparse
11
+
12
+ try:
13
+ import yarl
14
+ except (ImportError, ModuleNotFoundError, OSError):
15
+ yarl = False
16
+
17
+ from fsspec.callbacks import _DEFAULT_CALLBACK
18
+ from fsspec.registry import register_implementation
19
+ from fsspec.spec import AbstractBufferedFile, AbstractFileSystem
20
+ from fsspec.utils import DEFAULT_BLOCK_SIZE, isfilelike, nullcontext, tokenize
21
+
22
+ from ..caching import AllBytes
23
+
24
+ # https://stackoverflow.com/a/15926317/3821154
25
+ ex = re.compile(r"""<(a|A)\s+(?:[^>]*?\s+)?(href|HREF)=["'](?P<url>[^"']+)""")
26
+ ex2 = re.compile(r"""(?P<url>http[s]?://[-a-zA-Z0-9@:%_+.~#?&/=]+)""")
27
+ logger = logging.getLogger("fsspec.http")
28
+
29
+
30
class JsHttpException(urllib.error.HTTPError):
    """HTTP error raised by the JS (XMLHttpRequest) backend, mirroring urllib's."""

    ...
31
+
32
+
33
class StreamIO(io.BytesIO):
    """In-memory stand-in for a streaming response body.

    Fake class, so you can set attributes on it;
    will eventually actually stream.
    """

    ...
37
+
38
+
39
class ResponseProxy:
    """Looks like a requests response, wrapping a completed XMLHttpRequest.

    The body and headers are pulled lazily from the underlying JS request
    object on first access and cached.
    """

    def __init__(self, req, stream=False):
        self.request = req
        self.stream = stream
        self._data = None  # lazily-fetched body (bytes, or StreamIO if streaming)
        self._headers = None  # lazily-parsed header dict

    @property
    def raw(self):
        # Materialise the response body from the XHR on first access.
        if self._data is None:
            b = self.request.response.to_bytes()
            if self.stream:
                self._data = StreamIO(b)
            else:
                self._data = b
        return self._data

    def close(self):
        # BUG FIX: the original did ``del self._data``, which left the
        # instance broken (AttributeError on any later .raw/.content access).
        # Resetting to None simply drops the cached body.
        self._data = None

    @property
    def headers(self):
        if self._headers is None:
            # BUG FIX: split each "Name: value" line on the FIRST ": " only;
            # header values containing ": " (dates, URLs) previously made the
            # dict() construction raise ValueError.
            self._headers = dict(
                line.split(": ", 1)
                for line in self.request.getAllResponseHeaders().strip().split("\r\n")
            )
        return self._headers

    @property
    def status_code(self):
        return int(self.request.status)

    def raise_for_status(self):
        # Mirror requests: raise only for 4xx/5xx responses.
        if not self.ok:
            raise JsHttpException(
                self.url, self.status_code, self.reason, self.headers, None
            )

    def iter_content(self, chunksize, *_, **__):
        """Yield the (already-buffered) body in ``chunksize`` pieces."""
        while True:
            out = self.raw.read(chunksize)
            if out:
                yield out
            else:
                break

    @property
    def reason(self):
        return self.request.statusText

    @property
    def ok(self):
        return self.status_code < 400

    @property
    def url(self):
        # Final URL after any redirects the browser followed.
        return self.request.response.responseURL

    @property
    def text(self):
        # TODO: encoding from headers
        return self.content.decode()

    @property
    def content(self):
        # Force non-streaming so .raw returns plain bytes.
        self.stream = False
        return self.raw

    def json(self):
        return loads(self.text)
115
+
116
+
117
class RequestsSessionShim:
    """Minimal stand-in for ``requests.Session`` backed by the browser.

    Issues *synchronous* XMLHttpRequests through the pyodide ``js`` bridge
    and wraps the result in a :class:`ResponseProxy`. Only the subset of the
    requests API used by this module is implemented.
    """

    def __init__(self):
        # Kept for requests API compatibility; per-request headers are
        # supplied through the individual calls.
        self.headers = {}

    def request(
        self,
        method,
        url,
        params=None,
        data=None,
        headers=None,
        cookies=None,
        files=None,
        auth=None,
        timeout=None,
        allow_redirects=None,
        proxies=None,
        hooks=None,
        stream=None,
        verify=None,
        cert=None,
        json=None,
    ):
        """Perform a blocking HTTP request via XMLHttpRequest.

        Raises NotImplementedError for features the XHR bridge cannot
        express, and ValueError if both ``data`` and ``json`` are given.
        """
        from js import Blob, XMLHttpRequest

        logger.debug("JS request: %s %s", method, url)

        if cert or verify or proxies or files or cookies or hooks:
            raise NotImplementedError
        if data and json:
            raise ValueError("Use json= or data=, not both")
        req = XMLHttpRequest.new()
        extra = auth if auth else ()
        if params:
            url = f"{url}?{urllib.parse.urlencode(params)}"
        # Third argument False => synchronous request; ``extra`` carries the
        # optional (user, password) pair.
        req.open(method, url, False, *extra)
        if timeout:
            req.timeout = timeout
        if headers:
            for k, v in headers.items():
                req.setRequestHeader(k, v)

        req.setRequestHeader("Accept", "application/octet-stream")
        req.responseType = "arraybuffer"
        if json:
            # BUG FIX: serialize the ``json`` payload; the original sent
            # ``dumps(data)``, i.e. the string "null", whenever json= was used.
            # NOTE(review): ``{type: ...}`` deliberately uses the builtin
            # ``type`` as key, mimicking a JS object literal for the bridge —
            # confirm against the pyodide conversion rules.
            blob = Blob.new([dumps(json)], {type: "application/json"})
            req.send(blob)
        elif data:
            if isinstance(data, io.IOBase):
                data = data.read()
            blob = Blob.new([data], {type: "application/octet-stream"})
            req.send(blob)
        else:
            req.send(None)
        return ResponseProxy(req, stream=stream)

    def get(self, url, **kwargs):
        return self.request("GET", url, **kwargs)

    def head(self, url, **kwargs):
        return self.request("HEAD", url, **kwargs)

    def post(self, url, **kwargs):
        # BUG FIX: the method string was "POST}" (stray brace), so every POST
        # went out with an invalid verb.
        return self.request("POST", url, **kwargs)

    def put(self, url, **kwargs):
        return self.request("PUT", url, **kwargs)

    def patch(self, url, **kwargs):
        return self.request("PATCH", url, **kwargs)

    def delete(self, url, **kwargs):
        return self.request("DELETE", url, **kwargs)
190
+
191
+
192
class HTTPFileSystem(AbstractFileSystem):
    """
    Simple File-System for fetching data via HTTP(S)

    This is the BLOCKING version of the normal HTTPFileSystem. It uses
    requests in normal python and the JS runtime in pyodide.

    ***This implementation is extremely experimental, do not use unless
    you are testing pyodide/pyscript integration***
    """

    protocol = ("http", "https", "sync-http", "sync-https")
    sep = "/"

    def __init__(
        self,
        simple_links=True,
        block_size=None,
        same_scheme=True,
        cache_type="readahead",
        cache_options=None,
        client_kwargs=None,
        encoded=False,
        **storage_options,
    ):
        """

        Parameters
        ----------
        block_size: int
            Blocks to read bytes; if 0, will default to raw requests file-like
            objects instead of HTTPFile instances
        simple_links: bool
            If True, will consider both HTML <a> tags and anything that looks
            like a URL; if False, will consider only the former.
        same_scheme: True
            When doing ls/glob, if this is True, only consider paths that have
            http/https matching the input URLs.
        size_policy: this argument is deprecated
        client_kwargs: dict
            Passed to aiohttp.ClientSession, see
            https://docs.aiohttp.org/en/stable/client_reference.html
            For example, ``{'auth': aiohttp.BasicAuth('user', 'pass')}``
        storage_options: key-value
            Any other parameters passed on to requests
        cache_type, cache_options: defaults used in open
        """
        # NOTE(review): the extra positional ``self`` mirrors the async
        # implementation's call — presumably tolerated by the base class;
        # confirm before changing.
        super().__init__(self, **storage_options)
        self.block_size = block_size if block_size is not None else DEFAULT_BLOCK_SIZE
        self.simple_links = simple_links
        self.same_schema = same_scheme
        self.cache_type = cache_type
        self.cache_options = cache_options
        self.client_kwargs = client_kwargs or {}
        self.encoded = encoded
        self.kwargs = storage_options

        try:
            import js  # noqa: F401

            logger.debug("Starting JS session")
            self.session = RequestsSessionShim()
            self.js = True
        except Exception as e:
            import requests

            logger.debug("Starting cpython session because of: %s", e)
            # BUG FIX: requests.Session() accepts no constructor arguments;
            # the original ``requests.Session(**(client_kwargs or {}))``
            # raised TypeError whenever any client_kwargs were supplied.
            # They remain available on ``self.client_kwargs``.
            self.session = requests.Session()
            self.js = False

        request_options = copy(storage_options)
        # Listings-cache knobs are consumed here, not forwarded to requests.
        self.use_listings_cache = request_options.pop("use_listings_cache", False)
        request_options.pop("listings_expiry_time", None)
        request_options.pop("max_paths", None)
        request_options.pop("skip_instance_cache", None)
        self.kwargs = request_options

    @property
    def fsid(self):
        # Stable identifier for this (blocking) HTTP filesystem flavour.
        return "sync-http"

    def encode_url(self, url):
        # yarl keeps percent-encoding consistent with the async backend;
        # fall back to the raw string when yarl is unavailable.
        if yarl:
            return yarl.URL(url, encoded=self.encoded)
        return url

    @classmethod
    def _strip_protocol(cls, path: str) -> str:
        """For HTTP, we always want to keep the full URL"""
        path = path.replace("sync-http://", "http://").replace(
            "sync-https://", "https://"
        )
        return path

    @classmethod
    def _parent(cls, path):
        # override, since _strip_protocol is different for URLs
        par = super()._parent(path)
        if len(par) > 7:  # "http://..."
            return par
        return ""

    def _ls_real(self, url, detail=True, **kwargs):
        """Scrape links out of the HTML page at ``url`` (directory listing)."""
        # ignoring URL-encoded arguments
        kw = self.kwargs.copy()
        kw.update(kwargs)
        logger.debug(url)
        # BUG FIX: the merged ``kw`` (instance defaults + per-call kwargs)
        # was built and then ignored in favour of ``self.kwargs`` alone.
        r = self.session.get(self.encode_url(url), **kw)
        self._raise_not_found_for_status(r, url)
        text = r.text
        if self.simple_links:
            links = ex2.findall(text) + [u[2] for u in ex.findall(text)]
        else:
            links = [u[2] for u in ex.findall(text)]
        out = set()
        parts = urlparse(url)
        for link in links:
            if isinstance(link, tuple):
                link = link[1]
            if link.startswith("/") and len(link) > 1:
                # absolute URL on this server
                link = parts.scheme + "://" + parts.netloc + link
            if link.startswith("http"):
                if self.same_schema and link.startswith(url.rstrip("/") + "/"):
                    out.add(link)
                elif link.replace("https", "http").startswith(
                    url.replace("https", "http").rstrip("/") + "/"
                ):
                    # allowed to cross http <-> https
                    out.add(link)
            else:
                if link not in ["..", "../"]:
                    # Ignore FTP-like "parent"
                    out.add("/".join([url.rstrip("/"), link.lstrip("/")]))
        if not out and url.endswith("/"):
            out = self._ls_real(url.rstrip("/"), detail=False)
        if detail:
            return [
                {
                    "name": u,
                    "size": None,
                    "type": "directory" if u.endswith("/") else "file",
                }
                for u in out
            ]
        else:
            return sorted(out)

    def ls(self, url, detail=True, **kwargs):
        """List links found at ``url``, optionally via the listings cache."""
        if self.use_listings_cache and url in self.dircache:
            out = self.dircache[url]
        else:
            out = self._ls_real(url, detail=detail, **kwargs)
            self.dircache[url] = out
        return out

    def _raise_not_found_for_status(self, response, url):
        """
        Raises FileNotFoundError for 404s, otherwise uses raise_for_status.
        """
        if response.status_code == 404:
            raise FileNotFoundError(url)
        response.raise_for_status()

    def cat_file(self, url, start=None, end=None, **kwargs):
        """Fetch the body of ``url``, optionally a byte range of it."""
        kw = self.kwargs.copy()
        kw.update(kwargs)
        logger.debug(url)

        if start is not None or end is not None:
            if start == end:
                return b""
            headers = kw.pop("headers", {}).copy()

            headers["Range"] = self._process_limits(url, start, end)
            kw["headers"] = headers
        r = self.session.get(self.encode_url(url), **kw)
        self._raise_not_found_for_status(r, url)
        return r.content

    def get_file(
        self, rpath, lpath, chunk_size=5 * 2**20, callback=_DEFAULT_CALLBACK, **kwargs
    ):
        """Stream the remote file ``rpath`` into local path/file-like ``lpath``."""
        kw = self.kwargs.copy()
        kw.update(kwargs)
        logger.debug(rpath)
        r = self.session.get(self.encode_url(rpath), **kw)
        try:
            size = int(
                r.headers.get("content-length", None)
                or r.headers.get("Content-Length", None)
            )
        except (ValueError, KeyError, TypeError):
            size = None

        callback.set_size(size)
        self._raise_not_found_for_status(r, rpath)
        # BUG FIX: a file opened here was never closed (descriptor leak).
        # Use nullcontext for caller-provided file-likes, which we must not
        # close on their behalf.
        ctx = nullcontext(lpath) if isfilelike(lpath) else open(lpath, "wb")
        with ctx as outfile:
            for chunk in r.iter_content(chunk_size, decode_unicode=False):
                outfile.write(chunk)
                callback.relative_update(len(chunk))

    def put_file(
        self,
        lpath,
        rpath,
        chunk_size=5 * 2**20,
        callback=_DEFAULT_CALLBACK,
        method="post",
        **kwargs,
    ):
        """Upload local path/file-like ``lpath`` to ``rpath`` via POST or PUT."""

        def gen_chunks():
            # Support passing arbitrary file-like objects
            # and use them instead of streams.
            if isinstance(lpath, io.IOBase):
                context = nullcontext(lpath)
                use_seek = False  # might not support seeking
            else:
                context = open(lpath, "rb")
                use_seek = True

            with context as f:
                if use_seek:
                    callback.set_size(f.seek(0, 2))
                    f.seek(0)
                else:
                    callback.set_size(getattr(f, "size", None))

                chunk = f.read(chunk_size)
                while chunk:
                    yield chunk
                    callback.relative_update(len(chunk))
                    chunk = f.read(chunk_size)

        kw = self.kwargs.copy()
        kw.update(kwargs)

        method = method.lower()
        if method not in ("post", "put"):
            raise ValueError(
                f"method has to be either 'post' or 'put', not: {method!r}"
            )

        meth = getattr(self.session, method)
        resp = meth(rpath, data=gen_chunks(), **kw)
        self._raise_not_found_for_status(resp, rpath)

    def _process_limits(self, url, start, end):
        """Helper for "Range"-based _cat_file"""
        size = None
        suff = False
        if start is not None and start < 0:
            # if start is negative and end None, end is the "suffix length"
            if end is None:
                end = -start
                start = ""
                suff = True
            else:
                size = size or self.info(url)["size"]
                start = size + start
        elif start is None:
            start = 0
        if not suff:
            if end is not None and end < 0:
                if start is not None:
                    size = size or self.info(url)["size"]
                    end = size + end
            elif end is None:
                end = ""
            if isinstance(end, int):
                end -= 1  # bytes range is inclusive
        return f"bytes={start}-{end}"

    def exists(self, path, strict=False, **kwargs):
        """Whether ``path`` responds with a non-error status to a GET."""
        kw = self.kwargs.copy()
        kw.update(kwargs)
        try:
            logger.debug(path)
            r = self.session.get(self.encode_url(path), **kw)
            if strict:
                self._raise_not_found_for_status(r, path)
            return r.status_code < 400
        except FileNotFoundError:
            return False
        except Exception:
            # best-effort: any transport error counts as "does not exist"
            # unless the caller asked for strict semantics
            if strict:
                raise
            return False

    def isfile(self, path, **kwargs):
        # Every reachable URL is considered a file.
        return self.exists(path, **kwargs)

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        autocommit=None,  # XXX: This differs from the base class.
        cache_type=None,
        cache_options=None,
        size=None,
        **kwargs,
    ):
        """Make a file-like object

        Parameters
        ----------
        path: str
            Full URL with protocol
        mode: string
            must be "rb"
        block_size: int or None
            Bytes to download in one request; use instance value if None. If
            zero, will return a streaming Requests file-like instance.
        kwargs: key-value
            Any other parameters, passed to requests calls
        """
        if mode != "rb":
            raise NotImplementedError
        block_size = block_size if block_size is not None else self.block_size
        kw = self.kwargs.copy()
        kw.update(kwargs)
        size = size or self.info(path, **kwargs)["size"]
        # Random access needs a known size; otherwise fall back to streaming.
        if block_size and size:
            return HTTPFile(
                self,
                path,
                session=self.session,
                block_size=block_size,
                mode=mode,
                size=size,
                cache_type=cache_type or self.cache_type,
                cache_options=cache_options or self.cache_options,
                **kw,
            )
        else:
            return HTTPStreamFile(
                self,
                path,
                mode=mode,
                session=self.session,
                **kw,
            )

    def ukey(self, url):
        """Unique identifier; assume HTTP files are static, unchanging"""
        return tokenize(url, self.kwargs, self.protocol)

    def info(self, url, **kwargs):
        """Get info of URL

        Tries to access location via HEAD, and then GET methods, but does
        not fetch the data.

        It is possible that the server does not supply any size information, in
        which case size will be given as None (and certain operations on the
        corresponding file will not work).
        """
        info = {}
        for policy in ["head", "get"]:
            try:
                info.update(
                    _file_info(
                        self.encode_url(url),
                        size_policy=policy,
                        session=self.session,
                        **self.kwargs,
                        **kwargs,
                    )
                )
                if info.get("size") is not None:
                    break
            except Exception as exc:
                if policy == "get":
                    # If get failed, then raise a FileNotFoundError
                    raise FileNotFoundError(url) from exc
                logger.debug(str(exc))

        return {"name": url, "size": None, **info, "type": "file"}

    def glob(self, path, maxdepth=None, **kwargs):
        """
        Find files by glob-matching.

        This implementation is identical to the one in AbstractFileSystem,
        but "?" is not considered as a character for globbing, because it is
        so common in URLs, often identifying the "query" part.
        """
        ends = path.endswith("/")
        path = self._strip_protocol(path)
        indstar = path.find("*") if path.find("*") >= 0 else len(path)
        indbrace = path.find("[") if path.find("[") >= 0 else len(path)

        ind = min(indstar, indbrace)

        detail = kwargs.pop("detail", False)

        if not has_magic(path):
            root = path
            depth = 1
            if ends:
                path += "/*"
            elif self.exists(path):
                if not detail:
                    return [path]
                else:
                    return {path: self.info(path)}
            else:
                if not detail:
                    return []  # glob of non-existent returns empty
                else:
                    return {}
        elif "/" in path[:ind]:
            ind2 = path[:ind].rindex("/")
            root = path[: ind2 + 1]
            depth = None if "**" in path else path[ind2 + 1 :].count("/") + 1
        else:
            root = ""
            depth = None if "**" in path else path[ind + 1 :].count("/") + 1

        allpaths = self.find(
            root, maxdepth=maxdepth or depth, withdirs=True, detail=True, **kwargs
        )
        # Escape characters special to python regex, leaving our supported
        # special characters in place.
        # See https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
        # for shell globbing details.
        pattern = (
            "^"
            + (
                path.replace("\\", r"\\")
                .replace(".", r"\.")
                .replace("+", r"\+")
                .replace("//", "/")
                .replace("(", r"\(")
                .replace(")", r"\)")
                .replace("|", r"\|")
                .replace("^", r"\^")
                .replace("$", r"\$")
                .replace("{", r"\{")
                .replace("}", r"\}")
                .rstrip("/")
            )
            + "$"
        )
        # "**" matches across path separators, "*" within one component.
        pattern = re.sub("[*]{2}", "=PLACEHOLDER=", pattern)
        pattern = re.sub("[*]", "[^/]*", pattern)
        pattern = re.compile(pattern.replace("=PLACEHOLDER=", ".*"))
        out = {
            p: allpaths[p]
            for p in sorted(allpaths)
            if pattern.match(p.replace("//", "/").rstrip("/"))
        }
        if detail:
            return out
        else:
            return list(out)

    def isdir(self, path):
        # override, since all URLs are (also) files
        try:
            return bool(self.ls(path))
        except (FileNotFoundError, ValueError):
            return False
659
+
660
+
661
class HTTPFile(AbstractBufferedFile):
    """
    A file-like object pointing to a remote HTTP(S) resource

    Supports only reading, with read-ahead of a predetermined block-size.

    In the case that the server does not supply the filesize, only reading of
    the complete file in one go is supported.

    Parameters
    ----------
    url: str
        Full URL of the remote resource, including the protocol
    session: requests.Session or None
        All calls will be made within this session, to avoid restarting
        connections where the server allows this
    block_size: int or None
        The amount of read-ahead to do, in bytes. Default is 5MB, or the value
        configured for the FileSystem creating this file
    size: None or int
        If given, this is the size of the file in bytes, and we don't attempt
        to call the server to find the value.
    kwargs: all other key-values are passed to requests calls.
    """

    def __init__(
        self,
        fs,
        url,
        session=None,
        block_size=None,
        mode="rb",
        cache_type="bytes",
        cache_options=None,
        size=None,
        **kwargs,
    ):
        # Only read-only access is implemented.
        if mode != "rb":
            raise NotImplementedError("File mode not supported")
        self.url = url
        self.session = session
        # Pre-seed details so the base class does not have to look them up.
        self.details = {"name": url, "size": size, "type": "file"}
        super().__init__(
            fs=fs,
            path=url,
            mode=mode,
            block_size=block_size,
            cache_type=cache_type,
            cache_options=cache_options,
            **kwargs,
        )

    def read(self, length=-1):
        """Read bytes from file

        Parameters
        ----------
        length: int
            Read up to this many bytes. If negative, read all content to end of
            file. If the server has not supplied the filesize, attempting to
            read only part of the data will raise a ValueError.
        """
        if (
            (length < 0 and self.loc == 0)  # explicit read all
            # but not when the size is known and fits into a block anyways
            and not (self.size is not None and self.size <= self.blocksize)
        ):
            self._fetch_all()
        if self.size is None:
            if length < 0:
                self._fetch_all()
        else:
            # Clamp to the known remaining length.
            length = min(self.size - self.loc, length)
        return super().read(length)

    def _fetch_all(self):
        """Read whole file in one shot, without caching

        This is only called when position is still at zero,
        and read() is called without a byte-count.
        """
        logger.debug(f"Fetch all for {self}")
        if not isinstance(self.cache, AllBytes):
            r = self.session.get(self.fs.encode_url(self.url), **self.kwargs)
            r.raise_for_status()
            out = r.content
            # Replace the block cache with one that already holds everything.
            self.cache = AllBytes(size=len(out), fetcher=None, blocksize=None, data=out)
            self.size = len(out)

    def _parse_content_range(self, headers):
        """Parse the Content-Range header into (start, end, total).

        Any element may be None when the header is absent or uses "*".
        """
        s = headers.get("Content-Range", "")
        m = re.match(r"bytes (\d+-\d+|\*)/(\d+|\*)", s)
        if not m:
            return None, None, None

        if m[1] == "*":
            start = end = None
        else:
            start, end = [int(x) for x in m[1].split("-")]
        total = None if m[2] == "*" else int(m[2])
        return start, end, total

    def _fetch_range(self, start, end):
        """Download a block of data

        The expectation is that the server returns only the requested bytes,
        with HTTP code 206. If this is not the case, we first check the headers,
        and then stream the output - if the data size is bigger than we
        requested, an exception is raised.
        """
        logger.debug(f"Fetch range for {self}: {start}-{end}")
        kwargs = self.kwargs.copy()
        headers = kwargs.pop("headers", {}).copy()
        # HTTP Range is inclusive at both ends, hence end - 1.
        headers["Range"] = f"bytes={start}-{end - 1}"
        logger.debug("%s : %s", self.url, headers["Range"])
        r = self.session.get(self.fs.encode_url(self.url), headers=headers, **kwargs)
        if r.status_code == 416:
            # range request outside file
            return b""
        r.raise_for_status()

        # If the server has handled the range request, it should reply
        # with status 206 (partial content). But we'll guess that a suitable
        # Content-Range header or a Content-Length no more than the
        # requested range also mean we have got the desired range.
        cl = r.headers.get("Content-Length", r.headers.get("content-length", end + 1))
        response_is_range = (
            r.status_code == 206
            or self._parse_content_range(r.headers)[0] == start
            or int(cl) <= end - start
        )

        if response_is_range:
            # partial content, as expected
            out = r.content
        elif start > 0:
            raise ValueError(
                "The HTTP server doesn't appear to support range requests. "
                "Only reading this file from the beginning is supported. "
                "Open with block_size=0 for a streaming file interface."
            )
        else:
            # Response is not a range, but we want the start of the file,
            # so we can read the required amount anyway.
            cl = 0
            out = []
            for chunk in r.iter_content(2**20, False):
                out.append(chunk)
                cl += len(chunk)
            out = b"".join(out)[: end - start]
        return out
813
+
814
+
815
# Characters that trigger glob expansion. "?" is deliberately excluded,
# since it commonly introduces the query part of a URL.
magic_check = re.compile("([*[])")


def has_magic(s):
    """Return True if *s* contains any glob special character."""
    return magic_check.search(s) is not None
821
+
822
+
823
class HTTPStreamFile(AbstractBufferedFile):
    """Streaming (non-seekable) file-like object over HTTP.

    Used when the server supplies no size, or when block_size=0 was
    requested: the body is consumed strictly front-to-back from one GET.
    """

    def __init__(self, fs, url, mode="rb", session=None, **kwargs):
        self.url = url
        self.session = session
        if mode != "rb":
            raise ValueError
        # Size is unknown for a pure stream.
        self.details = {"name": url, "size": None}
        super().__init__(fs=fs, path=url, mode=mode, cache_type="readahead", **kwargs)

        # Start the streaming GET immediately; bytes are pulled lazily
        # through the chunk iterator below.
        r = self.session.get(self.fs.encode_url(url), stream=True, **kwargs)
        self.fs._raise_not_found_for_status(r, url)
        self.it = r.iter_content(1024, False)
        # Bytes fetched from the wire but not yet handed to the caller.
        self.leftover = b""

        self.r = r

    def seek(self, *args, **kwargs):
        # A single-pass HTTP stream cannot rewind.
        raise ValueError("Cannot seek streaming HTTP file")

    def read(self, num=-1):
        """Read up to *num* bytes (all remaining if negative)."""
        # Accumulate chunks until we have num bytes or the stream is drained.
        bufs = [self.leftover]
        leng = len(self.leftover)
        while leng < num or num < 0:
            try:
                out = self.it.__next__()
            except StopIteration:
                break
            if out:
                bufs.append(out)
            else:
                break
            leng += len(out)
        out = b"".join(bufs)
        if num >= 0:
            # Keep any over-read bytes for the next call.
            self.leftover = out[num:]
            out = out[:num]
        else:
            self.leftover = b""
        self.loc += len(out)
        return out

    def close(self):
        # Release the underlying HTTP response and mark the file closed.
        self.r.close()
        self.closed = True
867
+
868
+
869
def get_range(session, url, start, end, **kwargs):
    """Fetch bytes [start, end) of *url* with an explicit Range request.

    Only used when we already know the server honours range requests;
    the response is not second-guessed, just raised on HTTP error.
    """
    opts = kwargs.copy()
    range_headers = opts.pop("headers", {}).copy()
    # HTTP ranges are inclusive at both ends, hence end - 1.
    range_headers["Range"] = f"bytes={start}-{end - 1}"
    resp = session.get(url, headers=range_headers, **opts)
    resp.raise_for_status()
    return resp.content
877
+
878
+
879
def _file_info(url, session, size_policy="head", **kwargs):
    """Call HEAD on the server to get details about the file (size/checksum etc.)

    Default operation is to explicitly allow redirects; unlike the async
    version, the Accept-Encoding header cannot be forced in JS.
    """
    logger.debug("Retrieve file size for %s", url)
    opts = kwargs.copy()
    redirects = opts.pop("allow_redirects", True)
    hdrs = opts.get("headers", {}).copy()
    # TODO: not allowed in JS
    # hdrs["Accept-Encoding"] = "identity"
    opts["headers"] = hdrs

    if size_policy == "head":
        r = session.head(url, allow_redirects=redirects, **opts)
    elif size_policy == "get":
        r = session.get(url, allow_redirects=redirects, **opts)
    else:
        raise TypeError(f'size_policy must be "head" or "get", got {size_policy}')
    r.raise_for_status()

    # TODO:
    # recognise lack of 'Accept-Ranges',
    # or 'Accept-Ranges': 'none' (not 'bytes')
    # to mean streaming only, no random access => return None
    info = {}
    headers = r.headers
    if "Content-Length" in headers:
        info["size"] = int(headers["Content-Length"])
    elif "Content-Range" in headers:
        info["size"] = int(headers["Content-Range"].split("/")[1])
    elif "content-length" in headers:
        info["size"] = int(headers["content-length"])
    elif "content-range" in headers:
        info["size"] = int(headers["content-range"].split("/")[1])

    for checksum_field in ("ETag", "Content-MD5", "Digest"):
        value = headers.get(checksum_field)
        if value:
            info[checksum_field] = value

    return info
920
+
921
+
922
# importing this is enough to register it
def register():
    """Register this blocking implementation for the HTTP protocols.

    Clobbers the default (async) registrations for "http"/"https" and adds
    the explicit "sync-http"/"sync-https" aliases.
    """
    register_implementation("http", HTTPFileSystem, clobber=True)
    register_implementation("https", HTTPFileSystem, clobber=True)
    register_implementation("sync-http", HTTPFileSystem, clobber=True)
    register_implementation("sync-https", HTTPFileSystem, clobber=True)


# registration happens as a side effect of importing this module
register()
931
+
932
+
933
def unregister():
    """Restore the default async HTTP filesystem for "http"/"https".

    NOTE(review): the "sync-http"/"sync-https" registrations are left in
    place — presumably intentional, since only this module provides them;
    confirm if full rollback is expected.
    """
    from fsspec.implementations.http import HTTPFileSystem

    register_implementation("http", HTTPFileSystem, clobber=True)
    register_implementation("https", HTTPFileSystem, clobber=True)
venv/lib/python3.10/site-packages/fsspec/implementations/libarchive.py ADDED
@@ -0,0 +1,213 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from contextlib import contextmanager
2
+ from ctypes import (
3
+ CFUNCTYPE,
4
+ POINTER,
5
+ c_int,
6
+ c_longlong,
7
+ c_void_p,
8
+ cast,
9
+ create_string_buffer,
10
+ )
11
+
12
+ import libarchive
13
+ import libarchive.ffi as ffi
14
+
15
+ from fsspec import open_files
16
+ from fsspec.archive import AbstractArchiveFileSystem
17
+ from fsspec.implementations.memory import MemoryFile
18
+ from fsspec.utils import DEFAULT_BLOCK_SIZE
19
+
20
# Libarchive requires seekable files or memory only for certain archive
# types. However, since we read the directory first to cache the contents
# and also allow random access to any file, the file-like object needs
# to be seekable no matter what.

# Seek call-backs (not provided in the libarchive python wrapper).
# Signature mirrors libarchive's archive_seek_callback:
# (archive*, client_data, offset, whence) -> resulting absolute offset.
SEEK_CALLBACK = CFUNCTYPE(c_longlong, c_int, c_void_p, c_longlong, c_int)
# Bind archive_read_set_seek_callback via the wrapper's ffi helper
# (name, argtypes, restype, error-checker).
read_set_seek_callback = ffi.ffi(
    "read_set_seek_callback", [ffi.c_archive_p, SEEK_CALLBACK], c_int, ffi.check_int
)
# Newer python-libarchive-c exposes NO_OPEN_CB/NO_CLOSE_CB sentinels instead
# of requiring explicit no-op callbacks; detect which API we are on.
new_api = hasattr(ffi, "NO_OPEN_CB")
31
+
32
+
33
@contextmanager
def custom_reader(file, format_name="all", filter_name="all", block_size=ffi.page_size):
    """Read an archive from a seekable file-like object.

    The `file` object must support the standard `readinto` and 'seek' methods.

    Yields a ``libarchive.read.ArchiveRead`` wired to read from ``file``
    through ctypes callbacks; the single shared buffer below is reused for
    every read, so blocks handed to libarchive are only valid until the
    next read call.
    """
    # One reusable buffer shared by all read callbacks (not thread-safe)
    buf = create_string_buffer(block_size)
    buf_p = cast(buf, c_void_p)

    def read_func(archive_p, context, ptrptr):
        # readinto the buffer, returns number of bytes read
        length = file.readinto(buf)
        # write the address of the buffer into the pointer
        ptrptr = cast(ptrptr, POINTER(c_void_p))
        ptrptr[0] = buf_p
        # tell libarchive how much data was written into the buffer
        return length

    def seek_func(archive_p, context, offset, whence):
        file.seek(offset, whence)
        # tell libarchive the current position
        return file.tell()

    # Keep references to the callback objects for the lifetime of the
    # context so ctypes does not garbage-collect them while in use.
    read_cb = ffi.READ_CALLBACK(read_func)
    seek_cb = SEEK_CALLBACK(seek_func)

    if new_api:
        open_cb = ffi.NO_OPEN_CB
        close_cb = ffi.NO_CLOSE_CB
    else:
        open_cb = libarchive.read.OPEN_CALLBACK(ffi.VOID_CB)
        close_cb = libarchive.read.CLOSE_CALLBACK(ffi.VOID_CB)

    with libarchive.read.new_archive_read(format_name, filter_name) as archive_p:
        read_set_seek_callback(archive_p, seek_cb)
        ffi.read_open(archive_p, None, open_cb, read_cb, close_cb)
        yield libarchive.read.ArchiveRead(archive_p)
70
+
71
+
72
class LibArchiveFileSystem(AbstractArchiveFileSystem):
    """Compressed archives as a file-system (read-only)

    Supports the following formats:
    tar, pax , cpio, ISO9660, zip, mtree, shar, ar, raw, xar, lha/lzh, rar
    Microsoft CAB, 7-Zip, WARC

    See the libarchive documentation for further restrictions.
    https://www.libarchive.org/

    Keeps file object open while instance lives. It only works in seekable
    file-like objects. In case the filesystem does not support this kind of
    file object, it is recommended to cache locally.

    This class is pickleable, but not necessarily thread-safe (depends on the
    platform). See libarchive documentation for details.
    """

    root_marker = ""
    protocol = "libarchive"
    # archives are stateful (open file handle), so never cache instances
    cachable = False

    def __init__(
        self,
        fo="",
        mode="r",
        target_protocol=None,
        target_options=None,
        block_size=DEFAULT_BLOCK_SIZE,
        **kwargs,
    ):
        """
        Parameters
        ----------
        fo: str or file-like
            Contains ZIP, and must exist. If a str, will fetch file using
            :meth:`~fsspec.open_files`, which must return one file exactly.
        mode: str
            Currently, only 'r' accepted
        target_protocol: str (optional)
            If ``fo`` is a string, this value can be used to override the
            FS protocol inferred from a URL
        target_options: dict (optional)
            Kwargs passed when instantiating the target FS, if ``fo`` is
            a string.
        """
        super().__init__(self, **kwargs)
        if mode != "r":
            raise ValueError("Only read from archive files accepted")
        if isinstance(fo, str):
            files = open_files(fo, protocol=target_protocol, **(target_options or {}))
            if len(files) != 1:
                raise ValueError(
                    f'Path "{fo}" did not resolve to exactly one file: "{files}"'
                )
            fo = files[0]
        # keep the OpenFile so the underlying handle stays alive / can be closed
        self.of = fo
        self.fo = fo.__enter__()  # the whole instance is a context
        self.block_size = block_size
        # lazy listing cache, populated by _get_dirs()
        self.dir_cache = None

    @contextmanager
    def _open_archive(self):
        # Rewind and build a fresh libarchive reader over the same handle;
        # libarchive readers are forward-only, so each operation re-opens.
        self.fo.seek(0)
        with custom_reader(self.fo, block_size=self.block_size) as arc:
            yield arc

    @classmethod
    def _strip_protocol(cls, path):
        # file paths are always relative to the archive root
        return super()._strip_protocol(path).lstrip("/")

    def _get_dirs(self):
        """Populate ``self.dir_cache`` with one entry per file/directory."""
        # mapping of listing keys -> libarchive entry attribute names
        fields = {
            "name": "pathname",
            "size": "size",
            "created": "ctime",
            "mode": "mode",
            "uid": "uid",
            "gid": "gid",
            "mtime": "mtime",
        }

        if self.dir_cache is not None:
            return

        self.dir_cache = {}
        list_names = []
        with self._open_archive() as arc:
            for entry in arc:
                if not entry.isdir and not entry.isfile:
                    # Skip symbolic links, fifo entries, etc.
                    continue
                # NOTE(review): ``set(entry.name)`` splits the path string
                # into individual characters, so this per-entry pass likely
                # adds no useful directory entries; the post-loop update over
                # ``list_names`` below appears to do the real work — confirm.
                self.dir_cache.update(
                    {
                        dirname: {"name": dirname, "size": 0, "type": "directory"}
                        for dirname in self._all_dirnames(set(entry.name))
                    }
                )
                f = {key: getattr(entry, fields[key]) for key in fields}
                f["type"] = "directory" if entry.isdir else "file"
                list_names.append(entry.name)

                self.dir_cache[f["name"]] = f
        # libarchive does not seem to return an entry for the directories (at least
        # not in all formats), so get the directories names from the files names
        self.dir_cache.update(
            {
                dirname: {"name": dirname, "size": 0, "type": "directory"}
                for dirname in self._all_dirnames(list_names)
            }
        )

    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        autocommit=True,
        cache_options=None,
        **kwargs,
    ):
        """Read one archive member fully into memory and return a MemoryFile.

        The whole archive is scanned sequentially until ``path`` is found;
        random access inside the archive is not possible.
        """
        path = self._strip_protocol(path)
        if mode != "rb":
            raise NotImplementedError

        data = b""
        with self._open_archive() as arc:
            for entry in arc:
                if entry.pathname != path:
                    continue

                if entry.size == 0:
                    # empty file, so there are no blocks
                    break

                # NOTE(review): only the first block is kept; the block size
                # requested equals entry.size, presumably yielding the whole
                # member in one block — confirm against libarchive behavior.
                for block in entry.get_blocks(entry.size):
                    data = block
                    break
                else:
                    raise ValueError
        return MemoryFile(fs=self, path=path, data=data)
venv/lib/python3.10/site-packages/fsspec/parquet.py ADDED
@@ -0,0 +1,572 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import io
2
+ import json
3
+ import warnings
4
+
5
+ import fsspec
6
+
7
+ from .core import url_to_fs
8
+ from .spec import AbstractBufferedFile
9
+ from .utils import merge_offset_ranges
10
+
11
+ # Parquet-Specific Utilities for fsspec
12
+ #
13
+ # Most of the functions defined in this module are NOT
14
+ # intended for public consumption. The only exception
15
+ # to this is `open_parquet_file`, which should be used
16
+ # place of `fs.open()` to open parquet-formatted files
17
+ # on remote file systems.
18
+
19
+
20
class AlreadyBufferedFile(AbstractBufferedFile):
    """A buffered file whose data blocks are supplied up-front.

    Used with the "parts" (``KnownPartsOfAFile``) cache: every byte range a
    reader may request is already in the cache, so fetching from a backend
    must never happen.
    """

    def _fetch_range(self, start, end):
        # any call here means a byte range was requested that was not
        # pre-loaded into the "parts" cache — that is a programming error
        raise NotImplementedError
23
+
24
+
25
def open_parquet_files(
    path: str | list[str],
    fs: None | fsspec.AbstractFileSystem = None,
    metadata=None,
    columns: None | list[str] = None,
    row_groups: None | list[int] = None,
    storage_options: None | dict = None,
    engine: str = "auto",
    max_gap: int = 64_000,
    max_block: int = 256_000_000,
    footer_sample_size: int = 1_000_000,
    filters: None | list[list[list[str]]] = None,
    **kwargs,
):
    """
    Return a list of file-like objects, one per matching Parquet file.

    The specified parquet `engine` will be used to parse the
    footer metadata, and determine the required byte ranges
    from the file. The target path will then be opened with
    the "parts" (`KnownPartsOfAFile`) caching strategy.

    Note that this method is intended for usage with remote
    file systems, and is unlikely to improve parquet-read
    performance on local file systems.

    Parameters
    ----------
    path: str or list of str
        Target file path(s). A single string may also be a glob pattern
        (containing ``*``) or a directory path ending in ``/``.
    metadata: Any, optional
        Parquet metadata object. Object type must be supported
        by the backend parquet engine. For now, only the "fastparquet"
        engine supports an explicit `ParquetFile` metadata object.
        If a metadata object is supplied, the remote footer metadata
        will not need to be transferred into local memory.
    fs: AbstractFileSystem, optional
        Filesystem object to use for opening the file. If nothing is
        specified, an `AbstractFileSystem` object will be inferred.
    engine : str, default "auto"
        Parquet engine to use for metadata parsing. Allowed options
        include "fastparquet", "pyarrow", and "auto". The specified
        engine must be installed in the current environment. If
        "auto" is specified, and both engines are installed,
        "fastparquet" will take precedence over "pyarrow".
    columns: list, optional
        List of all column names that may be read from the file.
    row_groups : list, optional
        List of all row-groups that may be read from the file. This
        may be a list of row-group indices (integers), or it may be
        a list of `RowGroup` metadata objects (if the "fastparquet"
        engine is used).
    storage_options : dict, optional
        Used to generate an `AbstractFileSystem` object if `fs` was
        not specified.
    max_gap : int, optional
        Neighboring byte ranges will only be merged when their
        inter-range gap is <= `max_gap`. Default is 64KB.
    max_block : int, optional
        Neighboring byte ranges will only be merged when the size of
        the aggregated range is <= `max_block`. Default is 256MB.
    footer_sample_size : int, optional
        Number of bytes to read from the end of the path to look
        for the footer metadata. If the sampled bytes do not contain
        the footer, a second read request will be required, and
        performance will suffer. Default is 1MB.
    filters : list[list], optional
        List of filters to apply to prevent reading row groups, of the
        same format as accepted by the loading engines. Ignored if
        ``row_groups`` is specified.
    **kwargs :
        Optional key-word arguments to pass to `fs.open`
    """

    # Make sure we have an `AbstractFileSystem` object
    # to work with; `path0` keeps the caller's original value so we can
    # tell a list input apart from a single (possibly glob/dir) path.
    if fs is None:
        path0 = path
        if isinstance(path, (list, tuple)):
            path = path[0]
        fs, path = url_to_fs(path, **(storage_options or {}))
    else:
        path0 = path

    # For now, `columns == []` not supported, is the same
    # as all columns
    if columns is not None and len(columns) == 0:
        columns = None

    # Set the engine
    engine = _set_engine(engine)

    # Expand the input to a concrete list of parquet file paths
    if isinstance(path0, (list, tuple)):
        paths = path0
    elif "*" in path:
        paths = fs.glob(path)
    elif path0.endswith("/"):  # or fs.isdir(path):
        paths = [
            _
            for _ in fs.find(path, withdirs=False, detail=False)
            if _.endswith((".parquet", ".parq"))
        ]
    else:
        paths = [path]

    # Mapping of {path: {(start, stop): bytes}} for every byte range needed
    data = _get_parquet_byte_ranges(
        paths,
        fs,
        metadata=metadata,
        columns=columns,
        row_groups=row_groups,
        engine=engine,
        max_gap=max_gap,
        max_block=max_block,
        footer_sample_size=footer_sample_size,
        filters=filters,
    )

    # Build one pre-loaded file per path, with "parts" caching.
    # NOTE(review): `size` is taken as the largest known range end, which
    # presumes the footer range reaches the true end of the file — confirm.
    options = kwargs.pop("cache_options", {}).copy()
    return [
        AlreadyBufferedFile(
            fs=None,
            path=fn,
            mode="rb",
            cache_type="parts",
            cache_options={
                **options,
                "data": ranges,
            },
            size=max(_[1] for _ in ranges),
            **kwargs,
        )
        for fn, ranges in data.items()
    ]
160
+
161
+
162
def open_parquet_file(*args, **kwargs):
    """Create a file tailored to reading specific parts of a parquet file.

    Please see ``open_parquet_files`` for details of the arguments. The
    difference is, this function always returns a single
    ``AlreadyBufferedFile``, whereas ``open_parquet_files`` always returns a
    list of files, even if there are one or zero matching parquet files.

    Raises ``IndexError`` if no parquet file matches the given path.
    """
    return open_parquet_files(*args, **kwargs)[0]
171
+
172
+
173
def _get_parquet_byte_ranges(
    paths,
    fs,
    metadata=None,
    columns=None,
    row_groups=None,
    max_gap=64_000,
    max_block=256_000_000,
    footer_sample_size=1_000_000,
    engine="auto",
    filters=None,
):
    """Get a dictionary of the known byte ranges needed
    to read a specific column/row-group selection from a
    Parquet dataset. Each value in the output dictionary
    is intended for use as the `data` argument for the
    `KnownPartsOfAFile` caching strategy of a single path.

    Parameters mirror ``open_parquet_files``; ``engine`` may be either an
    engine name or an already-initialized engine object.
    """

    # Set engine if necessary
    if isinstance(engine, str):
        engine = _set_engine(engine)

    # Pass to a specialized function if metadata is defined
    if metadata is not None:
        # Use the provided parquet metadata object
        # to avoid transferring/parsing footer metadata
        return _get_parquet_byte_ranges_from_metadata(
            metadata,
            fs,
            engine,
            columns=columns,
            row_groups=row_groups,
            max_gap=max_gap,
            max_block=max_block,
            filters=filters,
        )

    # Populate global paths, starts, & ends
    if columns is None and row_groups is None and filters is None:
        # We are NOT selecting specific columns or row-groups.
        #
        # We can avoid sampling the footers, and just transfer
        # all file data with cat
        result = {path: {(0, len(data)): data} for path, data in fs.cat(paths).items()}
    else:
        # We ARE selecting specific columns or row-groups.
        #
        # Get file sizes asynchronously
        file_sizes = fs.sizes(paths)
        data_paths = []
        data_starts = []
        data_ends = []
        # Gather file footers.
        # We just take the last `footer_sample_size` bytes of each
        # file (or the entire file if it is smaller than that)
        footer_starts = [
            max(0, file_size - footer_sample_size) for file_size in file_sizes
        ]
        footer_samples = fs.cat_ranges(paths, footer_starts, file_sizes)

        # Check our footer samples and re-sample if necessary.
        # The last 8 bytes of a parquet file are the little-endian
        # footer-metadata length followed by the b"PAR1" magic.
        large_footer = []
        for i, _path in enumerate(paths):
            footer_size = int.from_bytes(footer_samples[i][-8:-4], "little")
            real_footer_start = file_sizes[i] - (footer_size + 8)
            if real_footer_start < footer_starts[i]:
                # sample was too small: remember (original index, true start)
                large_footer.append((i, real_footer_start))
        if large_footer:
            warnings.warn(
                f"Not enough data was used to sample the parquet footer. "
                f"Try setting footer_sample_size >= "
                f"{max(file_sizes[i] - start for i, start in large_footer)}."
            )
            path0 = [paths[i] for i, _ in large_footer]
            starts = [start for _, start in large_footer]
            # Read up to where the existing sample began, so the two pieces
            # concatenate into one contiguous footer block
            ends = [footer_starts[i] for i, _ in large_footer]
            data = fs.cat_ranges(path0, starts, ends)
            # BUG FIX: update entries at their ORIGINAL index in
            # footer_samples/footer_starts (the previous code indexed with
            # the position inside the `large_footer` subset, which corrupted
            # the wrong paths whenever the first footer fit in the sample)
            for (orig_i, start), block in zip(large_footer, data):
                footer_samples[orig_i] = block + footer_samples[orig_i]
                footer_starts[orig_i] = start
        result = {
            path: {(start, size): data}
            for path, start, size, data in zip(
                paths, footer_starts, file_sizes, footer_samples
            )
        }

        # Calculate required byte ranges for each path
        for i, path in enumerate(paths):
            # Use "engine" to collect data byte ranges
            path_data_starts, path_data_ends = engine._parquet_byte_ranges(
                columns,
                row_groups=row_groups,
                footer=footer_samples[i],
                footer_start=footer_starts[i],
                filters=filters,
            )

            data_paths += [path] * len(path_data_starts)
            data_starts += path_data_starts
            data_ends += path_data_ends

        # Merge adjacent offset ranges
        data_paths, data_starts, data_ends = merge_offset_ranges(
            data_paths,
            data_starts,
            data_ends,
            max_gap=max_gap,
            max_block=max_block,
            sort=True,
        )

        # Transfer the data byte-ranges into local memory
        _transfer_ranges(fs, result, data_paths, data_starts, data_ends)

    # Add b"PAR1" to headers
    _add_header_magic(result)

    return result
292
+
293
+
294
def _get_parquet_byte_ranges_from_metadata(
    metadata,
    fs,
    engine,
    columns=None,
    row_groups=None,
    max_gap=64_000,
    max_block=256_000_000,
    filters=None,
):
    """Simplified version of `_get_parquet_byte_ranges` for
    the case that an engine-specific `metadata` object is
    provided, and the remote footer metadata does not need to
    be transferred before calculating the required byte ranges.
    """

    # The engine works out the raw byte ranges from the metadata object;
    # since metadata may span multiple files, paths are returned too.
    raw = engine._parquet_byte_ranges(
        columns, row_groups=row_groups, metadata=metadata, filters=filters
    )

    # Coalesce neighboring ranges (already sorted by the engine)
    paths, starts, ends = merge_offset_ranges(
        *raw,
        max_gap=max_gap,
        max_block=max_block,
        sort=False,  # Should be sorted
    )

    # Fetch the ranges into an empty per-path mapping
    result = {}
    for fn in paths:
        result.setdefault(fn, {})
    _transfer_ranges(fs, result, paths, starts, ends)

    # Every file also needs the leading b"PAR1" magic
    _add_header_magic(result)

    return result
333
+
334
+
335
+ def _transfer_ranges(fs, blocks, paths, starts, ends):
336
+ # Use cat_ranges to gather the data byte_ranges
337
+ ranges = (paths, starts, ends)
338
+ for path, start, stop, data in zip(*ranges, fs.cat_ranges(*ranges)):
339
+ blocks[path][(start, stop)] = data
340
+
341
+
342
+ def _add_header_magic(data):
343
+ # Add b"PAR1" to file headers
344
+ for path in list(data):
345
+ add_magic = True
346
+ for k in data[path]:
347
+ if k[0] == 0 and k[1] >= 4:
348
+ add_magic = False
349
+ break
350
+ if add_magic:
351
+ data[path][(0, 4)] = b"PAR1"
352
+
353
+
354
+ def _set_engine(engine_str):
355
+ # Define a list of parquet engines to try
356
+ if engine_str == "auto":
357
+ try_engines = ("fastparquet", "pyarrow")
358
+ elif not isinstance(engine_str, str):
359
+ raise ValueError(
360
+ "Failed to set parquet engine! "
361
+ "Please pass 'fastparquet', 'pyarrow', or 'auto'"
362
+ )
363
+ elif engine_str not in ("fastparquet", "pyarrow"):
364
+ raise ValueError(f"{engine_str} engine not supported by `fsspec.parquet`")
365
+ else:
366
+ try_engines = [engine_str]
367
+
368
+ # Try importing the engines in `try_engines`,
369
+ # and choose the first one that succeeds
370
+ for engine in try_engines:
371
+ try:
372
+ if engine == "fastparquet":
373
+ return FastparquetEngine()
374
+ elif engine == "pyarrow":
375
+ return PyarrowEngine()
376
+ except ImportError:
377
+ pass
378
+
379
+ # Raise an error if a supported parquet engine
380
+ # was not found
381
+ raise ImportError(
382
+ f"The following parquet engines are not installed "
383
+ f"in your python environment: {try_engines}."
384
+ f"Please install 'fastparquert' or 'pyarrow' to "
385
+ f"utilize the `fsspec.parquet` module."
386
+ )
387
+
388
+
389
class FastparquetEngine:
    # The purpose of the FastparquetEngine class is
    # to check if fastparquet can be imported (on initialization)
    # and to define a `_parquet_byte_ranges` method. In the
    # future, this class may also be used to define other
    # methods/logic that are specific to fastparquet.

    def __init__(self):
        # raises ImportError when fastparquet is unavailable,
        # which _set_engine catches to fall through to the next engine
        import fastparquet as fp

        self.fp = fp

    def _parquet_byte_ranges(
        self,
        columns,
        row_groups=None,
        metadata=None,
        footer=None,
        footer_start=None,
        filters=None,
    ):
        """Compute the byte ranges required for ``columns``/``row_groups``.

        Returns ``(starts, ends)`` when parsing a raw ``footer``, or
        ``(paths, starts, ends)`` when an explicit ``metadata``
        (ParquetFile) object is given, since its row-groups may live in
        multiple files.
        """
        # Initialize offset ranges and define ParquetFile metadata
        pf = metadata
        data_paths, data_starts, data_ends = [], [], []
        if filters and row_groups:
            raise ValueError("filters and row_groups cannot be used together")
        if pf is None:
            pf = self.fp.ParquetFile(io.BytesIO(footer))

        # Convert columns to a set and add any index columns
        # specified in the pandas metadata (just in case)
        column_set = None if columns is None else {c.split(".", 1)[0] for c in columns}
        if column_set is not None and hasattr(pf, "pandas_metadata"):
            md_index = [
                ind
                for ind in pf.pandas_metadata.get("index_columns", [])
                # Ignore RangeIndex information
                if not isinstance(ind, dict)
            ]
            column_set |= set(md_index)

        # Check if row_groups is a list of integers
        # or a list of row-group metadata
        if filters:
            from fastparquet.api import filter_row_groups

            row_group_indices = None
            row_groups = filter_row_groups(pf, filters)
        elif row_groups and not isinstance(row_groups[0], int):
            # Input row_groups contains row-group metadata
            row_group_indices = None
        else:
            # Input row_groups contains row-group indices
            row_group_indices = row_groups
            row_groups = pf.row_groups
        # normalize every requested column to its path components
        # (e.g. "a.b" -> ["a", "b"]) for prefix matching in _cmp
        if column_set is not None:
            column_set = [
                _ if isinstance(_, list) else _.split(".") for _ in column_set
            ]

        # Loop through column chunks to add required byte ranges
        for r, row_group in enumerate(row_groups):
            # Skip this row-group if we are targeting
            # specific row-groups
            if row_group_indices is None or r in row_group_indices:
                # Find the target parquet-file path for `row_group`
                fn = pf.row_group_filename(row_group)

                for column in row_group.columns:
                    name = column.meta_data.path_in_schema
                    # Skip this column if we are targeting specific columns
                    if column_set is None or _cmp(name, column_set):
                        # dictionary page (when present) precedes data pages
                        file_offset0 = column.meta_data.dictionary_page_offset
                        if file_offset0 is None:
                            file_offset0 = column.meta_data.data_page_offset
                        num_bytes = column.meta_data.total_compressed_size
                        # ranges at/after footer_start are already in memory
                        if footer_start is None or file_offset0 < footer_start:
                            data_paths.append(fn)
                            data_starts.append(file_offset0)
                            data_ends.append(
                                min(
                                    file_offset0 + num_bytes,
                                    footer_start or (file_offset0 + num_bytes),
                                )
                            )

        if metadata:
            # The metadata in this call may map to multiple
            # file paths. Need to include `data_paths`
            return data_paths, data_starts, data_ends
        return data_starts, data_ends
480
+
481
+
482
class PyarrowEngine:
    # The purpose of the PyarrowEngine class is
    # to check if pyarrow can be imported (on initialization)
    # and to define a `_parquet_byte_ranges` method. In the
    # future, this class may also be used to define other
    # methods/logic that are specific to pyarrow.

    def __init__(self):
        # raises ImportError when pyarrow is unavailable,
        # which _set_engine catches to fall through to the next engine
        import pyarrow.parquet as pq

        self.pq = pq

    def _parquet_byte_ranges(
        self,
        columns,
        row_groups=None,
        metadata=None,
        footer=None,
        footer_start=None,
        filters=None,
    ):
        """Compute ``(starts, ends)`` byte ranges from a raw ``footer``.

        Unlike the fastparquet engine, an explicit ``metadata`` object and
        ``filters`` are not supported. ``footer_start`` is required here
        (it is compared against each column-chunk offset below).
        """
        if metadata is not None:
            raise ValueError("metadata input not supported for PyarrowEngine")
        if filters:
            # there must be a way!
            raise NotImplementedError

        data_starts, data_ends = [], []
        md = self.pq.ParquetFile(io.BytesIO(footer)).metadata

        # Convert columns to a set and add any index columns
        # specified in the pandas metadata (just in case)
        column_set = None if columns is None else set(columns)
        if column_set is not None:
            schema = md.schema.to_arrow_schema()
            has_pandas_metadata = (
                schema.metadata is not None and b"pandas" in schema.metadata
            )
            if has_pandas_metadata:
                md_index = [
                    ind
                    for ind in json.loads(
                        schema.metadata[b"pandas"].decode("utf8")
                    ).get("index_columns", [])
                    # Ignore RangeIndex information
                    if not isinstance(ind, dict)
                ]
                column_set |= set(md_index)
        # keep only the top-level field name of each requested column,
        # as a single-element list for prefix matching in _cmp
        if column_set is not None:
            column_set = [
                _[:1] if isinstance(_, list) else _.split(".")[:1] for _ in column_set
            ]

        # Loop through column chunks to add required byte ranges
        for r in range(md.num_row_groups):
            # Skip this row-group if we are targeting
            # specific row-groups
            if row_groups is None or r in row_groups:
                row_group = md.row_group(r)
                for c in range(row_group.num_columns):
                    column = row_group.column(c)
                    name = column.path_in_schema.split(".")
                    # Skip this column if we are targeting specific columns
                    if column_set is None or _cmp(name, column_set):
                        meta = column.to_dict()
                        # Any offset could be the first one
                        file_offset0 = min(
                            _
                            for _ in [
                                meta.get("dictionary_page_offset"),
                                meta.get("data_page_offset"),
                                meta.get("index_page_offset"),
                            ]
                            if _ is not None
                        )
                        # ranges at/after footer_start are already in memory
                        if file_offset0 < footer_start:
                            data_starts.append(file_offset0)
                            data_ends.append(
                                min(
                                    meta["total_compressed_size"] + file_offset0,
                                    footer_start,
                                )
                            )

        # always include the footer block itself
        data_starts.append(footer_start)
        data_ends.append(footer_start + len(footer))
        return data_starts, data_ends
569
+
570
+
571
+ def _cmp(name, column_set):
572
+ return any(all(a == b for a, b in zip(name, _)) for _ in column_set)
venv/lib/python3.10/site-packages/fsspec/registry.py ADDED
@@ -0,0 +1,333 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import importlib
4
+ import types
5
+ import warnings
6
+
7
__all__ = ["registry", "get_filesystem_class", "default"]

# internal, mutable mapping of protocol name -> filesystem class;
# modified only through register_implementation()
_registry: dict[str, type] = {}

# external, immutable read-only view of _registry for public consumption
registry = types.MappingProxyType(_registry)
# protocol assumed when a URL carries no explicit protocol
default = "file"
15
+
16
+
17
def register_implementation(name, cls, clobber=False, errtxt=None):
    """Add implementation class to the registry

    Parameters
    ----------
    name: str
        Protocol name to associate with the class
    cls: class or str
        if a class: fsspec-compliant implementation class (normally inherits from
        ``fsspec.AbstractFileSystem``, gets added straight to the registry. If a
        str, the full path to an implementation class like package.module.class,
        which gets added to known_implementations,
        so the import is deferred until the filesystem is actually used.
    clobber: bool (optional)
        Whether to overwrite a protocol with the same name; if False, will raise
        instead.
    errtxt: str (optional)
        If given, then a failure to import the given class will result in this
        text being given.
    """
    if isinstance(cls, str):
        # deferred-import registration: record the dotted path only
        if name in known_implementations and clobber is False:
            # identical re-registration is a silent no-op
            if cls != known_implementations[name]["class"]:
                raise ValueError(
                    f"Name ({name}) already in the known_implementations and clobber "
                    f"is False"
                )
            return
        known_implementations[name] = {
            "class": cls,
            "err": errtxt or f"{cls} import failed for protocol {name}",
        }
        return

    # direct registration of an implementation class
    if name in registry and clobber is False:
        # identical re-registration is a silent no-op
        if _registry[name] is not cls:
            raise ValueError(
                f"Name ({name}) already in the registry and clobber is False"
            )
        return
    _registry[name] = cls
58
+
59
+
60
+ # protocols mapped to the class which implements them. This dict can be
61
+ # updated with register_implementation
62
+ known_implementations = {
63
+ "abfs": {
64
+ "class": "adlfs.AzureBlobFileSystem",
65
+ "err": "Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage",
66
+ },
67
+ "adl": {
68
+ "class": "adlfs.AzureDatalakeFileSystem",
69
+ "err": "Install adlfs to access Azure Datalake Gen1",
70
+ },
71
+ "arrow_hdfs": {
72
+ "class": "fsspec.implementations.arrow.HadoopFileSystem",
73
+ "err": "pyarrow and local java libraries required for HDFS",
74
+ },
75
+ "async_wrapper": {
76
+ "class": "fsspec.implementations.asyn_wrapper.AsyncFileSystemWrapper",
77
+ },
78
+ "asynclocal": {
79
+ "class": "morefs.asyn_local.AsyncLocalFileSystem",
80
+ "err": "Install 'morefs[asynclocalfs]' to use AsyncLocalFileSystem",
81
+ },
82
+ "asyncwrapper": {
83
+ "class": "fsspec.implementations.asyn_wrapper.AsyncFileSystemWrapper",
84
+ },
85
+ "az": {
86
+ "class": "adlfs.AzureBlobFileSystem",
87
+ "err": "Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage",
88
+ },
89
+ "blockcache": {"class": "fsspec.implementations.cached.CachingFileSystem"},
90
+ "box": {
91
+ "class": "boxfs.BoxFileSystem",
92
+ "err": "Please install boxfs to access BoxFileSystem",
93
+ },
94
+ "cached": {"class": "fsspec.implementations.cached.CachingFileSystem"},
95
+ "dask": {
96
+ "class": "fsspec.implementations.dask.DaskWorkerFileSystem",
97
+ "err": "Install dask distributed to access worker file system",
98
+ },
99
+ "data": {"class": "fsspec.implementations.data.DataFileSystem"},
100
+ "dbfs": {
101
+ "class": "fsspec.implementations.dbfs.DatabricksFileSystem",
102
+ "err": "Install the requests package to use the DatabricksFileSystem",
103
+ },
104
+ "dir": {"class": "fsspec.implementations.dirfs.DirFileSystem"},
105
+ "dropbox": {
106
+ "class": "dropboxdrivefs.DropboxDriveFileSystem",
107
+ "err": (
108
+ 'DropboxFileSystem requires "dropboxdrivefs","requests" and "'
109
+ '"dropbox" to be installed'
110
+ ),
111
+ },
112
+ "dvc": {
113
+ "class": "dvc.api.DVCFileSystem",
114
+ "err": "Install dvc to access DVCFileSystem",
115
+ },
116
+ "file": {"class": "fsspec.implementations.local.LocalFileSystem"},
117
+ "filecache": {"class": "fsspec.implementations.cached.WholeFileCacheFileSystem"},
118
+ "ftp": {"class": "fsspec.implementations.ftp.FTPFileSystem"},
119
+ "gcs": {
120
+ "class": "gcsfs.GCSFileSystem",
121
+ "err": "Please install gcsfs to access Google Storage",
122
+ },
123
+ "gdrive": {
124
+ "class": "gdrive_fsspec.GoogleDriveFileSystem",
125
+ "err": "Please install gdrive_fs for access to Google Drive",
126
+ },
127
+ "generic": {"class": "fsspec.generic.GenericFileSystem"},
128
+ "gist": {
129
+ "class": "fsspec.implementations.gist.GistFileSystem",
130
+ "err": "Install the requests package to use the gist FS",
131
+ },
132
+ "git": {
133
+ "class": "fsspec.implementations.git.GitFileSystem",
134
+ "err": "Install pygit2 to browse local git repos",
135
+ },
136
+ "github": {
137
+ "class": "fsspec.implementations.github.GithubFileSystem",
138
+ "err": "Install the requests package to use the github FS",
139
+ },
140
+ "gs": {
141
+ "class": "gcsfs.GCSFileSystem",
142
+ "err": "Please install gcsfs to access Google Storage",
143
+ },
144
+ "hdfs": {
145
+ "class": "fsspec.implementations.arrow.HadoopFileSystem",
146
+ "err": "pyarrow and local java libraries required for HDFS",
147
+ },
148
+ "hf": {
149
+ "class": "huggingface_hub.HfFileSystem",
150
+ "err": "Install huggingface_hub to access HfFileSystem",
151
+ },
152
+ "http": {
153
+ "class": "fsspec.implementations.http.HTTPFileSystem",
154
+ "err": 'HTTPFileSystem requires "requests" and "aiohttp" to be installed',
155
+ },
156
+ "https": {
157
+ "class": "fsspec.implementations.http.HTTPFileSystem",
158
+ "err": 'HTTPFileSystem requires "requests" and "aiohttp" to be installed',
159
+ },
160
+ "jlab": {
161
+ "class": "fsspec.implementations.jupyter.JupyterFileSystem",
162
+ "err": "Jupyter FS requires requests to be installed",
163
+ },
164
+ "jupyter": {
165
+ "class": "fsspec.implementations.jupyter.JupyterFileSystem",
166
+ "err": "Jupyter FS requires requests to be installed",
167
+ },
168
+ "lakefs": {
169
+ "class": "lakefs_spec.LakeFSFileSystem",
170
+ "err": "Please install lakefs-spec to access LakeFSFileSystem",
171
+ },
172
+ "libarchive": {
173
+ "class": "fsspec.implementations.libarchive.LibArchiveFileSystem",
174
+ "err": "LibArchive requires to be installed",
175
+ },
176
+ "local": {"class": "fsspec.implementations.local.LocalFileSystem"},
177
+ "memory": {"class": "fsspec.implementations.memory.MemoryFileSystem"},
178
+ "oci": {
179
+ "class": "ocifs.OCIFileSystem",
180
+ "err": "Install ocifs to access OCI Object Storage",
181
+ },
182
+ "ocilake": {
183
+ "class": "ocifs.OCIFileSystem",
184
+ "err": "Install ocifs to access OCI Data Lake",
185
+ },
186
+ "oss": {
187
+ "class": "ossfs.OSSFileSystem",
188
+ "err": "Install ossfs to access Alibaba Object Storage System",
189
+ },
190
+ "pyscript": {
191
+ "class": "pyscript_fsspec_client.client.PyscriptFileSystem",
192
+ "err": "This only runs in a pyscript context",
193
+ },
194
+ "reference": {"class": "fsspec.implementations.reference.ReferenceFileSystem"},
195
+ "root": {
196
+ "class": "fsspec_xrootd.XRootDFileSystem",
197
+ "err": (
198
+ "Install fsspec-xrootd to access xrootd storage system. "
199
+ "Note: 'root' is the protocol name for xrootd storage systems, "
200
+ "not referring to root directories"
201
+ ),
202
+ },
203
+ "s3": {"class": "s3fs.S3FileSystem", "err": "Install s3fs to access S3"},
204
+ "s3a": {"class": "s3fs.S3FileSystem", "err": "Install s3fs to access S3"},
205
+ "sftp": {
206
+ "class": "fsspec.implementations.sftp.SFTPFileSystem",
207
+ "err": 'SFTPFileSystem requires "paramiko" to be installed',
208
+ },
209
+ "simplecache": {"class": "fsspec.implementations.cached.SimpleCacheFileSystem"},
210
+ "smb": {
211
+ "class": "fsspec.implementations.smb.SMBFileSystem",
212
+ "err": 'SMB requires "smbprotocol" or "smbprotocol[kerberos]" installed',
213
+ },
214
+ "ssh": {
215
+ "class": "fsspec.implementations.sftp.SFTPFileSystem",
216
+ "err": 'SFTPFileSystem requires "paramiko" to be installed',
217
+ },
218
+ "tar": {"class": "fsspec.implementations.tar.TarFileSystem"},
219
+ "tos": {
220
+ "class": "tosfs.TosFileSystem",
221
+ "err": "Install tosfs to access ByteDance volcano engine Tinder Object Storage",
222
+ },
223
+ "tosfs": {
224
+ "class": "tosfs.TosFileSystem",
225
+ "err": "Install tosfs to access ByteDance volcano engine Tinder Object Storage",
226
+ },
227
+ "wandb": {"class": "wandbfs.WandbFS", "err": "Install wandbfs to access wandb"},
228
+ "webdav": {
229
+ "class": "webdav4.fsspec.WebdavFileSystem",
230
+ "err": "Install webdav4 to access WebDAV",
231
+ },
232
+ "webhdfs": {
233
+ "class": "fsspec.implementations.webhdfs.WebHDFS",
234
+ "err": 'webHDFS access requires "requests" to be installed',
235
+ },
236
+ "zip": {"class": "fsspec.implementations.zip.ZipFileSystem"},
237
+ }
238
+
239
+ assert list(known_implementations) == sorted(known_implementations), (
240
+ "Not in alphabetical order"
241
+ )
242
+
243
+
244
+ def get_filesystem_class(protocol):
245
+ """Fetch named protocol implementation from the registry
246
+
247
+ The dict ``known_implementations`` maps protocol names to the locations
248
+ of classes implementing the corresponding file-system. When used for the
249
+ first time, appropriate imports will happen and the class will be placed in
250
+ the registry. All subsequent calls will fetch directly from the registry.
251
+
252
+ Some protocol implementations require additional dependencies, and so the
253
+ import may fail. In this case, the string in the "err" field of the
254
+ ``known_implementations`` will be given as the error message.
255
+ """
256
+ if not protocol:
257
+ protocol = default
258
+
259
+ if protocol not in registry:
260
+ if protocol not in known_implementations:
261
+ raise ValueError(f"Protocol not known: {protocol}")
262
+ bit = known_implementations[protocol]
263
+ try:
264
+ register_implementation(protocol, _import_class(bit["class"]))
265
+ except ImportError as e:
266
+ raise ImportError(bit.get("err")) from e
267
+ cls = registry[protocol]
268
+ if getattr(cls, "protocol", None) in ("abstract", None):
269
+ cls.protocol = protocol
270
+
271
+ return cls
272
+
273
+
274
+ s3_msg = """Your installed version of s3fs is very old and known to cause
275
+ severe performance issues, see also https://github.com/dask/dask/issues/10276
276
+
277
+ To fix, you should specify a lower version bound on s3fs, or
278
+ update the current installation.
279
+ """
280
+
281
+
282
+ def _import_class(fqp: str):
283
+ """Take a fully-qualified path and return the imported class or identifier.
284
+
285
+ ``fqp`` is of the form "package.module.klass" or
286
+ "package.module:subobject.klass".
287
+
288
+ Warnings
289
+ --------
290
+ This can import arbitrary modules. Make sure you haven't installed any modules
291
+ that may execute malicious code at import time.
292
+ """
293
+ if ":" in fqp:
294
+ mod, name = fqp.rsplit(":", 1)
295
+ else:
296
+ mod, name = fqp.rsplit(".", 1)
297
+
298
+ is_s3 = mod == "s3fs"
299
+ mod = importlib.import_module(mod)
300
+ if is_s3 and mod.__version__.split(".") < ["0", "5"]:
301
+ warnings.warn(s3_msg)
302
+ for part in name.split("."):
303
+ mod = getattr(mod, part)
304
+
305
+ if not isinstance(mod, type):
306
+ raise TypeError(f"{fqp} is not a class")
307
+
308
+ return mod
309
+
310
+
311
+ def filesystem(protocol, **storage_options):
312
+ """Instantiate filesystems for given protocol and arguments
313
+
314
+ ``storage_options`` are specific to the protocol being chosen, and are
315
+ passed directly to the class.
316
+ """
317
+ if protocol == "arrow_hdfs":
318
+ warnings.warn(
319
+ "The 'arrow_hdfs' protocol has been deprecated and will be "
320
+ "removed in the future. Specify it as 'hdfs'.",
321
+ DeprecationWarning,
322
+ )
323
+
324
+ cls = get_filesystem_class(protocol)
325
+ return cls(**storage_options)
326
+
327
+
328
+ def available_protocols():
329
+ """Return a list of the implemented protocols.
330
+
331
+ Note that any given protocol may require extra packages to be importable.
332
+ """
333
+ return list(known_implementations)
venv/lib/python3.10/site-packages/fsspec/spec.py ADDED
@@ -0,0 +1,2281 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import io
4
+ import json
5
+ import logging
6
+ import os
7
+ import threading
8
+ import warnings
9
+ import weakref
10
+ from errno import ESPIPE
11
+ from glob import has_magic
12
+ from hashlib import sha256
13
+ from typing import Any, ClassVar
14
+
15
+ from .callbacks import DEFAULT_CALLBACK
16
+ from .config import apply_config, conf
17
+ from .dircache import DirCache
18
+ from .transaction import Transaction
19
+ from .utils import (
20
+ _unstrip_protocol,
21
+ glob_translate,
22
+ isfilelike,
23
+ other_paths,
24
+ read_block,
25
+ stringify_path,
26
+ tokenize,
27
+ )
28
+
29
+ logger = logging.getLogger("fsspec")
30
+
31
+
32
+ def make_instance(cls, args, kwargs):
33
+ return cls(*args, **kwargs)
34
+
35
+
36
+ class _Cached(type):
37
+ """
38
+ Metaclass for caching file system instances.
39
+
40
+ Notes
41
+ -----
42
+ Instances are cached according to
43
+
44
+ * The values of the class attributes listed in `_extra_tokenize_attributes`
45
+ * The arguments passed to ``__init__``.
46
+
47
+ This creates an additional reference to the filesystem, which prevents the
48
+ filesystem from being garbage collected when all *user* references go away.
49
+ A call to the :meth:`AbstractFileSystem.clear_instance_cache` must *also*
50
+ be made for a filesystem instance to be garbage collected.
51
+ """
52
+
53
+ def __init__(cls, *args, **kwargs):
54
+ super().__init__(*args, **kwargs)
55
+ # Note: we intentionally create a reference here, to avoid garbage
56
+ # collecting instances when all other references are gone. To really
57
+ # delete a FileSystem, the cache must be cleared.
58
+ if conf.get("weakref_instance_cache"): # pragma: no cover
59
+ # debug option for analysing fork/spawn conditions
60
+ cls._cache = weakref.WeakValueDictionary()
61
+ else:
62
+ cls._cache = {}
63
+ cls._pid = os.getpid()
64
+
65
+ def __call__(cls, *args, **kwargs):
66
+ kwargs = apply_config(cls, kwargs)
67
+ extra_tokens = tuple(
68
+ getattr(cls, attr, None) for attr in cls._extra_tokenize_attributes
69
+ )
70
+ strip_tokenize_options = {
71
+ k: kwargs.pop(k) for k in cls._strip_tokenize_options if k in kwargs
72
+ }
73
+ token = tokenize(
74
+ cls, cls._pid, threading.get_ident(), *args, *extra_tokens, **kwargs
75
+ )
76
+ skip = kwargs.pop("skip_instance_cache", False)
77
+ if os.getpid() != cls._pid:
78
+ cls._cache.clear()
79
+ cls._pid = os.getpid()
80
+ if not skip and cls.cachable and token in cls._cache:
81
+ cls._latest = token
82
+ return cls._cache[token]
83
+ else:
84
+ obj = super().__call__(*args, **kwargs, **strip_tokenize_options)
85
+ # Setting _fs_token here causes some static linters to complain.
86
+ obj._fs_token_ = token
87
+ obj.storage_args = args
88
+ obj.storage_options = kwargs
89
+ if obj.async_impl and obj.mirror_sync_methods:
90
+ from .asyn import mirror_sync_methods
91
+
92
+ mirror_sync_methods(obj)
93
+
94
+ if cls.cachable and not skip:
95
+ cls._latest = token
96
+ cls._cache[token] = obj
97
+ return obj
98
+
99
+
100
+ class AbstractFileSystem(metaclass=_Cached):
101
+ """
102
+ An abstract super-class for pythonic file-systems
103
+
104
+ Implementations are expected to be compatible with or, better, subclass
105
+ from here.
106
+ """
107
+
108
+ cachable = True # this class can be cached, instances reused
109
+ _cached = False
110
+ blocksize = 2**22
111
+ sep = "/"
112
+ protocol: ClassVar[str | tuple[str, ...]] = "abstract"
113
+ _latest = None
114
+ async_impl = False
115
+ mirror_sync_methods = False
116
+ root_marker = "" # For some FSs, may require leading '/' or other character
117
+ transaction_type = Transaction
118
+
119
+ #: Extra *class attributes* that should be considered when hashing.
120
+ _extra_tokenize_attributes = ()
121
+ #: *storage options* that should not be considered when hashing.
122
+ _strip_tokenize_options = ()
123
+
124
+ # Set by _Cached metaclass
125
+ storage_args: tuple[Any, ...]
126
+ storage_options: dict[str, Any]
127
+
128
+ def __init__(self, *args, **storage_options):
129
+ """Create and configure file-system instance
130
+
131
+ Instances may be cachable, so if similar enough arguments are seen
132
+ a new instance is not required. The token attribute exists to allow
133
+ implementations to cache instances if they wish.
134
+
135
+ A reasonable default should be provided if there are no arguments.
136
+
137
+ Subclasses should call this method.
138
+
139
+ Parameters
140
+ ----------
141
+ use_listings_cache, listings_expiry_time, max_paths:
142
+ passed to ``DirCache``, if the implementation supports
143
+ directory listing caching. Pass use_listings_cache=False
144
+ to disable such caching.
145
+ skip_instance_cache: bool
146
+ If this is a cachable implementation, pass True here to force
147
+ creating a new instance even if a matching instance exists, and prevent
148
+ storing this instance.
149
+ asynchronous: bool
150
+ loop: asyncio-compatible IOLoop or None
151
+ """
152
+ if self._cached:
153
+ # reusing instance, don't change
154
+ return
155
+ self._cached = True
156
+ self._intrans = False
157
+ self._transaction = None
158
+ self._invalidated_caches_in_transaction = []
159
+ self.dircache = DirCache(**storage_options)
160
+
161
+ if storage_options.pop("add_docs", None):
162
+ warnings.warn("add_docs is no longer supported.", FutureWarning)
163
+
164
+ if storage_options.pop("add_aliases", None):
165
+ warnings.warn("add_aliases has been removed.", FutureWarning)
166
+ # This is set in _Cached
167
+ self._fs_token_ = None
168
+
169
+ @property
170
+ def fsid(self):
171
+ """Persistent filesystem id that can be used to compare filesystems
172
+ across sessions.
173
+ """
174
+ raise NotImplementedError
175
+
176
+ @property
177
+ def _fs_token(self):
178
+ return self._fs_token_
179
+
180
+ def __dask_tokenize__(self):
181
+ return self._fs_token
182
+
183
+ def __hash__(self):
184
+ return int(self._fs_token, 16)
185
+
186
+ def __eq__(self, other):
187
+ return isinstance(other, type(self)) and self._fs_token == other._fs_token
188
+
189
+ def __reduce__(self):
190
+ return make_instance, (type(self), self.storage_args, self.storage_options)
191
+
192
+ @classmethod
193
+ def _strip_protocol(cls, path):
194
+ """Turn path from fully-qualified to file-system-specific
195
+
196
+ May require FS-specific handling, e.g., for relative paths or links.
197
+ """
198
+ if isinstance(path, list):
199
+ return [cls._strip_protocol(p) for p in path]
200
+ path = stringify_path(path)
201
+ protos = (cls.protocol,) if isinstance(cls.protocol, str) else cls.protocol
202
+ for protocol in protos:
203
+ if path.startswith(protocol + "://"):
204
+ path = path[len(protocol) + 3 :]
205
+ elif path.startswith(protocol + "::"):
206
+ path = path[len(protocol) + 2 :]
207
+ path = path.rstrip("/")
208
+ # use of root_marker to make minimum required path, e.g., "/"
209
+ return path or cls.root_marker
210
+
211
+ def unstrip_protocol(self, name: str) -> str:
212
+ """Format FS-specific path to generic, including protocol"""
213
+ protos = (self.protocol,) if isinstance(self.protocol, str) else self.protocol
214
+ for protocol in protos:
215
+ if name.startswith(f"{protocol}://"):
216
+ return name
217
+ return f"{protos[0]}://{name}"
218
+
219
+ @staticmethod
220
+ def _get_kwargs_from_urls(path):
221
+ """If kwargs can be encoded in the paths, extract them here
222
+
223
+ This should happen before instantiation of the class; incoming paths
224
+ then should be amended to strip the options in methods.
225
+
226
+ Examples may look like an sftp path "sftp://user@host:/my/path", where
227
+ the user and host should become kwargs and later get stripped.
228
+ """
229
+ # by default, nothing happens
230
+ return {}
231
+
232
+ @classmethod
233
+ def current(cls):
234
+ """Return the most recently instantiated FileSystem
235
+
236
+ If no instance has been created, then create one with defaults
237
+ """
238
+ if cls._latest in cls._cache:
239
+ return cls._cache[cls._latest]
240
+ return cls()
241
+
242
+ @property
243
+ def transaction(self):
244
+ """A context within which files are committed together upon exit
245
+
246
+ Requires the file class to implement `.commit()` and `.discard()`
247
+ for the normal and exception cases.
248
+ """
249
+ if self._transaction is None:
250
+ self._transaction = self.transaction_type(self)
251
+ return self._transaction
252
+
253
+ def start_transaction(self):
254
+ """Begin write transaction for deferring files, non-context version"""
255
+ self._intrans = True
256
+ self._transaction = self.transaction_type(self)
257
+ return self.transaction
258
+
259
+ def end_transaction(self):
260
+ """Finish write transaction, non-context version"""
261
+ self.transaction.complete()
262
+ self._transaction = None
263
+ # The invalid cache must be cleared after the transaction is completed.
264
+ for path in self._invalidated_caches_in_transaction:
265
+ self.invalidate_cache(path)
266
+ self._invalidated_caches_in_transaction.clear()
267
+
268
+ def invalidate_cache(self, path=None):
269
+ """
270
+ Discard any cached directory information
271
+
272
+ Parameters
273
+ ----------
274
+ path: string or None
275
+ If None, clear all listings cached else listings at or under given
276
+ path.
277
+ """
278
+ # Not necessary to implement invalidation mechanism, may have no cache.
279
+ # But if have, you should call this method of parent class from your
280
+ # subclass to ensure expiring caches after transacations correctly.
281
+ # See the implementation of FTPFileSystem in ftp.py
282
+ if self._intrans:
283
+ self._invalidated_caches_in_transaction.append(path)
284
+
285
+ def mkdir(self, path, create_parents=True, **kwargs):
286
+ """
287
+ Create directory entry at path
288
+
289
+ For systems that don't have true directories, may create an for
290
+ this instance only and not touch the real filesystem
291
+
292
+ Parameters
293
+ ----------
294
+ path: str
295
+ location
296
+ create_parents: bool
297
+ if True, this is equivalent to ``makedirs``
298
+ kwargs:
299
+ may be permissions, etc.
300
+ """
301
+ pass # not necessary to implement, may not have directories
302
+
303
+ def makedirs(self, path, exist_ok=False):
304
+ """Recursively make directories
305
+
306
+ Creates directory at path and any intervening required directories.
307
+ Raises exception if, for instance, the path already exists but is a
308
+ file.
309
+
310
+ Parameters
311
+ ----------
312
+ path: str
313
+ leaf directory name
314
+ exist_ok: bool (False)
315
+ If False, will error if the target already exists
316
+ """
317
+ pass # not necessary to implement, may not have directories
318
+
319
+ def rmdir(self, path):
320
+ """Remove a directory, if empty"""
321
+ pass # not necessary to implement, may not have directories
322
+
323
+ def ls(self, path, detail=True, **kwargs):
324
+ """List objects at path.
325
+
326
+ This should include subdirectories and files at that location. The
327
+ difference between a file and a directory must be clear when details
328
+ are requested.
329
+
330
+ The specific keys, or perhaps a FileInfo class, or similar, is TBD,
331
+ but must be consistent across implementations.
332
+ Must include:
333
+
334
+ - full path to the entry (without protocol)
335
+ - size of the entry, in bytes. If the value cannot be determined, will
336
+ be ``None``.
337
+ - type of entry, "file", "directory" or other
338
+
339
+ Additional information
340
+ may be present, appropriate to the file-system, e.g., generation,
341
+ checksum, etc.
342
+
343
+ May use refresh=True|False to allow use of self._ls_from_cache to
344
+ check for a saved listing and avoid calling the backend. This would be
345
+ common where listing may be expensive.
346
+
347
+ Parameters
348
+ ----------
349
+ path: str
350
+ detail: bool
351
+ if True, gives a list of dictionaries, where each is the same as
352
+ the result of ``info(path)``. If False, gives a list of paths
353
+ (str).
354
+ kwargs: may have additional backend-specific options, such as version
355
+ information
356
+
357
+ Returns
358
+ -------
359
+ List of strings if detail is False, or list of directory information
360
+ dicts if detail is True.
361
+ """
362
+ raise NotImplementedError
363
+
364
+ def _ls_from_cache(self, path):
365
+ """Check cache for listing
366
+
367
+ Returns listing, if found (may be empty list for a directly that exists
368
+ but contains nothing), None if not in cache.
369
+ """
370
+ parent = self._parent(path)
371
+ try:
372
+ return self.dircache[path.rstrip("/")]
373
+ except KeyError:
374
+ pass
375
+ try:
376
+ files = [
377
+ f
378
+ for f in self.dircache[parent]
379
+ if f["name"] == path
380
+ or (f["name"] == path.rstrip("/") and f["type"] == "directory")
381
+ ]
382
+ if len(files) == 0:
383
+ # parent dir was listed but did not contain this file
384
+ raise FileNotFoundError(path)
385
+ return files
386
+ except KeyError:
387
+ pass
388
+
389
+ def walk(self, path, maxdepth=None, topdown=True, on_error="omit", **kwargs):
390
+ """Return all files under the given path.
391
+
392
+ List all files, recursing into subdirectories; output is iterator-style,
393
+ like ``os.walk()``. For a simple list of files, ``find()`` is available.
394
+
395
+ When topdown is True, the caller can modify the dirnames list in-place (perhaps
396
+ using del or slice assignment), and walk() will
397
+ only recurse into the subdirectories whose names remain in dirnames;
398
+ this can be used to prune the search, impose a specific order of visiting,
399
+ or even to inform walk() about directories the caller creates or renames before
400
+ it resumes walk() again.
401
+ Modifying dirnames when topdown is False has no effect. (see os.walk)
402
+
403
+ Note that the "files" outputted will include anything that is not
404
+ a directory, such as links.
405
+
406
+ Parameters
407
+ ----------
408
+ path: str
409
+ Root to recurse into
410
+ maxdepth: int
411
+ Maximum recursion depth. None means limitless, but not recommended
412
+ on link-based file-systems.
413
+ topdown: bool (True)
414
+ Whether to walk the directory tree from the top downwards or from
415
+ the bottom upwards.
416
+ on_error: "omit", "raise", a callable
417
+ if omit (default), path with exception will simply be empty;
418
+ If raise, an underlying exception will be raised;
419
+ if callable, it will be called with a single OSError instance as argument
420
+ kwargs: passed to ``ls``
421
+ """
422
+ if maxdepth is not None and maxdepth < 1:
423
+ raise ValueError("maxdepth must be at least 1")
424
+
425
+ path = self._strip_protocol(path)
426
+ full_dirs = {}
427
+ dirs = {}
428
+ files = {}
429
+
430
+ detail = kwargs.pop("detail", False)
431
+ try:
432
+ listing = self.ls(path, detail=True, **kwargs)
433
+ except (FileNotFoundError, OSError) as e:
434
+ if on_error == "raise":
435
+ raise
436
+ if callable(on_error):
437
+ on_error(e)
438
+ return
439
+
440
+ for info in listing:
441
+ # each info name must be at least [path]/part , but here
442
+ # we check also for names like [path]/part/
443
+ pathname = info["name"].rstrip("/")
444
+ name = pathname.rsplit("/", 1)[-1]
445
+ if info["type"] == "directory" and pathname != path:
446
+ # do not include "self" path
447
+ full_dirs[name] = pathname
448
+ dirs[name] = info
449
+ elif pathname == path:
450
+ # file-like with same name as give path
451
+ files[""] = info
452
+ else:
453
+ files[name] = info
454
+
455
+ if not detail:
456
+ dirs = list(dirs)
457
+ files = list(files)
458
+
459
+ if topdown:
460
+ # Yield before recursion if walking top down
461
+ yield path, dirs, files
462
+
463
+ if maxdepth is not None:
464
+ maxdepth -= 1
465
+ if maxdepth < 1:
466
+ if not topdown:
467
+ yield path, dirs, files
468
+ return
469
+
470
+ for d in dirs:
471
+ yield from self.walk(
472
+ full_dirs[d],
473
+ maxdepth=maxdepth,
474
+ detail=detail,
475
+ topdown=topdown,
476
+ **kwargs,
477
+ )
478
+
479
+ if not topdown:
480
+ # Yield after recursion if walking bottom up
481
+ yield path, dirs, files
482
+
483
+ def find(self, path, maxdepth=None, withdirs=False, detail=False, **kwargs):
484
+ """List all files below path.
485
+
486
+ Like posix ``find`` command without conditions
487
+
488
+ Parameters
489
+ ----------
490
+ path : str
491
+ maxdepth: int or None
492
+ If not None, the maximum number of levels to descend
493
+ withdirs: bool
494
+ Whether to include directory paths in the output. This is True
495
+ when used by glob, but users usually only want files.
496
+ kwargs are passed to ``ls``.
497
+ """
498
+ # TODO: allow equivalent of -name parameter
499
+ path = self._strip_protocol(path)
500
+ out = {}
501
+
502
+ # Add the root directory if withdirs is requested
503
+ # This is needed for posix glob compliance
504
+ if withdirs and path != "" and self.isdir(path):
505
+ out[path] = self.info(path)
506
+
507
+ for _, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs):
508
+ if withdirs:
509
+ files.update(dirs)
510
+ out.update({info["name"]: info for name, info in files.items()})
511
+ if not out and self.isfile(path):
512
+ # walk works on directories, but find should also return [path]
513
+ # when path happens to be a file
514
+ out[path] = {}
515
+ names = sorted(out)
516
+ if not detail:
517
+ return names
518
+ else:
519
+ return {name: out[name] for name in names}
520
+
521
+ def du(self, path, total=True, maxdepth=None, withdirs=False, **kwargs):
522
+ """Space used by files and optionally directories within a path
523
+
524
+ Directory size does not include the size of its contents.
525
+
526
+ Parameters
527
+ ----------
528
+ path: str
529
+ total: bool
530
+ Whether to sum all the file sizes
531
+ maxdepth: int or None
532
+ Maximum number of directory levels to descend, None for unlimited.
533
+ withdirs: bool
534
+ Whether to include directory paths in the output.
535
+ kwargs: passed to ``find``
536
+
537
+ Returns
538
+ -------
539
+ Dict of {path: size} if total=False, or int otherwise, where numbers
540
+ refer to bytes used.
541
+ """
542
+ sizes = {}
543
+ if withdirs and self.isdir(path):
544
+ # Include top-level directory in output
545
+ info = self.info(path)
546
+ sizes[info["name"]] = info["size"]
547
+ for f in self.find(path, maxdepth=maxdepth, withdirs=withdirs, **kwargs):
548
+ info = self.info(f)
549
+ sizes[info["name"]] = info["size"]
550
+ if total:
551
+ return sum(sizes.values())
552
+ else:
553
+ return sizes
554
+
555
    def glob(self, path, maxdepth=None, **kwargs):
        """Find files by glob-matching.

        Pattern matching capabilities for finding files that match the given pattern.

        Parameters
        ----------
        path: str
            The glob pattern to match against
        maxdepth: int or None
            Maximum depth for ``'**'`` patterns. Applied on the first ``'**'`` found.
            Must be at least 1 if provided.
        kwargs:
            Additional arguments passed to ``find`` (e.g., detail=True)

        Returns
        -------
        List of matched paths, or dict of paths and their info if detail=True

        Notes
        -----
        Supported patterns:
        - '*': Matches any sequence of characters within a single directory level
        - ``'**'``: Matches any number of directory levels (must be an entire path component)
        - '?': Matches exactly one character
        - '[abc]': Matches any character in the set
        - '[a-z]': Matches any character in the range
        - '[!abc]': Matches any character NOT in the set

        Special behaviors:
        - If the path ends with '/', only folders are returned
        - Consecutive '*' characters are compressed into a single '*'
        - Empty brackets '[]' never match anything
        - Negated empty brackets '[!]' match any single character
        - Special characters in character classes are escaped properly

        Limitations:
        - ``'**'`` must be a complete path component (e.g., ``'a/**/b'``, not ``'a**b'``)
        - No brace expansion ('{a,b}.txt')
        - No extended glob patterns ('+(pattern)', '!(pattern)')
        """
        if maxdepth is not None and maxdepth < 1:
            raise ValueError("maxdepth must be at least 1")

        import re

        seps = (os.path.sep, os.path.altsep) if os.path.altsep else (os.path.sep,)
        ends_with_sep = path.endswith(seps)  # _strip_protocol strips trailing slash
        path = self._strip_protocol(path)
        # trailing "/" or "/**" means directory matches must be tested with a
        # trailing slash appended, so only folders can match
        append_slash_to_dirname = ends_with_sep or path.endswith(
            tuple(sep + "**" for sep in seps)
        )
        # position of the first magic character of each kind (len(path) if absent)
        idx_star = path.find("*") if path.find("*") >= 0 else len(path)
        idx_qmark = path.find("?") if path.find("?") >= 0 else len(path)
        idx_brace = path.find("[") if path.find("[") >= 0 else len(path)

        min_idx = min(idx_star, idx_qmark, idx_brace)

        detail = kwargs.pop("detail", False)

        if not has_magic(path):
            # literal path: existence check is enough, no listing required
            if self.exists(path, **kwargs):
                if not detail:
                    return [path]
                else:
                    return {path: self.info(path, **kwargs)}
            else:
                if not detail:
                    return []  # glob of non-existent returns empty
                else:
                    return {}
        elif "/" in path[:min_idx]:
            # everything before the first magic char is a fixed directory prefix;
            # list only from there and remember how deep the pattern reaches
            min_idx = path[:min_idx].rindex("/")
            root = path[: min_idx + 1]
            depth = path[min_idx + 1 :].count("/") + 1
        else:
            root = ""
            depth = path[min_idx + 1 :].count("/") + 1

        if "**" in path:
            if maxdepth is not None:
                # replace the "**" component's unbounded depth by maxdepth
                idx_double_stars = path.find("**")
                depth_double_stars = path[idx_double_stars:].count("/") + 1
                depth = depth - depth_double_stars + maxdepth
            else:
                depth = None  # unbounded recursion for "**"

        allpaths = self.find(root, maxdepth=depth, withdirs=True, detail=True, **kwargs)

        pattern = glob_translate(path + ("/" if ends_with_sep else ""))
        pattern = re.compile(pattern)

        out = {
            p: info
            for p, info in sorted(allpaths.items())
            if pattern.match(
                p + "/"
                if append_slash_to_dirname and info["type"] == "directory"
                else p
            )
        }

        if detail:
            return out
        else:
            return list(out)
+ def exists(self, path, **kwargs):
663
+ """Is there a file at the given path"""
664
+ try:
665
+ self.info(path, **kwargs)
666
+ return True
667
+ except: # noqa: E722
668
+ # any exception allowed bar FileNotFoundError?
669
+ return False
670
+
671
+ def lexists(self, path, **kwargs):
672
+ """If there is a file at the given path (including
673
+ broken links)"""
674
+ return self.exists(path)
675
+
676
    def info(self, path, **kwargs):
        """Give details of entry at path

        Returns a single dictionary, with exactly the same information as ``ls``
        would with ``detail=True``.

        The default implementation calls ls and could be overridden by a
        shortcut. kwargs are passed on to ```ls()``.

        Some file systems might not be able to measure the file's size, in
        which case, the returned dict will include ``'size': None``.

        Returns
        -------
        dict with keys: name (full path in the FS), size (in bytes), type (file,
        directory, or something else) and other FS-specific keys.
        """
        path = self._strip_protocol(path)
        # First attempt: list the parent and look for this entry among its
        # children (works for both files and directories).
        out = self.ls(self._parent(path), detail=True, **kwargs)
        out = [o for o in out if o["name"].rstrip("/") == path]
        if out:
            return out[0]
        # Second attempt: list the path itself; a directory listing may contain
        # the directory entry, its children, or be empty.
        out = self.ls(path, detail=True, **kwargs)
        path = path.rstrip("/")
        out1 = [o for o in out if o["name"].rstrip("/") == path]
        if len(out1) == 1:
            # exactly one self-entry: normalise missing size to None
            if "size" not in out1[0]:
                out1[0]["size"] = None
            return out1[0]
        elif len(out1) > 1 or out:
            # listing succeeded with children (or duplicates) -> it's a directory
            return {"name": path, "size": 0, "type": "directory"}
        else:
            raise FileNotFoundError(path)
+ def checksum(self, path):
711
+ """Unique value for current version of file
712
+
713
+ If the checksum is the same from one moment to another, the contents
714
+ are guaranteed to be the same. If the checksum changes, the contents
715
+ *might* have changed.
716
+
717
+ This should normally be overridden; default will probably capture
718
+ creation/modification timestamp (which would be good) or maybe
719
+ access timestamp (which would be bad)
720
+ """
721
+ return int(tokenize(self.info(path)), 16)
722
+
723
+ def size(self, path):
724
+ """Size in bytes of file"""
725
+ return self.info(path).get("size", None)
726
+
727
+ def sizes(self, paths):
728
+ """Size in bytes of each file in a list of paths"""
729
+ return [self.size(p) for p in paths]
730
+
731
+ def isdir(self, path):
732
+ """Is this entry directory-like?"""
733
+ try:
734
+ return self.info(path)["type"] == "directory"
735
+ except OSError:
736
+ return False
737
+
738
+ def isfile(self, path):
739
+ """Is this entry file-like?"""
740
+ try:
741
+ return self.info(path)["type"] == "file"
742
+ except: # noqa: E722
743
+ return False
744
+
745
+ def read_text(self, path, encoding=None, errors=None, newline=None, **kwargs):
746
+ """Get the contents of the file as a string.
747
+
748
+ Parameters
749
+ ----------
750
+ path: str
751
+ URL of file on this filesystems
752
+ encoding, errors, newline: same as `open`.
753
+ """
754
+ with self.open(
755
+ path,
756
+ mode="r",
757
+ encoding=encoding,
758
+ errors=errors,
759
+ newline=newline,
760
+ **kwargs,
761
+ ) as f:
762
+ return f.read()
763
+
764
+ def write_text(
765
+ self, path, value, encoding=None, errors=None, newline=None, **kwargs
766
+ ):
767
+ """Write the text to the given file.
768
+
769
+ An existing file will be overwritten.
770
+
771
+ Parameters
772
+ ----------
773
+ path: str
774
+ URL of file on this filesystems
775
+ value: str
776
+ Text to write.
777
+ encoding, errors, newline: same as `open`.
778
+ """
779
+ with self.open(
780
+ path,
781
+ mode="w",
782
+ encoding=encoding,
783
+ errors=errors,
784
+ newline=newline,
785
+ **kwargs,
786
+ ) as f:
787
+ return f.write(value)
788
+
789
    def cat_file(self, path, start=None, end=None, **kwargs):
        """Get the content of a file

        Parameters
        ----------
        path: URL of file on this filesystems
        start, end: int
            Bytes limits of the read. If negative, backwards from end,
            like usual python slices. Either can be None for start or
            end of file, respectively
        kwargs: passed to ``open()``.
        """
        # explicitly set buffering off?
        with self.open(path, "rb", **kwargs) as f:
            if start is not None:
                if start >= 0:
                    f.seek(start)
                else:
                    # negative start counts back from EOF; clamp at 0
                    # NOTE(review): relies on the file object exposing ``.size``
                    f.seek(max(0, f.size + start))
            if end is not None:
                if end < 0:
                    end = f.size + end
                # read from current position (possibly after seek) up to ``end``
                return f.read(end - f.tell())
            return f.read()
+ def pipe_file(self, path, value, mode="overwrite", **kwargs):
815
+ """Set the bytes of given file"""
816
+ if mode == "create" and self.exists(path):
817
+ # non-atomic but simple way; or could use "xb" in open(), which is likely
818
+ # not as well supported
819
+ raise FileExistsError
820
+ with self.open(path, "wb", **kwargs) as f:
821
+ f.write(value)
822
+
823
+ def pipe(self, path, value=None, **kwargs):
824
+ """Put value into path
825
+
826
+ (counterpart to ``cat``)
827
+
828
+ Parameters
829
+ ----------
830
+ path: string or dict(str, bytes)
831
+ If a string, a single remote location to put ``value`` bytes; if a dict,
832
+ a mapping of {path: bytesvalue}.
833
+ value: bytes, optional
834
+ If using a single path, these are the bytes to put there. Ignored if
835
+ ``path`` is a dict
836
+ """
837
+ if isinstance(path, str):
838
+ self.pipe_file(self._strip_protocol(path), value, **kwargs)
839
+ elif isinstance(path, dict):
840
+ for k, v in path.items():
841
+ self.pipe_file(self._strip_protocol(k), v, **kwargs)
842
+ else:
843
+ raise ValueError("path must be str or dict")
844
+
845
+ def cat_ranges(
846
+ self, paths, starts, ends, max_gap=None, on_error="return", **kwargs
847
+ ):
848
+ """Get the contents of byte ranges from one or more files
849
+
850
+ Parameters
851
+ ----------
852
+ paths: list
853
+ A list of of filepaths on this filesystems
854
+ starts, ends: int or list
855
+ Bytes limits of the read. If using a single int, the same value will be
856
+ used to read all the specified files.
857
+ """
858
+ if max_gap is not None:
859
+ raise NotImplementedError
860
+ if not isinstance(paths, list):
861
+ raise TypeError
862
+ if not isinstance(starts, list):
863
+ starts = [starts] * len(paths)
864
+ if not isinstance(ends, list):
865
+ ends = [ends] * len(paths)
866
+ if len(starts) != len(paths) or len(ends) != len(paths):
867
+ raise ValueError
868
+ out = []
869
+ for p, s, e in zip(paths, starts, ends):
870
+ try:
871
+ out.append(self.cat_file(p, s, e))
872
+ except Exception as e:
873
+ if on_error == "return":
874
+ out.append(e)
875
+ else:
876
+ raise
877
+ return out
878
+
879
+ def cat(self, path, recursive=False, on_error="raise", **kwargs):
880
+ """Fetch (potentially multiple) paths' contents
881
+
882
+ Parameters
883
+ ----------
884
+ recursive: bool
885
+ If True, assume the path(s) are directories, and get all the
886
+ contained files
887
+ on_error : "raise", "omit", "return"
888
+ If raise, an underlying exception will be raised (converted to KeyError
889
+ if the type is in self.missing_exceptions); if omit, keys with exception
890
+ will simply not be included in the output; if "return", all keys are
891
+ included in the output, but the value will be bytes or an exception
892
+ instance.
893
+ kwargs: passed to cat_file
894
+
895
+ Returns
896
+ -------
897
+ dict of {path: contents} if there are multiple paths
898
+ or the path has been otherwise expanded
899
+ """
900
+ paths = self.expand_path(path, recursive=recursive, **kwargs)
901
+ if (
902
+ len(paths) > 1
903
+ or isinstance(path, list)
904
+ or paths[0] != self._strip_protocol(path)
905
+ ):
906
+ out = {}
907
+ for path in paths:
908
+ try:
909
+ out[path] = self.cat_file(path, **kwargs)
910
+ except Exception as e:
911
+ if on_error == "raise":
912
+ raise
913
+ if on_error == "return":
914
+ out[path] = e
915
+ return out
916
+ else:
917
+ return self.cat_file(paths[0], **kwargs)
918
+
919
    def get_file(self, rpath, lpath, callback=DEFAULT_CALLBACK, outfile=None, **kwargs):
        """Copy single remote file to local

        ``lpath`` may be an already-open, writable file-like object, in which
        case it is written to directly and NOT closed here. Streams the remote
        file in ``self.blocksize`` chunks, reporting progress via ``callback``.
        """
        from .implementations.local import LocalFileSystem

        if isfilelike(lpath):
            # caller supplied an open file object: write into it directly
            outfile = lpath
        elif self.isdir(rpath):
            # remote "directory": just mirror it locally, nothing to stream
            os.makedirs(lpath, exist_ok=True)
            return None

        # ensure the local parent directory exists before opening for write
        fs = LocalFileSystem(auto_mkdir=True)
        fs.makedirs(fs._parent(lpath), exist_ok=True)

        with self.open(rpath, "rb", **kwargs) as f1:
            if outfile is None:
                outfile = open(lpath, "wb")

            try:
                callback.set_size(getattr(f1, "size", None))
                data = True
                while data:
                    data = f1.read(self.blocksize)
                    segment_len = outfile.write(data)
                    if segment_len is None:
                        # some file-likes return None from write(); assume all written
                        segment_len = len(data)
                    callback.relative_update(segment_len)
            finally:
                # only close files we opened ourselves, not caller-owned objects
                if not isfilelike(lpath):
                    outfile.close()
    def get(
        self,
        rpath,
        lpath,
        recursive=False,
        callback=DEFAULT_CALLBACK,
        maxdepth=None,
        **kwargs,
    ):
        """Copy file(s) to local.

        Copies a specific file or tree of files (if recursive=True). If lpath
        ends with a "/", it will be assumed to be a directory, and target files
        will go within. Can submit a list of paths, which may be glob-patterns
        and will be expanded.

        Calls get_file for each source.
        """
        if isinstance(lpath, list) and isinstance(rpath, list):
            # No need to expand paths when both source and destination
            # are provided as lists
            rpaths = rpath
            lpaths = lpath
        else:
            from .implementations.local import (
                LocalFileSystem,
                make_path_posix,
                trailing_sep,
            )

            source_is_str = isinstance(rpath, str)
            rpaths = self.expand_path(
                rpath, recursive=recursive, maxdepth=maxdepth, **kwargs
            )
            if source_is_str and (not recursive or maxdepth is not None):
                # Non-recursive glob does not copy directories
                rpaths = [p for p in rpaths if not (trailing_sep(p) or self.isdir(p))]
                if not rpaths:
                    return

            if isinstance(lpath, str):
                # normalise Windows separators etc. for the local side
                lpath = make_path_posix(lpath)

            source_is_file = len(rpaths) == 1
            dest_is_dir = isinstance(lpath, str) and (
                trailing_sep(lpath) or LocalFileSystem().isdir(lpath)
            )

            # whether the destination directory component already exists,
            # which decides how other_paths maps sources onto targets
            exists = source_is_str and (
                (has_magic(rpath) and source_is_file)
                or (not has_magic(rpath) and dest_is_dir and not trailing_sep(rpath))
            )
            lpaths = other_paths(
                rpaths,
                lpath,
                exists=exists,
                flatten=not source_is_str,
            )

        callback.set_size(len(lpaths))
        for lpath, rpath in callback.wrap(zip(lpaths, rpaths)):
            # one branched (child) callback per file transfer
            with callback.branched(rpath, lpath) as child:
                self.get_file(rpath, lpath, callback=child, **kwargs)
    def put_file(
        self, lpath, rpath, callback=DEFAULT_CALLBACK, mode="overwrite", **kwargs
    ):
        """Copy single file to remote

        ``mode="create"`` raises ``FileExistsError`` if the remote path already
        exists (checked up-front, not atomically). A local directory results in
        ``makedirs`` on the remote and no data transfer.
        """
        if mode == "create" and self.exists(rpath):
            raise FileExistsError
        if os.path.isdir(lpath):
            self.makedirs(rpath, exist_ok=True)
            return None

        with open(lpath, "rb") as f1:
            # seek to end to learn the file size, then rewind for reading
            size = f1.seek(0, 2)
            callback.set_size(size)
            f1.seek(0)

            self.mkdirs(self._parent(os.fspath(rpath)), exist_ok=True)
            with self.open(rpath, "wb", **kwargs) as f2:
                # stream in blocksize chunks until the whole file is copied
                while f1.tell() < size:
                    data = f1.read(self.blocksize)
                    segment_len = f2.write(data)
                    if segment_len is None:
                        # some file-likes return None from write(); assume all written
                        segment_len = len(data)
                    callback.relative_update(segment_len)
    def put(
        self,
        lpath,
        rpath,
        recursive=False,
        callback=DEFAULT_CALLBACK,
        maxdepth=None,
        **kwargs,
    ):
        """Copy file(s) from local.

        Copies a specific file or tree of files (if recursive=True). If rpath
        ends with a "/", it will be assumed to be a directory, and target files
        will go within.

        Calls put_file for each source.
        """
        if isinstance(lpath, list) and isinstance(rpath, list):
            # No need to expand paths when both source and destination
            # are provided as lists
            rpaths = rpath
            lpaths = lpath
        else:
            from .implementations.local import (
                LocalFileSystem,
                make_path_posix,
                trailing_sep,
            )

            source_is_str = isinstance(lpath, str)
            if source_is_str:
                # normalise Windows separators etc. for the local side
                lpath = make_path_posix(lpath)
            fs = LocalFileSystem()
            # expansion (globs, directories) happens on the LOCAL filesystem
            lpaths = fs.expand_path(
                lpath, recursive=recursive, maxdepth=maxdepth, **kwargs
            )
            if source_is_str and (not recursive or maxdepth is not None):
                # Non-recursive glob does not copy directories
                lpaths = [p for p in lpaths if not (trailing_sep(p) or fs.isdir(p))]
                if not lpaths:
                    return

            source_is_file = len(lpaths) == 1
            dest_is_dir = isinstance(rpath, str) and (
                trailing_sep(rpath) or self.isdir(rpath)
            )

            rpath = (
                self._strip_protocol(rpath)
                if isinstance(rpath, str)
                else [self._strip_protocol(p) for p in rpath]
            )
            # whether the destination directory component already exists,
            # which decides how other_paths maps sources onto targets
            exists = source_is_str and (
                (has_magic(lpath) and source_is_file)
                or (not has_magic(lpath) and dest_is_dir and not trailing_sep(lpath))
            )
            rpaths = other_paths(
                lpaths,
                rpath,
                exists=exists,
                flatten=not source_is_str,
            )

        callback.set_size(len(rpaths))
        for lpath, rpath in callback.wrap(zip(lpaths, rpaths)):
            # one branched (child) callback per file transfer
            with callback.branched(lpath, rpath) as child:
                self.put_file(lpath, rpath, callback=child, **kwargs)
+ def head(self, path, size=1024):
1106
+ """Get the first ``size`` bytes from file"""
1107
+ with self.open(path, "rb") as f:
1108
+ return f.read(size)
1109
+
1110
+ def tail(self, path, size=1024):
1111
+ """Get the last ``size`` bytes from file"""
1112
+ with self.open(path, "rb") as f:
1113
+ f.seek(max(-size, -f.size), 2)
1114
+ return f.read()
1115
+
1116
+ def cp_file(self, path1, path2, **kwargs):
1117
+ raise NotImplementedError
1118
+
1119
    def copy(
        self, path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs
    ):
        """Copy within two locations in the filesystem

        on_error : "raise", "ignore"
            If raise, any not-found exceptions will be raised; if ignore any
            not-found exceptions will cause the path to be skipped; defaults to
            raise unless recursive is true, where the default is ignore
        """
        if on_error is None and recursive:
            on_error = "ignore"
        elif on_error is None:
            on_error = "raise"

        if isinstance(path1, list) and isinstance(path2, list):
            # No need to expand paths when both source and destination
            # are provided as lists
            paths1 = path1
            paths2 = path2
        else:
            from .implementations.local import trailing_sep

            source_is_str = isinstance(path1, str)
            paths1 = self.expand_path(
                path1, recursive=recursive, maxdepth=maxdepth, **kwargs
            )
            if source_is_str and (not recursive or maxdepth is not None):
                # Non-recursive glob does not copy directories
                paths1 = [p for p in paths1 if not (trailing_sep(p) or self.isdir(p))]
                if not paths1:
                    return

            source_is_file = len(paths1) == 1
            dest_is_dir = isinstance(path2, str) and (
                trailing_sep(path2) or self.isdir(path2)
            )

            # whether the destination directory component already exists,
            # which decides how other_paths maps sources onto targets
            exists = source_is_str and (
                (has_magic(path1) and source_is_file)
                or (not has_magic(path1) and dest_is_dir and not trailing_sep(path1))
            )
            paths2 = other_paths(
                paths1,
                path2,
                exists=exists,
                flatten=not source_is_str,
            )

        for p1, p2 in zip(paths1, paths2):
            try:
                self.cp_file(p1, p2, **kwargs)
            except FileNotFoundError:
                # a source vanished between expansion and copy
                if on_error == "raise":
                    raise
    def expand_path(self, path, recursive=False, maxdepth=None, **kwargs):
        """Turn one or more globs or directories into a list of all matching paths
        to files or directories.

        kwargs are passed to ``glob`` or ``find``, which may in turn call ``ls``
        """

        if maxdepth is not None and maxdepth < 1:
            raise ValueError("maxdepth must be at least 1")

        if isinstance(path, (str, os.PathLike)):
            # normalise the scalar case to the list case
            out = self.expand_path([path], recursive, maxdepth, **kwargs)
        else:
            out = set()
            path = [self._strip_protocol(p) for p in path]
            for p in path:
                if has_magic(p):
                    # glob pattern: expand one level of matches
                    bit = set(self.glob(p, maxdepth=maxdepth, **kwargs))
                    out |= bit
                    if recursive:
                        # glob call above expanded one depth so if maxdepth is defined
                        # then decrement it in expand_path call below. If it is zero
                        # after decrementing then avoid expand_path call.
                        if maxdepth is not None and maxdepth <= 1:
                            continue
                        out |= set(
                            self.expand_path(
                                list(bit),
                                recursive=recursive,
                                maxdepth=maxdepth - 1 if maxdepth is not None else None,
                                **kwargs,
                            )
                        )
                    continue
                elif recursive:
                    # plain path + recursive: include everything underneath it
                    rec = set(
                        self.find(
                            p, maxdepth=maxdepth, withdirs=True, detail=False, **kwargs
                        )
                    )
                    out |= rec
                if p not in out and (recursive is False or self.exists(p)):
                    # should only check once, for the root
                    out.add(p)
        if not out:
            raise FileNotFoundError(path)
        return sorted(out)
+ def mv(self, path1, path2, recursive=False, maxdepth=None, **kwargs):
1224
+ """Move file(s) from one location to another"""
1225
+ if path1 == path2:
1226
+ logger.debug("%s mv: The paths are the same, so no files were moved.", self)
1227
+ else:
1228
+ # explicitly raise exception to prevent data corruption
1229
+ self.copy(
1230
+ path1, path2, recursive=recursive, maxdepth=maxdepth, onerror="raise"
1231
+ )
1232
+ self.rm(path1, recursive=recursive)
1233
+
1234
+ def rm_file(self, path):
1235
+ """Delete a file"""
1236
+ self._rm(path)
1237
+
1238
+ def _rm(self, path):
1239
+ """Delete one file"""
1240
+ # this is the old name for the method, prefer rm_file
1241
+ raise NotImplementedError
1242
+
1243
+ def rm(self, path, recursive=False, maxdepth=None):
1244
+ """Delete files.
1245
+
1246
+ Parameters
1247
+ ----------
1248
+ path: str or list of str
1249
+ File(s) to delete.
1250
+ recursive: bool
1251
+ If file(s) are directories, recursively delete contents and then
1252
+ also remove the directory
1253
+ maxdepth: int or None
1254
+ Depth to pass to walk for finding files to delete, if recursive.
1255
+ If None, there will be no limit and infinite recursion may be
1256
+ possible.
1257
+ """
1258
+ path = self.expand_path(path, recursive=recursive, maxdepth=maxdepth)
1259
+ for p in reversed(path):
1260
+ self.rm_file(p)
1261
+
1262
    @classmethod
    def _parent(cls, path):
        # Return the parent directory of ``path`` (protocol stripped),
        # always prefixed with the class's root marker.
        path = cls._strip_protocol(path)
        if "/" in path:
            # NOTE(review): lstrip removes any characters from root_marker's
            # character set, not the prefix string; fine for "/" but worth
            # confirming for multi-character root markers.
            parent = path.rsplit("/", 1)[0].lstrip(cls.root_marker)
            return cls.root_marker + parent
        else:
            # no separator at all: the path lives at the root
            return cls.root_marker
    def _open(
        self,
        path,
        mode="rb",
        block_size=None,
        autocommit=True,
        cache_options=None,
        **kwargs,
    ):
        """Return raw bytes-mode file-like from the file-system

        Default implementation builds an ``AbstractBufferedFile``; backends
        normally override this with their own file class. ``autocommit=False``
        is used by transactions so writes are deferred until commit.
        """
        return AbstractBufferedFile(
            self,
            path,
            mode,
            block_size,
            autocommit,
            cache_options=cache_options,
            **kwargs,
        )
    def open(
        self,
        path,
        mode="rb",
        block_size=None,
        cache_options=None,
        compression=None,
        **kwargs,
    ):
        """
        Return a file-like object from the filesystem

        The resultant instance must function correctly in a context ``with``
        block.

        Parameters
        ----------
        path: str
            Target file
        mode: str like 'rb', 'w'
            See builtin ``open()``
            Mode "x" (exclusive write) may be implemented by the backend. Even if
            it is, whether it is checked up front or on commit, and whether it is
            atomic is implementation-dependent.
        block_size: int
            Some indication of buffering - this is a value in bytes
        cache_options : dict, optional
            Extra arguments to pass through to the cache.
        compression: string or None
            If given, open file using compression codec. Can either be a compression
            name (a key in ``fsspec.compression.compr``) or "infer" to guess the
            compression from the filename suffix.
        encoding, errors, newline: passed on to TextIOWrapper for text mode
        """
        import io

        path = self._strip_protocol(path)
        if "b" not in mode:
            # text mode: open the binary equivalent recursively, then wrap it
            mode = mode.replace("t", "") + "b"

            text_kwargs = {
                k: kwargs.pop(k)
                for k in ["encoding", "errors", "newline"]
                if k in kwargs
            }
            return io.TextIOWrapper(
                self.open(
                    path,
                    mode,
                    block_size=block_size,
                    cache_options=cache_options,
                    compression=compression,
                    **kwargs,
                ),
                **text_kwargs,
            )
        else:
            # inside a transaction, files default to deferred (non-auto) commit
            ac = kwargs.pop("autocommit", not self._intrans)
            f = self._open(
                path,
                mode=mode,
                block_size=block_size,
                autocommit=ac,
                cache_options=cache_options,
                **kwargs,
            )
            if compression is not None:
                from fsspec.compression import compr
                from fsspec.core import get_compression

                # "infer" resolves the codec from the filename suffix
                compression = get_compression(path, compression)
                compress = compr[compression]
                f = compress(f, mode=mode[0])

            if not ac and "r" not in mode:
                # register deferred-write files with the active transaction
                self.transaction.files.append(f)
            return f
+ def touch(self, path, truncate=True, **kwargs):
1370
+ """Create empty file, or update timestamp
1371
+
1372
+ Parameters
1373
+ ----------
1374
+ path: str
1375
+ file location
1376
+ truncate: bool
1377
+ If True, always set file size to 0; if False, update timestamp and
1378
+ leave file unchanged, if backend allows this
1379
+ """
1380
+ if truncate or not self.exists(path):
1381
+ with self.open(path, "wb", **kwargs):
1382
+ pass
1383
+ else:
1384
+ raise NotImplementedError # update timestamp, if possible
1385
+
1386
+ def ukey(self, path):
1387
+ """Hash of file properties, to tell if it has changed"""
1388
+ return sha256(str(self.info(path)).encode()).hexdigest()
1389
+
1390
    def read_block(self, fn, offset, length, delimiter=None):
        """Read a block of bytes from

        Starting at ``offset`` of the file, read ``length`` bytes. If
        ``delimiter`` is set then we ensure that the read starts and stops at
        delimiter boundaries that follow the locations ``offset`` and ``offset
        + length``. If ``offset`` is zero then we start at zero. The
        bytestring returned WILL include the end delimiter string.

        If offset+length is beyond the eof, reads to eof.

        Parameters
        ----------
        fn: string
            Path to filename
        offset: int
            Byte offset to start read
        length: int
            Number of bytes to read. If None, read to end.
        delimiter: bytes (optional)
            Ensure reading starts and stops at delimiter bytestring

        Examples
        --------
        >>> fs.read_block('data/file.csv', 0, 13)  # doctest: +SKIP
        b'Alice, 100\\nBo'
        >>> fs.read_block('data/file.csv', 0, 13, delimiter=b'\\n')  # doctest: +SKIP
        b'Alice, 100\\nBob, 200\\n'

        Use ``length=None`` to read to the end of the file.
        >>> fs.read_block('data/file.csv', 0, None, delimiter=b'\\n')  # doctest: +SKIP
        b'Alice, 100\\nBob, 200\\nCharlie, 300'

        See Also
        --------
        :func:`fsspec.utils.read_block`
        """
        with self.open(fn, "rb") as f:
            size = f.size
            if length is None:
                # read to EOF
                length = size
            if size is not None and offset + length > size:
                # clamp so we never request bytes past the end of file
                length = size - offset
            # delegate delimiter handling to the module-level helper
            return read_block(f, offset, length, delimiter)
+ def to_json(self, *, include_password: bool = True) -> str:
1436
+ """
1437
+ JSON representation of this filesystem instance.
1438
+
1439
+ Parameters
1440
+ ----------
1441
+ include_password: bool, default True
1442
+ Whether to include the password (if any) in the output.
1443
+
1444
+ Returns
1445
+ -------
1446
+ JSON string with keys ``cls`` (the python location of this class),
1447
+ protocol (text name of this class's protocol, first one in case of
1448
+ multiple), ``args`` (positional args, usually empty), and all other
1449
+ keyword arguments as their own keys.
1450
+
1451
+ Warnings
1452
+ --------
1453
+ Serialized filesystems may contain sensitive information which have been
1454
+ passed to the constructor, such as passwords and tokens. Make sure you
1455
+ store and send them in a secure environment!
1456
+ """
1457
+ from .json import FilesystemJSONEncoder
1458
+
1459
+ return json.dumps(
1460
+ self,
1461
+ cls=type(
1462
+ "_FilesystemJSONEncoder",
1463
+ (FilesystemJSONEncoder,),
1464
+ {"include_password": include_password},
1465
+ ),
1466
+ )
1467
+
1468
+ @staticmethod
1469
+ def from_json(blob: str) -> AbstractFileSystem:
1470
+ """
1471
+ Recreate a filesystem instance from JSON representation.
1472
+
1473
+ See ``.to_json()`` for the expected structure of the input.
1474
+
1475
+ Parameters
1476
+ ----------
1477
+ blob: str
1478
+
1479
+ Returns
1480
+ -------
1481
+ file system instance, not necessarily of this particular class.
1482
+
1483
+ Warnings
1484
+ --------
1485
+ This can import arbitrary modules (as determined by the ``cls`` key).
1486
+ Make sure you haven't installed any modules that may execute malicious code
1487
+ at import time.
1488
+ """
1489
+ from .json import FilesystemJSONDecoder
1490
+
1491
+ return json.loads(blob, cls=FilesystemJSONDecoder)
1492
+
1493
+ def to_dict(self, *, include_password: bool = True) -> dict[str, Any]:
1494
+ """
1495
+ JSON-serializable dictionary representation of this filesystem instance.
1496
+
1497
+ Parameters
1498
+ ----------
1499
+ include_password: bool, default True
1500
+ Whether to include the password (if any) in the output.
1501
+
1502
+ Returns
1503
+ -------
1504
+ Dictionary with keys ``cls`` (the python location of this class),
1505
+ protocol (text name of this class's protocol, first one in case of
1506
+ multiple), ``args`` (positional args, usually empty), and all other
1507
+ keyword arguments as their own keys.
1508
+
1509
+ Warnings
1510
+ --------
1511
+ Serialized filesystems may contain sensitive information which have been
1512
+ passed to the constructor, such as passwords and tokens. Make sure you
1513
+ store and send them in a secure environment!
1514
+ """
1515
+ from .json import FilesystemJSONEncoder
1516
+
1517
+ json_encoder = FilesystemJSONEncoder()
1518
+
1519
+ cls = type(self)
1520
+ proto = self.protocol
1521
+
1522
+ storage_options = dict(self.storage_options)
1523
+ if not include_password:
1524
+ storage_options.pop("password", None)
1525
+
1526
+ return dict(
1527
+ cls=f"{cls.__module__}:{cls.__name__}",
1528
+ protocol=proto[0] if isinstance(proto, (tuple, list)) else proto,
1529
+ args=json_encoder.make_serializable(self.storage_args),
1530
+ **json_encoder.make_serializable(storage_options),
1531
+ )
1532
+
1533
+ @staticmethod
1534
+ def from_dict(dct: dict[str, Any]) -> AbstractFileSystem:
1535
+ """
1536
+ Recreate a filesystem instance from dictionary representation.
1537
+
1538
+ See ``.to_dict()`` for the expected structure of the input.
1539
+
1540
+ Parameters
1541
+ ----------
1542
+ dct: Dict[str, Any]
1543
+
1544
+ Returns
1545
+ -------
1546
+ file system instance, not necessarily of this particular class.
1547
+
1548
+ Warnings
1549
+ --------
1550
+ This can import arbitrary modules (as determined by the ``cls`` key).
1551
+ Make sure you haven't installed any modules that may execute malicious code
1552
+ at import time.
1553
+ """
1554
+ from .json import FilesystemJSONDecoder
1555
+
1556
+ json_decoder = FilesystemJSONDecoder()
1557
+
1558
+ dct = dict(dct) # Defensive copy
1559
+
1560
+ cls = FilesystemJSONDecoder.try_resolve_fs_cls(dct)
1561
+ if cls is None:
1562
+ raise ValueError("Not a serialized AbstractFileSystem")
1563
+
1564
+ dct.pop("cls", None)
1565
+ dct.pop("protocol", None)
1566
+
1567
+ return cls(
1568
+ *json_decoder.unmake_serializable(dct.pop("args", ())),
1569
+ **json_decoder.unmake_serializable(dct),
1570
+ )
1571
+
1572
+ def _get_pyarrow_filesystem(self):
1573
+ """
1574
+ Make a version of the FS instance which will be acceptable to pyarrow
1575
+ """
1576
+ # all instances already also derive from pyarrow
1577
+ return self
1578
+
1579
+ def get_mapper(self, root="", check=False, create=False, missing_exceptions=None):
1580
+ """Create key/value store based on this file-system
1581
+
1582
+ Makes a MutableMapping interface to the FS at the given root path.
1583
+ See ``fsspec.mapping.FSMap`` for further details.
1584
+ """
1585
+ from .mapping import FSMap
1586
+
1587
+ return FSMap(
1588
+ root,
1589
+ self,
1590
+ check=check,
1591
+ create=create,
1592
+ missing_exceptions=missing_exceptions,
1593
+ )
1594
+
1595
+ @classmethod
1596
+ def clear_instance_cache(cls):
1597
+ """
1598
+ Clear the cache of filesystem instances.
1599
+
1600
+ Notes
1601
+ -----
1602
+ Unless overridden by setting the ``cachable`` class attribute to False,
1603
+ the filesystem class stores a reference to newly created instances. This
1604
+ prevents Python's normal rules around garbage collection from working,
1605
+ since the instances refcount will not drop to zero until
1606
+ ``clear_instance_cache`` is called.
1607
+ """
1608
+ cls._cache.clear()
1609
+
1610
+ def created(self, path):
1611
+ """Return the created timestamp of a file as a datetime.datetime"""
1612
+ raise NotImplementedError
1613
+
1614
+ def modified(self, path):
1615
+ """Return the modified timestamp of a file as a datetime.datetime"""
1616
+ raise NotImplementedError
1617
+
1618
+ def tree(
1619
+ self,
1620
+ path: str = "/",
1621
+ recursion_limit: int = 2,
1622
+ max_display: int = 25,
1623
+ display_size: bool = False,
1624
+ prefix: str = "",
1625
+ is_last: bool = True,
1626
+ first: bool = True,
1627
+ indent_size: int = 4,
1628
+ ) -> str:
1629
+ """
1630
+ Return a tree-like structure of the filesystem starting from the given path as a string.
1631
+
1632
+ Parameters
1633
+ ----------
1634
+ path: Root path to start traversal from
1635
+ recursion_limit: Maximum depth of directory traversal
1636
+ max_display: Maximum number of items to display per directory
1637
+ display_size: Whether to display file sizes
1638
+ prefix: Current line prefix for visual tree structure
1639
+ is_last: Whether current item is last in its level
1640
+ first: Whether this is the first call (displays root path)
1641
+ indent_size: Number of spaces by indent
1642
+
1643
+ Returns
1644
+ -------
1645
+ str: A string representing the tree structure.
1646
+
1647
+ Example
1648
+ -------
1649
+ >>> from fsspec import filesystem
1650
+
1651
+ >>> fs = filesystem('ftp', host='test.rebex.net', user='demo', password='password')
1652
+ >>> tree = fs.tree(display_size=True, recursion_limit=3, indent_size=8, max_display=10)
1653
+ >>> print(tree)
1654
+ """
1655
+
1656
+ def format_bytes(n: int) -> str:
1657
+ """Format bytes as text."""
1658
+ for prefix, k in (
1659
+ ("P", 2**50),
1660
+ ("T", 2**40),
1661
+ ("G", 2**30),
1662
+ ("M", 2**20),
1663
+ ("k", 2**10),
1664
+ ):
1665
+ if n >= 0.9 * k:
1666
+ return f"{n / k:.2f} {prefix}b"
1667
+ return f"{n}B"
1668
+
1669
+ result = []
1670
+
1671
+ if first:
1672
+ result.append(path)
1673
+
1674
+ if recursion_limit:
1675
+ indent = " " * indent_size
1676
+ contents = self.ls(path, detail=True)
1677
+ contents.sort(
1678
+ key=lambda x: (x.get("type") != "directory", x.get("name", ""))
1679
+ )
1680
+
1681
+ if max_display is not None and len(contents) > max_display:
1682
+ displayed_contents = contents[:max_display]
1683
+ remaining_count = len(contents) - max_display
1684
+ else:
1685
+ displayed_contents = contents
1686
+ remaining_count = 0
1687
+
1688
+ for i, item in enumerate(displayed_contents):
1689
+ is_last_item = (i == len(displayed_contents) - 1) and (
1690
+ remaining_count == 0
1691
+ )
1692
+
1693
+ branch = (
1694
+ "└" + ("─" * (indent_size - 2))
1695
+ if is_last_item
1696
+ else "├" + ("─" * (indent_size - 2))
1697
+ )
1698
+ branch += " "
1699
+ new_prefix = prefix + (
1700
+ indent if is_last_item else "│" + " " * (indent_size - 1)
1701
+ )
1702
+
1703
+ name = os.path.basename(item.get("name", ""))
1704
+
1705
+ if display_size and item.get("type") == "directory":
1706
+ sub_contents = self.ls(item.get("name", ""), detail=True)
1707
+ num_files = sum(
1708
+ 1 for sub_item in sub_contents if sub_item.get("type") == "file"
1709
+ )
1710
+ num_folders = sum(
1711
+ 1
1712
+ for sub_item in sub_contents
1713
+ if sub_item.get("type") == "directory"
1714
+ )
1715
+
1716
+ if num_files == 0 and num_folders == 0:
1717
+ size = " (empty folder)"
1718
+ elif num_files == 0:
1719
+ size = f" ({num_folders} subfolder{'s' if num_folders > 1 else ''})"
1720
+ elif num_folders == 0:
1721
+ size = f" ({num_files} file{'s' if num_files > 1 else ''})"
1722
+ else:
1723
+ size = f" ({num_files} file{'s' if num_files > 1 else ''}, {num_folders} subfolder{'s' if num_folders > 1 else ''})"
1724
+ elif display_size and item.get("type") == "file":
1725
+ size = f" ({format_bytes(item.get('size', 0))})"
1726
+ else:
1727
+ size = ""
1728
+
1729
+ result.append(f"{prefix}{branch}{name}{size}")
1730
+
1731
+ if item.get("type") == "directory" and recursion_limit > 0:
1732
+ result.append(
1733
+ self.tree(
1734
+ path=item.get("name", ""),
1735
+ recursion_limit=recursion_limit - 1,
1736
+ max_display=max_display,
1737
+ display_size=display_size,
1738
+ prefix=new_prefix,
1739
+ is_last=is_last_item,
1740
+ first=False,
1741
+ indent_size=indent_size,
1742
+ )
1743
+ )
1744
+
1745
+ if remaining_count > 0:
1746
+ more_message = f"{remaining_count} more item(s) not displayed."
1747
+ result.append(
1748
+ f"{prefix}{'└' + ('─' * (indent_size - 2))} {more_message}"
1749
+ )
1750
+
1751
+ return "\n".join(_ for _ in result if _)
1752
+
1753
+ # ------------------------------------------------------------------------
1754
+ # Aliases
1755
+
1756
+ def read_bytes(self, path, start=None, end=None, **kwargs):
1757
+ """Alias of `AbstractFileSystem.cat_file`."""
1758
+ return self.cat_file(path, start=start, end=end, **kwargs)
1759
+
1760
+ def write_bytes(self, path, value, **kwargs):
1761
+ """Alias of `AbstractFileSystem.pipe_file`."""
1762
+ self.pipe_file(path, value, **kwargs)
1763
+
1764
+ def makedir(self, path, create_parents=True, **kwargs):
1765
+ """Alias of `AbstractFileSystem.mkdir`."""
1766
+ return self.mkdir(path, create_parents=create_parents, **kwargs)
1767
+
1768
+ def mkdirs(self, path, exist_ok=False):
1769
+ """Alias of `AbstractFileSystem.makedirs`."""
1770
+ return self.makedirs(path, exist_ok=exist_ok)
1771
+
1772
+ def listdir(self, path, detail=True, **kwargs):
1773
+ """Alias of `AbstractFileSystem.ls`."""
1774
+ return self.ls(path, detail=detail, **kwargs)
1775
+
1776
+ def cp(self, path1, path2, **kwargs):
1777
+ """Alias of `AbstractFileSystem.copy`."""
1778
+ return self.copy(path1, path2, **kwargs)
1779
+
1780
+ def move(self, path1, path2, **kwargs):
1781
+ """Alias of `AbstractFileSystem.mv`."""
1782
+ return self.mv(path1, path2, **kwargs)
1783
+
1784
+ def stat(self, path, **kwargs):
1785
+ """Alias of `AbstractFileSystem.info`."""
1786
+ return self.info(path, **kwargs)
1787
+
1788
+ def disk_usage(self, path, total=True, maxdepth=None, **kwargs):
1789
+ """Alias of `AbstractFileSystem.du`."""
1790
+ return self.du(path, total=total, maxdepth=maxdepth, **kwargs)
1791
+
1792
+ def rename(self, path1, path2, **kwargs):
1793
+ """Alias of `AbstractFileSystem.mv`."""
1794
+ return self.mv(path1, path2, **kwargs)
1795
+
1796
+ def delete(self, path, recursive=False, maxdepth=None):
1797
+ """Alias of `AbstractFileSystem.rm`."""
1798
+ return self.rm(path, recursive=recursive, maxdepth=maxdepth)
1799
+
1800
+ def upload(self, lpath, rpath, recursive=False, **kwargs):
1801
+ """Alias of `AbstractFileSystem.put`."""
1802
+ return self.put(lpath, rpath, recursive=recursive, **kwargs)
1803
+
1804
+ def download(self, rpath, lpath, recursive=False, **kwargs):
1805
+ """Alias of `AbstractFileSystem.get`."""
1806
+ return self.get(rpath, lpath, recursive=recursive, **kwargs)
1807
+
1808
+ def sign(self, path, expiration=100, **kwargs):
1809
+ """Create a signed URL representing the given path
1810
+
1811
+ Some implementations allow temporary URLs to be generated, as a
1812
+ way of delegating credentials.
1813
+
1814
+ Parameters
1815
+ ----------
1816
+ path : str
1817
+ The path on the filesystem
1818
+ expiration : int
1819
+ Number of seconds to enable the URL for (if supported)
1820
+
1821
+ Returns
1822
+ -------
1823
+ URL : str
1824
+ The signed URL
1825
+
1826
+ Raises
1827
+ ------
1828
+ NotImplementedError : if method is not implemented for a filesystem
1829
+ """
1830
+ raise NotImplementedError("Sign is not implemented for this filesystem")
1831
+
1832
+ def _isfilestore(self):
1833
+ # Originally inherited from pyarrow DaskFileSystem. Keeping this
1834
+ # here for backwards compatibility as long as pyarrow uses its
1835
+ # legacy fsspec-compatible filesystems and thus accepts fsspec
1836
+ # filesystems as well
1837
+ return False
1838
+
1839
+
1840
+ class AbstractBufferedFile(io.IOBase):
1841
+ """Convenient class to derive from to provide buffering
1842
+
1843
+ In the case that the backend does not provide a pythonic file-like object
1844
+ already, this class contains much of the logic to build one. The only
1845
+ methods that need to be overridden are ``_upload_chunk``,
1846
+ ``_initiate_upload`` and ``_fetch_range``.
1847
+ """
1848
+
1849
+ DEFAULT_BLOCK_SIZE = 5 * 2**20
1850
+ _details = None
1851
+
1852
+ def __init__(
1853
+ self,
1854
+ fs,
1855
+ path,
1856
+ mode="rb",
1857
+ block_size="default",
1858
+ autocommit=True,
1859
+ cache_type="readahead",
1860
+ cache_options=None,
1861
+ size=None,
1862
+ **kwargs,
1863
+ ):
1864
+ """
1865
+ Template for files with buffered reading and writing
1866
+
1867
+ Parameters
1868
+ ----------
1869
+ fs: instance of FileSystem
1870
+ path: str
1871
+ location in file-system
1872
+ mode: str
1873
+ Normal file modes. Currently only 'wb', 'ab' or 'rb'. Some file
1874
+ systems may be read-only, and some may not support append.
1875
+ block_size: int
1876
+ Buffer size for reading or writing, 'default' for class default
1877
+ autocommit: bool
1878
+ Whether to write to final destination; may only impact what
1879
+ happens when file is being closed.
1880
+ cache_type: {"readahead", "none", "mmap", "bytes"}, default "readahead"
1881
+ Caching policy in read mode. See the definitions in ``core``.
1882
+ cache_options : dict
1883
+ Additional options passed to the constructor for the cache specified
1884
+ by `cache_type`.
1885
+ size: int
1886
+ If given and in read mode, suppressed having to look up the file size
1887
+ kwargs:
1888
+ Gets stored as self.kwargs
1889
+ """
1890
+ from .core import caches
1891
+
1892
+ self.path = path
1893
+ self.fs = fs
1894
+ self.mode = mode
1895
+ self.blocksize = (
1896
+ self.DEFAULT_BLOCK_SIZE if block_size in ["default", None] else block_size
1897
+ )
1898
+ self.loc = 0
1899
+ self.autocommit = autocommit
1900
+ self.end = None
1901
+ self.start = None
1902
+ self.closed = False
1903
+
1904
+ if cache_options is None:
1905
+ cache_options = {}
1906
+
1907
+ if "trim" in kwargs:
1908
+ warnings.warn(
1909
+ "Passing 'trim' to control the cache behavior has been deprecated. "
1910
+ "Specify it within the 'cache_options' argument instead.",
1911
+ FutureWarning,
1912
+ )
1913
+ cache_options["trim"] = kwargs.pop("trim")
1914
+
1915
+ self.kwargs = kwargs
1916
+
1917
+ if mode not in {"ab", "rb", "wb", "xb"}:
1918
+ raise NotImplementedError("File mode not supported")
1919
+ if mode == "rb":
1920
+ if size is not None:
1921
+ self.size = size
1922
+ else:
1923
+ self.size = self.details["size"]
1924
+ self.cache = caches[cache_type](
1925
+ self.blocksize, self._fetch_range, self.size, **cache_options
1926
+ )
1927
+ else:
1928
+ self.buffer = io.BytesIO()
1929
+ self.offset = None
1930
+ self.forced = False
1931
+ self.location = None
1932
+
1933
+ @property
1934
+ def details(self):
1935
+ if self._details is None:
1936
+ self._details = self.fs.info(self.path)
1937
+ return self._details
1938
+
1939
+ @details.setter
1940
+ def details(self, value):
1941
+ self._details = value
1942
+ self.size = value["size"]
1943
+
1944
+ @property
1945
+ def full_name(self):
1946
+ return _unstrip_protocol(self.path, self.fs)
1947
+
1948
+ @property
1949
+ def closed(self):
1950
+ # get around this attr being read-only in IOBase
1951
+ # use getattr here, since this can be called during del
1952
+ return getattr(self, "_closed", True)
1953
+
1954
+ @closed.setter
1955
+ def closed(self, c):
1956
+ self._closed = c
1957
+
1958
+ def __hash__(self):
1959
+ if "w" in self.mode:
1960
+ return id(self)
1961
+ else:
1962
+ return int(tokenize(self.details), 16)
1963
+
1964
+ def __eq__(self, other):
1965
+ """Files are equal if they have the same checksum, only in read mode"""
1966
+ if self is other:
1967
+ return True
1968
+ return (
1969
+ isinstance(other, type(self))
1970
+ and self.mode == "rb"
1971
+ and other.mode == "rb"
1972
+ and hash(self) == hash(other)
1973
+ )
1974
+
1975
+ def commit(self):
1976
+ """Move from temp to final destination"""
1977
+
1978
+ def discard(self):
1979
+ """Throw away temporary file"""
1980
+
1981
+ def info(self):
1982
+ """File information about this path"""
1983
+ if self.readable():
1984
+ return self.details
1985
+ else:
1986
+ raise ValueError("Info not available while writing")
1987
+
1988
+ def tell(self):
1989
+ """Current file location"""
1990
+ return self.loc
1991
+
1992
+ def seek(self, loc, whence=0):
1993
+ """Set current file location
1994
+
1995
+ Parameters
1996
+ ----------
1997
+ loc: int
1998
+ byte location
1999
+ whence: {0, 1, 2}
2000
+ from start of file, current location or end of file, resp.
2001
+ """
2002
+ loc = int(loc)
2003
+ if not self.mode == "rb":
2004
+ raise OSError(ESPIPE, "Seek only available in read mode")
2005
+ if whence == 0:
2006
+ nloc = loc
2007
+ elif whence == 1:
2008
+ nloc = self.loc + loc
2009
+ elif whence == 2:
2010
+ nloc = self.size + loc
2011
+ else:
2012
+ raise ValueError(f"invalid whence ({whence}, should be 0, 1 or 2)")
2013
+ if nloc < 0:
2014
+ raise ValueError("Seek before start of file")
2015
+ self.loc = nloc
2016
+ return self.loc
2017
+
2018
+ def write(self, data):
2019
+ """
2020
+ Write data to buffer.
2021
+
2022
+ Buffer only sent on flush() or if buffer is greater than
2023
+ or equal to blocksize.
2024
+
2025
+ Parameters
2026
+ ----------
2027
+ data: bytes
2028
+ Set of bytes to be written.
2029
+ """
2030
+ if not self.writable():
2031
+ raise ValueError("File not in write mode")
2032
+ if self.closed:
2033
+ raise ValueError("I/O operation on closed file.")
2034
+ if self.forced:
2035
+ raise ValueError("This file has been force-flushed, can only close")
2036
+ out = self.buffer.write(data)
2037
+ self.loc += out
2038
+ if self.buffer.tell() >= self.blocksize:
2039
+ self.flush()
2040
+ return out
2041
+
2042
+ def flush(self, force=False):
2043
+ """
2044
+ Write buffered data to backend store.
2045
+
2046
+ Writes the current buffer, if it is larger than the block-size, or if
2047
+ the file is being closed.
2048
+
2049
+ Parameters
2050
+ ----------
2051
+ force: bool
2052
+ When closing, write the last block even if it is smaller than
2053
+ blocks are allowed to be. Disallows further writing to this file.
2054
+ """
2055
+
2056
+ if self.closed:
2057
+ raise ValueError("Flush on closed file")
2058
+ if force and self.forced:
2059
+ raise ValueError("Force flush cannot be called more than once")
2060
+ if force:
2061
+ self.forced = True
2062
+
2063
+ if self.readable():
2064
+ # no-op to flush on read-mode
2065
+ return
2066
+
2067
+ if not force and self.buffer.tell() < self.blocksize:
2068
+ # Defer write on small block
2069
+ return
2070
+
2071
+ if self.offset is None:
2072
+ # Initialize a multipart upload
2073
+ self.offset = 0
2074
+ try:
2075
+ self._initiate_upload()
2076
+ except:
2077
+ self.closed = True
2078
+ raise
2079
+
2080
+ if self._upload_chunk(final=force) is not False:
2081
+ self.offset += self.buffer.seek(0, 2)
2082
+ self.buffer = io.BytesIO()
2083
+
2084
+ def _upload_chunk(self, final=False):
2085
+ """Write one part of a multi-block file upload
2086
+
2087
+ Parameters
2088
+ ==========
2089
+ final: bool
2090
+ This is the last block, so should complete file, if
2091
+ self.autocommit is True.
2092
+ """
2093
+ # may not yet have been initialized, may need to call _initialize_upload
2094
+
2095
+ def _initiate_upload(self):
2096
+ """Create remote file/upload"""
2097
+ pass
2098
+
2099
+ def _fetch_range(self, start, end):
2100
+ """Get the specified set of bytes from remote"""
2101
+ return self.fs.cat_file(self.path, start=start, end=end)
2102
+
2103
+ def read(self, length=-1):
2104
+ """
2105
+ Return data from cache, or fetch pieces as necessary
2106
+
2107
+ Parameters
2108
+ ----------
2109
+ length: int (-1)
2110
+ Number of bytes to read; if <0, all remaining bytes.
2111
+ """
2112
+ length = -1 if length is None else int(length)
2113
+ if self.mode != "rb":
2114
+ raise ValueError("File not in read mode")
2115
+ if length < 0:
2116
+ length = self.size - self.loc
2117
+ if self.closed:
2118
+ raise ValueError("I/O operation on closed file.")
2119
+ if length == 0:
2120
+ # don't even bother calling fetch
2121
+ return b""
2122
+ out = self.cache._fetch(self.loc, self.loc + length)
2123
+
2124
+ logger.debug(
2125
+ "%s read: %i - %i %s",
2126
+ self,
2127
+ self.loc,
2128
+ self.loc + length,
2129
+ self.cache._log_stats(),
2130
+ )
2131
+ self.loc += len(out)
2132
+ return out
2133
+
2134
+ def readinto(self, b):
2135
+ """mirrors builtin file's readinto method
2136
+
2137
+ https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
2138
+ """
2139
+ out = memoryview(b).cast("B")
2140
+ data = self.read(out.nbytes)
2141
+ out[: len(data)] = data
2142
+ return len(data)
2143
+
2144
+ def readuntil(self, char=b"\n", blocks=None):
2145
+ """Return data between current position and first occurrence of char
2146
+
2147
+ char is included in the output, except if the end of the tile is
2148
+ encountered first.
2149
+
2150
+ Parameters
2151
+ ----------
2152
+ char: bytes
2153
+ Thing to find
2154
+ blocks: None or int
2155
+ How much to read in each go. Defaults to file blocksize - which may
2156
+ mean a new read on every call.
2157
+ """
2158
+ out = []
2159
+ while True:
2160
+ start = self.tell()
2161
+ part = self.read(blocks or self.blocksize)
2162
+ if len(part) == 0:
2163
+ break
2164
+ found = part.find(char)
2165
+ if found > -1:
2166
+ out.append(part[: found + len(char)])
2167
+ self.seek(start + found + len(char))
2168
+ break
2169
+ out.append(part)
2170
+ return b"".join(out)
2171
+
2172
+ def readline(self):
2173
+ """Read until and including the first occurrence of newline character
2174
+
2175
+ Note that, because of character encoding, this is not necessarily a
2176
+ true line ending.
2177
+ """
2178
+ return self.readuntil(b"\n")
2179
+
2180
+ def __next__(self):
2181
+ out = self.readline()
2182
+ if out:
2183
+ return out
2184
+ raise StopIteration
2185
+
2186
+ def __iter__(self):
2187
+ return self
2188
+
2189
+ def readlines(self):
2190
+ """Return all data, split by the newline character, including the newline character"""
2191
+ data = self.read()
2192
+ lines = data.split(b"\n")
2193
+ out = [l + b"\n" for l in lines[:-1]]
2194
+ if data.endswith(b"\n"):
2195
+ return out
2196
+ else:
2197
+ return out + [lines[-1]]
2198
+ # return list(self) ???
2199
+
2200
+ def readinto1(self, b):
2201
+ return self.readinto(b)
2202
+
2203
+ def close(self):
2204
+ """Close file
2205
+
2206
+ Finalizes writes, discards cache
2207
+ """
2208
+ if getattr(self, "_unclosable", False):
2209
+ return
2210
+ if self.closed:
2211
+ return
2212
+ try:
2213
+ if self.mode == "rb":
2214
+ self.cache = None
2215
+ else:
2216
+ if not self.forced:
2217
+ self.flush(force=True)
2218
+
2219
+ if self.fs is not None:
2220
+ self.fs.invalidate_cache(self.path)
2221
+ self.fs.invalidate_cache(self.fs._parent(self.path))
2222
+ finally:
2223
+ self.closed = True
2224
+
2225
+ def readable(self):
2226
+ """Whether opened for reading"""
2227
+ return "r" in self.mode and not self.closed
2228
+
2229
+ def seekable(self):
2230
+ """Whether is seekable (only in read mode)"""
2231
+ return self.readable()
2232
+
2233
+ def writable(self):
2234
+ """Whether opened for writing"""
2235
+ return self.mode in {"wb", "ab", "xb"} and not self.closed
2236
+
2237
+ def __reduce__(self):
2238
+ if self.mode != "rb":
2239
+ raise RuntimeError("Pickling a writeable file is not supported")
2240
+
2241
+ return reopen, (
2242
+ self.fs,
2243
+ self.path,
2244
+ self.mode,
2245
+ self.blocksize,
2246
+ self.loc,
2247
+ self.size,
2248
+ self.autocommit,
2249
+ self.cache.name if self.cache else "none",
2250
+ self.kwargs,
2251
+ )
2252
+
2253
+ def __del__(self):
2254
+ if not self.closed:
2255
+ self.close()
2256
+
2257
+ def __str__(self):
2258
+ return f"<File-like object {type(self.fs).__name__}, {self.path}>"
2259
+
2260
+ __repr__ = __str__
2261
+
2262
+ def __enter__(self):
2263
+ return self
2264
+
2265
+ def __exit__(self, *args):
2266
+ self.close()
2267
+
2268
+
2269
+ def reopen(fs, path, mode, blocksize, loc, size, autocommit, cache_type, kwargs):
2270
+ file = fs.open(
2271
+ path,
2272
+ mode=mode,
2273
+ block_size=blocksize,
2274
+ autocommit=autocommit,
2275
+ cache_type=cache_type,
2276
+ size=size,
2277
+ **kwargs,
2278
+ )
2279
+ if loc > 0:
2280
+ file.seek(loc)
2281
+ return file
venv/lib/python3.10/site-packages/fsspec/transaction.py ADDED
@@ -0,0 +1,90 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from collections import deque
2
+
3
+
4
+ class Transaction:
5
+ """Filesystem transaction write context
6
+
7
+ Gathers files for deferred commit or discard, so that several write
8
+ operations can be finalized semi-atomically. This works by having this
9
+ instance as the ``.transaction`` attribute of the given filesystem
10
+ """
11
+
12
+ def __init__(self, fs, **kwargs):
13
+ """
14
+ Parameters
15
+ ----------
16
+ fs: FileSystem instance
17
+ """
18
+ self.fs = fs
19
+ self.files = deque()
20
+
21
+ def __enter__(self):
22
+ self.start()
23
+ return self
24
+
25
+ def __exit__(self, exc_type, exc_val, exc_tb):
26
+ """End transaction and commit, if exit is not due to exception"""
27
+ # only commit if there was no exception
28
+ self.complete(commit=exc_type is None)
29
+ if self.fs:
30
+ self.fs._intrans = False
31
+ self.fs._transaction = None
32
+ self.fs = None
33
+
34
+ def start(self):
35
+ """Start a transaction on this FileSystem"""
36
+ self.files = deque() # clean up after previous failed completions
37
+ self.fs._intrans = True
38
+
39
+ def complete(self, commit=True):
40
+ """Finish transaction: commit or discard all deferred files"""
41
+ while self.files:
42
+ f = self.files.popleft()
43
+ if commit:
44
+ f.commit()
45
+ else:
46
+ f.discard()
47
+ self.fs._intrans = False
48
+ self.fs._transaction = None
49
+ self.fs = None
50
+
51
+
52
+ class FileActor:
53
+ def __init__(self):
54
+ self.files = []
55
+
56
+ def commit(self):
57
+ for f in self.files:
58
+ f.commit()
59
+ self.files.clear()
60
+
61
+ def discard(self):
62
+ for f in self.files:
63
+ f.discard()
64
+ self.files.clear()
65
+
66
+ def append(self, f):
67
+ self.files.append(f)
68
+
69
+
70
+ class DaskTransaction(Transaction):
71
+ def __init__(self, fs):
72
+ """
73
+ Parameters
74
+ ----------
75
+ fs: FileSystem instance
76
+ """
77
+ import distributed
78
+
79
+ super().__init__(fs)
80
+ client = distributed.default_client()
81
+ self.files = client.submit(FileActor, actor=True).result()
82
+
83
+ def complete(self, commit=True):
84
+ """Finish transaction: commit or discard all deferred files"""
85
+ if commit:
86
+ self.files.commit().result()
87
+ else:
88
+ self.files.discard().result()
89
+ self.fs._intrans = False
90
+ self.fs = None
venv/lib/python3.10/site-packages/fsspec/utils.py ADDED
@@ -0,0 +1,748 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import contextlib
4
+ import logging
5
+ import math
6
+ import os
7
+ import re
8
+ import sys
9
+ import tempfile
10
+ from collections.abc import Callable, Iterable, Iterator, Sequence
11
+ from functools import partial
12
+ from hashlib import md5
13
+ from importlib.metadata import version
14
+ from typing import IO, TYPE_CHECKING, Any, TypeVar
15
+ from urllib.parse import urlsplit
16
+
17
+ if TYPE_CHECKING:
18
+ import pathlib
19
+ from typing import TypeGuard
20
+
21
+ from fsspec.spec import AbstractFileSystem
22
+
23
+
24
+ DEFAULT_BLOCK_SIZE = 5 * 2**20
25
+
26
+ T = TypeVar("T")
27
+
28
+
29
+ def infer_storage_options(
30
+ urlpath: str, inherit_storage_options: dict[str, Any] | None = None
31
+ ) -> dict[str, Any]:
32
+ """Infer storage options from URL path and merge it with existing storage
33
+ options.
34
+
35
+ Parameters
36
+ ----------
37
+ urlpath: str or unicode
38
+ Either local absolute file path or URL (hdfs://namenode:8020/file.csv)
39
+ inherit_storage_options: dict (optional)
40
+ Its contents will get merged with the inferred information from the
41
+ given path
42
+
43
+ Returns
44
+ -------
45
+ Storage options dict.
46
+
47
+ Examples
48
+ --------
49
+ >>> infer_storage_options('/mnt/datasets/test.csv') # doctest: +SKIP
50
+ {"protocol": "file", "path", "/mnt/datasets/test.csv"}
51
+ >>> infer_storage_options(
52
+ ... 'hdfs://username:pwd@node:123/mnt/datasets/test.csv?q=1',
53
+ ... inherit_storage_options={'extra': 'value'},
54
+ ... ) # doctest: +SKIP
55
+ {"protocol": "hdfs", "username": "username", "password": "pwd",
56
+ "host": "node", "port": 123, "path": "/mnt/datasets/test.csv",
57
+ "url_query": "q=1", "extra": "value"}
58
+ """
59
+ # Handle Windows paths including disk name in this special case
60
+ if (
61
+ re.match(r"^[a-zA-Z]:[\\/]", urlpath)
62
+ or re.match(r"^[a-zA-Z0-9]+://", urlpath) is None
63
+ ):
64
+ return {"protocol": "file", "path": urlpath}
65
+
66
+ parsed_path = urlsplit(urlpath)
67
+ protocol = parsed_path.scheme or "file"
68
+ if parsed_path.fragment:
69
+ path = "#".join([parsed_path.path, parsed_path.fragment])
70
+ else:
71
+ path = parsed_path.path
72
+ if protocol == "file":
73
+ # Special case parsing file protocol URL on Windows according to:
74
+ # https://msdn.microsoft.com/en-us/library/jj710207.aspx
75
+ windows_path = re.match(r"^/([a-zA-Z])[:|]([\\/].*)$", path)
76
+ if windows_path:
77
+ drive, path = windows_path.groups()
78
+ path = f"{drive}:{path}"
79
+
80
+ if protocol in ["http", "https"]:
81
+ # for HTTP, we don't want to parse, as requests will anyway
82
+ return {"protocol": protocol, "path": urlpath}
83
+
84
+ options: dict[str, Any] = {"protocol": protocol, "path": path}
85
+
86
+ if parsed_path.netloc:
87
+ # Parse `hostname` from netloc manually because `parsed_path.hostname`
88
+ # lowercases the hostname which is not always desirable (e.g. in S3):
89
+ # https://github.com/dask/dask/issues/1417
90
+ options["host"] = parsed_path.netloc.rsplit("@", 1)[-1].rsplit(":", 1)[0]
91
+
92
+ if protocol in ("s3", "s3a", "gcs", "gs"):
93
+ options["path"] = options["host"] + options["path"]
94
+ else:
95
+ options["host"] = options["host"]
96
+ if parsed_path.port:
97
+ options["port"] = parsed_path.port
98
+ if parsed_path.username:
99
+ options["username"] = parsed_path.username
100
+ if parsed_path.password:
101
+ options["password"] = parsed_path.password
102
+
103
+ if parsed_path.query:
104
+ options["url_query"] = parsed_path.query
105
+ if parsed_path.fragment:
106
+ options["url_fragment"] = parsed_path.fragment
107
+
108
+ if inherit_storage_options:
109
+ update_storage_options(options, inherit_storage_options)
110
+
111
+ return options
112
+
113
+
114
+ def update_storage_options(
115
+ options: dict[str, Any], inherited: dict[str, Any] | None = None
116
+ ) -> None:
117
+ if not inherited:
118
+ inherited = {}
119
+ collisions = set(options) & set(inherited)
120
+ if collisions:
121
+ for collision in collisions:
122
+ if options.get(collision) != inherited.get(collision):
123
+ raise KeyError(
124
+ f"Collision between inferred and specified storage "
125
+ f"option:\n{collision}"
126
+ )
127
+ options.update(inherited)
128
+
129
+
130
+ # Compression extensions registered via fsspec.compression.register_compression
131
+ compressions: dict[str, str] = {}
132
+
133
+
134
+ def infer_compression(filename: str) -> str | None:
135
+ """Infer compression, if available, from filename.
136
+
137
+ Infer a named compression type, if registered and available, from filename
138
+ extension. This includes builtin (gz, bz2, zip) compressions, as well as
139
+ optional compressions. See fsspec.compression.register_compression.
140
+ """
141
+ extension = os.path.splitext(filename)[-1].strip(".").lower()
142
+ if extension in compressions:
143
+ return compressions[extension]
144
+ return None
145
+
146
+
147
+ def build_name_function(max_int: float) -> Callable[[int], str]:
148
+ """Returns a function that receives a single integer
149
+ and returns it as a string padded by enough zero characters
150
+ to align with maximum possible integer
151
+
152
+ >>> name_f = build_name_function(57)
153
+
154
+ >>> name_f(7)
155
+ '07'
156
+ >>> name_f(31)
157
+ '31'
158
+ >>> build_name_function(1000)(42)
159
+ '0042'
160
+ >>> build_name_function(999)(42)
161
+ '042'
162
+ >>> build_name_function(0)(0)
163
+ '0'
164
+ """
165
+ # handle corner cases max_int is 0 or exact power of 10
166
+ max_int += 1e-8
167
+
168
+ pad_length = int(math.ceil(math.log10(max_int)))
169
+
170
+ def name_function(i: int) -> str:
171
+ return str(i).zfill(pad_length)
172
+
173
+ return name_function
174
+
175
+
176
def seek_delimiter(file: IO[bytes], delimiter: bytes, blocksize: int) -> bool:
    r"""Seek current file to file start, file end, or byte after delimiter seq.

    Seeks file to next chunk delimiter, where chunks are defined on file start,
    a delimiting sequence, and file end. Use file.tell() to see location afterwards.
    Note that file start is a valid split, so must be at offset > 0 to seek for
    delimiter.

    Parameters
    ----------
    file: a file
    delimiter: bytes
        a delimiter like ``b'\n'`` or message sentinel, matching file .read() type
    blocksize: int
        Number of bytes to read from the file at once.


    Returns
    -------
    Returns True if a delimiter was found, False if at file start or end.

    """

    if file.tell() == 0:
        # beginning-of-file, return without seek
        return False

    # Interface is for binary IO, with delimiter as bytes, but initialize last
    # with result of file.read to preserve compatibility with text IO.
    last: bytes | None = None
    while True:
        current = file.read(blocksize)
        if not current:
            # end-of-file without delimiter
            return False
        # Prepend the tail carried over from the previous read so a delimiter
        # that straddles two blocks is still found.
        full = last + current if last else current
        try:
            if delimiter in full:
                i = full.index(delimiter)
                # Rewind from the current position to just past the delimiter:
                # (len(full) - i) bytes back to the match, then skip over it.
                file.seek(file.tell() - (len(full) - i) + len(delimiter))
                return True
            elif len(current) < blocksize:
                # end-of-file without delimiter
                return False
        except (OSError, ValueError):
            # NOTE(review): presumably guards file-likes whose seek/containment
            # semantics differ (e.g. text mode) — keep scanning; confirm intent.
            pass
        # Carry the last len(delimiter) bytes so a boundary-spanning match is
        # detectable on the next iteration.
        last = full[-len(delimiter) :]
223
+
224
+
225
def read_block(
    f: IO[bytes],
    offset: int,
    length: int | None,
    delimiter: bytes | None = None,
    split_before: bool = False,
) -> bytes:
    """Read a block of bytes from a file

    Parameters
    ----------
    f: File
        Open file
    offset: int
        Byte offset to start read
    length: int
        Number of bytes to read, read through end of file if None
    delimiter: bytes (optional)
        Ensure reading starts and stops at delimiter bytestring
    split_before: bool (optional)
        Start/stop read *before* delimiter bytestring.


    If using the ``delimiter=`` keyword argument we ensure that the read
    starts and stops at delimiter boundaries that follow the locations
    ``offset`` and ``offset + length``. If ``offset`` is zero then we
    start at zero, regardless of delimiter. The bytestring returned WILL
    include the terminating delimiter string.

    Examples
    --------

    >>> from io import BytesIO  # doctest: +SKIP
    >>> f = BytesIO(b'Alice, 100\\nBob, 200\\nCharlie, 300')  # doctest: +SKIP
    >>> read_block(f, 0, 13)  # doctest: +SKIP
    b'Alice, 100\\nBo'

    >>> read_block(f, 0, 13, delimiter=b'\\n')  # doctest: +SKIP
    b'Alice, 100\\nBob, 200\\n'

    >>> read_block(f, 10, 10, delimiter=b'\\n')  # doctest: +SKIP
    b'Bob, 200\\nCharlie, 300'
    """
    if delimiter:
        # Advance from ``offset`` to just past the next delimiter (no-op when
        # offset is 0 — file start counts as a valid split point).
        f.seek(offset)
        found_start_delim = seek_delimiter(f, delimiter, 2**16)
        if length is None:
            return f.read()
        start = f.tell()
        # Shrink the requested length by the bytes consumed while seeking.
        length -= start - offset

        # Find the delimiter (or EOF) that terminates the block.
        f.seek(start + length)
        found_end_delim = seek_delimiter(f, delimiter, 2**16)
        end = f.tell()

        # Adjust split location to before delimiter if seek found the
        # delimiter sequence, not start or end of file.
        if found_start_delim and split_before:
            start -= len(delimiter)

        if found_end_delim and split_before:
            end -= len(delimiter)

        offset = start
        length = end - start

    f.seek(offset)

    # TODO: allow length to be None and read to the end of the file?
    assert length is not None
    b = f.read(length)
    return b
297
+
298
+
299
def tokenize(*args: Any, **kwargs: Any) -> str:
    """Deterministic token

    (modified from dask.base)

    >>> tokenize([1, 2, '3'])
    '9d71491b50023b06fc76928e6eddb952'

    >>> tokenize('Hello') == tokenize('Hello')
    True
    """
    # Fold keyword arguments into the positional tuple so they affect the hash.
    if kwargs:
        args = args + (kwargs,)
    payload = str(args).encode()
    try:
        hasher = md5(payload)
    except ValueError:
        # FIPS systems: https://github.com/fsspec/filesystem_spec/issues/380
        hasher = md5(payload, usedforsecurity=False)
    return hasher.hexdigest()
318
+
319
+
320
def stringify_path(filepath: str | os.PathLike[str] | pathlib.Path) -> str:
    """Attempt to convert a path-like object to a string.

    Parameters
    ----------
    filepath: object to be converted

    Returns
    -------
    filepath_str: maybe a string version of the object

    Notes
    -----
    Objects implementing the fspath protocol (``__fspath__``) are coerced via
    that method; this covers ``pathlib.Path`` on all supported Pythons.

    Objects exposing a ``path`` attribute (e.g. some file-like objects) use
    that attribute.

    Anything else — bytes, buffers, non-path objects — is returned unchanged.
    """
    # Fast path: already a string.
    if isinstance(filepath, str):
        return filepath
    # fspath protocol (pathlib.Path and friends).
    if hasattr(filepath, "__fspath__"):
        return filepath.__fspath__()
    # Duck-typed objects carrying an explicit ``path`` attribute.
    if hasattr(filepath, "path"):
        return filepath.path
    # Pass through unchanged; callers must cope with non-str results.
    return filepath  # type: ignore[return-value]
350
+
351
+
352
def make_instance(
    cls: Callable[..., T], args: Sequence[Any], kwargs: dict[str, Any]
) -> T:
    """Instantiate ``cls(*args, **kwargs)`` and let the new instance
    re-detect its worker context via ``_determine_worker()``."""
    instance = cls(*args, **kwargs)
    instance._determine_worker()  # type: ignore[attr-defined]
    return instance
358
+
359
+
360
def common_prefix(paths: Iterable[str]) -> str:
    """For a list of paths, find the shortest prefix common to all"""
    split_paths = [p.split("/") for p in paths]
    shortest = min(len(sp) for sp in split_paths)
    # Count how many leading "/"-separated components all paths share.
    shared = 0
    for idx in range(shortest):
        segment = split_paths[0][idx]
        if all(sp[idx] == segment for sp in split_paths):
            shared += 1
        else:
            break
    return "/".join(split_paths[0][:shared])
371
+
372
+
373
def other_paths(
    paths: list[str],
    path2: str | list[str],
    exists: bool = False,
    flatten: bool = False,
) -> list[str]:
    """In bulk file operations, construct a new file tree from a list of files

    Parameters
    ----------
    paths: list of str
        The input file tree
    path2: str or list of str
        Root to construct the new list in. If this is already a list of str, we just
        assert it has the right number of elements.
    exists: bool (optional)
        For a str destination, whether it already exists (and is a dir); files
        should end up inside it.
    flatten: bool (optional)
        Whether to flatten the input directory tree structure so that the output files
        are in the same directory.

    Returns
    -------
    list of str
    """

    if not isinstance(path2, str):
        # Pre-built destination list: just check it lines up with the input.
        assert len(paths) == len(path2)
        return path2

    root = path2.rstrip("/")
    if flatten:
        # Discard directory structure: every file lands directly under root.
        return ["/".join((root, p.split("/")[-1])) for p in paths]

    cp = common_prefix(paths)
    if exists:
        # Destination dir exists: keep the last common component beneath it.
        cp = cp.rsplit("/", 1)[0]
    if not cp and all(not s.startswith("/") for s in paths):
        # No common prefix among relative paths: nest everything under root.
        return ["/".join([root, p]) for p in paths]
    # Rebase each path by swapping the common prefix for the new root.
    return [p.replace(cp, root, 1) for p in paths]
416
+
417
+
418
def is_exception(obj: Any) -> bool:
    """Return True if ``obj`` is an exception instance (any ``BaseException``)."""
    return isinstance(obj, BaseException)
420
+
421
+
422
+ def isfilelike(f: Any) -> TypeGuard[IO[bytes]]:
423
+ return all(hasattr(f, attr) for attr in ["read", "close", "tell"])
424
+
425
+
426
def get_protocol(url: str) -> str:
    """Return the protocol portion of ``url`` — the text before the first
    ``::`` or ``://`` separator — defaulting to ``"file"`` when absent."""
    url = stringify_path(url)
    separator = re.search(r"::|://", url)
    if separator is None:
        return "file"
    return url[: separator.start()]
432
+
433
+
434
def get_file_extension(url: str) -> str:
    """Return the text after the final ``.`` in ``url``, or "" if no dot."""
    url = stringify_path(url)
    _, dot, extension = url.rpartition(".")
    return extension if dot else ""
440
+
441
+
442
def can_be_local(path: str) -> bool:
    """Can the given URL be used with open_local?"""
    from fsspec import get_filesystem_class

    try:
        fs_cls = get_filesystem_class(get_protocol(path))
    except (ValueError, ImportError):
        # not in registry or import failed
        return False
    # Filesystems advertising local_file support open_local.
    return getattr(fs_cls, "local_file", False)
451
+
452
+
453
def get_package_version_without_import(name: str) -> str | None:
    """For given package name, try to find the version without importing it

    Import and package.__version__ is still the backup here, so an import
    *might* happen.

    Returns either the version string, or None if the package
    or the version was not readily found.
    """
    # Already-imported modules are the cheapest source of truth.
    if name in sys.modules:
        mod = sys.modules[name]
        if hasattr(mod, "__version__"):
            return mod.__version__
    try:
        # Reads distribution metadata without importing the package.
        return version(name)
    except Exception:
        # Narrowed from a bare ``except``: a bare clause would also swallow
        # KeyboardInterrupt/SystemExit, which must propagate.
        pass
    # Last resort: actually import and read ``__version__``.
    try:
        import importlib

        mod = importlib.import_module(name)
        return mod.__version__
    except (ImportError, AttributeError):
        return None
477
+
478
+
479
def setup_logging(
    logger: logging.Logger | None = None,
    logger_name: str | None = None,
    level: str = "DEBUG",
    clear: bool = True,
) -> logging.Logger:
    """Attach a stream handler with a standard format to a logger.

    Provide either a ``logger`` object or a ``logger_name`` (the object wins
    if both are given). With ``clear=True``, previously attached handlers are
    removed first. Returns the configured logger.
    """
    if logger is None and logger_name is None:
        raise ValueError("Provide either logger object or logger name")
    target = logger or logging.getLogger(logger_name)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s -- %(message)s"
        )
    )
    if clear:
        target.handlers.clear()
    target.addHandler(handler)
    target.setLevel(level)
    return target
498
+
499
+
500
def _unstrip_protocol(name: str, fs: AbstractFileSystem) -> str:
    # Module-level helper: delegates to the filesystem's own
    # ``unstrip_protocol`` to re-attach its protocol prefix to ``name``.
    return fs.unstrip_protocol(name)
502
+
503
+
504
def mirror_from(
    origin_name: str, methods: Iterable[str]
) -> Callable[[type[T]], type[T]]:
    """Class decorator: expose each name in ``methods`` as a read-only
    property that forwards to the same attribute on the instance's
    ``origin_name`` attribute."""

    def _forward(attr: str, self: Any) -> Any:
        delegate = getattr(self, origin_name)
        return getattr(delegate, attr)

    def _decorate(cls: type[T]) -> type[T]:
        for attr in methods:
            # partial pins the attribute name; property supplies ``self``.
            setattr(cls, attr, property(partial(_forward, attr)))
        return cls

    return _decorate
522
+
523
+
524
@contextlib.contextmanager
def nullcontext(obj: T) -> Iterator[T]:
    """No-op context manager: yields ``obj`` unchanged and performs no cleanup."""
    yield obj
527
+
528
+
529
def merge_offset_ranges(
    paths: list[str],
    starts: list[int] | int,
    ends: list[int] | int,
    max_gap: int = 0,
    max_block: int | None = None,
    sort: bool = True,
) -> tuple[list[str], list[int], list[int]]:
    """Merge adjacent byte-offset ranges when the inter-range
    gap is <= `max_gap`, and when the merged byte range does not
    exceed `max_block` (if specified). By default, this function
    will re-order the input paths and byte ranges to ensure sorted
    order. If the user can guarantee that the inputs are already
    sorted, passing `sort=False` will skip the re-ordering.
    """
    # Check input
    if not isinstance(paths, list):
        raise TypeError
    # Scalar starts/ends are broadcast to every path.
    if not isinstance(starts, list):
        starts = [starts] * len(paths)
    if not isinstance(ends, list):
        ends = [ends] * len(paths)
    if len(starts) != len(paths) or len(ends) != len(paths):
        raise ValueError

    # Early Return
    if len(starts) <= 1:
        return paths, starts, ends

    # Treat None (or other falsy) starts as 0 so arithmetic below works.
    starts = [s or 0 for s in starts]
    # Sort by paths and then ranges if `sort=True`
    if sort:
        paths, starts, ends = (
            list(v)
            for v in zip(
                *sorted(
                    zip(paths, starts, ends),
                )
            )
        )
    # Drop any range fully contained within another range on the same path.
    remove = []
    for i, (path, start, end) in enumerate(zip(paths, starts, ends)):
        if any(
            e is not None and p == path and start >= s and end <= e and i != i2
            for i2, (p, s, e) in enumerate(zip(paths, starts, ends))
        ):
            remove.append(i)
    paths = [p for i, p in enumerate(paths) if i not in remove]
    starts = [s for i, s in enumerate(starts) if i not in remove]
    ends = [e for i, e in enumerate(ends) if i not in remove]

    if paths:
        # Loop through the coupled `paths`, `starts`, and
        # `ends`, and merge adjacent blocks when appropriate
        new_paths = paths[:1]
        new_starts = starts[:1]
        new_ends = ends[:1]
        for i in range(1, len(paths)):
            # A previous open-ended (None) range on the same path already
            # subsumes this one: skip it.
            if paths[i] == paths[i - 1] and new_ends[-1] is None:
                continue
            elif (
                paths[i] != paths[i - 1]
                or ((starts[i] - new_ends[-1]) > max_gap)
                or (max_block is not None and (ends[i] - new_starts[-1]) > max_block)
            ):
                # Cannot merge with previous block.
                # Add new `paths`, `starts`, and `ends` elements
                new_paths.append(paths[i])
                new_starts.append(starts[i])
                new_ends.append(ends[i])
            else:
                # Merge with the previous block by updating the
                # last element of `ends`
                new_ends[-1] = ends[i]
        return new_paths, new_starts, new_ends

    # `paths` is empty. Just return input lists
    return paths, starts, ends
607
+
608
+
609
def file_size(filelike: IO[bytes]) -> int:
    """Find length of any open read-mode file-like"""
    saved = filelike.tell()
    try:
        # seek-to-end returns the absolute end offset, i.e. the size
        return filelike.seek(0, 2)
    finally:
        # always restore the caller's position
        filelike.seek(saved)
616
+
617
+
618
@contextlib.contextmanager
def atomic_write(path: str, mode: str = "wb"):
    """
    A context manager that opens a temporary file next to `path` and, on exit,
    replaces `path` with the temporary file, thereby updating `path`
    atomically.
    """
    # Create the temp file in the same directory so os.replace stays on one
    # filesystem (required for an atomic rename).
    tmp_fd, tmp_name = tempfile.mkstemp(
        dir=os.path.dirname(path), prefix=os.path.basename(path) + "-"
    )
    try:
        with open(tmp_fd, mode) as stream:
            yield stream
    except BaseException:
        # Body failed: discard the partial temp file and re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    os.replace(tmp_name, path)
637
+
638
+
639
def _translate(pat: str, STAR: str, QUESTION_MARK: str) -> list[str]:
    """Translate one glob segment ``pat`` into a list of regex fragments.

    ``STAR`` and ``QUESTION_MARK`` are the regex replacements to emit for
    ``*`` and ``?`` (the caller supplies segment-aware fragments).
    """
    # Copied from: https://github.com/python/cpython/pull/106703.
    res: list[str] = []
    add = res.append
    i, n = 0, len(pat)
    while i < n:
        c = pat[i]
        i = i + 1
        if c == "*":
            # compress consecutive `*` into one
            # (identity check: STAR is appended as the same object each time)
            if (not res) or res[-1] is not STAR:
                add(STAR)
        elif c == "?":
            add(QUESTION_MARK)
        elif c == "[":
            # Scan for the closing "]" of a character class; "!" negates,
            # and a "]" immediately after the (possibly negated) opening
            # bracket is a literal.
            j = i
            if j < n and pat[j] == "!":
                j = j + 1
            if j < n and pat[j] == "]":
                j = j + 1
            while j < n and pat[j] != "]":
                j = j + 1
            if j >= n:
                # Unterminated class: treat "[" as a literal.
                add("\\[")
            else:
                stuff = pat[i:j]
                if "-" not in stuff:
                    stuff = stuff.replace("\\", r"\\")
                else:
                    # Split the class on "-" into range chunks so escaping can
                    # distinguish literal hyphens from range hyphens.
                    chunks = []
                    k = i + 2 if pat[i] == "!" else i + 1
                    while True:
                        k = pat.find("-", k, j)
                        if k < 0:
                            break
                        chunks.append(pat[i:k])
                        i = k + 1
                        k = k + 3
                    chunk = pat[i:j]
                    if chunk:
                        chunks.append(chunk)
                    else:
                        chunks[-1] += "-"
                    # Remove empty ranges -- invalid in RE.
                    for k in range(len(chunks) - 1, 0, -1):
                        if chunks[k - 1][-1] > chunks[k][0]:
                            chunks[k - 1] = chunks[k - 1][:-1] + chunks[k][1:]
                            del chunks[k]
                    # Escape backslashes and hyphens for set difference (--).
                    # Hyphens that create ranges shouldn't be escaped.
                    stuff = "-".join(
                        s.replace("\\", r"\\").replace("-", r"\-") for s in chunks
                    )
                    # Escape set operations (&&, ~~ and ||).
                    stuff = re.sub(r"([&~|])", r"\\\1", stuff)
                i = j + 1
                if not stuff:
                    # Empty range: never match.
                    add("(?!)")
                elif stuff == "!":
                    # Negated empty range: match any character.
                    add(".")
                else:
                    if stuff[0] == "!":
                        stuff = "^" + stuff[1:]
                    elif stuff[0] in ("^", "["):
                        stuff = "\\" + stuff
                    add(f"[{stuff}]")
        else:
            # Ordinary character: emit it regex-escaped.
            add(re.escape(c))
    assert i == n
    return res
711
+
712
+
713
def glob_translate(pat: str) -> str:
    # Copied from: https://github.com/python/cpython/pull/106703.
    # The keyword parameters' values are fixed to:
    # recursive=True, include_hidden=True, seps=None
    """Translate a pathname with shell wildcards to a regular expression."""
    if os.path.altsep:
        seps = os.path.sep + os.path.altsep
    else:
        seps = os.path.sep
    escaped_seps = "".join(map(re.escape, seps))
    any_sep = f"[{escaped_seps}]" if len(seps) > 1 else escaped_seps
    not_sep = f"[^{escaped_seps}]"
    # Regex fragments: one path segment (with/without trailing separator),
    # and the "**" equivalents for any number of segments.
    one_last_segment = f"{not_sep}+"
    one_segment = f"{one_last_segment}{any_sep}"
    any_segments = f"(?:.+{any_sep})?"
    any_last_segments = ".*"
    results = []
    parts = re.split(any_sep, pat)
    last_part_idx = len(parts) - 1
    for idx, part in enumerate(parts):
        # A bare "*" matches exactly one path segment.
        if part == "*":
            results.append(one_segment if idx < last_part_idx else one_last_segment)
            continue
        # A bare "**" matches any number of segments (recursive glob).
        if part == "**":
            results.append(any_segments if idx < last_part_idx else any_last_segments)
            continue
        elif "**" in part:
            raise ValueError(
                "Invalid pattern: '**' can only be an entire path component"
            )
        if part:
            # Ordinary segment: translate its *, ? and [...] constructs.
            results.extend(_translate(part, f"{not_sep}*", not_sep))
        if idx < last_part_idx:
            results.append(any_sep)
    res = "".join(results)
    # (?s:...) makes "." match newlines too; \Z anchors at end of string.
    return rf"(?s:{res})\Z"
venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/INSTALLER ADDED
@@ -0,0 +1 @@
 
 
1
+ pip
venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/METADATA ADDED
@@ -0,0 +1,625 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Metadata-Version: 2.4
2
+ Name: httpcore
3
+ Version: 1.0.9
4
+ Summary: A minimal low-level HTTP client.
5
+ Project-URL: Documentation, https://www.encode.io/httpcore
6
+ Project-URL: Homepage, https://www.encode.io/httpcore/
7
+ Project-URL: Source, https://github.com/encode/httpcore
8
+ Author-email: Tom Christie <tom@tomchristie.com>
9
+ License-Expression: BSD-3-Clause
10
+ License-File: LICENSE.md
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Environment :: Web Environment
13
+ Classifier: Framework :: AsyncIO
14
+ Classifier: Framework :: Trio
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: License :: OSI Approved :: BSD License
17
+ Classifier: Operating System :: OS Independent
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3 :: Only
20
+ Classifier: Programming Language :: Python :: 3.8
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Topic :: Internet :: WWW/HTTP
26
+ Requires-Python: >=3.8
27
+ Requires-Dist: certifi
28
+ Requires-Dist: h11>=0.16
29
+ Provides-Extra: asyncio
30
+ Requires-Dist: anyio<5.0,>=4.0; extra == 'asyncio'
31
+ Provides-Extra: http2
32
+ Requires-Dist: h2<5,>=3; extra == 'http2'
33
+ Provides-Extra: socks
34
+ Requires-Dist: socksio==1.*; extra == 'socks'
35
+ Provides-Extra: trio
36
+ Requires-Dist: trio<1.0,>=0.22.0; extra == 'trio'
37
+ Description-Content-Type: text/markdown
38
+
39
+ # HTTP Core
40
+
41
+ [![Test Suite](https://github.com/encode/httpcore/workflows/Test%20Suite/badge.svg)](https://github.com/encode/httpcore/actions)
42
+ [![Package version](https://badge.fury.io/py/httpcore.svg)](https://pypi.org/project/httpcore/)
43
+
44
+ > *Do one thing, and do it well.*
45
+
46
+ The HTTP Core package provides a minimal low-level HTTP client, which does
47
+ one thing only. Sending HTTP requests.
48
+
49
+ It does not provide any high level model abstractions over the API,
50
+ does not handle redirects, multipart uploads, building authentication headers,
51
+ transparent HTTP caching, URL parsing, session cookie handling,
52
+ content or charset decoding, handling JSON, environment based configuration
53
+ defaults, or any of that Jazz.
54
+
55
+ Some things HTTP Core does do:
56
+
57
+ * Sending HTTP requests.
58
+ * Thread-safe / task-safe connection pooling.
59
+ * HTTP(S) proxy & SOCKS proxy support.
60
+ * Supports HTTP/1.1 and HTTP/2.
61
+ * Provides both sync and async interfaces.
62
+ * Async backend support for `asyncio` and `trio`.
63
+
64
+ ## Requirements
65
+
66
+ Python 3.8+
67
+
68
+ ## Installation
69
+
70
+ For HTTP/1.1 only support, install with:
71
+
72
+ ```shell
73
+ $ pip install httpcore
74
+ ```
75
+
76
+ There are also a number of optional extras available...
77
+
78
+ ```shell
79
+ $ pip install httpcore['asyncio,trio,http2,socks']
80
+ ```
81
+
82
+ ## Sending requests
83
+
84
+ Send an HTTP request:
85
+
86
+ ```python
87
+ import httpcore
88
+
89
+ response = httpcore.request("GET", "https://www.example.com/")
90
+
91
+ print(response)
92
+ # <Response [200]>
93
+ print(response.status)
94
+ # 200
95
+ print(response.headers)
96
+ # [(b'Accept-Ranges', b'bytes'), (b'Age', b'557328'), (b'Cache-Control', b'max-age=604800'), ...]
97
+ print(response.content)
98
+ # b'<!doctype html>\n<html>\n<head>\n<title>Example Domain</title>\n\n<meta charset="utf-8"/>\n ...'
99
+ ```
100
+
101
+ The top-level `httpcore.request()` function is provided for convenience. In practice whenever you're working with `httpcore` you'll want to use the connection pooling functionality that it provides.
102
+
103
+ ```python
104
+ import httpcore
105
+
106
+ http = httpcore.ConnectionPool()
107
+ response = http.request("GET", "https://www.example.com/")
108
+ ```
109
+
110
+ Once you're ready to get going, [head over to the documentation](https://www.encode.io/httpcore/).
111
+
112
+ ## Motivation
113
+
114
+ You *probably* don't want to be using HTTP Core directly. It might make sense if
115
+ you're writing something like a proxy service in Python, and you just want
116
+ something at the lowest possible level, but more typically you'll want to use
117
+ a higher level client library, such as `httpx`.
118
+
119
+ The motivation for `httpcore` is:
120
+
121
+ * To provide a reusable low-level client library, that other packages can then build on top of.
122
+ * To provide a *really clear interface split* between the networking code and client logic,
123
+ so that each is easier to understand and reason about in isolation.
124
+
125
+ ## Dependencies
126
+
127
+ The `httpcore` package has the following dependencies...
128
+
129
+ * `h11`
130
+ * `certifi`
131
+
132
+ And the following optional extras...
133
+
134
+ * `anyio` - Required by `pip install httpcore['asyncio']`.
135
+ * `trio` - Required by `pip install httpcore['trio']`.
136
+ * `h2` - Required by `pip install httpcore['http2']`.
137
+ * `socksio` - Required by `pip install httpcore['socks']`.
138
+
139
+ ## Versioning
140
+
141
+ We use [SEMVER for our versioning policy](https://semver.org/).
142
+
143
+ For changes between package versions please see our [project changelog](CHANGELOG.md).
144
+
145
+ We recommend pinning your requirements either the most current major version, or a more specific version range:
146
+
147
+ ```python
148
+ pip install 'httpcore==1.*'
149
+ ```
150
+ # Changelog
151
+
152
+ All notable changes to this project will be documented in this file.
153
+
154
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
155
+
156
+ ## Version 1.0.9 (April 24th, 2025)
157
+
158
+ - Resolve https://github.com/advisories/GHSA-vqfr-h8mv-ghfj with h11 dependency update. (#1008)
159
+
160
+ ## Version 1.0.8 (April 11th, 2025)
161
+
162
+ - Fix `AttributeError` when importing on Python 3.14. (#1005)
163
+
164
+ ## Version 1.0.7 (November 15th, 2024)
165
+
166
+ - Support `proxy=…` configuration on `ConnectionPool()`. (#974)
167
+
168
+ ## Version 1.0.6 (October 1st, 2024)
169
+
170
+ - Relax `trio` dependency pinning. (#956)
171
+ - Handle `trio` raising `NotImplementedError` on unsupported platforms. (#955)
172
+ - Handle mapping `ssl.SSLError` to `httpcore.ConnectError`. (#918)
173
+
174
+ ## 1.0.5 (March 27th, 2024)
175
+
176
+ - Handle `EndOfStream` exception for anyio backend. (#899)
177
+ - Allow trio `0.25.*` series in package dependencies. (#903)
178
+
179
+ ## 1.0.4 (February 21st, 2024)
180
+
181
+ - Add `target` request extension. (#888)
182
+ - Fix support for connection `Upgrade` and `CONNECT` when some data in the stream has been read. (#882)
183
+
184
+ ## 1.0.3 (February 13th, 2024)
185
+
186
+ - Fix support for async cancellations. (#880)
187
+ - Fix trace extension when used with socks proxy. (#849)
188
+ - Fix SSL context for connections using the "wss" scheme (#869)
189
+
190
+ ## 1.0.2 (November 10th, 2023)
191
+
192
+ - Fix `float("inf")` timeouts in `Event.wait` function. (#846)
193
+
194
+ ## 1.0.1 (November 3rd, 2023)
195
+
196
+ - Fix pool timeout to account for the total time spent retrying. (#823)
197
+ - Raise a neater RuntimeError when the correct async deps are not installed. (#826)
198
+ - Add support for synchronous TLS-in-TLS streams. (#840)
199
+
200
+ ## 1.0.0 (October 6th, 2023)
201
+
202
+ From version 1.0 our async support is now optional, as the package has minimal dependencies by default.
203
+
204
+ For async support use either `pip install 'httpcore[asyncio]'` or `pip install 'httpcore[trio]'`.
205
+
206
+ The project versioning policy is now explicitly governed by SEMVER. See https://semver.org/.
207
+
208
+ - Async support becomes fully optional. (#809)
209
+ - Add support for Python 3.12. (#807)
210
+
211
+ ## 0.18.0 (September 8th, 2023)
212
+
213
+ - Add support for HTTPS proxies. (#745, #786)
214
+ - Drop Python 3.7 support. (#727)
215
+ - Handle `sni_hostname` extension with SOCKS proxy. (#774)
216
+ - Handle HTTP/1.1 half-closed connections gracefully. (#641)
217
+ - Change the type of `Extensions` from `Mapping[Str, Any]` to `MutableMapping[Str, Any]`. (#762)
218
+
219
+ ## 0.17.3 (July 5th, 2023)
220
+
221
+ - Support async cancellations, ensuring that the connection pool is left in a clean state when cancellations occur. (#726)
222
+ - The networking backend interface has [been added to the public API](https://www.encode.io/httpcore/network-backends). Some classes which were previously private implementation detail are now part of the top-level public API. (#699)
223
+ - Graceful handling of HTTP/2 GoAway frames, with requests being transparently retried on a new connection. (#730)
224
+ - Add exceptions when a synchronous `trace callback` is passed to an asynchronous request or an asynchronous `trace callback` is passed to a synchronous request. (#717)
225
+ - Drop Python 3.7 support. (#727)
226
+
227
+ ## 0.17.2 (May 23rd, 2023)
228
+
229
+ - Add `socket_options` argument to `ConnectionPool` and `HTTProxy` classes. (#668)
230
+ - Improve logging with per-module logger names. (#690)
231
+ - Add `sni_hostname` request extension. (#696)
232
+ - Resolve race condition during import of `anyio` package. (#692)
233
+ - Enable TCP_NODELAY for all synchronous sockets. (#651)
234
+
235
+ ## 0.17.1 (May 17th, 2023)
236
+
237
+ - If 'retries' is set, then allow retries if an SSL handshake error occurs. (#669)
238
+ - Improve correctness of tracebacks on network exceptions, by raising properly chained exceptions. (#678)
239
+ - Prevent connection-hanging behaviour when HTTP/2 connections are closed by a server-sent 'GoAway' frame. (#679)
240
+ - Fix edge-case exception when removing requests from the connection pool. (#680)
241
+ - Fix pool timeout edge-case. (#688)
242
+
243
+ ## 0.17.0 (March 16th, 2023)
244
+
245
+ - Add DEBUG level logging. (#648)
246
+ - Respect HTTP/2 max concurrent streams when settings updates are sent by server. (#652)
247
+ - Increase the allowable HTTP header size to 100kB. (#647)
248
+ - Add `retries` option to SOCKS proxy classes. (#643)
249
+
250
+ ## 0.16.3 (December 20th, 2022)
251
+
252
+ - Allow `ws` and `wss` schemes. Allows us to properly support websocket upgrade connections. (#625)
253
+ - Forwarding HTTP proxies use a connection-per-remote-host. Required by some proxy implementations. (#637)
254
+ - Don't raise `RuntimeError` when closing a connection pool with active connections. Removes some error cases when cancellations are used. (#631)
255
+ - Lazy import `anyio`, so that it's no longer a hard dependency, and isn't imported if unused. (#639)
256
+
257
+ ## 0.16.2 (November 25th, 2022)
258
+
259
+ - Revert 'Fix async cancellation behaviour', which introduced race conditions. (#627)
260
+ - Raise `RuntimeError` if attempting to use UNIX domain sockets on Windows. (#619)
261
+
262
+ ## 0.16.1 (November 17th, 2022)
263
+
264
+ - Fix HTTP/1.1 interim informational responses, such as "100 Continue". (#605)
265
+
266
+ ## 0.16.0 (October 11th, 2022)
267
+
268
+ - Support HTTP/1.1 informational responses. (#581)
269
+ - Fix async cancellation behaviour. (#580)
270
+ - Support `h11` 0.14. (#579)
271
+
272
+ ## 0.15.0 (May 17th, 2022)
273
+
274
+ - Drop Python 3.6 support (#535)
275
+ - Ensure HTTP proxy CONNECT requests include `timeout` configuration. (#506)
276
+ - Switch to explicit `typing.Optional` for type hints. (#513)
277
+ - For `trio` map OSError exceptions to `ConnectError`. (#543)
278
+
279
+ ## 0.14.7 (February 4th, 2022)
280
+
281
+ - Requests which raise a PoolTimeout need to be removed from the pool queue. (#502)
282
+ - Fix AttributeError that happened when Socks5Connection were terminated. (#501)
283
+
284
+ ## 0.14.6 (February 1st, 2022)
285
+
286
+ - Fix SOCKS support for `http://` URLs. (#492)
287
+ - Resolve race condition around exceptions during streaming a response. (#491)
288
+
289
+ ## 0.14.5 (January 18th, 2022)
290
+
291
+ - SOCKS proxy support. (#478)
292
+ - Add proxy_auth argument to HTTPProxy. (#481)
293
+ - Improve error message on 'RemoteProtocolError' exception when server disconnects without sending a response. (#479)
294
+
295
+ ## 0.14.4 (January 5th, 2022)
296
+
297
+ - Support HTTP/2 on HTTPS tunnelling proxies. (#468)
298
+ - Fix proxy headers missing on HTTP forwarding. (#456)
299
+ - Only instantiate SSL context if required. (#457)
300
+ - More robust HTTP/2 handling. (#253, #439, #440, #441)
301
+
302
+ ## 0.14.3 (November 17th, 2021)
303
+
304
+ - Fix race condition when removing closed connections from the pool. (#437)
305
+
306
+ ## 0.14.2 (November 16th, 2021)
307
+
308
+ - Failed connections no longer remain in the pool. (Pull #433)
309
+
310
+ ## 0.14.1 (November 12th, 2021)
311
+
312
+ - `max_connections` becomes optional. (Pull #429)
313
+ - `certifi` is now included in the install dependencies. (Pull #428)
314
+ - `h2` is now strictly optional. (Pull #428)
315
+
316
+ ## 0.14.0 (November 11th, 2021)
317
+
318
+ The 0.14 release is a complete reworking of `httpcore`, comprehensively addressing some underlying issues in the connection pooling, as well as substantially redesigning the API to be more user friendly.
319
+
320
+ Some of the lower-level API design also makes the components more easily testable in isolation, and the package now has 100% test coverage.
321
+
322
+ See [discussion #419](https://github.com/encode/httpcore/discussions/419) for a little more background.
323
+
324
+ There's some other neat bits in there too, such as the "trace" extension, which gives a hook into inspecting the internal events that occur during the request/response cycle. This extension is needed for the HTTPX cli, in order to...
325
+
326
+ * Log the point at which the connection is established, and the IP/port on which it is made.
327
+ * Determine if the outgoing request should log as HTTP/1.1 or HTTP/2, rather than having to assume it's HTTP/2 if the --http2 flag was passed. (Which may not actually be true.)
328
+ * Log SSL version info / certificate info.
329
+
330
+ Note that `curio` support is not currently available in 0.14.0. If you're using `httpcore` with `curio` please get in touch, so we can assess if we ought to prioritize it as a feature or not.
331
+
332
+ ## 0.13.7 (September 13th, 2021)
333
+
334
+ - Fix broken error messaging when URL scheme is missing, or a non HTTP(S) scheme is used. (Pull #403)
335
+
336
+ ## 0.13.6 (June 15th, 2021)
337
+
338
+ ### Fixed
339
+
340
+ - Close sockets when read or write timeouts occur. (Pull #365)
341
+
342
+ ## 0.13.5 (June 14th, 2021)
343
+
344
+ ### Fixed
345
+
346
+ - Resolved niggles with AnyIO EOF behaviours. (Pull #358, #362)
347
+
348
+ ## 0.13.4 (June 9th, 2021)
349
+
350
+ ### Added
351
+
352
+ - Improved error messaging when URL scheme is missing, or a non HTTP(S) scheme is used. (Pull #354)
353
+
354
+ ### Fixed
355
+
356
+ - Switched to `anyio` as the default backend implementation when running with `asyncio`. Resolves some awkward [TLS timeout issues](https://github.com/encode/httpx/discussions/1511).
357
+
358
+ ## 0.13.3 (May 6th, 2021)
359
+
360
+ ### Added
361
+
362
+ - Support HTTP/2 prior knowledge, using `httpcore.SyncConnectionPool(http1=False)`. (Pull #333)
363
+
364
+ ### Fixed
365
+
366
+ - Handle cases where environment does not provide `select.poll` support. (Pull #331)
367
+
368
+ ## 0.13.2 (April 29th, 2021)
369
+
370
+ ### Added
371
+
372
+ - Improve error message for specific case of `RemoteProtocolError` where server disconnects without sending a response. (Pull #313)
373
+
374
+ ## 0.13.1 (April 28th, 2021)
375
+
376
+ ### Fixed
377
+
378
+ - More resiliant testing for closed connections. (Pull #311)
379
+ - Don't raise exceptions on ungraceful connection closes. (Pull #310)
380
+
381
+ ## 0.13.0 (April 21st, 2021)
382
+
383
+ The 0.13 release updates the core API in order to match the HTTPX Transport API,
384
+ introduced in HTTPX 0.18 onwards.
385
+
386
+ An example of making requests with the new interface is:
387
+
388
+ ```python
389
+ with httpcore.SyncConnectionPool() as http:
390
+ status_code, headers, stream, extensions = http.handle_request(
391
+ method=b'GET',
392
+ url=(b'https', b'example.org', 443, b'/'),
393
+ headers=[(b'host', b'example.org'), (b'user-agent', b'httpcore')]
394
+ stream=httpcore.ByteStream(b''),
395
+ extensions={}
396
+ )
397
+ body = stream.read()
398
+ print(status_code, body)
399
+ ```
400
+
401
+ ### Changed
402
+
403
+ - The `.request()` method is now `handle_request()`. (Pull #296)
404
+ - The `.arequest()` method is now `.handle_async_request()`. (Pull #296)
405
+ - The `headers` argument is no longer optional. (Pull #296)
406
+ - The `stream` argument is no longer optional. (Pull #296)
407
+ - The `ext` argument is now named `extensions`, and is no longer optional. (Pull #296)
408
+ - The `"reason"` extension keyword is now named `"reason_phrase"`. (Pull #296)
409
+ - The `"reason_phrase"` and `"http_version"` extensions now use byte strings for their values. (Pull #296)
410
+ - The `httpcore.PlainByteStream()` class becomes `httpcore.ByteStream()`. (Pull #296)
411
+
412
+ ### Added
413
+
414
+ - Streams now support a `.read()` interface. (Pull #296)
415
+
416
+ ### Fixed
417
+
418
+ - Task cancellation no longer leaks connections from the connection pool. (Pull #305)
419
+
420
+ ## 0.12.3 (December 7th, 2020)
421
+
422
+ ### Fixed
423
+
424
+ - Abort SSL connections on close rather than waiting for remote EOF when using `asyncio`. (Pull #167)
425
+ - Fix exception raised in case of connect timeouts when using the `anyio` backend. (Pull #236)
426
+ - Fix `Host` header precedence for `:authority` in HTTP/2. (Pull #241, #243)
427
+ - Handle extra edge case when detecting for socket readability when using `asyncio`. (Pull #242, #244)
428
+ - Fix `asyncio` SSL warning when using proxy tunneling. (Pull #249)
429
+
430
+ ## 0.12.2 (November 20th, 2020)
431
+
432
+ ### Fixed
433
+
434
+ - Properly wrap connect errors on the asyncio backend. (Pull #235)
435
+ - Fix `ImportError` occurring on Python 3.9 when using the HTTP/1.1 sync client in a multithreaded context. (Pull #237)
436
+
437
+ ## 0.12.1 (November 7th, 2020)
438
+
439
+ ### Added
440
+
441
+ - Add connect retries. (Pull #221)
442
+
443
+ ### Fixed
444
+
445
+ - Tweak detection of dropped connections, resolving an issue with open files limits on Linux. (Pull #185)
446
+ - Avoid leaking connections when establishing an HTTP tunnel to a proxy has failed. (Pull #223)
447
+ - Properly wrap OS errors when using `trio`. (Pull #225)
448
+
449
+ ## 0.12.0 (October 6th, 2020)
450
+
451
+ ### Changed
452
+
453
+ - HTTP header casing is now preserved, rather than always sent in lowercase. (#216 and python-hyper/h11#104)
454
+
455
+ ### Added
456
+
457
+ - Add Python 3.9 to officially supported versions.
458
+
459
+ ### Fixed
460
+
461
+ - Gracefully handle a stdlib asyncio bug when a connection is closed while it is in a paused-for-reading state. (#201)
462
+
463
+ ## 0.11.1 (September 28nd, 2020)
464
+
465
+ ### Fixed
466
+
467
+ - Add await to async semaphore release() coroutine (#197)
468
+ - Drop incorrect curio classifier (#192)
469
+
470
+ ## 0.11.0 (September 22nd, 2020)
471
+
472
+ The Transport API with 0.11.0 has a couple of significant changes.
473
+
474
+ Firstly we've moved changed the request interface in order to allow extensions, which will later enable us to support features
475
+ such as trailing headers, HTTP/2 server push, and CONNECT/Upgrade connections.
476
+
477
+ The interface changes from:
478
+
479
+ ```python
480
+ def request(method, url, headers, stream, timeout):
481
+ return (http_version, status_code, reason, headers, stream)
482
+ ```
483
+
484
+ To instead including an optional dictionary of extensions on the request and response:
485
+
486
+ ```python
487
+ def request(method, url, headers, stream, ext):
488
+ return (status_code, headers, stream, ext)
489
+ ```
490
+
491
+ Having an open-ended extensions point will allow us to add later support for various optional features, that wouldn't otherwise be supported without these API changes.
492
+
493
+ In particular:
494
+
495
+ * Trailing headers support.
496
+ * HTTP/2 Server Push
497
+ * sendfile.
498
+ * Exposing raw connection on CONNECT, Upgrade, HTTP/2 bi-di streaming.
499
+ * Exposing debug information out of the API, including template name, template context.
500
+
501
+ Currently extensions are limited to:
502
+
503
+ * request: `timeout` - Optional. Timeout dictionary.
504
+ * response: `http_version` - Optional. Include the HTTP version used on the response.
505
+ * response: `reason` - Optional. Include the reason phrase used on the response. Only valid with HTTP/1.*.
506
+
507
+ See https://github.com/encode/httpx/issues/1274#issuecomment-694884553 for the history behind this.
508
+
509
+ Secondly, the async version of `request` is now namespaced as `arequest`.
510
+
511
+ This allows concrete transports to support both sync and async implementations on the same class.
512
+
513
+ ### Added
514
+
515
+ - Add curio support. (Pull #168)
516
+ - Add anyio support, with `backend="anyio"`. (Pull #169)
517
+
518
+ ### Changed
519
+
520
+ - Update the Transport API to use 'ext' for optional extensions. (Pull #190)
521
+ - Update the Transport API to use `.request` and `.arequest` so implementations can support both sync and async. (Pull #189)
522
+
523
+ ## 0.10.2 (August 20th, 2020)
524
+
525
+ ### Added
526
+
527
+ - Added Unix Domain Socket support. (Pull #139)
528
+
529
+ ### Fixed
530
+
531
+ - Always include the port on proxy CONNECT requests. (Pull #154)
532
+ - Fix `max_keepalive_connections` configuration. (Pull #153)
533
+ - Fixes behaviour in HTTP/1.1 where server disconnects can be used to signal the end of the response body. (Pull #164)
534
+
535
+ ## 0.10.1 (August 7th, 2020)
536
+
537
+ - Include `max_keepalive_connections` on `AsyncHTTPProxy`/`SyncHTTPProxy` classes.
538
+
539
+ ## 0.10.0 (August 7th, 2020)
540
+
541
+ The most notable change in the 0.10.0 release is that HTTP/2 support is now fully optional.
542
+
543
+ Use either `pip install httpcore` for HTTP/1.1 support only, or `pip install httpcore[http2]` for HTTP/1.1 and HTTP/2 support.
544
+
545
+ ### Added
546
+
547
+ - HTTP/2 support becomes optional. (Pull #121, #130)
548
+ - Add `local_address=...` support. (Pull #100, #134)
549
+ - Add `PlainByteStream`, `IteratorByteStream`, `AsyncIteratorByteStream`. The `AsyncByteSteam` and `SyncByteStream` classes are now pure interface classes. (#133)
550
+ - Add `LocalProtocolError`, `RemoteProtocolError` exceptions. (Pull #129)
551
+ - Add `UnsupportedProtocol` exception. (Pull #128)
552
+ - Add `.get_connection_info()` method. (Pull #102, #137)
553
+ - Add better TRACE logs. (Pull #101)
554
+
555
+ ### Changed
556
+
557
+ - `max_keepalive` is deprecated in favour of `max_keepalive_connections`. (Pull #140)
558
+
559
+ ### Fixed
560
+
561
+ - Improve handling of server disconnects. (Pull #112)
562
+
563
+ ## 0.9.1 (May 27th, 2020)
564
+
565
+ ### Fixed
566
+
567
+ - Proper host resolution for sync case, including IPv6 support. (Pull #97)
568
+ - Close outstanding connections when connection pool is closed. (Pull #98)
569
+
570
+ ## 0.9.0 (May 21th, 2020)
571
+
572
+ ### Changed
573
+
574
+ - URL port becomes an `Optional[int]` instead of `int`. (Pull #92)
575
+
576
+ ### Fixed
577
+
578
+ - Honor HTTP/2 max concurrent streams settings. (Pull #89, #90)
579
+ - Remove incorrect debug log. (Pull #83)
580
+
581
+ ## 0.8.4 (May 11th, 2020)
582
+
583
+ ### Added
584
+
585
+ - Logging via HTTPCORE_LOG_LEVEL and HTTPX_LOG_LEVEL environment variables
586
+ and TRACE level logging. (Pull #79)
587
+
588
+ ### Fixed
589
+
590
+ - Reuse of connections on HTTP/2 in close concurrency situations. (Pull #81)
591
+
592
+ ## 0.8.3 (May 6rd, 2020)
593
+
594
+ ### Fixed
595
+
596
+ - Include `Host` and `Accept` headers on proxy "CONNECT" requests.
597
+ - De-duplicate any headers also contained in proxy_headers.
598
+ - HTTP/2 flag not being passed down to proxy connections.
599
+
600
+ ## 0.8.2 (May 3rd, 2020)
601
+
602
+ ### Fixed
603
+
604
+ - Fix connections using proxy forwarding requests not being added to the
605
+ connection pool properly. (Pull #70)
606
+
607
+ ## 0.8.1 (April 30th, 2020)
608
+
609
+ ### Changed
610
+
611
+ - Allow inherintance of both `httpcore.AsyncByteStream`, `httpcore.SyncByteStream` without type conflicts.
612
+
613
+ ## 0.8.0 (April 30th, 2020)
614
+
615
+ ### Fixed
616
+
617
+ - Fixed tunnel proxy support.
618
+
619
+ ### Added
620
+
621
+ - New `TimeoutException` base class.
622
+
623
+ ## 0.7.0 (March 5th, 2020)
624
+
625
+ - First integration with HTTPX.
venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/RECORD ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ httpcore-1.0.9.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
2
+ httpcore-1.0.9.dist-info/METADATA,sha256=_i1P2mGZEol4d54M8n88BFxTGGP83Zh-rMdPOhjUHCE,21529
3
+ httpcore-1.0.9.dist-info/RECORD,,
4
+ httpcore-1.0.9.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
5
+ httpcore-1.0.9.dist-info/licenses/LICENSE.md,sha256=_ctZFUx0y6uhahEkL3dAvqnyPW_rVUeRfYxflKgDkqU,1518
6
+ httpcore/__init__.py,sha256=9kT_kqChCCJUTHww24ZmR_ezcdbpRYWksD-gYNzkZP8,3445
7
+ httpcore/__pycache__/__init__.cpython-310.pyc,,
8
+ httpcore/__pycache__/_api.cpython-310.pyc,,
9
+ httpcore/__pycache__/_exceptions.cpython-310.pyc,,
10
+ httpcore/__pycache__/_models.cpython-310.pyc,,
11
+ httpcore/__pycache__/_ssl.cpython-310.pyc,,
12
+ httpcore/__pycache__/_synchronization.cpython-310.pyc,,
13
+ httpcore/__pycache__/_trace.cpython-310.pyc,,
14
+ httpcore/__pycache__/_utils.cpython-310.pyc,,
15
+ httpcore/_api.py,sha256=unZmeDschBWCGCPCwkS3Wot9euK6bg_kKxLtGTxw214,3146
16
+ httpcore/_async/__init__.py,sha256=EWdl2v4thnAHzJpqjU4h2a8DUiGAvNiWrkii9pfhTf0,1221
17
+ httpcore/_async/__pycache__/__init__.cpython-310.pyc,,
18
+ httpcore/_async/__pycache__/connection.cpython-310.pyc,,
19
+ httpcore/_async/__pycache__/connection_pool.cpython-310.pyc,,
20
+ httpcore/_async/__pycache__/http11.cpython-310.pyc,,
21
+ httpcore/_async/__pycache__/http2.cpython-310.pyc,,
22
+ httpcore/_async/__pycache__/http_proxy.cpython-310.pyc,,
23
+ httpcore/_async/__pycache__/interfaces.cpython-310.pyc,,
24
+ httpcore/_async/__pycache__/socks_proxy.cpython-310.pyc,,
25
+ httpcore/_async/connection.py,sha256=6OcPXqMEfc0BU38_-iHUNDd1vKSTc2UVT09XqNb_BOk,8449
26
+ httpcore/_async/connection_pool.py,sha256=DOIQ2s2ZCf9qfwxhzMprTPLqCL8OxGXiKF6qRHxvVyY,17307
27
+ httpcore/_async/http11.py,sha256=-qM9bV7PjSQF5vxs37-eUXOIFwbIjPcZbNliuX9TtBw,13880
28
+ httpcore/_async/http2.py,sha256=azX1fcmtXaIwjputFlZ4vd92J8xwjGOa9ax9QIv4394,23936
29
+ httpcore/_async/http_proxy.py,sha256=2zVkrlv-Ds-rWGaqaXlrhEJiAQFPo23BT3Gq_sWoBXU,14701
30
+ httpcore/_async/interfaces.py,sha256=jTiaWL83pgpGC9ziv90ZfwaKNMmHwmOalzaKiuTxATo,4455
31
+ httpcore/_async/socks_proxy.py,sha256=lLKgLlggPfhFlqi0ODeBkOWvt9CghBBUyqsnsU1tx6Q,13841
32
+ httpcore/_backends/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
33
+ httpcore/_backends/__pycache__/__init__.cpython-310.pyc,,
34
+ httpcore/_backends/__pycache__/anyio.cpython-310.pyc,,
35
+ httpcore/_backends/__pycache__/auto.cpython-310.pyc,,
36
+ httpcore/_backends/__pycache__/base.cpython-310.pyc,,
37
+ httpcore/_backends/__pycache__/mock.cpython-310.pyc,,
38
+ httpcore/_backends/__pycache__/sync.cpython-310.pyc,,
39
+ httpcore/_backends/__pycache__/trio.cpython-310.pyc,,
40
+ httpcore/_backends/anyio.py,sha256=x8PgEhXRC8bVqsdzk_YJx8Y6d9Tub06CuUSwnbmtqoY,5252
41
+ httpcore/_backends/auto.py,sha256=zO136PKZmsaTDK-HRk84eA-MUg8_2wJf4NvmK432Aio,1662
42
+ httpcore/_backends/base.py,sha256=aShgRdZnMmRhFWHetjumlM73f8Kz1YOAyCUP_4kHslA,3042
43
+ httpcore/_backends/mock.py,sha256=er9T436uSe7NLrfiLa4x6Nuqg5ivQ693CxWYCWsgbH4,4077
44
+ httpcore/_backends/sync.py,sha256=bhE4d9iK9Umxdsdsgm2EfKnXaBms2WggGYU-7jmUujU,7977
45
+ httpcore/_backends/trio.py,sha256=LHu4_Mr5MswQmmT3yE4oLgf9b_JJfeVS4BjDxeJc7Ro,5996
46
+ httpcore/_exceptions.py,sha256=looCKga3_YVYu3s-d3L9RMPRJyhsY7fiuuGxvkOD0c0,1184
47
+ httpcore/_models.py,sha256=IO2CcXcdpovRcLTdGFGB6RyBZdEm2h_TOmoCc4rEKho,17623
48
+ httpcore/_ssl.py,sha256=srqmSNU4iOUvWF-SrJvb8G_YEbHFELOXQOwdDIBTS9c,187
49
+ httpcore/_sync/__init__.py,sha256=JBDIgXt5la1LCJ1sLQeKhjKFpLnpNr8Svs6z2ni3fgg,1141
50
+ httpcore/_sync/__pycache__/__init__.cpython-310.pyc,,
51
+ httpcore/_sync/__pycache__/connection.cpython-310.pyc,,
52
+ httpcore/_sync/__pycache__/connection_pool.cpython-310.pyc,,
53
+ httpcore/_sync/__pycache__/http11.cpython-310.pyc,,
54
+ httpcore/_sync/__pycache__/http2.cpython-310.pyc,,
55
+ httpcore/_sync/__pycache__/http_proxy.cpython-310.pyc,,
56
+ httpcore/_sync/__pycache__/interfaces.cpython-310.pyc,,
57
+ httpcore/_sync/__pycache__/socks_proxy.cpython-310.pyc,,
58
+ httpcore/_sync/connection.py,sha256=9exGOb3PB-Mp2T1-sckSeL2t-tJ_9-NXomV8ihmWCgU,8238
59
+ httpcore/_sync/connection_pool.py,sha256=a-T8LTsUxc7r0Ww1atfHSDoWPjQ0fA8Ul7S3-F0Mj70,16955
60
+ httpcore/_sync/http11.py,sha256=IFobD1Md5JFlJGKWnh1_Q3epikUryI8qo09v8MiJIEA,13476
61
+ httpcore/_sync/http2.py,sha256=AxU4yhcq68Bn5vqdJYtiXKYUj7nvhYbxz3v4rT4xnvA,23400
62
+ httpcore/_sync/http_proxy.py,sha256=_al_6crKuEZu2wyvu493RZImJdBJnj5oGKNjLOJL2Zo,14463
63
+ httpcore/_sync/interfaces.py,sha256=snXON42vUDHO5JBJvo8D4VWk2Wat44z2OXXHDrjbl94,4344
64
+ httpcore/_sync/socks_proxy.py,sha256=zegZW9Snqj2_992DFJa8_CppOVBkVL4AgwduRkStakQ,13614
65
+ httpcore/_synchronization.py,sha256=zSi13mAColBnknjZBknUC6hKNDQT4C6ijnezZ-r0T2s,9434
66
+ httpcore/_trace.py,sha256=ck6ZoIzYTkdNAIfq5MGeKqBXDtqjOX-qfYwmZFbrGco,3952
67
+ httpcore/_utils.py,sha256=_RLgXYOAYC350ikALV59GZ68IJrdocRZxPs9PjmzdFY,1537
68
+ httpcore/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/WHEEL ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.27.0
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
venv/lib/python3.10/site-packages/httpcore-1.0.9.dist-info/licenses/LICENSE.md ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Copyright © 2020, [Encode OSS Ltd](https://www.encode.io/).
2
+ All rights reserved.
3
+
4
+ Redistribution and use in source and binary forms, with or without
5
+ modification, are permitted provided that the following conditions are met:
6
+
7
+ * Redistributions of source code must retain the above copyright notice, this
8
+ list of conditions and the following disclaimer.
9
+
10
+ * Redistributions in binary form must reproduce the above copyright notice,
11
+ this list of conditions and the following disclaimer in the documentation
12
+ and/or other materials provided with the distribution.
13
+
14
+ * Neither the name of the copyright holder nor the names of its
15
+ contributors may be used to endorse or promote products derived from
16
+ this software without specific prior written permission.
17
+
18
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
22
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
24
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
25
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
26
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.