cranky-coder08 committed on
Commit 9a4172d · verified · 1 Parent(s): a91ea44

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. phivenv/Lib/site-packages/filelock-3.19.1.dist-info/INSTALLER +1 -0
  2. phivenv/Lib/site-packages/filelock-3.19.1.dist-info/METADATA +42 -0
  3. phivenv/Lib/site-packages/filelock-3.19.1.dist-info/RECORD +24 -0
  4. phivenv/Lib/site-packages/filelock-3.19.1.dist-info/WHEEL +4 -0
  5. phivenv/Lib/site-packages/filelock-3.19.1.dist-info/licenses/LICENSE +24 -0
  6. phivenv/Lib/site-packages/filelock/__pycache__/__init__.cpython-39.pyc +0 -0
  7. phivenv/Lib/site-packages/filelock/__pycache__/_api.cpython-39.pyc +0 -0
  8. phivenv/Lib/site-packages/filelock/__pycache__/_error.cpython-39.pyc +0 -0
  9. phivenv/Lib/site-packages/filelock/__pycache__/_soft.cpython-39.pyc +0 -0
  10. phivenv/Lib/site-packages/filelock/__pycache__/_unix.cpython-39.pyc +0 -0
  11. phivenv/Lib/site-packages/filelock/__pycache__/_util.cpython-39.pyc +0 -0
  12. phivenv/Lib/site-packages/filelock/__pycache__/_windows.cpython-39.pyc +0 -0
  13. phivenv/Lib/site-packages/fsspec/__init__.py +71 -0
  14. phivenv/Lib/site-packages/fsspec/_version.py +34 -0
  15. phivenv/Lib/site-packages/fsspec/archive.py +75 -0
  16. phivenv/Lib/site-packages/fsspec/asyn.py +1097 -0
  17. phivenv/Lib/site-packages/fsspec/caching.py +1004 -0
  18. phivenv/Lib/site-packages/fsspec/callbacks.py +324 -0
  19. phivenv/Lib/site-packages/fsspec/compression.py +182 -0
  20. phivenv/Lib/site-packages/fsspec/config.py +131 -0
  21. phivenv/Lib/site-packages/fsspec/conftest.py +55 -0
  22. phivenv/Lib/site-packages/fsspec/core.py +743 -0
  23. phivenv/Lib/site-packages/fsspec/dircache.py +98 -0
  24. phivenv/Lib/site-packages/fsspec/exceptions.py +18 -0
  25. phivenv/Lib/site-packages/fsspec/fuse.py +324 -0
  26. phivenv/Lib/site-packages/fsspec/generic.py +394 -0
  27. phivenv/Lib/site-packages/fsspec/gui.py +417 -0
  28. phivenv/Lib/site-packages/fsspec/implementations/__init__.py +0 -0
  29. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/__init__.cpython-39.pyc +0 -0
  30. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/arrow.cpython-39.pyc +0 -0
  31. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/asyn_wrapper.cpython-39.pyc +0 -0
  32. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/cache_mapper.cpython-39.pyc +0 -0
  33. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/cache_metadata.cpython-39.pyc +0 -0
  34. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/cached.cpython-39.pyc +0 -0
  35. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/dask.cpython-39.pyc +0 -0
  36. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/data.cpython-39.pyc +0 -0
  37. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/dbfs.cpython-39.pyc +0 -0
  38. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/dirfs.cpython-39.pyc +0 -0
  39. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/ftp.cpython-39.pyc +0 -0
  40. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/gist.cpython-39.pyc +0 -0
  41. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/git.cpython-39.pyc +0 -0
  42. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/github.cpython-39.pyc +0 -0
  43. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/http.cpython-39.pyc +0 -0
  44. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/http_sync.cpython-39.pyc +0 -0
  45. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/jupyter.cpython-39.pyc +0 -0
  46. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/libarchive.cpython-39.pyc +0 -0
  47. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/local.cpython-39.pyc +0 -0
  48. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/memory.cpython-39.pyc +0 -0
  49. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/reference.cpython-39.pyc +0 -0
  50. phivenv/Lib/site-packages/fsspec/implementations/__pycache__/sftp.cpython-39.pyc +0 -0
phivenv/Lib/site-packages/filelock-3.19.1.dist-info/INSTALLER ADDED
@@ -0,0 +1 @@
+ pip
phivenv/Lib/site-packages/filelock-3.19.1.dist-info/METADATA ADDED
@@ -0,0 +1,42 @@
+ Metadata-Version: 2.4
+ Name: filelock
+ Version: 3.19.1
+ Summary: A platform independent file lock.
+ Project-URL: Documentation, https://py-filelock.readthedocs.io
+ Project-URL: Homepage, https://github.com/tox-dev/py-filelock
+ Project-URL: Source, https://github.com/tox-dev/py-filelock
+ Project-URL: Tracker, https://github.com/tox-dev/py-filelock/issues
+ Maintainer-email: Bernát Gábor <gaborjbernat@gmail.com>
+ License-Expression: Unlicense
+ License-File: LICENSE
+ Keywords: application,cache,directory,log,user
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: The Unlicense (Unlicense)
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Internet
+ Classifier: Topic :: Software Development :: Libraries
+ Classifier: Topic :: System
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+
+ # filelock
+
+ [![PyPI](https://img.shields.io/pypi/v/filelock)](https://pypi.org/project/filelock/)
+ [![Supported Python
+ versions](https://img.shields.io/pypi/pyversions/filelock.svg)](https://pypi.org/project/filelock/)
+ [![Documentation
+ status](https://readthedocs.org/projects/py-filelock/badge/?version=latest)](https://py-filelock.readthedocs.io/en/latest/?badge=latest)
+ [![Code style:
+ black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+ [![Downloads](https://static.pepy.tech/badge/filelock/month)](https://pepy.tech/project/filelock)
+ [![check](https://github.com/tox-dev/py-filelock/actions/workflows/check.yaml/badge.svg)](https://github.com/tox-dev/py-filelock/actions/workflows/check.yaml)
+
+ For more information checkout the [official documentation](https://py-filelock.readthedocs.io/en/latest/index.html).
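
Note on the package vendored above: filelock exposes a FileLock class that serializes access across processes via a lock file. A minimal usage sketch (the file names "app.lock" and "shared.txt" are hypothetical):

    import filelock

    # Wait up to 10 seconds for the lock, then raise filelock.Timeout
    lock = filelock.FileLock("app.lock", timeout=10)
    with lock:  # acquired here; released on exit, even if the body raises
        with open("shared.txt", "a") as f:
            f.write("exclusive write\n")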
phivenv/Lib/site-packages/filelock-3.19.1.dist-info/RECORD ADDED
@@ -0,0 +1,24 @@
+ filelock-3.19.1.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
+ filelock-3.19.1.dist-info/METADATA,sha256=gi6Y1j1mac0141sJB_Qa2MTvhwySJg2EqGjdfhBA4Og,2108
+ filelock-3.19.1.dist-info/RECORD,,
+ filelock-3.19.1.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+ filelock-3.19.1.dist-info/licenses/LICENSE,sha256=iNm062BXnBkew5HKBMFhMFctfu3EqG2qWL8oxuFMm80,1210
+ filelock/__init__.py,sha256=_t_-OAGXo_qyPa9lNQ1YnzVYEvSW3I0onPqzpomsVVg,1769
+ filelock/__pycache__/__init__.cpython-39.pyc,,
+ filelock/__pycache__/_api.cpython-39.pyc,,
+ filelock/__pycache__/_error.cpython-39.pyc,,
+ filelock/__pycache__/_soft.cpython-39.pyc,,
+ filelock/__pycache__/_unix.cpython-39.pyc,,
+ filelock/__pycache__/_util.cpython-39.pyc,,
+ filelock/__pycache__/_windows.cpython-39.pyc,,
+ filelock/__pycache__/asyncio.cpython-39.pyc,,
+ filelock/__pycache__/version.cpython-39.pyc,,
+ filelock/_api.py,sha256=2aATBeJ3-jtMj5OSm7EE539iNaTBsf13KXtcBMoi8oM,14545
+ filelock/_error.py,sha256=-5jMcjTu60YAvAO1UbqDD1GIEjVkwr8xCFwDBtMeYDg,787
+ filelock/_soft.py,sha256=haqtc_TB_KJbYv2a8iuEAclKuM4fMG1vTcp28sK919c,1711
+ filelock/_unix.py,sha256=eGOs4gDgZ-5fGnJUz-OkJDeZkAMzgvYcD8hVD6XH7e4,2351
+ filelock/_util.py,sha256=QHBoNFIYfbAThhotH3Q8E2acFc84wpG49-T-uu017ZE,1715
+ filelock/_windows.py,sha256=8k4XIBl_zZVfGC2gz0kEr8DZBvpNa8wdU9qeM1YrBb8,2179
+ filelock/asyncio.py,sha256=LD9yksC24FV0mh_hzgzVi4mmOjFgVVCb7ZLbLqJcqs4,12483
+ filelock/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ filelock/version.py,sha256=W0fQJGqi8xqGqxm8Edh-oPZAI3aJh5WPtMYpJb1FwKQ,513
phivenv/Lib/site-packages/filelock-3.19.1.dist-info/WHEEL ADDED
@@ -0,0 +1,4 @@
+ Wheel-Version: 1.0
+ Generator: hatchling 1.27.0
+ Root-Is-Purelib: true
+ Tag: py3-none-any
phivenv/Lib/site-packages/filelock-3.19.1.dist-info/licenses/LICENSE ADDED
@@ -0,0 +1,24 @@
+ This is free and unencumbered software released into the public domain.
+
+ Anyone is free to copy, modify, publish, use, compile, sell, or
+ distribute this software, either in source code form or as a compiled
+ binary, for any purpose, commercial or non-commercial, and by any
+ means.
+
+ In jurisdictions that recognize copyright laws, the author or authors
+ of this software dedicate any and all copyright interest in the
+ software to the public domain. We make this dedication for the benefit
+ of the public at large and to the detriment of our heirs and
+ successors. We intend this dedication to be an overt act of
+ relinquishment in perpetuity of all present and future rights to this
+ software under copyright law.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ OTHER DEALINGS IN THE SOFTWARE.
+
+ For more information, please refer to <http://unlicense.org>
phivenv/Lib/site-packages/filelock/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (1.36 kB)
phivenv/Lib/site-packages/filelock/__pycache__/_api.cpython-39.pyc ADDED
Binary file (12.6 kB)
phivenv/Lib/site-packages/filelock/__pycache__/_error.cpython-39.pyc ADDED
Binary file (1.41 kB)
phivenv/Lib/site-packages/filelock/__pycache__/_soft.cpython-39.pyc ADDED
Binary file (1.51 kB)
phivenv/Lib/site-packages/filelock/__pycache__/_unix.cpython-39.pyc ADDED
Binary file (2.21 kB)
phivenv/Lib/site-packages/filelock/__pycache__/_util.cpython-39.pyc ADDED
Binary file (1.45 kB)
phivenv/Lib/site-packages/filelock/__pycache__/_windows.cpython-39.pyc ADDED
Binary file (2.08 kB)
phivenv/Lib/site-packages/fsspec/__init__.py ADDED
@@ -0,0 +1,71 @@
+ from . import caching
+ from ._version import __version__  # noqa: F401
+ from .callbacks import Callback
+ from .compression import available_compressions
+ from .core import get_fs_token_paths, open, open_files, open_local, url_to_fs
+ from .exceptions import FSTimeoutError
+ from .mapping import FSMap, get_mapper
+ from .registry import (
+     available_protocols,
+     filesystem,
+     get_filesystem_class,
+     register_implementation,
+     registry,
+ )
+ from .spec import AbstractFileSystem
+
+ __all__ = [
+     "AbstractFileSystem",
+     "FSTimeoutError",
+     "FSMap",
+     "filesystem",
+     "register_implementation",
+     "get_filesystem_class",
+     "get_fs_token_paths",
+     "get_mapper",
+     "open",
+     "open_files",
+     "open_local",
+     "registry",
+     "caching",
+     "Callback",
+     "available_protocols",
+     "available_compressions",
+     "url_to_fs",
+ ]
+
+
+ def process_entries():
+     try:
+         from importlib.metadata import entry_points
+     except ImportError:
+         return
+     if entry_points is not None:
+         try:
+             eps = entry_points()
+         except TypeError:
+             pass  # importlib-metadata < 0.8
+         else:
+             if hasattr(eps, "select"):  # Python 3.10+ / importlib_metadata >= 3.9.0
+                 specs = eps.select(group="fsspec.specs")
+             else:
+                 specs = eps.get("fsspec.specs", [])
+             registered_names = {}
+             for spec in specs:
+                 err_msg = f"Unable to load filesystem from {spec}"
+                 name = spec.name
+                 if name in registered_names:
+                     continue
+                 registered_names[name] = True
+                 register_implementation(
+                     name,
+                     spec.value.replace(":", "."),
+                     errtxt=err_msg,
+                     # We take our implementations as the ones to overload with if
+                     # for some reason we encounter some, may be the same, already
+                     # registered
+                     clobber=True,
+                 )
+
+
+ process_entries()
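
The process_entries() hook above registers third-party filesystems advertised under the "fsspec.specs" entry-point group, clobbering any duplicate registrations. A minimal sketch of the equivalent manual call (the protocol name "myproto" and module path "mypkg.fs.MyFileSystem" are hypothetical):

    from fsspec.registry import register_implementation

    # Lazy registration: the module is imported only when the protocol is used
    register_implementation(
        "myproto",
        "mypkg.fs.MyFileSystem",
        clobber=True,
        errtxt="Unable to load filesystem from mypkg",
    )
    # fsspec.filesystem("myproto") would now import mypkg.fs and instantiate it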
phivenv/Lib/site-packages/fsspec/_version.py ADDED
@@ -0,0 +1,34 @@
+ # file generated by setuptools-scm
+ # don't change, don't track in version control
+
+ __all__ = [
+     "__version__",
+     "__version_tuple__",
+     "version",
+     "version_tuple",
+     "__commit_id__",
+     "commit_id",
+ ]
+
+ TYPE_CHECKING = False
+ if TYPE_CHECKING:
+     from typing import Tuple
+     from typing import Union
+
+     VERSION_TUPLE = Tuple[Union[int, str], ...]
+     COMMIT_ID = Union[str, None]
+ else:
+     VERSION_TUPLE = object
+     COMMIT_ID = object
+
+ version: str
+ __version__: str
+ __version_tuple__: VERSION_TUPLE
+ version_tuple: VERSION_TUPLE
+ commit_id: COMMIT_ID
+ __commit_id__: COMMIT_ID
+
+ __version__ = version = '2025.9.0'
+ __version_tuple__ = version_tuple = (2025, 9, 0)
+
+ __commit_id__ = commit_id = None
phivenv/Lib/site-packages/fsspec/archive.py ADDED
@@ -0,0 +1,75 @@
+ import operator
+
+ from fsspec import AbstractFileSystem
+ from fsspec.utils import tokenize
+
+
+ class AbstractArchiveFileSystem(AbstractFileSystem):
+     """
+     A generic superclass for implementing Archive-based filesystems.
+
+     Currently, it is shared amongst
+     :class:`~fsspec.implementations.zip.ZipFileSystem`,
+     :class:`~fsspec.implementations.libarchive.LibArchiveFileSystem` and
+     :class:`~fsspec.implementations.tar.TarFileSystem`.
+     """
+
+     def __str__(self):
+         return f"<Archive-like object {type(self).__name__} at {id(self)}>"
+
+     __repr__ = __str__
+
+     def ukey(self, path):
+         return tokenize(path, self.fo, self.protocol)
+
+     def _all_dirnames(self, paths):
+         """Returns *all* directory names for each path in paths, including intermediate
+         ones.
+
+         Parameters
+         ----------
+         paths: Iterable of path strings
+         """
+         if len(paths) == 0:
+             return set()
+
+         dirnames = {self._parent(path) for path in paths} - {self.root_marker}
+         return dirnames | self._all_dirnames(dirnames)
+
+     def info(self, path, **kwargs):
+         self._get_dirs()
+         path = self._strip_protocol(path)
+         if path in {"", "/"} and self.dir_cache:
+             return {"name": "", "type": "directory", "size": 0}
+         if path in self.dir_cache:
+             return self.dir_cache[path]
+         elif path + "/" in self.dir_cache:
+             return self.dir_cache[path + "/"]
+         else:
+             raise FileNotFoundError(path)
+
+     def ls(self, path, detail=True, **kwargs):
+         self._get_dirs()
+         paths = {}
+         for p, f in self.dir_cache.items():
+             p = p.rstrip("/")
+             if "/" in p:
+                 root = p.rsplit("/", 1)[0]
+             else:
+                 root = ""
+             if root == path.rstrip("/"):
+                 paths[p] = f
+             elif all(
+                 (a == b)
+                 for a, b in zip(path.split("/"), [""] + p.strip("/").split("/"))
+             ):
+                 # root directory entry
+                 ppath = p.rstrip("/").split("/", 1)[0]
+                 if ppath not in paths:
+                     out = {"name": ppath, "size": 0, "type": "directory"}
+                     paths[ppath] = out
+         if detail:
+             out = sorted(paths.values(), key=operator.itemgetter("name"))
+             return out
+         else:
+             return sorted(paths)
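
The _all_dirnames() recursion above synthesizes every intermediate directory for archive members, since archive formats often store only file entries. A standalone sketch of the same idea using posixpath (the inputs are hypothetical):

    import posixpath

    def all_dirnames(paths, root=""):
        # Parents of this level, minus the root marker, plus their parents
        if not paths:
            return set()
        parents = {posixpath.dirname(p) for p in paths} - {root}
        return parents | all_dirnames(parents, root)

    print(all_dirnames(["a/b/c.txt", "a/d.txt"]))  # {'a', 'a/b'}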
phivenv/Lib/site-packages/fsspec/asyn.py ADDED
@@ -0,0 +1,1097 @@
+ import asyncio
+ import asyncio.events
+ import functools
+ import inspect
+ import io
+ import numbers
+ import os
+ import re
+ import threading
+ from collections.abc import Iterable
+ from glob import has_magic
+ from typing import TYPE_CHECKING
+
+ from .callbacks import DEFAULT_CALLBACK
+ from .exceptions import FSTimeoutError
+ from .implementations.local import LocalFileSystem, make_path_posix, trailing_sep
+ from .spec import AbstractBufferedFile, AbstractFileSystem
+ from .utils import glob_translate, is_exception, other_paths
+
+ private = re.compile("_[^_]")
+ iothread = [None]  # dedicated fsspec IO thread
+ loop = [None]  # global event loop for any non-async instance
+ _lock = None  # global lock placeholder
+ get_running_loop = asyncio.get_running_loop
+
+
+ def get_lock():
+     """Allocate or return a threading lock.
+
+     The lock is allocated on first use to allow setting one lock per forked process.
+     """
+     global _lock
+     if not _lock:
+         _lock = threading.Lock()
+     return _lock
+
+
+ def reset_lock():
+     """Reset the global lock.
+
+     This should be called only on the init of a forked process to reset the lock to
+     None, enabling the new forked process to get a new lock.
+     """
+     global _lock
+
+     iothread[0] = None
+     loop[0] = None
+     _lock = None
+
+
+ async def _runner(event, coro, result, timeout=None):
+     timeout = timeout if timeout else None  # convert 0 or 0.0 to None
+     if timeout is not None:
+         coro = asyncio.wait_for(coro, timeout=timeout)
+     try:
+         result[0] = await coro
+     except Exception as ex:
+         result[0] = ex
+     finally:
+         event.set()
+
+
+ def sync(loop, func, *args, timeout=None, **kwargs):
+     """
+     Make loop run coroutine until it returns. Runs in other thread
+
+     Examples
+     --------
+     >>> fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args,
+                          timeout=timeout, **kwargs)
+     """
+     timeout = timeout if timeout else None  # convert 0 or 0.0 to None
+     # NB: if the loop is not running *yet*, it is OK to submit work
+     # and we will wait for it
+     if loop is None or loop.is_closed():
+         raise RuntimeError("Loop is not running")
+     try:
+         loop0 = asyncio.events.get_running_loop()
+         if loop0 is loop:
+             raise NotImplementedError("Calling sync() from within a running loop")
+     except NotImplementedError:
+         raise
+     except RuntimeError:
+         pass
+     coro = func(*args, **kwargs)
+     result = [None]
+     event = threading.Event()
+     asyncio.run_coroutine_threadsafe(_runner(event, coro, result, timeout), loop)
+     while True:
+         # this loops allows thread to get interrupted
+         if event.wait(1):
+             break
+         if timeout is not None:
+             timeout -= 1
+             if timeout < 0:
+                 raise FSTimeoutError
+
+     return_result = result[0]
+     if isinstance(return_result, asyncio.TimeoutError):
+         # suppress asyncio.TimeoutError, raise FSTimeoutError
+         raise FSTimeoutError from return_result
+     elif isinstance(return_result, BaseException):
+         raise return_result
+     else:
+         return return_result
+
+
+ def sync_wrapper(func, obj=None):
+     """Given a function, make so can be called in blocking contexts
+
+     Leave obj=None if defining within a class. Pass the instance if attaching
+     as an attribute of the instance.
+     """
+
+     @functools.wraps(func)
+     def wrapper(*args, **kwargs):
+         self = obj or args[0]
+         return sync(self.loop, func, *args, **kwargs)
+
+     return wrapper
+
+
+ def get_loop():
+     """Create or return the default fsspec IO loop
+
+     The loop will be running on a separate thread.
+     """
+     if loop[0] is None:
+         with get_lock():
+             # repeat the check just in case the loop got filled between the
+             # previous two calls from another thread
+             if loop[0] is None:
+                 loop[0] = asyncio.new_event_loop()
+                 th = threading.Thread(target=loop[0].run_forever, name="fsspecIO")
+                 th.daemon = True
+                 th.start()
+                 iothread[0] = th
+     return loop[0]
+
+
+ def reset_after_fork():
+     global lock
+     loop[0] = None
+     iothread[0] = None
+     lock = None
+
+
+ if hasattr(os, "register_at_fork"):
+     # should be posix; this will do nothing for spawn or forkserver subprocesses
+     os.register_at_fork(after_in_child=reset_after_fork)
+
+
+ if TYPE_CHECKING:
+     import resource
+
+     ResourceError = resource.error
+ else:
+     try:
+         import resource
+     except ImportError:
+         resource = None
+         ResourceError = OSError
+     else:
+         ResourceError = getattr(resource, "error", OSError)
+
+ _DEFAULT_BATCH_SIZE = 128
+ _NOFILES_DEFAULT_BATCH_SIZE = 1280
+
+
+ def _get_batch_size(nofiles=False):
+     from fsspec.config import conf
+
+     if nofiles:
+         if "nofiles_gather_batch_size" in conf:
+             return conf["nofiles_gather_batch_size"]
+     else:
+         if "gather_batch_size" in conf:
+             return conf["gather_batch_size"]
+     if nofiles:
+         return _NOFILES_DEFAULT_BATCH_SIZE
+     if resource is None:
+         return _DEFAULT_BATCH_SIZE
+
+     try:
+         soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
+     except (ImportError, ValueError, ResourceError):
+         return _DEFAULT_BATCH_SIZE
+
+     if soft_limit == resource.RLIM_INFINITY:
+         return -1
+     else:
+         return soft_limit // 8
+
+
+ def running_async() -> bool:
+     """Being executed by an event loop?"""
+     try:
+         asyncio.get_running_loop()
+         return True
+     except RuntimeError:
+         return False
+
+
+ async def _run_coros_in_chunks(
+     coros,
+     batch_size=None,
+     callback=DEFAULT_CALLBACK,
+     timeout=None,
+     return_exceptions=False,
+     nofiles=False,
+ ):
+     """Run the given coroutines in chunks.
+
+     Parameters
+     ----------
+     coros: list of coroutines to run
+     batch_size: int or None
+         Number of coroutines to submit/wait on simultaneously.
+         If -1, then it will not be any throttling. If
+         None, it will be inferred from _get_batch_size()
+     callback: fsspec.callbacks.Callback instance
+         Gets a relative_update when each coroutine completes
+     timeout: number or None
+         If given, each coroutine times out after this time. Note that, since
+         there are multiple batches, the total run time of this function will in
+         general be longer
+     return_exceptions: bool
+         Same meaning as in asyncio.gather
+     nofiles: bool
+         If inferring the batch_size, does this operation involve local files?
+         If yes, you normally expect smaller batches.
+     """
+
+     if batch_size is None:
+         batch_size = _get_batch_size(nofiles=nofiles)
+
+     if batch_size == -1:
+         batch_size = len(coros)
+
+     assert batch_size > 0
+
+     async def _run_coro(coro, i):
+         try:
+             return await asyncio.wait_for(coro, timeout=timeout), i
+         except Exception as e:
+             if not return_exceptions:
+                 raise
+             return e, i
+         finally:
+             callback.relative_update(1)
+
+     i = 0
+     n = len(coros)
+     results = [None] * n
+     pending = set()
+
+     while pending or i < n:
+         while len(pending) < batch_size and i < n:
+             pending.add(asyncio.ensure_future(_run_coro(coros[i], i)))
+             i += 1
+
+         if not pending:
+             break
+
+         done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
+         while done:
+             result, k = await done.pop()
+             results[k] = result
+
+     return results
+
+
+ # these methods should be implemented as async by any async-able backend
+ async_methods = [
+     "_ls",
+     "_cat_file",
+     "_get_file",
+     "_put_file",
+     "_rm_file",
+     "_cp_file",
+     "_pipe_file",
+     "_expand_path",
+     "_info",
+     "_isfile",
+     "_isdir",
+     "_exists",
+     "_walk",
+     "_glob",
+     "_find",
+     "_du",
+     "_size",
+     "_mkdir",
+     "_makedirs",
+ ]
+
+
+ class AsyncFileSystem(AbstractFileSystem):
+     """Async file operations, default implementations
+
+     Passes bulk operations to asyncio.gather for concurrent operation.
+
+     Implementations that have concurrent batch operations and/or async methods
+     should inherit from this class instead of AbstractFileSystem. Docstrings are
+     copied from the un-underscored method in AbstractFileSystem, if not given.
+     """
+
+     # note that methods do not have docstring here; they will be copied
+     # for _* methods and inferred for overridden methods.
+
+     async_impl = True
+     mirror_sync_methods = True
+     disable_throttling = False
+
+     def __init__(self, *args, asynchronous=False, loop=None, batch_size=None, **kwargs):
+         self.asynchronous = asynchronous
+         self._pid = os.getpid()
+         if not asynchronous:
+             self._loop = loop or get_loop()
+         else:
+             self._loop = None
+         self.batch_size = batch_size
+         super().__init__(*args, **kwargs)
+
+     @property
+     def loop(self):
+         if self._pid != os.getpid():
+             raise RuntimeError("This class is not fork-safe")
+         return self._loop
+
+     async def _rm_file(self, path, **kwargs):
+         raise NotImplementedError
+
+     async def _rm(self, path, recursive=False, batch_size=None, **kwargs):
+         # TODO: implement on_error
+         batch_size = batch_size or self.batch_size
+         path = await self._expand_path(path, recursive=recursive)
+         return await _run_coros_in_chunks(
+             [self._rm_file(p, **kwargs) for p in reversed(path)],
+             batch_size=batch_size,
+             nofiles=True,
+         )
+
+     async def _cp_file(self, path1, path2, **kwargs):
+         raise NotImplementedError
+
+     async def _mv_file(self, path1, path2):
+         await self._cp_file(path1, path2)
+         await self._rm_file(path1)
+
+     async def _copy(
+         self,
+         path1,
+         path2,
+         recursive=False,
+         on_error=None,
+         maxdepth=None,
+         batch_size=None,
+         **kwargs,
+     ):
+         if on_error is None and recursive:
+             on_error = "ignore"
+         elif on_error is None:
+             on_error = "raise"
+
+         if isinstance(path1, list) and isinstance(path2, list):
+             # No need to expand paths when both source and destination
+             # are provided as lists
+             paths1 = path1
+             paths2 = path2
+         else:
+             source_is_str = isinstance(path1, str)
+             paths1 = await self._expand_path(
+                 path1, maxdepth=maxdepth, recursive=recursive
+             )
+             if source_is_str and (not recursive or maxdepth is not None):
+                 # Non-recursive glob does not copy directories
+                 paths1 = [
+                     p for p in paths1 if not (trailing_sep(p) or await self._isdir(p))
+                 ]
+                 if not paths1:
+                     return
+
+             source_is_file = len(paths1) == 1
+             dest_is_dir = isinstance(path2, str) and (
+                 trailing_sep(path2) or await self._isdir(path2)
+             )
+
+             exists = source_is_str and (
+                 (has_magic(path1) and source_is_file)
+                 or (not has_magic(path1) and dest_is_dir and not trailing_sep(path1))
+             )
+             paths2 = other_paths(
+                 paths1,
+                 path2,
+                 exists=exists,
+                 flatten=not source_is_str,
+             )
+
+         batch_size = batch_size or self.batch_size
+         coros = [self._cp_file(p1, p2, **kwargs) for p1, p2 in zip(paths1, paths2)]
+         result = await _run_coros_in_chunks(
+             coros, batch_size=batch_size, return_exceptions=True, nofiles=True
+         )
+
+         for ex in filter(is_exception, result):
+             if on_error == "ignore" and isinstance(ex, FileNotFoundError):
+                 continue
+             raise ex
+
+     async def _pipe_file(self, path, value, mode="overwrite", **kwargs):
+         raise NotImplementedError
+
+     async def _pipe(self, path, value=None, batch_size=None, **kwargs):
+         if isinstance(path, str):
+             path = {path: value}
+         batch_size = batch_size or self.batch_size
+         return await _run_coros_in_chunks(
+             [self._pipe_file(k, v, **kwargs) for k, v in path.items()],
+             batch_size=batch_size,
+             nofiles=True,
+         )
+
+     async def _process_limits(self, url, start, end):
+         """Helper for "Range"-based _cat_file"""
+         size = None
+         suff = False
+         if start is not None and start < 0:
+             # if start is negative and end None, end is the "suffix length"
+             if end is None:
+                 end = -start
+                 start = ""
+                 suff = True
+             else:
+                 size = size or (await self._info(url))["size"]
+                 start = size + start
+         elif start is None:
+             start = 0
+         if not suff:
+             if end is not None and end < 0:
+                 if start is not None:
+                     size = size or (await self._info(url))["size"]
+                     end = size + end
+             elif end is None:
+                 end = ""
+             if isinstance(end, numbers.Integral):
+                 end -= 1  # bytes range is inclusive
+         return f"bytes={start}-{end}"
+
+     async def _cat_file(self, path, start=None, end=None, **kwargs):
+         raise NotImplementedError
+
+     async def _cat(
+         self, path, recursive=False, on_error="raise", batch_size=None, **kwargs
+     ):
+         paths = await self._expand_path(path, recursive=recursive)
+         coros = [self._cat_file(path, **kwargs) for path in paths]
+         batch_size = batch_size or self.batch_size
+         out = await _run_coros_in_chunks(
+             coros, batch_size=batch_size, nofiles=True, return_exceptions=True
+         )
+         if on_error == "raise":
+             ex = next(filter(is_exception, out), False)
+             if ex:
+                 raise ex
+         if (
+             len(paths) > 1
+             or isinstance(path, list)
+             or paths[0] != self._strip_protocol(path)
+         ):
+             return {
+                 k: v
+                 for k, v in zip(paths, out)
+                 if on_error != "omit" or not is_exception(v)
+             }
+         else:
+             return out[0]
+
+     async def _cat_ranges(
+         self,
+         paths,
+         starts,
+         ends,
+         max_gap=None,
+         batch_size=None,
+         on_error="return",
+         **kwargs,
+     ):
+         """Get the contents of byte ranges from one or more files
+
+         Parameters
+         ----------
+         paths: list
+             A list of of filepaths on this filesystems
+         starts, ends: int or list
+             Bytes limits of the read. If using a single int, the same value will be
+             used to read all the specified files.
+         """
+         # TODO: on_error
+         if max_gap is not None:
+             # use utils.merge_offset_ranges
+             raise NotImplementedError
+         if not isinstance(paths, list):
+             raise TypeError
+         if not isinstance(starts, Iterable):
+             starts = [starts] * len(paths)
+         if not isinstance(ends, Iterable):
+             ends = [ends] * len(paths)
+         if len(starts) != len(paths) or len(ends) != len(paths):
+             raise ValueError
+         coros = [
+             self._cat_file(p, start=s, end=e, **kwargs)
+             for p, s, e in zip(paths, starts, ends)
+         ]
+         batch_size = batch_size or self.batch_size
+         return await _run_coros_in_chunks(
+             coros, batch_size=batch_size, nofiles=True, return_exceptions=True
+         )
+
+     async def _put_file(self, lpath, rpath, mode="overwrite", **kwargs):
+         raise NotImplementedError
+
+     async def _put(
+         self,
+         lpath,
+         rpath,
+         recursive=False,
+         callback=DEFAULT_CALLBACK,
+         batch_size=None,
+         maxdepth=None,
+         **kwargs,
+     ):
+         """Copy file(s) from local.
+
+         Copies a specific file or tree of files (if recursive=True). If rpath
+         ends with a "/", it will be assumed to be a directory, and target files
+         will go within.
+
+         The put_file method will be called concurrently on a batch of files. The
+         batch_size option can configure the amount of futures that can be executed
+         at the same time. If it is -1, then all the files will be uploaded concurrently.
+         The default can be set for this instance by passing "batch_size" in the
+         constructor, or for all instances by setting the "gather_batch_size" key
+         in ``fsspec.config.conf``, falling back to 1/8th of the system limit .
+         """
+         if isinstance(lpath, list) and isinstance(rpath, list):
+             # No need to expand paths when both source and destination
+             # are provided as lists
+             rpaths = rpath
+             lpaths = lpath
+         else:
+             source_is_str = isinstance(lpath, str)
+             if source_is_str:
+                 lpath = make_path_posix(lpath)
+             fs = LocalFileSystem()
+             lpaths = fs.expand_path(lpath, recursive=recursive, maxdepth=maxdepth)
+             if source_is_str and (not recursive or maxdepth is not None):
+                 # Non-recursive glob does not copy directories
+                 lpaths = [p for p in lpaths if not (trailing_sep(p) or fs.isdir(p))]
+                 if not lpaths:
+                     return
+
+             source_is_file = len(lpaths) == 1
+             dest_is_dir = isinstance(rpath, str) and (
+                 trailing_sep(rpath) or await self._isdir(rpath)
+             )
+
+             rpath = self._strip_protocol(rpath)
+             exists = source_is_str and (
+                 (has_magic(lpath) and source_is_file)
+                 or (not has_magic(lpath) and dest_is_dir and not trailing_sep(lpath))
+             )
+             rpaths = other_paths(
+                 lpaths,
+                 rpath,
+                 exists=exists,
+                 flatten=not source_is_str,
+             )
+
+         is_dir = {l: os.path.isdir(l) for l in lpaths}
+         rdirs = [r for l, r in zip(lpaths, rpaths) if is_dir[l]]
+         file_pairs = [(l, r) for l, r in zip(lpaths, rpaths) if not is_dir[l]]
+
+         await asyncio.gather(*[self._makedirs(d, exist_ok=True) for d in rdirs])
+         batch_size = batch_size or self.batch_size
+
+         coros = []
+         callback.set_size(len(file_pairs))
+         for lfile, rfile in file_pairs:
+             put_file = callback.branch_coro(self._put_file)
+             coros.append(put_file(lfile, rfile, **kwargs))
+
+         return await _run_coros_in_chunks(
+             coros, batch_size=batch_size, callback=callback
+         )
+
+     async def _get_file(self, rpath, lpath, **kwargs):
+         raise NotImplementedError
+
+     async def _get(
+         self,
+         rpath,
+         lpath,
+         recursive=False,
+         callback=DEFAULT_CALLBACK,
+         maxdepth=None,
+         **kwargs,
+     ):
+         """Copy file(s) to local.
+
+         Copies a specific file or tree of files (if recursive=True). If lpath
+         ends with a "/", it will be assumed to be a directory, and target files
+         will go within. Can submit a list of paths, which may be glob-patterns
+         and will be expanded.
+
+         The get_file method will be called concurrently on a batch of files. The
+         batch_size option can configure the amount of futures that can be executed
+         at the same time. If it is -1, then all the files will be uploaded concurrently.
+         The default can be set for this instance by passing "batch_size" in the
+         constructor, or for all instances by setting the "gather_batch_size" key
+         in ``fsspec.config.conf``, falling back to 1/8th of the system limit .
+         """
+         if isinstance(lpath, list) and isinstance(rpath, list):
+             # No need to expand paths when both source and destination
+             # are provided as lists
+             rpaths = rpath
+             lpaths = lpath
+         else:
+             source_is_str = isinstance(rpath, str)
+             # First check for rpath trailing slash as _strip_protocol removes it.
+             source_not_trailing_sep = source_is_str and not trailing_sep(rpath)
+             rpath = self._strip_protocol(rpath)
+             rpaths = await self._expand_path(
+                 rpath, recursive=recursive, maxdepth=maxdepth
+             )
+             if source_is_str and (not recursive or maxdepth is not None):
+                 # Non-recursive glob does not copy directories
+                 rpaths = [
+                     p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))
+                 ]
+                 if not rpaths:
+                     return
+
+             lpath = make_path_posix(lpath)
+             source_is_file = len(rpaths) == 1
+             dest_is_dir = isinstance(lpath, str) and (
+                 trailing_sep(lpath) or LocalFileSystem().isdir(lpath)
+             )
+
+             exists = source_is_str and (
+                 (has_magic(rpath) and source_is_file)
+                 or (not has_magic(rpath) and dest_is_dir and source_not_trailing_sep)
+             )
+             lpaths = other_paths(
+                 rpaths,
+                 lpath,
+                 exists=exists,
+                 flatten=not source_is_str,
+             )
+
+         [os.makedirs(os.path.dirname(lp), exist_ok=True) for lp in lpaths]
+         batch_size = kwargs.pop("batch_size", self.batch_size)
+
+         coros = []
+         callback.set_size(len(lpaths))
+         for lpath, rpath in zip(lpaths, rpaths):
+             get_file = callback.branch_coro(self._get_file)
+             coros.append(get_file(rpath, lpath, **kwargs))
+         return await _run_coros_in_chunks(
+             coros, batch_size=batch_size, callback=callback
+         )
+
+     async def _isfile(self, path):
+         try:
+             return (await self._info(path))["type"] == "file"
+         except:  # noqa: E722
+             return False
+
+     async def _isdir(self, path):
+         try:
+             return (await self._info(path))["type"] == "directory"
+         except OSError:
+             return False
+
+     async def _size(self, path):
+         return (await self._info(path)).get("size", None)
+
+     async def _sizes(self, paths, batch_size=None):
+         batch_size = batch_size or self.batch_size
+         return await _run_coros_in_chunks(
+             [self._size(p) for p in paths], batch_size=batch_size
+         )
+
+     async def _exists(self, path, **kwargs):
+         try:
+             await self._info(path, **kwargs)
+             return True
+         except FileNotFoundError:
+             return False
+
+     async def _info(self, path, **kwargs):
+         raise NotImplementedError
+
+     async def _ls(self, path, detail=True, **kwargs):
+         raise NotImplementedError
+
+     async def _walk(self, path, maxdepth=None, on_error="omit", **kwargs):
+         if maxdepth is not None and maxdepth < 1:
+             raise ValueError("maxdepth must be at least 1")
+
+         path = self._strip_protocol(path)
+         full_dirs = {}
+         dirs = {}
+         files = {}
+
+         detail = kwargs.pop("detail", False)
+         try:
+             listing = await self._ls(path, detail=True, **kwargs)
+         except (FileNotFoundError, OSError) as e:
+             if on_error == "raise":
+                 raise
+             elif callable(on_error):
+                 on_error(e)
+             if detail:
+                 yield path, {}, {}
+             else:
+                 yield path, [], []
+             return
+
+         for info in listing:
+             # each info name must be at least [path]/part , but here
+             # we check also for names like [path]/part/
+             pathname = info["name"].rstrip("/")
+             name = pathname.rsplit("/", 1)[-1]
+             if info["type"] == "directory" and pathname != path:
+                 # do not include "self" path
+                 full_dirs[name] = pathname
+                 dirs[name] = info
+             elif pathname == path:
+                 # file-like with same name as give path
+                 files[""] = info
+             else:
+                 files[name] = info
+
+         if detail:
+             yield path, dirs, files
+         else:
+             yield path, list(dirs), list(files)
+
+         if maxdepth is not None:
+             maxdepth -= 1
+             if maxdepth < 1:
+                 return
+
+         for d in dirs:
+             async for _ in self._walk(
+                 full_dirs[d], maxdepth=maxdepth, detail=detail, **kwargs
+             ):
+                 yield _
+
+     async def _glob(self, path, maxdepth=None, **kwargs):
+         if maxdepth is not None and maxdepth < 1:
+             raise ValueError("maxdepth must be at least 1")
+
+         import re
+
+         seps = (os.path.sep, os.path.altsep) if os.path.altsep else (os.path.sep,)
+         ends_with_sep = path.endswith(seps)  # _strip_protocol strips trailing slash
+         path = self._strip_protocol(path)
+         append_slash_to_dirname = ends_with_sep or path.endswith(
+             tuple(sep + "**" for sep in seps)
+         )
+         idx_star = path.find("*") if path.find("*") >= 0 else len(path)
+         idx_qmark = path.find("?") if path.find("?") >= 0 else len(path)
+         idx_brace = path.find("[") if path.find("[") >= 0 else len(path)
+
+         min_idx = min(idx_star, idx_qmark, idx_brace)
+
+         detail = kwargs.pop("detail", False)
+
+         if not has_magic(path):
+             if await self._exists(path, **kwargs):
+                 if not detail:
+                     return [path]
+                 else:
+                     return {path: await self._info(path, **kwargs)}
+             else:
+                 if not detail:
+                     return []  # glob of non-existent returns empty
+                 else:
+                     return {}
+         elif "/" in path[:min_idx]:
+             min_idx = path[:min_idx].rindex("/")
+             root = path[: min_idx + 1]
+             depth = path[min_idx + 1 :].count("/") + 1
+         else:
+             root = ""
+             depth = path[min_idx + 1 :].count("/") + 1
+
+         if "**" in path:
+             if maxdepth is not None:
+                 idx_double_stars = path.find("**")
+                 depth_double_stars = path[idx_double_stars:].count("/") + 1
+                 depth = depth - depth_double_stars + maxdepth
+             else:
+                 depth = None
+
+         allpaths = await self._find(
+             root, maxdepth=depth, withdirs=True, detail=True, **kwargs
+         )
+
+         pattern = glob_translate(path + ("/" if ends_with_sep else ""))
+         pattern = re.compile(pattern)
+
+         out = {
+             p: info
+             for p, info in sorted(allpaths.items())
+             if pattern.match(
+                 p + "/"
+                 if append_slash_to_dirname and info["type"] == "directory"
+                 else p
+             )
+         }
+
+         if detail:
+             return out
+         else:
+             return list(out)
+
+     async def _du(self, path, total=True, maxdepth=None, **kwargs):
+         sizes = {}
+         # async for?
+         for f in await self._find(path, maxdepth=maxdepth, **kwargs):
+             info = await self._info(f)
+             sizes[info["name"]] = info["size"]
+         if total:
+             return sum(sizes.values())
+         else:
+             return sizes
+
+     async def _find(self, path, maxdepth=None, withdirs=False, **kwargs):
+         path = self._strip_protocol(path)
+         out = {}
+         detail = kwargs.pop("detail", False)
+
+         # Add the root directory if withdirs is requested
+         # This is needed for posix glob compliance
+         if withdirs and path != "" and await self._isdir(path):
+             out[path] = await self._info(path)
+
+         # async for?
+         async for _, dirs, files in self._walk(path, maxdepth, detail=True, **kwargs):
+             if withdirs:
+                 files.update(dirs)
+             out.update({info["name"]: info for name, info in files.items()})
+         if not out and (await self._isfile(path)):
+             # walk works on directories, but find should also return [path]
+             # when path happens to be a file
+             out[path] = {}
+         names = sorted(out)
+         if not detail:
+             return names
+         else:
+             return {name: out[name] for name in names}
+
+     async def _expand_path(self, path, recursive=False, maxdepth=None):
+         if maxdepth is not None and maxdepth < 1:
+             raise ValueError("maxdepth must be at least 1")
+
+         if isinstance(path, str):
+             out = await self._expand_path([path], recursive, maxdepth)
+         else:
+             out = set()
+             path = [self._strip_protocol(p) for p in path]
+             for p in path:  # can gather here
+                 if has_magic(p):
+                     bit = set(await self._glob(p, maxdepth=maxdepth))
+                     out |= bit
+                     if recursive:
+                         # glob call above expanded one depth so if maxdepth is defined
+                         # then decrement it in expand_path call below. If it is zero
+                         # after decrementing then avoid expand_path call.
+                         if maxdepth is not None and maxdepth <= 1:
+                             continue
+                         out |= set(
+                             await self._expand_path(
+                                 list(bit),
+                                 recursive=recursive,
+                                 maxdepth=maxdepth - 1 if maxdepth is not None else None,
+                             )
+                         )
+                     continue
+                 elif recursive:
+                     rec = set(await self._find(p, maxdepth=maxdepth, withdirs=True))
+                     out |= rec
+                 if p not in out and (recursive is False or (await self._exists(p))):
+                     # should only check once, for the root
+                     out.add(p)
+             if not out:
+                 raise FileNotFoundError(path)
+         return sorted(out)
+
+     async def _mkdir(self, path, create_parents=True, **kwargs):
+         pass  # not necessary to implement, may not have directories
+
+     async def _makedirs(self, path, exist_ok=False):
+         pass  # not necessary to implement, may not have directories
+
+     async def open_async(self, path, mode="rb", **kwargs):
+         if "b" not in mode or kwargs.get("compression"):
+             raise ValueError
+         raise NotImplementedError
+
+
+ def mirror_sync_methods(obj):
+     """Populate sync and async methods for obj
+
+     For each method will create a sync version if the name refers to an async method
+     (coroutine) and there is no override in the child class; will create an async
+     method for the corresponding sync method if there is no implementation.
+
+     Uses the methods specified in
+     - async_methods: the set that an implementation is expected to provide
+     - default_async_methods: that can be derived from their sync version in
+       AbstractFileSystem
+     - AsyncFileSystem: async-specific default coroutines
+     """
+     from fsspec import AbstractFileSystem
+
+     for method in async_methods + dir(AsyncFileSystem):
+         if not method.startswith("_"):
+             continue
+         smethod = method[1:]
+         if private.match(method):
+             isco = inspect.iscoroutinefunction(getattr(obj, method, None))
+             unsync = getattr(getattr(obj, smethod, False), "__func__", None)
+             is_default = unsync is getattr(AbstractFileSystem, smethod, "")
+             if isco and is_default:
+                 mth = sync_wrapper(getattr(obj, method), obj=obj)
+                 setattr(obj, smethod, mth)
+                 if not mth.__doc__:
+                     mth.__doc__ = getattr(
+                         getattr(AbstractFileSystem, smethod, None), "__doc__", ""
+                     )
+
+
+ class FSSpecCoroutineCancel(Exception):
+     pass
+
+
+ def _dump_running_tasks(
+     printout=True, cancel=True, exc=FSSpecCoroutineCancel, with_task=False
+ ):
+     import traceback
+
+     tasks = [t for t in asyncio.tasks.all_tasks(loop[0]) if not t.done()]
+     if printout:
+         [task.print_stack() for task in tasks]
+     out = [
+         {
+             "locals": task._coro.cr_frame.f_locals,
+             "file": task._coro.cr_frame.f_code.co_filename,
+             "firstline": task._coro.cr_frame.f_code.co_firstlineno,
+             "linelo": task._coro.cr_frame.f_lineno,
+             "stack": traceback.format_stack(task._coro.cr_frame),
+             "task": task if with_task else None,
+         }
+         for task in tasks
+     ]
+     if cancel:
+         for t in tasks:
+             cbs = t._callbacks
+             t.cancel()
+             asyncio.futures.Future.set_exception(t, exc)
+             asyncio.futures.Future.cancel(t)
+             [cb[0](t) for cb in cbs]  # cancels any dependent concurrent.futures
+             try:
+                 t._coro.throw(exc)  # exits coro, unless explicitly handled
+             except exc:
+                 pass
+     return out
+
+
+ class AbstractAsyncStreamedFile(AbstractBufferedFile):
+     # no read buffering, and always auto-commit
+     # TODO: readahead might still be useful here, but needs async version
+
+     async def read(self, length=-1):
+         """
+         Return data from cache, or fetch pieces as necessary
+
+         Parameters
+         ----------
+         length: int (-1)
+             Number of bytes to read; if <0, all remaining bytes.
+         """
+         length = -1 if length is None else int(length)
+         if self.mode != "rb":
+             raise ValueError("File not in read mode")
+         if length < 0:
+             length = self.size - self.loc
+         if self.closed:
+             raise ValueError("I/O operation on closed file.")
+         if length == 0:
+             # don't even bother calling fetch
+             return b""
+         out = await self._fetch_range(self.loc, self.loc + length)
+         self.loc += len(out)
+         return out
+
+     async def write(self, data):
+         """
+         Write data to buffer.
+
+         Buffer only sent on flush() or if buffer is greater than
+         or equal to blocksize.
+
+         Parameters
+         ----------
+         data: bytes
+             Set of bytes to be written.
+         """
+         if self.mode not in {"wb", "ab"}:
+             raise ValueError("File not in write mode")
+         if self.closed:
+             raise ValueError("I/O operation on closed file.")
+         if self.forced:
+             raise ValueError("This file has been force-flushed, can only close")
+         out = self.buffer.write(data)
+         self.loc += out
+         if self.buffer.tell() >= self.blocksize:
+             await self.flush()
+         return out
+
+     async def close(self):
+         """Close file
+
+         Finalizes writes, discards cache
+         """
+         if getattr(self, "_unclosable", False):
+             return
+         if self.closed:
+             return
+         if self.mode == "rb":
+             self.cache = None
+         else:
+             if not self.forced:
+                 await self.flush(force=True)
+
+             if self.fs is not None:
+                 self.fs.invalidate_cache(self.path)
+                 self.fs.invalidate_cache(self.fs._parent(self.path))
+
+         self.closed = True
+
+     async def flush(self, force=False):
+         if self.closed:
+             raise ValueError("Flush on closed file")
+         if force and self.forced:
+             raise ValueError("Force flush cannot be called more than once")
+         if force:
+             self.forced = True
+
+         if self.mode not in {"wb", "ab"}:
+             # no-op to flush on read-mode
+             return
+
+         if not force and self.buffer.tell() < self.blocksize:
+             # Defer write on small block
+             return
+
+         if self.offset is None:
+             # Initialize a multipart upload
+             self.offset = 0
+             try:
+                 await self._initiate_upload()
+             except:
+                 self.closed = True
+                 raise
+
+         if await self._upload_chunk(final=force) is not False:
+             self.offset += self.buffer.seek(0, 2)
+             self.buffer = io.BytesIO()
+
+     async def __aenter__(self):
+         return self
+
+     async def __aexit__(self, exc_type, exc_val, exc_tb):
+         await self.close()
+
+     async def _fetch_range(self, start, end):
+         raise NotImplementedError
+
+     async def _initiate_upload(self):
+         pass
+
+     async def _upload_chunk(self, final=False):
+         raise NotImplementedError
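
The sync()/get_loop() pair above is the bridge that lets async filesystem methods be called from blocking code: coroutines are submitted with run_coroutine_threadsafe to a daemon-thread event loop (named "fsspecIO") and the caller waits on a threading.Event. A minimal sketch of driving an arbitrary coroutine through it (the add coroutine is hypothetical):

    import asyncio
    import fsspec.asyn

    async def add(a, b):
        await asyncio.sleep(0)  # stand-in for real async I/O
        return a + b

    loop = fsspec.asyn.get_loop()  # shared loop running on the fsspecIO thread
    print(fsspec.asyn.sync(loop, add, 1, 2))  # blocks until the coroutine finishes -> 3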
phivenv/Lib/site-packages/fsspec/caching.py ADDED
@@ -0,0 +1,1004 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import collections
4
+ import functools
5
+ import logging
6
+ import math
7
+ import os
8
+ import threading
9
+ import warnings
10
+ from collections import OrderedDict
11
+ from concurrent.futures import Future, ThreadPoolExecutor
12
+ from itertools import groupby
13
+ from operator import itemgetter
14
+ from typing import (
15
+ TYPE_CHECKING,
16
+ Any,
17
+ Callable,
18
+ ClassVar,
19
+ Generic,
20
+ NamedTuple,
21
+ TypeVar,
22
+ )
23
+
24
+ if TYPE_CHECKING:
25
+ import mmap
26
+
27
+ from typing_extensions import ParamSpec
28
+
29
+ P = ParamSpec("P")
30
+ else:
31
+ P = TypeVar("P")
32
+
33
+ T = TypeVar("T")
34
+
35
+
36
+ logger = logging.getLogger("fsspec")
37
+
38
+ Fetcher = Callable[[int, int], bytes] # Maps (start, end) to bytes
39
+ MultiFetcher = Callable[[list[int, int]], bytes] # Maps [(start, end)] to bytes
40
+
41
+
42
+ class BaseCache:
43
+ """Pass-though cache: doesn't keep anything, calls every time
44
+
45
+ Acts as base class for other cachers
46
+
47
+ Parameters
48
+ ----------
49
+ blocksize: int
50
+ How far to read ahead in numbers of bytes
51
+ fetcher: func
52
+ Function of the form f(start, end) which gets bytes from remote as
53
+ specified
54
+ size: int
55
+ How big this file is
56
+ """
57
+
58
+ name: ClassVar[str] = "none"
59
+
60
+ def __init__(self, blocksize: int, fetcher: Fetcher, size: int) -> None:
61
+ self.blocksize = blocksize
62
+ self.nblocks = 0
63
+ self.fetcher = fetcher
64
+ self.size = size
65
+ self.hit_count = 0
66
+ self.miss_count = 0
67
+ # the bytes that we actually requested
68
+ self.total_requested_bytes = 0
69
+
70
+ def _fetch(self, start: int | None, stop: int | None) -> bytes:
71
+ if start is None:
72
+ start = 0
73
+ if stop is None:
74
+ stop = self.size
75
+ if start >= self.size or start >= stop:
76
+ return b""
77
+ return self.fetcher(start, stop)
78
+
79
+ def _reset_stats(self) -> None:
80
+ """Reset hit and miss counts for a more ganular report e.g. by file."""
81
+ self.hit_count = 0
82
+ self.miss_count = 0
83
+ self.total_requested_bytes = 0
84
+
85
+ def _log_stats(self) -> str:
86
+ """Return a formatted string of the cache statistics."""
87
+ if self.hit_count == 0 and self.miss_count == 0:
88
+ # a cache that does nothing, this is for logs only
89
+ return ""
90
+ return f" , {self.name}: {self.hit_count} hits, {self.miss_count} misses, {self.total_requested_bytes} total requested bytes"
91
+
92
+ def __repr__(self) -> str:
93
+ # TODO: use rich for better formatting
94
+ return f"""
95
+ <{self.__class__.__name__}:
96
+ block size : {self.blocksize}
97
+ block count : {self.nblocks}
98
+ file size : {self.size}
99
+ cache hits : {self.hit_count}
100
+ cache misses: {self.miss_count}
101
+ total requested bytes: {self.total_requested_bytes}>
102
+ """
103
+
104
+
105
+ class MMapCache(BaseCache):
106
+ """memory-mapped sparse file cache
107
+
108
+ Opens a temporary file, which is filled block-wise as data is requested.
109
+ Ensure there is enough disk space in the temporary location.
110
+
111
+ This cache method may only work on POSIX systems.
112
+
113
+ Parameters
114
+ ----------
115
+ blocksize: int
116
+ How far to read ahead in numbers of bytes
117
+ fetcher: Fetcher
118
+ Function of the form f(start, end) which gets bytes from remote as
119
+ specified
120
+ size: int
121
+ How big this file is
122
+ location: str
123
+ Where to create the temporary file. If None, a temporary file is
124
+ created using tempfile.TemporaryFile().
125
+ blocks: set[int]
126
+ Set of block numbers that have already been fetched. If None, an empty
127
+ set is created.
128
+ multi_fetcher: MultiFetcher
129
+ Function of the form f([(start, end)]) which gets bytes from remote
130
+ as specified. This function is used to fetch multiple blocks at once.
131
+ If not specified, the fetcher function is used instead.
132
+ """
133
+
134
+ name = "mmap"
135
+
136
+ def __init__(
137
+ self,
138
+ blocksize: int,
139
+ fetcher: Fetcher,
140
+ size: int,
141
+ location: str | None = None,
142
+ blocks: set[int] | None = None,
143
+ multi_fetcher: MultiFetcher | None = None,
144
+ ) -> None:
145
+ super().__init__(blocksize, fetcher, size)
146
+ self.blocks = set() if blocks is None else blocks
147
+ self.location = location
148
+ self.multi_fetcher = multi_fetcher
149
+ self.cache = self._makefile()
150
+
151
+ def _makefile(self) -> mmap.mmap | bytearray:
152
+ import mmap
153
+ import tempfile
154
+
155
+ if self.size == 0:
156
+ return bytearray()
157
+
158
+ # posix version
159
+ if self.location is None or not os.path.exists(self.location):
160
+ if self.location is None:
161
+ fd = tempfile.TemporaryFile()
162
+ self.blocks = set()
163
+ else:
164
+ fd = open(self.location, "wb+")
165
+ fd.seek(self.size - 1)
166
+ fd.write(b"1")
167
+ fd.flush()
168
+ else:
169
+ fd = open(self.location, "r+b")
170
+
171
+ return mmap.mmap(fd.fileno(), self.size)
172
+
173
+ def _fetch(self, start: int | None, end: int | None) -> bytes:
174
+ logger.debug(f"MMap cache fetching {start}-{end}")
175
+ if start is None:
176
+ start = 0
177
+ if end is None:
178
+ end = self.size
179
+ if start >= self.size or start >= end:
180
+ return b""
181
+ start_block = start // self.blocksize
182
+ end_block = end // self.blocksize
183
+ block_range = range(start_block, end_block + 1)
184
+ # Determine which blocks need to be fetched. This sequence is sorted by construction.
185
+ need = (i for i in block_range if i not in self.blocks)
186
+ # Count the number of blocks already cached
187
+ self.hit_count += sum(1 for i in block_range if i in self.blocks)
188
+
189
+ ranges = []
190
+
191
+ # Consolidate needed blocks.
192
+ # Algorithm adapted from Python 2.x itertools documentation.
193
+ # We are grouping an enumerated sequence of needed blocks. The key computes
194
+ # the difference between an ascending counter (provided by enumerate) and
195
+ # each needed block number; within a run of consecutive block numbers this
196
+ # difference is constant, and it changes whenever the sequence skips a value
197
+ # (i.e., a block was already cached), starting a new group. This neatly
198
+ # groups runs of consecutive block numbers so they can be fetched together.
199
+ for _, _blocks in groupby(enumerate(need), key=lambda x: x[0] - x[1]):
200
+ # Extract the blocks from the enumerated sequence
201
+ _blocks = tuple(map(itemgetter(1), _blocks))
202
+ # Compute start of first block
203
+ sstart = _blocks[0] * self.blocksize
204
+ # Compute the end of the last block. Last block may not be full size.
205
+ send = min(_blocks[-1] * self.blocksize + self.blocksize, self.size)
206
+
207
+ # Fetch bytes (could be multiple consecutive blocks)
208
+ self.total_requested_bytes += send - sstart
209
+ logger.debug(
210
+ f"MMap get blocks {_blocks[0]}-{_blocks[-1]} ({sstart}-{send})"
211
+ )
212
+ ranges.append((sstart, send))
213
+
214
+ # Update set of cached blocks
215
+ self.blocks.update(_blocks)
216
+ # Update cache statistics with number of blocks we had to cache
217
+ self.miss_count += len(_blocks)
218
+
219
+ if not ranges:
220
+ return self.cache[start:end]
221
+
222
+ if self.multi_fetcher:
223
+ logger.debug(f"MMap get blocks {ranges}")
224
+ for idx, r in enumerate(self.multi_fetcher(ranges)):
225
+ (sstart, send) = ranges[idx]
226
+ logger.debug(f"MMap copy block ({sstart}-{send}")
227
+ self.cache[sstart:send] = r
228
+ else:
229
+ for sstart, send in ranges:
230
+ logger.debug(f"MMap get block ({sstart}-{send}")
231
+ self.cache[sstart:send] = self.fetcher(sstart, send)
232
+
233
+ return self.cache[start:end]
234
+
235
+ def __getstate__(self) -> dict[str, Any]:
236
+ state = self.__dict__.copy()
237
+ # Remove the unpicklable entries.
238
+ del state["cache"]
239
+ return state
240
+
241
+ def __setstate__(self, state: dict[str, Any]) -> None:
242
+ # Restore instance attributes
243
+ self.__dict__.update(state)
244
+ self.cache = self._makefile()
245
+
246
+
247
+ class ReadAheadCache(BaseCache):
248
+ """Cache which reads only when we get beyond a block of data
249
+
250
+ This is a much simpler version of BytesCache, and does not attempt to
251
+ fill holes in the cache or keep fragments alive. It is best suited to
252
+ many small reads in a sequential order (e.g., reading lines from a file).
253
+ """
254
+
255
+ name = "readahead"
256
+
257
+ def __init__(self, blocksize: int, fetcher: Fetcher, size: int) -> None:
258
+ super().__init__(blocksize, fetcher, size)
259
+ self.cache = b""
260
+ self.start = 0
261
+ self.end = 0
262
+
263
+ def _fetch(self, start: int | None, end: int | None) -> bytes:
264
+ if start is None:
265
+ start = 0
266
+ if end is None or end > self.size:
267
+ end = self.size
268
+ if start >= self.size or start >= end:
269
+ return b""
270
+ l = end - start
271
+ if start >= self.start and end <= self.end:
272
+ # cache hit
273
+ self.hit_count += 1
274
+ return self.cache[start - self.start : end - self.start]
275
+ elif self.start <= start < self.end:
276
+ # partial hit
277
+ self.miss_count += 1
278
+ part = self.cache[start - self.start :]
279
+ l -= len(part)
280
+ start = self.end
281
+ else:
282
+ # miss
283
+ self.miss_count += 1
284
+ part = b""
285
+ end = min(self.size, end + self.blocksize)
286
+ self.total_requested_bytes += end - start
287
+ self.cache = self.fetcher(start, end) # new block replaces old
288
+ self.start = start
289
+ self.end = self.start + len(self.cache)
290
+ return part + self.cache[:l]
291
+
292
+
293
+ class FirstChunkCache(BaseCache):
294
+ """Caches the first block of a file only
295
+
296
+ This may be useful for file types where the metadata is stored in the header,
297
+ but is randomly accessed.
298
+ """
299
+
300
+ name = "first"
301
+
302
+ def __init__(self, blocksize: int, fetcher: Fetcher, size: int) -> None:
303
+ if blocksize > size:
304
+ # this will buffer the whole thing
305
+ blocksize = size
306
+ super().__init__(blocksize, fetcher, size)
307
+ self.cache: bytes | None = None
308
+
309
+ def _fetch(self, start: int | None, end: int | None) -> bytes:
310
+ start = start or 0
311
+ if start > self.size:
312
+ logger.debug("FirstChunkCache: requested start > file size")
313
+ return b""
314
+
315
+ end = self.size if end is None else min(end, self.size)
316
+
317
+ if start < self.blocksize:
318
+ if self.cache is None:
319
+ self.miss_count += 1
320
+ if end > self.blocksize:
321
+ self.total_requested_bytes += end
322
+ data = self.fetcher(0, end)
323
+ self.cache = data[: self.blocksize]
324
+ return data[start:]
325
+ self.cache = self.fetcher(0, self.blocksize)
326
+ self.total_requested_bytes += self.blocksize
327
+ part = self.cache[start:end]
328
+ if end > self.blocksize:
329
+ self.total_requested_bytes += end - self.blocksize
330
+ part += self.fetcher(self.blocksize, end)
331
+ self.hit_count += 1
332
+ return part
333
+ else:
334
+ self.miss_count += 1
335
+ self.total_requested_bytes += end - start
336
+ return self.fetcher(start, end)
337
+
338
+
339
+ class BlockCache(BaseCache):
340
+ """
341
+ Cache holding memory as a set of blocks.
342
+
343
+ Requests are only ever made ``blocksize`` at a time, and are
344
+ stored in an LRU cache. The least recently accessed block is
345
+ discarded when more than ``maxblocks`` are stored.
346
+
347
+ Parameters
348
+ ----------
349
+ blocksize : int
350
+ The number of bytes to store in each block.
351
+ Requests are only ever made for ``blocksize``, so this
352
+ should balance the overhead of making a request against
353
+ the granularity of the blocks.
354
+ fetcher : Callable
355
+ size : int
356
+ The total size of the file being cached.
357
+ maxblocks : int
358
+ The maximum number of blocks to cache. The maximum memory
359
+ use for this cache is then ``blocksize * maxblocks``.
360
+ """
361
+
362
+ name = "blockcache"
363
+
364
+ def __init__(
365
+ self, blocksize: int, fetcher: Fetcher, size: int, maxblocks: int = 32
366
+ ) -> None:
367
+ super().__init__(blocksize, fetcher, size)
368
+ self.nblocks = math.ceil(size / blocksize)
369
+ self.maxblocks = maxblocks
370
+ self._fetch_block_cached = functools.lru_cache(maxblocks)(self._fetch_block)
371
+
372
+ def cache_info(self):
373
+ """
374
+ The statistics on the block cache.
375
+
376
+ Returns
377
+ -------
378
+ NamedTuple
379
+ Returned directly from the LRU Cache used internally.
380
+ """
381
+ return self._fetch_block_cached.cache_info()
382
+
383
+ def __getstate__(self) -> dict[str, Any]:
384
+ state = self.__dict__.copy()
385
+ del state["_fetch_block_cached"]
386
+ return state
387
+
388
+ def __setstate__(self, state: dict[str, Any]) -> None:
389
+ self.__dict__.update(state)
390
+ self._fetch_block_cached = functools.lru_cache(state["maxblocks"])(
391
+ self._fetch_block
392
+ )
393
+
394
+ def _fetch(self, start: int | None, end: int | None) -> bytes:
395
+ if start is None:
396
+ start = 0
397
+ if end is None:
398
+ end = self.size
399
+ if start >= self.size or start >= end:
400
+ return b""
401
+
402
+ # byte position -> block numbers
403
+ start_block_number = start // self.blocksize
404
+ end_block_number = end // self.blocksize
405
+
406
+ # these are cached, so safe to do multiple calls for the same start and end.
407
+ for block_number in range(start_block_number, end_block_number + 1):
408
+ self._fetch_block_cached(block_number)
409
+
410
+ return self._read_cache(
411
+ start,
412
+ end,
413
+ start_block_number=start_block_number,
414
+ end_block_number=end_block_number,
415
+ )
416
+
417
+ def _fetch_block(self, block_number: int) -> bytes:
418
+ """
419
+ Fetch the block of data for `block_number`.
420
+ """
421
+ if block_number > self.nblocks:
422
+ raise ValueError(
423
+ f"'block_number={block_number}' is greater than "
424
+ f"the number of blocks ({self.nblocks})"
425
+ )
426
+
427
+ start = block_number * self.blocksize
428
+ end = start + self.blocksize
429
+ self.total_requested_bytes += end - start
430
+ self.miss_count += 1
431
+ logger.info("BlockCache fetching block %d", block_number)
432
+ block_contents = super()._fetch(start, end)
433
+ return block_contents
434
+
435
+ def _read_cache(
436
+ self, start: int, end: int, start_block_number: int, end_block_number: int
437
+ ) -> bytes:
438
+ """
439
+ Read from our block cache.
440
+
441
+ Parameters
442
+ ----------
443
+ start, end : int
444
+ The start and end byte positions.
445
+ start_block_number, end_block_number : int
446
+ The start and end block numbers.
447
+ """
448
+ start_pos = start % self.blocksize
449
+ end_pos = end % self.blocksize
450
+
451
+ self.hit_count += 1
452
+ if start_block_number == end_block_number:
453
+ block: bytes = self._fetch_block_cached(start_block_number)
454
+ return block[start_pos:end_pos]
455
+
456
+ else:
457
+ # read from the initial
458
+ out = [self._fetch_block_cached(start_block_number)[start_pos:]]
459
+
460
+ # intermediate blocks
461
+ # Note: it'd be nice to combine these into one big request. However
462
+ # that doesn't play nicely with our LRU cache.
463
+ out.extend(
464
+ map(
465
+ self._fetch_block_cached,
466
+ range(start_block_number + 1, end_block_number),
467
+ )
468
+ )
469
+
470
+ # final block
471
+ out.append(self._fetch_block_cached(end_block_number)[:end_pos])
472
+
473
+ return b"".join(out)
474
+
475
+
476
+ class BytesCache(BaseCache):
477
+ """Cache which holds data in a in-memory bytes object
478
+
479
+ Implements read-ahead by the block size, for semi-random reads progressing
480
+ through the file.
481
+
482
+ Parameters
483
+ ----------
484
+ trim: bool
485
+ As we read more data, whether to discard the start of the buffer when
486
+ we are more than a blocksize ahead of it.
487
+ """
488
+
489
+ name: ClassVar[str] = "bytes"
490
+
491
+ def __init__(
492
+ self, blocksize: int, fetcher: Fetcher, size: int, trim: bool = True
493
+ ) -> None:
494
+ super().__init__(blocksize, fetcher, size)
495
+ self.cache = b""
496
+ self.start: int | None = None
497
+ self.end: int | None = None
498
+ self.trim = trim
499
+
500
+ def _fetch(self, start: int | None, end: int | None) -> bytes:
501
+ # TODO: only set start/end after fetch, in case it fails?
502
+ # is this where retry logic might go?
503
+ if start is None:
504
+ start = 0
505
+ if end is None:
506
+ end = self.size
507
+ if start >= self.size or start >= end:
508
+ return b""
509
+ if (
510
+ self.start is not None
511
+ and start >= self.start
512
+ and self.end is not None
513
+ and end < self.end
514
+ ):
515
+ # cache hit: we have all the required data
516
+ offset = start - self.start
517
+ self.hit_count += 1
518
+ return self.cache[offset : offset + end - start]
519
+
520
+ if self.blocksize:
521
+ bend = min(self.size, end + self.blocksize)
522
+ else:
523
+ bend = end
524
+
525
+ if bend == start or start > self.size:
526
+ return b""
527
+
528
+ if (self.start is None or start < self.start) and (
529
+ self.end is None or end > self.end
530
+ ):
531
+ # First read, or extending both before and after
532
+ self.total_requested_bytes += bend - start
533
+ self.miss_count += 1
534
+ self.cache = self.fetcher(start, bend)
535
+ self.start = start
536
+ else:
537
+ assert self.start is not None
538
+ assert self.end is not None
539
+ self.miss_count += 1
540
+
541
+ if start < self.start:
542
+ if self.end is None or self.end - end > self.blocksize:
543
+ self.total_requested_bytes += bend - start
544
+ self.cache = self.fetcher(start, bend)
545
+ self.start = start
546
+ else:
547
+ self.total_requested_bytes += self.start - start
548
+ new = self.fetcher(start, self.start)
549
+ self.start = start
550
+ self.cache = new + self.cache
551
+ elif self.end is not None and bend > self.end:
552
+ if self.end > self.size:
553
+ pass
554
+ elif end - self.end > self.blocksize:
555
+ self.total_requested_bytes += bend - start
556
+ self.cache = self.fetcher(start, bend)
557
+ self.start = start
558
+ else:
559
+ self.total_requested_bytes += bend - self.end
560
+ new = self.fetcher(self.end, bend)
561
+ self.cache = self.cache + new
562
+
563
+ self.end = self.start + len(self.cache)
564
+ offset = start - self.start
565
+ out = self.cache[offset : offset + end - start]
566
+ if self.trim:
567
+ num = (self.end - self.start) // (self.blocksize + 1)
568
+ if num > 1:
569
+ self.start += self.blocksize * num
570
+ self.cache = self.cache[self.blocksize * num :]
571
+ return out
572
+
573
+ def __len__(self) -> int:
574
+ return len(self.cache)
575
+
576
+
577
+ class AllBytes(BaseCache):
578
+ """Cache entire contents of the file"""
579
+
580
+ name: ClassVar[str] = "all"
581
+
582
+ def __init__(
583
+ self,
584
+ blocksize: int | None = None,
585
+ fetcher: Fetcher | None = None,
586
+ size: int | None = None,
587
+ data: bytes | None = None,
588
+ ) -> None:
589
+ super().__init__(blocksize, fetcher, size) # type: ignore[arg-type]
590
+ if data is None:
591
+ self.miss_count += 1
592
+ self.total_requested_bytes += self.size
593
+ data = self.fetcher(0, self.size)
594
+ self.data = data
595
+
596
+ def _fetch(self, start: int | None, stop: int | None) -> bytes:
597
+ self.hit_count += 1
598
+ return self.data[start:stop]
599
+
600
+
601
+ class KnownPartsOfAFile(BaseCache):
602
+ """
603
+ Cache holding known file parts.
604
+
605
+ Parameters
606
+ ----------
607
+ blocksize: int
608
+ How far to read ahead in numbers of bytes
609
+ fetcher: func
610
+ Function of the form f(start, end) which gets bytes from remote as
611
+ specified
612
+ size: int
613
+ How big this file is
614
+ data: dict
615
+ A dictionary mapping explicit `(start, stop)` file-offset tuples
616
+ to known bytes.
617
+ strict: bool, default True
618
+ Whether to fetch reads that go beyond a known byte-range boundary.
619
+ If `False`, any read that ends outside a known part will be zero
620
+ padded. Note that zero padding will not be used for reads that
621
+ begin outside a known byte-range.
622
+ """
623
+
624
+ name: ClassVar[str] = "parts"
625
+
626
+ def __init__(
627
+ self,
628
+ blocksize: int,
629
+ fetcher: Fetcher,
630
+ size: int,
631
+ data: dict[tuple[int, int], bytes] | None = None,
632
+ strict: bool = True,
633
+ **_: Any,
634
+ ):
635
+ super().__init__(blocksize, fetcher, size)
636
+ self.strict = strict
637
+
638
+ # simple consolidation of contiguous blocks
639
+ if data:
640
+ old_offsets = sorted(data.keys())
641
+ offsets = [old_offsets[0]]
642
+ blocks = [data.pop(old_offsets[0])]
643
+ for start, stop in old_offsets[1:]:
644
+ start0, stop0 = offsets[-1]
645
+ if start == stop0:
646
+ offsets[-1] = (start0, stop)
647
+ blocks[-1] += data.pop((start, stop))
648
+ else:
649
+ offsets.append((start, stop))
650
+ blocks.append(data.pop((start, stop)))
651
+
652
+ self.data = dict(zip(offsets, blocks))
653
+ else:
654
+ self.data = {}
655
+
656
+ def _fetch(self, start: int | None, stop: int | None) -> bytes:
657
+ if start is None:
658
+ start = 0
659
+ if stop is None:
660
+ stop = self.size
661
+
662
+ out = b""
663
+ for (loc0, loc1), data in self.data.items():
664
+ # If self.strict=False, use zero-padded data
665
+ # for reads beyond the end of a "known" buffer
666
+ if loc0 <= start < loc1:
667
+ off = start - loc0
668
+ out = data[off : off + stop - start]
669
+ if not self.strict or loc0 <= stop <= loc1:
670
+ # The request is within a known range, or
671
+ # it begins within a known range, and we
672
+ # are allowed to pad reads beyond the
673
+ # buffer with zero
674
+ out += b"\x00" * (stop - start - len(out))
675
+ self.hit_count += 1
676
+ return out
677
+ else:
678
+ # The request ends outside a known range,
679
+ # and we are being "strict" about reads
680
+ # beyond the buffer
681
+ start = loc1
682
+ break
683
+
684
+ # We only get here if there is a request outside the
685
+ # known parts of the file. In an ideal world, this
686
+ # should never happen
687
+ if self.fetcher is None:
688
+ # We cannot fetch the data, so raise an error
689
+ raise ValueError(f"Read is outside the known file parts: {(start, stop)}. ")
690
+ # We can fetch the data, but should warn the user
691
+ # that this may be slow
692
+ warnings.warn(
693
+ f"Read is outside the known file parts: {(start, stop)}. "
694
+ f"IO/caching performance may be poor!"
695
+ )
696
+ logger.debug(f"KnownPartsOfAFile cache fetching {start}-{stop}")
697
+ self.total_requested_bytes += stop - start
698
+ self.miss_count += 1
699
+ return out + super()._fetch(start, stop)
700
+
701
+
702
+ class UpdatableLRU(Generic[P, T]):
703
+ """
704
+ Custom implementation of LRU cache that allows updating keys
705
+
706
+ Used by BackgroundBlockCache
707
+ """
708
+
709
+ class CacheInfo(NamedTuple):
710
+ hits: int
711
+ misses: int
712
+ maxsize: int
713
+ currsize: int
714
+
715
+ def __init__(self, func: Callable[P, T], max_size: int = 128) -> None:
716
+ self._cache: OrderedDict[Any, T] = collections.OrderedDict()
717
+ self._func = func
718
+ self._max_size = max_size
719
+ self._hits = 0
720
+ self._misses = 0
721
+ self._lock = threading.Lock()
722
+
723
+ def __call__(self, *args: P.args, **kwargs: P.kwargs) -> T:
724
+ if kwargs:
725
+ raise TypeError(f"Got unexpected keyword argument {kwargs.keys()}")
726
+ with self._lock:
727
+ if args in self._cache:
728
+ self._cache.move_to_end(args)
729
+ self._hits += 1
730
+ return self._cache[args]
731
+
732
+ result = self._func(*args, **kwargs)
733
+
734
+ with self._lock:
735
+ self._cache[args] = result
736
+ self._misses += 1
737
+ if len(self._cache) > self._max_size:
738
+ self._cache.popitem(last=False)
739
+
740
+ return result
741
+
742
+ def is_key_cached(self, *args: Any) -> bool:
743
+ with self._lock:
744
+ return args in self._cache
745
+
746
+ def add_key(self, result: T, *args: Any) -> None:
747
+ with self._lock:
748
+ self._cache[args] = result
749
+ if len(self._cache) > self._max_size:
750
+ self._cache.popitem(last=False)
751
+
752
+ def cache_info(self) -> UpdatableLRU.CacheInfo:
753
+ with self._lock:
754
+ return self.CacheInfo(
755
+ maxsize=self._max_size,
756
+ currsize=len(self._cache),
757
+ hits=self._hits,
758
+ misses=self._misses,
759
+ )
760
+
761
+
762
+ class BackgroundBlockCache(BaseCache):
763
+ """
764
+ Cache holding memory as a set of blocks with pre-loading of
765
+ the next block in the background.
766
+
767
+ Requests are only ever made ``blocksize`` at a time, and are
768
+ stored in an LRU cache. The least recently accessed block is
769
+ discarded when more than ``maxblocks`` are stored. If the
770
+ next block is not in cache, it is loaded in a separate thread
771
+ in a non-blocking way.
772
+
773
+ Parameters
774
+ ----------
775
+ blocksize : int
776
+ The number of bytes to store in each block.
777
+ Requests are only ever made for ``blocksize``, so this
778
+ should balance the overhead of making a request against
779
+ the granularity of the blocks.
780
+ fetcher : Callable
781
+ size : int
782
+ The total size of the file being cached.
783
+ maxblocks : int
784
+ The maximum number of blocks to cache. The maximum memory
785
+ use for this cache is then ``blocksize * maxblocks``.
786
+ """
787
+
788
+ name: ClassVar[str] = "background"
789
+
790
+ def __init__(
791
+ self, blocksize: int, fetcher: Fetcher, size: int, maxblocks: int = 32
792
+ ) -> None:
793
+ super().__init__(blocksize, fetcher, size)
794
+ self.nblocks = math.ceil(size / blocksize)
795
+ self.maxblocks = maxblocks
796
+ self._fetch_block_cached = UpdatableLRU(self._fetch_block, maxblocks)
797
+
798
+ self._thread_executor = ThreadPoolExecutor(max_workers=1)
799
+ self._fetch_future_block_number: int | None = None
800
+ self._fetch_future: Future[bytes] | None = None
801
+ self._fetch_future_lock = threading.Lock()
802
+
803
+ def cache_info(self) -> UpdatableLRU.CacheInfo:
804
+ """
805
+ The statistics on the block cache.
806
+
807
+ Returns
808
+ -------
809
+ NamedTuple
810
+ Returned directly from the LRU Cache used internally.
811
+ """
812
+ return self._fetch_block_cached.cache_info()
813
+
814
+ def __getstate__(self) -> dict[str, Any]:
815
+ state = self.__dict__.copy()
816
+ del state["_fetch_block_cached"]
817
+ del state["_thread_executor"]
818
+ del state["_fetch_future_block_number"]
819
+ del state["_fetch_future"]
820
+ del state["_fetch_future_lock"]
821
+ return state
822
+
823
+ def __setstate__(self, state) -> None:
824
+ self.__dict__.update(state)
825
+ self._fetch_block_cached = UpdatableLRU(self._fetch_block, state["maxblocks"])
826
+ self._thread_executor = ThreadPoolExecutor(max_workers=1)
827
+ self._fetch_future_block_number = None
828
+ self._fetch_future = None
829
+ self._fetch_future_lock = threading.Lock()
830
+
831
+ def _fetch(self, start: int | None, end: int | None) -> bytes:
832
+ if start is None:
833
+ start = 0
834
+ if end is None:
835
+ end = self.size
836
+ if start >= self.size or start >= end:
837
+ return b""
838
+
839
+ # byte position -> block numbers
840
+ start_block_number = start // self.blocksize
841
+ end_block_number = end // self.blocksize
842
+
843
+ fetch_future_block_number = None
844
+ fetch_future = None
845
+ with self._fetch_future_lock:
846
+ # A background fetch may be running. Check whether we can or must join it.
847
+ if self._fetch_future is not None:
848
+ assert self._fetch_future_block_number is not None
849
+ if self._fetch_future.done():
850
+ logger.info("BlockCache joined background fetch without waiting.")
851
+ self._fetch_block_cached.add_key(
852
+ self._fetch_future.result(), self._fetch_future_block_number
853
+ )
854
+ # Cleanup the fetch variables. Done with fetching the block.
855
+ self._fetch_future_block_number = None
856
+ self._fetch_future = None
857
+ else:
858
+ # Must join if we need the block for the current fetch
859
+ must_join = bool(
860
+ start_block_number
861
+ <= self._fetch_future_block_number
862
+ <= end_block_number
863
+ )
864
+ if must_join:
865
+ # Copy to the local variables to release lock
866
+ # before waiting for result
867
+ fetch_future_block_number = self._fetch_future_block_number
868
+ fetch_future = self._fetch_future
869
+
870
+ # Cleanup the fetch variables. Have a local copy.
871
+ self._fetch_future_block_number = None
872
+ self._fetch_future = None
873
+
874
+ # Need to wait for the future for the current read
875
+ if fetch_future is not None:
876
+ logger.info("BlockCache waiting for background fetch.")
877
+ # Wait until result and put it in cache
878
+ self._fetch_block_cached.add_key(
879
+ fetch_future.result(), fetch_future_block_number
880
+ )
881
+
882
+ # these are cached, so safe to do multiple calls for the same start and end.
883
+ for block_number in range(start_block_number, end_block_number + 1):
884
+ self._fetch_block_cached(block_number)
885
+
886
+ # fetch next block in the background if nothing is running in the background,
887
+ # the block is within file and it is not already cached
888
+ end_block_plus_1 = end_block_number + 1
889
+ with self._fetch_future_lock:
890
+ if (
891
+ self._fetch_future is None
892
+ and end_block_plus_1 <= self.nblocks
893
+ and not self._fetch_block_cached.is_key_cached(end_block_plus_1)
894
+ ):
895
+ self._fetch_future_block_number = end_block_plus_1
896
+ self._fetch_future = self._thread_executor.submit(
897
+ self._fetch_block, end_block_plus_1, "async"
898
+ )
899
+
900
+ return self._read_cache(
901
+ start,
902
+ end,
903
+ start_block_number=start_block_number,
904
+ end_block_number=end_block_number,
905
+ )
906
+
907
+ def _fetch_block(self, block_number: int, log_info: str = "sync") -> bytes:
908
+ """
909
+ Fetch the block of data for `block_number`.
910
+ """
911
+ if block_number > self.nblocks:
912
+ raise ValueError(
913
+ f"'block_number={block_number}' is greater than "
914
+ f"the number of blocks ({self.nblocks})"
915
+ )
916
+
917
+ start = block_number * self.blocksize
918
+ end = start + self.blocksize
919
+ logger.info("BlockCache fetching block (%s) %d", log_info, block_number)
920
+ self.total_requested_bytes += end - start
921
+ self.miss_count += 1
922
+ block_contents = super()._fetch(start, end)
923
+ return block_contents
924
+
925
+ def _read_cache(
926
+ self, start: int, end: int, start_block_number: int, end_block_number: int
927
+ ) -> bytes:
928
+ """
929
+ Read from our block cache.
930
+
931
+ Parameters
932
+ ----------
933
+ start, end : int
934
+ The start and end byte positions.
935
+ start_block_number, end_block_number : int
936
+ The start and end block numbers.
937
+ """
938
+ start_pos = start % self.blocksize
939
+ end_pos = end % self.blocksize
940
+
941
+ # kind of pointless to count this as a hit, but it is
942
+ self.hit_count += 1
943
+
944
+ if start_block_number == end_block_number:
945
+ block = self._fetch_block_cached(start_block_number)
946
+ return block[start_pos:end_pos]
947
+
948
+ else:
949
+ # read from the initial
950
+ out = [self._fetch_block_cached(start_block_number)[start_pos:]]
951
+
952
+ # intermediate blocks
953
+ # Note: it'd be nice to combine these into one big request. However
954
+ # that doesn't play nicely with our LRU cache.
955
+ out.extend(
956
+ map(
957
+ self._fetch_block_cached,
958
+ range(start_block_number + 1, end_block_number),
959
+ )
960
+ )
961
+
962
+ # final block
963
+ out.append(self._fetch_block_cached(end_block_number)[:end_pos])
964
+
965
+ return b"".join(out)
966
+
967
+
968
+ caches: dict[str | None, type[BaseCache]] = {
969
+ # one custom case
970
+ None: BaseCache,
971
+ }
972
+
973
+
974
+ def register_cache(cls: type[BaseCache], clobber: bool = False) -> None:
975
+ """'Register' cache implementation.
976
+
977
+ Parameters
978
+ ----------
979
+ clobber: bool, optional
980
+ If set to True (default is False), allow overwriting an existing
981
+ entry.
982
+
983
+ Raises
984
+ ------
985
+ ValueError
+ If a cache with this name is already registered and ``clobber`` is False.
986
+ """
987
+ name = cls.name
988
+ if not clobber and name in caches:
989
+ raise ValueError(f"Cache with name {name!r} is already known: {caches[name]}")
990
+ caches[name] = cls
991
+
992
+
993
+ for c in (
994
+ BaseCache,
995
+ MMapCache,
996
+ BytesCache,
997
+ ReadAheadCache,
998
+ BlockCache,
999
+ FirstChunkCache,
1000
+ AllBytes,
1001
+ KnownPartsOfAFile,
1002
+ BackgroundBlockCache,
1003
+ ):
1004
+ register_cache(c)
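
The cachers above share one contract: construct with ``(blocksize, fetcher, size)`` and serve reads through ``_fetch``. A minimal sketch exercising BlockCache and the registry, assuming only the classes defined in this file; the in-memory ``data`` and ``fetcher`` are stand-ins for a real backend's range reader, and ``_fetch`` is the internal entry point normally called by file objects:

from fsspec.caching import BlockCache, caches

data = bytes(range(256)) * 64                  # 16 KiB of fake "remote" bytes

def fetcher(start, end):                       # matches the Fetcher alias above
    return data[start:end]

cache = BlockCache(blocksize=4096, fetcher=fetcher, size=len(data), maxblocks=4)
assert cache._fetch(10, 20) == data[10:20]     # served from the first 4 KiB block
print(cache.cache_info())                      # stats from the underlying lru_cache
assert caches["blockcache"] is BlockCache      # looked up via the registry above
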
phivenv/Lib/site-packages/fsspec/callbacks.py ADDED
@@ -0,0 +1,324 @@
1
+ from functools import wraps
2
+
3
+
4
+ class Callback:
5
+ """
6
+ Base class and interface for callback mechanism
7
+
8
+ This class can be used directly for monitoring file transfers by
9
+ providing ``callback=Callback(hooks=...)`` (see the ``hooks`` argument,
10
+ below), or subclassed for more specialised behaviour.
11
+
12
+ Parameters
13
+ ----------
14
+ size: int (optional)
15
+ Nominal quantity for the value that corresponds to a complete
16
+ transfer, e.g., total number of tiles or total number of
17
+ bytes
18
+ value: int (0)
19
+ Starting internal counter value
20
+ hooks: dict or None
21
+ A dict of named functions to be called on each update. The signature
22
+ of these must be ``f(size, value, **kwargs)``
23
+ """
24
+
25
+ def __init__(self, size=None, value=0, hooks=None, **kwargs):
26
+ self.size = size
27
+ self.value = value
28
+ self.hooks = hooks or {}
29
+ self.kw = kwargs
30
+
31
+ def __enter__(self):
32
+ return self
33
+
34
+ def __exit__(self, *exc_args):
35
+ self.close()
36
+
37
+ def close(self):
38
+ """Close callback."""
39
+
40
+ def branched(self, path_1, path_2, **kwargs):
41
+ """
42
+ Return callback for child transfers
43
+
44
+ If this callback is operating at a higher level, e.g., put, which may
45
+ trigger transfers that can also be monitored. The function returns a callback
46
+ that has to be passed to the child method, e.g., put_file,
47
+ as `callback=` argument.
48
+
49
+ The implementation uses `callback.branch` for compatibility.
50
+ When implementing callbacks, it is recommended to override this function instead
51
+ of `branch` and avoid calling `super().branched(...)`.
52
+
53
+ Prefer using this function over `branch`.
54
+
55
+ Parameters
56
+ ----------
57
+ path_1: str
58
+ Child's source path
59
+ path_2: str
60
+ Child's destination path
61
+ **kwargs:
62
+ Arbitrary keyword arguments
63
+
64
+ Returns
65
+ -------
66
+ callback: Callback
67
+ A callback instance to be passed to the child method
68
+ """
69
+ self.branch(path_1, path_2, kwargs)
70
+ # mutate kwargs so that we can force the caller to pass "callback=" explicitly
71
+ return kwargs.pop("callback", DEFAULT_CALLBACK)
72
+
73
+ def branch_coro(self, fn):
74
+ """
75
+ Wraps a coroutine, and pass a new child callback to it.
76
+ """
77
+
78
+ @wraps(fn)
79
+ async def func(path1, path2: str, **kwargs):
80
+ with self.branched(path1, path2, **kwargs) as child:
81
+ return await fn(path1, path2, callback=child, **kwargs)
82
+
83
+ return func
84
+
85
+ def set_size(self, size):
86
+ """
87
+ Set the internal maximum size attribute
88
+
89
+ Usually called if not initially set at instantiation. Note that this
90
+ triggers a ``call()``.
91
+
92
+ Parameters
93
+ ----------
94
+ size: int
95
+ """
96
+ self.size = size
97
+ self.call()
98
+
99
+ def absolute_update(self, value):
100
+ """
101
+ Set the internal value state
102
+
103
+ Triggers ``call()``
104
+
105
+ Parameters
106
+ ----------
107
+ value: int
108
+ """
109
+ self.value = value
110
+ self.call()
111
+
112
+ def relative_update(self, inc=1):
113
+ """
114
+ Delta increment the internal counter
115
+
116
+ Triggers ``call()``
117
+
118
+ Parameters
119
+ ----------
120
+ inc: int
121
+ """
122
+ self.value += inc
123
+ self.call()
124
+
125
+ def call(self, hook_name=None, **kwargs):
126
+ """
127
+ Execute hook(s) with current state
128
+
129
+ Each function is passed the internal size and current value
130
+
131
+ Parameters
132
+ ----------
133
+ hook_name: str or None
134
+ If given, execute on this hook
135
+ kwargs: passed on to (all) hook(s)
136
+ """
137
+ if not self.hooks:
138
+ return
139
+ kw = self.kw.copy()
140
+ kw.update(kwargs)
141
+ if hook_name:
142
+ if hook_name not in self.hooks:
143
+ return
144
+ return self.hooks[hook_name](self.size, self.value, **kw)
145
+ for hook in self.hooks.values() or []:
146
+ hook(self.size, self.value, **kw)
147
+
148
+ def wrap(self, iterable):
149
+ """
150
+ Wrap an iterable to call ``relative_update`` on each iterations
151
+
152
+ Parameters
153
+ ----------
154
+ iterable: Iterable
155
+ The iterable that is being wrapped
156
+ """
157
+ for item in iterable:
158
+ self.relative_update()
159
+ yield item
160
+
161
+ def branch(self, path_1, path_2, kwargs):
162
+ """
163
+ Set callbacks for child transfers
164
+
165
+ If this callback is operating at a higher level, e.g., put, which may
166
+ trigger transfers that can also be monitored. The passed kwargs are
167
+ to be *mutated* to add ``callback=``, if this class supports branching
168
+ to children.
169
+
170
+ Parameters
171
+ ----------
172
+ path_1: str
173
+ Child's source path
174
+ path_2: str
175
+ Child's destination path
176
+ kwargs: dict
177
+ arguments passed to child method, e.g., put_file.
178
+
179
+ Returns
180
+ -------
181
+
182
+ """
183
+ return None
184
+
185
+ def no_op(self, *_, **__):
186
+ pass
187
+
188
+ def __getattr__(self, item):
189
+ """
190
+ If undefined methods are called on this class, nothing happens
191
+ """
192
+ return self.no_op
193
+
194
+ @classmethod
195
+ def as_callback(cls, maybe_callback=None):
196
+ """Transform callback=... into Callback instance
197
+
198
+ For the special value of ``None``, return the global instance of
199
+ ``NoOpCallback``. This is an alternative to including
200
+ ``callback=DEFAULT_CALLBACK`` directly in a method signature.
201
+ """
202
+ if maybe_callback is None:
203
+ return DEFAULT_CALLBACK
204
+ return maybe_callback
205
+
206
+
207
+ class NoOpCallback(Callback):
208
+ """
209
+ This implementation of Callback does exactly nothing
210
+ """
211
+
212
+ def call(self, *args, **kwargs):
213
+ return None
214
+
215
+
216
+ class DotPrinterCallback(Callback):
217
+ """
218
+ Simple example Callback implementation
219
+
220
+ Almost identical to Callback with a hook that prints a char; here we
221
+ demonstrate how the outer layer may print "#" and the inner layer "."
222
+ """
223
+
224
+ def __init__(self, chr_to_print="#", **kwargs):
225
+ self.chr = chr_to_print
226
+ super().__init__(**kwargs)
227
+
228
+ def branch(self, path_1, path_2, kwargs):
229
+ """Mutate kwargs to add new instance with different print char"""
230
+ kwargs["callback"] = DotPrinterCallback(".")
231
+
232
+ def call(self, **kwargs):
233
+ """Just outputs a character"""
234
+ print(self.chr, end="")
235
+
236
+
237
+ class TqdmCallback(Callback):
238
+ """
239
+ A callback to display a progress bar using tqdm
240
+
241
+ Parameters
242
+ ----------
243
+ tqdm_kwargs : dict, (optional)
244
+ Any argument accepted by the tqdm constructor.
245
+ See the `tqdm doc <https://tqdm.github.io/docs/tqdm/#__init__>`_.
246
+ Will be forwarded to `tqdm_cls`.
247
+ tqdm_cls: (optional)
248
+ subclass of `tqdm.tqdm`. If not passed, it will default to `tqdm.tqdm`.
249
+
250
+ Examples
251
+ --------
252
+ >>> import fsspec
253
+ >>> from fsspec.callbacks import TqdmCallback
254
+ >>> fs = fsspec.filesystem("memory")
255
+ >>> path2distant_data = "/your-path"
256
+ >>> fs.upload(
257
+ ".",
258
+ path2distant_data,
259
+ recursive=True,
260
+ callback=TqdmCallback(),
261
+ )
262
+
263
+ You can forward args to tqdm using the ``tqdm_kwargs`` parameter.
264
+
265
+ >>> fs.upload(
266
+ ".",
267
+ path2distant_data,
268
+ recursive=True,
269
+ callback=TqdmCallback(tqdm_kwargs={"desc": "Your tqdm description"}),
270
+ )
271
+
272
+ You can also customize the progress bar by passing a subclass of `tqdm`.
273
+
274
+ .. code-block:: python
275
+
276
+ class TqdmFormat(tqdm):
277
+ '''Provides a `total_time` format parameter'''
278
+ @property
279
+ def format_dict(self):
280
+ d = super().format_dict
281
+ total_time = d["elapsed"] * (d["total"] or 0) / max(d["n"], 1)
282
+ d.update(total_time=self.format_interval(total_time) + " in total")
283
+ return d
284
+
285
+ >>> with TqdmCallback(
286
+ tqdm_kwargs={
287
+ "desc": "desc",
288
+ "bar_format": "{total_time}: {percentage:.0f}%|{bar}{r_bar}",
289
+ },
290
+ tqdm_cls=TqdmFormat,
291
+ ) as callback:
292
+ fs.upload(".", path2distant_data, recursive=True, callback=callback)
293
+ """
294
+
295
+ def __init__(self, tqdm_kwargs=None, *args, **kwargs):
296
+ try:
297
+ from tqdm import tqdm
298
+
299
+ except ImportError as exce:
300
+ raise ImportError(
301
+ "Using TqdmCallback requires tqdm to be installed"
302
+ ) from exce
303
+
304
+ self._tqdm_cls = kwargs.pop("tqdm_cls", tqdm)
305
+ self._tqdm_kwargs = tqdm_kwargs or {}
306
+ self.tqdm = None
307
+ super().__init__(*args, **kwargs)
308
+
309
+ def call(self, *args, **kwargs):
310
+ if self.tqdm is None:
311
+ self.tqdm = self._tqdm_cls(total=self.size, **self._tqdm_kwargs)
312
+ self.tqdm.total = self.size
313
+ self.tqdm.update(self.value - self.tqdm.n)
314
+
315
+ def close(self):
316
+ if self.tqdm is not None:
317
+ self.tqdm.close()
318
+ self.tqdm = None
319
+
320
+ def __del__(self):
321
+ return self.close()
322
+
323
+
324
+ DEFAULT_CALLBACK = _DEFAULT_CALLBACK = NoOpCallback()
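
A minimal sketch of the hooks mechanism described in the ``Callback`` docstring above; ``report`` is an illustrative hook name, not part of fsspec, and follows the required ``f(size, value, **kwargs)`` signature:

from fsspec.callbacks import Callback

def report(size, value, **kwargs):        # required hook signature
    print(f"{value}/{size}")

cb = Callback(size=3, hooks={"report": report})
for _ in cb.wrap(["a", "b", "c"]):        # wrap() fires relative_update() per item
    pass                                  # prints 1/3, 2/3, 3/3
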
phivenv/Lib/site-packages/fsspec/compression.py ADDED
@@ -0,0 +1,182 @@
1
+ """Helper functions for a standard streaming compression API"""
2
+
3
+ from zipfile import ZipFile
4
+
5
+ import fsspec.utils
6
+ from fsspec.spec import AbstractBufferedFile
7
+
8
+
9
+ def noop_file(file, mode, **kwargs):
10
+ return file
11
+
12
+
13
+ # TODO: files should also be available as contexts
14
+ # should be functions of the form func(infile, mode=, **kwargs) -> file-like
15
+ compr = {None: noop_file}
16
+
17
+
18
+ def register_compression(name, callback, extensions, force=False):
19
+ """Register an "inferable" file compression type.
20
+
21
+ Registers transparent file compression type for use with fsspec.open.
22
+ Compression can be specified by name in open, or "infer"-ed for any files
23
+ ending with the given extensions.
24
+
25
+ Args:
26
+ name: (str) The compression type name. Eg. "gzip".
27
+ callback: A callable of form (infile, mode, **kwargs) -> file-like.
28
+ Accepts an input file-like object, the target mode and kwargs.
29
+ Returns a wrapped file-like object.
30
+ extensions: (str, Iterable[str]) A file extension, or list of file
31
+ extensions for which to infer this compression scheme. Eg. "gz".
32
+ force: (bool) Force re-registration of compression type or extensions.
33
+
34
+ Raises:
35
+ ValueError: If name or extensions already registered, and not force.
36
+
37
+ """
38
+ if isinstance(extensions, str):
39
+ extensions = [extensions]
40
+
41
+ # Validate registration
42
+ if name in compr and not force:
43
+ raise ValueError(f"Duplicate compression registration: {name}")
44
+
45
+ for ext in extensions:
46
+ if ext in fsspec.utils.compressions and not force:
47
+ raise ValueError(f"Duplicate compression file extension: {ext} ({name})")
48
+
49
+ compr[name] = callback
50
+
51
+ for ext in extensions:
52
+ fsspec.utils.compressions[ext] = name
53
+
54
+
55
+ def unzip(infile, mode="rb", filename=None, **kwargs):
56
+ if "r" not in mode:
57
+ filename = filename or "file"
58
+ z = ZipFile(infile, mode="w", **kwargs)
59
+ fo = z.open(filename, mode="w")
60
+ fo.close = lambda closer=fo.close: closer() or z.close()
61
+ return fo
62
+ z = ZipFile(infile)
63
+ if filename is None:
64
+ filename = z.namelist()[0]
65
+ return z.open(filename, mode="r", **kwargs)
66
+
67
+
68
+ register_compression("zip", unzip, "zip")
69
+
70
+ try:
71
+ from bz2 import BZ2File
72
+ except ImportError:
73
+ pass
74
+ else:
75
+ register_compression("bz2", BZ2File, "bz2")
76
+
77
+ try: # pragma: no cover
78
+ from isal import igzip
79
+
80
+ def isal(infile, mode="rb", **kwargs):
81
+ return igzip.IGzipFile(fileobj=infile, mode=mode, **kwargs)
82
+
83
+ register_compression("gzip", isal, "gz")
84
+ except ImportError:
85
+ from gzip import GzipFile
86
+
87
+ register_compression(
88
+ "gzip", lambda f, **kwargs: GzipFile(fileobj=f, **kwargs), "gz"
89
+ )
90
+
91
+ try:
92
+ from lzma import LZMAFile
93
+
94
+ register_compression("lzma", LZMAFile, "lzma")
95
+ register_compression("xz", LZMAFile, "xz")
96
+ except ImportError:
97
+ pass
98
+
99
+ try:
100
+ import lzmaffi
101
+
102
+ register_compression("lzma", lzmaffi.LZMAFile, "lzma", force=True)
103
+ register_compression("xz", lzmaffi.LZMAFile, "xz", force=True)
104
+ except ImportError:
105
+ pass
106
+
107
+
108
+ class SnappyFile(AbstractBufferedFile):
109
+ def __init__(self, infile, mode, **kwargs):
110
+ import snappy
111
+
112
+ super().__init__(
113
+ fs=None, path="snappy", mode=mode.strip("b") + "b", size=999999999, **kwargs
114
+ )
115
+ self.infile = infile
116
+ if "r" in mode:
117
+ self.codec = snappy.StreamDecompressor()
118
+ else:
119
+ self.codec = snappy.StreamCompressor()
120
+
121
+ def _upload_chunk(self, final=False):
122
+ self.buffer.seek(0)
123
+ out = self.codec.add_chunk(self.buffer.read())
124
+ self.infile.write(out)
125
+ return True
126
+
127
+ def seek(self, loc, whence=0):
128
+ raise NotImplementedError("SnappyFile is not seekable")
129
+
130
+ def seekable(self):
131
+ return False
132
+
133
+ def _fetch_range(self, start, end):
134
+ """Get the specified set of bytes from remote"""
135
+ data = self.infile.read(end - start)
136
+ return self.codec.decompress(data)
137
+
138
+
139
+ try:
140
+ import snappy
141
+
142
+ snappy.compress(b"")
143
+ # Snappy may use the .sz file extension, but this is not part of the
144
+ # standard implementation.
145
+ register_compression("snappy", SnappyFile, [])
146
+
147
+ except (ImportError, NameError, AttributeError):
148
+ pass
149
+
150
+ try:
151
+ import lz4.frame
152
+
153
+ register_compression("lz4", lz4.frame.open, "lz4")
154
+ except ImportError:
155
+ pass
156
+
157
+ try:
158
+ # zstd in the standard library for python >= 3.14
159
+ from compression.zstd import ZstdFile
160
+
161
+ register_compression("zstd", ZstdFile, "zst")
162
+
163
+ except ImportError:
164
+ try:
165
+ import zstandard as zstd
166
+
167
+ def zstandard_file(infile, mode="rb"):
168
+ if "r" in mode:
169
+ cctx = zstd.ZstdDecompressor()
170
+ return cctx.stream_reader(infile)
171
+ else:
172
+ cctx = zstd.ZstdCompressor(level=10)
173
+ return cctx.stream_writer(infile)
174
+
175
+ register_compression("zstd", zstandard_file, "zst")
176
+ except ImportError:
177
+ pass
178
+
179
+
180
+ def available_compressions():
181
+ """Return a list of the implemented compressions."""
182
+ return list(compr)
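
A minimal sketch of ``register_compression`` in use: a pass-through codec under a made-up name and extension (both purely illustrative), which ``fsspec.open`` can then infer from the file suffix:

import fsspec
from fsspec.compression import register_compression

# no-op "codec": returns the file object unchanged, like noop_file above
register_compression("xyz", lambda f, mode="rb", **kw: f, "xyz")

with fsspec.open("memory://demo.txt.xyz", "wb", compression="infer") as f:
    f.write(b"hello")
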
phivenv/Lib/site-packages/fsspec/config.py ADDED
@@ -0,0 +1,131 @@
1
+ from __future__ import annotations
2
+
3
+ import configparser
4
+ import json
5
+ import os
6
+ import warnings
7
+ from typing import Any
8
+
9
+ conf: dict[str, dict[str, Any]] = {}
10
+ default_conf_dir = os.path.join(os.path.expanduser("~"), ".config/fsspec")
11
+ conf_dir = os.environ.get("FSSPEC_CONFIG_DIR", default_conf_dir)
12
+
13
+
14
+ def set_conf_env(conf_dict, envdict=os.environ):
15
+ """Set config values from environment variables
16
+
17
+ Looks for variables of the form ``FSSPEC_<protocol>`` and
18
+ ``FSSPEC_<protocol>_<kwarg>``. For ``FSSPEC_<protocol>`` the value is parsed
19
+ as a json dictionary and used to ``update`` the config of the
20
+ corresponding protocol. For ``FSSPEC_<protocol>_<kwarg>`` there is no
21
+ attempt to convert the string value, but the kwarg keys will be lower-cased.
22
+
23
+ The ``FSSPEC_<protocol>_<kwarg>`` variables are applied after the
24
+ ``FSSPEC_<protocol>`` ones.
25
+
26
+ Parameters
27
+ ----------
28
+ conf_dict : dict(str, dict)
29
+ This dict will be mutated
30
+ envdict : dict-like(str, str)
31
+ Source for the values - usually the real environment
32
+ """
33
+ kwarg_keys = []
34
+ for key in envdict:
35
+ if key.startswith("FSSPEC_") and len(key) > 7 and key[7] != "_":
36
+ if key.count("_") > 1:
37
+ kwarg_keys.append(key)
38
+ continue
39
+ try:
40
+ value = json.loads(envdict[key])
41
+ except json.decoder.JSONDecodeError as ex:
42
+ warnings.warn(
43
+ f"Ignoring environment variable {key} due to a parse failure: {ex}"
44
+ )
45
+ else:
46
+ if isinstance(value, dict):
47
+ _, proto = key.split("_", 1)
48
+ conf_dict.setdefault(proto.lower(), {}).update(value)
49
+ else:
50
+ warnings.warn(
51
+ f"Ignoring environment variable {key} due to not being a dict:"
52
+ f" {type(value)}"
53
+ )
54
+ elif key.startswith("FSSPEC"):
55
+ warnings.warn(
56
+ f"Ignoring environment variable {key} due to having an unexpected name"
57
+ )
58
+
59
+ for key in kwarg_keys:
60
+ _, proto, kwarg = key.split("_", 2)
61
+ conf_dict.setdefault(proto.lower(), {})[kwarg.lower()] = envdict[key]
62
+
63
+
64
+ def set_conf_files(cdir, conf_dict):
65
+ """Set config values from files
66
+
67
+ Scans for INI and JSON files in the given directory, and uses their
68
+ contents to set the config. In case of repeated values, later values
69
+ win.
70
+
71
+ In the case of INI files, all values are strings, and these will not
72
+ be converted.
73
+
74
+ Parameters
75
+ ----------
76
+ cdir : str
77
+ Directory to search
78
+ conf_dict : dict(str, dict)
79
+ This dict will be mutated
80
+ """
81
+ if not os.path.isdir(cdir):
82
+ return
83
+ allfiles = sorted(os.listdir(cdir))
84
+ for fn in allfiles:
85
+ if fn.endswith(".ini"):
86
+ ini = configparser.ConfigParser()
87
+ ini.read(os.path.join(cdir, fn))
88
+ for key in ini:
89
+ if key == "DEFAULT":
90
+ continue
91
+ conf_dict.setdefault(key, {}).update(dict(ini[key]))
92
+ if fn.endswith(".json"):
93
+ with open(os.path.join(cdir, fn)) as f:
94
+ js = json.load(f)
95
+ for key in js:
96
+ conf_dict.setdefault(key, {}).update(dict(js[key]))
97
+
98
+
99
+ def apply_config(cls, kwargs, conf_dict=None):
100
+ """Supply default values for kwargs when instantiating class
101
+
102
+ Augments the passed kwargs, by finding entries in the config dict
103
+ which match the class's ``.protocol`` attribute (one or more str)
104
+
105
+ Parameters
106
+ ----------
107
+ cls : file system implementation
108
+ kwargs : dict
109
+ conf_dict : dict of dict
110
+ Typically this is the global configuration
111
+
112
+ Returns
113
+ -------
114
+ dict : the modified set of kwargs
115
+ """
116
+ if conf_dict is None:
117
+ conf_dict = conf
118
+ protos = cls.protocol if isinstance(cls.protocol, (tuple, list)) else [cls.protocol]
119
+ kw = {}
120
+ for proto in protos:
121
+ # default kwargs from the current state of the config
122
+ if proto in conf_dict:
123
+ kw.update(conf_dict[proto])
124
+ # explicit kwargs always win
125
+ kw.update(**kwargs)
126
+ kwargs = kw
127
+ return kwargs
128
+
129
+
130
+ set_conf_files(conf_dir, conf)
131
+ set_conf_env(conf)
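
A minimal sketch of the precedence rules documented in ``set_conf_env``: the JSON-valued ``FSSPEC_<protocol>`` variable is applied first, then the per-kwarg string variables override it. The ``gcs`` values are illustrative, and the env dict is passed explicitly rather than mutating ``os.environ``:

from fsspec.config import set_conf_env

conf = {}
env = {
    "FSSPEC_GCS": '{"project": "demo", "token": "anon"}',  # parsed as JSON
    "FSSPEC_GCS_TOKEN": "cloud",                           # applied second, wins
}
set_conf_env(conf, envdict=env)
assert conf["gcs"] == {"project": "demo", "token": "cloud"}
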
phivenv/Lib/site-packages/fsspec/conftest.py ADDED
@@ -0,0 +1,55 @@
1
+ import os
2
+ import shutil
3
+ import subprocess
4
+ import sys
5
+ import time
6
+
7
+ import pytest
8
+
9
+ import fsspec
10
+ from fsspec.implementations.cached import CachingFileSystem
11
+
12
+
13
+ @pytest.fixture()
14
+ def m():
15
+ """
16
+ Fixture providing a memory filesystem.
17
+ """
18
+ m = fsspec.filesystem("memory")
19
+ m.store.clear()
20
+ m.pseudo_dirs.clear()
21
+ m.pseudo_dirs.append("")
22
+ try:
23
+ yield m
24
+ finally:
25
+ m.store.clear()
26
+ m.pseudo_dirs.clear()
27
+ m.pseudo_dirs.append("")
28
+
29
+
30
+ @pytest.fixture
31
+ def ftp_writable(tmpdir):
32
+ """
33
+ Fixture providing a writable FTP filesystem.
34
+ """
35
+ pytest.importorskip("pyftpdlib")
36
+ from fsspec.implementations.ftp import FTPFileSystem
37
+
38
+ FTPFileSystem.clear_instance_cache() # remove lingering connections
39
+ CachingFileSystem.clear_instance_cache()
40
+ d = str(tmpdir)
41
+ with open(os.path.join(d, "out"), "wb") as f:
42
+ f.write(b"hello" * 10000)
43
+ P = subprocess.Popen(
44
+ [sys.executable, "-m", "pyftpdlib", "-d", d, "-u", "user", "-P", "pass", "-w"]
45
+ )
46
+ try:
47
+ time.sleep(1)
48
+ yield "localhost", 2121, "user", "pass"
49
+ finally:
50
+ P.terminate()
51
+ P.wait()
52
+ try:
53
+ shutil.rmtree(tmpdir)
54
+ except Exception:
55
+ pass
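
A sketch of a test consuming the ``m`` fixture above; pytest injects a clean in-memory filesystem and the fixture's ``finally`` block wipes it afterwards (``test_roundtrip`` is an illustrative name):

def test_roundtrip(m):
    with m.open("/a.txt", "wb") as f:
        f.write(b"data")
    assert m.cat("/a.txt") == b"data"
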
phivenv/Lib/site-packages/fsspec/core.py ADDED
@@ -0,0 +1,743 @@
1
+ from __future__ import annotations
2
+
3
+ import io
4
+ import logging
5
+ import os
6
+ import re
7
+ from glob import has_magic
8
+ from pathlib import Path
9
+
10
+ # for backwards compat, we export cache things from here too
11
+ from fsspec.caching import ( # noqa: F401
12
+ BaseCache,
13
+ BlockCache,
14
+ BytesCache,
15
+ MMapCache,
16
+ ReadAheadCache,
17
+ caches,
18
+ )
19
+ from fsspec.compression import compr
20
+ from fsspec.config import conf
21
+ from fsspec.registry import filesystem, get_filesystem_class
22
+ from fsspec.utils import (
23
+ _unstrip_protocol,
24
+ build_name_function,
25
+ infer_compression,
26
+ stringify_path,
27
+ )
28
+
29
+ logger = logging.getLogger("fsspec")
30
+
31
+
32
+ class OpenFile:
33
+ """
34
+ File-like object to be used in a context
35
+
36
+ Can layer (buffered) text-mode and compression over any file-system, which
37
+ are typically binary-only.
38
+
39
+ These instances are safe to serialize, as the low-level file object
40
+ is not created until invoked using ``with``.
41
+
42
+ Parameters
43
+ ----------
44
+ fs: FileSystem
45
+ The file system to use for opening the file. Should be a subclass of,
46
+ or duck-type compatible with, ``fsspec.spec.AbstractFileSystem``
47
+ path: str
48
+ Location to open
49
+ mode: str like 'rb', optional
50
+ Mode of the opened file
51
+ compression: str or None, optional
52
+ Compression to apply
53
+ encoding: str or None, optional
54
+ The encoding to use if opened in text mode.
55
+ errors: str or None, optional
56
+ How to handle encoding errors if opened in text mode.
57
+ newline: None or str
58
+ Passed to TextIOWrapper in text mode, how to handle line endings.
59
+ autoopen: bool
60
+ If True, calls open() immediately. Mostly used by pickle
61
+ pos: int
62
+ If given and autoopen is True, seek to this location immediately
63
+ """
64
+
65
+ def __init__(
66
+ self,
67
+ fs,
68
+ path,
69
+ mode="rb",
70
+ compression=None,
71
+ encoding=None,
72
+ errors=None,
73
+ newline=None,
74
+ ):
75
+ self.fs = fs
76
+ self.path = path
77
+ self.mode = mode
78
+ self.compression = get_compression(path, compression)
79
+ self.encoding = encoding
80
+ self.errors = errors
81
+ self.newline = newline
82
+ self.fobjects = []
83
+
84
+ def __reduce__(self):
85
+ return (
86
+ OpenFile,
87
+ (
88
+ self.fs,
89
+ self.path,
90
+ self.mode,
91
+ self.compression,
92
+ self.encoding,
93
+ self.errors,
94
+ self.newline,
95
+ ),
96
+ )
97
+
98
+ def __repr__(self):
99
+ return f"<OpenFile '{self.path}'>"
100
+
101
+ def __enter__(self):
102
+ mode = self.mode.replace("t", "").replace("b", "") + "b"
103
+
104
+ try:
105
+ f = self.fs.open(self.path, mode=mode)
106
+ except FileNotFoundError as e:
107
+ if has_magic(self.path):
108
+ raise FileNotFoundError(
109
+ "%s not found. The URL contains glob characters: you maybe needed\n"
110
+ "to pass expand=True in fsspec.open() or the storage_options of \n"
111
+ "your library. You can also set the config value 'open_expand'\n"
112
+ "before import, or fsspec.core.DEFAULT_EXPAND at runtime, to True.",
113
+ self.path,
114
+ ) from e
115
+ raise
116
+
117
+ self.fobjects = [f]
118
+
119
+ if self.compression is not None:
120
+ compress = compr[self.compression]
121
+ f = compress(f, mode=mode[0])
122
+ self.fobjects.append(f)
123
+
124
+ if "b" not in self.mode:
125
+ # assume, for example, that 'r' is equivalent to 'rt' as in builtin
126
+ f = PickleableTextIOWrapper(
127
+ f, encoding=self.encoding, errors=self.errors, newline=self.newline
128
+ )
129
+ self.fobjects.append(f)
130
+
131
+ return self.fobjects[-1]
132
+
133
+ def __exit__(self, *args):
134
+ self.close()
135
+
136
+ @property
137
+ def full_name(self):
138
+ return _unstrip_protocol(self.path, self.fs)
139
+
140
+ def open(self):
141
+ """Materialise this as a real open file without context
142
+
143
+ The OpenFile object should be explicitly closed to avoid enclosed file
144
+ instances persisting. You must, therefore, keep a reference to the OpenFile
145
+ during the life of the file-like it generates.
146
+ """
147
+ return self.__enter__()
148
+
149
+ def close(self):
150
+ """Close all encapsulated file objects"""
151
+ for f in reversed(self.fobjects):
152
+ if "r" not in self.mode and not f.closed:
153
+ f.flush()
154
+ f.close()
155
+ self.fobjects.clear()
156
+
157
+
158
+ class OpenFiles(list):
159
+ """List of OpenFile instances
160
+
161
+ Can be used in a single context, which opens and closes all of the
162
+ contained files. Normal list access to get the elements works as
163
+ normal.
164
+
165
+ A special case is made for caching filesystems - the files will
166
+ be down/uploaded together at the start or end of the context, and
167
+ this may happen concurrently, if the target filesystem supports it.
168
+ """
169
+
170
+ def __init__(self, *args, mode="rb", fs=None):
171
+ self.mode = mode
172
+ self.fs = fs
173
+ self.files = []
174
+ super().__init__(*args)
175
+
176
+ def __enter__(self):
177
+ if self.fs is None:
178
+ raise ValueError("Context has already been used")
179
+
180
+ fs = self.fs
181
+ while True:
182
+ if hasattr(fs, "open_many"):
183
+ # check for concurrent cache download; or set up for upload
184
+ self.files = fs.open_many(self)
185
+ return self.files
186
+ if hasattr(fs, "fs") and fs.fs is not None:
187
+ fs = fs.fs
188
+ else:
189
+ break
190
+ return [s.__enter__() for s in self]
191
+
192
+ def __exit__(self, *args):
193
+ fs = self.fs
194
+ [s.__exit__(*args) for s in self]
195
+ if "r" not in self.mode:
196
+ while True:
197
+ if hasattr(fs, "open_many"):
198
+ # check for concurrent cache upload
199
+ fs.commit_many(self.files)
200
+ return
201
+ if hasattr(fs, "fs") and fs.fs is not None:
202
+ fs = fs.fs
203
+ else:
204
+ break
205
+
206
+ def __getitem__(self, item):
207
+ out = super().__getitem__(item)
208
+ if isinstance(item, slice):
209
+ return OpenFiles(out, mode=self.mode, fs=self.fs)
210
+ return out
211
+
212
+ def __repr__(self):
213
+ return f"<List of {len(self)} OpenFile instances>"
214
+
215
+
216
+ def open_files(
217
+ urlpath,
218
+ mode="rb",
219
+ compression=None,
220
+ encoding="utf8",
221
+ errors=None,
222
+ name_function=None,
223
+ num=1,
224
+ protocol=None,
225
+ newline=None,
226
+ auto_mkdir=True,
227
+ expand=True,
228
+ **kwargs,
229
+ ):
230
+ """Given a path or paths, return a list of ``OpenFile`` objects.
231
+
232
+ For writing, a str path must contain the "*" character, which will be filled
233
+ in by increasing numbers, e.g., "part*" -> "part1", "part2" if num=2.
234
+
235
+ For either reading or writing, can instead provide explicit list of paths.
236
+
237
+ Parameters
238
+ ----------
239
+ urlpath: string or list
240
+ Absolute or relative filepath(s). Prefix with a protocol like ``s3://``
241
+ to read from alternative filesystems. To read from multiple files you
242
+ can pass a globstring or a list of paths, with the caveat that they
243
+ must all have the same protocol.
244
+ mode: 'rb', 'wt', etc.
245
+ compression: string or None
246
+ If given, open file using compression codec. Can either be a compression
247
+ name (a key in ``fsspec.compression.compr``) or "infer" to guess the
248
+ compression from the filename suffix.
249
+ encoding: str
250
+ For text mode only
251
+ errors: None or str
252
+ Passed to TextIOWrapper in text mode
253
+ name_function: function or None
254
+ if opening a set of files for writing, those files do not yet exist,
255
+ so we need to generate their names by formatting the urlpath for
256
+ each sequence number
257
+ num: int [1]
258
+ if in writing mode, the number of files we expect to create (passed to
259
+ ``name_function``)
260
+ protocol: str or None
261
+ If given, overrides the protocol found in the URL.
262
+ newline: bytes or None
263
+ Used for line terminator in text mode. If None, uses system default;
264
+ if blank, uses no translation.
265
+ auto_mkdir: bool (True)
266
+ If in write mode, this will ensure the target directory exists before
267
+ writing, by calling ``fs.mkdirs(exist_ok=True)``.
268
+ expand: bool
+ Whether to expand glob patterns (when reading) or the ``*`` name
+ template (when writing) into a list of concrete paths.
269
+ **kwargs: dict
270
+ Extra options that make sense to a particular storage connection, e.g.
271
+ host, port, username, password, etc.
272
+
273
+ Examples
274
+ --------
275
+ >>> files = open_files('2015-*-*.csv') # doctest: +SKIP
276
+ >>> files = open_files(
277
+ ... 's3://bucket/2015-*-*.csv.gz', compression='gzip'
278
+ ... ) # doctest: +SKIP
279
+
280
+ Returns
281
+ -------
282
+ An ``OpenFiles`` instance, which is a list of ``OpenFile`` objects that can
283
+ be used as a single context
284
+
285
+ Notes
286
+ -----
287
+ For a full list of the available protocols and the implementations that
288
+ they map across to see the latest online documentation:
289
+
290
+ - For implementations built into ``fsspec`` see
291
+ https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
292
+ - For implementations in separate packages see
293
+ https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
294
+ """
295
+ fs, fs_token, paths = get_fs_token_paths(
296
+ urlpath,
297
+ mode,
298
+ num=num,
299
+ name_function=name_function,
300
+ storage_options=kwargs,
301
+ protocol=protocol,
302
+ expand=expand,
303
+ )
304
+ if fs.protocol == "file":
305
+ fs.auto_mkdir = auto_mkdir
306
+ elif "r" not in mode and auto_mkdir:
307
+ parents = {fs._parent(path) for path in paths}
308
+ for parent in parents:
309
+ try:
310
+ fs.makedirs(parent, exist_ok=True)
311
+ except PermissionError:
312
+ pass
313
+ return OpenFiles(
314
+ [
315
+ OpenFile(
316
+ fs,
317
+ path,
318
+ mode=mode,
319
+ compression=compression,
320
+ encoding=encoding,
321
+ errors=errors,
322
+ newline=newline,
323
+ )
324
+ for path in paths
325
+ ],
326
+ mode=mode,
327
+ fs=fs,
328
+ )
329
+
330
+
331
+ def _un_chain(path, kwargs):
332
+ # Avoid a circular import
333
+ from fsspec.implementations.cached import CachingFileSystem
334
+
335
+ if "::" in path:
336
+ x = re.compile(".*[^a-z]+.*") # test for non protocol-like single word
337
+ bits = []
338
+ for p in path.split("::"):
339
+ if "://" in p or x.match(p):
340
+ bits.append(p)
341
+ else:
342
+ bits.append(p + "://")
343
+ else:
344
+ bits = [path]
345
+ # [[url, protocol, kwargs], ...]
346
+ out = []
347
+ previous_bit = None
348
+ kwargs = kwargs.copy()
349
+ for bit in reversed(bits):
350
+ protocol = kwargs.pop("protocol", None) or split_protocol(bit)[0] or "file"
351
+ cls = get_filesystem_class(protocol)
352
+ extra_kwargs = cls._get_kwargs_from_urls(bit)
353
+ kws = kwargs.pop(protocol, {})
354
+ if bit is bits[0]:
355
+ kws.update(kwargs)
356
+ kw = dict(
357
+ **{k: v for k, v in extra_kwargs.items() if k not in kws or v != kws[k]},
358
+ **kws,
359
+ )
360
+ bit = cls._strip_protocol(bit)
361
+ if "target_protocol" not in kw and issubclass(cls, CachingFileSystem):
362
+ bit = previous_bit
363
+ out.append((bit, protocol, kw))
364
+ previous_bit = bit
365
+ out.reverse()
366
+ return out
367
+
368
+
369
+ def url_to_fs(url, **kwargs):
370
+ """
371
+ Turn fully-qualified and potentially chained URL into filesystem instance
372
+
373
+ Parameters
374
+ ----------
375
+ url : str
376
+ The fsspec-compatible URL
377
+ **kwargs: dict
378
+ Extra options that make sense to a particular storage connection, e.g.
379
+ host, port, username, password, etc.
380
+
381
+ Returns
382
+ -------
383
+ filesystem : FileSystem
384
+ The new filesystem discovered from ``url`` and created with
385
+ ``**kwargs``.
386
+ urlpath : str
387
+ The file-systems-specific URL for ``url``.
388
+ """
389
+ url = stringify_path(url)
390
+ # non-FS arguments that appear in fsspec.open()
391
+ # inspect could keep this in sync with open()'s signature
392
+ known_kwargs = {
393
+ "compression",
394
+ "encoding",
395
+ "errors",
396
+ "expand",
397
+ "mode",
398
+ "name_function",
399
+ "newline",
400
+ "num",
401
+ }
402
+ kwargs = {k: v for k, v in kwargs.items() if k not in known_kwargs}
403
+ chain = _un_chain(url, kwargs)
404
+ inkwargs = {}
405
+ # Reverse iterate the chain, creating a nested target_* structure
406
+ for i, ch in enumerate(reversed(chain)):
407
+ urls, protocol, kw = ch
408
+ if i == len(chain) - 1:
409
+ inkwargs = dict(**kw, **inkwargs)
410
+ continue
411
+ inkwargs["target_options"] = dict(**kw, **inkwargs)
412
+ inkwargs["target_protocol"] = protocol
413
+ inkwargs["fo"] = urls
414
+ urlpath, protocol, _ = chain[0]
415
+ fs = filesystem(protocol, **inkwargs)
416
+ return fs, urlpath
417
+
418
+
419
+ DEFAULT_EXPAND = conf.get("open_expand", False)
420
+
421
+
422
+ def open(
423
+ urlpath,
424
+ mode="rb",
425
+ compression=None,
426
+ encoding="utf8",
427
+ errors=None,
428
+ protocol=None,
429
+ newline=None,
430
+ expand=None,
431
+ **kwargs,
432
+ ):
433
+ """Given a path or paths, return one ``OpenFile`` object.
434
+
435
+ Parameters
436
+ ----------
437
+ urlpath: string or list
438
+ Absolute or relative filepath. Prefix with a protocol like ``s3://``
439
+ to read from alternative filesystems. Should not include glob
440
+ character(s).
441
+ mode: 'rb', 'wt', etc.
442
+ compression: string or None
443
+ If given, open file using compression codec. Can either be a compression
444
+ name (a key in ``fsspec.compression.compr``) or "infer" to guess the
445
+ compression from the filename suffix.
446
+ encoding: str
447
+ For text mode only
448
+ errors: None or str
449
+ Passed to TextIOWrapper in text mode
450
+ protocol: str or None
451
+ If given, overrides the protocol found in the URL.
452
+ newline: bytes or None
453
+ Used for line terminator in text mode. If None, uses system default;
454
+ if blank, uses no translation.
455
+ expand: bool or None
456
+ Whether to regard file paths containing special glob characters as needing
457
+ expansion (finding the first match) or absolute. Setting False allows using
458
+ paths which do embed such characters. If None (default), this argument
459
+ takes its value from the DEFAULT_EXPAND module variable, which takes
460
+ its initial value from the "open_expand" config value at startup, which will
461
+ be False if not set.
462
+ **kwargs: dict
463
+ Extra options that make sense to a particular storage connection, e.g.
464
+ host, port, username, password, etc.
465
+
466
+ Examples
467
+ --------
468
+ >>> openfile = open('2015-01-01.csv') # doctest: +SKIP
469
+ >>> openfile = open(
470
+ ... 's3://bucket/2015-01-01.csv.gz', compression='gzip'
471
+ ... ) # doctest: +SKIP
472
+ >>> with openfile as f:
473
+ ... df = pd.read_csv(f) # doctest: +SKIP
474
+ ...
475
+
476
+ Returns
477
+ -------
478
+ ``OpenFile`` object.
479
+
480
+ Notes
481
+ -----
482
+ For a full list of the available protocols and the implementations that
483
+ they map across to see the latest online documentation:
484
+
485
+ - For implementations built into ``fsspec`` see
486
+ https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
487
+ - For implementations in separate packages see
488
+ https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
489
+ """
490
+ expand = DEFAULT_EXPAND if expand is None else expand
491
+ out = open_files(
492
+ urlpath=[urlpath],
493
+ mode=mode,
494
+ compression=compression,
495
+ encoding=encoding,
496
+ errors=errors,
497
+ protocol=protocol,
498
+ newline=newline,
499
+ expand=expand,
500
+ **kwargs,
501
+ )
502
+ if not out:
503
+ raise FileNotFoundError(urlpath)
504
+ return out[0]
505
+
506
+
507
+ def open_local(
508
+ url: str | list[str] | Path | list[Path],
509
+ mode: str = "rb",
510
+ **storage_options: dict,
511
+ ) -> str | list[str]:
512
+ """Open file(s) which can be resolved to local
513
+
514
+ For files which either are local, or get downloaded upon open
515
+ (e.g., by file caching)
516
+
517
+ Parameters
518
+ ----------
519
+ url: str or list(str)
520
+ mode: str
521
+ Must be read mode
522
+ storage_options:
523
+ passed on to FS for or used by open_files (e.g., compression)
524
+ """
525
+ if "r" not in mode:
526
+ raise ValueError("Can only ensure local files when reading")
527
+ of = open_files(url, mode=mode, **storage_options)
528
+ if not getattr(of[0].fs, "local_file", False):
529
+ raise ValueError(
530
+ "open_local can only be used on a filesystem which"
531
+ " has attribute local_file=True"
532
+ )
533
+ with of as files:
534
+ paths = [f.name for f in files]
535
+ if (isinstance(url, str) and not has_magic(url)) or isinstance(url, Path):
536
+ return paths[0]
537
+ return paths
538
+
539
+
540
+ def get_compression(urlpath, compression):
541
+ if compression == "infer":
542
+ compression = infer_compression(urlpath)
543
+ if compression is not None and compression not in compr:
544
+ raise ValueError(f"Compression type {compression} not supported")
545
+ return compression
546
+
547
+
548
+ def split_protocol(urlpath):
549
+ """Return protocol, path pair"""
550
+ urlpath = stringify_path(urlpath)
551
+ if "://" in urlpath:
552
+ protocol, path = urlpath.split("://", 1)
553
+ if len(protocol) > 1:
554
+ # excludes Windows paths
555
+ return protocol, path
556
+ if urlpath.startswith("data:"):
557
+ return urlpath.split(":", 1)
558
+ return None, urlpath
559
+
560
+
561
+ def strip_protocol(urlpath):
562
+ """Return only path part of full URL, according to appropriate backend"""
563
+ protocol, _ = split_protocol(urlpath)
564
+ cls = get_filesystem_class(protocol)
565
+ return cls._strip_protocol(urlpath)
566
+
567
+
568
+ def expand_paths_if_needed(paths, mode, num, fs, name_function):
569
+ """Expand paths if they have a ``*`` in them (write mode) or any of ``*?[]``
570
+ in them (read mode).
571
+
572
+ :param paths: list of paths
573
+ mode: str
574
+ Mode in which to open files.
575
+ num: int
576
+ If opening in writing mode, number of files we expect to create.
577
+ fs: filesystem object
578
+ name_function: callable
579
+ If opening in writing mode, this callable is used to generate path
580
+ names. Names are generated for each partition by
581
+ ``urlpath.replace('*', name_function(partition_index))``.
582
+ :return: list of paths
583
+ """
584
+ expanded_paths = []
585
+ paths = list(paths)
586
+
587
+ if "w" in mode: # write mode
588
+ if sum(1 for p in paths if "*" in p) > 1:
589
+ raise ValueError(
590
+ "When writing data, only one filename mask can be specified."
591
+ )
592
+ num = max(num, len(paths))
593
+
594
+ for curr_path in paths:
595
+ if "*" in curr_path:
596
+ # expand using name_function
597
+ expanded_paths.extend(_expand_paths(curr_path, name_function, num))
598
+ else:
599
+ expanded_paths.append(curr_path)
600
+ # if we generated more paths than asked for, trim the list
601
+ if len(expanded_paths) > num:
602
+ expanded_paths = expanded_paths[:num]
603
+
604
+ else: # read mode
605
+ for curr_path in paths:
606
+ if has_magic(curr_path):
607
+ # expand using glob
608
+ expanded_paths.extend(fs.glob(curr_path))
609
+ else:
610
+ expanded_paths.append(curr_path)
611
+
612
+ return expanded_paths
613
+
614
+
615
+ def get_fs_token_paths(
616
+ urlpath,
617
+ mode="rb",
618
+ num=1,
619
+ name_function=None,
620
+ storage_options=None,
621
+ protocol=None,
622
+ expand=True,
623
+ ):
624
+ """Filesystem, deterministic token, and paths from a urlpath and options.
625
+
626
+ Parameters
627
+ ----------
628
+ urlpath: string or iterable
629
+ Absolute or relative filepath, URL (may include protocols like
630
+ ``s3://``), or globstring pointing to data.
631
+ mode: str, optional
632
+ Mode in which to open files.
633
+ num: int, optional
634
+ If opening in writing mode, number of files we expect to create.
635
+ name_function: callable, optional
636
+ If opening in writing mode, this callable is used to generate path
637
+ names. Names are generated for each partition by
638
+ ``urlpath.replace('*', name_function(partition_index))``.
639
+ storage_options: dict, optional
640
+ Additional keywords to pass to the filesystem class.
641
+ protocol: str or None
642
+ To override the protocol specifier in the URL
643
+ expand: bool
644
+ Expand string paths for writing, assuming the path is a directory
645
+ """
646
+ if isinstance(urlpath, (list, tuple, set)):
647
+ if not urlpath:
648
+ raise ValueError("empty urlpath sequence")
649
+ urlpath0 = stringify_path(next(iter(urlpath)))
650
+ else:
651
+ urlpath0 = stringify_path(urlpath)
652
+ storage_options = storage_options or {}
653
+ if protocol:
654
+ storage_options["protocol"] = protocol
655
+ chain = _un_chain(urlpath0, storage_options or {})
656
+ inkwargs = {}
657
+ # Reverse iterate the chain, creating a nested target_* structure
658
+ for i, ch in enumerate(reversed(chain)):
659
+ urls, nested_protocol, kw = ch
660
+ if i == len(chain) - 1:
661
+ inkwargs = dict(**kw, **inkwargs)
662
+ continue
663
+ inkwargs["target_options"] = dict(**kw, **inkwargs)
664
+ inkwargs["target_protocol"] = nested_protocol
665
+ inkwargs["fo"] = urls
666
+ paths, protocol, _ = chain[0]
667
+ fs = filesystem(protocol, **inkwargs)
668
+ if isinstance(urlpath, (list, tuple, set)):
669
+ pchains = [
670
+ _un_chain(stringify_path(u), storage_options or {})[0] for u in urlpath
671
+ ]
672
+ if len({pc[1] for pc in pchains}) > 1:
673
+ raise ValueError(f"Protocol mismatch getting fs from {urlpath}")
674
+ paths = [pc[0] for pc in pchains]
675
+ else:
676
+ paths = fs._strip_protocol(paths)
677
+ if isinstance(paths, (list, tuple, set)):
678
+ if expand:
679
+ paths = expand_paths_if_needed(paths, mode, num, fs, name_function)
680
+ elif not isinstance(paths, list):
681
+ paths = list(paths)
682
+ else:
683
+ if ("w" in mode or "x" in mode) and expand:
684
+ paths = _expand_paths(paths, name_function, num)
685
+ elif "*" in paths:
686
+ paths = [f for f in sorted(fs.glob(paths)) if not fs.isdir(f)]
687
+ else:
688
+ paths = [paths]
689
+
690
+ return fs, fs._fs_token, paths
691
+
692
+
693
+ def _expand_paths(path, name_function, num):
694
+ if isinstance(path, str):
695
+ if path.count("*") > 1:
696
+ raise ValueError("Output path spec must contain exactly one '*'.")
697
+ elif "*" not in path:
698
+ path = os.path.join(path, "*.part")
699
+
700
+ if name_function is None:
701
+ name_function = build_name_function(num - 1)
702
+
703
+ paths = [path.replace("*", name_function(i)) for i in range(num)]
704
+ if paths != sorted(paths):
705
+ logger.warning(
706
+ "In order to preserve order between partitions"
707
+ " paths created with ``name_function`` should "
708
+ "sort to partition order"
709
+ )
710
+ elif isinstance(path, (tuple, list)):
711
+ assert len(path) == num
712
+ paths = list(path)
713
+ else:
714
+ raise ValueError(
715
+ "Path should be either\n"
716
+ "2. A directory: 'foo/'\n"
717
+ "2. A directory: 'foo/\n"
718
+ "3. A path with a '*' in it: 'foo.*.json'"
719
+ )
720
+ return paths
721
+
722
+
723
+ class PickleableTextIOWrapper(io.TextIOWrapper):
724
+ """TextIOWrapper cannot be pickled. This solves it.
725
+
726
+ Requires that ``buffer`` be pickleable, which all instances of
727
+ AbstractBufferedFile are.
728
+ """
729
+
730
+ def __init__(
731
+ self,
732
+ buffer,
733
+ encoding=None,
734
+ errors=None,
735
+ newline=None,
736
+ line_buffering=False,
737
+ write_through=False,
738
+ ):
739
+ self.args = buffer, encoding, errors, newline, line_buffering, write_through
740
+ super().__init__(*self.args)
741
+
742
+ def __reduce__(self):
743
+ return PickleableTextIOWrapper, self.args
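
For reference, a minimal sketch of how the machinery above is used from user
code; the file name and URL below are illustrative, not part of this commit:

import fsspec

# OpenFile defers the real open until entered as a context, so it is cheap
# to create and safe to serialize.
of = fsspec.open("example.csv", mode="rt", encoding="utf8")
with of as f:
    header = f.readline()

# Chained URL: "::"-separated protocol layers are decomposed by _un_chain;
# simplecache:: wraps the HTTP filesystem in a local file cache, and
# compression="infer" guesses gzip from the ".gz" suffix.
with fsspec.open(
    "simplecache::https://example.com/data.csv.gz", mode="rt", compression="infer"
) as f:
    first = f.readline()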
phivenv/Lib/site-packages/fsspec/dircache.py ADDED
@@ -0,0 +1,98 @@
1
+ import time
2
+ from collections.abc import MutableMapping
3
+ from functools import lru_cache
4
+
5
+
6
+ class DirCache(MutableMapping):
7
+ """
8
+ Caching of directory listings, in a structure like::
9
+
10
+ {"path0": [
11
+ {"name": "path0/file0",
12
+ "size": 123,
13
+ "type": "file",
14
+ ...
15
+ },
16
+ {"name": "path0/file1",
17
+ },
18
+ ...
19
+ ],
20
+ "path1": [...]
21
+ }
22
+
23
+ Parameters to this class control listing expiry or indeed turn
24
+ caching off
25
+ """
26
+
27
+ def __init__(
28
+ self,
29
+ use_listings_cache=True,
30
+ listings_expiry_time=None,
31
+ max_paths=None,
32
+ **kwargs,
33
+ ):
34
+ """
35
+
36
+ Parameters
37
+ ----------
38
+ use_listings_cache: bool
39
+ If False, this cache never returns items, but always reports KeyError,
40
+ and setting items has no effect
41
+ listings_expiry_time: int or float (optional)
42
+ Time in seconds that a listing is considered valid. If None,
43
+ listings do not expire.
44
+ max_paths: int (optional)
45
+ The number of most recent listings that are considered valid; 'recent'
46
+ refers to when the entry was set.
47
+ """
48
+ self._cache = {}
49
+ self._times = {}
50
+ if max_paths:
51
+ self._q = lru_cache(max_paths + 1)(lambda key: self._cache.pop(key, None))
52
+ self.use_listings_cache = use_listings_cache
53
+ self.listings_expiry_time = listings_expiry_time
54
+ self.max_paths = max_paths
55
+
56
+ def __getitem__(self, item):
57
+ if self.listings_expiry_time is not None:
58
+ if self._times.get(item, 0) - time.time() < -self.listings_expiry_time:
59
+ del self._cache[item]
60
+ if self.max_paths:
61
+ self._q(item)
62
+ return self._cache[item] # maybe raises KeyError
63
+
64
+ def clear(self):
65
+ self._cache.clear()
66
+
67
+ def __len__(self):
68
+ return len(self._cache)
69
+
70
+ def __contains__(self, item):
71
+ try:
72
+ self[item]
73
+ return True
74
+ except KeyError:
75
+ return False
76
+
77
+ def __setitem__(self, key, value):
78
+ if not self.use_listings_cache:
79
+ return
80
+ if self.max_paths:
81
+ self._q(key)
82
+ self._cache[key] = value
83
+ if self.listings_expiry_time is not None:
84
+ self._times[key] = time.time()
85
+
86
+ def __delitem__(self, key):
87
+ del self._cache[key]
88
+
89
+ def __iter__(self):
90
+ entries = list(self._cache)
91
+
92
+ return (k for k in entries if k in self)
93
+
94
+ def __reduce__(self):
95
+ return (
96
+ DirCache,
97
+ (self.use_listings_cache, self.listings_expiry_time, self.max_paths),
98
+ )
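
An illustrative sketch of DirCache in isolation (the listing entries are made
up): it behaves as a mapping from path to listing, honoring the expiry and
LRU settings described above.

from fsspec.dircache import DirCache

cache = DirCache(listings_expiry_time=10, max_paths=100)
cache["bucket/path"] = [{"name": "bucket/path/file0", "size": 123, "type": "file"}]
assert "bucket/path" in cache   # valid for 10 seconds after being set
listing = cache["bucket/path"]  # raises KeyError once expired or evicted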
phivenv/Lib/site-packages/fsspec/exceptions.py ADDED
@@ -0,0 +1,18 @@
1
+ """
2
+ fsspec user-defined exception classes
3
+ """
4
+
5
+ import asyncio
6
+
7
+
8
+ class BlocksizeMismatchError(ValueError):
9
+ """
10
+ Raised when a cached file is opened with a different blocksize than it was
11
+ written with
12
+ """
13
+
14
+
15
+ class FSTimeoutError(asyncio.TimeoutError):
16
+ """
17
+ Raised when an fsspec operation times out
18
+ """
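
A minimal sketch of how these exceptions surface in user code (the guarded
call is hypothetical):

from fsspec.exceptions import FSTimeoutError

try:
    ...  # some fsspec call that may time out internally
except FSTimeoutError:
    # FSTimeoutError subclasses asyncio.TimeoutError, so an
    # ``except asyncio.TimeoutError`` clause would also catch it
    print("filesystem operation timed out")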
phivenv/Lib/site-packages/fsspec/fuse.py ADDED
@@ -0,0 +1,324 @@
1
+ import argparse
2
+ import logging
3
+ import os
4
+ import stat
5
+ import threading
6
+ import time
7
+ from errno import EIO, ENOENT
8
+
9
+ from fuse import FUSE, FuseOSError, LoggingMixIn, Operations
10
+
11
+ from fsspec import __version__
12
+ from fsspec.core import url_to_fs
13
+
14
+ logger = logging.getLogger("fsspec.fuse")
15
+
16
+
17
+ class FUSEr(Operations):
18
+ def __init__(self, fs, path, ready_file=False):
19
+ self.fs = fs
20
+ self.cache = {}
21
+ self.root = path.rstrip("/") + "/"
22
+ self.counter = 0
23
+ logger.info("Starting FUSE at %s", path)
24
+ self._ready_file = ready_file
25
+
26
+ def getattr(self, path, fh=None):
27
+ logger.debug("getattr %s", path)
28
+ if self._ready_file and path in ["/.fuse_ready", ".fuse_ready"]:
29
+ return {"type": "file", "st_size": 5}
30
+
31
+ path = "".join([self.root, path.lstrip("/")]).rstrip("/")
32
+ try:
33
+ info = self.fs.info(path)
34
+ except FileNotFoundError as exc:
35
+ raise FuseOSError(ENOENT) from exc
36
+
37
+ data = {"st_uid": info.get("uid", 1000), "st_gid": info.get("gid", 1000)}
38
+ perm = info.get("mode", 0o777)
39
+
40
+ if info["type"] != "file":
41
+ data["st_mode"] = stat.S_IFDIR | perm
42
+ data["st_size"] = 0
43
+ data["st_blksize"] = 0
44
+ else:
45
+ data["st_mode"] = stat.S_IFREG | perm
46
+ data["st_size"] = info["size"]
47
+ data["st_blksize"] = 5 * 2**20
48
+ data["st_nlink"] = 1
49
+ data["st_atime"] = info["atime"] if "atime" in info else time.time()
50
+ data["st_ctime"] = info["ctime"] if "ctime" in info else time.time()
51
+ data["st_mtime"] = info["mtime"] if "mtime" in info else time.time()
52
+ return data
53
+
54
+ def readdir(self, path, fh):
55
+ logger.debug("readdir %s", path)
56
+ path = "".join([self.root, path.lstrip("/")])
57
+ files = self.fs.ls(path, False)
58
+ files = [os.path.basename(f.rstrip("/")) for f in files]
59
+ return [".", ".."] + files
60
+
61
+ def mkdir(self, path, mode):
62
+ path = "".join([self.root, path.lstrip("/")])
63
+ self.fs.mkdir(path)
64
+ return 0
65
+
66
+ def rmdir(self, path):
67
+ path = "".join([self.root, path.lstrip("/")])
68
+ self.fs.rmdir(path)
69
+ return 0
70
+
71
+ def read(self, path, size, offset, fh):
72
+ logger.debug("read %s", (path, size, offset))
73
+ if self._ready_file and path in ["/.fuse_ready", ".fuse_ready"]:
74
+ # status indicator
75
+ return b"ready"
76
+
77
+ f = self.cache[fh]
78
+ f.seek(offset)
79
+ out = f.read(size)
80
+ return out
81
+
82
+ def write(self, path, data, offset, fh):
83
+ logger.debug("write %s", (path, offset))
84
+ f = self.cache[fh]
85
+ f.seek(offset)
86
+ f.write(data)
87
+ return len(data)
88
+
89
+ def create(self, path, flags, fi=None):
90
+ logger.debug("create %s", (path, flags))
91
+ fn = "".join([self.root, path.lstrip("/")])
92
+ self.fs.touch(fn) # OS will want to get attributes immediately
93
+ f = self.fs.open(fn, "wb")
94
+ self.cache[self.counter] = f
95
+ self.counter += 1
96
+ return self.counter - 1
97
+
98
+ def open(self, path, flags):
99
+ logger.debug("open %s", (path, flags))
100
+ fn = "".join([self.root, path.lstrip("/")])
101
+ if flags % 2 == 0:
102
+ # read
103
+ mode = "rb"
104
+ else:
105
+ # write/create
106
+ mode = "wb"
107
+ self.cache[self.counter] = self.fs.open(fn, mode)
108
+ self.counter += 1
109
+ return self.counter - 1
110
+
111
+ def truncate(self, path, length, fh=None):
112
+ fn = "".join([self.root, path.lstrip("/")])
113
+ if length != 0:
114
+ raise NotImplementedError
115
+ # maybe should be no-op since open with write sets size to zero anyway
116
+ self.fs.touch(fn)
117
+
118
+ def unlink(self, path):
119
+ fn = "".join([self.root, path.lstrip("/")])
120
+ try:
121
+ self.fs.rm(fn, False)
122
+ except (OSError, FileNotFoundError) as exc:
123
+ raise FuseOSError(EIO) from exc
124
+
125
+ def release(self, path, fh):
126
+ try:
127
+ if fh in self.cache:
128
+ f = self.cache[fh]
129
+ f.close()
130
+ self.cache.pop(fh)
131
+ except Exception as e:
132
+ print(e)
133
+ return 0
134
+
135
+ def chmod(self, path, mode):
136
+ if hasattr(self.fs, "chmod"):
137
+ path = "".join([self.root, path.lstrip("/")])
138
+ return self.fs.chmod(path, mode)
139
+ raise NotImplementedError
140
+
141
+
142
+ def run(
143
+ fs,
144
+ path,
145
+ mount_point,
146
+ foreground=True,
147
+ threads=False,
148
+ ready_file=False,
149
+ ops_class=FUSEr,
150
+ ):
151
+ """Mount stuff in a local directory
152
+
153
+ This uses fusepy to make it appear as if a given path on an fsspec
154
+ instance is in fact resident within the local file-system.
155
+
156
+ This requires that fusepy be installed, and that FUSE be available on
157
+ the system (typically requiring a package to be installed with
158
+ apt, yum, brew, etc.).
159
+
160
+ Parameters
161
+ ----------
162
+ fs: file-system instance
163
+ From one of the compatible implementations
164
+ path: str
165
+ Location on that file-system to regard as the root directory to
166
+ mount. Note that you typically should include the terminating "/"
167
+ character.
168
+ mount_point: str
169
+ An empty directory on the local file-system where the contents of
170
+ the remote path will appear.
171
+ foreground: bool
172
+ Whether or not calling this function will block. Operation will
173
+ typically be more stable if True.
174
+ threads: bool
175
+ Whether or not to create threads when responding to file operations
176
+ within the mounted directory. Operation will typically be more
177
+ stable if False.
178
+ ready_file: bool
179
+ Whether the FUSE process is ready. The ``.fuse_ready`` file will
180
+ exist in the ``mount_point`` directory if True. Debugging purpose.
181
+ ops_class: FUSEr or Subclass of FUSEr
182
+ To override the default behavior of FUSEr. For example, logging
183
+ to file.
184
+
185
+ """
186
+ func = lambda: FUSE(
187
+ ops_class(fs, path, ready_file=ready_file),
188
+ mount_point,
189
+ nothreads=not threads,
190
+ foreground=foreground,
191
+ )
192
+ if not foreground:
193
+ th = threading.Thread(target=func)
194
+ th.daemon = True
195
+ th.start()
196
+ return th
197
+ else: # pragma: no cover
198
+ try:
199
+ func()
200
+ except KeyboardInterrupt:
201
+ pass
202
+
203
+
204
+ def main(args):
205
+ """Mount filesystem from chained URL to MOUNT_POINT.
206
+
207
+ Examples:
208
+
209
+ python3 -m fsspec.fuse memory /usr/share /tmp/mem
210
+
211
+ python3 -m fsspec.fuse local /tmp/source /tmp/local \\
212
+ -l /tmp/fsspecfuse.log
213
+
214
+ You can also mount chained-URLs and use special settings:
215
+
216
+ python3 -m fsspec.fuse 'filecache::zip::file://data.zip' \\
217
+ / /tmp/zip \\
218
+ -o 'filecache-cache_storage=/tmp/simplecache'
219
+
220
+ You can specify the type of the setting by using `[int]` or `[bool]`,
221
+ (`true`, `yes`, `1` represents the Boolean value `True`):
222
+
223
+ python3 -m fsspec.fuse 'simplecache::ftp://ftp1.at.proftpd.org' \\
224
+ /historic/packages/RPMS /tmp/ftp \\
225
+ -o 'simplecache-cache_storage=/tmp/simplecache' \\
226
+ -o 'simplecache-check_files=false[bool]' \\
227
+ -o 'ftp-listings_expiry_time=60[int]' \\
228
+ -o 'ftp-username=anonymous' \\
229
+ -o 'ftp-password=xieyanbo'
230
+ """
231
+
232
+ class RawDescriptionArgumentParser(argparse.ArgumentParser):
233
+ def format_help(self):
234
+ usage = super().format_help()
235
+ parts = usage.split("\n\n")
236
+ parts[1] = self.description.rstrip()
237
+ return "\n\n".join(parts)
238
+
239
+ parser = RawDescriptionArgumentParser(prog="fsspec.fuse", description=main.__doc__)
240
+ parser.add_argument("--version", action="version", version=__version__)
241
+ parser.add_argument("url", type=str, help="fs url")
242
+ parser.add_argument("source_path", type=str, help="source directory in fs")
243
+ parser.add_argument("mount_point", type=str, help="local directory")
244
+ parser.add_argument(
245
+ "-o",
246
+ "--option",
247
+ action="append",
248
+ help="Any options of protocol included in the chained URL",
249
+ )
250
+ parser.add_argument(
251
+ "-l", "--log-file", type=str, help="Logging FUSE debug info (Default: '')"
252
+ )
253
+ parser.add_argument(
254
+ "-f",
255
+ "--foreground",
256
+ action="store_false",
257
+ help="Running in foreground or not (Default: False)",
258
+ )
259
+ parser.add_argument(
260
+ "-t",
261
+ "--threads",
262
+ action="store_false",
263
+ help="Running with threads support (Default: False)",
264
+ )
265
+ parser.add_argument(
266
+ "-r",
267
+ "--ready-file",
268
+ action="store_false",
269
+ help="The `.fuse_ready` file will exist after FUSE is ready. "
270
+ "(Debugging purpose, Default: False)",
271
+ )
272
+ args = parser.parse_args(args)
273
+
274
+ kwargs = {}
275
+ for item in args.option or []:
276
+ key, sep, value = item.partition("=")
277
+ if not sep:
278
+ parser.error(message=f"Wrong option: {item!r}")
279
+ val = value.lower()
280
+ if val.endswith("[int]"):
281
+ value = int(value[: -len("[int]")])
282
+ elif val.endswith("[bool]"):
283
+ value = val[: -len("[bool]")] in ["1", "yes", "true"]
284
+
285
+ if "-" in key:
286
+ fs_name, setting_name = key.split("-", 1)
287
+ if fs_name in kwargs:
288
+ kwargs[fs_name][setting_name] = value
289
+ else:
290
+ kwargs[fs_name] = {setting_name: value}
291
+ else:
292
+ kwargs[key] = value
293
+
294
+ if args.log_file:
295
+ logging.basicConfig(
296
+ level=logging.DEBUG,
297
+ filename=args.log_file,
298
+ format="%(asctime)s %(message)s",
299
+ )
300
+
301
+ class LoggingFUSEr(FUSEr, LoggingMixIn):
302
+ pass
303
+
304
+ fuser = LoggingFUSEr
305
+ else:
306
+ fuser = FUSEr
307
+
308
+ fs, url_path = url_to_fs(args.url, **kwargs)
309
+ logger.debug("Mounting %s to %s", url_path, str(args.mount_point))
310
+ run(
311
+ fs,
312
+ args.source_path,
313
+ args.mount_point,
314
+ foreground=args.foreground,
315
+ threads=args.threads,
316
+ ready_file=args.ready_file,
317
+ ops_class=fuser,
318
+ )
319
+
320
+
321
+ if __name__ == "__main__":
322
+ import sys
323
+
324
+ main(sys.argv[1:])
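
A sketch of mounting programmatically rather than through the CLI above; this
assumes fusepy and a system FUSE layer are installed, and the mount point
/tmp/mnt is hypothetical:

import fsspec
from fsspec.fuse import run

fs = fsspec.filesystem("memory")
fs.pipe_file("/data/hello.txt", b"hello")
# foreground=False starts FUSE in a daemon thread and returns the thread
th = run(fs, "/data/", "/tmp/mnt", foreground=False)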
phivenv/Lib/site-packages/fsspec/generic.py ADDED
@@ -0,0 +1,394 @@
1
+ from __future__ import annotations
2
+
3
+ import inspect
4
+ import logging
5
+ import os
6
+ import shutil
7
+ import uuid
8
+
9
+ from .asyn import AsyncFileSystem, _run_coros_in_chunks, sync_wrapper
10
+ from .callbacks import DEFAULT_CALLBACK
11
+ from .core import filesystem, get_filesystem_class, split_protocol, url_to_fs
12
+
13
+ _generic_fs = {}
14
+ logger = logging.getLogger("fsspec.generic")
15
+
16
+
17
+ def set_generic_fs(protocol, **storage_options):
18
+ """Populate the dict used for method=="generic" lookups"""
19
+ _generic_fs[protocol] = filesystem(protocol, **storage_options)
20
+
21
+
22
+ def _resolve_fs(url, method, protocol=None, storage_options=None):
23
+ """Pick instance of backend FS"""
24
+ url = url[0] if isinstance(url, (list, tuple)) else url
25
+ protocol = protocol or split_protocol(url)[0]
26
+ storage_options = storage_options or {}
27
+ if method == "default":
28
+ return filesystem(protocol)
29
+ if method == "generic":
30
+ return _generic_fs[protocol]
31
+ if method == "current":
32
+ cls = get_filesystem_class(protocol)
33
+ return cls.current()
34
+ if method == "options":
35
+ fs, _ = url_to_fs(url, **storage_options.get(protocol, {}))
36
+ return fs
37
+ raise ValueError(f"Unknown FS resolution method: {method}")
38
+
39
+
40
+ def rsync(
41
+ source,
42
+ destination,
43
+ delete_missing=False,
44
+ source_field="size",
45
+ dest_field="size",
46
+ update_cond="different",
47
+ inst_kwargs=None,
48
+ fs=None,
49
+ **kwargs,
50
+ ):
51
+ """Sync files between two directory trees
52
+
53
+ (experimental)
54
+
55
+ Parameters
56
+ ----------
57
+ source: str
58
+ Root of the directory tree to take files from. This must be a directory, but
59
+ do not include any terminating "/" character
60
+ destination: str
61
+ Root path to copy into. The contents of this location should be
62
+ identical to the contents of ``source`` when done. This will be made a
63
+ directory, and the terminal "/" should not be included.
64
+ delete_missing: bool
65
+ If there are paths in the destination that don't exist in the
66
+ source and this is True, delete them. Otherwise, leave them alone.
67
+ source_field: str | callable
68
+ If ``update_cond`` is "different", this is the key in the info
69
+ of source files to consider for difference. May be a function of the
70
+ info dict.
71
+ dest_field: str | callable
72
+ If ``update_cond`` is "different", this is the key in the info
73
+ of destination files to consider for difference. May be a function of
74
+ the info dict.
75
+ update_cond: "different"|"always"|"never"
76
+ If "always", every file is copied, regardless of whether it exists in
77
+ the destination. If "never", files that exist in the destination are
78
+ not copied again. If "different" (default), only copy if the info
79
+ fields given by ``source_field`` and ``dest_field`` (usually "size")
80
+ are different. Other comparisons may be added in the future.
81
+ inst_kwargs: dict|None
82
+ If ``fs`` is None, use this set of keyword arguments to make a
83
+ GenericFileSystem instance
84
+ fs: GenericFileSystem|None
85
+ Instance to use if explicitly given. The instance defines how to
86
+ make downstream file system instances from paths.
87
+
88
+ Returns
89
+ -------
90
+ dict of the copy operations that were performed, {source: destination}
91
+ """
92
+ fs = fs or GenericFileSystem(**(inst_kwargs or {}))
93
+ source = fs._strip_protocol(source)
94
+ destination = fs._strip_protocol(destination)
95
+ allfiles = fs.find(source, withdirs=True, detail=True)
96
+ if not fs.isdir(source):
97
+ raise ValueError("Can only rsync on a directory")
98
+ otherfiles = fs.find(destination, withdirs=True, detail=True)
99
+ dirs = [
100
+ a
101
+ for a, v in allfiles.items()
102
+ if v["type"] == "directory" and a.replace(source, destination) not in otherfiles
103
+ ]
104
+ logger.debug(f"{len(dirs)} directories to create")
105
+ if dirs:
106
+ fs.make_many_dirs(
107
+ [dirn.replace(source, destination) for dirn in dirs], exist_ok=True
108
+ )
109
+ allfiles = {a: v for a, v in allfiles.items() if v["type"] == "file"}
110
+ logger.debug(f"{len(allfiles)} files to consider for copy")
111
+ to_delete = [
112
+ o
113
+ for o, v in otherfiles.items()
114
+ if o.replace(destination, source) not in allfiles and v["type"] == "file"
115
+ ]
116
+ for k, v in allfiles.copy().items():
117
+ otherfile = k.replace(source, destination)
118
+ if otherfile in otherfiles:
119
+ if update_cond == "always":
120
+ allfiles[k] = otherfile
121
+ elif update_cond == "different":
122
+ inf1 = source_field(v) if callable(source_field) else v[source_field]
123
+ v2 = otherfiles[otherfile]
124
+ inf2 = dest_field(v2) if callable(dest_field) else v2[dest_field]
125
+ if inf1 != inf2:
126
+ # details mismatch, make copy
127
+ allfiles[k] = otherfile
128
+ else:
129
+ # details match, don't copy
130
+ allfiles.pop(k)
131
+ else:
132
+ # file not in target yet
133
+ allfiles[k] = otherfile
134
+ logger.debug(f"{len(allfiles)} files to copy")
135
+ if allfiles:
136
+ source_files, target_files = zip(*allfiles.items())
137
+ fs.cp(source_files, target_files, **kwargs)
138
+ logger.debug(f"{len(to_delete)} files to delete")
139
+ if delete_missing and to_delete:
140
+ fs.rm(to_delete)
141
+ return allfiles
142
+
143
+
144
+ class GenericFileSystem(AsyncFileSystem):
145
+ """Wrapper over all other FS types
146
+
147
+ <experimental!>
148
+
149
+ This implementation is a single unified interface to be able to run FS operations
150
+ over generic URLs, and dispatch to the specific implementations using the URL
151
+ protocol prefix.
152
+
153
+ Note: instances of this FS are always async, even if you never use it with any async
154
+ backend.
155
+ """
156
+
157
+ protocol = "generic" # there is no real reason to ever use a protocol with this FS
158
+
159
+ def __init__(self, default_method="default", storage_options=None, **kwargs):
160
+ """
161
+
162
+ Parameters
163
+ ----------
164
+ default_method: str (optional)
165
+ Defines how to configure backend FS instances. Options are:
166
+ - "default": instantiate like FSClass(), with no
167
+ extra arguments; this is the default instance of that FS, and can be
168
+ configured via the config system
169
+ - "generic": takes instances from the `_generic_fs` dict in this module,
170
+ which you must populate before use. Keys are by protocol
171
+ - "options": expects storage_options, a dict mapping protocol to
172
+ kwargs to use when constructing the filesystem
173
+ - "current": takes the most recently instantiated version of each FS
174
+ """
175
+ self.method = default_method
176
+ self.st_opts = storage_options
177
+ super().__init__(**kwargs)
178
+
179
+ def _parent(self, path):
180
+ fs = _resolve_fs(path, self.method, storage_options=self.st_opts)
181
+ return fs.unstrip_protocol(fs._parent(path))
182
+
183
+ def _strip_protocol(self, path):
184
+ # normalization only
185
+ fs = _resolve_fs(path, self.method, storage_options=self.st_opts)
186
+ return fs.unstrip_protocol(fs._strip_protocol(path))
187
+
188
+ async def _find(self, path, maxdepth=None, withdirs=False, detail=False, **kwargs):
189
+ fs = _resolve_fs(path, self.method, storage_options=self.st_opts)
190
+ if fs.async_impl:
191
+ out = await fs._find(
192
+ path, maxdepth=maxdepth, withdirs=withdirs, detail=True, **kwargs
193
+ )
194
+ else:
195
+ out = fs.find(
196
+ path, maxdepth=maxdepth, withdirs=withdirs, detail=True, **kwargs
197
+ )
198
+ result = {}
199
+ for k, v in out.items():
200
+ v = v.copy() # don't corrupt target FS dircache
201
+ name = fs.unstrip_protocol(k)
202
+ v["name"] = name
203
+ result[name] = v
204
+ if detail:
205
+ return result
206
+ return list(result)
207
+
208
+ async def _info(self, url, **kwargs):
209
+ fs = _resolve_fs(url, self.method)
210
+ if fs.async_impl:
211
+ out = await fs._info(url, **kwargs)
212
+ else:
213
+ out = fs.info(url, **kwargs)
214
+ out = out.copy() # don't edit originals
215
+ out["name"] = fs.unstrip_protocol(out["name"])
216
+ return out
217
+
218
+ async def _ls(
219
+ self,
220
+ url,
221
+ detail=True,
222
+ **kwargs,
223
+ ):
224
+ fs = _resolve_fs(url, self.method)
225
+ if fs.async_impl:
226
+ out = await fs._ls(url, detail=True, **kwargs)
227
+ else:
228
+ out = fs.ls(url, detail=True, **kwargs)
229
+ out = [o.copy() for o in out] # don't edit originals
230
+ for o in out:
231
+ o["name"] = fs.unstrip_protocol(o["name"])
232
+ if detail:
233
+ return out
234
+ else:
235
+ return [o["name"] for o in out]
236
+
237
+ async def _cat_file(
238
+ self,
239
+ url,
240
+ **kwargs,
241
+ ):
242
+ fs = _resolve_fs(url, self.method)
243
+ if fs.async_impl:
244
+ return await fs._cat_file(url, **kwargs)
245
+ else:
246
+ return fs.cat_file(url, **kwargs)
247
+
248
+ async def _pipe_file(
249
+ self,
250
+ path,
251
+ value,
252
+ **kwargs,
253
+ ):
254
+ fs = _resolve_fs(path, self.method, storage_options=self.st_opts)
255
+ if fs.async_impl:
256
+ return await fs._pipe_file(path, value, **kwargs)
257
+ else:
258
+ return fs.pipe_file(path, value, **kwargs)
259
+
260
+ async def _rm(self, url, **kwargs):
261
+ urls = url
262
+ if isinstance(urls, str):
263
+ urls = [urls]
264
+ fs = _resolve_fs(urls[0], self.method)
265
+ if fs.async_impl:
266
+ await fs._rm(urls, **kwargs)
267
+ else:
268
+ fs.rm(url, **kwargs)
269
+
270
+ async def _makedirs(self, path, exist_ok=False):
271
+ logger.debug("Make dir %s", path)
272
+ fs = _resolve_fs(path, self.method, storage_options=self.st_opts)
273
+ if fs.async_impl:
274
+ await fs._makedirs(path, exist_ok=exist_ok)
275
+ else:
276
+ fs.makedirs(path, exist_ok=exist_ok)
277
+
278
+ def rsync(self, source, destination, **kwargs):
279
+ """Sync files between two directory trees
280
+
281
+ See :func:`rsync` for more details.
282
+ """
283
+ rsync(source, destination, fs=self, **kwargs)
284
+
285
+ async def _cp_file(
286
+ self,
287
+ url,
288
+ url2,
289
+ blocksize=2**20,
290
+ callback=DEFAULT_CALLBACK,
291
+ tempdir: str | None = None,
292
+ **kwargs,
293
+ ):
294
+ fs = _resolve_fs(url, self.method)
295
+ fs2 = _resolve_fs(url2, self.method)
296
+ if fs is fs2:
297
+ # pure remote
298
+ if fs.async_impl:
299
+ return await fs._copy(url, url2, **kwargs)
300
+ else:
301
+ return fs.copy(url, url2, **kwargs)
302
+ await copy_file_op(fs, [url], fs2, [url2], tempdir, 1, on_error="raise")
303
+
304
+ async def _make_many_dirs(self, urls, exist_ok=True):
305
+ fs = _resolve_fs(urls[0], self.method)
306
+ if fs.async_impl:
307
+ coros = [fs._makedirs(u, exist_ok=exist_ok) for u in urls]
308
+ await _run_coros_in_chunks(coros)
309
+ else:
310
+ for u in urls:
311
+ fs.makedirs(u, exist_ok=exist_ok)
312
+
313
+ make_many_dirs = sync_wrapper(_make_many_dirs)
314
+
315
+ async def _copy(
316
+ self,
317
+ path1: list[str],
318
+ path2: list[str],
319
+ recursive: bool = False,
320
+ on_error: str = "ignore",
321
+ maxdepth: int | None = None,
322
+ batch_size: int | None = None,
323
+ tempdir: str | None = None,
324
+ **kwargs,
325
+ ):
326
+ # TODO: special case for one FS being local, which can use get/put
327
+ # TODO: special case for one being memFS, which can use cat/pipe
328
+ if recursive:
329
+ raise NotImplementedError("Please use fsspec.generic.rsync")
330
+ path1 = [path1] if isinstance(path1, str) else path1
331
+ path2 = [path2] if isinstance(path2, str) else path2
332
+
333
+ fs = _resolve_fs(path1, self.method)
334
+ fs2 = _resolve_fs(path2, self.method)
335
+
336
+ if fs is fs2:
337
+ if fs.async_impl:
338
+ return await fs._copy(path1, path2, **kwargs)
339
+ else:
340
+ return fs.copy(path1, path2, **kwargs)
341
+
342
+ await copy_file_op(
343
+ fs, path1, fs2, path2, tempdir, batch_size, on_error=on_error
344
+ )
345
+
346
+
347
+ async def copy_file_op(
348
+ fs1, url1, fs2, url2, tempdir=None, batch_size=20, on_error="ignore"
349
+ ):
350
+ import tempfile
351
+
352
+ tempdir = tempdir or tempfile.mkdtemp()
353
+ try:
354
+ coros = [
355
+ _copy_file_op(
356
+ fs1,
357
+ u1,
358
+ fs2,
359
+ u2,
360
+ os.path.join(tempdir, uuid.uuid4().hex),
361
+ )
362
+ for u1, u2 in zip(url1, url2)
363
+ ]
364
+ out = await _run_coros_in_chunks(
365
+ coros, batch_size=batch_size, return_exceptions=True
366
+ )
367
+ finally:
368
+ shutil.rmtree(tempdir)
369
+ if on_error == "return":
370
+ return out
371
+ elif on_error == "raise":
372
+ for o in out:
373
+ if isinstance(o, Exception):
374
+ raise o
375
+
376
+
377
+ async def _copy_file_op(fs1, url1, fs2, url2, local, on_error="ignore"):
378
+ if fs1.async_impl:
379
+ await fs1._get_file(url1, local)
380
+ else:
381
+ fs1.get_file(url1, local)
382
+ if fs2.async_impl:
383
+ await fs2._put_file(local, url2)
384
+ else:
385
+ fs2.put_file(local, url2)
386
+ os.unlink(local)
387
+ logger.debug("Copy %s -> %s; done", url1, url2)
388
+
389
+
390
+ async def maybe_await(cor):
391
+ if inspect.iscoroutine(cor):
392
+ return await cor
393
+ else:
394
+ return cor
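
For reference, a small sketch of driving the sync logic above; the source and
destination URLs are illustrative:

from fsspec.generic import GenericFileSystem, rsync

# Copies files whose "size" fields differ (update_cond="different" is the
# default) and returns the {source: destination} mapping of copies made.
copied = rsync("file:///tmp/src", "memory://dst")

# The same through an instance, which also controls how backend filesystems
# are resolved (default_method):
gfs = GenericFileSystem(default_method="default")
gfs.rsync("file:///tmp/src", "memory://dst", delete_missing=True)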
phivenv/Lib/site-packages/fsspec/gui.py ADDED
@@ -0,0 +1,417 @@
1
+ import ast
2
+ import contextlib
3
+ import logging
4
+ import os
5
+ import re
6
+ from collections.abc import Sequence
7
+ from typing import ClassVar
8
+
9
+ import panel as pn
10
+
11
+ from .core import OpenFile, get_filesystem_class, split_protocol
12
+ from .registry import known_implementations
13
+
14
+ pn.extension()
15
+ logger = logging.getLogger("fsspec.gui")
16
+
17
+
18
+ class SigSlot:
19
+ """Signal-slot mixin, for Panel event passing
20
+
21
+ Include this class in a widget manager's superclasses to be able to
22
+ register events and callbacks on Panel widgets managed by that class.
23
+
24
+ The method ``_register`` should be called as widgets are added, and external
25
+ code should call ``connect`` to associate callbacks.
26
+
27
+ By default, all signals emit a DEBUG logging statement.
28
+ """
29
+
30
+ # names of signals that this class may emit each of which must be
31
+ # set by _register for any new instance
32
+ signals: ClassVar[Sequence[str]] = []
33
+ # names of actions that this class may respond to
34
+ slots: ClassVar[Sequence[str]] = []
35
+
36
+ # each of which must be a method name
37
+
38
+ def __init__(self):
39
+ self._ignoring_events = False
40
+ self._sigs = {}
41
+ self._map = {}
42
+ self._setup()
43
+
44
+ def _setup(self):
45
+ """Create GUI elements and register signals"""
46
+ self.panel = pn.pane.PaneBase()
47
+ # no signals to set up in the base class
48
+
49
+ def _register(
50
+ self, widget, name, thing="value", log_level=logging.DEBUG, auto=False
51
+ ):
52
+ """Watch the given attribute of a widget and assign it a named event
53
+
54
+ This is normally called at the time a widget is instantiated, in the
55
+ class which owns it.
56
+
57
+ Parameters
58
+ ----------
59
+ widget : pn.layout.Panel or None
60
+ Widget to watch. If None, an anonymous signal not associated with
61
+ any widget.
62
+ name : str
63
+ Name of this event
64
+ thing : str
65
+ Attribute of the given widget to watch
66
+ log_level : int
67
+ When the signal is triggered, a logging event of the given level
68
+ will be fired in the ``fsspec.gui`` logger.
69
+ auto : bool
70
+ If True, automatically connects with a method in this class of the
71
+ same name.
72
+ """
73
+ if name not in self.signals:
74
+ raise ValueError(f"Attempt to assign an undeclared signal: {name}")
75
+ self._sigs[name] = {
76
+ "widget": widget,
77
+ "callbacks": [],
78
+ "thing": thing,
79
+ "log": log_level,
80
+ }
81
+ wn = "-".join(
82
+ [
83
+ getattr(widget, "name", str(widget)) if widget is not None else "none",
84
+ thing,
85
+ ]
86
+ )
87
+ self._map[wn] = name
88
+ if widget is not None:
89
+ widget.param.watch(self._signal, thing, onlychanged=True)
90
+ if auto and hasattr(self, name):
91
+ self.connect(name, getattr(self, name))
92
+
93
+ def _repr_mimebundle_(self, *args, **kwargs):
94
+ """Display in a notebook or a server"""
95
+ try:
96
+ return self.panel._repr_mimebundle_(*args, **kwargs)
97
+ except (ValueError, AttributeError) as exc:
98
+ raise NotImplementedError(
99
+ "Panel does not seem to be set up properly"
100
+ ) from exc
101
+
102
+ def connect(self, signal, slot):
103
+ """Associate call back with given event
104
+
105
+ The callback must be a function which takes the "new" value of the
106
+ watched attribute as the only parameter. If the callback return False,
107
+ this cancels any further processing of the given event.
108
+
109
+ Alternatively, the callback can be a string, in which case it means
110
+ emitting the correspondingly-named event (i.e., connect to self)
111
+ """
112
+ self._sigs[signal]["callbacks"].append(slot)
113
+
114
+ def _signal(self, event):
115
+ """This is called by an action on a widget
116
+
117
+ Within an self.ignore_events context, nothing happens.
118
+
119
+ Tests can execute this method by directly changing the values of
120
+ widget components.
121
+ """
122
+ if not self._ignoring_events:
123
+ wn = "-".join([event.obj.name, event.name])
124
+ if wn in self._map and self._map[wn] in self._sigs:
125
+ self._emit(self._map[wn], event.new)
126
+
127
+ @contextlib.contextmanager
128
+ def ignore_events(self):
129
+ """Temporarily turn off events processing in this instance
130
+
131
+ (does not propagate to children)
132
+ """
133
+ self._ignoring_events = True
134
+ try:
135
+ yield
136
+ finally:
137
+ self._ignoring_events = False
138
+
139
+ def _emit(self, sig, value=None):
140
+ """An event happened, call its callbacks
141
+
142
+ This method can be used in tests to simulate message passing without
143
+ directly changing visual elements.
144
+
145
+ Calling of callbacks will halt whenever one returns False.
146
+ """
147
+ logger.log(self._sigs[sig]["log"], f"{sig}: {value}")
148
+ for callback in self._sigs[sig]["callbacks"]:
149
+ if isinstance(callback, str):
150
+ self._emit(callback)
151
+ else:
152
+ try:
153
+ # running callbacks should not break the interface
154
+ ret = callback(value)
155
+ if ret is False:
156
+ break
157
+ except Exception as e:
158
+ logger.exception(
159
+ "Exception (%s) while executing callback for signal: %s",
160
+ e,
161
+ sig,
162
+ )
163
+
164
+ def show(self, threads=False):
165
+ """Open a new browser tab and display this instance's interface"""
166
+ self.panel.show(threads=threads, verbose=False)
167
+ return self
168
+
169
+
170
+ class SingleSelect(SigSlot):
171
+ """A multiselect which only allows you to select one item for an event"""
172
+
173
+ signals = ["_selected", "selected"] # the first is internal
174
+ slots = ["set_options", "set_selection", "add", "clear", "select"]
175
+
176
+ def __init__(self, **kwargs):
177
+ self.kwargs = kwargs
178
+ super().__init__()
179
+
180
+ def _setup(self):
181
+ self.panel = pn.widgets.MultiSelect(**self.kwargs)
182
+ self._register(self.panel, "_selected", "value")
183
+ self._register(None, "selected")
184
+ self.connect("_selected", self.select_one)
185
+
186
+ def _signal(self, *args, **kwargs):
187
+ super()._signal(*args, **kwargs)
188
+
189
+ def select_one(self, *_):
190
+ with self.ignore_events():
191
+ val = [self.panel.value[-1]] if self.panel.value else []
192
+ self.panel.value = val
193
+ self._emit("selected", self.panel.value)
194
+
195
+ def set_options(self, options):
196
+ self.panel.options = options
197
+
198
+ def clear(self):
199
+ self.panel.options = []
200
+
201
+ @property
202
+ def value(self):
203
+ return self.panel.value
204
+
205
+ def set_selection(self, selection):
206
+ self.panel.value = [selection]
207
+
208
+
209
+ class FileSelector(SigSlot):
210
+ """Panel-based graphical file selector widget
211
+
212
+ Instances of this widget are interactive and can be displayed in jupyter by having
213
+ them as the output of a cell, or in a separate browser tab using ``.show()``.
214
+ """
215
+
216
+ signals = [
217
+ "protocol_changed",
218
+ "selection_changed",
219
+ "directory_entered",
220
+ "home_clicked",
221
+ "up_clicked",
222
+ "go_clicked",
223
+ "filters_changed",
224
+ ]
225
+ slots = ["set_filters", "go_home"]
226
+
227
+ def __init__(self, url=None, filters=None, ignore=None, kwargs=None):
228
+ """
229
+
230
+ Parameters
231
+ ----------
232
+ url : str (optional)
233
+ Initial value of the URL to populate the dialog; should include protocol
234
+ filters : list(str) (optional)
235
+ File endings to include in the listings. If not included, all files are
236
+ allowed. Does not affect directories.
237
+ If given, the endings will appear as checkboxes in the interface
238
+ ignore : list(str) (optional)
239
+ Regex(s) of file basename patterns to ignore, e.g., "\\." for typical
240
+ hidden files on posix
241
+ kwargs : dict (optional)
242
+ To pass to file system instance
243
+ """
244
+ if url:
245
+ self.init_protocol, url = split_protocol(url)
246
+ else:
247
+ self.init_protocol, url = "file", os.getcwd()
248
+ self.init_url = url
249
+ self.init_kwargs = (kwargs if isinstance(kwargs, str) else str(kwargs)) or "{}"
250
+ self.filters = filters
251
+ self.ignore = [re.compile(i) for i in ignore or []]
252
+ self._fs = None
253
+ super().__init__()
254
+
255
+    def _setup(self):
+        self.url = pn.widgets.TextInput(
+            name="url",
+            value=self.init_url,
+            align="end",
+            sizing_mode="stretch_width",
+            width_policy="max",
+        )
+        self.protocol = pn.widgets.Select(
+            options=sorted(known_implementations),
+            value=self.init_protocol,
+            name="protocol",
+            align="center",
+        )
+        self.kwargs = pn.widgets.TextInput(
+            name="kwargs", value=self.init_kwargs, align="center"
+        )
+        self.go = pn.widgets.Button(name="⇨", align="end", width=45)
+        self.main = SingleSelect(size=10)
+        self.home = pn.widgets.Button(name="🏠", width=40, height=30, align="end")
+        self.up = pn.widgets.Button(name="‹", width=30, height=30, align="end")
+
+        self._register(self.protocol, "protocol_changed", auto=True)
+        self._register(self.go, "go_clicked", "clicks", auto=True)
+        self._register(self.up, "up_clicked", "clicks", auto=True)
+        self._register(self.home, "home_clicked", "clicks", auto=True)
+        self._register(None, "selection_changed")
+        self.main.connect("selected", self.selection_changed)
+        self._register(None, "directory_entered")
+        self.prev_protocol = self.protocol.value
+        self.prev_kwargs = self.storage_options
+
+        self.filter_sel = pn.widgets.CheckBoxGroup(
+            value=[], options=[], inline=False, align="end", width_policy="min"
+        )
+        self._register(self.filter_sel, "filters_changed", auto=True)
+
+        self.panel = pn.Column(
+            pn.Row(self.protocol, self.kwargs),
+            pn.Row(self.home, self.up, self.url, self.go, self.filter_sel),
+            self.main.panel,
+        )
+        self.set_filters(self.filters)
+        self.go_clicked()
+
+    def set_filters(self, filters=None):
+        self.filters = filters
+        if filters:
+            self.filter_sel.options = filters
+            self.filter_sel.value = filters
+        else:
+            self.filter_sel.options = []
+            self.filter_sel.value = []
+
+    @property
+    def storage_options(self):
+        """Value of the kwargs box as a dictionary"""
+        return ast.literal_eval(self.kwargs.value) or {}
+
+    @property
+    def fs(self):
+        """Current filesystem instance"""
+        if self._fs is None:
+            cls = get_filesystem_class(self.protocol.value)
+            self._fs = cls(**self.storage_options)
+        return self._fs
+
+    @property
+    def urlpath(self):
+        """URL of currently selected item"""
+        return (
+            (f"{self.protocol.value}://{self.main.value[0]}")
+            if self.main.value
+            else None
+        )
+
+    def open_file(self, mode="rb", compression=None, encoding=None):
+        """Create OpenFile instance for the currently selected item
+
+        For example, in a notebook you might do something like
+
+        .. code-block::
+
+            [ ]: sel = FileSelector(); sel
+
+            # user selects their file
+
+            [ ]: with sel.open_file('rb') as f:
+            ...     out = f.read()
+
+        Parameters
+        ----------
+        mode: str (optional)
+            Open mode for the file.
+        compression: str (optional)
+            Interact with the file as if it were compressed. Set to 'infer' to
+            guess compression from the file ending.
+        encoding: str (optional)
+            If using text mode, use this encoding; defaults to UTF8.
+        """
+        if self.urlpath is None:
+            raise ValueError("No file selected")
+        return OpenFile(self.fs, self.urlpath, mode, compression, encoding)
+
+    def filters_changed(self, values):
+        self.filters = values
+        self.go_clicked()
+
+    def selection_changed(self, *_):
+        if self.urlpath is None:
+            return
+        if self.fs.isdir(self.urlpath):
+            self.url.value = self.fs._strip_protocol(self.urlpath)
+            self.go_clicked()
+
+    def go_clicked(self, *_):
+        if (
+            self.prev_protocol != self.protocol.value
+            or self.prev_kwargs != self.storage_options
+        ):
+            self._fs = None  # causes fs to be recreated
+            self.prev_protocol = self.protocol.value
+            self.prev_kwargs = self.storage_options
+        listing = sorted(
+            self.fs.ls(self.url.value, detail=True), key=lambda x: x["name"]
+        )
+        listing = [
+            entry
+            for entry in listing
+            if not any(i.match(entry["name"].rsplit("/", 1)[-1]) for i in self.ignore)
+        ]
+        folders = {
+            "📁 " + o["name"].rsplit("/", 1)[-1]: o["name"]
+            for o in listing
+            if o["type"] == "directory"
+        }
+        files = {
+            "📄 " + o["name"].rsplit("/", 1)[-1]: o["name"]
+            for o in listing
+            if o["type"] == "file"
+        }
+        if self.filters:
+            files = {
+                k: v
+                for k, v in files.items()
+                if any(v.endswith(ext) for ext in self.filters)
+            }
+        self.main.set_options(dict(**folders, **files))
+
+    def protocol_changed(self, *_):
+        self._fs = None
+        self.main.options = []
+        self.url.value = ""
+
+    def home_clicked(self, *_):
+        self.protocol.value = self.init_protocol
+        self.kwargs.value = self.init_kwargs
+        self.url.value = self.init_url
+        self.go_clicked()
+
+    def up_clicked(self, *_):
+        self.url.value = self.fs._parent(self.url.value)
+        self.go_clicked()
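Putting the pieces together, a typical notebook session with this widget might look like the sketch below (the URL, filters, and printed paths are illustrative values, not taken from this commit):

import fsspec.gui

sel = fsspec.gui.FileSelector(url="file:///tmp", filters=[".csv", ".parquet"])
sel  # display the widget inline, or use sel.show() for a browser tab

# ... after the user navigates to and clicks a file ...
print(sel.urlpath)          # e.g. "file:///tmp/data.csv"
print(sel.storage_options)  # kwargs box parsed as a dict, e.g. {}
with sel.open_file(mode="rb") as f:
    head = f.read(100)      # read via the selected filesystem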
phivenv/Lib/site-packages/fsspec/implementations/__init__.py ADDED
File without changes
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (166 Bytes)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/arrow.cpython-39.pyc ADDED
Binary file (9.05 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/asyn_wrapper.cpython-39.pyc ADDED
Binary file (4.12 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/cache_mapper.cpython-39.pyc ADDED
Binary file (3.2 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/cache_metadata.cpython-39.pyc ADDED
Binary file (7.39 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/cached.cpython-39.pyc ADDED
Binary file (30.8 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/dask.cpython-39.pyc ADDED
Binary file (4.89 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/data.cpython-39.pyc ADDED
Binary file (2.25 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/dbfs.cpython-39.pyc ADDED
Binary file (14.4 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/dirfs.cpython-39.pyc ADDED
Binary file (15 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/ftp.cpython-39.pyc ADDED
Binary file (11.1 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/gist.cpython-39.pyc ADDED
Binary file (6.95 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/git.cpython-39.pyc ADDED
Binary file (4.12 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/github.cpython-39.pyc ADDED
Binary file (10.2 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/http.cpython-39.pyc ADDED
Binary file (24.7 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/http_sync.cpython-39.pyc ADDED
Binary file (25.5 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/jupyter.cpython-39.pyc ADDED
Binary file (4.19 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/libarchive.cpython-39.pyc ADDED
Binary file (6.05 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/local.cpython-39.pyc ADDED
Binary file (15.5 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/memory.cpython-39.pyc ADDED
Binary file (8.57 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/reference.cpython-39.pyc ADDED
Binary file (40.4 kB)
phivenv/Lib/site-packages/fsspec/implementations/__pycache__/sftp.cpython-39.pyc ADDED
Binary file (6.33 kB)