| """Joblib is a set of tools to provide **lightweight pipelining in | |
| Python**. In particular: | |
| 1. transparent disk-caching of functions and lazy re-evaluation | |
| (memoize pattern) | |
| 2. easy simple parallel computing | |
| Joblib is optimized to be **fast** and **robust** on large | |
| data in particular and has specific optimizations for `numpy` arrays. It is | |
| **BSD-licensed**. | |

==================== ===============================================
**Documentation:**   https://joblib.readthedocs.io
**Download:**        https://pypi.python.org/pypi/joblib#downloads
**Source code:**     https://github.com/joblib/joblib
**Report issues:**   https://github.com/joblib/joblib/issues
==================== ===============================================

Vision
--------

The vision is to provide tools to easily achieve better performance and
reproducibility when working with long running jobs.

* **Avoid computing the same thing twice**: code is often rerun again and
  again, for instance when prototyping computation-heavy jobs (as in
  scientific development), but hand-crafted solutions to alleviate this
  issue are error-prone and often lead to unreproducible results.

* **Persist to disk transparently**: efficiently persisting
  arbitrary objects containing large data is hard. Using
  joblib's caching mechanism avoids hand-written persistence and
  implicitly links the file on disk to the execution context of
  the original Python object. As a result, joblib's persistence is
  good for resuming an application status or computational job, e.g.
  after a crash.

Joblib addresses these problems while **leaving your code and your flow
control as unmodified as possible** (no framework, no new paradigms).

Main features
------------------

1) **Transparent and fast disk-caching of output value:** a memoize or
   make-like functionality for Python functions that works well for
   arbitrary Python objects, including very large numpy arrays. Separate
   persistence and flow-execution logic from domain logic or algorithmic
   code by writing the operations as a set of steps with well-defined
   inputs and outputs: Python functions. Joblib can save their
   computation to disk and rerun it only if necessary::

      >>> from joblib import Memory
      >>> location = 'your_cache_dir_goes_here'
      >>> mem = Memory(location, verbose=1)
      >>> import numpy as np
      >>> a = np.vander(np.arange(3)).astype(float)
      >>> square = mem.cache(np.square)
      >>> b = square(a)                                   # doctest: +ELLIPSIS
      ______________________________________________________________________...
      [Memory] Calling ...square...
      square(array([[0., 0., 1.],
             [1., 1., 1.],
             [4., 2., 1.]]))
      _________________________________________________...square - ...s, 0.0min

      >>> c = square(a)
      >>> # The above call did not trigger an evaluation

2) **Embarrassingly parallel helper:** to make it easy to write readable
   parallel code and debug it quickly::

      >>> from joblib import Parallel, delayed
      >>> from math import sqrt
      >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
      [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

3) **Fast compressed persistence**: a replacement for pickle to work
   efficiently on Python objects containing large data
   (*joblib.dump* & *joblib.load*).
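
   For instance, one might persist the array from the caching example above
   and reload it later (an illustrative sketch; the file name
   ``cached_array.joblib`` is arbitrary, not something prescribed by the
   library)::

      from joblib import dump, load

      dump(a, 'cached_array.joblib')    # write the object to disk
      a2 = load('cached_array.joblib')  # read it back into memory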

..
    >>> import shutil ; shutil.rmtree(location)

"""

# PEP0440 compatible formatted version, see:
# https://www.python.org/dev/peps/pep-0440/
#
# Generic release markers:
#   X.Y
#   X.Y.Z   # For bugfix releases
#
# Admissible pre-release markers:
#   X.YaN   # Alpha release
#   X.YbN   # Beta release
#   X.YrcN  # Release Candidate
#   X.Y     # Final release
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
#
__version__ = "1.5.3"


import os

from ._cloudpickle_wrapper import wrap_non_picklable_objects
from ._parallel_backends import ParallelBackendBase
from ._store_backends import StoreBackendBase
from .compressor import register_compressor
from .hashing import hash
from .logger import Logger, PrintTime
from .memory import MemorizedResult, Memory, expires_after, register_store_backend
from .numpy_pickle import dump, load
from .parallel import (
    Parallel,
    cpu_count,
    delayed,
    effective_n_jobs,
    parallel_backend,
    parallel_config,
    register_parallel_backend,
)

__all__ = [
    # On-disk result caching
    "Memory",
    "MemorizedResult",
    "expires_after",
    # Parallel code execution
    "Parallel",
    "delayed",
    "cpu_count",
    "effective_n_jobs",
    "wrap_non_picklable_objects",
    # Context to change the backend globally
    "parallel_config",
    "parallel_backend",
    # Helpers to define and register store/parallel backends
    "ParallelBackendBase",
    "StoreBackendBase",
    "register_compressor",
    "register_parallel_backend",
    "register_store_backend",
    # Helpers kept for backward compatibility
    "PrintTime",
    "Logger",
    "hash",
    "dump",
    "load",
]


# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")