tmp
/
pip-install-ghxuqwgs
/numpy_78e94bf2b6094bf9a1f3d92042f9bf46
/doc
/source
/reference
/maskedarray.generic.rst
| .. currentmodule:: numpy.ma | |
| .. _maskedarray.generic: | |
| The :mod:`numpy.ma` module | |
| ========================== | |
| Rationale | |
| --------- | |
| Masked arrays are arrays that may have missing or invalid entries. | |
| The :mod:`numpy.ma` module provides a nearly work-alike replacement for numpy | |
| that supports data arrays with masks. | |
| What is a masked array? | |
| ----------------------- | |
| In many circumstances, datasets can be incomplete or tainted by the presence | |
| of invalid data. For example, a sensor may have failed to record a data, or | |
| recorded an invalid value. The :mod:`numpy.ma` module provides a convenient | |
| way to address this issue, by introducing masked arrays. | |
| A masked array is the combination of a standard :class:`numpy.ndarray` and a | |
| mask. A mask is either :attr:`nomask`, indicating that no value of the | |
| associated array is invalid, or an array of booleans that determines for each | |
| element of the associated array whether the value is valid or not. When an | |
| element of the mask is ``False``, the corresponding element of the associated | |
| array is valid and is said to be unmasked. When an element of the mask is | |
| ``True``, the corresponding element of the associated array is said to be | |
| masked (invalid). | |
| The package ensures that masked entries are not used in computations. | |
| As an illustration, let's consider the following dataset:: | |
| >>> import numpy as np | |
| >>> import numpy.ma as ma | |
| >>> x = np.array([1, 2, 3, -1, 5]) | |
| We wish to mark the fourth entry as invalid. The easiest is to create a masked | |
| array:: | |
| >>> mx = ma.masked_array(x, mask=[0, 0, 0, 1, 0]) | |
| We can now compute the mean of the dataset, without taking the invalid data | |
| into account:: | |
| >>> mx.mean() | |
| 2.75 | |
| The :mod:`numpy.ma` module | |
| -------------------------- | |
| The main feature of the :mod:`numpy.ma` module is the :class:`MaskedArray` | |
| class, which is a subclass of :class:`numpy.ndarray`. The class, its | |
| attributes and methods are described in more details in the | |
| :ref:`MaskedArray class <maskedarray.baseclass>` section. | |
| The :mod:`numpy.ma` module can be used as an addition to :mod:`numpy`: :: | |
| >>> import numpy as np | |
| >>> import numpy.ma as ma | |
| To create an array with the second element invalid, we would do:: | |
| >>> y = ma.array([1, 2, 3], mask = [0, 1, 0]) | |
| To create a masked array where all values close to 1.e20 are invalid, we would | |
| do:: | |
| >>> z = masked_values([1.0, 1.e20, 3.0, 4.0], 1.e20) | |
| For a complete discussion of creation methods for masked arrays please see | |
| section :ref:`Constructing masked arrays <maskedarray.generic.constructing>`. | |
| Using numpy.ma | |
| ============== | |
| .. _maskedarray.generic.constructing: | |
| Constructing masked arrays | |
| -------------------------- | |
| There are several ways to construct a masked array. | |
| * A first possibility is to directly invoke the :class:`MaskedArray` class. | |
| * A second possibility is to use the two masked array constructors, | |
| :func:`array` and :func:`masked_array`. | |
| .. autosummary:: | |
| :toctree: generated/ | |
| array | |
| masked_array | |
| * A third option is to take the view of an existing array. In that case, the | |
| mask of the view is set to :attr:`nomask` if the array has no named fields, | |
| or an array of boolean with the same structure as the array otherwise. | |
| >>> x = np.array([1, 2, 3]) | |
| >>> x.view(ma.MaskedArray) | |
| masked_array(data = [1 2 3], | |
| mask = False, | |
| fill_value = 999999) | |
| >>> x = np.array([(1, 1.), (2, 2.)], dtype=[('a',int), ('b', float)]) | |
| >>> x.view(ma.MaskedArray) | |
| masked_array(data = [(1, 1.0) (2, 2.0)], | |
| mask = [(False, False) (False, False)], | |
| fill_value = (999999, 1e+20), | |
| dtype = [('a', '<i4'), ('b', '<f8')]) | |
| * Yet another possibility is to use any of the following functions: | |
| .. autosummary:: | |
| :toctree: generated/ | |
| asarray | |
| asanyarray | |
| fix_invalid | |
| masked_equal | |
| masked_greater | |
| masked_greater_equal | |
| masked_inside | |
| masked_invalid | |
| masked_less | |
| masked_less_equal | |
| masked_not_equal | |
| masked_object | |
| masked_outside | |
| masked_values | |
| masked_where | |
| Accessing the data | |
| ------------------ | |
| The underlying data of a masked array can be accessed in several ways: | |
| * through the :attr:`~MaskedArray.data` attribute. The output is a view of the | |
| array as a :class:`numpy.ndarray` or one of its subclasses, depending on the | |
| type of the underlying data at the masked array creation. | |
| * through the :meth:`~MaskedArray.__array__` method. The output is then a | |
| :class:`numpy.ndarray`. | |
| * by directly taking a view of the masked array as a :class:`numpy.ndarray` | |
| or one of its subclass (which is actually what using the | |
| :attr:`~MaskedArray.data` attribute does). | |
| * by using the :func:`getdata` function. | |
| None of these methods is completely satisfactory if some entries have been | |
| marked as invalid. As a general rule, where a representation of the array is | |
| required without any masked entries, it is recommended to fill the array with | |
| the :meth:`filled` method. | |
| Accessing the mask | |
| ------------------ | |
| The mask of a masked array is accessible through its :attr:`~MaskedArray.mask` | |
| attribute. We must keep in mind that a ``True`` entry in the mask indicates an | |
| *invalid* data. | |
| Another possibility is to use the :func:`getmask` and :func:`getmaskarray` | |
| functions. :func:`getmask(x)` outputs the mask of ``x`` if ``x`` is a masked | |
| array, and the special value :data:`nomask` otherwise. :func:`getmaskarray(x)` | |
| outputs the mask of ``x`` if ``x`` is a masked array. If ``x`` has no invalid | |
| entry or is not a masked array, the function outputs a boolean array of | |
| ``False`` with as many elements as ``x``. | |
| Accessing only the valid entries | |
| --------------------------------- | |
| To retrieve only the valid entries, we can use the inverse of the mask as an | |
| index. The inverse of the mask can be calculated with the | |
| :func:`numpy.logical_not` function or simply with the ``~`` operator:: | |
| >>> x = ma.array([[1, 2], [3, 4]], mask=[[0, 1], [1, 0]]) | |
| >>> x[~x.mask] | |
| masked_array(data = [1 4], | |
| mask = [False False], | |
| fill_value = 999999) | |
| Another way to retrieve the valid data is to use the :meth:`compressed` | |
| method, which returns a one-dimensional :class:`~numpy.ndarray` (or one of its | |
| subclasses, depending on the value of the :attr:`~MaskedArray.baseclass` | |
| attribute):: | |
| >>> x.compressed() | |
| array([1, 4]) | |
| Note that the output of :meth:`compressed` is always 1D. | |
| Modifying the mask | |
| ------------------ | |
| Masking an entry | |
| ~~~~~~~~~~~~~~~~ | |
| The recommended way to mark one or several specific entries of a masked array | |
| as invalid is to assign the special value :attr:`masked` to them:: | |
| >>> x = ma.array([1, 2, 3]) | |
| >>> x[0] = ma.masked | |
| >>> x | |
| masked_array(data = [-- 2 3], | |
| mask = [ True False False], | |
| fill_value = 999999) | |
| >>> y = ma.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) | |
| >>> y[(0, 1, 2), (1, 2, 0)] = ma.masked | |
| >>> y | |
| masked_array(data = | |
| [[1 -- 3] | |
| [4 5 --] | |
| [-- 8 9]], | |
| mask = | |
| [[False True False] | |
| [False False True] | |
| [ True False False]], | |
| fill_value = 999999) | |
| >>> z = ma.array([1, 2, 3, 4]) | |
| >>> z[:-2] = ma.masked | |
| >>> z | |
| masked_array(data = [-- -- 3 4], | |
| mask = [ True True False False], | |
| fill_value = 999999) | |
| A second possibility is to modify the :attr:`~MaskedArray.mask` directly, | |
| but this usage is discouraged. | |
| .. note:: | |
| When creating a new masked array with a simple, non-structured datatype, | |
| the mask is initially set to the special value :attr:`nomask`, that | |
| corresponds roughly to the boolean ``False``. Trying to set an element of | |
| :attr:`nomask` will fail with a :exc:`TypeError` exception, as a boolean | |
| does not support item assignment. | |
| All the entries of an array can be masked at once by assigning ``True`` to the | |
| mask:: | |
| >>> x = ma.array([1, 2, 3], mask=[0, 0, 1]) | |
| >>> x.mask = True | |
| >>> x | |
| masked_array(data = [-- -- --], | |
| mask = [ True True True], | |
| fill_value = 999999) | |
| Finally, specific entries can be masked and/or unmasked by assigning to the | |
| mask a sequence of booleans:: | |
| >>> x = ma.array([1, 2, 3]) | |
| >>> x.mask = [0, 1, 0] | |
| >>> x | |
| masked_array(data = [1 -- 3], | |
| mask = [False True False], | |
| fill_value = 999999) | |
| Unmasking an entry | |
| ~~~~~~~~~~~~~~~~~~ | |
| To unmask one or several specific entries, we can just assign one or several | |
| new valid values to them:: | |
| >>> x = ma.array([1, 2, 3], mask=[0, 0, 1]) | |
| >>> x | |
| masked_array(data = [1 2 --], | |
| mask = [False False True], | |
| fill_value = 999999) | |
| >>> x[-1] = 5 | |
| >>> x | |
| masked_array(data = [1 2 5], | |
| mask = [False False False], | |
| fill_value = 999999) | |
| .. note:: | |
| Unmasking an entry by direct assignment will silently fail if the masked | |
| array has a *hard* mask, as shown by the :attr:`hardmask` attribute. This | |
| feature was introduced to prevent overwriting the mask. To force the | |
| unmasking of an entry where the array has a hard mask, the mask must first | |
| to be softened using the :meth:`soften_mask` method before the allocation. | |
| It can be re-hardened with :meth:`harden_mask`:: | |
| >>> x = ma.array([1, 2, 3], mask=[0, 0, 1], hard_mask=True) | |
| >>> x | |
| masked_array(data = [1 2 --], | |
| mask = [False False True], | |
| fill_value = 999999) | |
| >>> x[-1] = 5 | |
| >>> x | |
| masked_array(data = [1 2 --], | |
| mask = [False False True], | |
| fill_value = 999999) | |
| >>> x.soften_mask() | |
| >>> x[-1] = 5 | |
| >>> x | |
| masked_array(data = [1 2 5], | |
| mask = [False False False], | |
| fill_value = 999999) | |
| >>> x.harden_mask() | |
| To unmask all masked entries of a masked array (provided the mask isn't a hard | |
| mask), the simplest solution is to assign the constant :attr:`nomask` to the | |
| mask:: | |
| >>> x = ma.array([1, 2, 3], mask=[0, 0, 1]) | |
| >>> x | |
| masked_array(data = [1 2 --], | |
| mask = [False False True], | |
| fill_value = 999999) | |
| >>> x.mask = ma.nomask | |
| >>> x | |
| masked_array(data = [1 2 3], | |
| mask = [False False False], | |
| fill_value = 999999) | |
| Indexing and slicing | |
| -------------------- | |
| As a :class:`MaskedArray` is a subclass of :class:`numpy.ndarray`, it inherits | |
| its mechanisms for indexing and slicing. | |
| When accessing a single entry of a masked array with no named fields, the | |
| output is either a scalar (if the corresponding entry of the mask is | |
| ``False``) or the special value :attr:`masked` (if the corresponding entry of | |
| the mask is ``True``):: | |
| >>> x = ma.array([1, 2, 3], mask=[0, 0, 1]) | |
| >>> x[0] | |
| 1 | |
| >>> x[-1] | |
| masked_array(data = --, | |
| mask = True, | |
| fill_value = 1e+20) | |
| >>> x[-1] is ma.masked | |
| True | |
| If the masked array has named fields, accessing a single entry returns a | |
| :class:`numpy.void` object if none of the fields are masked, or a 0d masked | |
| array with the same dtype as the initial array if at least one of the fields | |
| is masked. | |
| >>> y = ma.masked_array([(1,2), (3, 4)], | |
| ... mask=[(0, 0), (0, 1)], | |
| ... dtype=[('a', int), ('b', int)]) | |
| >>> y[0] | |
| (1, 2) | |
| >>> y[-1] | |
| masked_array(data = (3, --), | |
| mask = (False, True), | |
| fill_value = (999999, 999999), | |
| dtype = [('a', '<i4'), ('b', '<i4')]) | |
| When accessing a slice, the output is a masked array whose | |
| :attr:`~MaskedArray.data` attribute is a view of the original data, and whose | |
| mask is either :attr:`nomask` (if there was no invalid entries in the original | |
| array) or a copy of the corresponding slice of the original mask. The copy is | |
| required to avoid propagation of any modification of the mask to the original. | |
| >>> x = ma.array([1, 2, 3, 4, 5], mask=[0, 1, 0, 0, 1]) | |
| >>> mx = x[:3] | |
| >>> mx | |
| masked_array(data = [1 -- 3], | |
| mask = [False True False], | |
| fill_value = 999999) | |
| >>> mx[1] = -1 | |
| >>> mx | |
| masked_array(data = [1 -1 3], | |
| mask = [False True False], | |
| fill_value = 999999) | |
| >>> x.mask | |
| array([False, True, False, False, True], dtype=bool) | |
| >>> x.data | |
| array([ 1, -1, 3, 4, 5]) | |
| Accessing a field of a masked array with structured datatype returns a | |
| :class:`MaskedArray`. | |
| Operations on masked arrays | |
| --------------------------- | |
| Arithmetic and comparison operations are supported by masked arrays. | |
| As much as possible, invalid entries of a masked array are not processed, | |
| meaning that the corresponding :attr:`data` entries *should* be the same | |
| before and after the operation. | |
| .. warning:: | |
| We need to stress that this behavior may not be systematic, that masked | |
| data may be affected by the operation in some cases and therefore users | |
| should not rely on this data remaining unchanged. | |
| The :mod:`numpy.ma` module comes with a specific implementation of most | |
| ufuncs. Unary and binary functions that have a validity domain (such as | |
| :func:`~numpy.log` or :func:`~numpy.divide`) return the :data:`masked` | |
| constant whenever the input is masked or falls outside the validity domain:: | |
| >>> ma.log([-1, 0, 1, 2]) | |
| masked_array(data = [-- -- 0.0 0.69314718056], | |
| mask = [ True True False False], | |
| fill_value = 1e+20) | |
| Masked arrays also support standard numpy ufuncs. The output is then a masked | |
| array. The result of a unary ufunc is masked wherever the input is masked. The | |
| result of a binary ufunc is masked wherever any of the input is masked. If the | |
| ufunc also returns the optional context output (a 3-element tuple containing | |
| the name of the ufunc, its arguments and its domain), the context is processed | |
| and entries of the output masked array are masked wherever the corresponding | |
| input fall outside the validity domain:: | |
| >>> x = ma.array([-1, 1, 0, 2, 3], mask=[0, 0, 0, 0, 1]) | |
| >>> np.log(x) | |
| masked_array(data = [-- -- 0.0 0.69314718056 --], | |
| mask = [ True True False False True], | |
| fill_value = 1e+20) | |
| Examples | |
| ======== | |
| Data with a given value representing missing data | |
| ------------------------------------------------- | |
| Let's consider a list of elements, ``x``, where values of -9999. represent | |
| missing data. We wish to compute the average value of the data and the vector | |
| of anomalies (deviations from the average):: | |
| >>> import numpy.ma as ma | |
| >>> x = [0.,1.,-9999.,3.,4.] | |
| >>> mx = ma.masked_values (x, -9999.) | |
| >>> print mx.mean() | |
| 2.0 | |
| >>> print mx - mx.mean() | |
| [-2.0 -1.0 -- 1.0 2.0] | |
| >>> print mx.anom() | |
| [-2.0 -1.0 -- 1.0 2.0] | |
| Filling in the missing data | |
| --------------------------- | |
| Suppose now that we wish to print that same data, but with the missing values | |
| replaced by the average value. | |
| >>> print mx.filled(mx.mean()) | |
| [ 0. 1. 2. 3. 4.] | |
| Numerical operations | |
| -------------------- | |
| Numerical operations can be easily performed without worrying about missing | |
| values, dividing by zero, square roots of negative numbers, etc.:: | |
| >>> import numpy as np, numpy.ma as ma | |
| >>> x = ma.array([1., -1., 3., 4., 5., 6.], mask=[0,0,0,0,1,0]) | |
| >>> y = ma.array([1., 2., 0., 4., 5., 6.], mask=[0,0,0,0,0,1]) | |
| >>> print np.sqrt(x/y) | |
| [1.0 -- -- 1.0 -- --] | |
| Four values of the output are invalid: the first one comes from taking the | |
| square root of a negative number, the second from the division by zero, and | |
| the last two where the inputs were masked. | |
| Ignoring extreme values | |
| ----------------------- | |
| Let's consider an array ``d`` of random floats between 0 and 1. We wish to | |
| compute the average of the values of ``d`` while ignoring any data outside | |
| the range ``[0.1, 0.9]``:: | |
| >>> print ma.masked_outside(d, 0.1, 0.9).mean() | |