| ***************** | |
| Beyond the Basics | |
| ***************** | |
| | The voyage of discovery is not in seeking new landscapes but in having | |
| | new eyes. | |
| | --- *Marcel Proust* | |
| | Discovery is seeing what everyone else has seen and thinking what no | |
| | one else has thought. | |
| | --- *Albert Szent-Györgyi* | |
| Iterating over elements in the array | |
| ==================================== | |
| .. _`sec:array_iterator`: | |
| Basic Iteration | |
| --------------- | |
| One common algorithmic requirement is to be able to walk over all | |
| elements in a multidimensional array. The array iterator object makes | |
| this easy to do in a generic way that works for arrays of any | |
| dimension. Naturally, if you know the number of dimensions you will be | |
| using, then you can always write nested for loops to accomplish the | |
| iteration. If, however, you want to write code that works with any | |
| number of dimensions, then you can make use of the array iterator. An | |
| array iterator object is returned when accessing the .flat attribute | |
| of an array. | |
| .. index:: | |
| single: array iterator | |
| Basic usage is to call :cfunc:`PyArray_IterNew` ( ``array`` ) where array | |
| is an ndarray object (or one of its sub-classes). The returned object | |
| is an array-iterator object (the same object returned by the .flat | |
| attribute of the ndarray). This object is usually cast to | |
| PyArrayIterObject* so that its members can be accessed. The only | |
| members that are needed are ``iter->size`` which contains the total | |
| size of the array, ``iter->index``, which contains the current 1-d | |
| index into the array, and ``iter->dataptr`` which is a pointer to the | |
| data for the current element of the array. Sometimes it is also | |
| useful to access ``iter->ao`` which is a pointer to the underlying | |
| ndarray object. | |
| After processing data at the current element of the array, the next | |
| element of the array can be obtained using the macro | |
| :cfunc:`PyArray_ITER_NEXT` ( ``iter`` ). The iteration always proceeds in a | |
| C-style contiguous fashion (last index varying the fastest). The macro | |
| :cfunc:`PyArray_ITER_GOTO` ( ``iter``, ``destination`` ) can be used to | |
| jump to a particular point in the array, where ``destination`` is an | |
| array of npy_intp data-type with space to handle at least the number | |
| of dimensions in the underlying array. Occasionally it is useful to | |
| use :cfunc:`PyArray_ITER_GOTO1D` ( ``iter``, ``index`` ) which will jump | |
| to the 1-d index given by the value of ``index``. The most common | |
| usage, however, is given in the following example. | |
| .. code-block:: c | |
| PyObject *obj; /* assumed to be some ndarray object */ | |
| PyArrayIterObject *iter; | |
| ... | |
| iter = (PyArrayIterObject *)PyArray_IterNew(obj); | |
| if (iter == NULL) goto fail; /* Assume fail has clean-up code */ | |
| while (iter->index < iter->size) { | |
| /* do something with the data at iter->dataptr */ | |
| PyArray_ITER_NEXT(iter); | |
| } | |
| ... | |
| You can also use :cfunc:`PyArrayIter_Check` ( ``obj`` ) to ensure you have | |
| an iterator object and :cfunc:`PyArray_ITER_RESET` ( ``iter`` ) to reset an | |
| iterator object back to the beginning of the array. | |
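To make the mechanics concrete, here is a minimal plain-C sketch (no NumPy headers) of what walking an N-dimensional strided buffer in C order involves. The ``strided_iter`` structure, its helper functions, and the ``SKETCH_MAXDIMS`` limit are hypothetical names invented for this illustration. They only mirror the roles of ``iter->index``, ``iter->size`` and ``iter->dataptr`` described above and should not be read as NumPy's implementation.

```c
#include <stddef.h>

#define SKETCH_MAXDIMS 8  /* hypothetical limit for this sketch */

/* Hypothetical stand-in for an array iterator; not a NumPy type. */
typedef struct {
    int nd;                             /* number of dimensions */
    ptrdiff_t index;                    /* current 1-d index */
    ptrdiff_t size;                     /* total number of elements */
    ptrdiff_t coords[SKETCH_MAXDIMS];   /* current N-d coordinates */
    ptrdiff_t dims[SKETCH_MAXDIMS];     /* shape */
    ptrdiff_t strides[SKETCH_MAXDIMS];  /* byte strides */
    char *base;                         /* start of the data */
    char *dataptr;                      /* current element */
} strided_iter;

/* Set up iteration over a buffer with the given shape and byte strides. */
static void
strided_iter_init(strided_iter *it, void *data, int nd,
                  const ptrdiff_t *dims, const ptrdiff_t *strides)
{
    int i;
    it->nd = nd;
    it->index = 0;
    it->size = 1;
    for (i = 0; i < nd; i++) {
        it->dims[i] = dims[i];
        it->strides[i] = strides[i];
        it->coords[i] = 0;
        it->size *= dims[i];
    }
    it->base = (char *)data;
    it->dataptr = it->base;
}

/* Advance one element in C order (last index varies fastest). */
static void
strided_iter_next(strided_iter *it)
{
    int i;
    it->index++;
    for (i = it->nd - 1; i >= 0; i--) {
        if (it->coords[i] < it->dims[i] - 1) {
            it->coords[i]++;
            it->dataptr += it->strides[i];
            return;
        }
        /* wrap this axis back to 0 and carry into the next one */
        it->dataptr -= it->coords[i] * it->strides[i];
        it->coords[i] = 0;
    }
}

/* Sum all doubles visited by the iterator. */
static double
strided_sum(strided_iter *it)
{
    double total = 0.0;
    while (it->index < it->size) {
        total += *(double *)it->dataptr;
        strided_iter_next(it);
    }
    return total;
}
```

Initializing the sketch with shape (3, 2) and byte strides of four and two doubles over a 3x4 buffer visits every other column of each row. This is the kind of non-contiguous view the array iterator is designed to handle.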
| It should be emphasized at this point that you may not need the array | |
| iterator if your array is already contiguous (using an array iterator | |
| will work but will be slower than the fastest code you could write). | |
| The major purpose of array iterators is to encapsulate iteration over | |
| N-dimensional arrays with arbitrary strides. They are used in many, | |
| many places in the NumPy source code itself. If you already know your | |
| array is contiguous (Fortran or C), then simply adding the element- | |
| size to a running pointer variable will step you through the array | |
| very efficiently. In other words, code like this will probably be | |
| faster for you in the contiguous case (assuming doubles). | |
| .. code-block:: c | |
| npy_intp size; | |
| double *dptr; /* could make this any variable type */ | |
| size = PyArray_SIZE((PyArrayObject *)obj); | |
| dptr = (double *)PyArray_DATA((PyArrayObject *)obj); | |
| while(size--) { | |
| /* do something with the data at dptr */ | |
| dptr++; | |
| } | |
| Iterating over all but one axis | |
| ------------------------------- | |
| A common algorithm is to loop over all elements of an array and | |
| perform some function with each element by issuing a function call. As | |
| function calls can be time consuming, one way to speed up this kind of | |
| algorithm is to write the function so it takes a vector of data and | |
| then write the iteration so the function call is performed for an | |
| entire dimension of data at a time. This increases the amount of work | |
| done per function call, thereby reducing the function-call overhead | |
| to a smaller fraction of the total time. Even if the interior of the | |
| loop is performed without a function call, it can be advantageous to | |
| perform the inner loop over the dimension with the highest number of | |
| elements to take advantage of speed enhancements available on | |
| microprocessors that use pipelining to speed up fundamental operations. | |
| The :cfunc:`PyArray_IterAllButAxis` ( ``array``, ``&dim`` ) function constructs an | |
| iterator object that is modified so that it will not iterate over the | |
| dimension indicated by ``dim``. The only restriction on this iterator | |
| object is that the :cfunc:`PyArray_ITER_GOTO1D` ( ``it``, ``ind`` ) macro | |
| cannot be used (thus flat indexing won't work either if you pass this | |
| object back to Python, so you shouldn't do this). Note that the | |
| returned object from this routine is still usually cast to | |
| PyArrayIterObject \*. All that's been done is to modify the strides | |
| and dimensions of the returned iterator to simulate iterating over | |
| array[...,0,...] where 0 is placed on the | |
| :math:`\textrm{dim}^{\textrm{th}}` dimension. If ``dim`` is negative, then | |
| the axis with the largest dimension is found and used. | |
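The pattern described in this section can be sketched in plain C without NumPy headers. The names ``sum_vector``, ``largest_axis``, ``sum_all_but_axis`` and ``SKETCH_MAXDIMS`` are hypothetical and exist only for this illustration. The sketch assumes a non-empty array of doubles and, following the text above, treats a negative axis as a request for the largest dimension. It shows the shape of the technique, not how NumPy implements it.

```c
#include <stddef.h>

#define SKETCH_MAXDIMS 8  /* hypothetical limit for this sketch */

/* Hypothetical inner routine: sums n doubles spaced `stride` bytes apart. */
static double
sum_vector(const char *ptr, ptrdiff_t stride, ptrdiff_t n)
{
    double total = 0.0;
    while (n--) {
        total += *(const double *)ptr;
        ptr += stride;
    }
    return total;
}

/* Return the axis with the most elements (first one wins on ties). */
static int
largest_axis(int nd, const ptrdiff_t *dims)
{
    int i, axis = 0;
    for (i = 1; i < nd; i++) {
        if (dims[i] > dims[axis]) {
            axis = i;
        }
    }
    return axis;
}

/*
 * Visit every 1-d slice along `axis` (or along the largest axis when
 * `axis` is negative) and write one sum per slice into `out`, in C order
 * of the remaining axes. Returns the number of slices processed.
 */
static ptrdiff_t
sum_all_but_axis(const void *data, int nd, const ptrdiff_t *dims,
                 const ptrdiff_t *strides, int axis, double *out)
{
    ptrdiff_t coords[SKETCH_MAXDIMS] = {0};
    ptrdiff_t nslices = 1, k;
    const char *ptr = (const char *)data;
    int i;

    if (axis < 0) {
        axis = largest_axis(nd, dims);
    }
    for (i = 0; i < nd; i++) {
        if (i != axis) {
            nslices *= dims[i];
        }
    }
    for (k = 0; k < nslices; k++) {
        /* one call handles a whole vector of data */
        out[k] = sum_vector(ptr, strides[axis], dims[axis]);
        /* advance over every axis except `axis`, last index fastest */
        for (i = nd - 1; i >= 0; i--) {
            if (i == axis) {
                continue;
            }
            if (coords[i] < dims[i] - 1) {
                coords[i]++;
                ptr += strides[i];
                break;
            }
            ptr -= coords[i] * strides[i];
            coords[i] = 0;
        }
    }
    return nslices;
}
```

For a C-contiguous 3x4 array, passing a negative axis yields three calls to ``sum_vector`` with four elements each, rather than twelve single-element calls.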
| Iterating over multiple arrays | |
| ------------------------------ | |
| Very often, it is desirable to iterate over several arrays at the | |
| same time. The universal functions are an example of this kind of | |
| behavior. If all you want to do is iterate over arrays with the same | |
| shape, then simply creating several iterator objects is the standard | |
| procedure. For example, the following code iterates over two arrays | |
| assumed to be the same shape and size (actually obj1 just has to have | |
| at least as many total elements as does obj2): | |
| .. code-block:: c | |
| /* It is already assumed that obj1 and obj2 | |
| are ndarrays of the same shape and size. | |
| */ | |
| iter1 = (PyArrayIterObject *)PyArray_IterNew(obj1); | |
| if (iter1 == NULL) goto fail; | |
| iter2 = (PyArrayIterObject *)PyArray_IterNew(obj2); | |
| if (iter2 == NULL) goto fail; /* assume iter1 is DECREF'd at fail */ | |
| while (iter2->index < iter2->size) { | |
| /* process with iter1->dataptr and iter2->dataptr */ | |
| PyArray_ITER_NEXT(iter1); | |
| PyArray_ITER_NEXT(iter2); | |
| } | |
| Broadcasting over multiple arrays | |
| --------------------------------- | |
| .. index:: | |
| single: broadcasting | |
| When multiple arrays are involved in an operation, you may want to use the | |
| same broadcasting rules that the math operations (*i.e.* the ufuncs) use. | |
| This can be done easily using the :ctype:`PyArrayMultiIterObject`. This is | |
| the object returned from the Python command numpy.broadcast and it is almost | |
| as easy to use from C. The function | |
| :cfunc:`PyArray_MultiIterNew` ( ``n``, ``...`` ) is used (with ``n`` input | |
| objects in place of ``...`` ). The input objects can be arrays or anything | |
| that can be converted into an array. A pointer to a PyArrayMultiIterObject is | |
| returned. Broadcasting has already been accomplished which adjusts the | |
| iterators so that all that needs to be done to advance to the next element in | |
| each array is for PyArray_ITER_NEXT to be called for each of the inputs. This | |
| incrementing is automatically performed by the | |
| :cfunc:`PyArray_MultiIter_NEXT` ( ``obj`` ) macro (which can handle a | |
| multi-iterator ``obj`` as either a :ctype:`PyArrayMultiIterObject *` or a | |
| :ctype:`PyObject *`). The data from input number ``i`` is available using | |
| :cfunc:`PyArray_MultiIter_DATA` ( ``obj``, ``i`` ) and the total (broadcast) | |
| size using :cfunc:`PyArray_MultiIter_SIZE` ( ``obj`` ). An example of using this | |
| feature follows. | |
| .. code-block:: c | |
| mobj = PyArray_MultiIterNew(2, obj1, obj2); | |
| size = PyArray_MultiIter_SIZE(mobj); | |
| while(size--) { | |
| ptr1 = PyArray_MultiIter_DATA(mobj, 0); | |
| ptr2 = PyArray_MultiIter_DATA(mobj, 1); | |
| /* code using contents of ptr1 and ptr2 */ | |
| PyArray_MultiIter_NEXT(mobj); | |
| } | |
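What the multi-iterator arranges can be illustrated with a small plain-C sketch that does not use NumPy headers. The helper names ``broadcast_shape`` and ``broadcast_add`` and the ``SKETCH_MAXDIMS`` limit are hypothetical. The sketch assumes both inputs are non-empty C-contiguous arrays of doubles with the same number of dimensions, whereas real broadcasting also aligns shapes of different lengths by prepending ones. The idea it demonstrates is that a length-1 axis is stepped with a stride of zero, so the same element is reused along that axis.

```c
#include <stddef.h>

#define SKETCH_MAXDIMS 8  /* hypothetical limit for this sketch */

/*
 * Compute the broadcast shape of two shapes with the same number of
 * dimensions. Returns 0 on success and -1 if the shapes are incompatible.
 */
static int
broadcast_shape(int nd, const ptrdiff_t *a, const ptrdiff_t *b,
                ptrdiff_t *out)
{
    int i;
    for (i = 0; i < nd; i++) {
        if (a[i] == b[i] || b[i] == 1) {
            out[i] = a[i];
        }
        else if (a[i] == 1) {
            out[i] = b[i];
        }
        else {
            return -1;
        }
    }
    return 0;
}

/*
 * Add two C-contiguous arrays of doubles with broadcasting and store the
 * C-contiguous result in `out`. Returns the broadcast size, or -1 if the
 * shapes cannot be broadcast together.
 */
static ptrdiff_t
broadcast_add(const double *x, const ptrdiff_t *xdims,
              const double *y, const ptrdiff_t *ydims,
              int nd, double *out)
{
    ptrdiff_t dims[SKETCH_MAXDIMS], coords[SKETCH_MAXDIMS] = {0};
    ptrdiff_t xstr[SKETCH_MAXDIMS], ystr[SKETCH_MAXDIMS];
    ptrdiff_t xs = 1, ys = 1, size = 1, k;
    int i;

    if (broadcast_shape(nd, xdims, ydims, dims) < 0) {
        return -1;
    }
    /* element strides; a broadcast (length-1) axis gets stride 0 */
    for (i = nd - 1; i >= 0; i--) {
        xstr[i] = (xdims[i] == 1) ? 0 : xs;
        ystr[i] = (ydims[i] == 1) ? 0 : ys;
        xs *= xdims[i];
        ys *= ydims[i];
        size *= dims[i];
    }
    for (k = 0; k < size; k++) {
        out[k] = *x + *y;
        /* advance both inputs together, last index fastest */
        for (i = nd - 1; i >= 0; i--) {
            if (coords[i] < dims[i] - 1) {
                coords[i]++;
                x += xstr[i];
                y += ystr[i];
                break;
            }
            x -= coords[i] * xstr[i];
            y -= coords[i] * ystr[i];
            coords[i] = 0;
        }
    }
    return size;
}
```

With ``x`` of shape (3, 1) and ``y`` of shape (1, 4), the sketch fills a 3x4 result in which each row reuses one element of ``x`` and each column reuses one element of ``y``.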
| The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to | |
| take a multi-iterator object and adjust all the iterators so that | |
| iteration does not take place over the largest dimension (it makes | |
| that dimension of size 1). The code being looped over that makes use | |
| of the pointers will very likely also need the strides data for each | |
| of the iterators. This information is stored in | |
| ``multi->iters[i]->strides``. | |
| .. index:: | |
| single: array iterator | |
| There are several examples of using the multi-iterator in the NumPy | |
| source code, as it makes N-dimensional broadcasting code very simple to | |
| write. Browse the source for more examples. | |
| .. _user.user-defined-data-types: | |
| User-defined data-types | |
| ======================= | |
| NumPy comes with 24 builtin data-types. While this covers a large | |
| majority of possible use cases, it is conceivable that a user may have | |
| a need for an additional data-type. There is some support for adding | |
| an additional data-type into the NumPy system. This additional data- | |
| type will behave much like a regular data-type except ufuncs must have | |
| 1-d loops registered to handle it separately. Also, checking | |
| whether other data-types can be cast "safely" to and from this | |
| new type will always return "can cast" unless you also register | |
| which types your new data-type can be cast to and from. Adding | |
| data-types is one of the less well-tested areas for NumPy 1.0, so | |
| there may be bugs remaining in the approach. Only add a new data-type | |
| if you can't do what you want to do using the OBJECT or VOID | |
| data-types that are already available. One example of what I | |
| consider a useful application of the ability to add data-types is the | |
| possibility of adding a data-type of arbitrary-precision floats to | |
| NumPy. | |
| .. index:: | |
| pair: dtype; adding new | |
| Adding the new data-type | |
| ------------------------ | |
| To begin to make use of the new data-type, you need to first define a | |
| new Python type to hold the scalars of your new data-type. It should | |
| be acceptable to inherit from one of the array scalars if your new | |
| type has a binary compatible layout. This will allow your new data | |
| type to have the methods and attributes of array scalars. New data- | |
| types must have a fixed memory size (if you want to define a data-type | |
| that needs a flexible representation, like a variable-precision | |
| number, then use a pointer to the object as the data-type). The memory | |
| layout of the object structure for the new Python type must be | |
| PyObject_HEAD followed by the fixed-size memory needed for the data- | |
| type. For example, a suitable structure for the new Python type is: | |
| .. code-block:: c | |
| typedef struct { | |
| PyObject_HEAD | |
| some_data_type obval; | |
| /* the name can be whatever you want */ | |
| } PySomeDataTypeObject; | |
| After you have defined a new Python type object, you must then define | |
| a new :ctype:`PyArray_Descr` structure whose typeobject member will contain a | |
| pointer to the data-type you've just defined. In addition, the | |
| required functions in the ".f" member must be defined: nonzero, | |
| copyswap, copyswapn, setitem, getitem, and cast. The more functions in | |
| the ".f" member you define, however, the more useful the new data-type | |
| will be. It is very important to initialize unused functions to NULL. | |
| This can be achieved using :cfunc:`PyArray_InitArrFuncs` (f). | |
| Once a new :ctype:`PyArray_Descr` structure is created and filled with the | |
| needed information and useful functions, you call | |
| :cfunc:`PyArray_RegisterDataType` (new_descr). The return value from this | |
| call is an integer providing you with a unique type_number that | |
| specifies your data-type. This type number should be stored and made | |
| available by your module so that other modules can use it to recognize | |
| your data-type (the other mechanism for finding a user-defined | |
| data-type number is to search based on the name of the type-object | |
| associated with the data-type using :cfunc:`PyArray_TypeNumFromName` ). | |
| Registering a casting function | |
| ------------------------------ | |
| You may want to allow builtin (and other user-defined) data-types to | |
| be cast automatically to your data-type. In order to make this | |
| possible, you must register a casting function with the data-type you | |
| want to be able to cast from. This requires writing low-level casting | |
| functions for each conversion you want to support and then registering | |
| these functions with the data-type descriptor. A low-level casting | |
| function has the signature: | |
| .. cfunction:: void castfunc( void* from, void* to, npy_intp n, void* fromarr, | |
| void* toarr) | |
| Cast ``n`` elements ``from`` one type ``to`` another. The data to | |
| cast from is in a contiguous, correctly-swapped and aligned chunk | |
| of memory pointed to by from. The buffer to cast to is also | |
| contiguous, correctly-swapped and aligned. The fromarr and toarr | |
| arguments should only be used for flexible-element-sized arrays | |
| (string, unicode, void). | |
| An example castfunc is: | |
| .. code-block:: c | |
| static void | |
| double_to_float(double *from, float *to, npy_intp n, | |
| void *ig1, void *ig2) | |
| { | |
| /* convert n contiguous doubles to floats; ig1 and ig2 are ignored */ | |
| while (n--) { | |
| (*to++) = (float) *(from++); | |
| } | |
| } | |
| This could then be registered to convert doubles to floats using the | |
| code: | |
| .. code-block:: c | |
| doub = PyArray_DescrFromType(NPY_DOUBLE); | |
| PyArray_RegisterCastFunc(doub, NPY_FLOAT, | |
| (PyArray_VectorUnaryFunc *)double_to_float); | |
| Py_DECREF(doub); | |
| Registering coercion rules | |
| -------------------------- | |
| By default, all user-defined data-types are not presumed to be safely | |
| castable to any builtin data-types. In addition, builtin data-types are | |
| not presumed to be safely castable to user-defined data-types. This | |
| situation limits the ability of user-defined data-types to participate | |
| in the coercion system used by ufuncs and at other times when automatic | |
| coercion takes place in NumPy. This can be changed by registering | |
| data-types as safely castable from a particular data-type object. The | |
| function :cfunc:`PyArray_RegisterCanCast` (from_descr, totype_number, | |
| scalarkind) should be used to specify that the data-type object | |
| from_descr can be cast to the data-type with type number | |
| totype_number. If you are not trying to alter scalar coercion rules, | |
| then use :cdata:`NPY_NOSCALAR` for the scalarkind argument. | |
| If you want to allow your new data-type to also be able to share in | |
| the scalar coercion rules, then you need to specify the scalarkind | |
| function in the data-type object's ".f" member to return the kind of | |
| scalar the new data-type should be seen as (the value of the scalar is | |
| available to that function). Then, you can register data-types that | |
| can be cast to separately for each scalar kind that may be returned | |
| from your user-defined data-type. If you don't register scalar | |
| coercion handling, then all of your user-defined data-types will be | |
| seen as :cdata:`NPY_NOSCALAR`. | |
| Registering a ufunc loop | |
| ------------------------ | |
| You may also want to register low-level ufunc loops for your data-type | |
| so that an ndarray of your data-type can have math applied to it | |
| seamlessly. Registering a new loop with exactly the same arg_types | |
| signature silently replaces any previously registered loops for that | |
| data-type. | |
| Before you can register a 1-d loop for a ufunc, the ufunc must be | |
| previously created. Then you call :cfunc:`PyUFunc_RegisterLoopForType` | |
| (...) with the information needed for the loop. The return value of | |
| this function is ``0`` if the process was successful and ``-1`` with | |
| an error condition set if it was not successful. | |
| .. cfunction:: int PyUFunc_RegisterLoopForType( PyUFuncObject* ufunc, | |
| int usertype, PyUFuncGenericFunction function, int* arg_types, void* data) | |
| *ufunc* | |
| The ufunc to attach this loop to. | |
| *usertype* | |
| The user-defined type this loop should be indexed under. This number | |
| must be a user-defined type or an error occurs. | |
| *function* | |
| The ufunc inner 1-d loop. This function must have the signature as | |
| explained in Section `3 <#sec-creating-a-new>`__ . | |
| *arg_types* | |
| (optional) If given, this should contain an array of integers of at | |
| least size ufunc.nargs containing the data-types expected by the loop | |
| function. The data will be copied into a NumPy-managed structure so | |
| the memory for this argument should be deleted after calling this | |
| function. If this is NULL, then it will be assumed that all data-types | |
| are of type usertype. | |
| *data* | |
| (optional) Specify any optional data needed by the function which will | |
| be passed when the function is called. | |
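As a rough illustration of what such an inner 1-d loop looks like, the following plain-C sketch defines a loop for a made-up fixed-size type and can be called directly on ordinary buffers. The ``rational`` structure, the ``rational_multiply_loop`` function, and the ``sketch_intp`` typedef (a stand-in for ``npy_intp``) are assumptions made for this example. The argument layout follows the usual ufunc loop convention: an array of data pointers, an element count, and one byte step per argument. Check the exact prototype against the headers of the NumPy version in use.

```c
#include <stddef.h>

typedef ptrdiff_t sketch_intp;  /* stand-in for npy_intp in this sketch */

/* Hypothetical fixed-size user type: a rational number. */
typedef struct {
    long num;
    long den;
} rational;

/*
 * Inner 1-d loop with the shape of a ufunc loop: args holds one pointer
 * per input and output, dimensions[0] is the element count, and steps
 * holds the byte step for each argument.
 */
static void
rational_multiply_loop(char **args, const sketch_intp *dimensions,
                       const sketch_intp *steps, void *data)
{
    char *in1 = args[0], *in2 = args[1], *out = args[2];
    sketch_intp n = dimensions[0], i;
    (void)data;  /* unused in this sketch */

    for (i = 0; i < n; i++) {
        const rational *a = (const rational *)in1;
        const rational *b = (const rational *)in2;
        rational *r = (rational *)out;
        /* multiply without reducing the fraction */
        r->num = a->num * b->num;
        r->den = a->den * b->den;
        in1 += steps[0];
        in2 += steps[1];
        out += steps[2];
    }
}
```

In a real extension module, a loop of this form would be passed as the *function* argument, with *usertype* set to the type number obtained when the data-type was registered.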
| .. index:: | |
| pair: dtype; adding new | |
| Subtyping the ndarray in C | |
| ========================== | |
| One of the lesser-used features that has been lurking in Python since | |
| 2.2 is the ability to sub-class types in C. This facility is one of | |
| the important reasons for basing NumPy off of the Numeric code-base | |
| which was already in C. A sub-type in C allows much more flexibility | |
| with regard to memory management. Sub-typing in C is not difficult | |
| even if you have only a rudimentary understanding of how to create new | |
| types for Python. While it is easiest to sub-type from a single parent | |
| type, sub-typing from multiple parent types is also possible. Multiple | |
| inheritance in C is generally less useful than it is in Python because | |
| a restriction on Python sub-types is that they have a binary | |
| compatible memory layout. Perhaps for this reason, it is somewhat | |
| easier to sub-type from a single parent type. | |
| .. index:: | |
| pair: ndarray; subtyping | |
| All C-structures corresponding to Python objects must begin with | |
| :cmacro:`PyObject_HEAD` (or :cmacro:`PyObject_VAR_HEAD`). In the same | |
| way, any sub-type must have a C-structure that begins with exactly the | |
| same memory layout as the parent type (or all of the parent types in | |
| the case of multiple-inheritance). The reason for this is that Python | |
| may attempt to access a member of the sub-type structure as if it had | |
| the parent structure ( *i.e.* it will cast a given pointer to a | |
| pointer to the parent structure and then dereference one of its | |
| members). If the memory layouts are not compatible, then this attempt | |
| will cause unpredictable behavior (eventually leading to a memory | |
| violation and program crash). | |
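The layout requirement can be demonstrated in plain C, independent of Python. In the sketch below, ``parent_object`` and ``child_object`` are hypothetical structures standing in for a parent instance structure and a sub-type structure. Because the parent structure is the first member of the child, a pointer to the child may be converted to a pointer to the parent and used by code that knows nothing about the sub-type.

```c
#include <stddef.h>

/* Hypothetical parent instance structure (stands in for PyArrayObject). */
typedef struct {
    int refcount;
    int ndim;
} parent_object;

/* Sub-type structure: the full parent structure comes first. */
typedef struct {
    parent_object base;
    double extra;  /* new, sub-type-specific member */
} child_object;

/* Code written for the parent only knows about parent_object. */
static int
parent_get_ndim(const parent_object *obj)
{
    return obj->ndim;
}

/* View a child through a parent pointer; valid because `base` is first. */
static const parent_object *
child_as_parent(const child_object *child)
{
    return (const parent_object *)child;
}
```

If ``extra`` were placed before ``base``, the same cast would make ``parent_get_ndim`` read the wrong bytes, which is the kind of unpredictable behavior described above.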
| One of the elements in :cmacro:`PyObject_HEAD` is a pointer to a | |
| type-object structure. A new Python type is created by creating a new | |
| type-object structure and populating it with functions and pointers to | |
| describe the desired behavior of the type. Typically, a new | |
| C-structure is also created to contain the instance-specific | |
| information needed for each object of the type as well. For example, | |
| :cdata:`&PyArray_Type` is a pointer to the type-object table for the ndarray | |
| while a :ctype:`PyArrayObject *` variable is a pointer to a particular instance | |
| of an ndarray (one of the members of the ndarray structure is, in | |
| turn, a pointer to the type-object table :cdata:`&PyArray_Type`). Finally, | |
| :cfunc:`PyType_Ready` (<pointer_to_type_object>) must be called for | |
| every new Python type. | |
| Creating sub-types | |
| ------------------ | |
| To create a sub-type, a similar procedure must be followed except | |
| only behaviors that are different require new entries in the type- | |
| object structure. All other entries can be NULL and will be filled in | |
| by :cfunc:`PyType_Ready` with appropriate functions from the parent | |
| type(s). In particular, to create a sub-type in C follow these steps: | |
| 1. If needed, create a new C-structure to handle each instance of your | |
| type. A typical C-structure would be: | |
| .. code-block:: c | |
| typedef struct _new_struct { | |
| PyArrayObject base; | |
| /* new things here */ | |
| } NewArrayObject; | |
| Notice that the full PyArrayObject is used as the first entry in order | |
| to ensure that the binary layout of instances of the new type is | |
| identical to the PyArrayObject. | |
| 2. Fill in a new Python type-object structure with pointers to new | |
| functions that will over-ride the default behavior while leaving any | |
| function that should remain the same unfilled (or NULL). The tp_name | |
| element should be different. | |
| 3. Fill in the tp_base member of the new type-object structure with a | |
| pointer to the (main) parent type object. For multiple-inheritance, | |
| also fill in the tp_bases member with a tuple containing all of the | |
| parent objects in the order they should be used to define inheritance. | |
| Remember, all parent-types must have the same C-structure for multiple | |
| inheritance to work properly. | |
| 4. Call :cfunc:`PyType_Ready` (<pointer_to_new_type>). If this function | |
| returns a negative number, a failure occurred and the type is not | |
| initialized. Otherwise, the type is ready to be used. It is | |
| generally important to place a reference to the new type into the | |
| module dictionary so it can be accessed from Python. | |
| More information on creating sub-types in C can be learned by reading | |
| PEP 253 (available at http://www.python.org/dev/peps/pep-0253). | |
| Specific features of ndarray sub-typing | |
| --------------------------------------- | |
| Some special methods and attributes are used by arrays in order to | |
| facilitate the interoperation of sub-types with the base ndarray type. | |
| The __array_finalize\__ method | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| .. attribute:: ndarray.__array_finalize__ | |
| Several array-creation functions of the ndarray allow | |
| specification of a particular sub-type to be created. This allows | |
| sub-types to be handled seamlessly in many routines. When a | |
| sub-type is created in such a fashion, however, neither the | |
| __new_\_ method nor the __init\__ method gets called. Instead, the | |
| sub-type is allocated and the appropriate instance-structure | |
| members are filled in. Finally, the :obj:`__array_finalize__` | |
| attribute is looked-up in the object dictionary. If it is present | |
| and not None, then it can be either a CObject containing a pointer | |
| to a :cfunc:`PyArray_FinalizeFunc` or it can be a method taking a | |
| single argument (which could be None). | |
| If the :obj:`__array_finalize__` attribute is a CObject, then the pointer | |
| must be a pointer to a function with the signature: | |
| .. code-block:: c | |
| (int) (PyArrayObject *, PyObject *) | |
| The first argument is the newly created sub-type. The second argument | |
| (if not NULL) is the "parent" array (if the array was created using | |
| slicing or some other operation where a clearly-distinguishable parent | |
| is present). This routine can do anything it wants to. It should | |
| return a -1 on error and 0 otherwise. | |
| If the :obj:`__array_finalize__` attribute is neither None nor a CObject, | |
| then it must be a Python method that takes the parent array as an | |
| argument (which could be None if there is no parent), and returns | |
| nothing. Errors in this method will be caught and handled. | |
| The __array_priority\__ attribute | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| .. attribute:: ndarray.__array_priority__ | |
| This attribute allows simple but flexible determination of which sub- | |
| type should be considered "primary" when an operation involving two or | |
| more sub-types arises. In operations where different sub-types are | |
| being used, the sub-type with the largest :obj:`__array_priority__` | |
| attribute will determine the sub-type of the output(s). If two sub- | |
| types have the same :obj:`__array_priority__` then the sub-type of the | |
| first argument determines the output. The default | |
| :obj:`__array_priority__` attribute returns a value of 0.0 for the base | |
| ndarray type and 1.0 for a sub-type. This attribute can also be | |
| defined by objects that are not sub-types of the ndarray and can be | |
| used to determine which :obj:`__array_wrap__` method should be called for | |
| the return output. | |
| The __array_wrap\__ method | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| .. attribute:: ndarray.__array_wrap__ | |
| Any class or type can define this method which should take an ndarray | |
| argument and return an instance of the type. It can be seen as the | |
| opposite of the :obj:`__array__` method. This method is used by the | |
| ufuncs (and other NumPy functions) to allow other objects to pass | |
| through. For Python >2.4, it can also be used to write a decorator | |
| that converts a function that works only with ndarrays to one that | |
| works with any type with :obj:`__array__` and :obj:`__array_wrap__` methods. | |
| .. index:: | |
| pair: ndarray; subtyping | |