tmp
/
pip-install-ghxuqwgs
/numpy_78e94bf2b6094bf9a1f3d92042f9bf46
/doc
/source
/user
/whatisnumpy.rst
| ************** | |
| What is NumPy? | |
| ************** | |
| NumPy is the fundamental package for scientific computing in Python. | |
| It is a Python library that provides a multidimensional array object, | |
| various derived objects (such as masked arrays and matrices), and an | |
| assortment of routines for fast operations on arrays, including | |
| mathematical, logical, shape manipulation, sorting, selecting, I/O, | |
| discrete Fourier transforms, basic linear algebra, basic statistical | |
| operations, random simulation and much more. | |
| At the core of the NumPy package, is the `ndarray` object. This | |
| encapsulates *n*-dimensional arrays of homogeneous data types, with | |
| many operations being performed in compiled code for performance. | |
| There are several important differences between NumPy arrays and the | |
| standard Python sequences: | |
| - NumPy arrays have a fixed size at creation, unlike Python lists | |
| (which can grow dynamically). Changing the size of an `ndarray` will | |
| create a new array and delete the original. | |
| - The elements in a NumPy array are all required to be of the same | |
| data type, and thus will be the same size in memory. The exception: | |
| one can have arrays of (Python, including NumPy) objects, thereby | |
| allowing for arrays of different sized elements. | |
| - NumPy arrays facilitate advanced mathematical and other types of | |
| operations on large numbers of data. Typically, such operations are | |
| executed more efficiently and with less code than is possible using | |
| Python's built-in sequences. | |
| - A growing plethora of scientific and mathematical Python-based | |
| packages are using NumPy arrays; though these typically support | |
| Python-sequence input, they convert such input to NumPy arrays prior | |
| to processing, and they often output NumPy arrays. In other words, | |
| in order to efficiently use much (perhaps even most) of today's | |
| scientific/mathematical Python-based software, just knowing how to | |
| use Python's built-in sequence types is insufficient - one also | |
| needs to know how to use NumPy arrays. | |
| The points about sequence size and speed are particularly important in | |
| scientific computing. As a simple example, consider the case of | |
| multiplying each element in a 1-D sequence with the corresponding | |
| element in another sequence of the same length. If the data are | |
| stored in two Python lists, ``a`` and ``b``, we could iterate over | |
| each element:: | |
| c = [] | |
| for i in range(len(a)): | |
| c.append(a[i]*b[i]) | |
| This produces the correct answer, but if ``a`` and ``b`` each contain | |
| millions of numbers, we will pay the price for the inefficiencies of | |
| looping in Python. We could accomplish the same task much more | |
| quickly in C by writing (for clarity we neglect variable declarations | |
| and initializations, memory allocation, etc.) | |
| :: | |
| for (i = 0; i < rows; i++): { | |
| c[i] = a[i]*b[i]; | |
| } | |
| This saves all the overhead involved in interpreting the Python code | |
| and manipulating Python objects, but at the expense of the benefits | |
| gained from coding in Python. Furthermore, the coding work required | |
| increases with the dimensionality of our data. In the case of a 2-D | |
| array, for example, the C code (abridged as before) expands to | |
| :: | |
| for (i = 0; i < rows; i++): { | |
| for (j = 0; j < columns; j++): { | |
| c[i][j] = a[i][j]*b[i][j]; | |
| } | |
| } | |
| NumPy gives us the best of both worlds: element-by-element operations | |
| are the "default mode" when an `ndarray` is involved, but the | |
| element-by-element operation is speedily executed by pre-compiled C | |
| code. In NumPy | |
| :: | |
| c = a * b | |
| does what the earlier examples do, at near-C speeds, but with the code | |
| simplicity we expect from something based on Python. Indeed, the NumPy | |
| idiom is even simpler! This last example illustrates two of NumPy's | |
| features which are the basis of much of its power: vectorization and | |
| broadcasting. | |
| Vectorization describes the absence of any explicit looping, indexing, | |
| etc., in the code - these things are taking place, of course, just | |
| "behind the scenes" in optimized, pre-compiled C code. Vectorized | |
| code has many advantages, among which are: | |
| - vectorized code is more concise and easier to read | |
| - fewer lines of code generally means fewer bugs | |
| - the code more closely resembles standard mathematical notation | |
| (making it easier, typically, to correctly code mathematical | |
| constructs) | |
| - vectorization results in more "Pythonic" code. Without | |
| vectorization, our code would be littered with inefficient and | |
| difficult to read ``for`` loops. | |
| Broadcasting is the term used to describe the implicit | |
| element-by-element behavior of operations; generally speaking, in | |
| NumPy all operations, not just arithmetic operations, but | |
| logical, bit-wise, functional, etc., behave in this implicit | |
| element-by-element fashion, i.e., they broadcast. Moreover, in the | |
| example above, ``a`` and ``b`` could be multidimensional arrays of the | |
| same shape, or a scalar and an array, or even two arrays of with | |
| different shapes, provided that the smaller array is "expandable" to | |
| the shape of the larger in such a way that the resulting broadcast is | |
| unambiguous. For detailed "rules" of broadcasting see | |
| `numpy.doc.broadcasting`. | |
| NumPy fully supports an object-oriented approach, starting, once | |
| again, with `ndarray`. For example, `ndarray` is a class, possessing | |
| numerous methods and attributes. Many of its methods mirror | |
| functions in the outer-most NumPy namespace, giving the programmer | |
| complete freedom to code in whichever paradigm she prefers and/or | |
| which seems most appropriate to the task at hand. | |