API
Create and Store Arrays
| from_array(x, chunks[, name, lock]) | Create dask array from something that looks like an array |
| store(sources, targets, **kwargs) | Store dask arrays in array-like objects, overwrite data in target |
| Array.to_hdf5(filename, datapath, **kwargs) | Store array in HDF5 file |
Specialized Functions for Dask.Array
| Array.map_blocks(func[, chunks, dtype, name]) | Map a function across all blocks of a dask array |
| Array.map_overlap(func, depth[, boundary, trim]) | Map a function over blocks of the array with some overlap |
| topk(k, x) | The top k elements of an array |
| coarsen(reduction, x, axes[, trim_excess]) | Coarsen array by applying reduction to fixed size neighborhoods |
| stack(seq[, axis]) | Stack arrays along a new axis |
| concatenate(seq[, axis]) | Concatenate arrays along an existing axis |
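As a rough illustration of the last two entries in the table above, the sketch below contrasts stack and concatenate on two small arrays. The array contents and chunk sizes are arbitrary choices made for this example.

```python
import numpy as np
import dask.array as da

# Two small 2x3 arrays, each split into one-row chunks (arbitrary example data).
a = da.from_array(np.zeros((2, 3)), chunks=(1, 3))
b = da.from_array(np.ones((2, 3)), chunks=(1, 3))

# stack adds a new axis: two (2, 3) arrays become one (2, 2, 3) array.
stacked = da.stack([a, b], axis=0)

# concatenate joins along an existing axis: the result has shape (4, 3).
joined = da.concatenate([a, b], axis=0)

print(stacked.shape)  # (2, 2, 3)
print(joined.shape)   # (4, 3)
```

Both results stay lazy until compute() is called; only the shapes are known up front.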
Array Methods
- class dask.array.core.Array(dask, name, chunks, dtype=None, shape=None)
Parallel Array
Parameters: dask : dict
Task dependency graph
name : string
Name of array in dask
shape : tuple of ints
Shape of the entire array
chunks: iterable of tuples
block sizes along each dimension
Attributes
| T | |
| chunks | |
| dtype | |
| imag | |
| nbytes | Number of bytes in array |
| ndim | |
| numblocks | |
| real | |
| shape | |
| size | Number of elements in array |
| vindex | |
Methods
| all([axis, keepdims]) | Test whether all array elements along a given axis evaluate to True. |
| any([axis, keepdims]) | Test whether any array element along a given axis evaluates to True. |
| argmax([axis]) | Indices of the maximum values along an axis. |
| argmin([axis]) | Return the indices of the minimum values along an axis. |
| astype(dtype, **kwargs) | Copy of the array, cast to a specified type |
| cache([store]) | Evaluate and cache array |
| compute(**kwargs) | |
| conj() | |
| dot(a, b[, out]) | Dot product of two arrays. |
| map_blocks(func[, chunks, dtype, name]) | Map a function across all blocks of a dask array |
| map_overlap(func, depth[, boundary, trim]) | Map a function over blocks of the array with some overlap |
| max([axis, keepdims]) | Return the maximum of an array or maximum along an axis. |
| mean([axis, dtype, keepdims]) | Compute the arithmetic mean along the specified axis. |
| min([axis, keepdims]) | Return the minimum of an array or minimum along an axis. |
| moment(order[, axis, dtype, keepdims, ddof]) | Calculate the nth centralized moment. |
| prod([axis, dtype, keepdims]) | Return the product of array elements over a given axis. |
| rechunk(chunks) | |
| squeeze() | Remove single-dimensional entries from the shape of an array. |
| std([axis, dtype, keepdims, ddof]) | Compute the standard deviation along the specified axis. |
| store(target, **kwargs) | Store dask arrays in array-like objects, overwrite data in target |
| sum([axis, dtype, keepdims]) | Sum of array elements over a given axis. |
| to_hdf5(filename, datapath, **kwargs) | Store array in HDF5 file |
| topk(k) | The top k elements of an array |
| transpose([axes]) | Permute the dimensions of an array. |
| var([axis, dtype, keepdims, ddof]) | Compute the variance along the specified axis. |
| visualize([filename, optimize_graph]) | |
| vnorm([ord, axis, keepdims]) | Vector norm |
- all(axis=None, keepdims=False)
Test whether all array elements along a given axis evaluate to True.
Parameters: a : array_like
Input array or object that can be converted to an array.
axis : None or int or tuple of ints, optional
Axis or axes along which a logical AND reduction is performed. The default (axis = None) is to perform a logical AND over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
New in version 1.7.0.
If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
out : ndarray, optional
Alternate output array in which to place the result. It must have the same shape as the expected output and its type is preserved (e.g., if dtype(out) is float, the result will consist of 0.0’s and 1.0’s). See doc.ufuncs (Section “Output arguments”) for more details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: all : ndarray, bool
A new boolean or array is returned unless out is specified, in which case a reference to out is returned.
See also
- ndarray.all
- equivalent method
- any
- Test whether any element along a given axis evaluates to True.
Notes
Not a Number (NaN), positive infinity and negative infinity evaluate to True because these are not equal to zero.
Examples
>>> np.all([[True,False],[True,True]])
False
>>> np.all([[True,False],[True,True]], axis=0)
array([ True, False], dtype=bool)
>>> np.all([-1, 4, 5])
True
>>> np.all([1.0, np.nan])
True
>>> o=np.array([False])
>>> z=np.all([-1, 4, 5], out=o)
>>> id(z), id(o), z
(28293632, 28293632, array([ True], dtype=bool))
- any(axis=None, keepdims=False)
Test whether any array element along a given axis evaluates to True.
Returns single boolean unless axis is not None
Parameters: a : array_like
Input array or object that can be converted to an array.
axis : None or int or tuple of ints, optional
Axis or axes along which a logical OR reduction is performed. The default (axis = None) is to perform a logical OR over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
New in version 1.7.0.
If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
out : ndarray, optional
Alternate output array in which to place the result. It must have the same shape as the expected output and its type is preserved (e.g., if it is of type float, then it will remain so, returning 1.0 for True and 0.0 for False, regardless of the type of a). See doc.ufuncs (Section “Output arguments”) for details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: any : bool or ndarray
A new boolean or ndarray is returned unless out is specified, in which case a reference to out is returned.
See also
- ndarray.any
- equivalent method
- all
- Test whether all elements along a given axis evaluate to True.
Notes
Not a Number (NaN), positive infinity and negative infinity evaluate to True because these are not equal to zero.
Examples
>>> np.any([[True, False], [True, True]])
True
>>> np.any([[True, False], [False, False]], axis=0)
array([ True, False], dtype=bool)
>>> np.any([-1, 0, 5])
True
>>> np.any(np.nan)
True
>>> o=np.array([False])
>>> z=np.any([-1, 4, 5], out=o)
>>> z, o
(array([ True], dtype=bool), array([ True], dtype=bool))
>>> # Check now that z is a reference to o
>>> z is o
True
>>> id(z), id(o) # identity of z and o
(191614240, 191614240)
- argmax(axis=None)
Indices of the maximum values along an axis.
Parameters: a : array_like
Input array.
axis : int, optional
By default, the index is into the flattened array, otherwise along the specified axis.
Returns: index_array : ndarray of ints
Array of indices into the array. It has the same shape as a.shape with the dimension along axis removed.
See also
ndarray.argmax, argmin
- amax
- The maximum value along a given axis.
- unravel_index
- Convert a flat index into an index tuple.
Notes
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
Examples
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmax(a)
5
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
>>> b = np.arange(6)
>>> b[1] = 5
>>> b
array([0, 5, 2, 3, 4, 5])
>>> np.argmax(b) # Only the first occurrence is returned.
1
- argmin(axis=None)
Return the indices of the minimum values along an axis.
See also
- argmax
- Similar function. Please refer to numpy.argmax for detailed documentation.
- astype(dtype, **kwargs)
Copy of the array, cast to a specified type
- cache(store=None, **kwargs)
Evaluate and cache array
Parameters: store: MutableMapping or ndarray-like
Place to put computed and cached chunks
kwargs:
Keyword arguments to pass on to get function for scheduling
This triggers evaluation and stores the result in either
1. An ndarray object supporting setitem (see da.store)
2. A MutableMapping like a dict or chest
It then returns a new, semantically equivalent dask array that points to this store.
>>> import dask.array as da
>>> x = da.arange(5, chunks=2)
>>> y = 2*x + 1
>>> z = y.cache() # triggers computation
>>> y.compute() # Does entire computation
array([1, 3, 5, 7, 9])
>>> z.compute() # Just pulls from store
array([1, 3, 5, 7, 9])
You might base a cache off of an array like a numpy array or
h5py.Dataset.
>>> import numpy as np
>>> cache = np.empty(5, dtype=x.dtype)
>>> z = y.cache(store=cache)
>>> cache
array([1, 3, 5, 7, 9])
Or one might use a MutableMapping like a dict or chest
>>> cache = dict()
>>> z = y.cache(store=cache)
>>> cache # doctest: +SKIP
{('x', 0): array([1, 3]),
 ('x', 1): array([5, 7]),
 ('x', 2): array([9])}
- dot(a, b, out=None)
Dot product of two arrays.
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
Parameters: a : array_like
First argument.
b : array_like
Second argument.
out : ndarray, optional
Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
Returns: output : ndarray
Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned. If out is given, then it is returned.
Raises: ValueError
If the last dimension of a is not the same size as the second-to-last dimension of b.
See also
- vdot
- Complex-conjugating dot product.
- tensordot
- Sum products over arbitrary axes.
- einsum
- Einstein summation convention.
Examples
>>> np.dot(3, 4)
12
Neither argument is complex-conjugated:
>>> np.dot([2j, 3j], [2j, 3j])
(-13+0j)
For 2-D arrays it’s the matrix product:
>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> np.dot(a, b)
array([[4, 1],
       [2, 2]])
>>> a = np.arange(3*4*5*6).reshape((3,4,5,6))
>>> b = np.arange(3*4*5*6)[::-1].reshape((5,4,6,3))
>>> np.dot(a, b)[2,3,2,1,2,2]
499128
>>> sum(a[2,3,2,:] * b[1,2,:,2])
499128
- map_blocks(func, chunks=None, dtype=None, name=None)
Map a function across all blocks of a dask array
If the function changes the block structure, specify the chunks of the resulting array. If you do not, the resulting array is assumed to have the same block structure as the input.
>>> import dask.array as da
>>> x = da.arange(6, chunks=3)
>>> x.map_blocks(lambda x: x * 2).compute()
array([ 0, 2, 4, 6, 8, 10])
The da.map_blocks function can also accept multiple arrays:
>>> d = da.arange(5, chunks=2)
>>> e = da.arange(5, chunks=2)
>>> f = da.map_blocks(lambda a, b: a + b**2, d, e)
>>> f.compute()
array([ 0, 2, 6, 12, 20])
If the function changes the shape of the blocks, provide chunks explicitly.
>>> y = x.map_blocks(lambda x: x[::2], chunks=((2, 2),))
Your block function can learn where in the array it is if it supports a block_id keyword argument. This will receive entries like (2, 0, 1), the position of the block in the dask array.
>>> def func(block, block_id=None):
...     pass
You may specify the name of the resulting task in the graph with the optional name keyword argument.
>>> y = x.map_blocks(lambda x: x + 1, name='increment')
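The block_id keyword described above can be sketched as follows. The function name add_block_offset and the factor of 100 are hypothetical choices made only so that each block's position is visible in the output.

```python
import numpy as np
import dask.array as da

def add_block_offset(block, block_id=None):
    # block_id is a tuple such as (0,) or (2,): the position of this block
    # along each dimension of the dask array.
    return block + 100 * block_id[0]

# Six zeros split into three blocks of two elements each.
x = da.zeros(6, chunks=2, dtype="i8")
result = x.map_blocks(add_block_offset, dtype=x.dtype).compute()

print(result)
```

Each pair of elements ends up carrying the index of the block it came from, scaled by 100.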
- map_overlap(func, depth, boundary=None, trim=True, **kwargs)
Map a function over blocks of the array with some overlap
We share neighboring zones between blocks of the array, then map a function, then trim away the neighboring strips.
Parameters: func: function
The function to apply to each extended block
depth: int, tuple, or dict
The number of cells that each block should share with its neighbors. If a tuple or dict, this can be different per axis.
boundary: str, tuple, dict
How to handle the boundaries. Values include 'reflect', 'periodic', 'nearest', 'none', or any constant value like 0 or np.nan.
trim: bool
Whether or not to trim the excess after the map function. Set this to False if your mapping function does this for you.
**kwargs:
Other keyword arguments valid in map_blocks
Examples
>>> import numpy as np
>>> import dask.array as da
>>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
>>> x = da.from_array(x, chunks=5)
>>> def derivative(x):
...     return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0)
>>> y.compute()
array([ 1, 0, 1, 1, 0, 0, -1, -1, 0])
>>> x = np.arange(16).reshape((4, 4))
>>> d = da.from_array(x, chunks=(2, 2))
>>> d.map_overlap(lambda x: x + x.size, depth=1).compute()
array([[16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
>>> func = lambda x: x + x.size
>>> depth = {0: 1, 1: 1}
>>> boundary = {0: 'reflect', 1: 'none'}
>>> d.map_overlap(func, depth, boundary).compute()
array([[ 12., 13., 14., 15.],
       [ 16., 17., 18., 19.],
       [ 20., 21., 22., 23.],
       [ 24., 25., 26., 27.]])
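To make the share, map, trim sequence more concrete, here is a minimal sketch of a three-point moving sum. The helper name three_point_sum and the input data are assumptions made for illustration; the comparison against np.convolve is only a sanity check on the in-memory data.

```python
import numpy as np
import dask.array as da

def three_point_sum(block):
    # Sum each cell with its left and right neighbors. np.roll wraps around,
    # but the wrap-around only corrupts the outermost cells of the extended
    # block, and those are the shared cells that get trimmed away.
    return block + np.roll(block, 1) + np.roll(block, -1)

data = np.arange(6)
x = da.from_array(data, chunks=3)

# depth=1 shares one cell with each neighbor; boundary=0 pads the array ends with zeros.
result = x.map_overlap(three_point_sum, depth=1, boundary=0).compute()

# Reference computed directly on the in-memory data.
expected = np.convolve(data, np.ones(3, dtype=data.dtype), mode="same")

print(result)
print(expected)
```

Without the overlap, the cells at the inner block edge (positions 2 and 3 here) would miss the contribution from the neighboring block.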
- max(axis=None, keepdims=False)
Return the maximum of an array or maximum along an axis.
Parameters: a : array_like
Input data.
axis : int, optional
Axis along which to operate. By default, flattened input is used.
out : ndarray, optional
Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. See doc.ufuncs (Section “Output arguments”) for more details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: amax : ndarray or scalar
Maximum of a. If axis is None, the result is a scalar value. If axis is given, the result is an array of dimension a.ndim - 1.
See also
- amin
- The minimum value of an array along a given axis, propagating any NaNs.
- nanmax
- The maximum value of an array along a given axis, ignoring any NaNs.
- maximum
- Element-wise maximum of two arrays, propagating any NaNs.
- fmax
- Element-wise maximum of two arrays, ignoring any NaNs.
- argmax
- Return the indices of the maximum values.
nanmin, minimum, fmin
Notes
NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax.
Don’t use amax for element-wise comparison of 2 arrays; when a.shape[0] is 2, maximum(a[0], a[1]) is faster than amax(a, axis=0).
Examples
>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1],
       [2, 3]])
>>> np.amax(a) # Maximum of the flattened array
3
>>> np.amax(a, axis=0) # Maxima along the first axis
array([2, 3])
>>> np.amax(a, axis=1) # Maxima along the second axis
array([1, 3])
>>> b = np.arange(5, dtype=float)
>>> b[2] = np.nan
>>> np.amax(b)
nan
>>> np.nanmax(b)
4.0
- mean(axis=None, dtype=None, keepdims=False)
Compute the arithmetic mean along the specified axis.
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
Parameters: a : array_like
Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
axis : int, optional
Axis along which the means are computed. The default is to compute the mean of the flattened array.
dtype : data-type, optional
Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.
out : ndarray, optional
Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: m : ndarray, see dtype parameter above
If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.
Notes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([ 2., 3.])
>>> np.mean(a, axis=1)
array([ 1.5, 3.5])
In single precision, mean can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64)
0.55000000074505806
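The examples above come from the NumPy docstring and call np.mean on in-memory arrays. A rough dask equivalent, assuming the method signature mean(axis=None, dtype=None, keepdims=False) shown for this class, might look like the sketch below; the chunk layout is an arbitrary choice.

```python
import numpy as np
import dask.array as da

a = np.array([[1, 2], [3, 4]])
d = da.from_array(a, chunks=(1, 2))  # one row per block (arbitrary choice)

# The method is called on the dask array itself, so there is no `a` argument.
total_mean = d.mean().compute()
column_means = d.mean(axis=0).compute()

# keepdims=True keeps the reduced axis with size one; the shape is known
# before anything is computed.
row_means = d.mean(axis=1, keepdims=True)

print(total_mean)       # 2.5
print(column_means)     # [2. 3.]
print(row_means.shape)  # (2, 1)
```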
- min(axis=None, keepdims=False)
Return the minimum of an array or minimum along an axis.
Parameters: a : array_like
Input data.
axis : int, optional
Axis along which to operate. By default, flattened input is used.
out : ndarray, optional
Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. See doc.ufuncs (Section “Output arguments”) for more details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: amin : ndarray or scalar
Minimum of a. If axis is None, the result is a scalar value. If axis is given, the result is an array of dimension a.ndim - 1.
See also
- amax
- The maximum value of an array along a given axis, propagating any NaNs.
- nanmin
- The minimum value of an array along a given axis, ignoring any NaNs.
- minimum
- Element-wise minimum of two arrays, propagating any NaNs.
- fmin
- Element-wise minimum of two arrays, ignoring any NaNs.
- argmin
- Return the indices of the minimum values.
nanmax, maximum, fmax
Notes
NaN values are propagated, that is if at least one item is NaN, the corresponding min value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmin.
Don’t use amin for element-wise comparison of 2 arrays; when a.shape[0] is 2, minimum(a[0], a[1]) is faster than amin(a, axis=0).
Examples
>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1],
       [2, 3]])
>>> np.amin(a) # Minimum of the flattened array
0
>>> np.amin(a, axis=0) # Minima along the first axis
array([0, 1])
>>> np.amin(a, axis=1) # Minima along the second axis
array([0, 2])
>>> b = np.arange(5, dtype=float)
>>> b[2] = np.nan
>>> np.amin(b)
nan
>>> np.nanmin(b)
0.0
- moment(order, axis=None, dtype=None, keepdims=False, ddof=0)
Calculate the nth centralized moment.
Parameters: order : int
Order of the moment that is returned, must be >= 2.
axis : int, optional
Axis along which the central moment is computed. The default is to compute the moment of the flattened array.
dtype : data-type, optional
Type to use in computing the moment. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
ddof : int, optional
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
Returns: moment : ndarray
References
[R1] Pebay, Philippe (2008), “Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments” (PDF), Technical Report SAND2008-6212, Sandia National Laboratories
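As a small sanity check on what a central moment is, the sketch below computes the second and third moments of a short symmetric sequence. The input values are arbitrary; the second central moment with ddof=0 should agree with the variance, and the third should vanish for symmetric data.

```python
import numpy as np
import dask.array as da

values = np.array([1.0, 2.0, 3.0, 4.0])
x = da.from_array(values, chunks=2)

second = x.moment(2).compute()  # second central moment
third = x.moment(3).compute()   # third central moment

print(second, x.var().compute())
print(third)
```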
- nbytes
Number of bytes in array
- prod(axis=None, dtype=None, keepdims=False)
Return the product of array elements over a given axis.
Parameters: a : array_like
Input data.
axis : None or int or tuple of ints, optional
Axis or axes along which a product is performed. The default (axis = None) is to perform a product over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
New in version 1.7.0.
If this is a tuple of ints, a product is performed on multiple axes, instead of a single axis or all the axes as before.
dtype : data-type, optional
The data-type of the returned array, as well as of the accumulator in which the elements are multiplied. By default, if a is of integer type, dtype is the default platform integer. (Note: if the type of a is unsigned, then so is dtype.) Otherwise, the dtype is the same as that of a.
out : ndarray, optional
Alternative output array in which to place the result. It must have the same shape as the expected output, but the type of the output values will be cast if necessary.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: product_along_axis : ndarray, see dtype parameter above.
An array shaped as a but with the specified axis removed. Returns a reference to out if specified.
See also
- ndarray.prod
- equivalent method
- numpy.doc.ufuncs
- Section “Output arguments”
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow. That means that, on a 32-bit platform:
>>> x = np.array([536870910, 536870910, 536870910, 536870910])
>>> np.prod(x) #random
16
Examples
By default, calculate the product of all elements:
>>> np.prod([1.,2.])
2.0
Even when the input array is two-dimensional:
>>> np.prod([[1.,2.],[3.,4.]])
24.0
But we can also specify the axis over which to multiply:
>>> np.prod([[1.,2.],[3.,4.]], axis=1)
array([ 2., 12.])
If the type of x is unsigned, then the output type is the unsigned platform integer:
>>> x = np.array([1, 2, 3], dtype=np.uint8)
>>> np.prod(x).dtype == np.uint
True
If x is of a signed integer type, then the output type is the default platform integer:
>>> x = np.array([1, 2, 3], dtype=np.int8)
>>> np.prod(x).dtype == int
True
- size
Number of elements in array
- squeeze()
Remove single-dimensional entries from the shape of an array.
Parameters: a : array_like
Input data.
axis : None or int or tuple of ints, optional
New in version 1.7.0.
Selects a subset of the single-dimensional entries in the shape. If an axis is selected with shape entry greater than one, an error is raised.
Returns: squeezed : ndarray
The input array, but with all or a subset of the dimensions of length 1 removed. This is always a itself or a view into a.
Examples
>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)
- std(axis=None, dtype=None, keepdims=False, ddof=0)
Compute the standard deviation along the specified axis.
Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
Parameters: a : array_like
Calculate the standard deviation of these values.
axis : int, optional
Axis along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
dtype : dtype, optional
Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type.
out : ndarray, optional
Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.
ddof : int, optional
Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: standard_deviation : ndarray, see dtype parameter above.
If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array.
Notes
The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).
The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
For floating-point input, the std is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.std(a)
1.1180339887498949
>>> np.std(a, axis=0)
array([ 1., 1.])
>>> np.std(a, axis=1)
array([ 0.5, 0.5])
In single precision, std() can be inaccurate:
>>> a = np.zeros((2,512*512), dtype=np.float32)
>>> a[0,:] = 1.0
>>> a[1,:] = 0.1
>>> np.std(a)
0.45172946707416706
Computing the standard deviation in float64 is more accurate:
>>> np.std(a, dtype=np.float64)
0.44999999925552653
- store(target, **kwargs)
Store dask arrays in array-like objects, overwrite data in target
This stores dask arrays into objects that support numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance, align the block size of the storage target with the block size of your array.
If your data fits in memory then you may prefer calling np.array(myarray) instead.
Parameters: sources: Array or iterable of Arrays
targets: array-like or iterable of array-likes
These should support setitem syntax target[10:20] = ...
Examples
>>> x = ...
>>> import h5py
>>> f = h5py.File('myfile.hdf5')
>>> dset = f.create_dataset('/data', shape=x.shape,
...                         chunks=x.chunks,
...                         dtype='f8')
>>> store(x, dset)
Alternatively store many arrays at the same time
>>> store([x, y, z], [dset1, dset2, dset3])
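For readers without an HDF5 file at hand, the same pattern can be tried with a plain NumPy array as the target, since it supports the setitem syntax described above. This is only a sketch: the array sizes are arbitrary, and the NumPy target stands in for an on-disk dataset.

```python
import numpy as np
import dask.array as da

x = da.arange(10, chunks=3)
y = 2 * x  # lazy; nothing is computed yet

# Any object with numpy-style setitem works as a target; a plain
# numpy array stands in here for an on-disk dataset.
target = np.zeros(y.shape, dtype=y.dtype)

# Computes y chunk by chunk and overwrites the contents of target.
da.store(y, target)

print(target)
```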
- sum(axis=None, dtype=None, keepdims=False)
Sum of array elements over a given axis.
Parameters: a : array_like
Elements to sum.
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default (axis = None) is to perform a sum over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
New in version 1.7.0.
If this is a tuple of ints, a sum is performed on multiple axes, instead of a single axis or all the axes as before.
dtype : dtype, optional
The type of the returned array and of the accumulator in which the elements are summed. By default, the dtype of a is used. An exception is when a has an integer type with less precision than the default platform integer. In that case, the default platform integer is used instead.
out : ndarray, optional
Array into which the output is placed. By default, a new array is created. If out is given, it must be of the appropriate shape (the shape of a with axis removed, i.e., numpy.delete(a.shape, axis)). Its type is preserved. See doc.ufuncs (Section “Output arguments”) for more details.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: sum_along_axis : ndarray
An array with the same shape as a, with the specified axis removed. If a is a 0-d array, or if axis is None, a scalar is returned. If an output array is specified, a reference to out is returned.
See also
- ndarray.sum
- Equivalent method.
- cumsum
- Cumulative sum of array elements.
- trapz
- Integration of array values using the composite trapezoidal rule.
mean, average
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
Examples
>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
1
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
If the accumulator is too small, overflow occurs:
>>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
-128
- to_hdf5(filename, datapath, **kwargs)
Store array in HDF5 file
>>> x.to_hdf5('myfile.hdf5', '/x')
Optionally provide arguments as though to h5py.File.create_dataset
>>> x.to_hdf5('myfile.hdf5', '/x', compression='lzf', shuffle=True)
See also
da.store, h5py.File.create_dataset
- topk(k)
The top k elements of an array
Returns the k greatest elements of the array in sorted order. Only works on arrays of a single dimension.
>>> import numpy as np
>>> import dask.array as da
>>> x = np.array([5, 1, 3, 6])
>>> d = da.from_array(x, chunks=2)
>>> d.topk(2).compute()
array([6, 5])
Runs in near-linear time. All results are returned in a single chunk, so all k elements must fit in memory.
- transpose(axes=None)
Permute the dimensions of an array.
Parameters: a : array_like
Input array.
axes : list of ints, optional
By default, reverse the dimensions, otherwise permute the axes according to the values given.
Returns: p : ndarray
a with its axes permuted. A view is returned whenever possible.
See also
rollaxis
Examples
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
       [2, 3]])
>>> np.transpose(x)
array([[0, 2],
       [1, 3]])
>>> x = np.ones((1, 2, 3))
>>> np.transpose(x, (1, 0, 2)).shape
(2, 1, 3)
- var(axis=None, dtype=None, keepdims=False, ddof=0)
Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
Parameters: a : array_like
Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
axis : int, optional
Axis along which the variance is computed. The default is to compute the variance of the flattened array.
dtype : data-type, optional
Type to use in computing the variance. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
out : ndarray, optional
Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
ddof : int, optional
“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
Returns: variance : ndarray, see dtype parameter above
If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.
Notes
The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).
The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.
For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1,2],[3,4]])
>>> np.var(a)
1.25
>>> np.var(a, axis=0)
array([ 1., 1.])
>>> np.var(a, axis=1)
array([ 0.25, 0.25])
In single precision, var() can be inaccurate:
>>> a = np.zeros((2,512*512), dtype=np.float32)
>>> a[0,:] = 1.0
>>> a[1,:] = 0.1
>>> np.var(a)
0.20405951142311096
Computing the variance in float64 is more accurate:
>>> np.var(a, dtype=np.float64)
0.20249999932997387
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.20250000000000001
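The effect of ddof on the divisor can be spelled out by hand. This is a minimal sketch, assuming real-valued input; `variance_sketch` is a hypothetical helper written for illustration and is not part of NumPy or dask.

```python
import numpy as np

def variance_sketch(x, ddof=0):
    """Mean of squared deviations, using the divisor N - ddof."""
    x = np.asarray(x, dtype=np.float64)  # accumulate in double precision
    n = x.size
    squared_deviations = np.abs(x - x.mean()) ** 2
    return squared_deviations.sum() / (n - ddof)

a = np.array([[1, 2], [3, 4]])
population = variance_sketch(a)       # divisor N
sample = variance_sketch(a, ddof=1)   # divisor N - 1
```

Both values should agree with `np.var(a)` and `np.var(a, ddof=1)` respectively.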
- vnorm(ord=None, axis=None, keepdims=False)¶
Vector norm
Other functions¶
- dask.array.core.from_array(x, chunks, name=None, lock=False)¶
Create dask array from something that looks like an array
Input must have a .shape and support numpy-style slicing.
The chunks argument must be one of the following forms:
- a blocksize like 1000
- a blockshape like (1000, 1000)
- explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
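To make the relationship between the three forms concrete, here is a minimal sketch that expands the first two forms into the explicit third form for a given shape. `explicit_chunks` is a hypothetical helper written for illustration; it is not dask's own normalization code and may differ from it in edge cases.

```python
def explicit_chunks(chunks, shape):
    """Expand a blocksize or blockshape into explicit per-dimension sizes."""
    if isinstance(chunks, int):
        # A single blocksize applies to every dimension.
        chunks = (chunks,) * len(shape)
    if all(isinstance(c, int) for c in chunks):
        # A blockshape: split each dimension into full blocks plus a remainder.
        result = []
        for dim, block in zip(shape, chunks):
            full, rest = divmod(dim, block)
            result.append((block,) * full + ((rest,) if rest else ()))
        return tuple(result)
    # Already explicit: only normalize to a tuple of tuples.
    return tuple(tuple(c) for c in chunks)

from_blocksize = explicit_chunks(1000, (2500, 2500))
from_blockshape = explicit_chunks((1000, 400), (2500, 800))
already_explicit = explicit_chunks([[1000, 1000, 500], [400, 400]], (2500, 800))
```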
Examples
>>> x = h5py.File('...')['/data/path']
>>> a = da.from_array(x, chunks=(1000, 1000))
If your underlying datastore does not support concurrent reads then include the lock=True keyword argument or lock=mylock if you want multiple arrays to coordinate around the same lock.
>>> a = da.from_array(x, chunks=(1000, 1000), lock=True)
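The idea behind a shared lock can be sketched with the standard library alone. `LockedSource` below is a hypothetical wrapper written for illustration, not dask's implementation: every read goes through the lock, and two sources that receive the same lock object never read at the same time.

```python
import threading

class LockedSource:
    """Hypothetical wrapper that serializes reads through a lock."""

    def __init__(self, data, lock=None):
        self.data = data
        # Without an explicit lock, each source gets a private one.
        self.lock = lock if lock is not None else threading.Lock()

    def __getitem__(self, index):
        with self.lock:  # only one reader holds the lock at a time
            return self.data[index]

shared = threading.Lock()
first = LockedSource(list(range(10)), lock=shared)
second = LockedSource(list(range(10, 20)), lock=shared)
private = LockedSource(list(range(5)))
```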
- dask.array.core.store(sources, targets, **kwargs)¶
Store dask arrays in array-like objects, overwrite data in target
This stores dask arrays into objects that support numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance, align the block size of the storage target with the block size of your array.
If your data fits in memory then you may prefer calling np.array(myarray) instead.
Parameters: sources: Array or iterable of Arrays
targets: array-like or iterable of array-likes
These should support setitem syntax target[10:20] = ...
Examples
>>> x = ...
>>> import h5py
>>> f = h5py.File('myfile.hdf5', 'a')  # open for writing; recent h5py versions default to read-only
>>> dset = f.create_dataset('/data', shape=x.shape,
...                         chunks=x.chunks,
...                         dtype='f8')
>>> store(x, dset)
Alternatively store many arrays at the same time
>>> store([x, y, z], [dset1, dset2, dset3])
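The chunk-by-chunk behavior can be illustrated without dask. This is a minimal sketch that uses plain NumPy arrays as both source and target; `block_slices` and `store_blockwise` are hypothetical helpers written for illustration, and the sketch writes blocks sequentially, which is an assumption rather than a description of how dask schedules the writes.

```python
import itertools
import numpy as np

def block_slices(chunks):
    """Yield one tuple of slices per block, given explicit chunk sizes."""
    per_axis = []
    for sizes in chunks:
        start = 0
        slices = []
        for size in sizes:
            slices.append(slice(start, start + size))
            start += size
        per_axis.append(slices)
    return itertools.product(*per_axis)

def store_blockwise(source, target, chunks):
    """Copy source into target one block at a time via setitem."""
    for index in block_slices(chunks):
        target[index] = source[index]

source = np.arange(20).reshape(4, 5)
target = np.zeros((4, 5), dtype=source.dtype)
store_blockwise(source, target, chunks=((2, 2), (3, 2)))
```

Only one block is held at a time, which is why the target merely needs to support assignments such as `target[10:20] = ...`.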
- dask.array.core.topk(k, x)¶
The top k elements of an array
Returns the k greatest elements of the array in sorted order. Only works on arrays of a single dimension.
>>> x = np.array([5, 1, 3, 6])
>>> d = from_array(x, chunks=2)
>>> d.topk(2).compute()
array([6, 5])
Runs in near linear time, returns all results in a single chunk so all k elements must fit in memory.
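One way a top-k result can be computed block by block is to keep the k largest values of each block and then select the k largest among those candidates. The sketch below uses only the standard library; `topk_blockwise` is a hypothetical helper, and whether dask uses this exact strategy is an assumption.

```python
import heapq

def topk_blockwise(k, values, chunksize):
    """Return the k greatest values in descending order, block by block."""
    candidates = []
    for start in range(0, len(values), chunksize):
        block = values[start:start + chunksize]
        # Each block contributes at most k candidates.
        candidates.extend(heapq.nlargest(k, block))
    # The final selection runs over the candidates only.
    return heapq.nlargest(k, candidates)

result = topk_blockwise(2, [5, 1, 3, 6], chunksize=2)
```

The final step holds all candidates at once, which mirrors the note that the k results must fit in memory.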
- dask.array.core.coarsen(reduction, x, axes, trim_excess=False)¶
Coarsen array by applying reduction to fixed size neighborhoods
Parameters: reduction: function
Function like np.sum, np.mean, etc...
x: np.ndarray
Array to be coarsened
axes: dict
Mapping of axis to coarsening factor
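The axes mapping can be illustrated with a NumPy reshape. This is a minimal sketch, assuming every coarsened dimension is evenly divisible by its factor (so trim_excess is not handled); `coarsen_sketch` is a hypothetical helper and not dask's implementation.

```python
import numpy as np

def coarsen_sketch(reduction, x, axes):
    """Apply reduction over fixed-size neighborhoods given {axis: factor}."""
    new_shape = []
    for axis, length in enumerate(x.shape):
        factor = axes.get(axis, 1)
        if length % factor:
            raise ValueError("dimension not divisible by coarsening factor")
        # Split each axis into (number of neighborhoods, neighborhood size).
        new_shape.extend([length // factor, factor])
    # The neighborhood-size axes sit at the odd positions after the reshape.
    reduce_axes = tuple(range(1, 2 * x.ndim, 2))
    return reduction(x.reshape(new_shape), axis=reduce_axes)

x = np.arange(16).reshape(4, 4)
summed = coarsen_sketch(np.sum, x, {0: 2, 1: 2})
rows_only = coarsen_sketch(np.mean, x, {0: 2})
```

Here `summed` holds the sum of each 2 x 2 neighborhood, and `rows_only` averages pairs of rows while leaving the columns untouched.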
- dask.array.core.stack(seq, axis=0)¶
Stack arrays along a new axis
Given a sequence of dask Arrays, form a new dask Array by stacking them along a new dimension (axis=0 by default).
Examples
Create slices
>>> import dask.array as da
>>> import numpy as np
>>> data = [da.from_array(np.ones((4, 4)), chunks=(2, 2))
...         for i in range(3)]
>>> x = da.stack(data, axis=0)
>>> x.shape
(3, 4, 4)
>>> da.stack(data, axis=1).shape
(4, 3, 4)
>>> da.stack(data, axis=-1).shape
(4, 4, 3)
Result is a new dask Array
- dask.array.core.concatenate(seq, axis=0)¶
Concatenate arrays along an existing axis
Given a sequence of dask Arrays, form a new dask Array by joining them along an existing dimension (axis=0 by default).
Examples
Create slices
>>> import dask.array as da
>>> import numpy as np
>>> data = [da.from_array(np.ones((4, 4)), chunks=(2, 2))
...         for i in range(3)]
>>> x = da.concatenate(data, axis=0)
>>> x.shape
(12, 4)
>>> da.concatenate(data, axis=1).shape
(4, 12)
Result is a new dask Array
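The difference between stacking along a new axis and concatenating along an existing one shows up in the result's number of dimensions. The sketch below uses the NumPy counterparts `np.stack` and `np.concatenate` rather than the dask functions, on the assumption that the shape rules are the same.

```python
import numpy as np

blocks = [np.ones((4, 4)) for _ in range(3)]

# Stacking adds a dimension whose length is the number of inputs.
stacked = np.stack(blocks, axis=0)

# Concatenating keeps the number of dimensions and grows one of them.
joined = np.concatenate(blocks, axis=0)
```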