API

Create and Store Arrays

from_array(x, chunks[, name, lock]) Create dask array from something that looks like an array
store(sources, targets, **kwargs) Store dask arrays in array-like objects, overwrite data in target
Array.to_hdf5(filename, datapath, **kwargs) Store array in HDF5 file

Specialized Functions for Dask.Array

Array.map_blocks(func[, chunks, dtype, name]) Map a function across all blocks of a dask array
Array.map_overlap(func, depth[, boundary, trim]) Map a function over blocks of the array with some overlap
topk(k, x) The top k elements of an array
coarsen(reduction, x, axes[, trim_excess]) Coarsen array by applying reduction to fixed size neighborhoods
stack(seq[, axis]) Stack arrays along a new axis
concatenate(seq[, axis]) Concatenate arrays along an existing axis

Array Methods

class dask.array.core.Array(dask, name, chunks, dtype=None, shape=None)

Parallel Array

Parameters:

dask : dict

Task dependency graph

name : string

Name of array in dask

shape : tuple of ints

Shape of the entire array

chunks: iterable of tuples

block sizes along each dimension

Attributes

T
chunks
dtype
imag
nbytes Number of bytes in array
ndim
numblocks
real
shape
size Number of elements in array
vindex

Methods

all([axis, keepdims]) Test whether all array elements along a given axis evaluate to True.
any([axis, keepdims]) Test whether any array element along a given axis evaluates to True.
argmax([axis]) Indices of the maximum values along an axis.
argmin([axis]) Return the indices of the minimum values along an axis.
astype(dtype, **kwargs) Copy of the array, cast to a specified type
cache([store]) Evaluate and cache array
compute(**kwargs)
conj()
dot(a, b[, out]) Dot product of two arrays.
map_blocks(func[, chunks, dtype, name]) Map a function across all blocks of a dask array
map_overlap(func, depth[, boundary, trim]) Map a function over blocks of the array with some overlap
max([axis, keepdims]) Return the maximum of an array or maximum along an axis.
mean([axis, dtype, keepdims]) Compute the arithmetic mean along the specified axis.
min([axis, keepdims]) Return the minimum of an array or minimum along an axis.
moment(order[, axis, dtype, keepdims, ddof]) Calculate the nth centralized moment.
prod([axis, dtype, keepdims]) Return the product of array elements over a given axis.
rechunk(chunks)
squeeze() Remove single-dimensional entries from the shape of an array.
std([axis, dtype, keepdims, ddof]) Compute the standard deviation along the specified axis.
store(target, **kwargs) Store dask arrays in array-like objects, overwrite data in target
sum([axis, dtype, keepdims]) Sum of array elements over a given axis.
to_hdf5(filename, datapath, **kwargs) Store array in HDF5 file
topk(k) The top k elements of an array
transpose([axes]) Permute the dimensions of an array.
var([axis, dtype, keepdims, ddof]) Compute the variance along the specified axis.
visualize([filename, optimize_graph])
vnorm([ord, axis, keepdims]) Vector norm
all(axis=None, keepdims=False)

Test whether all array elements along a given axis evaluate to True.

Parameters:

a : array_like

Input array or object that can be converted to an array.

axis : None or int or tuple of ints, optional

Axis or axes along which a logical AND reduction is performed. The default (axis = None) is perform a logical OR over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.

New in version 1.7.0.

If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.

out : ndarray, optional

Alternate output array in which to place the result. It must have the same shape as the expected output and its type is preserved (e.g., if dtype(out) is float, the result will consist of 0.0’s and 1.0’s). See doc.ufuncs (Section “Output arguments”) for more details.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

all : ndarray, bool

A new boolean or array is returned unless out is specified, in which case a reference to out is returned.

See also

ndarray.all
equivalent method
any
Test whether any element along a given axis evaluates to True.

Notes

Not a Number (NaN), positive infinity and negative infinity evaluate to True because these are not equal to zero.

Examples

>>> np.all([[True,False],[True,True]])
False
>>> np.all([[True,False],[True,True]], axis=0)
array([ True, False], dtype=bool)
>>> np.all([-1, 4, 5])
True
>>> np.all([1.0, np.nan])
True
>>> o=np.array([False])
>>> z=np.all([-1, 4, 5], out=o)
>>> id(z), id(o), z                             
(28293632, 28293632, array([ True], dtype=bool))
any(axis=None, keepdims=False)

Test whether any array element along a given axis evaluates to True.

Returns single boolean unless axis is not None

Parameters:

a : array_like

Input array or object that can be converted to an array.

axis : None or int or tuple of ints, optional

Axis or axes along which a logical OR reduction is performed. The default (axis = None) is perform a logical OR over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.

New in version 1.7.0.

If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.

out : ndarray, optional

Alternate output array in which to place the result. It must have the same shape as the expected output and its type is preserved (e.g., if it is of type float, then it will remain so, returning 1.0 for True and 0.0 for False, regardless of the type of a). See doc.ufuncs (Section “Output arguments”) for details.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

any : bool or ndarray

A new boolean or ndarray is returned unless out is specified, in which case a reference to out is returned.

See also

ndarray.any
equivalent method
all
Test whether all elements along a given axis evaluate to True.

Notes

Not a Number (NaN), positive infinity and negative infinity evaluate to True because these are not equal to zero.

Examples

>>> np.any([[True, False], [True, True]])
True
>>> np.any([[True, False], [False, False]], axis=0)
array([ True, False], dtype=bool)
>>> np.any([-1, 0, 5])
True
>>> np.any(np.nan)
True
>>> o=np.array([False])
>>> z=np.any([-1, 4, 5], out=o)
>>> z, o
(array([ True], dtype=bool), array([ True], dtype=bool))
>>> # Check now that z is a reference to o
>>> z is o
True
>>> id(z), id(o) # identity of z and o              
(191614240, 191614240)
argmax(axis=None)

Indices of the maximum values along an axis.

Parameters:

a : array_like

Input array.

axis : int, optional

By default, the index is into the flattened array, otherwise along the specified axis.

Returns:

index_array : ndarray of ints

Array of indices into the array. It has the same shape as a.shape with the dimension along axis removed.

See also

ndarray.argmax, argmin

amax
The maximum value along a given axis.
unravel_index
Convert a flat index into an index tuple.

Notes

In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.

Examples

>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmax(a)
5
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
>>> b = np.arange(6)
>>> b[1] = 5
>>> b
array([0, 5, 2, 3, 4, 5])
>>> np.argmax(b) # Only the first occurrence is returned.
1
argmin(axis=None)

Return the indices of the minimum values along an axis.

See also

argmax
Similar function. Please refer to numpy.argmax for detailed documentation.
astype(dtype, **kwargs)

Copy of the array, cast to a specified type

cache(store=None, **kwargs)

Evaluate and cache array

Parameters:

store: MutableMapping or ndarray-like

Place to put computed and cached chunks

kwargs:

Keyword arguments to pass on to get function for scheduling

This triggers evaluation and store the result in either

1. An ndarray object supporting setitem (see da.store)

2. A MutableMapping like a dict or chest

It then returns a new dask array that points to this store.

This returns a semantically equivalent dask array.

>>> import dask.array as da

>>> x = da.arange(5, chunks=2)

>>> y = 2*x + 1

>>> z = y.cache() # triggers computation

>>> y.compute() # Does entire computation

array([1, 3, 5, 7, 9])

>>> z.compute() # Just pulls from store

array([1, 3, 5, 7, 9])

You might base a cache off of an array like a numpy array or

h5py.Dataset.

>>> cache = np.empty(5, dtype=x.dtype)

>>> z = y.cache(store=cache)

>>> cache

array([1, 3, 5, 7, 9])

Or one might use a MutableMapping like a dict or chest

>>> cache = dict()

>>> z = y.cache(store=cache)

>>> cache # doctest: +SKIP

{(‘x’, 0): array([1, 3]),

(‘x’, 1): array([5, 7]), (‘x’, 2): array([9])}

dot(a, b, out=None)

Dot product of two arrays.

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
Parameters:

a : array_like

First argument.

b : array_like

Second argument.

out : ndarray, optional

Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.

Returns:

output : ndarray

Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned. If out is given, then it is returned.

Raises:

ValueError

If the last dimension of a is not the same size as the second-to-last dimension of b.

See also

vdot
Complex-conjugating dot product.
tensordot
Sum products over arbitrary axes.
einsum
Einstein summation convention.

Examples

>>> np.dot(3, 4)
12

Neither argument is complex-conjugated:

>>> np.dot([2j, 3j], [2j, 3j])
(-13+0j)

For 2-D arrays it’s the matrix product:

>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> np.dot(a, b)
array([[4, 1],
       [2, 2]])
>>> a = np.arange(3*4*5*6).reshape((3,4,5,6))
>>> b = np.arange(3*4*5*6)[::-1].reshape((5,4,6,3))
>>> np.dot(a, b)[2,3,2,1,2,2]
499128
>>> sum(a[2,3,2,:] * b[1,2,:,2])
499128
map_blocks(func, chunks=None, dtype=None, name=None)

Map a function across all blocks of a dask array

You must also specify the chunks of the resulting array. If you don’t then we assume that the resulting array has the same block structure as the input.

>>> import dask.array as da
>>> x = da.arange(6, chunks=3)
>>> x.map_blocks(lambda x: x * 2).compute()
array([ 0,  2,  4,  6,  8, 10])

The da.map_blocks function can also accept multiple arrays

>>> d = da.arange(5, chunks=2)
>>> e = da.arange(5, chunks=2)
>>> f = map_blocks(lambda a, b: a + b**2, d, e)
>>> f.compute()
array([ 0,  2,  6, 12, 20])

If function changes shape of the blocks then please provide chunks explicitly.

>>> y = x.map_blocks(lambda x: x[::2], chunks=((2, 2),))

Your block function can learn where in the array it is if it supports a block_id keyword argument. This will receive entries like (2, 0, 1), the position of the block in the dask array.

>>> def func(block, block_id=None):
...     pass

You may specify the name of the resulting task in the graph with the optional name keyword argument.

>>> y = x.map_blocks(lambda x: x + 1, name='increment')
map_overlap(func, depth, boundary=None, trim=True, **kwargs)

Map a function over blocks of the array with some overlap

We share neighboring zones between blocks of the array, then map a function, then trim away the neighboring strips.

Parameters:

func: function

The function to apply to each extended block

depth: int, tuple, or dict

The number of cells that each block should share with its neighbors If a tuple or dict this can be different per axis

boundary: str, tuple, dict

how to handle the boundaries. Values include ‘reflect’, ‘periodic’, ‘nearest’, ‘none’, or any constant value like 0 or np.nan

trim: bool

Whether or not to trim the excess after the map function. Set this to false if your mapping function does this for you.

**kwargs:

Other keyword arguments valid in map_blocks

Examples

>>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
>>> x = from_array(x, chunks=5)
>>> def derivative(x):
...     return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0)
>>> y.compute()
array([ 1,  0,  1,  1,  0,  0, -1, -1,  0])
>>> import dask.array as da
>>> x = np.arange(16).reshape((4, 4))
>>> d = da.from_array(x, chunks=(2, 2))
>>> d.map_overlap(lambda x: x + x.size, depth=1).compute()
array([[16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
>>> func = lambda x: x + x.size
>>> depth = {0: 1, 1: 1}
>>> boundary = {0: 'reflect', 1: 'none'}
>>> d.map_overlap(func, depth, boundary).compute()  
array([[ 12.,  13.,  14.,  15.],
       [ 16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.],
       [ 24.,  25.,  26.,  27.]])
max(axis=None, keepdims=False)

Return the maximum of an array or maximum along an axis.

Parameters:

a : array_like

Input data.

axis : int, optional

Axis along which to operate. By default, flattened input is used.

out : ndarray, optional

Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. See doc.ufuncs (Section “Output arguments”) for more details.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

amax : ndarray or scalar

Maximum of a. If axis is None, the result is a scalar value. If axis is given, the result is an array of dimension a.ndim - 1.

See also

amin
The minimum value of an array along a given axis, propagating any NaNs.
nanmax
The maximum value of an array along a given axis, ignoring any NaNs.
maximum
Element-wise maximum of two arrays, propagating any NaNs.
fmax
Element-wise maximum of two arrays, ignoring any NaNs.
argmax
Return the indices of the maximum values.

nanmin, minimum, fmin

Notes

NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax.

Don’t use amax for element-wise comparison of 2 arrays; when a.shape[0] is 2, maximum(a[0], a[1]) is faster than amax(a, axis=0).

Examples

>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1],
       [2, 3]])
>>> np.amax(a)           # Maximum of the flattened array
3
>>> np.amax(a, axis=0)   # Maxima along the first axis
array([2, 3])
>>> np.amax(a, axis=1)   # Maxima along the second axis
array([1, 3])
>>> b = np.arange(5, dtype=np.float)
>>> b[2] = np.NaN
>>> np.amax(b)
nan
>>> np.nanmax(b)
4.0
mean(axis=None, dtype=None, keepdims=False)

Compute the arithmetic mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.

Parameters:

a : array_like

Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

axis : int, optional

Axis along which the means are computed. The default is to compute the mean of the flattened array.

dtype : data-type, optional

Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.

out : ndarray, optional

Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

m : ndarray, see dtype parameter above

If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.

See also

average
Weighted average

std, var, nanmean, nanstd, nanvar

Notes

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.

Examples

>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([ 2.,  3.])
>>> np.mean(a, axis=1)
array([ 1.5,  3.5])

In single precision, mean can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875

Computing the mean in float64 is more accurate:

>>> np.mean(a, dtype=np.float64)
0.55000000074505806
min(axis=None, keepdims=False)

Return the minimum of an array or minimum along an axis.

Parameters:

a : array_like

Input data.

axis : int, optional

Axis along which to operate. By default, flattened input is used.

out : ndarray, optional

Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. See doc.ufuncs (Section “Output arguments”) for more details.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

amin : ndarray or scalar

Minimum of a. If axis is None, the result is a scalar value. If axis is given, the result is an array of dimension a.ndim - 1.

See also

amax
The maximum value of an array along a given axis, propagating any NaNs.
nanmin
The minimum value of an array along a given axis, ignoring any NaNs.
minimum
Element-wise minimum of two arrays, propagating any NaNs.
fmin
Element-wise minimum of two arrays, ignoring any NaNs.
argmin
Return the indices of the minimum values.

nanmax, maximum, fmax

Notes

NaN values are propagated, that is if at least one item is NaN, the corresponding min value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmin.

Don’t use amin for element-wise comparison of 2 arrays; when a.shape[0] is 2, minimum(a[0], a[1]) is faster than amin(a, axis=0).

Examples

>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1],
       [2, 3]])
>>> np.amin(a)           # Minimum of the flattened array
0
>>> np.amin(a, axis=0)   # Minima along the first axis
array([0, 1])
>>> np.amin(a, axis=1)   # Minima along the second axis
array([0, 2])
>>> b = np.arange(5, dtype=np.float)
>>> b[2] = np.NaN
>>> np.amin(b)
nan
>>> np.nanmin(b)
0.0
moment(order, axis=None, dtype=None, keepdims=False, ddof=0)

Calculate the nth centralized moment.

Parameters:

order : int

Order of the moment that is returned, must be >= 2.

axis : int, optional

Axis along which the central moment is computed. The default is to compute the moment of the flattened array.

dtype : data-type, optional

Type to use in computing the moment. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.

ddof : int, optional

“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.

Returns:

moment : ndarray

References

[R1]Pebay, Philippe (2008), “Formulas for Robust, One-Pass Parallel

Computation of Covariances and Arbitrary-Order Statistical Moments” (PDF), Technical Report SAND2008-6212, Sandia National Laboratories

nbytes

Number of bytes in array

prod(axis=None, dtype=None, keepdims=False)

Return the product of array elements over a given axis.

Parameters:

a : array_like

Input data.

axis : None or int or tuple of ints, optional

Axis or axes along which a product is performed. The default (axis = None) is perform a product over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.

New in version 1.7.0.

If this is a tuple of ints, a product is performed on multiple axes, instead of a single axis or all the axes as before.

dtype : data-type, optional

The data-type of the returned array, as well as of the accumulator in which the elements are multiplied. By default, if a is of integer type, dtype is the default platform integer. (Note: if the type of a is unsigned, then so is dtype.) Otherwise, the dtype is the same as that of a.

out : ndarray, optional

Alternative output array in which to place the result. It must have the same shape as the expected output, but the type of the output values will be cast if necessary.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

product_along_axis : ndarray, see dtype parameter above.

An array shaped as a but with the specified axis removed. Returns a reference to out if specified.

See also

ndarray.prod
equivalent method
numpy.doc.ufuncs
Section “Output arguments”

Notes

Arithmetic is modular when using integer types, and no error is raised on overflow. That means that, on a 32-bit platform:

>>> x = np.array([536870910, 536870910, 536870910, 536870910])
>>> np.prod(x) #random
16

Examples

By default, calculate the product of all elements:

>>> np.prod([1.,2.])
2.0

Even when the input array is two-dimensional:

>>> np.prod([[1.,2.],[3.,4.]])
24.0

But we can also specify the axis over which to multiply:

>>> np.prod([[1.,2.],[3.,4.]], axis=1)
array([  2.,  12.])

If the type of x is unsigned, then the output type is the unsigned platform integer:

>>> x = np.array([1, 2, 3], dtype=np.uint8)
>>> np.prod(x).dtype == np.uint
True

If x is of a signed integer type, then the output type is the default platform integer:

>>> x = np.array([1, 2, 3], dtype=np.int8)
>>> np.prod(x).dtype == np.int
True
size

Number of elements in array

squeeze()

Remove single-dimensional entries from the shape of an array.

Parameters:

a : array_like

Input data.

axis : None or int or tuple of ints, optional

New in version 1.7.0.

Selects a subset of the single-dimensional entries in the shape. If an axis is selected with shape entry greater than one, an error is raised.

Returns:

squeezed : ndarray

The input array, but with with all or a subset of the dimensions of length 1 removed. This is always a itself or a view into a.

Examples

>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)
std(axis=None, dtype=None, keepdims=False, ddof=0)

Compute the standard deviation along the specified axis.

Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.

Parameters:

a : array_like

Calculate the standard deviation of these values.

axis : int, optional

Axis along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

dtype : dtype, optional

Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type.

out : ndarray, optional

Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.

ddof : int, optional

Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

standard_deviation : ndarray, see dtype parameter above.

If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array.

See also

var, mean, nanmean, nanstd, nanvar

numpy.doc.ufuncs
Section “Output arguments”

Notes

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.

For floating-point input, the std is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.

Examples

>>> a = np.array([[1, 2], [3, 4]])
>>> np.std(a)
1.1180339887498949
>>> np.std(a, axis=0)
array([ 1.,  1.])
>>> np.std(a, axis=1)
array([ 0.5,  0.5])

In single precision, std() can be inaccurate:

>>> a = np.zeros((2,512*512), dtype=np.float32)
>>> a[0,:] = 1.0
>>> a[1,:] = 0.1
>>> np.std(a)
0.45172946707416706

Computing the standard deviation in float64 is more accurate:

>>> np.std(a, dtype=np.float64)
0.44999999925552653
store(target, **kwargs)

Store dask arrays in array-like objects, overwrite data in target

This stores dask arrays into object that supports numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance you can align the block size of the storage target with the block size of your array.

If your data fits in memory then you may prefer calling np.array(myarray) instead.

Parameters:

sources: Array or iterable of Arrays

targets: array-like or iterable of array-likes

These should support setitem syntax target[10:20] = ...

Examples

>>> x = ...  
>>> import h5py  
>>> f = h5py.File('myfile.hdf5')  
>>> dset = f.create_dataset('/data', shape=x.shape,
...                                  chunks=x.chunks,
...                                  dtype='f8')  
>>> store(x, dset)  

Alternatively store many arrays at the same time

>>> store([x, y, z], [dset1, dset2, dset3])  
sum(axis=None, dtype=None, keepdims=False)

Sum of array elements over a given axis.

Parameters:

a : array_like

Elements to sum.

axis : None or int or tuple of ints, optional

Axis or axes along which a sum is performed. The default (axis = None) is perform a sum over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.

New in version 1.7.0.

If this is a tuple of ints, a sum is performed on multiple axes, instead of a single axis or all the axes as before.

dtype : dtype, optional

The type of the returned array and of the accumulator in which the elements are summed. By default, the dtype of a is used. An exception is when a has an integer type with less precision than the default platform integer. In that case, the default platform integer is used instead.

out : ndarray, optional

Array into which the output is placed. By default, a new array is created. If out is given, it must be of the appropriate shape (the shape of a with axis removed, i.e., numpy.delete(a.shape, axis)). Its type is preserved. See doc.ufuncs (Section “Output arguments”) for more details.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

sum_along_axis : ndarray

An array with the same shape as a, with the specified axis removed. If a is a 0-d array, or if axis is None, a scalar is returned. If an output array is specified, a reference to out is returned.

See also

ndarray.sum
Equivalent method.
cumsum
Cumulative sum of array elements.
trapz
Integration of array values using the composite trapezoidal rule.

mean, average

Notes

Arithmetic is modular when using integer types, and no error is raised on overflow.

Examples

>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
1
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])

If the accumulator is too small, overflow occurs:

>>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
-128
to_hdf5(filename, datapath, **kwargs)

Store array in HDF5 file

>>> x.to_hdf5('myfile.hdf5', '/x')  

Optionally provide arguments as though to h5py.File.create_dataset

>>> x.to_hdf5('myfile.hdf5', '/x', compression='lzf', shuffle=True)  

See also

da.store, h5py.File.create_dataset

topk(k)

The top k elements of an array

Returns the k greatest elements of the array in sorted order. Only works on arrays of a single dimension.

>>> x = np.array([5, 1, 3, 6])
>>> d = from_array(x, chunks=2)
>>> d.topk(2).compute()
array([6, 5])

Runs in near linear time, returns all results in a single chunk so all k elements must fit in memory.

transpose(axes=None)

Permute the dimensions of an array.

Parameters:

a : array_like

Input array.

axes : list of ints, optional

By default, reverse the dimensions, otherwise permute the axes according to the values given.

Returns:

p : ndarray

a with its axes permuted. A view is returned whenever possible.

See also

rollaxis

Examples

>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
       [2, 3]])
>>> np.transpose(x)
array([[0, 2],
       [1, 3]])
>>> x = np.ones((1, 2, 3))
>>> np.transpose(x, (1, 0, 2)).shape
(2, 1, 3)
var(axis=None, dtype=None, keepdims=False, ddof=0)

Compute the variance along the specified axis.

Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.

Parameters:

a : array_like

Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.

axis : int, optional

Axis along which the variance is computed. The default is to compute the variance of the flattened array.

dtype : data-type, optional

Type to use in computing the variance. For arrays of integer type the default is float32; for arrays of float types it is the same as the array type.

out : ndarray, optional

Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.

ddof : int, optional

“Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.

keepdims : bool, optional

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

Returns:

variance : ndarray, see dtype parameter above

If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.

See also

std, mean, nanmean, nanstd, nanvar

numpy.doc.ufuncs
Section “Output arguments”

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).

The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.

For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.

Examples

>>> a = np.array([[1,2],[3,4]])
>>> np.var(a)
1.25
>>> np.var(a, axis=0)
array([ 1.,  1.])
>>> np.var(a, axis=1)
array([ 0.25,  0.25])

In single precision, var() can be inaccurate:

>>> a = np.zeros((2,512*512), dtype=np.float32)
>>> a[0,:] = 1.0
>>> a[1,:] = 0.1
>>> np.var(a)
0.20405951142311096

Computing the variance in float64 is more accurate:

>>> np.var(a, dtype=np.float64)
0.20249999932997387
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.20250000000000001
vnorm(ord=None, axis=None, keepdims=False)

Vector norm

Other functions

dask.array.core.from_array(x, chunks, name=None, lock=False)

Create dask array from something that looks like an array

Input must have a .shape and support numpy-style slicing.

The chunks argument must be one of the following forms:

  • a blocksize like 1000
  • a blockshape like (1000, 1000)
  • explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).

Examples

>>> x = h5py.File('...')['/data/path']  
>>> a = da.from_array(x, chunks=(1000, 1000))  

If your underlying datastore does not support concurrent reads then include the lock=True keyword argument or lock=mylock if you want multiple arrays to coordinate around the same lock.

>>> a = da.from_array(x, chunks=(1000, 1000), lock=True)  
dask.array.core.store(sources, targets, **kwargs)

Store dask arrays in array-like objects, overwrite data in target

This stores dask arrays into object that supports numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance you can align the block size of the storage target with the block size of your array.

If your data fits in memory then you may prefer calling np.array(myarray) instead.

Parameters:

sources: Array or iterable of Arrays

targets: array-like or iterable of array-likes

These should support setitem syntax target[10:20] = ...

Examples

>>> x = ...  
>>> import h5py  
>>> f = h5py.File('myfile.hdf5')  
>>> dset = f.create_dataset('/data', shape=x.shape,
...                                  chunks=x.chunks,
...                                  dtype='f8')  
>>> store(x, dset)  

Alternatively store many arrays at the same time

>>> store([x, y, z], [dset1, dset2, dset3])  
dask.array.core.topk(k, x)

The top k elements of an array

Returns the k greatest elements of the array in sorted order. Only works on arrays of a single dimension.

>>> x = np.array([5, 1, 3, 6])
>>> d = from_array(x, chunks=2)
>>> d.topk(2).compute()
array([6, 5])

Runs in near linear time, returns all results in a single chunk so all k elements must fit in memory.

dask.array.core.coarsen(reduction, x, axes, trim_excess=False)

Coarsen array by applying reduction to fixed size neighborhoods

Parameters:

reduction: function

Function like np.sum, np.mean, etc...

x: np.ndarray

Array to be coarsened

axes: dict

Mapping of axis to coarsening factor

dask.array.core.stack(seq, axis=0)

Stack arrays along a new axis

Given a sequence of dask Arrays form a new dask Array by stacking them along a new dimension (axis=0 by default)

See also

concatenate

Examples

Create slices

>>> import dask.array as da
>>> import numpy as np
>>> data = [from_array(np.ones((4, 4)), chunks=(2, 2))
...          for i in range(3)]
>>> x = da.stack(data, axis=0)
>>> x.shape
(3, 4, 4)
>>> da.stack(data, axis=1).shape
(4, 3, 4)
>>> da.stack(data, axis=-1).shape
(4, 4, 3)

Result is a new dask Array

dask.array.core.concatenate(seq, axis=0)

Concatenate arrays along an existing axis

Given a sequence of dask Arrays form a new dask Array by stacking them along an existing dimension (axis=0 by default)

See also

stack

Examples

Create slices

>>> import dask.array as da
>>> import numpy as np
>>> data = [from_array(np.ones((4, 4)), chunks=(2, 2))
...          for i in range(3)]
>>> x = da.concatenate(data, axis=0)
>>> x.shape
(12, 4)
>>> da.concatenate(data, axis=1).shape
(4, 12)

Result is a new dask Array