Glossary
- Context manager:
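- A context manager is an object that sets up a resource when a “with” block is entered and releases it when the block exits, even if an exception is raised. Python uses context managers and the “with” statement to manage mutual exclusion locks and resources such as files, network sockets, and database connections, as described in PEP 343 and the Python documentation.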
- Dask array:
- Dask arrays are a drop-in replacement for a commonly used subset of NumPy algorithms. They implement a subset of the NumPy ndarray interface to provide blocked algorithms that divide each large array into small arrays. This enables computation on arrays larger than memory and enables the use of multiple cores.
- Dask bag:
- A set is an unordered collection of elements, each of which may be present only once in the set. A multiset or “bag” is an unordered collection of elements, each of which may be present multiple times in the bag. Dask bags parallelize computations across large bags of generic Python objects. Dask bags are suitable for processing unstructured or semi-structured data such as large JSON blobs or log files.
- Dask dataframe:
- The pandas dataframe is a two-dimensional labeled data structure whose columns may have different types, similar to a spreadsheet, a SQL table, or a dict of pandas series objects. Dask dataframes look and feel like pandas dataframes but operate on datasets larger than memory using multiple threads. Dask dataframes do not implement the complete pandas interface.
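- GIL:
- CPython’s Global Interpreter Lock allows only one thread to execute Python bytecode at a time. This keeps the interpreter thread-safe but limits how much CPU-bound pure-Python code can speed up when run on multiple threads.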
- Opportunistic caching:
- Dask’s opportunistic caching monitors tasks and caches their results according to predicted future use.
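
The context manager, bag, and blocked-algorithm entries above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Dask's implementation or API: the function names `locked_count`, `as_bag`, and `blocked_sum` are hypothetical, and `collections.Counter` stands in for a bag only in the sense that it records how many times each element occurs.

```python
import threading
from collections import Counter


def locked_count(n_threads, n_increments):
    """Context manager: increment a shared counter under a lock.

    The "with" statement acquires the lock on entry and releases it on
    exit, even if the body raises an exception.
    """
    lock = threading.Lock()
    state = {"total": 0}

    def worker():
        for _ in range(n_increments):
            with lock:
                state["total"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["total"]


def as_bag(items):
    """Bag (multiset): keep every element along with its multiplicity."""
    return Counter(items)


def blocked_sum(values, block_size):
    """Blocked algorithm: reduce each small block, then combine the partials.

    Only one block needs to be processed at a time, which is the idea that
    lets blocked algorithms handle inputs larger than memory.
    """
    partials = [
        sum(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
    return sum(partials)


words = ["a", "b", "a", "c", "a"]
print(locked_count(4, 1000))             # 4000
print(sorted(set(words)))                # ['a', 'b', 'c'] (a set drops repeats)
print(as_bag(words)["a"])                # 3 (a bag keeps them)
print(blocked_sum(list(range(10)), 3))   # 45
```

The per-block sums in `blocked_sum` are independent of one another, so in principle they could be computed in parallel. Here they run sequentially to keep the sketch short.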