Computation¶
OpenClimateGIS offers an extensible computation framework that supports:
- NumPy-based array calculations
- Temporal grouping (level grouping not supported)
- Parameters (e.g. threshold) and multivariate functions (e.g. heat index)
- Overload hooks for aggregation operations
Computations Described¶
Computations are applied following any initial subsetting by time, level, or geometry. If data is spatially aggregated, any computation is applied to the aggregated data values unless calc_raw is set to True. Computations are applied to “temporal groups” within the target data defined by the calc_grouping parameter. A “temporal group” is a unique set of date parts (e.g. the month of August, the year 2002, January 2004). Data is summarized within the temporal group to produce a single value within the temporal aggregation for each level and spatial coordinate.
As a functional example, the following code replicates (in principle) the computational process used in OpenClimateGIS for calculating the mean of non-leveled (i.e. three-dimensional) data with temporal aggregation:
>>> import numpy as np
>>> from datetime import datetime
>>> # Generate a random three-dimensional dataset (time, latitude/Y, longitude/X).
>>> data = np.ma.array(np.random.rand(4, 2, 2),mask=False)
>>> # This is an example "temporal dimension".
>>> temporal = np.array([datetime(2001,8,1),datetime(2001,8,2),datetime(2001,9,1),datetime(2001,9,2)])
>>> # Assuming a "calc_grouping" of ['month'], split the data into monthly groups (OpenClimateGIS uses a boolean array
>>> # here).
>>> aug, sept = data[0:2, :, :], data[2:, :, :]
>>> # Calculate means along the temporal axis.
>>> mu_aug, mu_sept = [np.ma.mean(d,axis=0) for d in [aug, sept]]
>>> # Recombine the data along a new leading axis (one entry per temporal group).
>>> ret = np.ma.stack((mu_aug, mu_sept))
>>> ret.shape
(2, 2, 2)
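The manual slicing in the example above can also be written with boolean selection arrays, which is closer in spirit to the grouping mentioned in the comment. This is a minimal sketch using only NumPy; the names `months`, `select`, and `group_means` are illustrative and are not part of OpenClimateGIS.

```python
import numpy as np
from datetime import datetime

# Random three-dimensional dataset (time, latitude/Y, longitude/X).
data = np.ma.array(np.random.rand(4, 2, 2), mask=False)
temporal = np.array([datetime(2001, 8, 1), datetime(2001, 8, 2),
                     datetime(2001, 9, 1), datetime(2001, 9, 2)])

# Extract the date part named by the grouping (here: month).
months = np.array([dt.month for dt in temporal])

group_means = []
for month in np.unique(months):
    # Boolean array selecting the time steps that belong to this group.
    select = months == month
    group_means.append(np.ma.mean(data[select], axis=0))

# One slice per temporal group: shape (n_groups, Y, X).
ret = np.ma.stack(group_means)
```

The boolean form extends to any number of groups without hard-coded slice bounds.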
It is possible to write functions that do not use a temporal aggregation. In these cases, the function output will have the same shape as the input - as opposed to being reduced by temporal aggregation.
In addition, sample size is always calculated and returned in any calculation output file (not currently supported for multivariate calculations).
Masked data is respected throughout the computational process. These data are assumed to be missing. Hence, they are not used in the sample size calculation.
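As an illustration of how masked values drop out of the sample size, the sketch below counts unmasked values along the time axis with plain NumPy. It only mirrors the behavior described above and is not taken from the OpenClimateGIS source.

```python
import numpy as np

# Three time steps over a 1 x 2 grid: (time, row, column).
values = np.ma.array(np.arange(6.0).reshape(3, 1, 2), mask=False)

# Mark one value as missing.
values[1, 0, 0] = np.ma.masked

# Sample size: number of unmasked values per grid cell along time.
sample_size = values.count(axis=0)

# The mean ignores the masked value as well.
mean = values.mean(axis=0)
```

Here the first grid cell contributes two values and the second contributes three.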
Temporal and Spatial Aggregation¶
It is possible to overload methods for temporal and/or spatial aggregation in any function. This is described in greater detail in the section Defining Custom Functions. If a method is not overloaded, the default is a mean (for temporal aggregation) and a weighted average (for spatial aggregation). For ease of programming and potential speed-ups through NumPy, temporal aggregation is performed within the function unless that function may operate on single values (e.g. a logarithm, as opposed to a mean). In this case, a method overload is required to accommodate temporal aggregations.
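The default spatial aggregation described above, a weighted average, can be sketched with NumPy's masked-array routines. The values and weights below are made up for illustration; how OpenClimateGIS derives its weights is not shown here.

```python
import numpy as np

# A 2 x 2 grid with one missing cell.
values = np.ma.array([[1.0, 2.0], [3.0, 4.0]],
                     mask=[[False, False], [False, True]])

# Hypothetical weights (e.g. fractional area of each cell).
weights = np.array([[0.1, 0.2], [0.3, 0.4]])

# Masked cells are excluded from both the numerator and the weight sum.
spatial_mean = np.ma.average(values, weights=weights)
```

With the last cell masked, the result is (0.1*1 + 0.2*2 + 0.3*3) / (0.1 + 0.2 + 0.3).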
Using Computations¶
Warning
Always use NumPy masked array functions! Standard array functions may not be compatible with masked variables.
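A small sketch of why this matters: once the mask is discarded (here by converting to a plain array with `np.asarray`), fill or missing values leak into the result. The numbers are illustrative only.

```python
import numpy as np

# The third value is a missing-data placeholder and is masked.
data = np.ma.array([1.0, 2.0, 1000.0], mask=[False, False, True])

# Masked-array function: the masked value is ignored.
masked_mean = np.ma.mean(data)

# Plain array view of the same data: the placeholder is included.
raw_mean = np.asarray(data).mean()
```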
Computations are applied by passing a list of “function dictionaries” to the calc argument of the OcgOperations object. The other two relevant arguments are calc_raw and calc_grouping.
In its simplest form, a “function dictionary” is composed of a 'func' key and a 'name' key. The 'func' key corresponds to the key attribute of the function class. The 'name' key in the “function dictionary” is required and is a user-supplied alias. This is required to allow multiple calculations with the same function names to be performed with different parameters (in a single request).
Functions currently available are listed below: Available Functions. In the case where a function does not expose a key attribute, the 'func' value is the lower case string of the function’s class name (e.g. Mean becomes 'mean').
For example, to calculate a monthly mean and median on a hypothetical daily climate dataset (written to CSV format), an OpenClimateGIS call may look like:
>>> from ocgis import OcgOperations, RequestDataset
>>> rd = RequestDataset('/path/to/data', 'tas')
>>> calc = [{'func': 'mean', 'name': 'monthly_mean'}, {'func': 'median', 'name': 'monthly_median'}]
>>> ops = OcgOperations(dataset=rd, calc=calc, calc_grouping=['month'], output_format='csv', prefix='my_calculation')
>>> path = ops.execute()
A calculation with arguments includes a 'kwds' key in the function dictionary:
>>> calc = [{'func': 'between', 'name': 'between_5_10', 'kwds': {'lower': 5, 'upper': 10}}]
If a function takes parameters, those parameters are documented in the Available Functions section. The keyword parameter name maps directly to its keyword name in the calculate method.
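To make the 'between' example concrete, the sketch below counts, for each grid cell, the time steps whose value lies within the inclusive limits lower and upper. It is a plain NumPy approximation of the behavior stated in the function's description, not the library's implementation.

```python
import numpy as np

# Values 0..11 arranged as (time, row, column).
values = np.ma.array(np.arange(12.0).reshape(3, 2, 2))

# The same keywords as in the 'kwds' dictionary above.
kwds = {'lower': 5, 'upper': 10}

# Boolean array marking values inside the inclusive interval.
within = (values >= kwds['lower']) & (values <= kwds['upper'])

# Count along the time axis.
count = np.ma.sum(within, axis=0)
```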
There are also keyword arguments common to all calculations:
'meta_attrs': A dictionary containing metadata attributes (e.g. NetCDF attributes) to attach to the output calculation variable. It is also possible to modify the field attributes (e.g. global dataset NetCDF attributes). Both examples are below.
>>> calc = [{'func': 'mean', 'name': 'mean', 'meta_attrs': {'new_attribute': 'the_value'}}]
>>> calc = [{'func': 'mean', 'name': 'mean', 'meta_attrs': {'new_attribute': 5, 'hello': 'attribute'}}]
>>> # Modify the field attributes using a fully specified "meta_attrs" dictionary.
>>> meta_attrs = {'variable': {'new_attribute': 5, 'hello': 'attribute'}, 'field': {'global_attr': 50}}
>>> calc = [{'func': 'mean', 'name': 'mean', 'meta_attrs': meta_attrs}]
Defining Custom Functions¶
String-Based Function Expressions¶
String-based functions composed of variable aliases and selected NumPy functions are also allowed for the calc argument. The list of enabled NumPy functions is found in the ocgis.constants.enabled_numpy_ufuncs attribute. The string on the left-hand side of the expression will be the name of the output variable. Some acceptable string-based functions are:
>>> calc = 'tas_added=tas+4'
>>> calc = 'es=6.1078*exp(17.08085*(tas-273.16)/(234.175+(tas-273.16)))'
>>> calc = 'diff=tasmax-tasmin'
Note
It is not possible to perform any temporal aggregations using string-based function expressions.
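For orientation, the 'es' expression shown above corresponds to the following element-wise NumPy computation. The input array is made up for the example; how OpenClimateGIS parses and evaluates the string is not shown.

```python
import numpy as np

# Hypothetical near-surface air temperatures in Kelvin.
tas = np.ma.array([273.16, 293.16])

# Element-wise equivalent of:
# 'es=6.1078*exp(17.08085*(tas-273.16)/(234.175+(tas-273.16)))'
es = 6.1078 * np.exp(17.08085 * (tas - 273.16) / (234.175 + (tas - 273.16)))
```

The output has the same shape as the input, consistent with the note that no temporal aggregation takes place.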
Subclassing OpenClimateGIS Function Classes¶
Once a custom calculation is defined, it must be appended to ocgis.FunctionRegistry.
>>> from my_functions import MyCustomFunction
>>> from ocgis import FunctionRegistry
>>> FunctionRegistry.append(MyCustomFunction)
Inheritance Structure¶
All calculations are classes that inherit from the following abstract base classes:
- AbstractUnivariateFunction: Functions with no required parameters operating on a single variable.
- AbstractUnivariateSetFunction: Functions with no required parameters operating on a single variable and reducing along the temporal axis.
- AbstractParameterizedFunction: Functions with input parameters. Functions do not inherit directly from this base class. It is used as part of a ‘mix-in’ to indicate a function has parameters.
- AbstractMultivariateFunction: Functions operating on two or more variables.
class ocgis.calc.base.AbstractFunction(alias=None, dtype=None, field=None, file_only=False, vc=None, parms=None, tgd=None, calc_sample_size=False, fill_value=None, meta_attrs=None, tag='_ocgis_data_variables', spatial_aggregation=False)
Bases: ocgis.base.AbstractOcgisObject
Required class attributes to overload:
- description (str): An arbitrary-length string describing the calculation.
- key (str): The function’s unique string identifier.
- standard_name (str): Standard name to store in output metadata.
- long_name (str): Long name description to store in output metadata.
Parameters:
- alias (str) – The string identifier to use for the calculation.
- dtype (str or numpy.core.multiarray.dtype) – The output data type. Set this to 'int' or 'float' to use the default datatype for the output format and NumPy installation (recommended). If a specific NumPy type is needed, provide the string representation of the type (e.g. 'int32').
- field (ocgis.interface.base.Field) – The field object over which the calculation is applied.
- file_only (bool) – If True, pass through but compute output sizes, etc.
- vc (ocgis.interface.base.variable.VariableCollection) – The ocgis.interface.base.variable.VariableCollection to append output calculation arrays to. If None, a new collection will be created.
- parms (dict) – A dictionary of parameter values. This includes any parameters for the calculation.
- tgd (ocgis.interface.base.dimension.temporal.TemporalGroupDimension) – An instance of ocgis.interface.base.dimension.temporal.TemporalGroupDimension.
- calc_sample_size (bool) – If True, also compute sample sizes for the calculation.
- meta_attrs (ocgis.driver.parms.definition_helpers.MetadataAttributes) – Contains overloads for variable and/or field attribute values.
- tag (str) – The tag to use for variable iteration on the source field (the source variables for calculation).
aggregate_spatial(values, weights)
Optional method overload for spatial aggregation.
Parameters:
- values (numpy.ma.core.MaskedArray) – The input array with dimensions (m, n).
- weights (numpy.core.multiarray.ndarray) – The input weights array with dimensions matching values.
Return type: numpy.ma.core.MaskedArray
aggregate_temporal(values, **kwargs)
Optional method to overload for temporal aggregation.
Parameters:
- values (numpy.ma.core.MaskedArray) – The input five-dimensional array.
calculate(values, **kwargs)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
execute()
Execute the computation over the input field.
Return type: ocgis.interface.base.variable.VariableCollection
classmethod validate(ops)
Optional method to overload that validates the input ocgis.OcgOperations.
Raises: ocgis.exc.DefinitionValidationError
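The contract described for calculate above (a three-dimensional masked array in, reduced along axis 0, masked array out) can be illustrated without OpenClimateGIS installed. The standalone function below is a hypothetical example of what the body of a calculate overload might contain; the name `calculate_range` and the range statistic are assumptions made for the sketch.

```python
import numpy as np


def calculate_range(values):
    """Hypothetical calculate body: max minus min along the time axis."""
    # Expect (time, row, column) as described for calculate.
    assert values.ndim == 3
    return np.ma.max(values, axis=0) - np.ma.min(values, axis=0)


# Values 0..11 arranged as (time, row, column), with one missing value.
values = np.ma.array(np.arange(12.0).reshape(3, 2, 2))
values[0, 0, 0] = np.ma.masked

result = calculate_range(values)
```

In an actual subclass this logic would live in the calculate method, and the class would still need to be appended to the function registry as described earlier.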
class ocgis.calc.base.AbstractUnivariateFunction(*args, **kwargs)
Bases: ocgis.calc.base.AbstractFunction
Base class for functions accepting a single univariate input.
required_units = None
Optional sequence of acceptable string units definitions for input variables. If this is set to None, no unit validation will occur.
class ocgis.calc.base.AbstractUnivariateSetFunction(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateFunction
Base class for functions operating on a single variable but always reducing input data along the time dimension.
aggregate_temporal(*args, **kwargs)
This operation is always implicit to calculate().
class ocgis.calc.base.AbstractParameterizedFunction(**kwargs)
Bases: ocgis.calc.base.AbstractFunction
Base class for functions accepting parameters.
parms_definition
A dictionary describing the input parameters with keys corresponding to parameter names and values to their types. Set the type to None for no type checking.
>>> {'threshold': float, 'operation': str, 'basis': None}
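To show how a parms_definition dictionary of this form could drive type checking, here is a small sketch. The helper `check_parms` is hypothetical and written only for this illustration; OpenClimateGIS may validate or convert parameters differently.

```python
def check_parms(parms, parms_definition):
    """Hypothetical helper: check supplied parameters against expected types."""
    for name, value in parms.items():
        if name not in parms_definition:
            raise KeyError('Unknown parameter: {}'.format(name))
        expected_type = parms_definition[name]
        # A type of None disables checking for that parameter.
        if expected_type is not None and not isinstance(value, expected_type):
            raise TypeError('Parameter "{}" must be of type {}'.format(
                name, expected_type.__name__))
    return parms


parms_definition = {'threshold': float, 'operation': str, 'basis': None}
checked = check_parms({'threshold': 5.0, 'operation': 'gt', 'basis': [1, 2]},
                      parms_definition)
```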
class ocgis.calc.base.AbstractMultivariateFunction(*args, **kwargs)
Bases: ocgis.calc.base.AbstractFunction
Base class for functions operating on multivariate inputs. Multivariate functions also double as set functions (i.e. they can temporally group). This can be turned off at the class level by setting time_aggregation_external to False.
required_units = None
For example: required_units = {'tas': 'fahrenheit', 'rhs': 'percent'}
required_variables
Required property/attribute containing the list of input variables expected by the function.
>>> ('tas', 'rhs')
Available Functions¶
Click on Show Source to the right of the function to get descriptive information and see class-level definitions.
Mathematical Operations¶
class ocgis.calc.library.math.Sum(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction
calculate(values)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Compute the algebraic sum of a series.'
class ocgis.calc.library.math.Convolve1D(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateFunction, ocgis.calc.base.AbstractParameterizedFunction
calculate(values, v=None, mode='same')
Parameters:
- values (numpy.ma.core.MaskedArray) – Array containing variable values.
- v (numpy.core.multiarray.ndarray) – The one-dimensional array to convolve with values.
- mode (str) – The convolution mode. See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html. The output mode full is not supported.
Return type: numpy.ma.core.MaskedArray
Raises: AssertionError
description = 'Perform a one-dimensional convolution for each grid element along the time axis. See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html'
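The operation described for Convolve1D, a one-dimensional convolution applied to each grid element along the time axis, can be approximated in plain NumPy as follows. This sketch works on an unmasked array and ignores mask handling, so it should be read as an illustration of the idea rather than the library's code.

```python
import numpy as np

# Constant field with four time steps on a 2 x 2 grid: (time, row, column).
values = np.ones((4, 2, 2))

# The one-dimensional kernel to convolve with each time series.
v = np.array([1.0, 1.0, 1.0])

# Apply numpy.convolve to every grid cell's time series (axis 0).
convolved = np.apply_along_axis(
    lambda series: np.convolve(series, v, mode='same'), 0, values)
```

With mode='same' the output keeps the input shape; the first and last time steps see only part of the kernel.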
Basic Statistics¶
class ocgis.calc.library.statistics.FrequencyPercentile(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction
calculate(values, percentile=None)
Parameters:
- percentile (float on the interval [0,100]) – Percentile to compute.
description = 'The percentile value along the time axis. See: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html.'
class ocgis.calc.library.statistics.Max(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction
calculate(values)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Max value for the series.'
class ocgis.calc.library.statistics.Mean(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction
calculate(values)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Compute mean value of the set.'
class ocgis.calc.library.statistics.Median(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction
calculate(values)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Compute median value of the set.'
class ocgis.calc.library.statistics.Min(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction
calculate(values)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Min value for the series.'
class ocgis.calc.library.statistics.StandardDeviation(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction
calculate(values)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Compute standard deviation of the set.'
Moving Window / Kernel-Based¶
class ocgis.calc.library.statistics.MovingWindow(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateFunction, ocgis.calc.base.AbstractParameterizedFunction
calculate(values, k=None, operation=None, mode='valid')
Calculate operation for the set of values with window of width k centered on time coordinate t. The mode may either be 'valid' or 'same' following the definition here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html. The window width k must be an odd number and >= 3. Supported operations are: mean, min, max, median, var, and std.
Parameters:
- values (numpy.ma.core.MaskedArray) – Array containing variable values.
- k (int) – The width of the moving window. k must be odd and greater than or equal to three.
- operation (str in ('mean', 'min', 'max', 'median', 'var', 'std')) – The NumPy-based array operation to perform on the set of window values.
- mode (str) – See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html. The output mode full is not supported.
Return type: numpy.ma.core.MaskedArray
Raises: AssertionError, NotImplementedError
description = ()
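The moving-window idea described above can be sketched for the 'valid' mode with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). The function name `moving_mean_valid` is hypothetical, only the mean operation is shown, and masked values are not handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_mean_valid(values, k):
    """Hypothetical sketch: centered moving mean along time, 'valid' mode."""
    if k < 3 or k % 2 == 0:
        raise ValueError('k must be odd and at least 3')
    # Windows are appended as a trailing axis: (time - k + 1, row, column, k).
    windows = sliding_window_view(values, k, axis=0)
    return windows.mean(axis=-1)


# Five time steps on a 1 x 1 grid with values 0, 1, 2, 3, 4.
values = np.arange(5.0).reshape(5, 1, 1)
result = moving_mean_valid(values, 3)
```

In 'valid' mode only time steps with a complete window are returned, so the output has k - 1 fewer time steps than the input.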
Multivariate Calculations / Indices¶
class ocgis.calc.library.index.duration.FrequencyDuration(*args, **kwargs)
Bases: ocgis.calc.base.AbstractKeyedOutputFunction, ocgis.calc.library.index.duration.Duration
description = 'Count the frequency of spell durations within the temporal aggregation.'
class ocgis.calc.library.index.duration.Duration(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction
description = 'Summarizes consecutive occurrences in a sequence where the logical operation returns TRUE. The summary operation is applied to the sequences within a temporal aggregation.'
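The notion of "consecutive occurrences where the logical operation returns TRUE" can be illustrated on a single time series. The helper `run_lengths` below is hypothetical and uses only the standard library; the parameter names accepted by Duration itself are not documented in this section and are not assumed here.

```python
import operator
from itertools import groupby


def run_lengths(series, threshold, op=operator.gt):
    """Hypothetical helper: lengths of consecutive runs where op(x, threshold) is true."""
    flags = [op(x, threshold) for x in series]
    # groupby collapses consecutive equal flags into runs.
    return [sum(1 for _ in run) for is_true, run in groupby(flags) if is_true]


# Two spells above 5: (6, 7) and (8, 9, 10).
lengths = run_lengths([1, 6, 7, 2, 8, 9, 10], threshold=5)

# One possible summary operation over the spells.
longest = max(lengths)
```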
class ocgis.calc.library.index.heat_index.HeatIndex(*args, **kwargs)
Bases: ocgis.calc.base.AbstractMultivariateFunction
calculate(tas=None, rhs=None)
The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).
Parameters:
- values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
- kwargs – Any keyword parameters for the function.
Return type: numpy.ma.MaskedArray
description = 'Heat Index following: http://en.wikipedia.org/wiki/Heat_index. If temperature is < 80F or relative humidity is < 40%, the value is masked during calculation. Output units are Fahrenheit.'
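As a rough illustration of the description above, the sketch below evaluates the nine-coefficient regression given on the referenced Wikipedia page and masks values outside the stated validity range. It assumes temperature in degrees Fahrenheit and relative humidity in percent, and it is not copied from the OpenClimateGIS source.

```python
import numpy as np

# Regression coefficients as listed on the referenced Wikipedia page.
C = (-42.379, 2.04901523, 10.14333127, -0.22475541, -6.83783e-3,
     -5.481717e-2, 1.22874e-3, 8.5282e-4, -1.99e-6)


def heat_index(tas, rhs):
    """Sketch: heat index (F) from temperature (F) and relative humidity (%)."""
    tas = np.ma.array(tas, dtype=float)
    rhs = np.ma.array(rhs, dtype=float)
    hi = (C[0] + C[1] * tas + C[2] * rhs + C[3] * tas * rhs
          + C[4] * tas ** 2 + C[5] * rhs ** 2 + C[6] * tas ** 2 * rhs
          + C[7] * tas * rhs ** 2 + C[8] * tas ** 2 * rhs ** 2)
    # Mask values below the thresholds named in the description.
    return np.ma.masked_where((tas < 80) | (rhs < 40), hi)


# The second pair falls below 80F and is therefore masked.
hi = heat_index([90.0, 70.0], [70.0, 70.0])
```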
Thresholds¶
class ocgis.calc.library.thresholds.Between(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction
description = 'Count of values falling within the limits lower and upper (inclusive).'
class ocgis.calc.library.thresholds.Threshold(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction
description = 'Count of values where the logical operation returns TRUE.'
Miscellaneous¶
class ocgis.calc.library.index.freeze_thaw.FreezeThaw(*args, **kwargs)
Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction
calculate(values, threshold=15)
Return the number of freeze-thaw transitions. A value of 2 corresponds to a complete cycle (frozen-thawed-frozen).
Parameters:
- threshold – The number of degree-days above or below the freezing point after which the ground is considered frozen or thawed.
description = 'Number of freeze-thaw events, where freezing and thawing occurs once a threshold of degree days below or above 0C is reached. A complete cycle (freeze-thaw-freeze) will return a value of 2. '
Calculation using icclim for ECA Indices¶
The Python library icclim (http://icclim.readthedocs.io/en/latest) may be used to calculate the full suite of European Climate Assessment (ECA) indices. To select an icclim calculation, prefix the name of the index with 'icclim_'. A list of indices computable with icclim is available here: http://icclim.readthedocs.io/en/latest/python_api.html#icclim-indice-compute-indice.
NESII hosts an Anaconda icclim build:
conda install -c nesii icclim
For example, to calculate the TG index (mean of daily mean temperature), select the calculation like:
>>> calc = [{'func': 'icclim_TG', 'name': 'TG'}]
Any optional calculation parameters may be passed in using the 'kwds' key:
>>> calc = [{..., 'kwds': {'percentile_dict': <Percentile Dictionary>}}]
Custom user indices are not implemented in OpenClimateGIS. OpenClimateGIS may be used to pre-process an ICCLIM input file prior to a custom calculation. Please contact user support if your application could benefit from custom ICCLIM user indices in OpenClimateGIS.