Computation

OpenClimateGIS offers an extensible computation framework that supports:
  1. NumPy-based array calculations
  2. Temporal grouping (level grouping not supported)
  3. Parameters (e.g. threshold) and multivariate functions (e.g. heat index)
  4. Overload hooks for aggregation operations

Computations Described

Computations are applied following any initial subsetting by time, level, or geometry. If data is spatially aggregated, any computation is applied to the aggregated data values unless calc_raw is set to True. Computations are applied to “temporal groups” within the target data defined by the calc_grouping parameter. A “temporal group” is a unique set of date parts (e.g. the month of August, the year 2002, January 2004). Data is summarized within the temporal group to produce a single value within the temporal aggregation for each level and spatial coordinate.

As a functional example, the following code replicates (in principle) the computational process used in OpenClimateGIS for calculating the mean of non-leveled (i.e. three-dimensional) data with temporal aggregation:

>>> import numpy as np
>>> from datetime import datetime
>>> # Generate a random three-dimensional dataset (time, latitude/Y, longitude/X).
>>> data = np.ma.array(np.random.rand(4, 2, 2),mask=False)
>>> # This is an example "temporal dimension".
>>> temporal = np.array([datetime(2001,8,1),datetime(2001,8,2),datetime(2001,9,1),datetime(2001,9,2)])
>>> # Assuming a "calc_grouping" of ['month'], split the data into monthly groups (OpenClimateGIS uses a boolean array
>>> # here).
>>> aug, sept = data[0:2, :, :], data[2:, :, :]
>>> # Calculate means along the temporal axis.
>>> mu_aug, mu_sept = [np.ma.mean(d, axis=0).reshape(1, 2, 2) for d in [aug, sept]]
>>> # Recombine the data along the temporal axis.
>>> ret = np.ma.vstack((mu_aug, mu_sept))
>>> ret.shape
(2, 2, 2)
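
The slicing above hard-codes the group boundaries. The boolean-array selection mentioned in the comment might look like the minimal sketch below. The helper name group_means_by_month is hypothetical and is not part of OpenClimateGIS; the library's internal implementation may differ.

```python
from datetime import datetime

import numpy as np


def group_means_by_month(data, temporal):
    """Average `data` along axis 0 within each unique month (hypothetical helper)."""
    months = np.array([dt.month for dt in temporal])
    means = []
    for month in np.unique(months):
        # Boolean array selecting the time steps that belong to this group.
        select = months == month
        group_mean = np.ma.mean(data[select], axis=0)
        # Add a leading axis so the groups can be concatenated along time.
        means.append(group_mean[np.newaxis])
    return np.ma.concatenate(means, axis=0)


data = np.ma.array(np.random.rand(4, 2, 2), mask=False)
temporal = [datetime(2001, 8, 1), datetime(2001, 8, 2),
            datetime(2001, 9, 1), datetime(2001, 9, 2)]
ret = group_means_by_month(data, temporal)
print(ret.shape)
```

The first axis of the result has one entry per temporal group, so two months of input produce a shape of (2, 2, 2).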

It is possible to write functions that do not use a temporal aggregation. In these cases, the function output will have the same shape as the input - as opposed to being reduced by temporal aggregation.

In addition, sample size is always calculated and returned in any calculation output file (not currently supported for multivariate calculations).

Masked data is respected throughout the computational process. These data are assumed to be missing. Hence, they are not used in the sample size calculation.
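
As an illustration of how masked values drop out of the sample size, the following sketch counts unmasked values along the time axis with NumPy. It assumes the sample size is simply the count of unmasked values per grid cell; the array values are made up.

```python
import numpy as np

# Three time steps on a 1 x 2 grid; one value is masked (treated as missing).
values = np.ma.array(
    [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]],
    mask=[[[False, False]], [[True, False]], [[False, False]]],
)

# Number of unmasked values contributing to each grid cell.
sample_size = np.ma.count(values, axis=0)
# The mean uses only the unmasked values as well.
mean = np.ma.mean(values, axis=0)

print(sample_size.tolist())  # [[2, 3]]
print(mean.tolist())         # [[3.0, 4.0]]
```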

Temporal and Spatial Aggregation

It is possible to overload methods for temporal and/or spatial aggregation in any function. This is described in greater detail in the section Defining Custom Functions. If an aggregation method is not overloaded, the default is a mean (for temporal aggregation) and a weighted average (for spatial aggregation). For ease of programming and potential speed-ups through NumPy, temporal aggregation is performed within the function unless that function may operate on single values (e.g. a logarithm, as opposed to a mean). In this case, a method overload is required to accommodate temporal aggregations.
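
The two default aggregations can be sketched with NumPy masked-array functions as follows. The weights are hypothetical (for example, the fraction of each cell inside a selection geometry), and whether OpenClimateGIS computes its defaults with exactly these calls is an assumption.

```python
import numpy as np

# Two time steps on a 2 x 2 grid with one missing value.
values = np.ma.array(
    [[[1.0, 2.0], [3.0, 4.0]], [[3.0, 4.0], [5.0, 6.0]]],
    mask=[[[False, False], [False, True]], [[False, False], [False, False]]],
)
# Hypothetical spatial weights with the same (m, n) shape as one time step.
weights = np.array([[1.0, 0.5], [0.5, 1.0]])

# Temporal default: mean along the time axis (axis=0).
temporal = np.ma.mean(values, axis=0)

# Spatial default: weighted average over the grid of a single time step.
# The masked cell is excluded together with its weight.
spatial = np.ma.average(values[0], weights=weights)

print(temporal.tolist())  # [[2.0, 3.0], [4.0, 6.0]]
print(spatial)            # 1.75
```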

Using Computations

Warning

Always use NumPy masked array functions! Standard array functions may not be compatible with masked variables.
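
A small sketch of why this matters: any code path that converts a masked array to a plain array discards the mask, so the underlying value of a missing element leaks into the result. The values are made up.

```python
import numpy as np

# The last element is missing; its underlying data value is a large fill number.
values = np.ma.array([1.0, 2.0, 3.0, 1e20], mask=[False, False, False, True])

# Masked-array function: the masked element is ignored.
masked_mean = np.ma.mean(values)

# Converting to a plain array drops the mask, so 1e20 enters the mean.
plain_mean = np.mean(np.asarray(values))

print(masked_mean)  # 2.0
print(plain_mean > 1e19)  # True
```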

Computations are applied by passing a list of “function dictionaries” to the calc argument of the OcgOperations object. The other two relevant arguments are calc_raw and calc_grouping.

In its simplest form, a “function dictionary” is composed of a 'func' key and a 'name' key. The 'func' key corresponds to the key attribute of the function class. The 'name' key in the “function dictionary” is required and is a user-supplied alias. This is required to allow multiple calculations with the same function names to be performed with different parameters (in a single request).

Functions currently available are listed in the Available Functions section. In the case where a function does not expose a key attribute, the 'func' value is the lowercase string of the function’s class name (e.g. the class Mean has the key 'mean').

For example, to calculate a monthly mean and median on a hypothetical daily climate dataset (written to CSV format), an OpenClimateGIS call may look like:

>>> from ocgis import OcgOperations, RequestDataset
>>> rd = RequestDataset('/path/to/data', 'tas')
>>> calc = [{'func': 'mean', 'name': 'monthly_mean'}, {'func': 'median', 'name': 'monthly_median'}]
>>> ops = OcgOperations(dataset=rd, calc=calc, calc_grouping=['month'], output_format='csv', prefix='my_calculation')
>>> path = ops.execute()

A calculation with arguments includes a 'kwds' key in the function dictionary:

>>> calc = [{'func': 'between', 'name': 'between_5_10', 'kwds': {'lower': 5, 'upper': 10}}]

If a function takes parameters, those parameters are documented in the Available Functions section. The keyword parameter name maps directly to its keyword name in the calculate method.
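
The mapping from 'kwds' to the calculate method can be pictured as plain keyword unpacking. The calculate function below is an illustrative stand-in written for this sketch, not the library's Between implementation; it follows the documented description (a count of values within the inclusive limits).

```python
import numpy as np


def calculate(values, lower=None, upper=None):
    """Count values within [lower, upper] along the time axis (illustrative only)."""
    in_range = np.ma.logical_and(values >= lower, values <= upper)
    return np.ma.sum(in_range, axis=0)


calc = [{'func': 'between', 'name': 'between_5_10', 'kwds': {'lower': 5, 'upper': 10}}]

# Three time steps on a 2 x 2 grid holding the values 0..11.
values = np.ma.array(np.arange(12, dtype=float).reshape(3, 2, 2), mask=False)

# Each key in 'kwds' becomes a keyword argument of the same name.
count = calculate(values, **calc[0]['kwds'])
print(count.tolist())  # [[1, 2], [2, 1]]
```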

There are also keyword arguments common to all calculations:

  • 'meta_attrs': A dictionary containing metadata attributes (e.g. NetCDF attributes) to attach to the output calculation variable. It is also possible to modify the field attributes (e.g. global dataset NetCDF attributes). Both examples are below.
>>> calc = [{'func': 'mean', 'name': 'mean', 'meta_attrs': {'new_attribute': 'the_value'}}]
>>> calc = [{'func': 'mean', 'name': 'mean', 'meta_attrs': {'new_attribute': 5, 'hello': 'attribute'}}]
>>> # Modify the field attributes using a fully specified "meta_attrs" dictionary.
>>> meta_attrs = {'variable': {'new_attribute': 5, 'hello': 'attribute'}, 'field': {'global_attr': 50}}
>>> calc = [{'func': 'mean', 'name': 'mean', 'meta_attrs': meta_attrs}]

Defining Custom Functions

String-Based Function Expressions

String-based functions composed of variable aliases and selected NumPy functions are also allowed for the calc argument. The list of enabled NumPy functions is found in the ocgis.constants.enabled_numpy_ufuncs attribute. The string on the left-hand side of the expression will be the name of the output variable. Some acceptable string-based functions are:

>>> calc = 'tas_added=tas+4'
>>> calc = 'es=6.1078*exp(17.08085*(tas-273.16)/(234.175+(tas-273.16)))'
>>> calc = 'diff=tasmax-tasmin'

Note

It is not possible to perform any temporal aggregations using string-based function expressions.
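
For orientation, the second expression above corresponds to the element-wise NumPy calculation sketched below. The input values are made up, and how OpenClimateGIS parses and evaluates the string internally is not shown. The output keeps the shape (and mask) of the input, which is why no temporal aggregation takes place.

```python
import numpy as np

# Hypothetical air temperatures in Kelvin; the last value is missing.
tas = np.ma.array([273.16, 283.16, 293.16], mask=[False, False, True])

# Element-wise equivalent of:
# 'es=6.1078*exp(17.08085*(tas-273.16)/(234.175+(tas-273.16)))'
es = 6.1078 * np.ma.exp(17.08085 * (tas - 273.16) / (234.175 + (tas - 273.16)))

print(es.shape)  # (3,)
```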

Subclassing OpenClimateGIS Function Classes

Once a custom calculation is defined, it must be appended to ocgis.FunctionRegistry.

>>> from my_functions import MyCustomFunction
>>> from ocgis import FunctionRegistry
>>> FunctionRegistry.append(MyCustomFunction)
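
A custom function might be laid out as in the sketch below. To keep the example runnable without OpenClimateGIS installed, it inherits from a stand-in class; in practice the class would inherit from one of the base classes described under Inheritance Structure (for a reduction along time, ocgis.calc.base.AbstractUnivariateSetFunction). The Range function and its attribute values are hypothetical.

```python
import numpy as np


class StandInSetFunction:
    """Hypothetical stand-in for ocgis.calc.base.AbstractUnivariateSetFunction."""


class Range(StandInSetFunction):
    # Class attributes that the function reference lists as required overloads.
    key = 'range'
    description = 'Difference between the maximum and minimum of the series.'
    standard_name = 'range'
    long_name = 'Range'

    def calculate(self, values):
        # Reduce along the time axis (axis=0) using masked-array functions.
        return np.ma.max(values, axis=0) - np.ma.min(values, axis=0)


# Three time steps on a 2 x 2 grid holding the values 0..11.
values = np.ma.array(np.arange(12, dtype=float).reshape(3, 2, 2), mask=False)
result = Range().calculate(values)
print(result.shape)  # (2, 2)
```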

Inheritance Structure

All calculations are classes that inherit from the following abstract base classes:
  1. AbstractUnivariateFunction: Functions with no required parameters operating on a single variable.
  2. AbstractUnivariateSetFunction: Functions with no required parameters operating on a single variable and reducing along the temporal axis.
  3. AbstractParameterizedFunction: Functions with input parameters. Functions do not inherit directly from this base class. It is used as part of a ‘mix-in’ to indicate a function has parameters.
  4. AbstractMultivariateFunction: Functions operating on two or more variables.

class ocgis.calc.base.AbstractFunction(alias=None, dtype=None, field=None, file_only=False, vc=None, parms=None, tgd=None, calc_sample_size=False, fill_value=None, meta_attrs=None, tag='_ocgis_data_variables', spatial_aggregation=False)[source]

Bases: ocgis.base.AbstractOcgisObject

Required class attributes to overload:

  • description (str): An arbitrary-length string describing the calculation.
  • key (str): The function’s unique string identifier.
  • standard_name (str): Standard name to store in output metadata.
  • long_name (str): Long name description to store in output metadata.
Parameters:
  • alias (str) – The string identifier to use for the calculation.
  • dtype (str or numpy.core.multiarray.dtype) – The output data type. Set this to 'int' or 'float' to use the default datatype for the output format and NumPy installation (recommended). If a specific NumPy type is needed, provide the string representation of the type (e.g. 'int32').
  • field (ocgis.interface.base.Field) – The field object over which the calculation is applied.
  • file_only (bool) – If True pass through but compute output sizes, etc.
  • vc (ocgis.interface.base.variable.VariableCollection) – The ocgis.interface.base.variable.VariableCollection to append output calculation arrays to. If None a new collection will be created.
  • parms (dict) – A dictionary of parameter values. This includes any parameters for the calculation.
  • tgd (ocgis.interface.base.dimension.temporal.TemporalGroupDimension) – An instance of ocgis.interface.base.dimension.temporal.TemporalGroupDimension.
  • calc_sample_size (bool) – If True, also compute sample sizes for the calculation.
  • meta_attrs (ocgis.driver.parms.definition_helpers.MetadataAttributes) – Contains overloads for variable and/or field attribute values.
  • tag (str) – The tag to use for variable iteration on the source field (the source variables for calculation).
aggregate_spatial(values, weights)[source]

Optional method overload for spatial aggregation.

Parameters:
  • values (numpy.ma.core.MaskedArray) – The input array with dimensions (m, n).
  • weights (numpy.core.multiarray.ndarray) – The input weights array with dimensions matching values.
Return type:

numpy.ma.core.MaskedArray

aggregate_temporal(values, **kwargs)[source]

Optional method to overload for temporal aggregation.

Parameters:values (numpy.ma.core.MaskedArray) – The input five-dimensional array.
calculate(values, **kwargs)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

execute()[source]

Execute the computation over the input field.

Return type:ocgis.interface.base.variable.VariableCollection
get_output_units(variable)[source]

Get the output units.

Return type:str
classmethod validate(ops)[source]

Optional method to overload that validates the input ocgis.OcgOperations.

Raises:ocgis.exc.DefinitionValidationError
validate_units(*args, **kwargs)[source]

Optional method to overload for units validation at the calculation level.


class ocgis.calc.base.AbstractUnivariateFunction(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractFunction

Base class for functions accepting a single univariate input.

required_units = None

Optional sequence of acceptable string units definitions for input variables. If this is set to None, no unit validation will occur.


class ocgis.calc.base.AbstractUnivariateSetFunction(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateFunction

Base class for functions operating on a single variable but always reducing input data along the time dimension.

aggregate_temporal(*args, **kwargs)[source]

This operation is always implicit to calculate().


class ocgis.calc.base.AbstractParameterizedFunction(**kwargs)[source]

Bases: ocgis.calc.base.AbstractFunction

Base class for functions accepting parameters.

parms_definition

A dictionary describing the input parameters with keys corresponding to parameter names and values to their types. Set the type to None for no type checking.

>>> {'threshold': float, 'operation': str, 'basis': None}
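
One way to read such a definition is as a per-parameter type check. The validator below is a hypothetical sketch of that idea; OpenClimateGIS may instead convert values to the declared type or handle errors differently.

```python
parms_definition = {'threshold': float, 'operation': str, 'basis': None}


def check_parms(parms, definition):
    """Hypothetical validator: reject unknown names and mismatched types."""
    for name, value in parms.items():
        if name not in definition:
            raise KeyError('unexpected parameter: {}'.format(name))
        expected = definition[name]
        # A type of None disables type checking for that parameter.
        if expected is not None and not isinstance(value, expected):
            raise TypeError('{} must be of type {}'.format(name, expected.__name__))
    return parms


checked = check_parms({'threshold': 5.0, 'operation': 'gt', 'basis': [1, 2]},
                      parms_definition)
```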

class ocgis.calc.base.AbstractMultivariateFunction(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractFunction

Base class for functions operating on multivariate inputs. Multivariate functions also double as set functions (i.e. they can temporally group). This can be turned off at the class level by setting time_aggregation_external to False.

required_units = None

For example: required_units = {'tas': 'fahrenheit', 'rhs': 'percent'}

required_variables

Required property/attribute containing the list of input variables expected by the function.

>>> ('tas', 'rhs')

Available Functions

Click on Show Source to the right of the function to get descriptive information and see class-level definitions.

Mathematical Operations

class ocgis.calc.library.math.Sum(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction

calculate(values)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Compute the algebraic sum of a series.'
class ocgis.calc.library.math.Convolve1D(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, v=None, mode='same')[source]
Parameters:
Return type:

numpy.ma.core.MaskedArray

Raises:

AssertionError

description = 'Perform a one-dimensional convolution for each grid element along the time axis. See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html'
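
The per-grid-cell convolution described above can be sketched with numpy.convolve applied along the time axis. The sketch uses unmasked input for brevity and does not reproduce any mask handling the library may perform; the data and kernel are made up.

```python
import numpy as np

# Five time steps on a 2 x 2 grid holding the values 0..19.
values = np.arange(20, dtype=float).reshape(5, 2, 2)
# Convolution kernel (the "v" parameter).
v = np.array([1.0, 1.0, 1.0])

# Convolve each grid cell's time series; mode='same' keeps the input length.
convolved = np.apply_along_axis(np.convolve, 0, values, v, mode='same')

print(convolved.shape)               # (5, 2, 2)
print(convolved[:, 0, 0].tolist())   # [4.0, 12.0, 24.0, 36.0, 28.0]
```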

Basic Statistics

class ocgis.calc.library.statistics.FrequencyPercentile(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, percentile=None)[source]
Parameters:percentile (float on the interval [0,100]) – Percentile to compute.
description = 'The percentile value along the time axis. See: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html.'
class ocgis.calc.library.statistics.Max(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction

calculate(values)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Max value for the series.'
class ocgis.calc.library.statistics.Mean(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction

calculate(values)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Compute mean value of the set.'
class ocgis.calc.library.statistics.Median(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction

calculate(values)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Compute median value of the set.'
class ocgis.calc.library.statistics.Min(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction

calculate(values)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Min value for the series.'
class ocgis.calc.library.statistics.StandardDeviation(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction

calculate(values)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Compute standard deviation of the set.'

Moving Window / Kernel-Based

class ocgis.calc.library.statistics.MovingWindow(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, k=None, operation=None, mode='valid')[source]

Calculate operation for the set of values with window of width k centered on time coordinate t. The mode may either be 'valid' or 'same' following the definition here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html. The window width k must be an odd number and >= 3. Supported operations are: mean, min, max, median, var, and std.

Parameters:
  • values (numpy.ma.core.MaskedArray) – Array containing variable values.
  • k (int) – The width of the moving window. k must be odd and greater than or equal to three.
  • operation (str in ('mean', 'min', 'max', 'median', 'var', 'std')) – The NumPy-based array operation to perform on the set of window values.
  • mode (str) – See: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html. The output mode full is not supported.
Return type:

numpy.ma.core.MaskedArray

Raises:

AssertionError, NotImplementedError

description = ()
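
A minimal sketch of the 'valid' mode for a single time series follows. The function name is hypothetical, only the mean operation is shown, and the library's handling of the 'same' mode and of multi-dimensional input is not reproduced.

```python
import numpy as np


def moving_mean_valid(series, k):
    """Mean over centered windows of odd width k, 'valid' mode (illustrative only)."""
    assert k >= 3 and k % 2 == 1, 'k must be odd and >= 3'
    half = k // 2
    # Only time steps with a full window on both sides are kept.
    centers = range(half, len(series) - half)
    out = np.ma.masked_all(len(centers))
    for i, t in enumerate(centers):
        # Assigning a value unmasks the element; a fully masked window stays masked.
        out[i] = np.ma.mean(series[t - half:t + half + 1])
    return out


series = np.ma.array([1.0, 2.0, 3.0, 4.0, 5.0])
result = moving_mean_valid(series, k=3)
print(result.tolist())  # [2.0, 3.0, 4.0]
```

In 'valid' mode the output is shorter than the input by k - 1 time steps, since the first and last k // 2 steps have no complete window.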

Multivariate Calculations / Indices

class ocgis.calc.library.index.duration.FrequencyDuration(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractKeyedOutputFunction, ocgis.calc.library.index.duration.Duration

calculate(values, threshold=None, operation=None)[source]
Parameters:
  • threshold (float) – The threshold value to use for the logical operation.
  • operation (str) – The logical operation. One of 'gt', 'gte', 'lt', or 'lte'.
description = 'Count the frequency of spell durations within the temporal aggregation.'
class ocgis.calc.library.index.duration.Duration(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, threshold=None, operation=None, summary='mean')[source]
Parameters:
  • threshold (float) – The threshold value to use for the logical operation.
  • operation (str) – The logical operation. One of 'gt', 'gte', 'lt', or 'lte'.
  • summary (str) – The summary operation to apply to the durations. One of 'mean', 'median', 'std', 'max', or 'min'.
description = 'Summarizes consecutive occurrences in a sequence where the logical operation returns TRUE. The summary operation is applied to the sequences within a temporal aggregation.'
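
The idea behind both duration functions can be sketched for a single time series as follows. The helper spell_durations is hypothetical and handles only the 'gt' operation; the library's array-based implementation is not shown.

```python
from collections import Counter

import numpy as np


def spell_durations(series, threshold):
    """Lengths of consecutive runs where series > threshold (illustrative only)."""
    durations = []
    run = 0
    for exceeds in np.asarray(series) > threshold:
        if exceeds:
            run += 1
        elif run:
            # The run just ended; record its length.
            durations.append(run)
            run = 0
    if run:
        # Record a run that lasts until the end of the series.
        durations.append(run)
    return durations


series = [1, 6, 7, 2, 8, 8, 8, 1]
durations = spell_durations(series, threshold=5)
summary = np.mean(durations)     # a 'mean' summary of the spell lengths
frequency = Counter(durations)   # how often each spell length occurs

print(durations)  # [2, 3]
print(summary)    # 2.5
```
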
class ocgis.calc.library.index.heat_index.HeatIndex(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractMultivariateFunction

calculate(tas=None, rhs=None)[source]

The calculation method to overload. Values are explicitly passed to avoid dereferencing. Reducing along the time axis is required (i.e. axis=0).

Parameters:
  • values (numpy.ma.MaskedArray) – A three-dimensional array with dimensions (time, row, column).
  • kwargs – Any keyword parameters for the function.
Return type:

numpy.ma.MaskedArray

description = 'Heat Index following: http://en.wikipedia.org/wiki/Heat_index. If temperature is < 80F or relative humidity is < 40%, the value is masked during calculation. Output units are Fahrenheit.'
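
As a rough illustration of the description above, the sketch below evaluates the widely published Rothfusz regression for the heat index and masks inputs below the stated limits. That OpenClimateGIS uses exactly these coefficients is an assumption, and the function name heat_index is hypothetical.

```python
import numpy as np


def heat_index(tas, rhs):
    """Rothfusz regression; tas in Fahrenheit, rhs in percent (illustrative only)."""
    c = (-42.379, 2.04901523, 10.14333127, -0.22475541, -6.83783e-3,
         -5.481717e-2, 1.22874e-3, 8.5282e-4, -1.99e-6)
    # Mask inputs outside the range described for the function.
    tas = np.ma.masked_less(tas, 80)
    rhs = np.ma.masked_less(rhs, 40)
    # Masks propagate through the arithmetic, so the output is masked
    # wherever either input is masked.
    return (c[0] + c[1] * tas + c[2] * rhs + c[3] * tas * rhs
            + c[4] * tas ** 2 + c[5] * rhs ** 2 + c[6] * tas ** 2 * rhs
            + c[7] * tas * rhs ** 2 + c[8] * tas ** 2 * rhs ** 2)


tas = np.ma.array([90.0, 70.0])
rhs = np.ma.array([70.0, 70.0])
hi = heat_index(tas, rhs)
print(hi.mask.tolist())  # [False, True]
```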

Thresholds

class ocgis.calc.library.thresholds.Between(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, lower=None, upper=None)[source]
Parameters:
  • lower (float) – The lower value of the range.
  • upper (float) – The upper value of the range.
description = 'Count of values falling within the limits lower and upper (inclusive).'
class ocgis.calc.library.thresholds.Threshold(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, threshold=None, operation=None)[source]
Parameters:
  • threshold (float) – The threshold value to use for the logical operation.
  • operation (str) – The logical operation. One of 'gt', 'gte', 'lt', or 'lte'.
description = 'Count of values where the logical operation returns TRUE.'
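
The operation strings map naturally onto Python comparison operators. The sketch below is an illustrative stand-in, not the library's Threshold implementation; the names OPERATIONS and threshold_count are hypothetical.

```python
import operator

import numpy as np

# Hypothetical mapping from operation strings to comparison functions.
OPERATIONS = {'gt': operator.gt, 'gte': operator.ge,
              'lt': operator.lt, 'lte': operator.le}


def threshold_count(values, threshold=None, operation=None):
    """Count, along the time axis, the values satisfying the comparison."""
    hits = OPERATIONS[operation](values, threshold)
    # Masked (missing) values do not contribute to the count.
    return np.ma.sum(hits, axis=0)


# Three time steps on a 2 x 2 grid holding the values 0..11, one of them missing.
values = np.ma.array(np.arange(12, dtype=float).reshape(3, 2, 2), mask=False)
values[0, 0, 0] = np.ma.masked

count = threshold_count(values, threshold=5, operation='gt')
print(count.tolist())  # [[1, 1], [2, 2]]
```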

Miscellaneous

class ocgis.calc.library.index.freeze_thaw.FreezeThaw(*args, **kwargs)[source]

Bases: ocgis.calc.base.AbstractUnivariateSetFunction, ocgis.calc.base.AbstractParameterizedFunction

calculate(values, threshold=15)[source]

Return the number of freeze-thaw transitions. A value of 2 corresponds to a complete cycle (frozen-thawed-frozen).

Parameters:threshold – The number of degree-days above or below the freezing point after which the ground is considered frozen or thawed.
description = 'Number of freeze-thaw events, where freezing and thawing occurs once a threshold of degree days below or above 0C is reached. A complete cycle (freeze-thaw-freeze) will return a value of 2. '

Calculation using icclim for ECA Indices

The Python library icclim (http://icclim.readthedocs.io/en/latest) may be used to calculate the full suite of European Climate Assessment (ECA) indices. To select an icclim calculation, prefix the name of the indice with 'icclim_'. A list of indices computable with icclim is available here: http://icclim.readthedocs.io/en/latest/python_api.html#icclim-indice-compute-indice.

NESII hosts an Anaconda icclim build:

conda install -c nesii icclim

For example, to calculate the TG indice (mean of daily mean temperature), select the calculation like:

>>> calc = [{'func': 'icclim_TG', 'name': 'TG'}]

Any optional calculation parameters may be passed in using the 'kwds' key:

>>> calc = [{..., 'kwds': {'percentile_dict': <Percentile Dictionary>}}]

Custom user indices are not implemented in OpenClimateGIS. OpenClimateGIS may be used to pre-process an icclim input file prior to a custom calculation. Please contact user support if your application could benefit from custom icclim user indices in OpenClimateGIS.