Operations API¶
This page provides an overview of OpenClimateGIS’s operations interface. A Class Reference and Function Reference are also available. OpenClimateGIS’s combined functionality is generally accessed using the operations interface (OcgOperations). Potential keyword arguments are described below in Keyword Arguments. Please see Examples for an overview of using operations with real data.
Default Coordinate System¶
The default coordinate system for OpenClimateGIS is Spherical. The representative PROJ.4 parameters are {'a': 6370997, 'no_defs': True, 'b': 6370997, 'proj': 'longlat', 'towgs84': '0,0,0,0,0,0,0'}. Prior to OpenClimateGIS v2.x, the WGS84 datum was used by default. This default coordinate system also applies to bounding boxes. If a bounding box uses a coordinate system that differs from the default, it must be converted to a geometry dictionary that carries its coordinate system:
>>> from shapely.geometry import box
>>> import ocgis
>>> # Bounding box ordered as [min_x, min_y, max_x, max_y].
>>> bbox = [-130, 30, -120, 40]
>>> bbox = box(*bbox)
>>> geom = {'geom': bbox, 'crs': ocgis.crs.WGS84()}
The correct coordinate system will always be read from file by the driver. The default coordinate system used by OpenClimateGIS may be changed using env.DEFAULT_COORDSYS.
OcgOperations Reference¶
The basic operations syntax is:
>>> import ocgis
>>> ops = ocgis.OcgOperations(**kwargs)
>>> res = ops.execute()
Some notable features of the operations object:
- The only required keyword argument is dataset.
- All keyword arguments are exposed as public attributes which may be set using standard attribute syntax:
>>> import ocgis
>>> rd = ocgis.RequestDataset(uri='/path/to/some/dataset', variable='foo')
>>> ops = ocgis.OcgOperations(dataset=rd)
>>> ops.aggregate = True
- Operations may be run in parallel using MPI. See Parallel Operations for guidance.
- The object SHOULD NOT be reused following an execution as the software may add/modify attribute contents. Instantiate a new object following an execution or copy the object appropriately.
Keyword Arguments¶
Additional information on each argument is found in its respective section.
abstraction¶
Note
OpenClimateGIS uses the bounds attribute of a NetCDF variable to construct 'polygon' representations of regular grids. If no bounds attribute is found, the software defaults to the 'point' geometry abstraction.
Value | Description |
---|---|
'polygon' (default) | Represent cells as shapely.geometry.Polygon objects. |
'point' | Represent cells as shapely.geometry.Point objects. |
add_auxiliary_files¶
If True (the default), create a new directory and add metadata and other informational files in addition to the converted file. If False, write the target file only to dir_output and do not create a new directory.
aggregate¶
Value | Description |
---|---|
True | Selected geometries are combined into a single geometry (see Aggregate (Union)). |
False (default) | Selected geometries are not combined. |
agg_selection¶
Value | Description |
---|---|
True | Aggregate (union) geom to a single geometry. |
False (default) | Leave geom as is. |
The purpose of this option is to simplify aggregating (unioning) geometries into arbitrary regions. A simple example would be unioning the U.S. state boundaries of Utah, Nevada, Arizona, and New Mexico into a single polygon representing a “Southwestern Region”.
allow_empty¶
Value | Description |
---|---|
True | Allow the empty set for geometries not geographically coincident with a source geometry. |
False (default) | Raise EmptyDataNotAllowed if the empty set is encountered. |
calc¶
See the Computation page for more details.
calc_grouping¶
There are three forms for this argument:
- Date Part Grouping: Any combination of 'day', 'month', and 'year'.
>>> calc_grouping = ['day']
>>> calc_grouping = ['month','year']
>>> calc_grouping = ['day','year']
Temporal aggregation splits date/time coordinates into parts and groups them according to unique combinations of those parts. If data is grouped by month, then all of the January times would be in one group and all of the August times in another. If a grouping of month and year is applied, then all of the January 2000 times would be in one group, all of the January 2001 times in another, and so on.
Any temporal aggregation applied to a dataset should be consistent with the input data’s temporal resolution. For example, aggregating by day, month, and year on a daily input dataset is not a reasonable aggregation, as each aggregation group will have a sample size of one (i.e. one day per aggregation group).
- Summarize Over All: The string 'all' indicates the entire time domain should be summarized.
>>> calc_grouping = 'all'
- Seasonal Groups: A sequence of integer sequences. Element sequences must be mutually exclusive (i.e. no repeated integers). Representative times for the climatology are chosen as the center month in a sequence (i.e. January in the sequence [12,1,2]).
Month integers map as expected (1=January, 2=February, etc.). The example below constructs a single season composed of March, April, and May. Note the nested lists.
>>> calc_grouping = [[3, 4, 5]]
The next example consumes all the months in a year.
>>> calc_grouping = [[12, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
Unique, time sequential seasons are possible with the 'unique' flag:
>>> calc_grouping = [[12, 1, 2], 'unique']
- A unique season has at least one value associated with each month in the season. If a month is missing, the season will be dropped. The season specification above returns a calculation based on values with date coordinates in:
- Dec 1900, Jan 1901, Feb 1901
- Dec 1901, Jan 1902, Feb 1902
It is also possible to group the seasons by year.
>>> calc_grouping = [[12, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], 'year']
- For example, this returns a calculation based on values with date coordinates in:
- 1900: Dec, Jan, Feb
- 1901: Dec, Jan, Feb
- 1902: Dec, Jan, Feb
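The date-part and unique-season grouping rules above can be illustrated in plain Python. This is a minimal sketch, not OpenClimateGIS’s implementation: the helper names group_by_date_parts and unique_seasons are hypothetical, and the season logic assumes contiguous seasons listed in calendar order from their first month (as in [12, 1, 2]).

```python
from collections import defaultdict
from datetime import datetime


def group_by_date_parts(times, parts):
    """Map each unique combination of date parts (e.g. ('month', 'year')) to time indices."""
    groups = defaultdict(list)
    for index, t in enumerate(times):
        key = tuple(getattr(t, part) for part in parts)
        groups[key].append(index)
    return dict(groups)


def unique_seasons(times, season):
    """Hypothetical helper: group indices into time-sequential seasons keyed by start year.

    Seasons missing any of their months are dropped, mirroring the 'unique' flag.
    """
    start_month = season[0]
    groups = defaultdict(list)
    for index, t in enumerate(times):
        if t.month not in season:
            continue
        # Months numerically before the first season month (Jan/Feb after Dec)
        # belong to a season that started in the previous calendar year.
        start_year = t.year - 1 if t.month < start_month else t.year
        groups[start_year].append(index)
    required = set(season)
    return {year: indices for year, indices in groups.items()
            if {times[i].month for i in indices} == required}


# Monthly time steps from January 1900 through December 1902.
times = [datetime(year, month, 15) for year in (1900, 1901, 1902) for month in range(1, 13)]
by_month = group_by_date_parts(times, ['month'])               # 12 groups spanning all years
by_month_year = group_by_date_parts(times, ['month', 'year'])  # 36 single-member groups
djf = unique_seasons(times, [12, 1, 2])
```

Here djf holds the Dec 1900 to Feb 1901 and Dec 1901 to Feb 1902 groups; the partial seasons at either end of the record are dropped.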
calc_raw¶
Value | Description |
---|---|
True | If ocgis.OcgOperations.aggregate is True, perform computations on raw, unaggregated values. |
False (default) | Use aggregated values during computation. |
callback¶
A callback function that may be used for custom messaging. This function integrates with the log handler and will receive messages at or above the logging.INFO level.
>>> def callback(percent, message):
...     print(percent, message)
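To show how a callback with this signature can be driven by a log handler, here is a minimal standard-library sketch. The CallbackHandler class and the way it derives percent (a running count over an assumed total number of messages) are hypothetical and are not how OpenClimateGIS computes progress; the sketch only demonstrates that a handler set to logging.INFO forwards INFO and higher messages and ignores DEBUG ones.

```python
import logging

messages = []


def callback(percent, message):
    # Record progress updates instead of printing them.
    messages.append((percent, message))


class CallbackHandler(logging.Handler):
    """Hypothetical handler forwarding log records to callback(percent, message)."""

    def __init__(self, callback, total):
        super().__init__(level=logging.INFO)  # records below INFO are skipped
        self.callback = callback
        self.total = total
        self.count = 0

    def emit(self, record):
        self.count += 1
        percent = 100.0 * self.count / self.total
        self.callback(percent, record.getMessage())


logger = logging.getLogger('ocgis_callback_sketch')
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(CallbackHandler(callback, total=2))

logger.debug('ignored: below INFO')
logger.info('subsetting')
logger.info('writing output')
```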
conform_units_to¶
Destination units for conversion. If this parameter is set, then the cf_units module must be installed. Setting this parameter will override conformed units set on dataset objects.
dataset¶
This is the only required parameter. All elements of dataset will be processed.
A dataset is the target file(s) or object(s) containing data to process. A dataset may be:
- A file on the local machine or network location accessible by the software (use RequestDataset).
- A URL to an unsecured OpenDAP dataset (use RequestDataset).
- An OpenClimateGIS field object (use Field). If a Field object is used, be aware that operations may modify the object in place.
>>> # A keyword argument dictionary can be used in place of an actual request object.
>>> dataset = {'uri': '/path/to/my/data.nc'}
>>> # Use variable auto-discovery.
>>> from ocgis import RequestDataset
>>> dataset = RequestDataset(uri='/path/to/my/data.nc')
>>> # Specify the target variable directly.
>>> dataset = RequestDataset(uri='/path/to/my/data.nc', variable='tas')
In version v2.x the RequestDatasetCollection was removed. Use sequences of request dataset or field objects in its place.
dir_output¶
This sets the output folder for any disk formats. If this is None and ocgis.env.DIR_OUTPUT is None, output will be written to the current working directory.
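The fallback order described above can be written as a small function. This is a sketch only; resolve_output_directory is a hypothetical name, and it assumes an explicit dir_output takes precedence over ocgis.env.DIR_OUTPUT.

```python
import os


def resolve_output_directory(dir_output=None, env_dir_output=None):
    """Hypothetical helper: pick the directory disk output is written to."""
    # Assumed precedence: explicit argument, then ocgis.env.DIR_OUTPUT, then the CWD.
    if dir_output is not None:
        return dir_output
    if env_dir_output is not None:
        return env_dir_output
    return os.getcwd()
```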
geom¶
Warning
Unless aggregate or agg_selection is True, subsetting with multiple geometries to netCDF will raise an error.
If one or more geometries are provided, they are used to subset every RequestDataset or Field object. Supplying a value of None (the default) results in the return of the entire spatial domain.
There are a number of ways to parameterize the geom keyword argument:
- Bounding Box
This is a list of floats corresponding to: [min_x, min_y, max_x, max_y]. See Default Coordinate System for guidance on coordinate system defaults and usages.
>>> geom = [-120.4, 30.0, -110.3, 41.4]
- Point
This is a list of floats corresponding to: [longitude, latitude]. See Default Coordinate System for guidance on coordinate system defaults and usages.
>>> geom = [-120.4, 36.5]
- Using GeomCabinetIterator
>>> from ocgis import GeomCabinetIterator
>>> geom = GeomCabinetIterator('state_boundaries', geom_select_uid=[16])
- Using a GeomCabinet key
>>> geom = 'state_boundaries'
- Custom Sequence of Shapely Geometry Dictionaries
The 'crs' key is optional. If it is not included, WGS84 is assumed. The 'properties' key is also optional. See Default Coordinate System for guidance on coordinate system defaults and usages.
>>> from shapely.geometry import Point
>>> from ocgis import crs
>>> # Additional geometry dictionaries may follow in the list.
>>> geom = [{'geom': Point(-120.4, 36.5), 'properties': {'UGID': 23, 'NAME': 'geometry23'}, 'crs': crs.CRS(epsg=4326)}]
- Path to a GIS file
>>> geom = '/path/to/shapefile.shp'
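To make the list of accepted forms concrete, the sketch below classifies a geom value by its shape. It is a hypothetical illustration, not OpenClimateGIS’s parser: classify_geom is an invented name, the rule that a string with a file extension is a path (otherwise a cabinet key) is an assumption, and GeomCabinetIterator objects are not covered. It also checks the [min_x, min_y, max_x, max_y] ordering of bounding boxes.

```python
import os
from numbers import Real


def classify_geom(geom):
    """Hypothetical classifier mirroring the geom forms listed above."""
    if geom is None:
        return 'entire spatial domain'
    if isinstance(geom, str):
        # Assumption: a string with a file extension is a path, otherwise a cabinet key.
        return 'file path' if os.path.splitext(geom)[1] else 'cabinet key'
    if isinstance(geom, (list, tuple)):
        if all(isinstance(v, Real) for v in geom):
            if len(geom) == 4:
                min_x, min_y, max_x, max_y = geom
                if min_x > max_x or min_y > max_y:
                    raise ValueError('expected [min_x, min_y, max_x, max_y]')
                return 'bounding box'
            if len(geom) == 2:
                return 'point'
        if geom and all(isinstance(v, dict) and 'geom' in v for v in geom):
            return 'geometry dictionaries'
    raise ValueError('unrecognized geom value: {!r}'.format(geom))
```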
geom_select_sql_where¶
Warning
Single quotes must be used inside double quotes!
If provided, this string will be used as part of a SQL WHERE
clause to select geometries from the source. See the section titled “WHERE” for documentation on supported statements: http://www.gdal.org/ogr_sql.html. This works only for geometries read from file.
>>> geom_select_sql_where = "STATE_NAME = 'Wisconsin'"
>>> geom_select_sql_where = "STATE_NAME in ('Wisconsin', 'Nebraska')"
>>> geom_select_sql_where = "POPULATION > 1500"
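When the selected values come from a Python list, the clause can be assembled rather than typed by hand. This sketch uses a hypothetical helper, sql_in_clause, that wraps each value in single quotes so the result fits the double-quoted form shown above; doubling an embedded single quote follows the general SQL convention and is an assumption to confirm against the OGR SQL page linked above.

```python
def sql_in_clause(field, values):
    """Hypothetical helper: build "FIELD in ('a', 'b')" for geom_select_sql_where."""
    # String literals use single quotes; embedded single quotes are doubled (assumption).
    literals = ", ".join("'" + str(value).replace("'", "''") + "'" for value in values)
    return "{} in ({})".format(field, literals)


geom_select_sql_where = sql_in_clause('STATE_NAME', ['Wisconsin', 'Nebraska'])
```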
geom_select_uid¶
Select specific geometries from the target shapefile chosen using geom. The integer sequence selects matching UGID values from the shapefiles. For more information on adding new shapefiles or the requirements of input shapefiles, please see the section titled Shapefile Data.
>>> geom_select_uid = [1, 2, 3]
>>> geom_select_uid = [4, 55]
>>> geom_select_uid = [1]
As clarification, suppose there is a shapefile called basins.shp (this assumes the folder containing the shapefile has been set as the value for ocgis.env.DIR_GEOMCABINET) with the following attribute table:
UGID | Name |
---|---|
1 | Basin A |
2 | Basin B |
3 | Basin C |
If the goal is to subset the data by the boundary of “Basin A” and write the resulting data to netCDF, a call to OpenClimateGIS operations looks like:
>>> import ocgis
>>> rd = ocgis.RequestDataset(uri='/path/to/data.nc', variable='tas')
>>> path = ocgis.OcgOperations(dataset=rd, geom='basins', geom_select_uid=[1], output_format='nc').execute()
geom_uid¶
All subset geometries must have a unique identifier. The unique identifier allows subsetted data to be linked to the selection geometry. Passing a string value to geom_uid overrides the default unique identifier name, ocgis.env.DEFAULT_GEOM_UID. If no unique identifier is available, a one-based unique identifier will be generated with the name given by ocgis.env.DEFAULT_GEOM_UID.
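The identifier rule above can be sketched with plain dictionaries standing in for geometry records. This is an illustration under assumptions, not OpenClimateGIS code: assign_geom_uids is a hypothetical name, and it regenerates identifiers whenever any record lacks the geom_uid attribute.

```python
def assign_geom_uids(records, geom_uid='UGID'):
    """Hypothetical helper: ensure every record carries a unique identifier under geom_uid."""
    if records and all(geom_uid in record for record in records):
        uids = [record[geom_uid] for record in records]
        if len(set(uids)) != len(uids):
            raise ValueError('{} values must be unique'.format(geom_uid))
        return [dict(record) for record in records]
    # No complete identifier attribute: generate one-based identifiers.
    return [dict(record, **{geom_uid: index}) for index, record in enumerate(records, start=1)]


generated = assign_geom_uids([{'NAME': 'Basin A'}, {'NAME': 'Basin B'}])
existing = assign_geom_uids([{'ID': 7, 'NAME': 'Basin C'}], geom_uid='ID')
```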
interpolate_spatial_bounds¶
If True, attempt to interpolate bounds coordinates if they are absent. This will also extrapolate exterior bounds to avoid losing spatial coverage.
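A common way to build bounds from cell centers is to place interior edges at the midpoints between neighboring centers and extrapolate the two exterior edges by the half-width of the adjacent cell. The sketch below applies that technique to a 1-D coordinate vector; interpolate_bounds is a hypothetical name, and this is not necessarily the exact method OpenClimateGIS uses.

```python
def interpolate_bounds(centers):
    """Return (lower, upper) bounds for each center of a monotonic 1-D coordinate."""
    if len(centers) < 2:
        raise ValueError('at least two centers are needed to infer spacing')
    # Interior edges sit halfway between neighboring centers.
    mids = [(a + b) / 2.0 for a, b in zip(centers[:-1], centers[1:])]
    # Exterior edges are extrapolated by the half-width of the adjacent cell.
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    edges = [first] + mids + [last]
    return [(edges[i], edges[i + 1]) for i in range(len(centers))]
```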
melted¶
If False, variable names will be individual column headers (non-melted). If True, variable names will be placed into a single column.
A non-melted format:
TIME | TAS | TASMAX |
---|---|---|
1 | 30.3 | 40.3 |
2 | 32.2 | 41.7 |
3 | 31.7 | 40.9 |
A melted format:
TIME | NAME | VALUE |
---|---|---|
1 | TAS | 30.3 |
2 | TAS | 32.2 |
3 | TAS | 31.7 |
1 | TASMAX | 40.3 |
2 | TASMAX | 41.7 |
3 | TASMAX | 40.9 |
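The conversion between the two tables can be sketched with rows as dictionaries. The melt function here is a hypothetical illustration of the reshaping, not the OpenClimateGIS converter; it reproduces the row order of the melted table above.

```python
def melt(rows, id_field, value_fields):
    """Convert non-melted rows (one column per variable) into NAME/VALUE rows."""
    melted = []
    # Looping over variables first reproduces the melted table's row order.
    for name in value_fields:
        for row in rows:
            melted.append({id_field: row[id_field], 'NAME': name, 'VALUE': row[name]})
    return melted


non_melted = [
    {'TIME': 1, 'TAS': 30.3, 'TASMAX': 40.3},
    {'TIME': 2, 'TAS': 32.2, 'TASMAX': 41.7},
    {'TIME': 3, 'TAS': 31.7, 'TASMAX': 40.9},
]
melted = melt(non_melted, 'TIME', ['TAS', 'TASMAX'])
```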
optimized_bbox_subset¶
If True, only perform the bounding box subset, ignoring other subsetting procedures like masking within the bounding coordinates. Using this option should result in lower memory requirements and shorter processing times for subsets. Note this assumes the bounding box aligns appropriately with the target grid.
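A bounding-box-only subset reduces to finding index ranges on the coordinate vectors, with no per-cell masking. This sketch assumes a rectilinear grid with monotonic 1-D x and y coordinates; bbox_index_slices is a hypothetical name, not an OpenClimateGIS function.

```python
def bbox_index_slices(x_coords, y_coords, bbox):
    """Return (row_slice, col_slice) covering centers inside bbox, or None if empty."""
    min_x, min_y, max_x, max_y = bbox
    cols = [i for i, x in enumerate(x_coords) if min_x <= x <= max_x]
    rows = [j for j, y in enumerate(y_coords) if min_y <= y <= max_y]
    if not cols or not rows:
        return None
    # Contiguous index ranges only; nothing inside the box is masked.
    return slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1)


x_coords = [-125.0, -120.0, -115.0, -110.0, -105.0]
y_coords = [30.0, 35.0, 40.0, 45.0]
row_slice, col_slice = bbox_index_slices(x_coords, y_coords, [-120.4, 30.0, -110.3, 41.4])
```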
output_crs¶
By default, the output coordinate reference system (CRS) is the CRS of the input RequestDataset object. If multiple RequestDataset objects are part of an OcgOperations call, then output_crs must be provided if the input CRS values of the RequestDataset objects differ. The value for output_crs is an instance of CRS.
>>> import ocgis
>>> output_crs = ocgis.crs.Spherical()
General PROJ.4 and EPSG codes are supported by coordinate systems. For example, to output data on the GCS North American 1983 coordinate system, you can configure the coordinate system like:
>>> from ocgis import crs
>>> output_crs = crs.CRS(epsg=4269)
You could also use a PROJ.4 string:
>>> from ocgis import crs
>>> output_crs = crs.CRS(proj4='+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs')
output_format¶
Value | Description |
---|---|
'ocgis' (default) | Return a SpatialCollection with keys matching ugid (see geom). Also see Spatial Collections for more information on this output format. |
'csv' | A CSV file representation of the data. |
'csv-shp' | In addition to a CSV representation, shapefiles with primary key links to the CSV are provided. |
'nc' | A NetCDF4-CF file. See NetCDF Output for additional information on the structure of the NetCDF format. |
'geojson' | A GeoJSON representation of the data. |
'shp' | A shapefile representation of the data. |
output_format_options¶
A dictionary of converter-specific options. Options for each converter are listed in the table below.
Output Format | Option | Description |
---|---|---|
'nc' | data_model | The netCDF data model: http://unidata.github.io/netcdf4-python/#netCDF4.Dataset. |
'nc' | variable_kwargs | Dictionary of keyword parameters to use for netCDF variable creation. See: http://unidata.github.io/netcdf4-python/#netCDF4.Variable. |
'nc' | unlimited_to_fixedsize | If True, convert the unlimited dimension to fixed size. Only applies to time and level dimensions. |
'nc' | geom_dim | The name of the dimension storing aggregated (unioned) outputs. Only applies when aggregate is True. |
>>> output_format_options = {'data_model': 'NETCDF4_CLASSIC'}
>>> output_format_options = {'variable_kwargs': {'zlib': True, 'complevel': 4}}
prefix¶
The prefix provides the name of the output folder (if add_auxiliary_files=True) and the filename prefix for any file output created by OpenClimateGIS.
>>> prefix = 'fn_start'
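The interaction between prefix, dir_output, and add_auxiliary_files can be sketched as a path computation. This is an assumption-laden illustration: output_target is a hypothetical name, and the exact filename pattern (prefix plus the format’s extension) is assumed rather than taken from the OpenClimateGIS converters.

```python
import os


def output_target(dir_output, prefix, extension, add_auxiliary_files=True):
    """Hypothetical layout: <dir_output>/<prefix>/<prefix><ext> or <dir_output>/<prefix><ext>."""
    if add_auxiliary_files:
        # A folder named after the prefix holds the converted file and metadata files.
        folder = os.path.join(dir_output, prefix)
    else:
        folder = dir_output
    return os.path.join(folder, prefix + extension)
```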
regrid_destination¶
Please see ESMPy Regridding for an overview and limitations.
If provided, all RequestDataset objects in dataset will be regridded to match the grid provided in the argument’s object. This argument must be a RequestDataset or Field.
>>> regrid_destination = ocgis.RequestDataset(uri='/path/to/destination.nc')
regrid_options¶
A dictionary with regridding options. Please see the documentation for regrid_field(). Dictionary elements of regrid_options correspond to the keyword arguments of this function.
>>> import ESMF
>>> regrid_options = {'regrid_method': ESMF.RegridMethod.CONSERVE}
search_radius_mult¶
This is a scalar float value multiplied by the target data’s resolution to determine the buffer radius for a point selection geometry. This is only applicable when subsetting against gridded datasets.
Note
Prior to v2.x, this was a float value by default. The default was changed to None in current versions. Hence, by default, point geometries are used for subsetting rather than buffered points.
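The radius rule can be sketched on a small regular grid of cell centers. The resolution estimate (mean spacing of the x and y coordinates) and the selection of centers within the radius are assumptions for illustration; grid_resolution, buffer_radius, and cells_within are hypothetical names, not OpenClimateGIS functions.

```python
import math


def grid_resolution(x_coords, y_coords):
    """Hypothetical resolution estimate: mean spacing of the x and y coordinate vectors."""
    dx = [abs(b - a) for a, b in zip(x_coords, x_coords[1:])]
    dy = [abs(b - a) for a, b in zip(y_coords, y_coords[1:])]
    return (sum(dx) / len(dx) + sum(dy) / len(dy)) / 2.0


def buffer_radius(x_coords, y_coords, search_radius_mult=None):
    """Return the point buffer radius, or None when no buffering is requested."""
    if search_radius_mult is None:
        return None
    return search_radius_mult * grid_resolution(x_coords, y_coords)


def cells_within(point, x_coords, y_coords, radius):
    """Return (row, col) indices of cell centers within radius of the point."""
    px, py = point
    return [(row, col)
            for row, y in enumerate(y_coords)
            for col, x in enumerate(x_coords)
            if math.hypot(x - px, y - py) <= radius]


x_coords = [0.0, 1.0, 2.0, 3.0]
y_coords = [0.0, 1.0, 2.0]
radius = buffer_radius(x_coords, y_coords, search_radius_mult=0.5)
selected = cells_within((1.2, 1.1), x_coords, y_coords, radius)
```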
select_nearest¶
If True, the nearest geometry to the centroid of the current selection geometry is returned.
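A minimal sketch of the nearest-geometry rule, assuming candidate geometries are represented by their centroids and that planar Euclidean distance is acceptable (a simplification for spherical coordinates). nearest_index is a hypothetical name.

```python
import math


def nearest_index(selection_centroid, candidate_centroids):
    """Index of the candidate closest to the selection centroid (planar distance)."""
    sx, sy = selection_centroid
    return min(
        range(len(candidate_centroids)),
        key=lambda i: math.hypot(candidate_centroids[i][0] - sx, candidate_centroids[i][1] - sy),
    )


nearest = nearest_index((0.0, 0.0), [(3.0, 4.0), (1.0, 1.0), (-2.0, 0.0)])
```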
slice¶
This is a list of integers, None, or lists of integers. The values composing the list will be converted to slice objects. For example, to return the first ten time steps:
>>> slc = [None, [0, 10], None, None, None]
The index locations in the above list correspond to:
Index | Description |
---|---|
0 | Realization / Ensemble Member |
1 | Time |
2 | Level |
3 | Row |
4 | Column |
To select the last time step:
>>> slc = [None, -1, None, None, None]
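The conversion to slice objects can be sketched as follows. How OpenClimateGIS expands a single integer is an assumption here: this hypothetical to_slices helper keeps it as a length-one slice so the dimension is preserved, and it handles -1 so that the last element is still selected.

```python
def to_slices(slc):
    """Convert None / int / [start, stop] entries into slice objects."""
    converted = []
    for item in slc:
        if item is None:
            converted.append(slice(None))
        elif isinstance(item, int):
            # Keep a single index as a length-one slice; "item + 1 or None" turns
            # the stop for -1 into None so the last element is still selected.
            converted.append(slice(item, item + 1 or None))
        else:
            start, stop = item
            converted.append(slice(start, stop))
    return tuple(converted)


first_ten = to_slices([None, [0, 10], None, None, None])
last_step = to_slices([None, -1, None, None, None])
time_steps = list(range(20))
```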
snippet¶
Note
The entire spatial domain is returned unless geom is specified.
Note
Only applies to pure subsetting. To limit computations, use time_range and/or time_region.
Value | Description |
---|---|
True | Return only the first time point and the first level slice (if applicable). |
False (default) | Return all data. |
spatial_operation¶
Value | Description |
---|---|
"intersects" (default) | Source geometries touching or overlapping selection geometries are returned (see Intersects (Select)). |
"clip" | A full geometric intersection is performed between source and selection geometries. New geometries may be created (see Clip (Intersection)). |
spatial_reorder¶
If True, reorder wrapped coordinates such that the longitude values are in ascending order. Reordering assumes the first row of longitude coordinates is representative of the other longitude coordinate rows. Bounds and corners will be removed in the event of a reorder. Only applies to spherical coordinate systems.
If False (the default), do not attempt to reorder wrapped spherical longitude coordinates.
Note
If aggregate=True, spatial reordering is not possible.
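The reordering described above can be sketched with nested lists standing in for 2-D longitude and data arrays. Only the first longitude row determines the column order, matching the stated assumption; reorder_longitudes is a hypothetical name and the sketch ignores bounds and corners.

```python
def reorder_longitudes(lon_rows, value_rows):
    """Sort columns by the first longitude row and apply that order to every row."""
    # Only the first row determines the order, as described above.
    order = sorted(range(len(lon_rows[0])), key=lambda col: lon_rows[0][col])
    new_lon = [[row[col] for col in order] for row in lon_rows]
    new_values = [[row[col] for col in order] for row in value_rows]
    return new_lon, new_values


lon_rows = [[0.0, 90.0, 180.0, -90.0], [0.0, 90.0, 180.0, -90.0]]
value_rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
new_lon, new_values = reorder_longitudes(lon_rows, value_rows)
```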
spatial_wrapping¶
Allows control of the wrapped state for all input fields. Only field objects with a wrappable coordinate system are affected. Wrapping operations are applied before all other operations.
Value | Description |
---|---|
None (default) | Do not attempt a wrap or unwrap operation. |
"wrap" | Wrap spherical coordinates to the -180 to 180 longitudinal domain. |
"unwrap" | Unwrap spherical coordinates to the 0 to 360 longitudinal domain. |
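For a single longitude value, the two operations reduce to shifting by 360 degrees. This sketch covers scalar coordinates only (not bounds or whole fields); wrap and unwrap are hypothetical helper names.

```python
def wrap(lon):
    """Map a longitude in the 0 to 360 domain onto -180 to 180."""
    return lon - 360.0 if lon > 180.0 else lon


def unwrap(lon):
    """Map a longitude in the -180 to 180 domain onto 0 to 360."""
    return lon + 360.0 if lon < 0.0 else lon


wrapped = [wrap(lon) for lon in [0.0, 90.0, 180.0, 270.0]]
unwrapped = [unwrap(lon) for lon in wrapped]
```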
time_range¶
Lower and upper bounds for the time dimension subset, composed of a two-element sequence of datetime.datetime-like objects. If None, return all time points. Using this argument overrides all RequestDataset time_range values.
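A minimal sketch of a time range subset on a list of datetime objects. Treating both bounds as inclusive is an assumption; time_range_indices is a hypothetical name.

```python
from datetime import datetime


def time_range_indices(times, time_range=None):
    """Indices of times within [lower, upper]; both bounds inclusive (assumption)."""
    if time_range is None:
        return list(range(len(times)))
    lower, upper = time_range
    if lower > upper:
        raise ValueError('time_range must be ordered as [lower, upper]')
    return [i for i, t in enumerate(times) if lower <= t <= upper]


times = [datetime(2000, month, 1) for month in range(1, 13)]
spring = time_range_indices(times, [datetime(2000, 3, 1), datetime(2000, 5, 31)])
```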
time_region¶
A dictionary with keys of 'month' and/or 'year' and values as sequences corresponding to target month and/or year values. Empty region selection for a key may be set to None. Using this argument overrides all RequestDataset time_region values.
>>> time_region = {'month': [6, 7], 'year': [2010, 2011]}
>>> time_region = {'year': [2010]}
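The matching rule can be sketched as a filter over datetime objects, where a missing or None key places no constraint on that date part. time_region_indices is a hypothetical name, not an OpenClimateGIS function.

```python
from datetime import datetime


def time_region_indices(times, time_region):
    """Indices of times whose month and year match; a missing or None key matches all."""
    months = time_region.get('month')
    years = time_region.get('year')
    return [
        i for i, t in enumerate(times)
        if (months is None or t.month in months) and (years is None or t.year in years)
    ]


times = [datetime(year, month, 15) for year in (2009, 2010, 2011) for month in range(1, 13)]
summer = time_region_indices(times, {'month': [6, 7], 'year': [2010, 2011]})
year_2010 = time_region_indices(times, {'year': [2010]})
```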
time_subset_func¶
Subset the time dimension by an arbitrary function. The function must take one positional argument and one keyword argument. The positional argument is a vector of datetime.datetime-like objects. The keyword argument should be called “bounds” and may be None. If the bounds value is not None, the function should expect an n-by-2 array of datetime objects. The function must return an integer sequence suitable for indexing. For example:
>>> def subset_func(value, bounds=None):
...     indices = []
...     for ii, v in enumerate(value):
...         if v.month == 6:
...             indices.append(ii)
...     return indices
Note
The subset function is applied following time_region and time_range.
vector_wrap¶
Note
Only applicable for spherical, geographic coordinate systems.
Value | Description |
---|---|
True (default) | For vector geometry outputs (e.g. shp), ensure output longitudinal domain is -180 to 180. |
False | Maintain the RequestDataset’s longitudinal domain. |
Environment¶
These are global parameters used by OpenClimateGIS. For those familiar with arcpy programming, this behaves similarly to the arcpy.env module. Any ocgis.env variable may be overridden with a system environment variable by setting OCGIS_<variable-name>.
env.DEFAULT_GEOM_UID = 'UGID'
- The default unique geometry identifier to search for in geometry datasets. This is also the name of the created unique identifier if none exists in the target.
env.DIR_DATA = None
- Directory(s) to search through to find data. If specified, this should be a sequence of directories. It may also be a single directory location. Note that the search may take considerable time if a very high-level directory is chosen. If this variable is set, it is only necessary to specify the filename(s) when creating a RequestDataset.
env.DIR_OUTPUT = None (defaults to current working directory)
- The directory where output data is written. OpenClimateGIS creates directories inside which output data is stored unless add_auxiliary_files is False. If None, it defaults to the current working directory.
env.DIR_GEOMCABINET = <path-to-directory>
- Location of the geometry directory (e.g. a directory containing shapefiles) for use by GeomCabinet. Formerly called DIR_SHPCABINET.
env.MELTED = False
- If True, use a melted tabular format with all variable values collected in a single column.
env.OVERWRITE = False
- Set to True to overwrite existing output folders. Use with caution: this will remove the folder if it exists!
env.PREFIX = 'ocgis_output'
- The default prefix to apply to output files. This is also the output folder name.
env.SUPPRESS_WARNINGS = True
- If True, suppress all OpenClimateGIS warning messages to standard out. Warning messages will still be logged.
env.USE_CFUNITS = True
- If True, use cfunits for any unit transformations. This will be automatically set to False if cfunits is not available for import.
env.USE_MEMORY_OPTIMIZATIONS = False
- If True, some methods will attempt to minimize their memory usage at the expense of computational time.
env.USE_SPATIAL_INDEX = True
- If True, use rtree to create spatial indices for spatial operations. This will be automatically set to False if rtree is not available for import.
env.VERBOSE = False
- If True, print additional output information to the terminal.
env.DEFAULT_COORDSYS = ocgis.crs.Spherical
- The default coordinate system used by OpenClimateGIS.
env.USE_NETCDF4_MPI = None
- If None, detect whether netCDF4-python’s MPI asynchronous write capability is available and use it if so. If True, do asynchronous writes with netCDF4-python. Set to False to always use synchronous writes.
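The OCGIS_<variable-name> override can be pictured as an environment lookup with type conversion. This is a hypothetical sketch: env_override is an invented name, and the conversion rules (including which strings count as True) are assumptions, not OpenClimateGIS’s parsing.

```python
import os


def env_override(name, default):
    """Hypothetical lookup: return OCGIS_<name> from the environment, else default."""
    raw = os.environ.get('OCGIS_' + name)
    if raw is None:
        return default
    if isinstance(default, bool):
        # Check bool before other types because bool is a subclass of int.
        return raw.strip().lower() in ('1', 'true', 't', 'yes')
    if default is None:
        return raw
    return type(default)(raw)


os.environ['OCGIS_VERBOSE'] = 'true'
os.environ['OCGIS_PREFIX'] = 'my_run'
os.environ.pop('OCGIS_OVERWRITE', None)
verbose = env_override('VERBOSE', False)
prefix = env_override('PREFIX', 'ocgis_output')
overwrite = env_override('OVERWRITE', False)
```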
Inspecting Data¶
See Inspection for guidance on inspecting datasets.
Spatial Collections¶
See the Advanced Subsetting example for SpatialCollection usage. Spatial collections are returned by default from OcgOperations.
Shapefile Data¶
Shapefiles may be added to the directory mapped by the environment variable env.DIR_GEOMCABINET.
The shapefile’s geom key is the name of the shapefile. It must be alphanumeric with no spaces; the only allowable special character is the underscore “_”.
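The naming rule can be checked before copying a shapefile into the cabinet. In this sketch, geom_key_from_path and is_valid_geom_key are hypothetical helpers, and "alphanumeric" is interpreted as ASCII letters and digits.

```python
import os
import re

# ASCII letters, digits, and underscores only, with at least one character.
_GEOM_KEY = re.compile(r'[A-Za-z0-9_]+')


def geom_key_from_path(path):
    """The geom key is the shapefile name without its directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def is_valid_geom_key(key):
    return _GEOM_KEY.fullmatch(key) is not None
```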