Shared Xarray Functionality


Defines the gval accessor for xarray objects.

class gval.accessors.gval_xarray.GVALXarray(xarray_obj)

Class for extending xarray functionality.

_obj

The xarray object the accessor is attached to.

Type:

Union[xr.Dataset, xr.DataArray]

data_type

The data type of _obj.

Type:

type

attribute_tracking(benchmark_map: DataArray | Dataset, agreement_map: Dataset | DataArray | None = None, candidate_suffix: str | None = '_candidate', benchmark_suffix: str | None = '_benchmark', candidate_include: Iterable[str] | None = None, candidate_exclude: Iterable[str] | None = None, benchmark_include: Iterable[str] | None = None, benchmark_exclude: Iterable[str] | None = None) DataFrame[AttributeTrackingDf] | Tuple[DataFrame[AttributeTrackingDf], DataArray | Dataset]

Concatenates xarray attributes into a single pandas DataFrame.

Parameters:
  • candidate_map (Union[xr.DataArray, xr.Dataset]) – The candidate map xarray object (the accessed object itself; supplied automatically by the accessor).

  • benchmark_map (Union[xr.DataArray, xr.Dataset]) – Benchmark map xarray object.

  • candidate_suffix (Optional[str], default = '_candidate') – Suffix to append to candidate map xarray attributes.

  • benchmark_suffix (Optional[str], default = '_benchmark') – Suffix to append to benchmark map xarray attributes.

  • candidate_include (Optional[Iterable[str]], default = None) – List of attributes to include from candidate map. candidate_include and candidate_exclude are mutually exclusive arguments.

  • candidate_exclude (Optional[Iterable[str]], default = None) – List of attributes to exclude from candidate map. candidate_include and candidate_exclude are mutually exclusive arguments.

  • benchmark_include (Optional[Iterable[str]], default = None) – List of attributes to include from benchmark map. benchmark_include and benchmark_exclude are mutually exclusive arguments.

  • benchmark_exclude (Optional[Iterable[str]], default = None) – List of attributes to exclude from benchmark map. benchmark_include and benchmark_exclude are mutually exclusive arguments.

Raises:
  • ValueError – If both candidate_include and candidate_exclude are passed (not None).

  • ValueError – If both benchmark_include and benchmark_exclude are passed (not None).

Returns:

Pandas dataframe with concatenated attributes from candidate and benchmark maps. If agreement_map is not None, returns a tuple with the dataframe and the agreement map.

Return type:

Union[DataFrame[AttributeTrackingDf], Tuple[DataFrame[AttributeTrackingDf], Union[xr.DataArray, xr.Dataset]]]
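The suffixing and include/exclude behaviour described above can be sketched in plain Python. This is a hypothetical helper, not gval's internal implementation, and only the candidate-side filter is shown for brevity:

```python
def track_attributes(candidate_attrs, benchmark_attrs,
                     candidate_suffix="_candidate",
                     benchmark_suffix="_benchmark",
                     candidate_include=None,
                     candidate_exclude=None):
    # include and exclude are mutually exclusive, mirroring the
    # candidate_include / candidate_exclude arguments above
    if candidate_include is not None and candidate_exclude is not None:
        raise ValueError("include and exclude are mutually exclusive")

    def keep(key):
        if candidate_include is not None:
            return key in candidate_include
        if candidate_exclude is not None:
            return key not in candidate_exclude
        return True

    # Suffix candidate keys, then benchmark keys, into one flat record
    # (conceptually one row of the attribute-tracking dataframe).
    row = {k + candidate_suffix: v
           for k, v in candidate_attrs.items() if keep(k)}
    row.update({k + benchmark_suffix: v
                for k, v in benchmark_attrs.items()})
    return row
```

For example, `track_attributes({"source": "model", "res": 10}, {"source": "survey"}, candidate_exclude=["res"])` yields `{"source_candidate": "model", "source_benchmark": "survey"}`.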

cat_plot(title: str = 'Categorical Map', colormap: str = 'viridis', figsize: Tuple[int, int] | None = None, legend_labels: list | None = None, plot_bands: str | list = 'all', colorbar_label: str | list = '', basemap: TileProvider = {'attribution': '(C) OpenStreetMap contributors', 'html_attribution': '&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors', 'max_zoom': 19, 'name': 'OpenStreetMap.Mapnik', 'url': 'https://tile.openstreetmap.org/{z}/{x}/{y}.png'})

Plots a categorical map for an xarray object.

Parameters:
  • title (str, default = “Categorical Map”) – Title of map

  • colormap (str, default = "viridis") – Colormap of data

  • figsize (tuple[int, int], default=None) – Size of the plot

  • legend_labels (list, default = None) – Override labels in legend

  • plot_bands (Union[str, list], default='all') – What bands to plot

  • colorbar_label (Union[str, list], default ="") – Label or labels for the colorbar in the case of continuous plots

  • basemap (Union[bool, xyzservices.lib.TileProvider], default = cx.providers.OpenStreetMap.Mapnik) – Add basemap to the plot


categorical_compare(benchmark_map: GeoDataFrame | Dataset | DataArray, positive_categories: Number | Iterable[Number] | None, comparison_function: Callable | DUFunc | ufunc | vectorize | str = 'szudzik', metrics: str | Iterable[str] = 'all', target_map: Dataset | str | None = 'benchmark', resampling: Resampling | None = Resampling.nearest, pairing_dict: Dict[Tuple[Number, Number], Number] | None = None, allow_candidate_values: Iterable[int | float] | None = None, allow_benchmark_values: Iterable[int | float] | None = None, nodata: Number | None = None, encode_nodata: bool | None = False, exclude_value: Number | None = None, negative_categories: Number | Iterable[Number] | None = None, average: str = 'micro', weights: Iterable[Number] | None = None, rasterize_attributes: list | None = None, attribute_tracking: bool = False, attribute_tracking_kwargs: Dict | None = None, subsampling_df: GeoDataFrame | None = None, subsampling_average: str | None = None) Tuple[Tuple[Dataset | DataArray, DataFrame[Crosstab_df], DataFrame[Metrics_df]] | Tuple[Dataset | DataArray, DataFrame[Crosstab_df], DataFrame[Metrics_df], DataFrame[AttributeTrackingDf]]]

Computes a comparison between two categorical-valued xarray objects.

Conducts the following steps:
  • homogenize: aligns data types, spatial alignment, and rasterizes data

  • compute_agreement: computes agreement map

  • compute_crosstab: computes crosstabulation

  • compute_metrics: computes metrics

Spatially aligning the xarray objects produces copies of the original candidate and benchmark maps. To reduce memory usage, consider using the homogenize() accessor method to overwrite the original maps in memory or to save them to disk.

Parameters:
  • benchmark_map (Union[gpd.GeoDataFrame, xr.Dataset, xr.DataArray]) – Benchmark map.

  • positive_categories (Optional[Union[Number, Iterable[Number]]]) – Number or list of numbers representing the values to consider as the positive condition. When the average argument is either “macro” or “weighted”, this represents the categories to compute metrics for.

  • comparison_function (Union[Callable, nb.np.ufunc.dufunc.DUFunc, np.ufunc, np.vectorize, str], default = 'szudzik') – Comparison function. Created by decorating function with @nb.vectorize() or using np.ufunc(). Use of numba is preferred as it is faster. Strings with registered comparison_functions are also accepted. Possible options include “pairing_dict”. If passing “pairing_dict” value, please see the description for the argument for more information on behaviour. All available comparison functions can be found with gval.Comparison.available_functions().

  • metrics (Union[str, Iterable[str]], default = "all") – Statistics to return in metric table. All returns every default and registered metric. This can be seen with gval.CatStats.available_functions().

  • target_map (Optional[Union[xr.Dataset, str]], default = "benchmark") – xarray object to match the CRS’s and coordinates of candidates and benchmarks to or str with ‘candidate’ or ‘benchmark’ as accepted values.

  • resampling (rasterio.enums.Resampling) – See rasterio.warp.reproject() for more details.

  • pairing_dict (Optional[Dict[Tuple[Number, Number], Number]], default = None) –

    When “pairing_dict” is used for the comparison_function argument, a pairing dictionary can be passed by user. A pairing dictionary is structured as {(c, b) : a} where (c, b) is a tuple of the candidate and benchmark value pairing, respectively, and a is the value for the agreement array to be used for this pairing.

    If None is passed for pairing_dict, the allow_candidate_values and allow_benchmark_values arguments are required. For this case, the pairings in these two iterables will be paired in the order provided and an agreement value will be assigned to each pairing starting with 0 and ending with the number of possible pairings.

    A pairing dictionary can be used by the user to note which values to allow and which to ignore for comparisons. It can also be used to decide how nans are handled for cases where either the candidate and benchmark maps have nans or both.

  • allow_candidate_values (Optional[Iterable[Union[int,float]]], default = None) – List of values in the candidate to include in the computation of the agreement map; remaining values are excluded. If “pairing_dict” is selected for comparison_function and pairing_dict is None, this argument is required to construct the dictionary. Otherwise it is optional; by default it is None and all values are considered.

  • allow_benchmark_values (Optional[Iterable[Union[int,float]]], default = None) – List of values in the benchmark to include in the computation of the agreement map; remaining values are excluded. If “pairing_dict” is selected for comparison_function and pairing_dict is None, this argument is required to construct the dictionary. Otherwise it is optional; by default it is None and all values are considered.

  • nodata (Optional[Number], default = None) – No data value to write to agreement map output. This will use rxr.rio.write_nodata(nodata).

  • encode_nodata (Optional[bool], default = False) – Encoded no data value to write to agreement map output. A nodata argument must be passed. This will use rxr.rio.write_nodata(nodata, encode=encode_nodata).

  • exclude_value (Optional[Number], default = None) – Value to exclude from crosstab. This could be used to denote a no data value if masking wasn’t used. By default, NaNs are not cross-tabulated.

  • negative_categories (Optional[Union[Number, Iterable[Number]]], default = None) – Number or list of numbers representing the values to consider as the negative condition. This should be set to None when no negative categories are used or when the average type is “macro” or “weighted”.

  • average (str, default = "micro") – Type of average to use when computing metrics. Options are “micro”, “macro”, and “weighted”. Micro averaging computes the conditions (tp, tn, fp, and fn) for each category and then sums them. Macro averaging computes the metrics for each category and then averages them. Weighted averaging computes the metrics for each category and then averages them, weighted by the weights argument for each category.

  • weights (Optional[Iterable[Number]], default = None) –

    Weights to use when computing weighted average, specifically when the average argument is “weighted”. Elements correspond to positive categories in order.

    Example:

    positive_categories = [1, 2]; weights = [0.25, 0.75]

  • rasterize_attributes (Optional[list], default = None) – Numerical attributes of a benchmark map GeoDataFrame to rasterize. Only applicable, and required (cannot be None), when the benchmark map is a vector file.

  • attribute_tracking (bool, default = False) – Whether to return a dataframe with the attributes of the candidate and benchmark maps.

  • attribute_tracking_kwargs (Optional[Dict], default = None) – Keyword arguments to pass to gval.attribute_tracking(). This is only used if attribute_tracking is True. By default, agreement maps are used for attribute tracking but this can be set to None within this argument to override. See gval.attribute_tracking for more information.

  • subsampling_df (Optional[gpd.GeoDataFrame], default = None) – DataFrame with spatial geometries and method types to subsample

  • subsampling_average (Optional[str], default = None) – How to aggregate statistics for subsamples, if provided. Options are “sample”, “band”, and “full-detail”: “sample” computes metrics and averages the results by subsample; “band” computes metrics and averages them by band; “full-detail” performs no aggregation by subsample or band.

Returns:

Union[Tuple[Union[xr.Dataset, xr.DataArray], DataFrame[Crosstab_df], DataFrame[Metrics_df]], Tuple[Union[xr.Dataset, xr.DataArray], DataFrame[Crosstab_df], DataFrame[Metrics_df], DataFrame[AttributeTrackingDf]]] – Tuple with agreement map(s), cross-tabulation table, and metric table, and possibly an attribute tracking table as well.
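The four steps above can be sketched on plain Python lists. Szudzik's pairing function is real; the toy data, the micro-averaged conditions, and the CSI metric shown here are illustrative, and the agreement encoding need not match gval's exactly:

```python
from collections import Counter

def szudzik(c, b):
    # Szudzik's pairing function: maps each pair of non-negative
    # integers (candidate, benchmark) to a unique agreement value.
    return c * c + c + b if c >= b else b * b + c

candidate = [1, 1, 0, 2, 2, 1]
benchmark = [1, 0, 0, 2, 1, 1]

# compute_agreement: one agreement value per cell
agreement = [szudzik(c, b) for c, b in zip(candidate, benchmark)]

# compute_crosstab: counts per (candidate, benchmark) pairing
crosstab = Counter(zip(candidate, benchmark))

# compute_metrics with average="micro" for positive_categories = [1, 2]:
# sum the conditions over categories, then compute the metric once.
positives = {1, 2}
tp = sum(n for (c, b), n in crosstab.items() if c in positives and c == b)
fp = sum(n for (c, b), n in crosstab.items() if c in positives and c != b)
fn = sum(n for (c, b), n in crosstab.items() if b in positives and c != b)
csi = tp / (tp + fp + fn)  # critical success index
```

On this toy data, tp = 3, fp = 2, and fn = 1, giving a CSI of 0.5.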

check_same_type(benchmark_map: Dataset | DataArray)

Makes sure the benchmark map is the same data type as the candidate object.

Parameters:

benchmark_map (Union[xr.Dataset, xr.DataArray]) – Benchmark Map

Raises:

TypeError – If the benchmark map is not the same data type as the candidate object.

compute_agreement_map(benchmark_map: Dataset | DataArray, comparison_function: Callable | DUFunc | ufunc | vectorize | str = 'szudzik', pairing_dict: Dict[Tuple[Number, Number], Number] | None = None, allow_candidate_values: Iterable[int | float] | None = None, allow_benchmark_values: Iterable[int | float] | None = None, nodata: Number | None = None, encode_nodata: bool | None = False, subsampling_df: GeoDataFrame | None = None, continuous: bool = False) Dataset | DataArray | List[Dataset | DataArray]

Computes an agreement map as an xarray object from the candidate and benchmark xarray objects.

Parameters:
  • benchmark_map (Union[xr.Dataset, xr.DataArray]) – Benchmark map.

  • comparison_function (Union[Callable, nb.np.ufunc.dufunc.DUFunc, np.ufunc, np.vectorize, str], default = 'szudzik') – Comparison function. Created by decorating function with @nb.vectorize() or using np.ufunc(). Use of numba is preferred as it is faster. Strings with registered comparison_functions are also accepted. Possible options include “pairing_dict”. If passing “pairing_dict” value, please see the description for the argument for more information on behaviour.

  • pairing_dict (Optional[Dict[Tuple[Number, Number], Number]], default = None) –

    When “pairing_dict” is used for the comparison_function argument, a pairing dictionary can be passed by user. A pairing dictionary is structured as {(c, b) : a} where (c, b) is a tuple of the candidate and benchmark value pairing, respectively, and a is the value for the agreement array to be used for this pairing.

    If None is passed for pairing_dict, the allow_candidate_values and allow_benchmark_values arguments are required. For this case, the pairings in these two iterables will be paired in the order provided and an agreement value will be assigned to each pairing starting with 0 and ending with the number of possible pairings.

    A pairing dictionary can be used by the user to note which values to allow and which to ignore for comparisons. It can also be used to decide how nans are handled for cases where either the candidate and benchmark maps have nans or both.

  • allow_candidate_values (Optional[Iterable[Union[int,float]]], default = None) – List of values in the candidate to include in the computation of the agreement map; remaining values are excluded. If “pairing_dict” is selected for comparison_function and pairing_dict is None, this argument is required to construct the dictionary. Otherwise it is optional; by default it is None and all values are considered.

  • allow_benchmark_values (Optional[Iterable[Union[int,float]]], default = None) – List of values in the benchmark to include in the computation of the agreement map; remaining values are excluded. If “pairing_dict” is selected for comparison_function and pairing_dict is None, this argument is required to construct the dictionary. Otherwise it is optional; by default it is None and all values are considered.

  • nodata (Optional[Number], default = None) – No data value to write to agreement map output. This will use rxr.rio.write_nodata(nodata).

  • encode_nodata (Optional[bool], default = False) – Encoded no data value to write to agreement map output. A nodata argument must be passed. This will use rxr.rio.write_nodata(nodata, encode=encode_nodata).

  • subsampling_df (Optional[gpd.GeoDataFrame], default = None) – DataFrame with geometries to subsample data with or use as an exclusionary mask

  • continuous (bool, default = False) – Whether the comparison is continuous; if True, the modified candidate and benchmark maps are also returned

Returns:

Agreement map.

Return type:

Union[xr.Dataset, xr.DataArray, List[Union[xr.Dataset, xr.DataArray]]]
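When "pairing_dict" is chosen as the comparison function and no dictionary is supplied, the documentation above says pairings are built from the two allow-lists in order, with agreement values counting up from 0. A minimal sketch of that construction; the exact enumeration order used by gval is an assumption here:

```python
from itertools import product

def make_pairing_dict(allow_candidate_values, allow_benchmark_values):
    # Pair every allowed candidate value with every allowed benchmark
    # value, in the order provided, assigning agreement values 0, 1, ...
    return {pair: i for i, pair in enumerate(
        product(allow_candidate_values, allow_benchmark_values))}

pairing = make_pairing_dict([0, 1], [0, 1])
```

Here `pairing` is `{(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}`; candidate/benchmark pairs absent from the dictionary are ignored during comparison.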

compute_crosstab(agreement_map: DataArray | Dataset | Iterable[DataArray | Dataset] | None = None, subsampling_df: GeoDataFrame | None = None) DataFrame[Crosstab_df]

Crosstabs a 2- or 3-dimensional xarray DataArray to produce a Crosstab DataFrame.

Parameters:
  • agreement_map (Union[xr.Dataset, xr.DataArray], default = None) – Agreement map, 2- or 3-dimensional.

  • subsampling_df (Optional[gpd.GeoDataFrame], default = None) – DataFrame with spatial geometries and method types to subsample

Returns:

Crosstab DataFrame

Return type:

DataFrame[Crosstab_df]
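A crosstab in this long format can be sketched with a Counter. The column names mirror gval's Crosstab_df schema but are assumptions here:

```python
from collections import Counter

# Paired (candidate, benchmark) cell values for one band
pairs = [(1, 1), (1, 0), (0, 0), (1, 1)]
counts = Counter(pairs)

# One row per unique pairing, in long format
rows = [
    {"band": "1", "candidate_values": c, "benchmark_values": b, "counts": n}
    for (c, b), n in sorted(counts.items())
]
```

Each row records how many cells held a given candidate/benchmark value pair; here the (1, 1) pairing appears twice, the others once.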

cont_plot(title: str = 'Continuous Map', colormap: str = 'viridis', figsize: Tuple[int, int] | None = None, plot_bands: str | list = 'all', colorbar_label: str | list = '', basemap: TileProvider = {'attribution': '(C) OpenStreetMap contributors', 'html_attribution': '&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors', 'max_zoom': 19, 'name': 'OpenStreetMap.Mapnik', 'url': 'https://tile.openstreetmap.org/{z}/{x}/{y}.png'})

Plots a continuous map for an xarray object.

Parameters:
  • title (str, default = “Continuous Map”) – Title of map

  • colormap (str, default = "viridis") – Colormap of data

  • figsize (tuple[int, int], default=None) – Size of the plot

  • plot_bands (Union[str, list], default='all') – What bands to plot

  • colorbar_label (Union[str, list], default ="") – Label or labels for colorbar in the case of continuous plots

  • basemap (Union[bool, xyzservices.lib.TileProvider], default = cx.providers.OpenStreetMap.Mapnik) – Add basemap to the plot


continuous_compare(benchmark_map: GeoDataFrame | Dataset | DataArray, metrics: str | Iterable[str] = 'all', target_map: Dataset | str | None = 'benchmark', resampling: Resampling | None = Resampling.nearest, nodata: Number | None = None, encode_nodata: bool | None = False, rasterize_attributes: list | None = None, attribute_tracking: bool = False, attribute_tracking_kwargs: Dict | None = None, subsampling_df: GeoDataFrame | None = None, subsampling_average: str = 'none') Tuple[Tuple[Dataset | DataArray, DataFrame[Metrics_df]] | Tuple[Dataset | DataArray, DataFrame[Metrics_df], DataFrame[AttributeTrackingDf]]]

Computes a comparison between two continuous-valued xarray objects.

Conducts the following steps:
  • homogenize: aligns data types, spatial alignment, and rasterizes data

  • compute_agreement: computes the agreement map, i.e., the error (candidate minus benchmark)

  • compute_metrics: computes metrics

Spatially aligning the xarray objects produces copies of the original candidate and benchmark maps. To reduce memory usage, consider using the homogenize() accessor method to overwrite the original maps in memory or to save them to disk.

Parameters:
  • benchmark_map (Union[gpd.GeoDataFrame, xr.DataArray, xr.Dataset]) – Benchmark map.

  • metrics (Union[str, Iterable[str]], default = "all") – Statistics to return in metric table. This can be seen with gval.ContStats.available_functions().

  • target_map (Optional[Union[xr.Dataset, str]], default = "benchmark") – xarray object to match the CRS’s and coordinates of candidates and benchmarks to or str with ‘candidate’ or ‘benchmark’ as accepted values.

  • resampling (rasterio.enums.Resampling) – See rasterio.warp.reproject() for more details.

  • nodata (Optional[Number], default = None) – No data value to write to agreement map output. This will use rxr.rio.write_nodata(nodata).

  • encode_nodata (Optional[bool], default = False) – Encoded no data value to write to agreement map output. A nodata argument must be passed. This will use rxr.rio.write_nodata(nodata, encode=encode_nodata).

  • rasterize_attributes (Optional[list], default = None) – Numerical attributes of a GeoDataFrame to rasterize.

  • attribute_tracking (bool, default = False) – Whether to return a dataframe with the attributes of the candidate and benchmark maps.

  • attribute_tracking_kwargs (Optional[Dict], default = None) – Keyword arguments to pass to gval.attribute_tracking(). This is only used if attribute_tracking is True. By default, agreement maps are used for attribute tracking but this can be set to None within this argument to override. See gval.attribute_tracking for more information.

  • subsampling_df (Optional[gpd.GeoDataFrame], default = None) – DataFrame with spatial geometries and method types to subsample

  • subsampling_average (str, default = "none") – How to aggregate statistics for subsamples, if provided. Options are “sample”, “band”, “weighted”, and “none”: “sample” computes metrics and averages the results by subsample; “band” computes metrics and averages them by band; “weighted” computes metrics, scales them by weight, and averages them based on the weights; “none” provides the full detailed table.

Returns:

Union[Tuple[Union[xr.Dataset, xr.DataArray], DataFrame[Metrics_df]], Tuple[Union[xr.Dataset, xr.DataArray], DataFrame[Metrics_df], DataFrame[AttributeTrackingDf]]] – Tuple with agreement map and metric table, and possibly an attribute tracking table as well.
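For the continuous case the agreement map is the signed error, candidate minus benchmark, and metrics then reduce that error map. A sketch with two common statistics; the toy data and the metric choice are illustrative:

```python
import math

candidate = [2.0, 3.5, 4.0, 5.0]
benchmark = [2.5, 3.0, 4.0, 6.0]

# Agreement map: signed error per cell (candidate minus benchmark)
error = [c - b for c, b in zip(candidate, benchmark)]

# Example metrics computed over the error map
mae = sum(abs(e) for e in error) / len(error)             # mean absolute error
rmse = math.sqrt(sum(e * e for e in error) / len(error))  # root mean squared error
```

On this data `error` is `[-0.5, 0.5, 0.0, -1.0]` and `mae` is 0.5.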

homogenize(benchmark_map: GeoDataFrame | Dataset | DataArray, target_map: Dataset | str | None = 'benchmark', resampling: Resampling | None = Resampling.nearest, rasterize_attributes: list | None = None) Dataset | DataArray

Homogenize candidate and benchmark maps to prepare for comparison.

Currently supported operations include:
  • Matching projections and coordinates (spatial alignment)

  • Homogenizing file formats (xarray/rasters)

  • Homogenizing numerical data types (int, float, etc.)

Parameters:
  • benchmark_map (Union[gpd.GeoDataFrame, xr.Dataset, xr.DataArray]) – Benchmark map.

  • target_map (Optional[Union[xr.DataArray, xr.Dataset, str]], default = "benchmark") – xarray object to match candidates and benchmarks to or str with ‘candidate’ or ‘benchmark’ as accepted values.

  • resampling (rasterio.enums.Resampling) – See rasterio.warp.reproject() for more details.

  • rasterize_attributes (Optional[list], default = None) – Numerical attributes of a GeoDataFrame to rasterize

Returns:

Tuple with candidate and benchmark map respectively.

Return type:

Tuple[Union[xr.Dataset, xr.DataArray], Union[xr.Dataset, xr.DataArray]]
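The spatial-alignment step can be pictured in one dimension: each target (benchmark) coordinate takes the value of the nearest source (candidate) coordinate, which is what Resampling.nearest does along each axis. This is a simplified sketch; real reprojection via rasterio additionally handles CRS transforms and 2-D grids:

```python
def nearest_resample(values, src_coords, dst_coords):
    # Snap each target coordinate to the nearest source coordinate
    # and copy that source value -- 1-D nearest-neighbour resampling.
    out = []
    for x in dst_coords:
        i = min(range(len(src_coords)), key=lambda j: abs(src_coords[j] - x))
        out.append(values[i])
    return out

candidate_vals = [1, 2, 3]
candidate_x = [0.0, 1.0, 2.0]
benchmark_x = [0.1, 0.9, 1.6, 2.2]   # target grid to align to
aligned = nearest_resample(candidate_vals, candidate_x, benchmark_x)
```

Here `aligned` is `[1, 2, 3, 3]`: the candidate values now live on the benchmark's coordinates, ready for cell-by-cell comparison.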

probabilistic_compare(benchmark_map: GeoDataFrame | Dataset | DataArray, metric_kwargs: dict, return_on_error: Any | None = None, target_map: Dataset | str | None = 'benchmark', resampling: Resampling | None = Resampling.nearest, rasterize_attributes: list | None = None, attribute_tracking: bool = False, attribute_tracking_kwargs: Dict | None = None) DataFrame[Prob_metrics_df]

Computes probabilistic metrics from candidate and benchmark maps.

Parameters:
  • benchmark_map (xr.DataArray or xr.Dataset) – Benchmark map.

  • metric_kwargs (dict) – Dictionary of keyword arguments to metric functions. Keys must be metrics. Values are keyword arguments to metric functions. Don’t pass keys or values for ‘observations’ or ‘forecasts’ as these are handled internally with benchmark_map and candidate_map, respectively. Available keyword arguments by metric are available in DEFAULT_METRIC_KWARGS. If values are None or empty dictionary, default values in DEFAULT_METRIC_KWARGS are used.

  • return_on_error (Optional[Any], default = None) – Value to return within metrics dataframe if an error occurs when computing a metric. If None, the metric is not computed and None is returned. If ‘error’, the raised error is returned.

  • target_map (Optional[xr.Dataset or str], default = "benchmark") – xarray object to match the CRS’s and coordinates of candidates and benchmarks to or str with ‘candidate’ or ‘benchmark’ as accepted values.

  • resampling (rasterio.enums.Resampling) – See rasterio.warp.reproject() for more details.

  • rasterize_attributes (Optional[list], default = None) – Numerical attributes of a GeoDataFrame to rasterize.

  • attribute_tracking (bool, default = False) – Whether to return a dataframe with the attributes of the candidate and benchmark maps.

  • attribute_tracking_kwargs (Optional[Dict], default = None) – Keyword arguments to pass to gval.attribute_tracking(). This is only used if attribute_tracking is True. By default, agreement maps are used for attribute tracking but this can be set to None within this argument to override. See gval.attribute_tracking for more information.

Returns:

Probabilistic metrics Pandas DataFrame with computed xarray objects per metric and sample.

Return type:

DataFrame[Prob_metrics_df]

Raises:

ValueError – If a keyword argument is required for a metric but not passed; if a keyword argument is passed but not available for a metric; or if a metric is not available.

Warns:

UserWarning – Warns if a metric cannot be computed. return_on_error determines whether None or the raised error is returned for that metric.
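The return_on_error behaviour described above can be sketched as a dispatch loop. This is a hypothetical helper; gval's internal handling may differ:

```python
import warnings

def run_metrics(metric_funcs, metric_kwargs, return_on_error=None):
    # Run each requested metric with its keyword arguments. On failure,
    # warn and record either None (default), the raised error itself
    # (return_on_error='error'), or any other supplied placeholder.
    results = {}
    for name, kwargs in metric_kwargs.items():
        try:
            results[name] = metric_funcs[name](**(kwargs or {}))
        except Exception as err:
            warnings.warn(f"could not compute {name}: {err}")
            results[name] = err if return_on_error == "error" else return_on_error
    return results
```

With `return_on_error=None`, a failing metric simply yields None in the results while the other metrics are still computed.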


vectorize_data() GeoDataFrame

Vectorizes an xarray DataArray or Dataset.

Returns:

Vectorized data

Return type:

gpd.GeoDataFrame
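Conceptually, vectorizing turns each raster cell (or contiguous region) into a geometry/value record. A stand-in sketch using plain cell bounds instead of shapely polygons; the record layout is illustrative, not the actual GeoDataFrame schema:

```python
def vectorize_cells(grid, x_coords, y_coords, cell_size=1.0):
    # Emit one record per non-None cell: its value plus the cell's
    # square bounds (xmin, ymin, xmax, ymax) centred on the coordinate.
    half = cell_size / 2
    records = []
    for i, y in enumerate(y_coords):
        for j, x in enumerate(x_coords):
            v = grid[i][j]
            if v is not None:
                records.append({"value": v,
                                "bounds": (x - half, y - half,
                                           x + half, y + half)})
    return records

records = vectorize_cells([[1, None], [2, 2]], [0.0, 1.0], [0.0, 1.0])
```

Cells holding None (masked/no-data) are skipped, so the 2x2 toy grid above yields three records.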