DataFrame Functionality

class gval.accessors.gval_dataframe.GVALDataFrame(pandas_obj)

Class for extending pandas DataFrame functionality

Object to use off the accessor

Type: pd.DataFrame

compute_categorical_metrics(positive_categories: Number | Iterable[Number], negative_categories: Number | Iterable[Number], metrics: str | Iterable[str] = 'all', average: str = 'micro', weights: Iterable[Number] | None = None, subsampling_average: str | None = None) → DataFrame[Metrics_df]

Computes categorical metrics from a crosstab df.

Parameters:

crosstab_df (DataFrame[Crosstab_df]) – Crosstab DataFrame with candidate, benchmark, and agreement values as well as the counts for each occurrence.
positive_categories (Optional[Union[Number, Iterable[Number]]]) – Number or list of numbers representing the values to consider as the positive condition. For average types “macro” and “weighted”, this represents the categories to compute metrics for.
negative_categories (Optional[Union[Number, Iterable[Number]]], default = None) – Number or list of numbers representing the values to consider as the negative condition. This should be set to None when no negative categories are used or when the average type is “macro” or “weighted”.
metrics (Union[str, Iterable[str]], default = "all") – String or list of strings representing metrics to compute.
average (str, default = "micro") – Type of average to use when computing metrics. Options are “micro”, “macro”, and “weighted”. Micro averaging computes the conditions (tp, tn, fp, and fn) for each category and then sums them. Macro averaging computes the metrics for each category and then averages them. Weighted averaging computes the metrics for each category and then averages them using the corresponding values of the weights argument.
weights (Optional[Iterable[Number]], default = None) – Weights to use when computing the weighted average. Elements correspond to positive categories in order. Example: positive_categories = [1, 2]; weights = [0.25, 0.75]
subsampling_average (Optional[str], default = None) – Way to aggregate statistics for subsamples, if provided. Options are “sample”, “band”, and “full-detail”. “sample” calculates metrics and averages the results by subsample. “band” calculates metrics and averages all the metrics by band. “full-detail” does no aggregation by subsample or band.
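
The three average types can be illustrated with a small, self-contained sketch. It does not call gval. The counts, the helper names (critical_success_index, averaged_csi), and the choice to normalize by the sum of the weights are assumptions made for illustration only.

```python
from typing import Dict, Optional, Sequence

# Hypothetical per-category condition counts; these are not gval output.
CATEGORY_COUNTS: Dict[int, Dict[str, int]] = {
    1: {"tp": 8, "fp": 2, "fn": 2},
    2: {"tp": 3, "fp": 1, "fn": 6},
}


def critical_success_index(tp: int, fp: int, fn: int) -> float:
    """Return tp / (tp + fp + fn)."""
    return tp / (tp + fp + fn)


def averaged_csi(
    counts: Dict[int, Dict[str, int]],
    average: str = "micro",
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Illustrative averaging of one metric across positive categories."""
    if average == "micro":
        # Sum the conditions over all categories, then compute the metric once.
        totals = {k: sum(c[k] for c in counts.values()) for k in ("tp", "fp", "fn")}
        return critical_success_index(**totals)

    # Macro and weighted: compute the metric per category first.
    per_category = [critical_success_index(**c) for c in counts.values()]
    if average == "macro":
        return sum(per_category) / len(per_category)
    if average == "weighted":
        if weights is None or len(weights) != len(per_category):
            raise ValueError(
                "Number of weights must be the same as the number of positive categories."
            )
        # Assumption: weights are normalized by their sum.
        return sum(w * m for w, m in zip(weights, per_category)) / sum(weights)
    raise ValueError(f"Unknown average type: {average!r}")


micro = averaged_csi(CATEGORY_COUNTS, "micro")
macro = averaged_csi(CATEGORY_COUNTS, "macro")
weighted = averaged_csi(CATEGORY_COUNTS, "weighted", weights=[0.25, 0.75])
```

With these counts, micro averaging gives 0.5, macro averaging gives about 0.483, and the weighted average with weights [0.25, 0.75] gives about 0.392, so the choice of average can change the reported score noticeably.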

Returns:

Metrics DF with computed metrics per sample.

Return type:

DataFrame[Metrics_df]

Raises:

ValueError – Value is shared in positive and negative categories.
ValueError – Category not found in crosstab df.
ValueError – Cannot use average type with only one positive category.
ValueError – Number of weights must be the same as the number of positive categories.
ValueError – Cannot use average type with negative_categories as not None. Set negative_categories to None for this average type.
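
The conditions listed under Raises can be read as a set of argument checks. The following is a minimal sketch of such checks, not gval's implementation. The function name check_category_args, the known_categories parameter, the order of the checks, and the assumption that the single-positive-category restriction applies to “macro” and “weighted” are all hypothetical.

```python
from numbers import Number
from typing import Iterable, List, Optional, Union

Categories = Optional[Union[Number, Iterable[Number]]]


def _as_list(value: Categories) -> List[Number]:
    """Normalize None, a single number, or an iterable into a list."""
    if value is None:
        return []
    if isinstance(value, Number):
        return [value]
    return list(value)


def check_category_args(
    positive_categories: Categories,
    negative_categories: Categories = None,
    average: str = "micro",
    weights: Optional[Iterable[Number]] = None,
    known_categories: Optional[Iterable[Number]] = None,  # hypothetical parameter
) -> None:
    """Raise ValueError for the argument combinations listed under Raises."""
    pos = _as_list(positive_categories)
    neg = _as_list(negative_categories)

    if set(pos) & set(neg):
        raise ValueError("Value is shared in positive and negative categories.")

    if known_categories is not None:
        if (set(pos) | set(neg)) - set(known_categories):
            raise ValueError("Category not found in crosstab df.")

    if average in ("macro", "weighted"):
        if neg:
            raise ValueError(
                "Cannot use average type with negative_categories as not None."
            )
        if len(pos) < 2:
            raise ValueError("Cannot use average type with only one positive category.")

    if average == "weighted":
        weight_list = [] if weights is None else list(weights)
        if len(weight_list) != len(pos):
            raise ValueError(
                "Number of weights must be the same as the number of positive categories."
            )
```

A call such as check_category_args([1, 2], 0, known_categories=[0, 1, 2]) passes silently, while check_category_args([1, 2], [2, 0]) raises because 2 appears in both lists.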


Parameters:

geometries (List[Geometry], default = None) – Geometries if none are already in the GeoDataFrame
crs (str) – The spatial reference for the geometries provided
subsampling_type (Union[str, List[str]], default = "exclude") – Whether each geometry should be an inclusive subsample or an exclusionary mask
subsampling_weights (List[Union[int, float]], default = None) – Values to scale the numeric impact of a particular sample
inplace (bool, default = False) – Whether to modify the GeoDataFrame calling the operation or return a new one

Raises:

ValueError – List provided has more or fewer entries than the DataFrame
TypeError – CRS must be provided if geometries are provided

Returns:

GeoDataFrame adhering to the subsampling DataFrame format if not inplace; otherwise None

Return type:

Union[None, SubsamplingDf]
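
The two exceptions above suggest how list-valued arguments relate to the rows of the DataFrame. The sketch below illustrates that relationship in plain Python and is not gval's implementation. The function name normalize_subsampling_args, the n_rows parameter, the “include” option name, and the broadcasting of a single string to every row are assumptions.

```python
from typing import List, Optional, Sequence, Tuple, Union


def normalize_subsampling_args(
    n_rows: int,  # hypothetical: number of rows expected in the DataFrame
    subsampling_type: Union[str, Sequence[str]] = "exclude",
    subsampling_weights: Optional[Sequence[float]] = None,
    geometries: Optional[Sequence[object]] = None,
    crs: Optional[str] = None,
) -> Tuple[List[str], Optional[List[float]]]:
    """Validate per-row arguments and return them as row-aligned lists."""
    if geometries is not None and crs is None:
        raise TypeError("CRS must be provided if geometries are provided")

    # Assumption: a single string applies to every geometry.
    if isinstance(subsampling_type, str):
        types = [subsampling_type] * n_rows
    else:
        types = list(subsampling_type)

    weights = None if subsampling_weights is None else list(subsampling_weights)

    # Every list-valued argument must have exactly one entry per row.
    for values in (types, weights, geometries):
        if values is not None and len(values) != n_rows:
            raise ValueError(
                "List provided has more or fewer entries than the DataFrame"
            )
    return types, weights


types, weights = normalize_subsampling_args(
    n_rows=2, subsampling_type=["include", "exclude"], subsampling_weights=[2, 1]
)
```

Passing a list with the wrong number of entries raises ValueError, and passing geometries without crs raises TypeError, mirroring the documented behavior.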

rasterize_data(reference_map: Dataset | DataArray, rasterize_attributes: list) → Dataset | DataArray

Convenience function for rasterizing vector data using a reference raster. For more control use make_geocube from the geocube package.

Parameters:

reference_map (Union[xr.Dataset, xr.DataArray]) – Map to reference in creation of rasterized vector map
rasterize_attributes (list) – Attributes to rasterize

Returns:

Rasterized Data

Return type:

Union[xr.Dataset, xr.DataArray]

Raises:

KeyError –
