Catalogs


Compare catalogs of candidates and benchmarks.

gval.catalogs.catalogs.catalog_compare(candidate_catalog: pandas.DataFrame | dask.DataFrame, benchmark_catalog: pandas.DataFrame | dask.DataFrame, map_ids: str | Iterable[str], how: str = 'inner', on: str | Iterable[str] | None = None, left_on: str | Iterable[str] | None = None, right_on: str | Iterable[str] | None = None, suffixes: tuple[str, str] = ('_candidate', '_benchmark'), merge_kwargs: dict | None = None, open_kwargs: dict | None = None, compare_type: str | Callable = 'continuous', compare_kwargs: dict | None = None, agreement_map_field: str | None = None, agreement_map_write_kwargs: dict | None = None) → pandas.DataFrame | dask.DataFrame

Compare catalogs of candidate and benchmark maps.

Parameters:
  • candidate_catalog (pandas.DataFrame | dask.DataFrame) – Candidate catalog.

  • benchmark_catalog (pandas.DataFrame | dask.DataFrame) – Benchmark catalog.

  • map_ids (str | Iterable of str) –

    Column name(s) containing the maps or paths to maps. If a str is given, a column of that name must exist in both catalogs. If an Iterable[str] of length 2 is given, its elements name the map columns in the candidate and benchmark catalogs, respectively.

    The columns corresponding to map_ids should have either str, xarray.DataArray, xarray.Dataset, rasterio.io.DatasetReader, rasterio.vrt.WarpedVRT, or os.PathLike objects.

  • how (str, default = "inner") – Type of merge to perform. See pandas.DataFrame.merge for more information.

  • on (str | Iterable of str, default = None) – Column(s) to join on. Must be found in both catalogs. If None, and left_on and right_on are also None, then the intersection of the columns in both catalogs will be used.

  • left_on (str | Iterable of str, default = None) – Column(s) to join on in the left (candidate) catalog. Must be found in that catalog.

  • right_on (str | Iterable of str, default = None) – Column(s) to join on in the right (benchmark) catalog. Must be found in that catalog.

  • suffixes (tuple of str, default = ("_candidate", "_benchmark")) – Suffixes to apply to overlapping column names in candidate and benchmark catalogs, respectively. Length two tuple of strings.

  • merge_kwargs (dict, default = None) – Keyword arguments to pass to pandas.DataFrame.merge.

  • compare_type (str | Callable, default = "continuous") – Type of comparison to perform. If str, then must be one of {“continuous”, “categorical”, “probabilistic”}. If Callable, then must be a function that takes two xarray.DataArray or xarray.Dataset objects and returns a tuple of length 2. The first element of the tuple must be an xarray.DataArray or xarray.Dataset object representing the agreement map. The second element of the tuple must be a pandas.DataFrame object representing the metrics.

  • compare_kwargs (dict, default = None) – Keyword arguments to pass to the compare_type function.

  • agreement_map_field (str, default = None) – Column name to write agreement maps to. If None, then agreement maps will not be written to file.

  • agreement_map_write_kwargs (dict, default = None) – Keyword arguments to pass to xarray.DataArray.rio.to_raster when writing agreement maps to file.
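
The how, on, and suffixes parameters follow pandas.DataFrame.merge, so their effect can be previewed with plain pandas before running a comparison. The following is a minimal sketch, not the library's internals: the catalogs, the "region" join column, and the file names are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical catalogs: column names and file paths are made up for illustration.
candidate_catalog = pd.DataFrame(
    {
        "region": ["a", "b", "c"],
        "map_id": ["cand_a.tif", "cand_b.tif", "cand_c.tif"],
    }
)
benchmark_catalog = pd.DataFrame(
    {
        "region": ["a", "b", "d"],
        "map_id": ["bench_a.tif", "bench_b.tif", "bench_d.tif"],
    }
)

# Plain pandas merge using the documented defaults for how and suffixes.
paired = candidate_catalog.merge(
    benchmark_catalog,
    how="inner",
    on="region",
    suffixes=("_candidate", "_benchmark"),
)

# Only regions present in both catalogs survive the inner join, and the
# overlapping "map_id" column is split into one column per catalog.
print(paired)
```

With an inner join, regions "c" and "d" are dropped because each appears in only one catalog. The overlapping map column is renamed with the two suffixes, which is why a length-2 map_ids value can be useful when the map columns differ between catalogs.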

Raises:
  • ValueError – If map_ids is not str or Iterable of str. If compare_type is not str or Callable. If compare_type is str and not one of {“continuous”, “categorical”, “probabilistic”}.

  • NotImplementedError – If compare_type is “probabilistic”.
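
The compare_type rules above can be summarized as a small check. This is a hypothetical re-statement of the documented behavior, assuming the checks run in the order shown; check_compare_type and VALID_COMPARE_TYPES are illustrative names and not part of gval.

```python
from typing import Callable, Union

# The three string values named in the documentation.
VALID_COMPARE_TYPES = {"continuous", "categorical", "probabilistic"}


def check_compare_type(compare_type: Union[str, Callable]) -> None:
    """Hypothetical validator mirroring the documented Raises section."""
    if isinstance(compare_type, str):
        if compare_type not in VALID_COMPARE_TYPES:
            raise ValueError(
                f"compare_type must be one of {sorted(VALID_COMPARE_TYPES)}"
            )
        if compare_type == "probabilistic":
            # Documented as a recognized value that is not yet supported.
            raise NotImplementedError("probabilistic comparison is not implemented")
    elif not callable(compare_type):
        raise ValueError("compare_type must be a str or a Callable")


check_compare_type("continuous")  # accepted: a recognized string
check_compare_type(max)           # accepted: any callable passes this check
```

Note that "probabilistic" is a recognized value, so it raises NotImplementedError and not ValueError.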

Returns:

Agreement catalog.

Return type:

pandas.DataFrame | dask.DataFrame
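
When compare_type is a Callable, the documented contract is two maps in and an (agreement map, metrics) tuple out. The sketch below shows one function that satisfies that contract for xarray.DataArray inputs. The function name difference_compare, the "mean_absolute_error" column, and the toy arrays are assumptions made for illustration, and how compare_kwargs would be forwarded to such a function is not covered here.

```python
import numpy as np
import pandas as pd
import xarray as xr


def difference_compare(candidate_map, benchmark_map):
    """Hypothetical comparison: cell-wise difference plus one summary metric."""
    # First tuple element: the agreement map, here candidate minus benchmark.
    agreement_map = candidate_map - benchmark_map
    # Second tuple element: a pandas.DataFrame of metrics.
    metrics = pd.DataFrame(
        {"mean_absolute_error": [float(np.abs(agreement_map).mean())]}
    )
    return agreement_map, metrics


# Toy 2x2 rasters standing in for real candidate and benchmark maps.
candidate = xr.DataArray(np.array([[1.0, 2.0], [3.0, 4.0]]), dims=("y", "x"))
benchmark = xr.DataArray(np.array([[1.0, 1.0], [5.0, 4.0]]), dims=("y", "x"))

agreement_map, metrics = difference_compare(candidate, benchmark)
print(agreement_map.values)
print(metrics)
```

A function of this shape could then be passed as compare_type in place of one of the built-in strings.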