Catalogs
Compare catalogs of candidates and benchmarks.
- gval.catalogs.catalogs.catalog_compare(candidate_catalog: pandas.DataFrame | dask.DataFrame, benchmark_catalog: pandas.DataFrame | dask.DataFrame, map_ids: str | Iterable[str], how: str = 'inner', on: str | Iterable[str] | None = None, left_on: str | Iterable[str] | None = None, right_on: str | Iterable[str] | None = None, suffixes: tuple[str, str] = ('_candidate', '_benchmark'), merge_kwargs: dict | None = None, open_kwargs: dict | None = None, compare_type: str | Callable = 'continuous', compare_kwargs: dict | None = None, agreement_map_field: str | None = None, agreement_map_write_kwargs: dict | None = None) → pandas.DataFrame | dask.DataFrame
Compare catalogs of candidate and benchmark maps.
- Parameters:
candidate_catalog (pandas.DataFrame | dask.DataFrame) – Candidate catalog.
benchmark_catalog (pandas.DataFrame | dask.DataFrame) – Benchmark catalog.
map_ids (str | Iterable of str) –
Name(s) of the column(s) that hold the maps or paths to maps. If a str is given, a column with that name must exist in both catalogs. If an Iterable of two str is given, its elements are the map column names in the candidate and benchmark catalogs, respectively.
Values in these columns must be str, xarray.DataArray, xarray.Dataset, rasterio.io.DatasetReader, rasterio.vrt.WarpedVRT, or os.PathLike objects.
how (str, default = "inner") – Type of merge to perform. See pandas.DataFrame.merge for more information.
on (str | Iterable of str, default = None) – Column(s) to join on. Must be found in both catalogs. If None, and left_on and right_on are also None, then the intersection of the columns in both catalogs will be used.
left_on (str | Iterable of str, default = None) – Column(s) to join on in the left (candidate) catalog. Must be found in the candidate catalog.
right_on (str | Iterable of str, default = None) – Column(s) to join on in the right (benchmark) catalog. Must be found in the benchmark catalog.
suffixes (tuple of str, default = ("_candidate", "_benchmark")) – Suffixes appended to overlapping column names from the candidate and benchmark catalogs, respectively. Must be a tuple of two str.
merge_kwargs (dict, default = None) – Keyword arguments to pass to pandas.DataFrame.merge.
compare_type (str | Callable, default = "continuous") – Type of comparison to perform. If str, then must be one of {“continuous”, “categorical”, “probabilistic”}. If Callable, then must be a function that takes two xarray.DataArray or xarray.Dataset objects and returns a tuple of length 2. The first element of the tuple must be an xarray.DataArray or xarray.Dataset object representing the agreement map. The second element of the tuple must be a pandas.DataFrame object representing the metrics.
compare_kwargs (dict, default = None) – Keyword arguments to pass to the compare_type function.
agreement_map_field (str, default = None) – Column name to write agreement maps to. If None, then agreement maps will not be written to file.
agreement_map_write_kwargs (dict, default = None) – Keyword arguments to pass to xarray.DataArray.rio.to_raster when writing agreement maps to file.
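The merge-related parameters (how, on, left_on, right_on, suffixes, merge_kwargs) follow pandas.DataFrame.merge. The sketch below uses plain pandas, not gval, to show what an inner join on a shared key does to two small catalogs and how the default suffixes disambiguate a map column present in both. The column names compare_id and map_id and the file names are made up for illustration; whether catalog_compare issues exactly this call internally is an assumption.

```python
import pandas as pd

# Toy catalogs: each row pairs a join key with a (made-up) map path.
candidate_catalog = pd.DataFrame(
    {"compare_id": [1, 2, 3], "map_id": ["c1.tif", "c2.tif", "c3.tif"]}
)
benchmark_catalog = pd.DataFrame(
    {"compare_id": [1, 2, 4], "map_id": ["b1.tif", "b2.tif", "b4.tif"]}
)

# An inner join on compare_id keeps only keys present in both catalogs (1 and 2).
# The overlapping map_id column is renamed with the default suffixes.
merged = candidate_catalog.merge(
    benchmark_catalog,
    how="inner",
    on="compare_id",
    suffixes=("_candidate", "_benchmark"),
)
print(merged)
```

The result has the columns compare_id, map_id_candidate and map_id_benchmark, one row per matched pair of maps.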
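The compare_type contract for a Callable (two maps in, a tuple of agreement map and metrics DataFrame out) can be illustrated with a small custom function. This is a minimal sketch assuming xarray.DataArray inputs only; the function name absolute_difference_compare and the metric column mean_absolute_error are hypothetical choices, not part of gval, and the docs above do not state whether gval expects particular metric column names.

```python
import numpy as np
import pandas as pd
import xarray as xr


def absolute_difference_compare(candidate, benchmark):
    """Hypothetical compare_type callable for xarray.DataArray inputs.

    Returns (agreement_map, metrics) as described for compare_type.
    """
    # Agreement map: cell-wise absolute difference (xarray aligns on coordinates).
    agreement_map = abs(candidate - benchmark)
    # Metrics: a one-row DataFrame; the column name is an arbitrary choice.
    metrics = pd.DataFrame({"mean_absolute_error": [float(agreement_map.mean())]})
    return agreement_map, metrics


# Usage with two small in-memory rasters sharing the same coordinates.
coords = {"y": [0, 1], "x": [0, 1]}
candidate = xr.DataArray(np.array([[1.0, 2.0], [3.0, 4.0]]), coords=coords, dims=("y", "x"))
benchmark = xr.DataArray(np.array([[1.0, 1.0], [5.0, 4.0]]), coords=coords, dims=("y", "x"))
agreement, metrics = absolute_difference_compare(candidate, benchmark)
print(metrics)  # mean_absolute_error is 0.75 for these inputs
```

Such a function could then be passed as compare_type=absolute_difference_compare; any keyword arguments it accepted would be supplied through compare_kwargs.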
- Raises:
ValueError – If map_ids is not a str or an Iterable of str, if compare_type is neither a str nor a Callable, or if compare_type is a str that is not one of {“continuous”, “categorical”, “probabilistic”}.
NotImplementedError – If compare_type is “probabilistic”.
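The input checks described under Raises can be sketched as two small validators. This is an illustration of the documented behavior, not gval's code: the names resolve_map_ids and check_compare_type are hypothetical, and rejecting an iterable whose length is not 2 is an assumption drawn from the map_ids description.

```python
from collections.abc import Iterable

VALID_COMPARE_TYPES = {"continuous", "categorical", "probabilistic"}


def resolve_map_ids(map_ids):
    """Hypothetical helper: return (candidate_column, benchmark_column)."""
    # A single str names a column that must exist in both catalogs.
    if isinstance(map_ids, str):
        return map_ids, map_ids
    # Otherwise accept an iterable of exactly two str: [candidate, benchmark].
    # The length-2 check is an assumption of this sketch.
    if isinstance(map_ids, Iterable):
        names = list(map_ids)
        if len(names) == 2 and all(isinstance(name, str) for name in names):
            return names[0], names[1]
    raise ValueError("map_ids must be a str or an Iterable of two str")


def check_compare_type(compare_type):
    """Hypothetical validator mirroring the documented Raises behavior."""
    # Any callable is accepted as a custom comparison function.
    if callable(compare_type):
        return compare_type
    if not isinstance(compare_type, str):
        raise ValueError("compare_type must be a str or Callable")
    if compare_type not in VALID_COMPARE_TYPES:
        raise ValueError(f"compare_type must be one of {sorted(VALID_COMPARE_TYPES)}")
    if compare_type == "probabilistic":
        # A recognized name, but documented as not yet implemented.
        raise NotImplementedError("probabilistic comparison is not implemented")
    return compare_type


print(resolve_map_ids("map_path"))  # ('map_path', 'map_path')
print(resolve_map_ids(["cand_path", "bench_path"]))  # ('cand_path', 'bench_path')
for value in ["continuous", "probabilistic", "bogus", 42]:
    try:
        check_compare_type(value)
        print(value, "-> ok")
    except (ValueError, NotImplementedError) as err:
        print(value, "->", type(err).__name__)
```

Checking the Callable case first keeps any function valid regardless of its name, while only the three listed strings pass the string branch, and "probabilistic" is recognized but rejected.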
- Returns:
Agreement catalog.
- Return type:
pandas.DataFrame | dask.DataFrame