
Catalog Comparisons

import pandas as pd
import rioxarray as rxr

from gval.catalogs.catalogs import catalog_compare

Initializing Catalogs

The cataloging functionality is designed to facilitate batch comparisons of maps residing locally, in a service, or in the cloud. The format of such catalogs is as follows:


# Assumption: TEST_DATA_DIR is the directory holding the catalog CSVs (trailing slash included).
TEST_DATA_DIR = './'

candidate_continuous_catalog = pd.read_csv(f'{TEST_DATA_DIR}candidate_catalog_0.csv')
benchmark_continuous_catalog = pd.read_csv(f'{TEST_DATA_DIR}benchmark_catalog_0.csv')
candidate_categorical_catalog = pd.read_csv(f'{TEST_DATA_DIR}candidate_catalog_1.csv')
benchmark_categorical_catalog = pd.read_csv(f'{TEST_DATA_DIR}benchmark_catalog_1.csv')

Candidate Catalog

candidate_categorical_catalog['catalog_attribute_1'] = [1, 2]

   map_id                         compare_id  agreement_maps               catalog_attribute_1
0  ./candidate_categorical_0.tif  compare1    agreement_categorical_0.tif  1
1  ./candidate_categorical_1.tif  compare2    agreement_categorical_1.tif  2

The catalog should have columns representing:

1. An identifier of a candidate map (in this case compare_id)
2. The location of the candidate map (in this case map_id)
3. The name of the agreement map to be created (in this case agreement_maps)
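As a rough illustration of this layout, the sketch below builds the candidate catalog from inline CSV text instead of a file and checks that the expected columns are present. The CSV text mirrors the table shown earlier; the REQUIRED_COLUMNS list and the missing-column check are illustrative assumptions and not part of gval.

```python
import io

import pandas as pd

# Inline CSV mirroring the candidate catalog shown earlier (paths are relative).
catalog_csv = """map_id,compare_id,agreement_maps
./candidate_categorical_0.tif,compare1,agreement_categorical_0.tif
./candidate_categorical_1.tif,compare2,agreement_categorical_1.tif
"""

candidate_catalog = pd.read_csv(io.StringIO(catalog_csv))

# Hypothetical sanity check: the column names follow this example only.
REQUIRED_COLUMNS = ["map_id", "compare_id", "agreement_maps"]
missing = [col for col in REQUIRED_COLUMNS if col not in candidate_catalog.columns]
if missing:
    raise ValueError(f"Catalog is missing columns: {missing}")

# Extra attribute columns can be attached to each catalog row.
candidate_catalog["catalog_attribute_1"] = [1, 2]
print(candidate_catalog)
```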

Benchmark Catalog

benchmark_categorical_catalog['catalog_attribute_2'] = [3, 4]

   map_id                         compare_id  catalog_attribute_2
0  ./benchmark_categorical_0.tif  compare1    3
1  ./benchmark_categorical_1.tif  compare2    4

Similar to the candidate catalog, the benchmark catalog should have columns representing:

1. An identifier of a benchmark map (in this case compare_id)
2. The location of the benchmark map (in this case map_id)
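The two catalogs are paired through the shared compare_id column. The pandas sketch below shows roughly what an inner join on that column looks like. It is an assumption about the pairing step, not gval's actual implementation; the _candidate and _benchmark suffixes are chosen to match the column names that appear in the comparison output.

```python
import pandas as pd

candidate = pd.DataFrame({
    "map_id": ["./candidate_categorical_0.tif", "./candidate_categorical_1.tif"],
    "compare_id": ["compare1", "compare2"],
    "agreement_maps": ["agreement_categorical_0.tif", "agreement_categorical_1.tif"],
    "catalog_attribute_1": [1, 2],
})
benchmark = pd.DataFrame({
    "map_id": ["./benchmark_categorical_0.tif", "./benchmark_categorical_1.tif"],
    "compare_id": ["compare1", "compare2"],
    "catalog_attribute_2": [3, 4],
})

# Pair rows that share a compare_id; overlapping column names get a suffix.
paired = candidate.merge(
    benchmark,
    on="compare_id",
    how="inner",
    suffixes=("_candidate", "_benchmark"),
)
print(paired.T)
```

With an inner join, a row whose compare_id appears in only one of the two catalogs is dropped from the paired result.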

Categorical Catalog Comparison

When compare_type is set to 'categorical', the catalog is run as categorical comparisons. The arguments and the resulting comparison metrics are shown below:

arguments = {
    "candidate_catalog": candidate_categorical_catalog,
    "benchmark_catalog": benchmark_categorical_catalog,
    "on": "compare_id",      # column used to pair candidate and benchmark rows
    "map_ids": "map_id",     # column holding the map locations
    "how": "inner",
    "compare_type": "categorical",
    "compare_kwargs": {
        "metrics": (
            "critical_success_index",
            "true_positive_rate",
            "positive_predictive_value",
        ),
        "encode_nodata": True,
        "nodata": -9999,
        "positive_categories": 2,
        "negative_categories": 1,
    },
    "open_kwargs": {
        "mask_and_scale": True,
        "masked": True,
    },
}

agreement_categorical_catalog = catalog_compare(**arguments)
                           0                              1                              2
map_id_candidate           ./candidate_categorical_0.tif  ./candidate_categorical_1.tif  ./candidate_categorical_1.tif
compare_id                 compare1                       compare2                       compare2
agreement_maps             agreement_categorical_0.tif    agreement_categorical_1.tif    agreement_categorical_1.tif
catalog_attribute_1        1                              2                              2
map_id_benchmark           ./benchmark_categorical_0.tif  ./benchmark_categorical_1.tif  ./benchmark_categorical_1.tif
catalog_attribute_2        3                              4                              4
band                       1                              1                              2
fn                         844.0                          844.0                          844.0
fp                         844.0                          844.0                          844.0
tn                         5939.0                         5939.0                         5939.0
tp                         1977.0                         1977.0                         1977.0
critical_success_index     0.539427                       0.539427                       0.539427
true_positive_rate         0.700815                       0.700815                       0.700815
positive_predictive_value  0.700815                       0.700815                       0.700815
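The three metrics in the table can be checked by hand from the contingency counts. The sketch below uses the standard definitions of critical success index, true positive rate, and positive predictive value; the function names are illustrative and are not taken from gval.

```python
def critical_success_index(tp, fp, fn):
    """Hits divided by hits plus false alarms plus misses."""
    return tp / (tp + fp + fn)


def true_positive_rate(tp, fn):
    """Share of benchmark positives that the candidate also marks positive."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Share of candidate positives that are positive in the benchmark."""
    return tp / (tp + fp)


# Counts copied from the first comparison in the output table.
tp, fp, fn = 1977.0, 844.0, 844.0

csi = critical_success_index(tp, fp, fn)
tpr = true_positive_rate(tp, fn)
ppv = positive_predictive_value(tp, fp)

print(round(csi, 6))  # 0.539427
print(round(tpr, 6))  # 0.700815
print(round(ppv, 6))  # 0.700815
```

Because fp and fn are equal in this example, the true positive rate and the positive predictive value coincide.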

The agreement maps are shown below. The metrics are identical across comparisons because the datasets are essentially equivalent:

for ag_map in agreement_categorical_catalog['agreement_maps'].unique():
    # The title number is parsed from the file name, e.g. "..._0.tif" gives 1.
    rxr.open_rasterio(ag_map, mask_and_scale=True).gval.cat_plot(
        title=f'Agreement Map {int(ag_map.split("_")[-1][0]) + 1}'
    )
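The plot title takes the first character after the last underscore in the file name, which only works for single-digit indices. A slightly more general parse is sketched below; the helper name agreement_map_number is hypothetical and not part of gval.

```python
import re
from pathlib import Path


def agreement_map_number(path):
    """Return the 1-based map number encoded at the end of the file stem."""
    stem = Path(path).stem  # e.g. "agreement_categorical_0"
    match = re.search(r"_(\d+)$", stem)
    if match is None:
        raise ValueError(f"No trailing index found in {path!r}")
    return int(match.group(1)) + 1


print(agreement_map_number("agreement_categorical_0.tif"))    # 1
print(agreement_map_number("./agreement_continuous_12.tif"))  # 13
```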

Continuous Catalog Comparison

The continuous catalogs are as follows:

candidate_continuous_catalog['catalog_attribute_1'] = [1, 2]

   map_id                        compare_id  agreement_maps                catalog_attribute_1
0  ./candidate_continuous_0.tif  compare1    ./agreement_continuous_0.tif  1
1  ./candidate_continuous_1.tif  compare2    ./agreement_continuous_1.tif  2

benchmark_continuous_catalog['catalog_attribute_2'] = [3, 4]

   map_id                        compare_id  catalog_attribute_2
0  ./benchmark_continuous_0.tif  compare1    3
1  ./benchmark_continuous_1.tif  compare2    4

As before, the comparison type is selected through compare_type. Setting it to 'continuous' runs the catalog as continuous comparisons:

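arguments = {
    "candidate_catalog": candidate_continuous_catalog,
    "benchmark_catalog": benchmark_continuous_catalog,
    "on": "compare_id",      # column used to pair candidate and benchmark rows
    "map_ids": "map_id",     # column holding the map locations
    "how": "inner",
    "compare_type": "continuous",
    "compare_kwargs": {
        "metrics": (
            "coefficient_of_determination",
            "mean_absolute_error",
            "mean_absolute_percentage_error",
        ),
        "encode_nodata": True,
        "nodata": -9999,
    },
    "open_kwargs": {
        "mask_and_scale": True,
        "masked": True,
    },
}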

agreement_continuous_catalog = catalog_compare(**arguments)
                                0                             1                             2
map_id_candidate                ./candidate_continuous_0.tif  ./candidate_continuous_1.tif  ./candidate_continuous_1.tif
compare_id                      compare1                      compare2                      compare2
agreement_maps                  ./agreement_continuous_0.tif  ./agreement_continuous_1.tif  ./agreement_continuous_1.tif
catalog_attribute_1             1                             2                             2
map_id_benchmark                ./benchmark_continuous_0.tif  ./benchmark_continuous_1.tif  ./benchmark_continuous_1.tif
catalog_attribute_2             3                             4                             4
band                            1                             1                             2
coefficient_of_determination    -0.06616                      -2.829421                     0.10903
mean_absolute_error             0.317389                      0.485031                      0.485031
mean_absolute_percentage_error  0.159568                      0.202235                      0.153235
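For readers unfamiliar with these metrics, the sketch below computes them on small hand-made lists using the usual textbook definitions. It is pure Python and only illustrative; gval's own implementation, including how nodata values are handled, may differ. It also shows how the coefficient of determination can be negative, as in the table, when the candidate fits worse than the benchmark mean would.

```python
def mean_absolute_error(candidate, benchmark):
    return sum(abs(c - b) for c, b in zip(candidate, benchmark)) / len(benchmark)


def mean_absolute_percentage_error(candidate, benchmark):
    # Assumes no benchmark value is zero.
    return sum(abs((c - b) / b) for c, b in zip(candidate, benchmark)) / len(benchmark)


def coefficient_of_determination(candidate, benchmark):
    mean_b = sum(benchmark) / len(benchmark)
    ss_res = sum((b - c) ** 2 for c, b in zip(candidate, benchmark))
    ss_tot = sum((b - mean_b) ** 2 for b in benchmark)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
benchmark = [1.0, 2.0, 3.0, 4.0]
close_candidate = [1.5, 2.0, 2.5, 5.0]
poor_candidate = [4.0, 3.0, 2.0, 1.0]

mae = mean_absolute_error(close_candidate, benchmark)
mape = mean_absolute_percentage_error(close_candidate, benchmark)
r2_close = coefficient_of_determination(close_candidate, benchmark)
r2_poor = coefficient_of_determination(poor_candidate, benchmark)

print(round(mae, 6))       # 0.5
print(round(mape, 6))      # 0.229167
print(round(r2_close, 6))  # 0.7
print(round(r2_poor, 6))   # -3.0
```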

We can see the continuous agreement maps below:

for ag_map in agreement_continuous_catalog['agreement_maps'].unique():
    # The title number is parsed from the file name, e.g. "..._0.tif" gives 1.
    rxr.open_rasterio(ag_map, mask_and_scale=True).gval.cont_plot(
        title=f'Agreement Map {int(ag_map.split("_")[-1][0]) + 1}'
    )