Catalog Comparisons

[1]:

import pandas as pd
import rioxarray as rxr

from gval.catalogs.catalogs import catalog_compare

Initializing Catalogs

The cataloging functionality is designed to facilitate batch comparisons of maps residing locally, in a service, or in the cloud. The format of such catalogs is as follows:

[2]:

TEST_DATA_DIR = './'

candidate_continuous_catalog = pd.read_csv(f'{TEST_DATA_DIR}candidate_catalog_0.csv')
benchmark_continuous_catalog = pd.read_csv(f'{TEST_DATA_DIR}benchmark_catalog_0.csv')
candidate_categorical_catalog = pd.read_csv(f'{TEST_DATA_DIR}candidate_catalog_1.csv')
benchmark_categorical_catalog = pd.read_csv(f'{TEST_DATA_DIR}benchmark_catalog_1.csv')

Candidate Catalog

[3]:

candidate_categorical_catalog['catalog_attribute_1'] = [1, 2]
candidate_categorical_catalog

[3]:

	map_id	compare_id	agreement_maps	catalog_attribute_1
0	./candidate_categorical_0.tif	compare1	agreement_categorical_0.tif	1
1	./candidate_categorical_1.tif	compare2	agreement_categorical_1.tif	2

The catalog should have columns representing:
1. An identifier of a candidate map (in this case compare_id)
2. The location of the candidate map (in this case map_id)
3. The name of the agreement map to be created (in this case agreement_maps)
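A catalog does not have to come from a CSV file; any DataFrame with these columns has the same shape. The sketch below builds the candidate catalog in memory and checks for the expected columns. The helper `missing_columns` and the constant `REQUIRED_CANDIDATE_COLUMNS` are hypothetical names used for illustration and are not part of gval.

```python
import pandas as pd

# Columns described above for a candidate catalog (illustrative constant).
REQUIRED_CANDIDATE_COLUMNS = {"map_id", "compare_id", "agreement_maps"}

# Same rows as the CSV-backed catalog shown above, built in memory.
candidate_catalog = pd.DataFrame(
    {
        "map_id": ["./candidate_categorical_0.tif", "./candidate_categorical_1.tif"],
        "compare_id": ["compare1", "compare2"],
        "agreement_maps": ["agreement_categorical_0.tif", "agreement_categorical_1.tif"],
    }
)


def missing_columns(catalog, required):
    """Hypothetical helper: return the required columns absent from a catalog."""
    return sorted(set(required) - set(catalog.columns))


print(missing_columns(candidate_catalog, REQUIRED_CANDIDATE_COLUMNS))  # []
```

Running a check like this before a batch comparison would surface a misnamed column early rather than partway through the run.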

Benchmark Catalog

[4]:

benchmark_categorical_catalog['catalog_attribute_2'] = [3, 4]
benchmark_categorical_catalog

[4]:

	map_id	compare_id	catalog_attribute_2
0	./benchmark_categorical_0.tif	compare1	3
1	./benchmark_categorical_1.tif	compare2	4

Similar to the previous catalog, the benchmark catalog should have columns representing:
1. An identifier of a benchmark map (in this case compare_id)
2. The location of the benchmark map (in this case map_id)
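The shared compare_id column is what pairs a candidate map with its benchmark map. As a rough illustration of that pairing, the sketch below joins two small catalogs with pandas. It assumes an inner join on compare_id, and the `_candidate` and `_benchmark` suffixes are illustrative choices. It is not gval's implementation.

```python
import pandas as pd

candidate = pd.DataFrame(
    {
        "map_id": ["./candidate_categorical_0.tif", "./candidate_categorical_1.tif"],
        "compare_id": ["compare1", "compare2"],
        "agreement_maps": ["agreement_categorical_0.tif", "agreement_categorical_1.tif"],
    }
)
benchmark = pd.DataFrame(
    {
        "map_id": ["./benchmark_categorical_0.tif", "./benchmark_categorical_1.tif"],
        "compare_id": ["compare1", "compare2"],
    }
)

# Pair rows that share a compare_id; the overlapping map_id column is
# disambiguated with suffixes so both file locations are kept.
paired = candidate.merge(
    benchmark,
    on="compare_id",
    how="inner",
    suffixes=("_candidate", "_benchmark"),
)

print(list(paired.columns))
```

With an inner join, a compare_id that appears in only one of the two catalogs drops out of the result.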

Categorical Catalog Comparison

When compare_type is set to ‘categorical’, the catalogs are run as categorical comparisons. The arguments and the resulting comparison metrics are shown below:

[5]:

arguments = {
    "candidate_catalog": candidate_categorical_catalog,
    "benchmark_catalog": benchmark_categorical_catalog,
    "on": "compare_id",
    "map_ids": "map_id",
    "how": "inner",
    "compare_type": "categorical",
    "compare_kwargs": {
        "metrics": (
            "critical_success_index",
            "true_positive_rate",
            "positive_predictive_value",
        ),
        "encode_nodata": True,
        "nodata": -9999,
        "positive_categories": 2,
        "negative_categories": 1
    },
    "open_kwargs": {
        "mask_and_scale": True,
        "masked": True
    }
}

agreement_categorical_catalog = catalog_compare(**arguments)
agreement_categorical_catalog.transpose()

[5]:

	0	1	2
map_id_candidate	./candidate_categorical_0.tif	./candidate_categorical_1.tif	./candidate_categorical_1.tif
compare_id	compare1	compare2	compare2
agreement_maps	agreement_categorical_0.tif	agreement_categorical_1.tif	agreement_categorical_1.tif
catalog_attribute_1	1	2	2
map_id_benchmark	./benchmark_categorical_0.tif	./benchmark_categorical_1.tif	./benchmark_categorical_1.tif
catalog_attribute_2	3	4	4
band	1	1	2
fn	844.0	844.0	844.0
fp	844.0	844.0	844.0
tn	5939.0	5939.0	5939.0
tp	1977.0	1977.0	1977.0
critical_success_index	0.539427	0.539427	0.539427
true_positive_rate	0.700815	0.700815	0.700815
positive_predictive_value	0.700815	0.700815	0.700815
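The three metrics can be reproduced from the contingency counts in the table. The sketch below uses the standard definitions of each metric, assumed here to match the ones gval applies, with the counts copied from the output above.

```python
# Contingency counts copied from the output table (identical in all three columns).
tp, fp, fn = 1977.0, 844.0, 844.0

# Standard definitions, assumed to match the metrics requested above.
critical_success_index = tp / (tp + fp + fn)
true_positive_rate = tp / (tp + fn)
positive_predictive_value = tp / (tp + fp)

print(round(critical_success_index, 6))     # 0.539427
print(round(true_positive_rate, 6))         # 0.700815
print(round(positive_predictive_value, 6))  # 0.700815
```

Because fp and fn are equal here, the true positive rate and the positive predictive value coincide.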

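The result has three rows for two comparisons because the second pair is reported for bands 1 and 2 (see the band row). The agreement maps are plotted below; they also show why the metrics are identical across comparisons: the datasets are essentially equivalent.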

[6]:

for ag_map in agreement_categorical_catalog['agreement_maps'].unique():
    rxr.open_rasterio(ag_map, mask_and_scale=True).gval.cat_plot(
        title=f'Agreement Map {int(ag_map.split("_")[-1][0]) + 1}'
    )
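The plot titles are derived from the agreement map file names. The expression takes the first character after the last underscore, so it only works for single-digit indices. The sketch below mirrors that expression and sets it beside a hypothetical regex-based helper, `map_number` (not part of gval), that would also handle multi-digit indices.

```python
import re


def title_number(path):
    """Mirror of the expression used in the plotting loop above."""
    return int(path.split("_")[-1][0]) + 1


def map_number(path):
    """Hypothetical helper: parse the full trailing index before the extension."""
    match = re.search(r"_(\d+)\.\w+$", path)
    if match is None:
        raise ValueError(f"no trailing index in {path!r}")
    return int(match.group(1)) + 1


print(title_number("agreement_categorical_0.tif"))   # 1
print(title_number("agreement_categorical_12.tif"))  # 2 (only the first digit is read)
print(map_number("agreement_categorical_12.tif"))    # 13
```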

Continuous Catalog Comparison

The continuous catalogs are as follows:

[7]:

candidate_continuous_catalog['catalog_attribute_1'] = [1, 2]
candidate_continuous_catalog

[7]:

	map_id	compare_id	agreement_maps	catalog_attribute_1
0	./candidate_continuous_0.tif	compare1	./agreement_continuous_0.tif	1
1	./candidate_continuous_1.tif	compare2	./agreement_continuous_1.tif	2

[8]:

benchmark_continuous_catalog['catalog_attribute_2'] = [3, 4]
benchmark_continuous_catalog

[8]:

	map_id	compare_id	catalog_attribute_2
0	./benchmark_continuous_0.tif	compare1	3
1	./benchmark_continuous_1.tif	compare2	4

As before, compare_type selects the kind of comparison. Setting it to ‘continuous’ runs the catalogs as continuous comparisons:

[9]:

arguments = {
    "candidate_catalog": candidate_continuous_catalog,
    "benchmark_catalog": benchmark_continuous_catalog,
    "on": "compare_id",
    "map_ids": "map_id",
    "how": "inner",
    "compare_type": "continuous",
    "compare_kwargs": {
        "metrics": (
            "coefficient_of_determination",
            "mean_absolute_error",
            "mean_absolute_percentage_error",
        ),
        "encode_nodata": True,
        "nodata": -9999,
    },
    "open_kwargs": {
        "mask_and_scale": True,
        "masked": True
    }
}

agreement_continuous_catalog = catalog_compare(**arguments)
agreement_continuous_catalog.transpose()

[9]:

	0	1	2
map_id_candidate	./candidate_continuous_0.tif	./candidate_continuous_1.tif	./candidate_continuous_1.tif
compare_id	compare1	compare2	compare2
agreement_maps	./agreement_continuous_0.tif	./agreement_continuous_1.tif	./agreement_continuous_1.tif
catalog_attribute_1	1	2	2
map_id_benchmark	./benchmark_continuous_0.tif	./benchmark_continuous_1.tif	./benchmark_continuous_1.tif
catalog_attribute_2	3	4	4
band	1	1	2
coefficient_of_determination	-0.06616	-2.829421	0.10903
mean_absolute_error	0.317389	0.485031	0.485031
mean_absolute_percentage_error	0.159568	0.202235	0.153235
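A negative coefficient of determination, as in the first two columns, means the candidate matches the benchmark worse than the benchmark's own mean would. The sketch below shows the textbook definitions of the three metrics on made-up values. Treating the benchmark as the observed values is an assumption, and gval's implementation may differ in detail (for example, in how nodata is handled).

```python
def mean(values):
    return sum(values) / len(values)


def mean_absolute_error(candidate, benchmark):
    return mean([abs(c - b) for c, b in zip(candidate, benchmark)])


def mean_absolute_percentage_error(candidate, benchmark):
    # Relative to the benchmark; undefined where a benchmark value is zero.
    return mean([abs(c - b) / abs(b) for c, b in zip(candidate, benchmark)])


def coefficient_of_determination(candidate, benchmark):
    benchmark_mean = mean(benchmark)
    ss_res = sum((b - c) ** 2 for c, b in zip(candidate, benchmark))
    ss_tot = sum((b - benchmark_mean) ** 2 for b in benchmark)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only; not taken from the example rasters.
benchmark = [1.0, 2.0, 3.0, 4.0]
good_candidate = [1.5, 2.0, 2.5, 5.0]
poor_candidate = [4.0, 3.0, 2.0, 1.0]

print(mean_absolute_error(good_candidate, benchmark))
print(mean_absolute_percentage_error(good_candidate, benchmark))
print(coefficient_of_determination(good_candidate, benchmark))
print(coefficient_of_determination(poor_candidate, benchmark))  # -3.0
```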

We can see the continuous agreement maps below:

[10]:

for ag_map in agreement_continuous_catalog['agreement_maps'].unique():
    rxr.open_rasterio(ag_map, mask_and_scale=True).gval.cont_plot(
        title=f'Agreement Map {int(ag_map.split("_")[-1][0]) + 1}'
    )