Multi-Class Categorical Comparisons
[1]:
import rioxarray as rxr
import gval
import numpy as np
import pandas as pd
import xarray as xr
from itertools import product
pd.set_option('display.max_columns', None)
Load Datasets
[2]:
candidate = rxr.open_rasterio(
"./candidate_map_multi_categorical.tif", mask_and_scale=True
)
benchmark = rxr.open_rasterio(
"./benchmark_map_multi_categorical.tif", mask_and_scale=True
)
depth_raster = rxr.open_rasterio(
"./candidate_raw_elevation_multi_categorical.tif", mask_and_scale=True
)
Homogenize Datasets and Make Agreement Map
Although one can call candidate.gval.categorical_compare to run the entire workflow in a single step, in this case homogenization and creation of an agreement map are done separately to show more options for multi-class comparisons. A sketch of the single-call path is shown below for reference.
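The exact keyword arguments and return values of categorical_compare should be checked against the gval documentation; the following is only a minimal, illustrative sketch assuming it returns the agreement map, crosstab, and metric table:
# Hypothetical single-call workflow: homogenization, agreement map, crosstab,
# and metrics are handled internally by categorical_compare
agreement, crosstab_df, metrics_df = candidate.gval.categorical_compare(
    benchmark,
    positive_categories=[1, 2, 3, 4, 5],
    negative_categories=None,
)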
Homogenize
[3]:
candidate_r, benchmark_r = candidate.gval.homogenize(benchmark)
depth_raster_r, arb = depth_raster.gval.homogenize(benchmark_r)
del arb
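As an optional sanity check, the homogenized rasters should now share the same CRS, resolution, and shape; rioxarray exposes all three:
# Optional check: homogenized candidate and benchmark are spatially aligned
assert candidate_r.rio.crs == benchmark_r.rio.crs
assert candidate_r.shape == benchmark_r.shape
print(candidate_r.rio.resolution(), benchmark_r.rio.resolution())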
Agreement Map
The following makes a pairing dictionary, which maps each combination of candidate and benchmark values to a unique value in the agreement map. In this case each agreement value is encoded as the concatenation of the two class values, e.g. 12 represents class 1 in the candidate and class 2 in the benchmark. Instead of building a pairing dictionary one can use the szudzik or cantor pairing functions to produce unique values for each combination of candidate and benchmark values.
[4]:
classes = [1, 2, 3, 4, 5]
pairing_dictionary = {(x, y): int(f'{x}{y}') for x, y in product(classes, repeat=2)}
# Showing the first 6 entries
print('\n'.join([f'{k}: {v}' for k,v in pairing_dictionary.items()][:6]))
(1, 1): 11
(1, 2): 12
(1, 3): 13
(1, 4): 14
(1, 5): 15
(2, 1): 21
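For comparison, the szudzik pairing function mentioned above maps any pair of non-negative integers to a single unique non-negative integer. A minimal pure-Python sketch of the standard formula (illustrative only, not gval's internal implementation):
def szudzik_pair(a, b):
    # Szudzik's elegant pairing: a unique value for every pair of non-negative integers
    return a * a + a + b if a >= b else a + b * b
# Each (candidate, benchmark) combination receives its own agreement value
print({(x, y): szudzik_pair(x, y) for x, y in product(classes, repeat=2)})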
The benchmark map has an extra class 0 which is effectively nodata, so it is not included in allow_benchmark_values in the call below.
[5]:
agreement_map = candidate_r.gval.compute_agreement_map(
benchmark_r,
nodata=255,
encode_nodata=True,
comparison_function="pairing_dict",
pairing_dict=pairing_dictionary,
allow_candidate_values=classes,
allow_benchmark_values=classes,
)
crosstab = agreement_map.gval.compute_crosstab()
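If the exact agreement encoding does not matter, the pairing dictionary can be skipped; a hedged sketch, assuming compute_agreement_map accepts the pairing function name for comparison_function:
# Alternative (illustrative): let gval derive agreement values with a built-in pairing function
agreement_map_szudzik = candidate_r.gval.compute_agreement_map(
    benchmark_r,
    nodata=255,
    encode_nodata=True,
    comparison_function="szudzik",
    allow_candidate_values=classes,
    allow_benchmark_values=classes,
)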
For memory purposes, the following plot shows only a small subset of the map:
[7]:
agreement_map.gval.cat_plot(
title='Agreement Map',
figsize=(8, 6),
colormap='tab20b'
)
[7]:
<matplotlib.collections.QuadMesh at 0x7f99a1e34b50>
Comparisons
For multi-categorical statistics GVAL offers four methods of averaging:
No Averaging, which provides one vs. all metrics on a per-class basis
Micro Averaging, which sums the contingencies of each class, with every class designated as either positive or negative
Macro Averaging, which computes one vs. all metrics for each class and then averages them
Weighted Averaging, which performs macro averaging with weights applied to each positive category
Using None for the average argument runs a one vs. all methodology for each class and reports metrics on a per-class basis:
[8]:
no_averaged_metrics = crosstab.gval.compute_categorical_metrics(
positive_categories=[1, 2, 3, 4, 5],
negative_categories=None,
average=None
)
no_averaged_metrics.transpose()
[8]:
metric | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
band | 1 | 1 | 1 | 1 | 1 |
positive_categories | 1 | 2 | 3 | 4 | 5 |
fn | 6.0 | 1043.0 | 318274.0 | 516572.0 | 364147.0 |
fp | 172762.0 | 561004.0 | 462496.0 | 3775.0 | 5.0 |
tn | 1043592.0 | 653360.0 | 422623.0 | 693617.0 | 852206.0 |
tp | 0.0 | 953.0 | 12967.0 | 2396.0 | 2.0 |
accuracy | 0.857963 | 0.537927 | 0.358109 | 0.57221 | 0.700622 |
balanced_accuracy | 0.428984 | 0.507741 | 0.258311 | 0.499602 | 0.5 |
critical_success_index | 0.0 | 0.001693 | 0.016337 | 0.004584 | 0.000005 |
equitable_threat_score | -0.000005 | 0.000055 | -0.175401 | -0.000455 | -0.0 |
f_score | 0.0 | 0.00338 | 0.032148 | 0.009125 | 0.000011 |
false_discovery_rate | 1.0 | 0.998304 | 0.972728 | 0.611732 | 0.714286 |
false_negative_rate | 1.0 | 0.522545 | 0.960853 | 0.995383 | 0.999995 |
false_omission_rate | 0.000006 | 0.001594 | 0.429579 | 0.426852 | 0.299376 |
false_positive_rate | 0.142033 | 0.461974 | 0.522524 | 0.005413 | 0.000006 |
fowlkes_mallows_index | 0.0 | 0.028455 | 0.032675 | 0.042339 | 0.001253 |
matthews_correlation_coefficient | -0.000904 | 0.001257 | -0.440983 | -0.005543 | -0.000072 |
negative_likelihood_ratio | 1.165546 | 0.971226 | 2.01236 | 1.000801 | 1.0 |
negative_predictive_value | 0.999994 | 0.998406 | 0.570421 | 0.573148 | 0.700624 |
overall_bias | 28793.666667 | 281.541583 | 1.435399 | 0.011891 | 0.000019 |
positive_likelihood_ratio | 0.0 | 1.033511 | 0.074919 | 0.852916 | 0.936112 |
positive_predictive_value | 0.0 | 0.001696 | 0.027272 | 0.388268 | 0.285714 |
prevalence | 0.000005 | 0.001641 | 0.272322 | 0.426657 | 0.299376 |
prevalence_threshold | 1.0 | 0.49588 | 0.785107 | 0.519876 | 0.508252 |
true_negative_rate | 0.857967 | 0.538026 | 0.477476 | 0.994587 | 0.999994 |
true_positive_rate | 0.0 | 0.477455 | 0.039147 | 0.004617 | 0.000005 |
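To see where these per-class numbers come from, the one vs. all contingencies for a single class can be reconstructed from the crosstab. A rough sketch, assuming the crosstab DataFrame carries candidate_values, benchmark_values, and counts columns (check crosstab.columns for the exact names):
# Illustrative one vs. all contingencies for class 3 (column names assumed)
cls = 3
cand_pos = crosstab['candidate_values'] == cls
bench_pos = crosstab['benchmark_values'] == cls
tp = crosstab.loc[cand_pos & bench_pos, 'counts'].sum()
fp = crosstab.loc[cand_pos & ~bench_pos, 'counts'].sum()
fn = crosstab.loc[~cand_pos & bench_pos, 'counts'].sum()
tn = crosstab.loc[~cand_pos & ~bench_pos, 'counts'].sum()
print(tp, fp, fn, tn)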
The following is an example of using micro averaging to combine classes into a two-class categorical comparison. In this example classes 1 and 2 are the positive classes and classes 3, 4, and 5 are the negative classes:
[9]:
micro_averaged_metrics = crosstab.gval.compute_categorical_metrics(
positive_categories=[1, 2],
negative_categories=[3, 4, 5],
average="micro"
)
micro_averaged_metrics.transpose()
[9]:
metric | 0 |
---|---|
band | 1 |
fn | 382.0 |
fp | 733099.0 |
tn | 481259.0 |
tp | 1620.0 |
accuracy | 0.396987 |
balanced_accuracy | 0.602749 |
critical_success_index | 0.002204 |
equitable_threat_score | 0.00056 |
f_score | 0.004398 |
false_discovery_rate | 0.997795 |
false_negative_rate | 0.190809 |
false_omission_rate | 0.000793 |
false_positive_rate | 0.603693 |
fowlkes_mallows_index | 0.04224 |
matthews_correlation_coefficient | 0.017033 |
negative_likelihood_ratio | 0.481468 |
negative_predictive_value | 0.999207 |
overall_bias | 366.992507 |
positive_likelihood_ratio | 1.340402 |
positive_predictive_value | 0.002205 |
prevalence | 0.001646 |
prevalence_threshold | 0.463444 |
true_negative_rate | 0.396307 |
true_positive_rate | 0.809191 |
The following shows macro averaging; for the shared columns its values are equivalent to no_averaged_metrics.mean():
[10]:
macro_averaged_metrics = crosstab.gval.compute_categorical_metrics(
positive_categories=classes,
negative_categories=None,
average="macro"
)
macro_averaged_metrics.transpose()
[10]:
metric | 0 |
---|---|
band | 1 |
accuracy | 0.605366 |
balanced_accuracy | 0.438927 |
critical_success_index | 0.004524 |
equitable_threat_score | -0.035161 |
f_score | 0.008933 |
false_discovery_rate | 0.85941 |
false_negative_rate | 0.895755 |
false_omission_rate | 0.231481 |
false_positive_rate | 0.22639 |
fowlkes_mallows_index | 0.020944 |
matthews_correlation_coefficient | -0.089249 |
negative_likelihood_ratio | 1.229986 |
negative_predictive_value | 0.768519 |
overall_bias | 5815.331112 |
positive_likelihood_ratio | 0.579492 |
positive_predictive_value | 0.14059 |
prevalence | 0.2 |
prevalence_threshold | 0.661823 |
true_negative_rate | 0.77361 |
true_positive_rate | 0.104245 |
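Because macro averaging is a plain mean of the per-class values, the table above should match the mean of the shared metric columns from the no-averaging run; a quick cross-check:
# Cross-check: macro average vs. mean of the per-class (one vs. all) metrics
shared = macro_averaged_metrics.columns.intersection(no_averaged_metrics.columns)
print(no_averaged_metrics[shared].mean(numeric_only=True))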
To further enhance macro averaging, we can apply weights to the classes of interest in order to change how strongly each class contributes to the evaluation. For instance, if we applied the vector [1, 4, 1, 5, 1] to the classes used in this notebook, classes 2 and 4 would have greater influence on the final average of the scores across all classes. (Weight values are relative to one another; e.g. the vector [5, 5, 5, 5, 5] would cause no change in the averaging because every weight is equal.) Let's use the first weight vector mentioned above for weighted averaging:
[11]:
weight_averaged_metrics = crosstab.gval.compute_categorical_metrics(
positive_categories=classes,
weights=[1, 4, 1, 5, 1],
negative_categories=None,
average="weighted"
)
weight_averaged_metrics.transpose()
[11]:
metric | 0 |
---|---|
band | 1 |
accuracy | 0.577454 |
balanced_accuracy | 0.476356 |
critical_success_index | 0.003836 |
equitable_threat_score | -0.014789 |
f_score | 0.007609 |
false_discovery_rate | 0.811574 |
false_negative_rate | 0.835662 |
false_omission_rate | 0.239133 |
false_positive_rate | 0.211627 |
fowlkes_mallows_index | 0.029953 |
matthews_correlation_coefficient | -0.03872 |
negative_likelihood_ratio | 1.088901 |
negative_predictive_value | 0.760867 |
overall_bias | 2493.443989 |
positive_likelihood_ratio | 0.784138 |
positive_predictive_value | 0.188426 |
prevalence | 0.225962 |
prevalence_threshold | 0.573022 |
true_negative_rate | 0.788373 |
true_positive_rate | 0.164338 |
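If weighted averaging is applied as a weight-scaled mean of the per-class values, a single metric in the table above (e.g. the critical success index) can be reproduced directly with numpy:
# Reproduce the weighted average of one metric from the per-class (one vs. all) values
weights = [1, 4, 1, 5, 1]
print(np.average(no_averaged_metrics['critical_success_index'], weights=weights))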
Regardless of the averaging methodology, the candidate shows poor agreement with the benchmark. We can now save the output.
Save Output
[12]:
# output file paths for the agreement map and metric table
agreement_file = 'multi_categorical_agreement_map.tif'
metric_file = 'macro_averaged_metric_file.csv'
agreement_map.rio.to_raster(agreement_file)
macro_averaged_metrics.to_csv(metric_file)
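To confirm the files were written correctly, they can be read back with the same libraries used above:
# Read the saved agreement map and metric table back in to verify the outputs
saved_agreement = rxr.open_rasterio(agreement_file, mask_and_scale=True)
saved_metrics = pd.read_csv(metric_file)
print(saved_agreement.shape, saved_metrics.shape)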