hydrotools.metrics.metrics module
Evaluation Metrics
Convenience methods for computing common evaluation metrics.
For a description of common evaluation metrics, see:
http://www.eumetrain.org/data/4/451/english/courses/msgcrs/index.htm
Functions
compute_contingency_table
probability_of_detection
probability_of_false_detection
probability_of_false_alarm
threat_score
frequency_bias
percent_correct
base_chance
equitable_threat_score
mean_error
nash_sutcliffe_efficiency
kling_gupta_efficiency
volumetric_efficiency
mean_squared_error
root_mean_squared_error
mean_error_skill_score
coefficient_of_persistence
coefficient_of_extrapolation
- hydrotools.metrics.metrics.base_chance(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive', false_negative_key: str = 'false_negative', true_negative_key: str = 'true_negative') → float
Compute base chance to hit (a_r). Base chance is the relative frequency of occurrences. In other words, this is the probability of scoring a “hit” or true positive by chance.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with the following keys: true_positive_key, false_positive_key, false_negative_key, true_negative_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
true_negative_key (str, optional, default 'true_negative') – Label to use for true negatives.
- Returns:
a_r – Base chance to hit by chance.
- Return type:
float
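As an illustration, the sketch below computes a_r with the conventional chance-hit formula used in the equitable threat score literature, (TP + FP)(TP + FN) / N, which is the expected number of hits under random chance. Treating this as what hydrotools returns is an assumption, and the helper name base_chance_sketch is hypothetical; it accepts any mapping, so a dict or a pandas.Series would both work.

```python
from typing import Mapping


def base_chance_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: expected hits under random chance, (TP + FP)(TP + FN) / N."""
    tp = table["true_positive"]
    fp = table["false_positive"]
    fn = table["false_negative"]
    tn = table["true_negative"]
    total = tp + fp + fn + tn
    # simulated occurrences x observed occurrences / total number of cases
    return (tp + fp) * (tp + fn) / total


# Counts from the compute_contingency_table example in this module
table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(base_chance_sketch(table))  # 1.5
```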
- hydrotools.metrics.metrics.coefficient_of_extrapolation(y_true: ArrayLike, y_pred: ArrayLike, log: bool = False, power: float = 2.0, normalized: bool = False) → float
Compute the coefficient of extrapolation (Kitanidis & Bras, 1980). The coefficient of extrapolation compares the model output to a linear extrapolation of the two most recent observations, assuming the trend between these values will continue. In other words, the coefficient of extrapolation is a skill score with baseline values $y_{b,i} = y_{o,i-1} + (y_{o,i-1} - y_{o,i-2})$, where $y_{o,i}$ is the ith observed value.
The coefficient of extrapolation ranges from -inf to 1.0, higher is better. A score of 0.0 indicates skill no better than assuming the difference between the last two observations will persist. A perfect score is 1.0.
- Parameters:
y_true (array-like of shape (n_samples,), required) – Ground truth (correct) target values, also called observations, measurements, or observed values.
y_pred (array-like of shape (n_samples,), required) – Estimated target values, also called simulations or modeled values.
log (bool, default False) – Apply numpy.log (natural logarithm) to y_true and y_pred before computing the score.
power (float, default 2.0) – Exponent for each mean error summation value.
normalized (bool, default False) – When True, normalize the final score using the method from Nossent & Bauwens, 2012.
- Returns:
score – Coefficient of extrapolation.
- Return type:
float
See also
mean_error_skill_score
Generic method for computing model skill.
References
- Kitanidis, P. K., & Bras, R. L. (1980). Real-time forecasting with a conceptual hydrologic model: 2. Applications and results. Water Resources Research, 16(6), 1034-1044.
- Nossent, J., & Bauwens, W. (2012, April). Application of a normalized Nash-Sutcliffe efficiency to improve the accuracy of the Sobol’ sensitivity analysis of a hydrological model. In EGU General Assembly Conference Abstracts (p. 237).
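The baseline construction can be sketched in plain Python as follows. This is a minimal sketch, assuming the first two values are dropped (they have no two predecessors) and the skill score uses the default power of 2.0; how hydrotools actually aligns the arrays, and the helper name, are assumptions.

```python
from typing import Sequence


def coefficient_of_extrapolation_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], power: float = 2.0
) -> float:
    """Hypothetical helper: skill relative to a linear extrapolation of observations."""
    if len(y_true) != len(y_pred) or len(y_true) < 3:
        raise ValueError("need two equal-length series with at least 3 values")
    # Baseline y_b,i = y_o,i-1 + (y_o,i-1 - y_o,i-2); scoring starts at the third value
    base = [2.0 * y_true[i - 1] - y_true[i - 2] for i in range(2, len(y_true))]
    obs, sim = y_true[2:], y_pred[2:]
    model_error = sum(abs(s - o) ** power for s, o in zip(sim, obs))
    base_error = sum(abs(b - o) ** power for b, o in zip(base, obs))
    return 1.0 - model_error / base_error


observed = [1.0, 2.0, 4.0, 7.0, 11.0]
print(coefficient_of_extrapolation_sketch(observed, observed))  # 1.0
```

A simulation that reproduces the extrapolated baseline exactly (here [1.0, 2.0, 3.0, 6.0, 10.0]) scores 0.0.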
- hydrotools.metrics.metrics.coefficient_of_persistence(y_true: ArrayLike, y_pred: ArrayLike, lag: int = 1, log: bool = False, power: float = 2.0, normalized: bool = False) → float
Compute the coefficient of persistence (Kitanidis & Bras, 1980). The coefficient of persistence compares the model to a recent observation, given some lag. This score assesses the model’s skill compared to assuming a previous observation does not change (persists).
In the default case (lag=1), the ith modeled value is compared to the (i-1)th observed value. With the default power of 2.0, the result is the mean squared error skill score using the lagged observed values as a baseline. The coefficient of persistence ranges from -inf to 1.0, higher is better. A score of 0.0 indicates skill no better than assuming the last observation would persist. A perfect score is 1.0.
- Parameters:
y_true (array-like of shape (n_samples,), required) – Ground truth (correct) target values, also called observations, measurements, or observed values.
y_pred (array-like of shape (n_samples,), required) – Estimated target values, also called simulations or modeled values.
lag (int, default 1) – Number of values by which to lag the baseline.
log (bool, default False) – Apply numpy.log (natural logarithm) to y_true and y_pred before computing the score.
power (float, default 2.0) – Exponent for each mean error summation value.
normalized (bool, default False) – When True, normalize the final score using the method from Nossent & Bauwens, 2012.
- Returns:
score – Coefficient of persistence.
- Return type:
float
See also
mean_error_skill_score
Generic method for computing model skill.
References
- Kitanidis, P. K., & Bras, R. L. (1980). Real-time forecasting with a conceptual hydrologic model: 2. Applications and results. Water Resources Research, 16(6), 1034-1044.
- Nossent, J., & Bauwens, W. (2012, April). Application of a normalized Nash-Sutcliffe efficiency to improve the accuracy of the Sobol’ sensitivity analysis of a hydrological model. In EGU General Assembly Conference Abstracts (p. 237).
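A minimal sketch of the persistence baseline, assuming the first `lag` values are dropped so that each remaining observation is paired with the observation `lag` steps earlier. The helper name and the alignment are illustrative assumptions, not the hydrotools implementation.

```python
from typing import Sequence


def coefficient_of_persistence_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], lag: int = 1, power: float = 2.0
) -> float:
    """Hypothetical helper: skill relative to a lagged-observation (persistence) baseline."""
    if lag < 1 or len(y_true) != len(y_pred) or len(y_true) <= lag:
        raise ValueError("need lag >= 1 and equal-length series longer than lag")
    obs, sim = y_true[lag:], y_pred[lag:]
    base = y_true[:-lag]  # observation from `lag` steps earlier
    model_error = sum(abs(s - o) ** power for s, o in zip(sim, obs))
    base_error = sum(abs(b - o) ** power for b, o in zip(base, obs))
    return 1.0 - model_error / base_error


observed = [1.0, 3.0, 2.0, 5.0]
persistence = [0.0, 1.0, 3.0, 2.0]  # each value equals the previous observation
print(coefficient_of_persistence_sketch(observed, persistence))  # 0.0
```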
- hydrotools.metrics.metrics.compute_contingency_table(observed: ArrayLike, simulated: ArrayLike, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive', false_negative_key: str = 'false_negative', true_negative_key: str = 'true_negative') → Series
Compute components of a contingency table required for the evaluation of categorical forecasts and simulations. Returns a pandas.Series indexed by table component. ‘true_positive’ indicates the number of times the simulation correctly indicated True according to the observations. ‘false_positive’ indicates the number of times the simulation incorrectly indicated True according to the observations. ‘false_negative’ indicates the number of times the simulation incorrectly indicated False according to the observations. ‘true_negative’ indicates the number of times the simulation correctly indicated False according to the observations.
- Parameters:
observed (array-like, required) – Array-like of boolean values indicating observed occurrences
simulated (array-like, required) – Array-like of boolean values indicating simulated occurrences
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
true_negative_key (str, optional, default 'true_negative') – Label to use for true negatives.
- Returns:
contingency_table –
- pandas.Series of integer values keyed to pandas.Index([true_positive_key, false_positive_key, false_negative_key, true_negative_key])
- Return type:
pandas.Series
Examples
>>> obs = [True, True, True, False, False, False, False, False, True, True]
>>> sim = [True, True, False, False, False, False, True, False, False, False]
>>> metrics.compute_contingency_table(obs, sim)
true_positive     2
false_positive    1
false_negative    3
true_negative     4
dtype: int64
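The counting rule behind the table can be sketched in plain Python. This hypothetical helper returns a dict rather than a pandas.Series and uses the default key names; it illustrates the definitions above and is not the hydrotools implementation.

```python
from typing import Dict, Sequence


def contingency_counts(observed: Sequence[bool], simulated: Sequence[bool]) -> Dict[str, int]:
    """Hypothetical helper: tally the four contingency-table components."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    counts = {"true_positive": 0, "false_positive": 0, "false_negative": 0, "true_negative": 0}
    for obs, sim in zip(observed, simulated):
        if obs and sim:
            counts["true_positive"] += 1
        elif sim:  # simulated an occurrence that was not observed
            counts["false_positive"] += 1
        elif obs:  # missed an observed occurrence
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


obs = [True, True, True, False, False, False, False, False, True, True]
sim = [True, True, False, False, False, False, True, False, False, False]
print(contingency_counts(obs, sim))
# {'true_positive': 2, 'false_positive': 1, 'false_negative': 3, 'true_negative': 4}
```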
- hydrotools.metrics.metrics.equitable_threat_score(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive', false_negative_key: str = 'false_negative', true_negative_key: str = 'true_negative') → float
Compute equitable threat score (ETS). Threat score/Critical Success Index tends to yield lower scores for rare events. ETS computes a threat score, but accounts for the relative frequency of scoring a true positive by chance.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with the following keys: true_positive_key, false_positive_key, false_negative_key, true_negative_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
true_negative_key (str, optional, default 'true_negative') – Label to use for true negatives.
- Returns:
ETS – Equitable threat score.
- Return type:
float
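A minimal sketch of ETS, assuming the standard form (TP - a_r) / (TP + FP + FN - a_r) with a_r = (TP + FP)(TP + FN) / N as the chance-hit term. The helper name is hypothetical and the formula is the textbook one, not confirmed against the hydrotools source.

```python
from typing import Mapping


def equitable_threat_score_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: threat score corrected for hits expected by chance."""
    tp = table["true_positive"]
    fp = table["false_positive"]
    fn = table["false_negative"]
    tn = table["true_negative"]
    # Hits expected by random chance (base chance, a_r)
    a_r = (tp + fp) * (tp + fn) / (tp + fp + fn + tn)
    # Remove chance hits from both numerator and denominator of the threat score
    return (tp - a_r) / (tp + fp + fn - a_r)


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(equitable_threat_score_sketch(table))
```

For these counts a_r is 1.5, so ETS = 0.5 / 4.5, roughly 0.111, compared with a plain threat score of 2 / 6.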
- hydrotools.metrics.metrics.frequency_bias(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive', false_negative_key: str = 'false_negative') → float
Compute frequency bias (FBI). FBI measures the tendency of the simulation or forecast to over- or under-predict occurrences. FBI ranges from 0.0 to inf. A perfect score is 1.0. Values less than 1.0 indicate under-prediction. Values greater than 1.0 indicate over-prediction.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with at least the following keys: true_positive_key, false_positive_key, false_negative_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
- Returns:
FBI – Frequency bias.
- Return type:
float
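FBI compares the number of simulated occurrences (TP + FP) to the number of observed occurrences (TP + FN). A minimal sketch, assuming the default key names; the helper name is hypothetical.

```python
from typing import Mapping


def frequency_bias_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: FBI = (TP + FP) / (TP + FN)."""
    simulated_events = table["true_positive"] + table["false_positive"]
    observed_events = table["true_positive"] + table["false_negative"]
    # > 1.0 means over-prediction, < 1.0 means under-prediction
    return simulated_events / observed_events


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(frequency_bias_sketch(table))  # 0.6 (under-prediction)
```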
- hydrotools.metrics.metrics.kling_gupta_efficiency(y_true: ArrayLike, y_pred: ArrayLike, r_scale: float = 1.0, a_scale: float = 1.0, b_scale: float = 1.0) → float
Compute the Kling-Gupta model efficiency coefficient (KGE). The KGE is a summary metric that combines the relative mean, relative variability, and linear correlation between observed and simulated values. The final metric is computed using the root sum of squares with optional scaling factors, similar to computing distance in a 3-dimensional Euclidean space.
- Parameters:
y_true (array-like of shape (n_samples,), required) – Ground truth (correct) target values, also called observations, measurements, or observed values.
y_pred (array-like of shape (n_samples,), required) – Estimated target values, also called simulations or modeled values.
r_scale (float, optional, default 1.0) – Linear correlation (r) scaling factor. Used to re-scale the Euclidean space by emphasizing different KGE components.
a_scale (float, optional, default 1.0) – Relative variability (alpha) scaling factor. Used to re-scale the Euclidean space by emphasizing different KGE components.
b_scale (float, optional, default 1.0) – Relative mean (beta) scaling factor. Used to re-scale the Euclidean space by emphasizing different KGE components.
- Returns:
score – Kling-Gupta efficiency.
- Return type:
float
References
- Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1-2), 80-91. https://doi.org/10.1016/j.jhydrol.2009.08.003
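The Euclidean-distance construction can be sketched with the standard library. This assumes KGE = 1 - sqrt((s_r(r - 1))^2 + (s_a(alpha - 1))^2 + (s_b(beta - 1))^2), with alpha the ratio of standard deviations and beta the ratio of means, following Gupta et al. (2009); whether hydrotools applies the scale factors in exactly this way is an assumption, and the helper name is hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def kling_gupta_efficiency_sketch(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    r_scale: float = 1.0,
    a_scale: float = 1.0,
    b_scale: float = 1.0,
) -> float:
    """Hypothetical helper: 1 minus the scaled distance from the ideal point (1, 1, 1)."""
    mean_obs, mean_sim = fmean(y_true), fmean(y_pred)
    std_obs, std_sim = pstdev(y_true), pstdev(y_pred)
    # Pearson correlation coefficient (population form)
    cov = fmean([(o - mean_obs) * (s - mean_sim) for o, s in zip(y_true, y_pred)])
    r = cov / (std_obs * std_sim)
    alpha = std_sim / std_obs  # relative variability
    beta = mean_sim / mean_obs  # relative mean
    distance = math.sqrt(
        (r_scale * (r - 1.0)) ** 2
        + (a_scale * (alpha - 1.0)) ** 2
        + (b_scale * (beta - 1.0)) ** 2
    )
    return 1.0 - distance


observed = [1.0, 2.0, 4.0, 3.0]
doubled = [2.0 * v for v in observed]  # r = 1, alpha = 2, beta = 2
print(kling_gupta_efficiency_sketch(observed, doubled))  # about 1 - sqrt(2)
```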
- hydrotools.metrics.metrics.mean_error(y_true: ArrayLike, y_pred: ArrayLike, power: float = 1.0, root: bool = False) → float
Compute the mean error or deviation. Default is Mean Absolute Error. The mean error is given by:
$$ME = \frac{1}{n}\sum_{i=1}^{n}\left| y_{s,i} - y_{o,i} \right|^{p}$$
Where $n$ is the length of each array, $y_{s,i}$ is the ith simulated or predicted value, $y_{o,i}$ is the ith observed or true value, and $p$ is the exponent.
- Parameters:
y_true (array-like of shape (n_samples,), required) – Ground truth (correct) target values, also called observations, measurements, or observed values.
y_pred (array-like of shape (n_samples,), required) – Estimated target values, also called simulations or modeled values.
power (float, default 1.0) – Exponent for each mean error summation value.
root (bool, default False) – When True, return the root mean error.
- Returns:
mean_error – Mean error or root mean error.
- Return type:
float
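The formula above translates directly into plain Python. This is a sketch with a hypothetical helper name; it assumes that root=True takes the p-th root of the mean error (so power=2.0 with root=True yields the RMSE), which the description implies but does not state.

```python
from typing import Sequence


def mean_error_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], power: float = 1.0, root: bool = False
) -> float:
    """Hypothetical helper: ME = (1/n) * sum(|y_s - y_o| ** p)."""
    n = len(y_true)
    me = sum(abs(s - o) ** power for o, s in zip(y_true, y_pred)) / n
    # Assumption: root=True takes the p-th root, so power=2 gives the RMSE
    return me ** (1.0 / power) if root else me


observed = [1.0, 2.0, 3.0]
simulated = [2.0, 2.0, 5.0]
print(mean_error_sketch(observed, simulated))  # 1.0 (mean absolute error)
```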
- hydrotools.metrics.metrics.mean_error_skill_score(y_true: ArrayLike, y_pred: ArrayLike, y_base: ArrayLike, power: float = 1.0, normalized: bool = False) → float
Compute a generic mean error based model skill score. The mean error skill score is given by:
$$MESS = 1 - \frac{\sum_{i=1}^{n}\left| y_{s,i} - y_{o,i} \right|^{p}}{\sum_{i=1}^{n}\left| y_{b,i} - y_{o,i} \right|^{p}}$$
Where $n$ is the length of each array, $y_{s,i}$ is the ith simulated or predicted value, $y_{b,i}$ is the ith baseline value, $y_{o,i}$ is the ith observed or true value, and $p$ is the exponent.
- Parameters:
y_true (array-like of shape (n_samples,), required) – Ground truth (correct) target values, also called observations, measurements, or observed values.
y_pred (array-like of shape (n_samples,), required) – Estimated target values, also called simulations or modeled values.
y_base (array-like of shape (n_samples,), required) – Baseline value(s) against which to assess skill of y_pred.
power (float, default 1.0) – Exponent for each mean error summation value.
normalized (bool, default False) – When True, normalize the final skill score using the method from Nossent & Bauwens, 2012.
- Returns:
score – Skill score of y_pred relative to y_base.
- Return type:
float
References
- Nossent, J., & Bauwens, W. (2012, April). Application of a normalized Nash-Sutcliffe efficiency to improve the accuracy of the Sobol’ sensitivity analysis of a hydrological model. In EGU General Assembly Conference Abstracts (p. 237).
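A minimal sketch of the skill score and its normalization. It assumes the Nossent & Bauwens (2012) normalization is 1 / (2 - score), which maps (-inf, 1.0] onto (0.0, 1.0] and sends a score of 0.0 to 0.5; the helper name is hypothetical.

```python
from typing import Sequence


def mean_error_skill_score_sketch(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_base: Sequence[float],
    power: float = 1.0,
    normalized: bool = False,
) -> float:
    """Hypothetical helper: 1 - model error / baseline error."""
    model_error = sum(abs(s - o) ** power for o, s in zip(y_true, y_pred))
    base_error = sum(abs(b - o) ** power for o, b in zip(y_true, y_base))
    score = 1.0 - model_error / base_error
    # Assumed normalization (Nossent & Bauwens, 2012): 1 / (2 - score)
    return 1.0 / (2.0 - score) if normalized else score


observed = [1.0, 2.0, 3.0]
print(mean_error_skill_score_sketch(observed, [1.0, 2.0, 4.0], [2.0, 2.0, 2.0]))  # 0.5
```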
- hydrotools.metrics.metrics.mean_squared_error(y_true: ArrayLike, y_pred: ArrayLike, *, power: float = 2.0, root: bool = False) → float
Partial of hydrotools.metrics.mean_error with a default power value of 2.0 and root set to False.
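The “partial” wording suggests functools.partial. A sketch of that pattern follows; the local mean_error is a stand-in, not the hydrotools function. Keyword arguments bound by functools.partial are reported as keyword-only by inspect.signature, which would be consistent with the `*` in the signatures of mean_squared_error and root_mean_squared_error.

```python
from functools import partial
from typing import Sequence


def mean_error(
    y_true: Sequence[float], y_pred: Sequence[float], power: float = 1.0, root: bool = False
) -> float:
    """Stand-in for hydrotools' mean_error (sketch only)."""
    me = sum(abs(s - o) ** power for o, s in zip(y_true, y_pred)) / len(y_true)
    # Assumption: root=True takes the p-th root of the mean error
    return me ** (1.0 / power) if root else me


# Pre-bind keyword arguments; callers can still override them by keyword
mean_squared_error = partial(mean_error, power=2.0, root=False)
root_mean_squared_error = partial(mean_error, power=2.0, root=True)

print(mean_squared_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
print(root_mean_squared_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
```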
See also
mean_error
- hydrotools.metrics.metrics.nash_sutcliffe_efficiency(y_true: ArrayLike, y_pred: ArrayLike, log: bool = False, power: float = 2.0, normalized: bool = False) → float
Compute the Nash-Sutcliffe model efficiency coefficient (NSE), also called the mean squared error skill score or the R^2 (coefficient of determination) regression score. The NSE compares model errors to observed variance. The default NSE ranges from -inf to 1.0, higher is better. A score of 0.0 indicates the model is as good a predictor as the mean of observations. A score of 1.0 indicates the model exactly matches the observations.
The “normalized” Nash-Sutcliffe model efficiency re-scales the NSE to a range from 0.0 to 1.0. In this case, a score of 0.5 indicates the model is as good a predictor as the mean of observations. A score of 1.0 still indicates the model exactly matches the observations.
- Parameters:
y_true (array-like of shape (n_samples,), required) – Ground truth (correct) target values, also called observations, measurements, or observed values.
y_pred (array-like of shape (n_samples,), required) – Estimated target values, also called simulations or modeled values.
log (bool, default False) – Apply numpy.log (natural logarithm) to y_true and y_pred before computing the NSE.
power (float, default 2.0) – Exponent for each mean error summation value.
normalized (bool, default False) – When True, normalize the final NSE value using the method from Nossent & Bauwens, 2012.
- Returns:
score – Nash-Sutcliffe model efficiency coefficient
- Return type:
float
References
- Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of Hydrology, 10(3), 282-290.
- Nossent, J., & Bauwens, W. (2012, April). Application of a normalized Nash-Sutcliffe efficiency to improve the accuracy of the Sobol’ sensitivity analysis of a hydrological model. In EGU General Assembly Conference Abstracts (p. 237).
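Viewed as a skill score, NSE uses the mean of the observations as the baseline. The sketch below assumes that framing, applies the natural log before scoring when log=True, and assumes the 1 / (2 - NSE) normalization of Nossent & Bauwens (2012); the helper name is hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def nash_sutcliffe_efficiency_sketch(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    log: bool = False,
    power: float = 2.0,
    normalized: bool = False,
) -> float:
    """Hypothetical helper: skill relative to the mean of the observations."""
    if log:
        # Natural log reduces the weight of large values; inputs must be positive
        y_true = [math.log(v) for v in y_true]
        y_pred = [math.log(v) for v in y_pred]
    mean_obs = fmean(y_true)
    model_error = sum(abs(s - o) ** power for o, s in zip(y_true, y_pred))
    base_error = sum(abs(mean_obs - o) ** power for o in y_true)
    score = 1.0 - model_error / base_error
    # Assumed normalization: NNSE = 1 / (2 - NSE), which maps 0.0 to 0.5
    return 1.0 / (2.0 - score) if normalized else score


observed = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe_efficiency_sketch(observed, [2.5] * 4))  # 0.0
print(nash_sutcliffe_efficiency_sketch(observed, [2.5] * 4, normalized=True))  # 0.5
```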
- hydrotools.metrics.metrics.percent_correct(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive', false_negative_key: str = 'false_negative', true_negative_key: str = 'true_negative') → float
Compute percent correct (PC). PC is the sum of true positives and true negatives divided by the total number of observations, that is, the proportion of correctly predicted occurrences and non-occurrences. PC ranges from 0.0 to 1.0, higher is better.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with the following keys: true_positive_key, false_positive_key, false_negative_key, true_negative_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
true_negative_key (str, optional, default 'true_negative') – Label to use for true negatives.
- Returns:
PC – Percent correct.
- Return type:
float
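A minimal sketch of PC, assuming the default key names; the helper name is hypothetical.

```python
from typing import Mapping


def percent_correct_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: PC = (TP + TN) / (TP + FP + FN + TN)."""
    correct = table["true_positive"] + table["true_negative"]
    total = correct + table["false_positive"] + table["false_negative"]
    return correct / total


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(percent_correct_sketch(table))  # 0.6
```

Because true negatives count toward PC, a model that never predicts a rare event can still score highly, which is why PC is usually read alongside POD or the threat score.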
- hydrotools.metrics.metrics.probability_of_detection(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_negative_key: str = 'false_negative') → float
Compute probability of detection (POD), also called the “hit rate”. POD is the ratio of true positives to the number of observed occurrences (true positives plus false negatives). POD ranges from 0.0 to 1.0, higher is better. Note that this statistic is easy to “hedge” if the model always indicates occurrence. This statistic should be considered alongside some metric of false positives, like probability of false alarm or threat score.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with at least the following keys: true_positive_key, false_negative_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
- Returns:
POD – Probability of detection.
- Return type:
float
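A minimal sketch of POD, assuming the default key names; the helper name is hypothetical. The second call shows the hedging problem described above: predicting an occurrence every time yields a perfect POD despite many false positives.

```python
from typing import Mapping


def probability_of_detection_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: POD = TP / (TP + FN)."""
    return table["true_positive"] / (table["true_positive"] + table["false_negative"])


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(probability_of_detection_sketch(table))  # 0.4

# Always predicting an occurrence: no misses, so POD = 1.0 regardless of false positives
always_yes = {"true_positive": 5, "false_positive": 5, "false_negative": 0, "true_negative": 0}
print(probability_of_detection_sketch(always_yes))  # 1.0
```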
- hydrotools.metrics.metrics.probability_of_false_alarm(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive') → float
Compute probability of false alarm/false alarm ratio (POFA/FARatio). POFA indicates the fraction of simulated occurrences (true positives plus false positives) that were false alarms. POFA ranges from 0.0 to 1.0, lower is better. The complement of POFA (1.0 - POFA) is the ‘post-agreement’ (PAG).
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with at least the following keys: true_positive_key, false_positive_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
- Returns:
POFA – Probability of false alarm.
- Return type:
float
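A minimal sketch of POFA and its complement, the post-agreement, assuming the default key names; the helper name is hypothetical.

```python
from typing import Mapping


def probability_of_false_alarm_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: POFA = FP / (TP + FP)."""
    return table["false_positive"] / (table["true_positive"] + table["false_positive"])


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
pofa = probability_of_false_alarm_sketch(table)
post_agreement = 1.0 - pofa  # PAG: fraction of simulated occurrences that were observed
print(pofa, post_agreement)  # about 0.333 and 0.667
```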
- hydrotools.metrics.metrics.probability_of_false_detection(contingency_table: dict | DataFrame | Series, false_positive_key: str = 'false_positive', true_negative_key: str = 'true_negative') → float
Compute probability of false detection/false alarm rate (POFD/FARate). POFD indicates the fraction of observed non-occurrences (false positives plus true negatives) that were false alarms. POFD ranges from 0.0 to 1.0, lower is better.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with at least the following keys: false_positive_key, true_negative_key; and int or float values
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
true_negative_key (str, optional, default 'true_negative') – Label to use for true negatives.
- Returns:
POFD – Probability of false detection.
- Return type:
float
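POFD differs from POFA only in its denominator: it conditions on observed non-occurrences rather than on simulated occurrences. A minimal sketch, assuming the default key names; the helper name is hypothetical.

```python
from typing import Mapping


def probability_of_false_detection_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: POFD = FP / (FP + TN)."""
    return table["false_positive"] / (table["false_positive"] + table["true_negative"])


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(probability_of_false_detection_sketch(table))  # 0.2
```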
- hydrotools.metrics.metrics.root_mean_squared_error(y_true: ArrayLike, y_pred: ArrayLike, *, power: float = 2.0, root: bool = True) → float
Partial of hydrotools.metrics.mean_error with a default power value of 2.0 and root set to True.
See also
mean_error
- hydrotools.metrics.metrics.threat_score(contingency_table: dict | DataFrame | Series, true_positive_key: str = 'true_positive', false_positive_key: str = 'false_positive', false_negative_key: str = 'false_negative') → float
Compute threat score/critical success index (TS/CSI). CSI is the ratio of true positives to the sum of true positives, false positives, and false negatives. CSI ranges from 0.0 to 1.0, higher is better. CSI is sensitive to event frequency and tends to be lower for rare events; in such cases, the equitable threat score may be more suitable.
- Parameters:
contingency_table (dict, pandas.DataFrame, or pandas.Series, required) – Contingency table containing key-value pairs with at least the following keys: true_positive_key, false_positive_key, false_negative_key; and int or float values
true_positive_key (str, optional, default 'true_positive') – Label to use for true positives.
false_positive_key (str, optional, default 'false_positive') – Label to use for false positives.
false_negative_key (str, optional, default 'false_negative') – Label to use for false negatives.
- Returns:
TS – Threat score.
- Return type:
float
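A minimal sketch of the threat score, assuming the default key names; the helper name is hypothetical. True negatives are ignored entirely, which is what distinguishes CSI from percent correct.

```python
from typing import Mapping


def threat_score_sketch(table: Mapping[str, float]) -> float:
    """Hypothetical helper: CSI = TP / (TP + FP + FN)."""
    tp = table["true_positive"]
    return tp / (tp + table["false_positive"] + table["false_negative"])


table = {"true_positive": 2, "false_positive": 1, "false_negative": 3, "true_negative": 4}
print(threat_score_sketch(table))  # about 0.333
```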
- hydrotools.metrics.metrics.volumetric_efficiency(y_true: ArrayLike, y_pred: ArrayLike, *, y_base: ArrayLike = 0.0, power: float = 1.0, normalized: bool = False) → float
Partial of hydrotools.metrics.mean_error_skill_score with a default y_base of 0.0 and a power value of 1.0. Volumetric efficiency ranges from -inf to 1.0, higher is better. According to the authors, volumetric efficiency indicates the “portion of water that arrives on time.” Note that large over-predictions result in deeply negative values.
See also
mean_error_skill_score
References
- Criss, R. E., & Winston, W. E. (2008). Do Nash values have value? Discussion and alternate proposals. Hydrological Processes: An International Journal, 22(14), 2723-2725.
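With y_base = 0.0 and power = 1.0, the mean error skill score reduces to VE = 1 - sum|y_s - y_o| / sum|y_o|. The sketch below assumes that reduction and uses a hypothetical helper name; the second call illustrates how a large over-prediction drives the score deeply negative.

```python
from typing import Sequence


def volumetric_efficiency_sketch(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical helper: VE = 1 - sum|y_s - y_o| / sum|y_o|."""
    # Mean error skill score with power 1 and a baseline of 0.0 for every value
    model_error = sum(abs(s - o) for o, s in zip(y_true, y_pred))
    base_error = sum(abs(0.0 - o) for o in y_true)
    return 1.0 - model_error / base_error


observed = [2.0, 4.0, 4.0]
print(volumetric_efficiency_sketch(observed, [3.0, 4.0, 2.0]))  # 0.7
print(volumetric_efficiency_sketch(observed, [20.0, 40.0, 40.0]))  # -8.0
```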