Loading Datasets


Functions to load or create datasets

gval.utils.loading_datasets.adjust_memory_strategy(strategy: str)

Tells GVAL how to manage memory. Three modes are currently available:

  • normal: Keeps all xarray files in memory as usual.

  • moderate: Either creates cloud optimized GeoTIFFs, stores them as temporary files, and reloads them, or reloads the file so that it is in a lazily loaded state.

  • aggressive: Does the same as moderate, except that it loads with no cache, so everything is read from disk.

Strategies that conserve memory come at a cost in performance, so adjust only as needed.

Parameters:

strategy (str, {'normal', 'moderate', 'aggressive'}) – Method to conserve memory

Raises:

ValueError
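
A minimal sketch of guarding a call to adjust_memory_strategy, assuming the three documented mode names are the only accepted values. The names VALID_STRATEGIES, checked_strategy, and apply_strategy are hypothetical helpers written for this example and are not part of GVAL; the check only mirrors the documented parameter values.

```python
# Mode names taken from the parameter description above.
VALID_STRATEGIES = ("normal", "moderate", "aggressive")


def checked_strategy(strategy: str) -> str:
    """Hypothetical helper: return the name if it is a documented mode."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"unknown memory strategy {strategy!r}; "
            f"expected one of {VALID_STRATEGIES}"
        )
    return strategy


def apply_strategy(strategy: str) -> None:
    """Hypothetical helper: validate first, then hand the name to GVAL."""
    name = checked_strategy(strategy)
    # Imported lazily so the validation helper works without gval installed.
    from gval.utils.loading_datasets import adjust_memory_strategy

    adjust_memory_strategy(name)
```

With gval installed, apply_strategy("moderate") would switch to the moderate mode, while a misspelled name fails in the helper before GVAL is touched.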

gval.utils.loading_datasets.get_current_memory_strategy() → str

Gets the current memory strategy.

Returns:

Memory optimization strategy

Return type:

str
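
Because the getter returns the active strategy as a string, it can be paired with adjust_memory_strategy to switch modes temporarily and restore the previous one afterwards. The sketch below is one possible pattern, not something GVAL provides: temporary_memory_strategy is a hypothetical name, and the getter and setter are passed in as arguments so the pattern can be tried without gval installed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_memory_strategy(strategy, get_strategy, set_strategy):
    """Hypothetical helper: switch strategy inside a with-block, then restore.

    get_strategy and set_strategy are expected to behave like
    get_current_memory_strategy and adjust_memory_strategy.
    """
    previous = get_strategy()  # remember the active mode
    set_strategy(strategy)
    try:
        yield
    finally:
        # Restore the earlier mode even if the block raised.
        set_strategy(previous)


# Stand-in state so the sketch runs on its own; with gval installed,
# pass the two library functions instead of these lambdas.
_state = {"strategy": "normal"}
with temporary_memory_strategy(
    "aggressive",
    get_strategy=lambda: _state["strategy"],
    set_strategy=lambda name: _state.update(strategy=name),
):
    inside = _state["strategy"]
after = _state["strategy"]
```

With the library available, the call would look like temporary_memory_strategy("aggressive", get_current_memory_strategy, adjust_memory_strategy), which confines the slower, memory-saving mode to the block that needs it.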

gval.utils.loading_datasets.stac_to_df(stac_items: ItemCollection, assets: list | None = None, attribute_allow_list: list | None = None, attribute_block_list: list | None = None) → DataFrame

Convert STAC Items into a DataFrame

Parameters:
  • stac_items (ItemCollection) – STAC Item Collection returned from pystac client

  • assets (list, default = None) – Assets to keep (all are kept if None)

  • attribute_allow_list (list, default = None) – List of columns to allow in the result DataFrame

  • attribute_block_list (list, default = None) – List of columns to remove in the result DataFrame

Returns:

A DataFrame with rows for each unique item/asset combination

Return type:

pd.DataFrame

Raises:
  • ValueError – Allow and block lists should be mutually exclusive

  • ValueError – No entries in DataFrame due to nonexistent asset

  • ValueError – There are no assets in this query to run a catalog comparison
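
The documentation states that the allow and block lists should be mutually exclusive but does not spell out exactly what that covers. The sketch below takes the strictest reading and refuses to pass both at once, so the conflict surfaces before any STAC data is processed. load_catalog_frame and its keep_columns and drop_columns parameters are hypothetical names for this example; only stac_to_df and its keyword arguments come from the signature above.

```python
def load_catalog_frame(stac_items, assets=None, keep_columns=None, drop_columns=None):
    """Hypothetical wrapper: call stac_to_df with at most one column filter."""
    # Strictest reading of "mutually exclusive": never supply both lists.
    if keep_columns is not None and drop_columns is not None:
        raise ValueError("pass keep_columns or drop_columns, not both")

    # Imported lazily so the argument check runs without gval installed.
    from gval.utils.loading_datasets import stac_to_df

    return stac_to_df(
        stac_items=stac_items,
        assets=assets,
        attribute_allow_list=keep_columns,
        attribute_block_list=drop_columns,
    )
```

Here stac_items would be the ItemCollection returned from a pystac client search. The other two documented ValueError cases (a nonexistent asset, or a query with no assets) depend on the catalog contents and would still be raised by stac_to_df itself.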