hydrotools.nwis_client.iv module¶

USGS NWIS Instantaneous Values REST Client¶

This module provides the IVDataService class, a convenient interface to the USGS NWIS Instantaneous Values (IV) REST Service API.

Classes

IVDataService

class hydrotools.nwis_client.iv.IVDataService(*, enable_cache: bool = True, cache_expire_after: int = 43200, value_time_label: str = 'value_time', cache_filename: str | Path = 'nwisiv_cache')¶

Bases: object

Provides a programmatic way to retrieve USGS Instantaneous Values Service data in the canonical hydrotools pandas DataFrame format. IVDataService implements a sqlite3 request cache and an asynchronous request backend that splits requests up and retrieves them concurrently.

Parameters:

enable_cache (bool) – Toggle sqlite3 request caching
cache_expire_after (int) – Cached item life length in seconds
value_time_label (str, default 'value_time') – Label to use for datetime column returned by IVDataService.get
cache_filename (str or Path, default 'nwisiv_cache') – SQLite cache filename or filepath. The suffix ‘.sqlite’ is added to the filename if not included.

Examples

>>> from hydrotools.nwis_client import IVDataService
>>> service = IVDataService()
>>> df = service.get(sites='01646500', startDT="2021-01-01", endDT="2021-02-01")

>>> # Retrieve discharge data from all sites in Alabama over the past 5 days
>>> df = service.get(stateCd='AL', period="P5D")

>>> # Retrieve latest discharge data from a list of sites
>>> sites = ['02339495', '02342500', '023432415', '02361000', '02361500', '02362240', '02363000', '02364500', '02369800', '02371500']
>>> # Also works with np.array, pd.Series, and comma-separated strings. Try it out!
>>> # sites = np.array(['02339495', '02342500', '023432415', '02361000', '02361500', '02362240', '02363000', '02364500', '02369800', '02371500'])
>>> # sites = pd.Series(['02339495', '02342500', '023432415', '02361000', '02361500', '02362240', '02363000', '02364500', '02369800', '02371500'])
>>> # sites = '02339495,02342500,023432415,02361000,02361500,02362240,02363000,02364500,02369800,02371500'
>>> df = service.get(sites=sites)

>>> # Retrieve discharge data from sites within a bounding box from a point in the past until the present
>>> #
>>> bbox = "-83.0,36.5,-81.0,38.5"
>>> # or specify as a list. Multiple bounding boxes can be given as a list of comma-separated strings or as nested lists
>>> # np.array and pd.Series are accepted too!
>>> # bbox = [-83.,36.5,-81.,38.5]
>>> #
>>> # You can also specify start and end times using datetime, np.datetime64, or timestamps!
>>> from datetime import datetime
>>> start = datetime(2021, 5, 1, 12)
>>> df = service.get(bBox=bbox, startDT=start)

>>> # Retrieve discharge data from sites within two counties for the past 5 days
>>> counties = [36109, 36107]
>>> # Can be specified as a collection (list, np.array, pd.Series) of strings or ints, or as a comma-separated string.
>>> # counties = ["36109", "36107"]
>>> # counties = "36109,36107"
>>> df = service.get(countyCd=counties, period='P5D')

_base_url = 'https://waterservices.usgs.gov/nwis/iv/'¶

_datetime_format = '%Y-%m-%dT%H:%M%z'¶

Wrangle dates from a wide range of formats into a standard strftime string representation.

Float and integer timestamps are assumed to have units of seconds. See the pandas.to_datetime documentation on the unit parameter for more detail.

Parameters: date (Union[str, datetime.datetime, np.datetime64, pd.Timestamp]) – Single date
Returns: strftime string
Return type: str, np.array[str]
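
As a rough illustration of the normalization described above, the sketch below converts a few input types to the _datetime_format string using only the standard library. The function name to_nwis_datetime is hypothetical, it handles fewer input types than the documented method, and the actual implementation may differ (the text above suggests it relies on pandas.to_datetime).

```python
from datetime import datetime, timezone
from typing import Union

DATETIME_FORMAT = "%Y-%m-%dT%H:%M%z"  # same pattern as _datetime_format


def to_nwis_datetime(date: Union[str, int, float, datetime]) -> str:
    """Hypothetical helper: format a date as a UTC string in DATETIME_FORMAT."""
    if isinstance(date, (int, float)):
        # Numeric timestamps are treated as seconds since the Unix epoch.
        parsed = datetime.fromtimestamp(date, tz=timezone.utc)
    elif isinstance(date, str):
        # Accepts ISO 8601 strings such as "2020-01-01" or "2020-01-01T06:30".
        parsed = datetime.fromisoformat(date)
    else:
        parsed = date

    if parsed.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    else:
        # Timezone-aware datetimes are converted to UTC.
        parsed = parsed.astimezone(timezone.utc)
    return parsed.strftime(DATETIME_FORMAT)


print(to_nwis_datetime("2020-01-01"))              # 2020-01-01T00:00+0000
print(to_nwis_datetime(datetime(2021, 5, 1, 12)))  # 2021-05-01T12:00+0000
```

Treating naive inputs as UTC matches the startDT and endDT parameter descriptions, which state that the timezone defaults to UTC when none is provided.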

static _handle_response(raw_response: ClientResponse, include_expanded_metadata: bool = False) → List[dict]¶

From a raw response, return a list of extracted sites in dictionary form. Relevant dictionary keys are:

“usgs_site_code”, “variableName”, “measurement_unit”, “values”, and “series”.

Parameters: raw_response (aiohttp.ClientResponse) – Request GET response
Returns: A list of handled responses
Return type: List[dict]

_handle_start_end_period_url_params(startDT=None, endDT=None, period=None) → dict¶

Handle passed date or period ranges, returning valid parameters and parameter combinations. Valid parameters are returned as a dictionary whose keys are parameter names (e.g. startDT) and whose values are the validated and transformed period or datetime strings.

startDT and endDT values are converted to UTC and output in _datetime_format (e.g. “2020-01-01” -> “2020-01-01T00:00+0000”). See _handle_date for more information. Period strings are validated against a regex; see _validate_period_string for more information.

The following parameter combinations are valid:

_handle_start_end_period_url_params(startDT, endDT)
_handle_start_end_period_url_params(startDT)
_handle_start_end_period_url_params(period)

If startDT and period, or endDT and period, are passed together, a KeyError is raised. If an invalid period is passed, a KeyError is also raised.

Parameters:

startDT (int, float, str, datetime, optional) – _datetime_format datetime string, by default None
endDT (int, float, str, datetime, optional) – _datetime_format datetime string, by default None
period (str, optional) – ISO 8601 period string, by default None

Returns:

Dictionary of validated parameter names and values

Return type:

dict

Raises:

KeyError – If the input is malformed, e.g. {“period”: “P1DG”} (the G is not ISO 8601)
TypeError – If any input is non-string or
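
The combination rules above can be sketched as a small check. This is a minimal illustration, not the hydrotools implementation: the name check_time_params is hypothetical, and accepting a call with no time arguments is an assumption based on the examples that call get without startDT, endDT, or period.

```python
from typing import Any, Dict, Optional

# Combinations documented as valid. The empty set (nothing passed) is an
# assumption based on examples that call get() without any time arguments.
VALID_COMBINATIONS = [
    set(),
    {"startDT"},
    {"startDT", "endDT"},
    {"period"},
]


def check_time_params(
    startDT: Optional[Any] = None,
    endDT: Optional[Any] = None,
    period: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep the passed arguments if their combination is valid."""
    candidates = (("startDT", startDT), ("endDT", endDT), ("period", period))
    # Only arguments that were actually supplied take part in the check.
    passed = {name: value for name, value in candidates if value is not None}
    if set(passed) not in VALID_COMBINATIONS:
        raise KeyError(f"Invalid parameter combination: {sorted(passed)}")
    return passed
```

For example, check_time_params(startDT="2021-01-01", endDT="2021-02-01") returns both entries, while combining startDT with period raises a KeyError.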

_headers = {'Accept-Encoding': 'gzip, compress'}¶

_validate_period_string(period: str) → bool¶

Validate that a string adheres to the duration format introduced in ISO 8601.

Parameters: period (str) – ISO 8601 period string.
Returns: True if validates against regex
Return type: bool
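
To make the regex validation concrete, here is a minimal sketch of an ISO 8601 duration check. The pattern and the name is_valid_period are assumptions for illustration; the regex hydrotools actually uses is not shown on this page and may be stricter or looser.

```python
import re

# Assumed pattern for ISO 8601 durations such as "P5D" or "P1DT12H".
# (?!$) rejects a bare "P"; (?=\d) rejects a "T" with no time fields after it.
ISO_8601_PERIOD = re.compile(
    r"P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?"
)


def is_valid_period(period: str) -> bool:
    """Hypothetical helper: True if the whole string is a valid duration."""
    return ISO_8601_PERIOD.fullmatch(period) is not None


print(is_valid_period("P5D"))   # True
print(is_valid_period("P1DG"))  # False, "G" is not an ISO 8601 designator
```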

_value_time_label = None¶

property base_url: str¶: API base URL

property cache_enabled: bool¶: Whether the cache is enabled

property datetime_format: str¶: API’s expected datetime format

Return a pandas DataFrame of NWIS IV data.

Parameters:

sites (str, List[str], pandas.Series[str], or numpy.Array[str], optional) – Single site, comma separated string list of sites, or iterable collection of string sites
stateCd (str, List[str], optional) – 2 character U.S. State or Territory abbreviation as a single string, comma-separated string, or iterable collection of strings
huc (str, List[int], List[str], optional) – Hydrologic Unit Codes as a single string, comma-separated string, or iterable collection of strings or ints. Full list: https://water.usgs.gov/GIS/huc_name.html
bBox (str, List[str, int, float], List[List[str, int, float]], optional) – Longitude and latitude bounds in the order west, south, east, north. Accepted as a comma-separated string, a list of str, int, or float, or a nested list of str, int, or float
countyCd (str, List[int, str]) – U.S. county codes as a single string, comma-separated string, or iterable collection of strings or integers. Full list: https://help.waterdata.usgs.gov/code/county_query?fmt=html
parameterCd (str, optional, default '00060' (Discharge)) – Comma separated list of parameter codes in string format. Full list: https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes?radio_pm_search=param_group&pm_group=All+–+include+all+parameter+groups&pm_search=&casrn_search=&srsname_search=&format=html_table&show=parameter_group_nm&show=parameter_nm&show=casrn&show=srsname&show=parameter_units
startDT (str, datetime.datetime, np.datetime64, pd.Timestamp, or None, optional, default None) – Observation record start time. If timezone information not provided, defaults to UTC.
endDT (str, datetime.datetime, np.datetime64, pd.Timestamp, or None, optional, default None) – Observation record end time. If timezone information not provided, defaults to UTC.
period (str, None) – Retrieve observations for this period up to the current time. Uses an ISO 8601 period string.
siteStatus (str, optional, default 'all') – Site status in string format. Options: ‘all’, ‘active’, ‘inactive’
include_expanded_metadata (bool, default False) – Setting to True will add latitude, longitude, srs, hucCd, stateCd, countyCd, and siteName columns to the returned dataframe.
params – Additional parameters passed directly to service.

Returns:

DataFrame in semi-WRES compatible format

Return type:

pandas.DataFrame

Examples

>>> from hydrotools.nwis_client import IVDataService
>>> service = IVDataService()
>>> df = service.get(sites='01646500', startDT="2021-01-01", endDT="2021-02-01")

>>> # Retrieve discharge data from all sites in Alabama over the past 5 days
>>> df = service.get(stateCd='AL', period="P5D")

>>> # Retrieve latest discharge data from a list of sites
>>> sites = ['02339495', '02342500', '023432415', '02361000', '02361500', '02362240', '02363000', '02364500', '02369800', '02371500']
>>> # Also works with np.array, pd.Series, and comma-separated strings. Try it out!
>>> # sites = np.array(['02339495', '02342500', '023432415', '02361000', '02361500', '02362240', '02363000', '02364500', '02369800', '02371500'])
>>> # sites = pd.Series(['02339495', '02342500', '023432415', '02361000', '02361500', '02362240', '02363000', '02364500', '02369800', '02371500'])
>>> # sites = '02339495,02342500,023432415,02361000,02361500,02362240,02363000,02364500,02369800,02371500'
>>> df = service.get(sites=sites)

>>> # Retrieve discharge data from sites within a bounding box from a point in the past until the present
>>> #
>>> bbox = "-83.0,36.5,-81.0,38.5"
>>> # or specify as a list. Multiple bounding boxes can be given as a list of comma-separated strings or as nested lists
>>> # np.array and pd.Series are accepted too!
>>> # bbox = [-83.,36.5,-81.,38.5]
>>> #
>>> # You can also specify start and end times using datetime, np.datetime64, or timestamps!
>>> from datetime import datetime
>>> start = datetime(2021, 5, 1, 12)
>>> df = service.get(bBox=bbox, startDT=start)

>>> # Retrieve discharge data from sites within two counties for the past 5 days
>>> counties = [36109, 36107]
>>> # Can be specified as a collection (list, np.array, pd.Series) of strings or ints, or as a comma-separated string.
>>> # counties = ["36109", "36107"]
>>> # counties = "36109,36107"
>>> df = service.get(countyCd=counties, period='P5D')

get_raw(sites: str | List[str] | ndarray | Series | None = None, stateCd: str | List[str] | ndarray | Series | None = None, huc: str | List[str | int] | ndarray | Series | None = None, bBox: str | List[str | int] | ndarray | Series | List[List[str | int]] | None = None, countyCd: str | List[str | int] | None = None, parameterCd: str = '00060', startDT: str | datetime | datetime64 | Timestamp | None = None, endDT: str | datetime | datetime64 | Timestamp | None = None, period: str | None = None, siteStatus: str = 'all', max_sites_per_request: int = 20, include_expanded_metadata: bool = False, **params) → List[ClientResponse]¶: Return raw request data from the NWIS IV REST API in a list. See IVDataService.get for argument documentation.
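
The max_sites_per_request argument suggests that long site lists are broken into smaller requests. How hydrotools performs the split is not documented on this page, so the following is only a sketch of the general idea; chunk_sites is a hypothetical helper, and joining codes with commas is an assumption based on the comma-separated sites strings accepted by get.

```python
from typing import List, Sequence


def chunk_sites(sites: Sequence[str], max_sites_per_request: int = 20) -> List[str]:
    """Hypothetical helper: group site codes into comma-separated request values."""
    if max_sites_per_request < 1:
        raise ValueError("max_sites_per_request must be at least 1")
    # Each slice holds at most max_sites_per_request codes, joined for one request.
    return [
        ",".join(sites[start : start + max_sites_per_request])
        for start in range(0, len(sites), max_sites_per_request)
    ]


# 45 placeholder site codes, not real gauge identifiers.
sites = [f"{n:08d}" for n in range(45)]
chunks = chunk_sites(sites)
print([len(chunk.split(",")) for chunk in chunks])  # [20, 20, 5]
```

Each resulting string could then be sent as the sites value of one request, which is what allows the requests to be issued concurrently.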

property headers: dict¶: HTTP GET Headers

static simplify_variable_name(variable_name: str, split_delimiter: str = ',') → str¶

Split an input string by a delimiter and return only the first split result, lowercased.

Parameters:

variable_name (str) – String to simplify.
split_delimiter (str) – Delimiter used to split data.

Returns:

variable_name – Simplified variable name

Return type:

str
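
The described behavior can be reproduced in a single line. The standalone function below is a sketch written from the description above, not a copy of the library code, so details such as whitespace handling are assumptions.

```python
def simplify_name(variable_name: str, split_delimiter: str = ",") -> str:
    """Sketch: keep the text before the first delimiter, lowercased."""
    return variable_name.split(split_delimiter)[0].lower()


print(simplify_name("Streamflow, ft&#179;/s"))  # streamflow
```

If the delimiter does not occur in the input, the whole string is returned in lowercase.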

property value_time_label: str¶: Label to use for datetime column

hydrotools.nwis_client.iv._bbox_split(values: str | list | tuple | Series | ndarray) → List[str]¶

hydrotools.nwis_client.iv._create_empty_canonical_df() → DataFrame¶: Returns an empty hydrotools canonical dataframe with correct field datatypes.

hydrotools.nwis_client.iv._verify_case_insensitive_kwargs_handler(m: str) → None¶

hydrotools.nwis_client.iv.sequence_scientific_array_like(values: T) → bool¶

hydrotools.nwis_client.iv.split(key: str, values, split_threshold: int, join_on: str | None = None) → List[Dict[str, List[str]]]¶

hydrotools.nwis_client.iv.validate_optional_combinations(arg_mapping: ~typing.Dict[str, ~hydrotools.nwis_client.iv.T], valid_arg_keys: ~typing.List[~typing.Set[str]], sentinel=None, exception: Exception = <class 'KeyError'>, exception_message=None) → Dict[str, T]¶