Building your own subset

We are now at the point where we have a common understanding of how hydrofabrics are sourced, manipulated for a given modeling task, and generated on a VPU basis.

This aim of this section is that each of us can build a subset network for a location of interest. Doing this requires an understanding of the following:

The CONUS network
Finding an Origin
The R based hfsubsetR::get
The CLI utility for micro service subsets.
RESTFUL API

The CONUS network file

A network file is distributed with each version/type of a hydrofabric. For more on data access patterns see the Data Vignette.

local   <- "/Users/mjohnson/hydrofabric"
s3      <- "s3://lynker-spatial/hydrofabric"
version <-  'v2.1.1'
type    <- "nextgen"
domain  <- "conus"


network_path = glue("{local}/{version}/{type}/conus_network")
net = open_dataset(network_path)

In the above schema you’ll see that every relationship between the features in the current hydrofabric, the source hydrofabric, and the conus hydrolocations, have been exploded into a “many-to-many” table.

For example, we can look at the flowpath wb-1002 and find that it is defined by the aggregation of NHDPlusV2 COMID 1712220, 1712230, and 1712238.

glimpse(filter(net, id == "wb-1002"))

## FileSystemDataset with 22 Parquet files (query)
## 3 rows x 18 columns
## $ id                    <string> "wb-1002", "wb-1002", "wb-1002"
## $ toid                  <string> "nex-1003", "nex-1003", "nex-1003"
## $ divide_id             <string> "cat-1002", "cat-1002", "cat-1002"
## $ ds_id                 <double> NA, NA, NA
## $ mainstem              <double> 1816996, 1816996, 1816996
## $ hl_id                 <string> NA, NA, NA
## $ hydroseq               <int32> 3730, 3730, 3730
## $ hl_uri                <string> NA, NA, NA
## $ hf_source             <string> "NHDPlusV2", "NHDPlusV2", "NHDPlusV2"
## $ hf_id                 <double> 1712220, 1712230, 1712238
## $ lengthkm              <double> 4.787772, 4.787772, 4.787772
## $ areasqkm              <double> 6.36435, 6.36435, 6.36435
## $ tot_drainage_areasqkm <double> 706.779, 706.779, 706.779
## $ type                  <string> "nexus", "nexus", "nexus"
## $ hf_areasqkm           <double> 0.7902, 0.1431, 5.4423
## $ hf_hydroseq            <int32> 2245165, 2245183, 2245201
## $ topo                  <string> "fl-nex", "fl-nex", "fl-nex"
## $ vpuid                 <string> "01", "01", "01"
## Call `print()` for query details

Or, that the terminal outflow of ‘HUC12-010100100101’ occurs at tnx-1000000569 which is fed by an aggregate flowpath made up of three source flowpaths (hf_id={816563, 816417, 816415})

glimpse(filter(net, hl_uri == 'HUC12-010100100101'))

## FileSystemDataset with 22 Parquet files (query)
## 3 rows x 18 columns
## $ id                    <string> "wb-280", "wb-280", "wb-280"
## $ toid                  <string> "tnx-1000000569", "tnx-1000000569", "tnx-100000…
## $ divide_id             <string> "cat-280", "cat-280", "cat-280"
## $ ds_id                 <double> NA, NA, NA
## $ mainstem              <double> 1780414, 1780414, 1780414
## $ hl_id                 <string> "01236", "01236", "01236"
## $ hydroseq               <int32> 18917, 18917, 18917
## $ hl_uri                <string> "HUC12-010100100101", "HUC12-010100100101", "HU…
## $ hf_source             <string> "NHDPlusV2", "NHDPlusV2", "NHDPlusV2"
## $ hf_id                 <double> 816563, 816417, 816415
## $ lengthkm              <double> 5.135779, 5.135779, 5.135779
## $ areasqkm              <double> 13.2813, 13.2813, 13.2813
## $ tot_drainage_areasqkm <double> 51.68745, 51.68745, 51.68745
## $ type                  <string> "terminal", "terminal", "terminal"
## $ hf_areasqkm           <double> 9.3258, 1.9071, 2.0484
## $ hf_hydroseq            <int32> 2243683, 2243682, 2243689
## $ topo                  <string> "fl-nex", "fl-nex", "fl-nex"
## $ vpuid                 <string> "01", "01", "01"
## Call `print()` for query details

Finding an Origin

By known COMID

findOrigin(network_path, comid = 101)

## # A tibble: 1 × 4
##   id         vpuid topo   hydroseq
##   <chr>      <chr> <chr>     <int>
## 1 wb-2430837 12    fl-nex      139

By known ID

findOrigin(network_path, id = 'wb-2430837')

## # A tibble: 1 × 4
##   id         vpuid topo   hydroseq
##   <chr>      <chr> <chr>     <int>
## 1 wb-2430837 12    fl-nex      139

By location (XY)

here = AOI::geocode("National Water Center, Alabama")

findOrigin(network_path, xy = c(here$x, here$y))

## # A tibble: 1 × 4
##   id        vpuid topo   hydroseq
##   <chr>     <chr> <chr>     <int>
## 1 wb-489658 03W   fl-nex     4118

By Hydrolocation URI

# For a gage in Calfornia
findOrigin(network_path, hl_uri = "Gages-11123000")

## # A tibble: 1 × 4
##   id         vpuid topo   hydroseq
##   <chr>      <chr> <chr>     <int>
## 1 wb-3314811 18    fl-nex    28597

# For the HUC12 of Atascadero Creek in Santa Barbara:
findOrigin(network_path, hl_uri = "HUC12-180600130201")

## # A tibble: 1 × 4
##   id         vpuid topo   hydroseq
##   <chr>      <chr> <chr>     <int>
## 1 wb-3315273 18    fl-nex    16028

# For the dam on Horsetooth Reservoir
findOrigin(network_path, hl_uri = "NID-CO01659-1")

## # A tibble: 1 × 4
##   id         vpuid topo   hydroseq
##   <chr>      <chr> <chr>     <int>
## 1 wb-1560663 10L   fl-nex    50415

By NLDI Feature

# For a gage in Calfornia
findOrigin(network_path, 
           nldi_feature  = list(featureSource = "nwissite", 
                                featureID = "USGS-05428500"))

## # A tibble: 1 × 4
##   id         vpuid topo   hydroseq
##   <chr>      <chr> <chr>     <int>
## 1 wb-1089011 07    fl-nex    33345

hfsubetR

The hfsubsetR library is core module in the NOAA-OWP/hydrofabric suite. It expedites the process of (1) finding an origin (2) traversing the network and (3) extracting the relevant features from the requested.

For example, lets get a basic subset of the network upstream of comid=101 complete with divides, nexus locations and flowlines:

subset101 = get_subset(comid = 101, 
                       lyrs = c("divides", "nexus", "flowlines"),
                       source   = "/Users/mjohnson/hydrofabric",
                       hf_version =  '2.1.1',
                       type    = "nextgen")

mapview::mapview(subset101)

The same request can be appended to request attribute information to enrich the network. For example, we can grab our precomputed divide attributes and the forcing weights needed for NGIAB.

subset101_enhanced = get_subset(comid = 101, 
           lyrs = c("divides", "nexus", "flowlines", "forcing-weights", "model-attributes"),
           source   = "/Users/mjohnson/hydrofabric",
           hf_version =  '2.1.1',
           type    = "nextgen",
           domain  = "conus")

subset101_enhanced$`model-attributes`

## # A tibble: 12 × 44
##    divide_id  elevation_mean slope_mean impervious_mean aspect_c_mean twi_dist_4
##    <chr>               <dbl>      <dbl>           <dbl>         <dbl> <chr>     
##  1 cat-24308…           75.2      0.467          2.55            163. "[{\"v\":…
##  2 cat-24308…           57.8      0.476          1.75            185. "[{\"v\":…
##  3 cat-24308…           48.4      0.392          0.965           173. "[{\"v\":…
##  4 cat-24308…           42.5      0.268          0.0194          177. "[{\"v\":…
##  5 cat-24308…           40.9      0.273          0.254           109. "[{\"v\":…
##  6 cat-24308…           79.3      0.467          1.22            163. "[{\"v\":…
##  7 cat-24308…           58.2      0.460          1.20            210. "[{\"v\":…
##  8 cat-24308…           50.1      0.672          0.0122          204. "[{\"v\":…
##  9 cat-24308…           61.0      0.402          0.845           147. "[{\"v\":…
## 10 cat-24308…           46.5      0.285          0.364           150. "[{\"v\":…
## 11 cat-24308…           50.1      0.503          0.594           189. "[{\"v\":…
## 12 cat-24308…           75.6      0.508          0.818           208. "[{\"v\":…
## # ℹ 38 more variables: X <dbl>, Y <dbl>, gw_Coeff <dbl>, gw_Zmax <dbl>,
## #   gw_Expon <dbl>, `bexp_soil_layers_stag=1` <dbl>,
## #   `bexp_soil_layers_stag=2` <dbl>, `bexp_soil_layers_stag=3` <dbl>,
## #   `bexp_soil_layers_stag=4` <dbl>, ISLTYP <int>, IVGTYP <int>,
## #   `dksat_soil_layers_stag=1` <dbl>, `dksat_soil_layers_stag=2` <dbl>,
## #   `dksat_soil_layers_stag=3` <dbl>, `dksat_soil_layers_stag=4` <dbl>,
## #   `psisat_soil_layers_stag=1` <dbl>, `psisat_soil_layers_stag=2` <dbl>, …

subset101_enhanced$`forcing-weights`

## # A tibble: 328 × 5
##    divide_id       cell coverage_fraction grid_id                    vpuid
##    <chr>          <dbl>             <dbl> <chr>                      <chr>
##  1 cat-2430835 13282774             0.262 medium_range.forcing.conus 12   
##  2 cat-2430835 13282775             0.603 medium_range.forcing.conus 12   
##  3 cat-2430835 13282776             0.582 medium_range.forcing.conus 12   
##  4 cat-2430835 13282777             0.112 medium_range.forcing.conus 12   
##  5 cat-2430835 13287382             0.145 medium_range.forcing.conus 12   
##  6 cat-2430835 13287383             0.980 medium_range.forcing.conus 12   
##  7 cat-2430835 13287384             1     medium_range.forcing.conus 12   
##  8 cat-2430835 13287385             0.945 medium_range.forcing.conus 12   
##  9 cat-2430835 13287386             0.308 medium_range.forcing.conus 12   
## 10 cat-2430835 13291990             0.397 medium_range.forcing.conus 12   
## # ℹ 318 more rows

REST Service (BETA)

For workflows regardless of programming language, we offer an in-beta REST API. This API wraps the hfsubsetR library to provide the same subsetting capabilities across the web.

Currently, the API offers one endpoint /subset. To query this endpoint, we can use a tool like cURL:

API_URL="https://www.lynker-spatial.com/hydrofabric/hfsubset"
curl -o hydrofabric.gpkg "${API_URL}/subset?identifier=101&identifier_type=comid&version=2.1.1&subset_type=nextgen"

Running the above outputs a GeoPackage subset of the v2.1.1 NextGen hydrofabric, containing the following layers of upstream features from COMID 101:

  layer_name     geometry_type features fields             crs_name
1    divides           Polygon       20     10 NAD83 / Conus Albers
2  flowlines Multi Line String       20     11 NAD83 / Conus Albers
3      nexus             Point        8      6 NAD83 / Conus Albers
4    network                NA       86     18                 <NA>

Additionally, we can also subset forcing weights in two ways:

subset the pre-computed NextGen layer
compute the weight grid on-demand

For (1), we can accomplish this by explicitly setting the layer query parameter:

curl -o hydrofabric.gpkg "${API_URL}/subset?identifier=101&identifier_type=comid&version=2.1.1&subset_type=nextgen&layer=divides&layer=forcing-weights"

For (2), we can set the weights query parameter:

curl -o hydrofabric.gpkg "${API_URL}/subset?identifier=101&identifier_type=comid&version=2.1.1&subset_type=nextgen&layer=divides&weights=medium_range"

CLI Option

For those interested in using the NOAA NextGen fabric, without directly needing R, or within a non-interactive pipeline, we provide pre-built binaries for a Go-based CLI tool that creates and forwards requests to the REST API, preventing the need to construct URLs and use cURL as in the examples above.

The help output for this tool is as follows:

hfsubset - Hydrofabric Subsetter

Usage:
  hfsubset [OPTIONS] identifiers...
  hfsubset (-h | --help)

Examples:
  hfsubset -o ./divides_nexus.gpkg \
           -r "2.2"                \
           -t hl                   \
           "Gages-06752260"

  hfsubset -o ./poudre.gpkg -t hl "Gages-06752260"

  # Using network-linked data index identifiers
  hfsubset -o ./poudre.gpkg -t nldi "nwissite:USGS-08279500"
  
  # Specifying hydrofabric version and subset type
  hfsubset -o ./divides_nexus.gpkg -l divides,flowlines,nexus -r "2.1.1" -s "nextgen" -t hl "Gages-06752260"
  
  # Finding data around a coordinate point
  hfsubset -o ./sacramento_flowlines.gpkg -l flowlines -t xy -121.494400,38.581573

Environment Variables:
  ${HFSUBSET_ENDPOINT} - Endpoint to use for subsetting,
                         defaults to 'https://www.lynker-spatial.com/hydrofabric/hfsubset/'.
                         Note: the endpoint must end with a trailing slash.

Details:
  * Finding POI identifiers can be done visually
    through https://www.lynker-spatial.com/hydrolocations.html

  * When using identifier type 'xy', the coordinates are in OGC:CRS84 order,
    which is the same reference system as EPSG:4326 (WGS84), but uses
    longitude-latitude axis order rather than latitude-longitude.

  * When using identifier type 'nldi', the identifiers follow the syntax

      <featureSource>:<featureID>

    For example, USGS-08279500 is accessed with featureSource 'nwissite',
    so this gives the form 'nwissite:USGS-08279500'

Options:
  -debug
        Run in debug mode
  -dryrun
        Perform a dry run, only outputting the request that will be sent
  -l string
        Comma-delimited list of layers to subset. (default "divides,flowlines,network,nexus")
  -o string
        Output file name (default "hydrofabric.gpkg")
  -quiet
        Disable logging
  -s string
        Hydrofabric type, only "reference" is supported (default "reference")
  -t string
        One of: "hf", "comid", "hl", "poi", "nldi", or "xy" (default "hf")
  -v string
        Hydrofabric version (NOTE: omit the preceeding v) (default "2.2")
  -verify
        Verify that endpoint is available (default true)
  -w string
        Comma-delimited list of weights to generate over the subset.

Mike Johnson

Justin Singh