Building your own subset
Mike Johnson
Lynker, NOAA-AffiliateJustin Singh
Lynker, NOAA-AffiliateSource:
vignettes/05-subsetting.Rmd
05-subsetting.Rmd
We are now at the point where we have a common understanding of how hydrofabrics are sourced, manipulated for a given modeling task, and generated on a VPU basis.
This aim of this section is that each of us can build a subset network for a location of interest. Doing this requires an understanding of the following:
- The CONUS network
- Finding an Origin
- The R based
hfsubsetR::get
- The CLI utility for micro service subsets.
- RESTFUL API
The CONUS network file
A network file is distributed with each
version
/type
of a hydrofabric. For more on
data access patterns see the Data
Vignette.
local <- "/Users/mjohnson/hydrofabric"
s3 <- "s3://lynker-spatial/hydrofabric"
version <- 'v2.1.1'
type <- "nextgen"
domain <- "conus"
network_path = glue("{local}/{version}/{type}/conus_network")
net = open_dataset(network_path)
In the above schema
you’ll see that every relationship
between the features in the current hydrofabric, the source hydrofabric,
and the conus hydrolocations, have been exploded into a “many-to-many”
table.
For example, we can look at the flowpath wb-1002
and
find that it is defined by the aggregation of NHDPlusV2 COMID 1712220,
1712230, and 1712238.
glimpse(filter(net, id == "wb-1002"))
## FileSystemDataset with 22 Parquet files (query)
## 3 rows x 18 columns
## $ id <string> "wb-1002", "wb-1002", "wb-1002"
## $ toid <string> "nex-1003", "nex-1003", "nex-1003"
## $ divide_id <string> "cat-1002", "cat-1002", "cat-1002"
## $ ds_id <double> NA, NA, NA
## $ mainstem <double> 1816996, 1816996, 1816996
## $ hl_id <string> NA, NA, NA
## $ hydroseq <int32> 3730, 3730, 3730
## $ hl_uri <string> NA, NA, NA
## $ hf_source <string> "NHDPlusV2", "NHDPlusV2", "NHDPlusV2"
## $ hf_id <double> 1712220, 1712230, 1712238
## $ lengthkm <double> 4.787772, 4.787772, 4.787772
## $ areasqkm <double> 6.36435, 6.36435, 6.36435
## $ tot_drainage_areasqkm <double> 706.779, 706.779, 706.779
## $ type <string> "nexus", "nexus", "nexus"
## $ hf_areasqkm <double> 0.7902, 0.1431, 5.4423
## $ hf_hydroseq <int32> 2245165, 2245183, 2245201
## $ topo <string> "fl-nex", "fl-nex", "fl-nex"
## $ vpuid <string> "01", "01", "01"
## Call `print()` for query details
Or, that the terminal outflow of ‘HUC12-010100100101’ occurs at
tnx-1000000569
which is fed by an aggregate flowpath made
up of three source flowpaths
(hf_id={816563, 816417, 816415}
)
glimpse(filter(net, hl_uri == 'HUC12-010100100101'))
## FileSystemDataset with 22 Parquet files (query)
## 3 rows x 18 columns
## $ id <string> "wb-280", "wb-280", "wb-280"
## $ toid <string> "tnx-1000000569", "tnx-1000000569", "tnx-100000…
## $ divide_id <string> "cat-280", "cat-280", "cat-280"
## $ ds_id <double> NA, NA, NA
## $ mainstem <double> 1780414, 1780414, 1780414
## $ hl_id <string> "01236", "01236", "01236"
## $ hydroseq <int32> 18917, 18917, 18917
## $ hl_uri <string> "HUC12-010100100101", "HUC12-010100100101", "HU…
## $ hf_source <string> "NHDPlusV2", "NHDPlusV2", "NHDPlusV2"
## $ hf_id <double> 816563, 816417, 816415
## $ lengthkm <double> 5.135779, 5.135779, 5.135779
## $ areasqkm <double> 13.2813, 13.2813, 13.2813
## $ tot_drainage_areasqkm <double> 51.68745, 51.68745, 51.68745
## $ type <string> "terminal", "terminal", "terminal"
## $ hf_areasqkm <double> 9.3258, 1.9071, 2.0484
## $ hf_hydroseq <int32> 2243683, 2243682, 2243689
## $ topo <string> "fl-nex", "fl-nex", "fl-nex"
## $ vpuid <string> "01", "01", "01"
## Call `print()` for query details
Finding an Origin
By known COMID
findOrigin(network_path, comid = 101)
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-2430837 12 fl-nex 139
By known ID
findOrigin(network_path, id = 'wb-2430837')
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-2430837 12 fl-nex 139
By location (XY)
here = AOI::geocode("National Water Center, Alabama")
findOrigin(network_path, xy = c(here$x, here$y))
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-489658 03W fl-nex 4118
By Hydrolocation URI
# For a gage in Calfornia
findOrigin(network_path, hl_uri = "Gages-11123000")
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-3314811 18 fl-nex 28597
# For the HUC12 of Atascadero Creek in Santa Barbara:
findOrigin(network_path, hl_uri = "HUC12-180600130201")
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-3315273 18 fl-nex 16028
# For the dam on Horsetooth Reservoir
findOrigin(network_path, hl_uri = "NID-CO01659-1")
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-1560663 10L fl-nex 50415
By NLDI Feature
# For a gage in Calfornia
findOrigin(network_path,
nldi_feature = list(featureSource = "nwissite",
featureID = "USGS-05428500"))
## # A tibble: 1 × 4
## id vpuid topo hydroseq
## <chr> <chr> <chr> <int>
## 1 wb-1089011 07 fl-nex 33345
hfsubetR
The hfsubsetR
library is core module in the
NOAA-OWP/hydrofabric
suite. It expedites the process of (1)
finding an origin (2) traversing the network and (3) extracting the
relevant features from the requested.
For example, lets get a basic subset of the network upstream of comid=101 complete with divides, nexus locations and flowlines:
subset101 = get_subset(comid = 101,
lyrs = c("divides", "nexus", "flowlines"),
source = "/Users/mjohnson/hydrofabric",
hf_version = '2.1.1',
type = "nextgen")
mapview::mapview(subset101)
The same request can be appended to request attribute information to enrich the network. For example, we can grab our precomputed divide attributes and the forcing weights needed for NGIAB.
subset101_enhanced = get_subset(comid = 101,
lyrs = c("divides", "nexus", "flowlines", "forcing-weights", "model-attributes"),
source = "/Users/mjohnson/hydrofabric",
hf_version = '2.1.1',
type = "nextgen",
domain = "conus")
subset101_enhanced$`model-attributes`
## # A tibble: 12 × 44
## divide_id elevation_mean slope_mean impervious_mean aspect_c_mean twi_dist_4
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 cat-24308… 75.2 0.467 2.55 163. "[{\"v\":…
## 2 cat-24308… 57.8 0.476 1.75 185. "[{\"v\":…
## 3 cat-24308… 48.4 0.392 0.965 173. "[{\"v\":…
## 4 cat-24308… 42.5 0.268 0.0194 177. "[{\"v\":…
## 5 cat-24308… 40.9 0.273 0.254 109. "[{\"v\":…
## 6 cat-24308… 79.3 0.467 1.22 163. "[{\"v\":…
## 7 cat-24308… 58.2 0.460 1.20 210. "[{\"v\":…
## 8 cat-24308… 50.1 0.672 0.0122 204. "[{\"v\":…
## 9 cat-24308… 61.0 0.402 0.845 147. "[{\"v\":…
## 10 cat-24308… 46.5 0.285 0.364 150. "[{\"v\":…
## 11 cat-24308… 50.1 0.503 0.594 189. "[{\"v\":…
## 12 cat-24308… 75.6 0.508 0.818 208. "[{\"v\":…
## # ℹ 38 more variables: X <dbl>, Y <dbl>, gw_Coeff <dbl>, gw_Zmax <dbl>,
## # gw_Expon <dbl>, `bexp_soil_layers_stag=1` <dbl>,
## # `bexp_soil_layers_stag=2` <dbl>, `bexp_soil_layers_stag=3` <dbl>,
## # `bexp_soil_layers_stag=4` <dbl>, ISLTYP <int>, IVGTYP <int>,
## # `dksat_soil_layers_stag=1` <dbl>, `dksat_soil_layers_stag=2` <dbl>,
## # `dksat_soil_layers_stag=3` <dbl>, `dksat_soil_layers_stag=4` <dbl>,
## # `psisat_soil_layers_stag=1` <dbl>, `psisat_soil_layers_stag=2` <dbl>, …
subset101_enhanced$`forcing-weights`
## # A tibble: 328 × 5
## divide_id cell coverage_fraction grid_id vpuid
## <chr> <dbl> <dbl> <chr> <chr>
## 1 cat-2430835 13282774 0.262 medium_range.forcing.conus 12
## 2 cat-2430835 13282775 0.603 medium_range.forcing.conus 12
## 3 cat-2430835 13282776 0.582 medium_range.forcing.conus 12
## 4 cat-2430835 13282777 0.112 medium_range.forcing.conus 12
## 5 cat-2430835 13287382 0.145 medium_range.forcing.conus 12
## 6 cat-2430835 13287383 0.980 medium_range.forcing.conus 12
## 7 cat-2430835 13287384 1 medium_range.forcing.conus 12
## 8 cat-2430835 13287385 0.945 medium_range.forcing.conus 12
## 9 cat-2430835 13287386 0.308 medium_range.forcing.conus 12
## 10 cat-2430835 13291990 0.397 medium_range.forcing.conus 12
## # ℹ 318 more rows
REST Service (BETA)
For workflows regardless of programming language, we offer an in-beta
REST
API. This API wraps the hfsubsetR
library to provide
the same subsetting capabilities across the web.
Currently, the API offers one endpoint /subset
. To query
this endpoint, we can use a tool like cURL:
API_URL="https://www.lynker-spatial.com/hydrofabric/hfsubset"
curl -o hydrofabric.gpkg "${API_URL}/subset?identifier=101&identifier_type=comid&version=2.1.1&subset_type=nextgen"
Running the above outputs a GeoPackage subset of the v2.1.1 NextGen hydrofabric, containing the following layers of upstream features from COMID 101:
layer_name geometry_type features fields crs_name
1 divides Polygon 20 10 NAD83 / Conus Albers
2 flowlines Multi Line String 20 11 NAD83 / Conus Albers
3 nexus Point 8 6 NAD83 / Conus Albers
4 network NA 86 18 <NA>
Additionally, we can also subset forcing weights in two ways:
- subset the pre-computed NextGen layer
- compute the weight grid on-demand
For (1), we can accomplish this by explicitly setting the
layer
query parameter:
curl -o hydrofabric.gpkg "${API_URL}/subset?identifier=101&identifier_type=comid&version=2.1.1&subset_type=nextgen&layer=divides&layer=forcing-weights"
For (2), we can set the weights
query parameter:
CLI Option
For those interested in using the NOAA NextGen fabric, without directly needing R, or within a non-interactive pipeline, we provide pre-built binaries for a Go-based CLI tool that creates and forwards requests to the REST API, preventing the need to construct URLs and use cURL as in the examples above.
The help output for this tool is as follows:
hfsubset - Hydrofabric Subsetter
Usage:
hfsubset [OPTIONS] identifiers...
hfsubset (-h | --help)
Examples:
hfsubset -o ./divides_nexus.gpkg \
-r "2.2" \
-t hl \
"Gages-06752260"
hfsubset -o ./poudre.gpkg -t hl "Gages-06752260"
# Using network-linked data index identifiers
hfsubset -o ./poudre.gpkg -t nldi "nwissite:USGS-08279500"
# Specifying hydrofabric version and subset type
hfsubset -o ./divides_nexus.gpkg -l divides,flowlines,nexus -r "2.1.1" -s "nextgen" -t hl "Gages-06752260"
# Finding data around a coordinate point
hfsubset -o ./sacramento_flowlines.gpkg -l flowlines -t xy -121.494400,38.581573
Environment Variables:
${HFSUBSET_ENDPOINT} - Endpoint to use for subsetting,
defaults to 'https://www.lynker-spatial.com/hydrofabric/hfsubset/'.
Note: the endpoint must end with a trailing slash.
Details:
* Finding POI identifiers can be done visually
through https://www.lynker-spatial.com/hydrolocations.html
* When using identifier type 'xy', the coordinates are in OGC:CRS84 order,
which is the same reference system as EPSG:4326 (WGS84), but uses
longitude-latitude axis order rather than latitude-longitude.
* When using identifier type 'nldi', the identifiers follow the syntax
<featureSource>:<featureID>
For example, USGS-08279500 is accessed with featureSource 'nwissite',
so this gives the form 'nwissite:USGS-08279500'
Options:
-debug
Run in debug mode
-dryrun
Perform a dry run, only outputting the request that will be sent
-l string
Comma-delimited list of layers to subset. (default "divides,flowlines,network,nexus")
-o string
Output file name (default "hydrofabric.gpkg")
-quiet
Disable logging
-s string
Hydrofabric type, only "reference" is supported (default "reference")
-t string
One of: "hf", "comid", "hl", "poi", "nldi", or "xy" (default "hf")
-v string
Hydrofabric version (NOTE: omit the preceeding v) (default "2.2")
-verify
Verify that endpoint is available (default true)
-w string
Comma-delimited list of weights to generate over the subset.