Network Manipulation
Mike Johnson
Lynker, NOAA-AffiliateDavid Blodgett
USGS WMASource:
vignettes/03-processing-deep-dive.Rmd
      03-processing-deep-dive.RmdGetting the reference fabric
All reference and (precomputed) refactored data products live on
ScienceBase. They can be accessed with the web interface or can be
downloaded programatically. The hydrofab::get_hydrofabric()
utility will download the most current geofabric for a Vector Processing
Unit (VPU). Options include downloading the “refactored” (default) or
“reference” data. If the requested file already exists, the file path
will be returned.
Example
- To exemplify this process we do the following:
- Define a terminal location of interest
- Subset the upstream reference fabric using
hydrofab::subset_reference
 
library(hydrofabric)
using_local_example = "/Users/mjohnson/hydrofabric"
# Define starting feature by source and ID
(gage = list(featureSource = "nwis", featureID = "06752260"))
# Use subset_network to build a reference subset
hf = hfsubsetR::get_subset(nldi_feature = gage, 
                           source  = using_local_example, 
                           outfile = "tutorial/poudre.gpkg",
                           overwrite = FALSE)
d = read_hydrofabric("tutorial/poudre.gpkg")
pois = open_dataset(glue("{using_local_example}/v2.2/conus_hl")) %>% 
  filter(hf_id %in% d$flowpaths$id, hl_source == 'GFv20') %>% 
  collect() %>% 
  st_as_sf(coords = c("X", "Y"), crs = 5070)1) Running the Process: Refactoring
Concept
- 
Refactoring is a geoprocessing workflow that seeks to - Split large or long catchments into a more uniform catchment size distribution and
- collapse catchment topology to eliminate small catchments
 
- The key is that no network resolution is lost! That means the total path length of the network going in, is what comes out. 
- The workflow is can be parameterized using three primary values: 
| Parameter | Purpose | Elected Value | 
|---|---|---|
| split_flines_meters | the maximum length flowpath desired in the output. | 10,000 | 
| collapse_flines_meters | the minimum length of inter-confluence flowpath desired in the output. | 1,000 | 
| collapse_flines_main_meters | the minimum length of between-confluence flowpaths. | 1,000 | 
You might also have areas where you want to avoid, or, enforce a splitting event. These can be defined with the following values:
| Parameter | Purpose | 
|---|---|
| exclude_cats | integer vector of COMIDs to be excluded from collapse modifications. | 
| events | data.frame containing events as generated by get_flowline_index() | 
- POI definition and selection is model application specific. Here, we will ignore this aspect. 
- With only information about the network, refactor can only refactor the flowpath network. 
- In order to reconcile the catchment network,a set of flow accumulation (FAC) and flow direction (FDR) grids must be provided. 
- For the reference fabric (e.g. NHDPlusV2), we supply a national VRT for each of these that can be accessed at: - s3://nextgen-hydrofabric/DEM-products/{product}.vrt
- These (and other gridded products) can be found here 
Example
refactored = refactor(hf,
                      split_flines_meters = 10000, 
                      collapse_flines_meters = 1000, 
                      collapse_flines_main_meters = 1000,
                      pois = pois,
                      fac = '/vsis3/lynker-spatial/gridded-resources/fac.vrt',
                      fdr = '/vsis3/lynker-spatial/gridded-resources/fdr.vrt',
                      outfile = "tutorial/refactor.gpkg")Outputs
To get a high level understanding of what happened with this “refactor”, we can look at the length distributions:

And the area distributions:

Lastly, we look at the feature count of the network:

- Finally, we can zoom into a layer of this network to see what changes exist. 
- In the figure below, the white edges represent the - referencecatchment network, while the black edges represent the refactored network
- Since refactoring requires the preservation of the flowpath network, the blue lines are representative of both the reference and refactored network with the caveat they are broken and different places. 

2. Running the Process: Aggregating
- Aggregation is a primarily a divide oriented workflow. It collapses the network to provide a new discretization. 
- 
Two aggregation methods: - To POIs - you define network outlets (NHM and SPARROW)
- To a statistical distribution, with or w/o enforced POIs (NextGen).
 

| Parameter | Purpose | Elected Value | 
|---|---|---|
| ideal_size_sqkm | the maximum length flowpath desired in the output. | 10 | 
| min_length_km | the minimum length of inter-confluence flowpath desired in the output. | 1 | 
| min_area_sqkm | the minimum length of between-confluence flowpaths. | 3 | 
Here, a hydrolocation POINT layer can be passed to help
direct the aggregation, but for simplicity is ignored here:
Example
hydrolocations = read_sf("tutorial/refactored.gpkg", 'lookup_table') %>% 
  inner_join(pois, by = c("NHDPlusV2_COMID" = "hf_id")) %>% 
  select(poi_id, NHDPlusV2_COMID, id = reconciled_ID) %>% 
  distinct()
aggregate_to_distribution(gpkg = "tutorial/refactored.gpkg",
                          hydrolocations = mutate(pois, id = hf_id),
                          ideal_size_sqkm = 10, 
                          min_length_km = 1, 
                          min_area_sqkm = 3, 
                          outfile = "tutorial/aggregated.gpkg", 
                          overwrite = TRUE)Outputs
To get a high level understanding of what happens with this “refactor”, we can look at the length distributions:

And the area distributions:

- Finally, we can zoom into a layer of this network to see what changes exist. 
- In the left-hand figure below, the white edges represent the - referencecatchment network, the black edges represent the refactored network, and the red represnet the aggregated network
- In the right-hand figure, we can see the blue flowlines (from reference and refactor) that were prunned in the aggregation process. 


Lastly, we can look at the cumulative network traits of each fabric:

r = read_hydrofabric("tutorial/aggregated.gpkg")
mapview::mapview(r) + poisFull Run-through
hf = get_subset(nldi_feature = list(featureSource = "nwis", featureID = "06752260"), 
                source = "/Users/mjohnson/hydrofabric",
                outfile = "tutorial/poudre.gpkg") |>
     refactor(fac = '/vsis3/lynker-spatial/gridded-resources/fac.vrt',
              fdr = '/vsis3/lynker-spatial/gridded-resources/fdr.vrt',
              outfile = "tutorial/refactor.gpkg") |>
     aggregate_to_distribution(outfile = "tutorial/aggregate.gpkg")With all network manipulations, fundamental network traits change. This requires the utilities to rapidly and efficiently recompute key network metric. The nhdplusTools package provides the option to regenerate all or some of these on the fly using graph algorithms and logic (see Blodget et al (2023))
We will return to the get_sorted() utility in the
subsetting section
nhdplusTools::add_plus_network_attributes()
nhdplusTools::get_streamorder()
nhdplusTools::calculate_total_drainage_area()
nhdplusTools::get_sorted()
