Process_Description:
Creation of the National Land Cover Database (NLCD) TCC dataset (main process). The NLCD dataset is generated from the FS Science product. The FS Science 2011 TCC dataset was created for the CONUS. For CONUS, 54 tiles were used in a 5x5 moving window where model calibration data were gathered from the moving windows and random forest models were created. The random forest models were applied to the center tiles. The final dataset is a mosaic of TCC values for all the moving window tiles.
Six major steps were employed to map TCC and produce the NLCD product: 1) collection of reference data, 2) acquisition and/or creation of predictor layers, 3) calibration of random forests regression models for each mapping area using response data and predictor layers, 4) application of those models to predict per-pixel TCC across the entire mapping area, 5) a series of data quality filtering steps to generate the NLCD TCC product, and 6) export of NLCD images from Google Earth Engine (GEE) to local computers for further post-processing that includes the creation of the CONUS-wide mosaic. The methodology is described further below, in the technical methods document (Housman et al., 2023), and in a manuscript in preparation (Heyer et al., 2023). For the NLCD product, additional post-processing steps were performed.
Step 1: Reference data, consisting of estimated TCC at each of the 63,010 FIA plot locations, were generated via aerial image interpretation of high spatial resolution images collected and supplied by the U.S. Forest Service Forest Inventory and Analysis (FIA) program. The spatial distribution of the sample points follows the FIA systematic grid (Brand et al. 2000). Low-quality FIA PI observations were removed, leaving a total of 55,242 FIA plots used in modeling.
Step 2: Predictor layers include two sets of LandTrendr fitted-image spectral derivatives. Set 1 (no Landsat 7 data after 2002) includes all optical bands and indices.
Set 2 (includes all Landsat 7 data through 2015) excludes Landsat 7 visible bands to avoid striping artifacts. Other predictor layers include a binary agriculture layer (1 = agriculture, 0 = non-agriculture), elevation data, and terrain derivatives (slope, aspect, sine of aspect, cosine of aspect). The processes for creating the derived layers are described separately (see related Process Steps).
Step 3: For each 480 km x 480 km moving window tile, a random forest model was built from 2011 response and predictor data that fell within the 5x5 tile neighborhood of that tile. For each model, the variable selection R package VSURF (Genuer et al., 2015) was used to determine the number of variables to randomly sample at tree splits (mtry). Models were generated locally using the random forest regression algorithm "sklearn.ensemble.RandomForestRegressor" from the Scikit-Learn package in Python (Pedregosa et al. 2011).
Step 4: In GEE, models were applied to each tile for CONUS, producing a two-layer Science image. The first layer was the random forests mean predicted TCC value and the second layer was the per-pixel standard error of the random forests regression predictions from the individual regression trees.
Step 5: The NLCD TCC product was generated from the Science TCC product through a series of post-processing steps, including masking of non-treed pixels, a minimum mapping unit (MMU) to reduce single-pixel speckle, and a process to reduce interannual noise. For masking, a three-year moving window tree mask was produced from the Landscape Change Monitoring System (LCMS) landcover product tree classes (Housman et al., 2022). A three-year moving window ensured TCC predictions in forested pixels were used. Next, the annual Cropland Data Layer (CDL) (USDA National Agricultural Statistics Service Cropland Data Layer, 2007-2022) and the NLCD water layers from 2011, 2013, 2016 and 2019 (Dewitz and U.S. 
Geological Survey, 2021) were used to mask non-treed agricultural crops and water from the three-year moving window LCMS tree masks. To reduce single-pixel speckle, a one-way MMU (pixels can be converted from tree to non-tree but not vice versa) was then applied to the LCMS tree masks outside of urban areas. The MMU converted patches of fewer than 4 treed pixels surrounded by non-treed pixels to non-treed pixels. To avoid masking the highly fragmented tree cover common in urban areas, a separate urban tree mask was produced. The urban TCC mask includes the TIGER U.S. Census Block 2018 data, LCMS land use developed data, and a statistic that normalizes the expected error, referred to as tau (Coulston et al., 2016), calculated for each CONUS 5x5 tile moving window processing area. The TIGER and LCMS developed data were used to separate urban TCC from non-urban TCC. The tau statistic at the 87 percent confidence level (or quantile) was used to threshold the TCC values. If the TCC value minus the product of tau and the standard error was less than 0, the TCC value was changed to 0. The final urban TCC mask was the combination of the TIGER data, the LCMS land use developed data, and the tau-thresholded mask. The LCMS tree mask and urban TCC masks were applied to annual TCC images to produce the NLCD TCC v2021-4 product. For each image, the non-processing area value is 254, and the background value is 255.
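Illustrative sketch (not part of the production workflow): the per-pixel summary described in Step 4 and the tau thresholding described in Step 5 can be outlined in plain Python as follows. Several details are assumptions made only for illustration. The spread across trees is computed here as the sample standard deviation of the per-tree predictions, since the exact standard error formula is not given in this record. The thresholding rule is read as TCC minus tau times the standard error. The function names, the per-tree values, and the tau value of 1.5 are hypothetical; in the actual process tau is calculated for each 5x5 tile moving window processing area.

```python
from statistics import mean, stdev


def summarize_trees(tree_predictions):
    """Return (mean, spread) of the per-tree TCC predictions for one pixel.

    Assumption: the spread is the sample standard deviation across trees;
    the record does not state the exact standard error formula.
    """
    return mean(tree_predictions), stdev(tree_predictions)


def tau_threshold(tcc, standard_error, tau):
    """Return 0 when tcc - tau * standard_error is below 0, else tcc."""
    return 0 if tcc - tau * standard_error < 0 else tcc


# Hypothetical per-tree predictions (percent canopy cover) for two urban pixels.
consistent_pixel = [38, 40, 42, 41, 39]
noisy_pixel = [0, 0, 30, 0, 5]
TAU = 1.5  # hypothetical value, for illustration only

for trees in (consistent_pixel, noisy_pixel):
    tcc, se = summarize_trees(trees)
    print(round(tcc, 1), round(se, 2), tau_threshold(tcc, se, TAU))
```

Under these assumptions, a pixel whose trees agree keeps its predicted TCC, while a pixel whose low mean prediction is accompanied by a large spread across trees is set to 0.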