Generalized Additive Models

GAMs/GAMMs handle nonlinear relationships and can include random effects (e.g., site or tree identity) to account for hierarchical structures and temporal or spatial dependencies, making them well-suited for modeling complex dendrochronological data. in growthTrendR package, a suite of GAM/GAMM models has been implemented to accommodate different types of datasets. Here, we use a single model, gamm_spatial, to demonstrate how to generate a fitting and diagnostic model report from raw data.

prepare data for model:



# loading processed ring measurement
dt.samples_trt <- readRDS(system.file("extdata", "dt.samples_trt.rds", package = "growthTrendR"))

# climate
dt.clim <- fread(system.file("extdata", "dt.clim.csv", package = "growthTrendR"))

# merge data
dt.samples_clim <- merge(dt.samples_trt$tr_all_wide[, c("uid_site", "site_id","latitude", "longitude",  "species", "uid_tree", "uid_radius")], dt.samples_trt$tr_all_long$tr_7_ring_widths, by = "uid_radius")

# # Calculate BAI
dt.samples_clim <- calc_bai(dt.samples_clim)

dt.samples_clim <- merge(dt.samples_clim, dt.clim, by = c("site_id", "year"))

fitting model

This example uses gamm_spatial; other functions with the same arguments (gamm_radius, gamm_site, bam_spatial, gam_mod) may be used depending on the data and analysis goals.

setorder(dt.samples_clim, uid_tree, year)

# Remove ageC == 1 prior to fitting log-scale models.
dt.samples_clim <- dt.samples_clim[ageC > 1]
m.sp <- gamm_spatial(data = dt.samples_clim, resp_scale = "resp_log",
                     m.candidates =c( "bai_cm2 ~ log(ba_cm2_t_1) + s(ageC) + s(FFD)",
                                      "bai_cm2 ~ log(ba_cm2_t_1) + s(ageC) + FFD")
)

arguments

resp_scale The function provides three options for specifying the response variable, and the user must choose the one that best suits their modelling purpose:

“resp_gaussian”: the response variable is used on its original scale and is modelled under a Gaussian distribution with an identity link (no transformation applied).

“resp_log”: the response variable is log-transformed prior to modelling. The transformed response is then assumed to follow a Gaussian distribution and is fitted using an identity link.

“resp_gamma”: the response variable is kept on its original scale, and the model is fitted under a Gamma distribution with a log link, appropriate for strictly positive and right-skewed data.

m.candidates

The list of all candidate equations. Note that the response variable is kept on its original scale in all cases, even when using the option “resp_log”.

model summary and diagnostic

# generate_report(robj = m.sp)
gam_model <- m.sp$model$gam

# summary of the model
  summary(gam_model)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> log(bai_cm2) ~ log(ba_cm2_t_1) + s(ageC) + FFD + s(uid_site.fac, 
#>     bs = "re")
#> 
#> Parametric coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     -0.240826   0.343068  -0.702   0.4834    
#> log(ba_cm2_t_1)  0.540556   0.078397   6.895 5.04e-11 ***
#> FFD             -0.003385   0.001400  -2.418   0.0164 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                   edf Ref.df     F  p-value    
#> s(ageC)         5.885  5.885 4.194 0.000709 ***
#> s(uid_site.fac) 1.768  2.000 5.625 0.001394 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.924   
#>   Scale est. = 0.069672  n = 243
# smooth term importance
 term_important <- sterm_imp( gam_model)
 print(term_important)
#>               term importance_pct method
#>             <char>          <num> <char>
#> 1:         s(ageC)           59.7    ssq
#> 2: s(uid_site.fac)           40.3    ssq
#