% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/spm_smooth.R
\name{spm_smooth}
\alias{spm_smooth}
\alias{spm_smooth,sspm_dataset,formula,sspm_discrete_boundary-method}
\title{Smooth a variable in an sspm dataset}
\usage{
spm_smooth(
  sspm_object,
  formula,
  boundaries,
  keep_fit = TRUE,
  predict = TRUE,
  ...
)

\S4method{spm_smooth}{sspm_dataset,formula,sspm_discrete_boundary}(
  sspm_object,
  formula,
  boundaries,
  keep_fit = TRUE,
  predict = TRUE,
  ...
)
}
\arguments{
\item{sspm_object}{\strong{[sspm_dataset]} An object of class
\link[=sspm_dataset-class]{sspm_dataset}.}

\item{formula}{\strong{[formula]} A formula definition of the form
response ~ smoothing_terms + ...}

\item{boundaries}{\strong{[sspm_boundary]} An object of class
\link[=sspm_boundary-class]{sspm_discrete_boundary}.}

\item{keep_fit}{\strong{[logical]} Whether or not to keep the fitted values and
model (defaults to TRUE; set to FALSE to reduce the memory footprint).}

\item{predict}{\strong{[logical]} Whether or not to generate the smoothed
predictions (necessary to fit the final SPM model; defaults to TRUE).}

\item{...}{
  Arguments passed on to \code{\link[mgcv:bam]{mgcv::bam}}
  \describe{
    \item{\code{family}}{
This is a family object specifying the distribution and link to use in
fitting etc. See \code{\link{glm}} and \code{\link{family}} for more
details. The extended families listed in \code{\link[mgcv]{family.mgcv}} can also be used.
}
    \item{\code{data}}{ A data frame or list containing the model response variable and 
covariates required by the formula. By default the variables are taken 
from \code{environment(formula)}: typically the environment from 
which \code{gam} is called.}
    \item{\code{weights}}{  prior weights on the contribution of the data to the log likelihood. Note that a weight of 2, for example, 
                is equivalent to having made exactly the same observation twice. If you want to reweight the contributions 
                of each datum without changing the overall magnitude of the log likelihood, then you should normalize the weights
                (e.g. \code{weights <- weights/mean(weights)}).}
    \item{\code{subset}}{ an optional vector specifying a subset of observations to be
          used in the fitting process.}
    \item{\code{na.action}}{ a function which indicates what should happen when the data
          contain \code{NA}s.  The default is set by the \code{na.action} setting
          of \code{options}, and is \code{na.fail} if that is unset.  The
          "factory-fresh" default is \code{na.omit}.}
    \item{\code{offset}}{Can be used to supply a model offset for use in fitting. Note
that this offset will always be completely ignored when predicting, unlike an offset 
included in \code{formula} (this used to conform to the behaviour of
\code{lm} and \code{glm}).}
    \item{\code{method}}{The smoothing parameter estimation method. \code{"GCV.Cp"} to use GCV for unknown scale parameter and
Mallows' Cp/UBRE/AIC for known scale. \code{"GACV.Cp"} is equivalent, but using GACV in place of GCV. \code{"REML"} 
for REML estimation, including of unknown scale, \code{"P-REML"} for REML estimation, but using a Pearson estimate 
of the scale. \code{"ML"} and \code{"P-ML"} are similar, but using maximum likelihood in place of REML. Default 
\code{"fREML"} uses fast REML computation.}
    \item{\code{control}}{A list of fit control parameters to replace defaults returned by 
\code{\link[mgcv]{gam.control}}. Any control parameters not supplied stay at their default values.}
    \item{\code{select}}{Should selection penalties be added to the smooth effects, so that they can in principle be 
penalized out of the model? See \code{gamma} to increase penalization.  Has the side effect that smooths no longer have a fixed effect component (improper prior from a Bayesian perspective) allowing REML comparison of models with the same fixed effect structure. 
}
    \item{\code{scale}}{ If this is positive then it is taken as the known scale parameter. Negative signals that the 
scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. 
Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases.    
}
    \item{\code{gamma}}{Increase above 1 to force smoother fits. \code{gamma} is used to multiply the effective degrees of freedom in the GCV/UBRE/AIC score (so \code{log(n)/2} is BIC like). \code{n/gamma} can be viewed as an effective sample size, which allows it to play a similar role for RE/ML smoothing parameter estimation.}
    \item{\code{knots}}{this is an optional list containing user specified knot values to be used for basis construction. 
For most bases the user simply supplies the knots to be used, which must match up with the \code{k} value
supplied (note that the number of knots is not always just \code{k}). 
See \code{\link[mgcv]{tprs}} for what happens in the \code{"tp"/"ts"} case. 
Different terms can use different numbers of knots, unless they share a covariate.
}
    \item{\code{sp}}{A vector of smoothing parameters can be provided here.
 Smoothing parameters must be supplied in the order that the smooth terms appear in the model 
formula. Negative elements indicate that the parameter should be estimated, and hence a mixture 
of fixed and estimated parameters is possible. If smooths share smoothing parameters then \code{length(sp)} 
must correspond to the number of underlying smoothing parameters.}
    \item{\code{min.sp}}{Lower bounds can be supplied for the smoothing parameters. Note
that if this option is used then the smoothing parameters \code{full.sp}, in the 
returned object, will need to be added to what is supplied here to get the 
 smoothing parameters actually multiplying the penalties. \code{length(min.sp)} should 
always be the same as the total number of penalties (so it may be longer than \code{sp},
if smooths share smoothing parameters).}
    \item{\code{paraPen}}{optional list specifying any penalties to be applied to parametric model terms. 
\code{\link[mgcv]{gam.models}} explains more.}
    \item{\code{chunk.size}}{The model matrix is created in chunks of this size, rather than ever being formed whole. 
Reset to \code{4*p} if \code{chunk.size < 4*p} where \code{p} is the number of coefficients.}
    \item{\code{rho}}{An AR1 error model can be used for the residuals (based on dataframe order), of Gaussian-identity 
           link models. This is the AR1 correlation parameter. Standardized residuals (approximately 
           uncorrelated under correct model) returned in 
           \code{std.rsd} if non zero. Also usable with other models when \code{discrete=TRUE}, in which case the AR model
           is applied to the working residuals and corresponds to a GEE approximation.}
    \item{\code{AR.start}}{logical variable of same length as data, \code{TRUE} at first observation of an independent
section of AR1 correlation. Very first observation in data frame does not need this. If \code{NULL} then 
there are no breaks in AR1 correlation.}
    \item{\code{discrete}}{with \code{method="fREML"} it is possible to discretize covariates for storage and efficiency reasons.
If \code{discrete} is \code{TRUE}, a number or a vector of numbers for each smoother term, then discretization happens. If numbers are supplied they give the number of discretization bins.}
    \item{\code{cluster}}{\code{bam} can compute the computationally dominant QR decomposition in parallel using \link[parallel:clusterApply]{parLapply}
from the \code{parallel} package, if it is supplied with a cluster on which to do this (a cluster here can be some cores of a 
single machine). See details and example code. 
}
    \item{\code{nthreads}}{Number of threads to use for non-cluster computation (e.g. combining results from cluster nodes).
If \code{NA} set to \code{max(1,length(cluster))}. See details.}
    \item{\code{gc.level}}{to keep the memory footprint down, it can help to call the garbage collector often, but this takes 
a substantial amount of time. Setting this to zero means that garbage collection only happens when R decides it should. Setting to 2 gives frequent garbage collection. 1 is in between. Not as much of a problem as it used to be.
}
    \item{\code{use.chol}}{By default \code{bam} uses a very stable QR update approach to obtaining the QR decomposition
of the model matrix. For well conditioned models an alternative accumulates the crossproduct of the model matrix
and then finds its Choleski decomposition, at the end. This is somewhat more efficient, computationally.}
    \item{\code{samfrac}}{For very large sample size Generalized additive models the number of iterations needed for the model fit can 
be reduced by first fitting a model to a random sample of the data, and using the results to supply starting values. This initial fit is run with sloppy convergence tolerances, so is typically very low cost. \code{samfrac} is the sampling fraction to use. 0.1 is often reasonable. }
    \item{\code{coef}}{initial values for model coefficients}
    \item{\code{drop.unused.levels}}{by default unused levels are dropped from factors before fitting. For some smooths 
involving factor variables you might want to turn this off. Only do so if you know what you are doing.}
    \item{\code{G}}{if not \code{NULL} then this should be the object returned by a previous call to \code{bam} with 
\code{fit=FALSE}. Causes all other arguments to be ignored except \code{sp}, \code{chunk.size}, \code{gamma}, \code{nthreads}, \code{cluster}, \code{rho}, \code{gc.level}, \code{samfrac}, \code{use.chol}, \code{method} and \code{scale} (if >0).}
    \item{\code{fit}}{if \code{FALSE} then the model is set up for fitting but not estimated, and an object is returned, suitable for passing as the \code{G} argument to \code{bam}.}
    \item{\code{drop.intercept}}{Set to \code{TRUE} to force the model to really not have a constant in the parametric model part,
even with factor variables present.}
  }}
}
\value{
An updated \link[=sspm_dataset-class]{sspm_dataset}.
}
\description{
With a formula, smooth a variable in an sspm dataset. See Details for
more information.
}
\details{
This function allows the user to specify a model formula for a given discrete sspm
object. The formula makes use of the specific smoothing terms \code{smooth_time()},
\code{smooth_space()} and \code{smooth_space_time()}. It can also contain fixed
effects and custom smooths.
}
\examples{
\dontrun{
biomass_smooth <- biomass_dataset \%>\%
    spm_smooth(weight_per_km2 ~ sfa + smooth_time(by = sfa) +
               smooth_space() +
               smooth_space_time(),
               boundaries = bounds_voronoi,
               family = tw)
}
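
\dontrun{
# A hedged variation on the call above, shown only as a sketch.
# It assumes the same biomass_dataset and bounds_voronoi objects exist.
# keep_fit = FALSE drops the fitted values and model to save memory,
# as described for the keep_fit argument.
# gamma is forwarded to mgcv::bam through the dots; values above 1 force
# smoother fits. The value 1.4 is illustrative, not a recommendation.
biomass_smooth_light <- biomass_dataset \%>\%
    spm_smooth(weight_per_km2 ~ sfa + smooth_time(by = sfa) +
               smooth_space() +
               smooth_space_time(),
               boundaries = bounds_voronoi,
               family = tw,
               keep_fit = FALSE,
               gamma = 1.4)
}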

}
