% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pre.R
\name{pre}
\alias{pre}
\title{Derive a prediction rule ensemble}
\usage{
pre(formula, data, type = "both", weights = rep(1, times = nrow(data)),
  sampfrac = 0.5, seed = 42, maxdepth = 3, learnrate = 0.01,
  removeduplicates = TRUE, maxrules = 2000, mtry = Inf, thres = 1e-07,
  standardize = FALSE, winsfrac = 0.025, normalize = TRUE, nfolds = 10,
  mod.sel.crit = "deviance", verbose = TRUE,
  ctreecontrol = ctree_control(), ...)
}
\arguments{
\item{formula}{a symbolic description of the model to be fit of the form 
\code{y ~ x1 + x2 + ... + xn}. If the output variable (left-hand side of the 
formula) is a factor, an ensemble for binary classification is created.
Otherwise, an ensemble for prediction of a continuous variable is created. 
Note that input variables may not have 'rule' as (part of) their name, and 
the formula may not exclude the intercept (that is \code{+ 0} or \code{- 1} 
may not be used in the right-hand side of the formula).}

\item{data}{matrix or data.frame containing the variables in the model. When a
matrix is specified, it must be of class \code{"numeric"} (the input and output 
variables must be continuous; the input variables may be 0-1 coded variables). 
When a data.frame is specified, the output variable must be of 
class \code{"numeric"} and must be a continuous variable; the input variables 
must be of class \code{"numeric"} (for continuous input variables), 
\code{"logical"} (for binary variables), \code{"factor"} (for nominal input 
variables with 2 or more levels), or \code{"ordered" "factor"} (for 
ordered input variables).}

\item{type}{character. Type of base learners to be included in ensemble. 
Defaults to "both" (initial ensemble includes both rules and linear functions). 
Other options are "rules" (for prediction rules only) or "linear" (for 
linear functions only).}

\item{weights}{an optional vector of observation weights to be used for 
deriving the ensemble.}

\item{sampfrac}{numeric value greater than 0, and smaller than or equal to 1. 
Fraction of randomly selected training observations used to produce each tree. 
Setting this to values < 1 will result in subsamples being drawn without 
replacement (i.e., subsampling). Setting this equal to 1 will result in 
bootstrap sampling.}

\item{seed}{numeric. Random seed to be used in deriving the final ensemble 
(for reproducibility).}

\item{maxdepth}{numeric. Maximal depth of trees to be grown. Defaults to 3,
resulting in trees with max 15 nodes (8 terminal and 7 inner nodes), and 
therefore max 15 rules.}

\item{learnrate}{numeric. Learning rate for sequentially induced trees.}

\item{removeduplicates}{logical. Remove rules from the ensemble which have 
the exact same support in training data?}

\item{maxrules}{numeric. Approximate maximum number of rules to be generated. 
The number of rules in the final ensemble will be smaller, due to the omission 
of rules with identical conditions or support.}

\item{mtry}{numeric. Number of randomly selected predictor variables for 
creating each split in each tree. Ignored for nominal output variables if
\code{learnrate} > 0.}

\item{thres}{numeric. Threshold for convergence.}

\item{standardize}{logical. Standardize rules and linear terms before 
estimating the regression model? As this will also standardize dummy coded
factors, users are advised to use the default: \code{standardize = FALSE}.}

\item{winsfrac}{numeric. Quantiles of data distribution to be used for 
winsorizing linear terms. If set to 0, no winsorizing is performed. Note 
that ordinal variables are included as linear terms in estimating the
regression model, and will also be winsorized.}

\item{normalize}{logical. Normalize linear variables before estimating the 
regression model? Normalizing gives linear terms the same a priori influence 
as a typical rule.}

\item{nfolds}{numeric. Number of folds to be used in performing 
cross-validation for determining the penalty parameter.}

\item{mod.sel.crit}{character. Model selection criterion to be used for 
deriving the final ensemble. The default is \code{"deviance"}, which uses 
squared error for gaussian models (equivalent to \code{"mse"}). 
\code{"mse"} and \code{"mae"} (mean absolute error) measure the deviation of 
the fitted mean from the response.}

\item{verbose}{logical. Should information on the initial and final ensemble 
be printed to the command line?}

\item{ctreecontrol}{A list with control parameters, see 
\code{\link[partykit]{ctree_control}}. Ignored for nominal output variables 
when \code{learnrate} > 0.}

\item{...}{Additional arguments to be passed to 
\code{\link[glmnet]{cv.glmnet}}.}
}
\value{
an object of class \code{pre}, which is a list with many elements
}
\description{
\code{pre} derives a sparse ensemble of rules and/or linear functions for 
prediction of a continuous or binary outcome.
}
\details{
Inputs can be continuous, ordered or factor variables. Continuous 
variables
}
\note{
The code for deriving rules from the nodes of trees was taken from an 
internal function of the \code{partykit} package of Achim Zeileis and Torsten 
Hothorn.
}
\examples{
\donttest{
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),])}
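\donttest{
# A hedged sketch using only arguments documented above: restrict the
# initial ensemble to prediction rules (type = "rules") and grow shallower
# trees (maxdepth = 2). The object names airq and airq.ens.rules are
# illustrative and not part of the package.
airq <- airquality[complete.cases(airquality), ]
airq.ens.rules <- pre(Ozone ~ ., data = airq, type = "rules", maxdepth = 2)}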
}

