% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/reprTrees.R
\encoding{UTF-8}
\name{reprTrees}
\alias{reprTrees}
\title{Select and visualize covariate-representative tree roots (CRTRs)}
\usage{
reprTrees(
  object,
  vars = NULL,
  numvars = 5,
  indvars = NULL,
  num.threads = NULL,
  plotit = TRUE,
  highlight_relevant = TRUE,
  box_plots = TRUE,
  density_plots = TRUE,
  scatter_plots = TRUE,
  add_split_line = TRUE,
  verbose = TRUE
)
}
\arguments{
\item{object}{Object of class \code{unityfor}.}

\item{vars}{Optional vector of names of the variables for which CRTRs should be obtained.}

\item{numvars}{Number of variables with the largest unity VIM values for which CRTRs should be obtained. Default is 5.}

\item{indvars}{The indices of the variables with the largest unity VIM values for which CRTRs should be obtained. For example, if \code{indvars = c(1, 3)}, the CRTRs for the variables with the largest and third-largest unity VIM values are obtained.}

\item{num.threads}{Number of threads to use.
The default is to use at most 2 threads (and at most the number of available CPU cores).
This conservative default avoids unintentionally using many cores on shared computing resources
(e.g., CI systems, servers, or HPC login/compute nodes).

For typical use on a personal computer, setting \code{num.threads = 0} is strongly recommended,
as it uses all available CPU cores, which typically substantially reduces runtime.}

\item{plotit}{Whether or not the CRTRs should be plotted or merely returned (invisibly). Default is \code{TRUE}.}

\item{highlight_relevant}{Whether all nodes other than those containing the top-scoring splits for the variables of interest and their ancestor nodes should be shaded out. Default is \code{TRUE}. See the 'Details' section below for an explanation.}

\item{box_plots}{Whether boxplots should be used to show the outcome class-specific distributions of the variable values in the nodes with top-scoring splits (see the 'Details' section for an explanation). For classification only. Default is \code{TRUE}.}

\item{density_plots}{Whether kernel density plots should be used to show the outcome class-specific distributions of the variable values in the nodes with top-scoring splits (see 'Details' section for explanation). For classification only. Default is \code{TRUE}.}

\item{scatter_plots}{Whether scatter plots should be used to investigate the relationship between the variables of interest and the outcome in the nodes with top-scoring splits (see 'Details' section for explanation). For continuous outcomes only. Default is \code{TRUE}.}

\item{add_split_line}{Whether a line at the split point of the corresponding node should be drawn in the boxplots, density plots, and scatter plots. Default is \code{TRUE}.}

\item{verbose}{Verbose output on or off. Default is \code{TRUE}.}
}
\value{
Object of class \code{unityfor.reprTrees} with elements
  \item{\code{rules}}{List. In-bag statistics on the outcome at each node in the CRTRs. For classification, this provides the class frequencies and the numbers of observations representing each class.}
  \item{\code{plots}}{List. Generated ggplot2 plots.}
  \item{\code{var.names}}{Labels of the variables for which CRTRs were selected.} 
  \item{\code{independent.variable.names}}{Names of all independent variables in the dataset.}
  \item{\code{num.independent.variables}}{Number of independent variables in the dataset.}
  \item{\code{num.samples}}{Number of observations in the dataset.}
  \item{\code{treetype}}{Tree type.}
  \item{\code{forest}}{Sub-forest that contains only the CRTRs.}
}
\description{
Implements the algorithm for selecting and visualizing covariate-representative tree roots (CRTRs) as described in Hornung & Hapfelmeier (2026).\cr
CRTRs are tree roots extracted from a unity forest that characterize the conditions under which a given variable exhibits its strongest effect on the outcome. The function selects one representative tree root for each variable and visualizes its structure to facilitate interpretation. CRTRs are essential for analyzing the effects identified by the unity VIM (\code{\link{unityfor}}). See the 'Details' section below for more details.
}
\details{
Further details on the descriptions below are provided in Hornung & Hapfelmeier (2026).

\strong{Covariate-representative tree roots (CRTRs).}
Covariate-representative tree roots (CRTRs) (Hornung & Hapfelmeier, 2026) are tree fragments (so-called 'tree roots', i.e., the first few splits of the trees) extracted from a fitted unity forest (\code{\link{unityfor}}). For each given variable, they characterize the conditions under which this variable exerts its strongest influence on the prediction.

Technically, for a given variable, the algorithm identifies tree roots in which this variable attains particularly high split scores (top-scoring splits). From these tree roots, a representative root is extracted (Laabs et al., 2024) that best reflects the conditions under which this variable has its strongest effect.
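
The selection of a representative element from a set of candidate tree roots can be sketched in base R as follows. This is a simplified illustration, not the code used by \code{reprTrees}: it assumes that a symmetric matrix of pairwise distances between the candidate tree roots is already available (computing the tree-based distance measure of Laabs et al. (2024) is not shown), and it assumes, for illustration only, that the root with the smallest mean distance to all other candidate roots is chosen. The helper \code{most_representative} and the toy distance matrix \code{D} are hypothetical.
\preformatted{
## Hypothetical helper (not part of unityForest): returns the index of the
## candidate tree root with the smallest mean distance to all other roots.
most_representative <- function(dist_mat) {
  n <- nrow(dist_mat)
  ## The diagonal is zero, so each row sum is the total distance
  ## to the other n - 1 roots:
  mean_dist <- rowSums(dist_mat) / (n - 1)
  which.min(mean_dist)
}

## Toy distance matrix for four candidate tree roots:
D <- matrix(c(0, 2, 3, 4,
              2, 0, 1, 2,
              3, 1, 0, 3,
              4, 2, 3, 0), nrow = 4, byrow = TRUE)
most_representative(D)
## [1] 2
}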

\strong{Interpretation and subgroup effects.}
If a variable has a strong marginal effect, the corresponding CRTR typically contains a split on this variable at the root node (first split in the tree). In contrast, if a variable has little marginal effect but interacts with another variable, the CRTR may first split on that other variable, thereby defining a subgroup in which the variable of interest exhibits a strong conditional effect.

From a substantive perspective, CRTRs enable the exploration of variable effects that are generally not detectable by conventional methods focusing on marginal associations. In particular, CRTRs can reveal variables that have weak marginal effects but act strongly within specific subgroups defined by interactions with other variables.

\strong{Relation to unity VIM.}
CRTRs are closely related to the unity variable importance measure (unity VIM) (\code{\link{unityfor}}). The unity VIM quantifies the strength of variable effects under the conditions in which they are strongest. Analogously, CRTRs visualize these conditions by displaying the tree structures that give rise to the respective unity VIM values.

Accordingly, the CRTR algorithm can be used to visualize and interpret the effects identified by the unity VIM. By default, CRTRs are constructed and visualized for the five variables with the largest unity VIM values.
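
The rank-based selection controlled by \code{numvars} and \code{indvars} can be illustrated with a toy named vector of unity VIM values. The vector \code{vim} and its values are hypothetical, and this sketch is not the code used by \code{reprTrees} (e.g., the handling of ties is not addressed here).
\preformatted{
## Hypothetical unity VIM values (not computed from any data):
vim <- c(x1 = 0.02, x2 = 0.15, x3 = 0.07, x4 = 0.11)

## Variable names sorted by decreasing unity VIM value:
ranked <- names(sort(vim, decreasing = TRUE))
ranked
## [1] "x2" "x4" "x3" "x1"

## Analogue of 'indvars = c(1, 3)' (largest and third-largest values):
ranked[c(1, 3)]
## [1] "x2" "x3"

## Analogue of 'numvars = 5' (only four variables exist here):
head(ranked, 5)
}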

\strong{Scope of applicability.}
CRTRs should primarily be examined for variables with sufficiently large unity VIM values. Constructing CRTRs for variables with negligible importance may lead to overinterpretation, as apparent patterns may reflect random structure rather than meaningful effects.

\strong{Shaded regions in the visualization.}
For improved interpretability, parts of the CRTRs are shaded out by default. Specifically, only the nodes containing the top-scoring splits for the variable of interest and their ancestor nodes are shown prominently.

This design is motivated by two considerations. First, the purpose of CRTRs is to depict the conditions under which a variable exhibits its strongest effects, and these conditions are defined by the ancestors of the nodes with top-scoring splits. Second, the remaining regions of the tree are of limited interpretive value. Since each CRTR is derived from tree roots selected for strong effects of a specific variable, the splitting patterns along the highlighted paths are specific to that variable. In contrast, shaded regions reflect arbitrary aspects of the overall association structure in the data and may include splits on non-informative variables, as each tree root is grown from a (small) random subset of all available variables.

Note that additional splits on the variable of interest may occur within shaded regions and can still be relevant. However, these splits do not represent the conditions under which the variable attains its strongest effects.
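
The rule determining which nodes are shown prominently can be sketched in base R as follows. This is a simplified illustration, not the code used by \code{reprTrees}: it assumes that a tree root is stored as a hypothetical data frame with columns \code{nodeID} and \code{parentID} (\code{NA} for the root node) and that the IDs of the nodes with top-scoring splits are known. The hypothetical helper \code{highlighted_nodes} returns these nodes together with all their ancestor nodes; all other nodes would be shaded out.
\preformatted{
## Hypothetical helper (not part of unityForest): returns the IDs of the
## nodes with top-scoring splits and of all their ancestor nodes.
highlighted_nodes <- function(nodes, top_ids) {
  keep <- integer(0)
  for (id in top_ids) {
    ## Walk from the node up to the root node, collecting every node visited:
    while (!is.na(id)) {
      keep <- c(keep, id)
      id <- nodes$parentID[match(id, nodes$nodeID)]
    }
  }
  sort(unique(keep))
}

## Toy tree root with seven nodes: node 1 is the root node, nodes 2 and 3
## are its children, and nodes 4 to 7 are its grandchildren.
nodes <- data.frame(nodeID   = 1:7,
                    parentID = c(NA, 1, 1, 2, 2, 3, 3))

## Suppose the top-scoring split for the variable of interest is in node 5:
highlighted_nodes(nodes, top_ids = 5)
## [1] 1 2 5   (nodes 3, 4, 6, and 7 would be shaded out)
}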

\strong{In-bag data for top-scoring split visualizations.}
The boxplots/density plots/scatter plots illustrating the discriminatory power of the top-scoring splits are computed exclusively based on the in-bag observations of the corresponding trees. This is consistent with the construction of the CRTRs themselves, which are derived from in-bag data only.

NOTE: The empirical evaluation of the unity forest framework (including the unity forest algorithm, the unity VIM, and covariate-representative tree roots) in Hornung & Hapfelmeier (2026) focused on categorical outcomes. Its performance for continuous outcomes has not yet been systematically investigated. Results for continuous outcomes should therefore be interpreted with appropriate caution.
}
\examples{
\donttest{

## IMPORTANT NOTE on parallelization:
## The default uses at most 2 threads (num.threads) to avoid unintentionally
## using many cores on shared systems.
## However, for typical runs on a personal computer, set num.threads = 0 to 
## use all available CPU cores; this is strongly recommended and can 
## substantially reduce runtime.
## Note: num.threads = 1 is used in the examples to avoid parallel
## execution during package checks.


## Load package:

library("unityForest")



## Categorical outcome:
#######################

## Set seed to make results reproducible:

set.seed(1234)


## Load wine dataset:

data(wine)


## Construct unity forest and calculate unity VIM values:

model <- unityfor(dependent.variable.name = "C", data = wine,
                  importance = "unity", num.trees = 2000, num.threads = 1)

# NOTE: num.trees = 2000 (as used above) is too small for practical
# purposes. This small number of trees was used only to keep the
# runtime of the example short.
# The default number of trees is num.trees = 20000.


## Visualize the CRTRs for the five variables with the largest unity VIM
## values:

reprTrees(model, box_plots = FALSE, density_plots = FALSE,
          num.threads = 1)
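

## Visualize the CRTR for the variable with the largest unity VIM value
## without shading out any nodes (see the 'Details' section for why
## shading is used by default):

ps_full <- reprTrees(model, indvars = 1, highlight_relevant = FALSE,
                     box_plots = FALSE, density_plots = FALSE,
                     num.threads = 1)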


## Visualize the CRTRs for the variables with the second-largest and
## third-largest unity VIM values:

reprTrees(model, indvars = c(2, 3), box_plots = FALSE, density_plots = FALSE,
          num.threads = 1)


## Visualize the CRTRs for the variables with the second-largest and
## third-largest unity VIM values, where density plots are shown to visualize
## the outcome class-specific distributions of the variable values in the
## nodes with top-scoring splits:

reprTrees(model, indvars = c(2, 3), box_plots = FALSE, density_plots = TRUE,
          num.threads = 1)


## Visualize the CRTRs for the variables with the second-largest and
## third-largest unity VIM values, where both density plots and boxplots are
## shown to visualize the outcome class-specific distributions of the variable
## values in the nodes with top-scoring splits; the split points are not
## indicated in these plots:
ps <- reprTrees(model, indvars = c(2, 3), add_split_line = FALSE, 
                num.threads = 1)


## Combine one of the CRTRs with the corresponding density plot (the code
## for saving the combined plot to a file is commented out):

library("patchwork")
library("ggplot2")

p <- ps$plots[[1]]$tree_plot / ps$plots[[1]]$density_plot +
     patchwork::plot_layout(heights = c(2, 1))
p

# outfile <- file.path(tempdir(), "figure_xy.pdf")
# ggsave(outfile, device = cairo_pdf, plot = p, width = 18, 
#        height = 14)


# Note: The plots can be manipulated with the usual ggplot2 syntax, e.g.:

ps$plots[[1]]$density_plot + xlab("Proline") + labs(title = NULL, y = NULL) +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.95, 0.95),
    legend.justification = c(1, 1)
  )
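

## Obtain the CRTRs for the three variables with the largest unity VIM
## values without plotting them, and inspect the returned object (see the
## 'Value' section for a description of its elements):

crtrs <- reprTrees(model, numvars = 3, plotit = FALSE, verbose = FALSE,
                   num.threads = 1)
class(crtrs)
crtrs$var.names
crtrs$treetype
str(crtrs$rules, max.level = 1)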




## Continuous outcome:
######################


## Set seed to make results reproducible:

set.seed(1234)


## Load stock dataset:

data(stock)


## Construct unity forest and calculate unity VIM values:

model <- unityfor(dependent.variable.name = "company10", data = stock,
                  importance = "unity", num.trees = 2000, num.threads = 1)

# NOTE: num.trees = 2000 (as used above) is too small for practical
# purposes. This small number of trees was used only to keep the
# runtime of the example short.
# The default number of trees is num.trees = 20000.


## Visualize the CRTRs for the variables "company1" and "company7", where
## scatter plots are shown to visualize the effects of the variables in the
## nodes with top-scoring splits:

ps <- reprTrees(model, vars = c("company1", "company7"), num.threads = 1)


## Visualize the CRTRs for the variables "company1" and "company7" without 
## scatter plots:

reprTrees(model, vars = c("company1", "company7"), scatter_plots = FALSE,
          num.threads = 1)



# As also shown above, the plots can be manipulated with the usual 
# ggplot2 syntax, e.g.:

library("ggplot2")

p <- ps$plots[[1]]$scatter_plot + labs(x = "Stock price of company 1",
                                       y = "Stock price of company 10") + 
  ggtitle("Marginal influence of the stock price of company 1")
p

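## Change the shape and colour used by the first layer of the plot by
## modifying its fixed aesthetic parameters directly:
p$layers[[1]]$aes_params$shape <- 1
p$layers[[1]]$aes_params$colour <- "red"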
p

}

}
\references{
\itemize{
  \item Hornung, R., Hapfelmeier, A. (2026). Unity Forests: Improving Interaction Modelling and Interpretability in Random Forests. arXiv:2601.07003, <\doi{10.48550/arXiv.2601.07003}>.
  \item Laabs, B.-H., Westenberger, A., & König, I. R. (2024). Identification of representative trees in random forests based on a new tree-based distance measure. Advances in Data Analysis and Classification 18(2):363-380, <\doi{10.1007/s11634-023-00537-7}>.
  }
}
\seealso{
\code{\link{unityfor}}
}
\author{
Roman Hornung
}
